折腾:
【未解决】在线环境中用gunicorn部署的产品demo无法正常初始化运行
期间,线上部署Flask的app,用supervisor和gunicorn去运行,结果始终在初始化,感觉就像:
无限循环
死循环
循环初始化
然后突然看到:
/Users/crifan/dev/dev_root/xxx/logs/gunicorn_error.log
[2018-08-28 09:48:03 +0800] [1745] [INFO] Starting gunicorn 19.9.0
[2018-08-28 09:48:03 +0800] [1745] [INFO] Listening at: http://0.0.0.0:32851 (1745)
[2018-08-28 09:48:03 +0800] [1745] [INFO] Using worker: threads
[2018-08-28 09:48:03 +0800] [1760] [INFO] Booting worker with pid: 1760
[2018-08-28 09:48:03 +0800] [1762] [INFO] Booting worker with pid: 1762
[2018-08-28 09:48:03 +0800] [1765] [INFO] Booting worker with pid: 1765
[2018-08-28 09:48:03 +0800] [1768] [INFO] Booting worker with pid: 1768
[2018-08-28 09:48:03 +0800] [1770] [INFO] Booting worker with pid: 1770
[2018-08-28 09:48:04 +0800] [1772] [INFO] Booting worker with pid: 1772
[2018-08-28 09:48:04 +0800] [1780] [INFO] Booting worker with pid: 1780
[2018-08-28 09:48:04 +0800] [1783] [INFO] Booting worker with pid: 1783
[2018-08-28 09:48:04 +0800] [1787] [INFO] Booting worker with pid: 1787
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1760)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1762)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1765)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1768)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1770)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1772)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1780)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1783)
[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1787)
[2018-08-28 09:48:34 +0800] [1760] [INFO] Worker exiting (pid: 1760)
[2018-08-28 09:48:34 +0800] [1765] [INFO] Worker exiting (pid: 1765)
[2018-08-28 09:48:34 +0800] [1787] [INFO] Worker exiting (pid: 1787)
[2018-08-28 09:48:34 +0800] [1783] [INFO] Worker exiting (pid: 1783)
[2018-08-28 09:48:34 +0800] [1772] [INFO] Worker exiting (pid: 1772)
[2018-08-28 09:48:34 +0800] [1762] [INFO] Worker exiting (pid: 1762)
[2018-08-28 09:48:34 +0800] [1770] [INFO] Worker exiting (pid: 1770)
[2018-08-28 09:48:34 +0800] [1768] [INFO] Worker exiting (pid: 1768)
[2018-08-28 09:48:34 +0800] [1780] [INFO] Worker exiting (pid: 1780)
[2018-08-28 09:48:34 +0800] [1896] [INFO] Booting worker with pid: 1896
[2018-08-28 09:48:34 +0800] [1897] [INFO] Booting worker with pid: 1897
[2018-08-28 09:48:34 +0800] [1900] [INFO] Booting worker with pid: 1900
[2018-08-28 09:48:35 +0800] [1903] [INFO] Booting worker with pid: 1903
[2018-08-28 09:48:35 +0800] [1907] [INFO] Booting worker with pid: 1907
[2018-08-28 09:48:35 +0800] [1909] [INFO] Booting worker with pid: 1909
[2018-08-28 09:48:35 +0800] [1912] [INFO] Booting worker with pid: 1912
[2018-08-28 09:48:35 +0800] [1914] [INFO] Booting worker with pid: 1914
[2018-08-28 09:48:35 +0800] [1919] [INFO] Booting worker with pid: 1919
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1896)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1897)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1900)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1903)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1907)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1909)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1912)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1914)
[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1919)
-》从
[CRITICAL] WORKER TIMEOUT
感觉是:
内部初始化的时间太长,超时了,导致gunicorn挂掉了
然后又继续重新初始化了。
难道gunicorn初始化app是有最大超时时间的?
flask gunicorn [CRITICAL] WORKER TIMEOUT
gunicorn CRITICAL WORKER TIMEOUT · Issue #1440 · benoitc/gunicorn
gunicorn … –timeout 30
好像是可以自己设置timeout的时间的
人家是把:
–worker-class gevent
换成:
–worker-class eventlet
好像就好了?
抽空试试“increasing the timeout and changing the worker class type”
CRITICAL WORKER TIMEOUT when running Flask app · Issue #1801 · benoitc/gunicorn
Gunicorn worker timeout error – Stack Overflow
python – Gunicorn worker critical time out (using with flask) – Stack Overflow
频繁500错误,需要重启Gunicorn ,这是 Flask 的问题还是 Gunicorn 的问题? – 知乎
python – CRITICAL WORKER TIMEOUT error on gunicorn django – Stack Overflow
python – Gunicorn workers timeout no matter what – Stack Overflow
nginx – How to resolve the gunicorn critical worker timeout error? – Server Fault
用threading 解决 gunicorn worker timeout – hugo – ITeye博客
才注意到:
conf/gunicorn/gunicorn_config.py
中有:
timeout = 30 #超时
考虑到单个SearchBasedQA的初始化要4分钟以上,暂时改为更大的,比如:10分钟:
timeout = 600 #SearchBasedQA初始化耗时>4分钟,所以暂时设置比较大的超时时间:10分钟=600秒
然后重新去运行:
[root@x-general-01 robotDemo]# cd logs
[root@x-general-01 logs]# rm -rf *
[root@x-general-01 logs]# supervisorctl restart all
robotDemo: started
robotDemo_CeleryBeat: started
redis: started
gunicorn: started
robotDemo_CeleryWorker: started
[root@x-general-01 logs]# supervisorctl stauts
*** Unknown syntax: stauts
[root@x-general-01 logs]# supervisorctl status
gunicorn RUNNING pid 15534, uptime 0:00:14
redis RUNNING pid 15533, uptime 0:00:14
robotDemo RUNNING pid 15531, uptime 0:00:14
robotDemo_CeleryBeat RUNNING pid 15532, uptime 0:00:14
robotDemo_CeleryWorker RUNNING pid 15535, uptime 0:00:14
然后刚初始化后,目前暂时log没有继续增加:
另外,如果后续想要看gunicorn的更加详细的日志,可以把:
loglevel = ‘info’
改为:
loglevel = ‘debug’
去看详细日志
目前的的日志中,有9个初始化:
initing SearchBasedQA
感觉是对的,然后就是继续等待初始化完成了。
希望的是:过了几分钟,比如4分钟后,就能看到后续的日志:
SearchBasedQA loaded
就对了。
过了会,还真的是:
log大小有变化,而且看起来是正常的大小,不是很大:
48K变到55K
然后下载下来看到的确有
SearchBasedQA loaded
[2018-08-28 10:48:27,198 INFO qa.py:29 <module>] [2018-08-28 10:48:27.198341] initing SearchBasedQA
[2018-08-28 10:48:27,306 INFO qa.py:26 <module>] aiContext=<DialogueManager.Context object at 0x7f828cb4f080>
[2018-08-28 10:48:27,737 INFO qa.py:29 <module>] [2018-08-28 10:48:27.737117] initing SearchBasedQA
[2018-08-28 10:48:27,811 INFO qa.py:29 <module>] [2018-08-28 10:48:27.811883] initing SearchBasedQA
[2018-08-28 10:48:28,483 INFO qa.py:29 <module>] [2018-08-28 10:48:28.483930] initing SearchBasedQA
[2018-08-28 10:48:28,740 INFO qa.py:29 <module>] [2018-08-28 10:48:28.740353] initing SearchBasedQA
[2018-08-28 10:48:28,826 INFO qa.py:29 <module>] [2018-08-28 10:48:28.826255] initing SearchBasedQA
[2018-08-28 10:48:31,790 INFO qa.py:29 <module>] [2018-08-28 10:48:31.790330] initing SearchBasedQA
[2018-08-28 10:55:11,788 INFO qa.py:31 <module>] [2018-08-28 10:55:11.788224] SearchBasedQA loaded
[2018-08-28 10:55:11,811 INFO asr.py:57 <module>] ASR init complete
[2018-08-28 10:55:11,826 DEBUG factory.py:89 register_extensions] api=<flask_restful.Api object at 0x7f828dd85518>
[2018-08-28 10:55:11,827 INFO factory.py:63 create_app] flask app extensions init completed
[2018-08-28 10:55:11,828 DEBUG app.py:32 <module>] app=<Flask ‘RobotQA’>
[2018-08-28 10:55:11,828 DEBUG app.py:34 <module>] log=<Logger flask.app (DEBUG)>
[2018-08-28 10:55:11,829 DEBUG app.py:35 <module>] settings.FLASK_ENV=production
[2018-08-28 10:55:11,830 DEBUG app.py:37 <module>] settings.DEBUG=False, settings.MONGODB_HOST=localhost, settings.FILE_URL_HOST=ip
[2018-08-28 10:55:12,618 INFO qa.py:31 <module>] [2018-08-28 10:55:12.618525] SearchBasedQA loaded
[2018-08-28 10:55:12,619 INFO asr.py:57 <module>] ASR init complete
[2018-08-28 10:55:12,623 DEBUG factory.py:89 register_extensions] api=<flask_restful.Api object at 0x7f828dd83358>
[2018-08-28 10:55:12,624 INFO factory.py:63 create_app] flask app extensions init completed
[2018-08-28 10:55:12,624 DEBUG app.py:32 <module>] app=<Flask ‘RobotQA’>
[2018-08-28 10:55:12,625 DEBUG app.py:34 <module>] log=<Logger flask.app (DEBUG)>
[2018-08-28 10:55:12,626 DEBUG app.py:35 <module>] settings.FLASK_ENV=production
[2018-08-28 10:55:12,626 DEBUG app.py:37 <module>] settings.DEBUG=False, settings.MONGODB_HOST=localhost, settings.FILE_URL_HOST=ip
[2018-08-28 10:55:13,927 INFO qa.py:31 <module>] [2018-08-28 10:55:13.927493] SearchBasedQA loaded
[2018-08-28 10:55:13,928 INFO asr.py:57 <module>] ASR init complete
可以看到此处的:
[2018-08-28 10:48:27,109 INFO qa.py:29 <module>] [2018-08-28 10:48:27.109262] initing SearchBasedQA
[2018-08-28 10:55:11,788 INFO qa.py:31 <module>] [2018-08-28 10:55:11.788224] SearchBasedQA loaded
-》算出:SearchBasedQA的初始化时间是:8分钟左右
【总结】
此处gunicorn的超时时间timeout(默认本身的)设置是30秒:
所以当内部代码有个SearchBasedQA初始化要好几分钟,超过30秒后,
gunicorn就会干掉进程,强制重启,导致gunicorn的log和Flask的app的log,看起来是在永远的无限循环的初始化。
解决办法是:
增加timeout到足够长时间,比如:
conf/gunicorn/gunicorn_config.py
timeout = 600 #SearchBasedQA初始化耗时>4分钟,所以暂时设置比较大的超时时间:10分钟=600秒
即可。
转载请注明:在路上 » 【已解决】线上环境通过gunicorn去运行Flask出错:CRITICAL WORKER TIMEOUT