最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【已解决】线上环境通过gunicorn去运行Flask出错:CRITICAL WORKER TIMEOUT

Flask crifan 16277浏览 0评论

折腾:

【未解决】在线环境中用gunicorn部署的产品demo无法正常初始化运行

期间,线上部署Flask的app,用supervisor和gunicorn去运行,结果始终在初始化,感觉就像:

无限循环

死循环

循环初始化

然后突然看到:

/Users/crifan/dev/dev_root/xxx/logs/gunicorn_error.log

[2018-08-28 09:48:03 +0800] [1745] [INFO] Starting gunicorn 19.9.0

[2018-08-28 09:48:03 +0800] [1745] [INFO] Listening at: http://0.0.0.0:32851 (1745)

[2018-08-28 09:48:03 +0800] [1745] [INFO] Using worker: threads

[2018-08-28 09:48:03 +0800] [1760] [INFO] Booting worker with pid: 1760

[2018-08-28 09:48:03 +0800] [1762] [INFO] Booting worker with pid: 1762

[2018-08-28 09:48:03 +0800] [1765] [INFO] Booting worker with pid: 1765

[2018-08-28 09:48:03 +0800] [1768] [INFO] Booting worker with pid: 1768

[2018-08-28 09:48:03 +0800] [1770] [INFO] Booting worker with pid: 1770

[2018-08-28 09:48:04 +0800] [1772] [INFO] Booting worker with pid: 1772

[2018-08-28 09:48:04 +0800] [1780] [INFO] Booting worker with pid: 1780

[2018-08-28 09:48:04 +0800] [1783] [INFO] Booting worker with pid: 1783

[2018-08-28 09:48:04 +0800] [1787] [INFO] Booting worker with pid: 1787

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1760)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1762)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1765)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1768)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1770)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1772)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1780)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1783)

[2018-08-28 09:48:34 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1787)

[2018-08-28 09:48:34 +0800] [1760] [INFO] Worker exiting (pid: 1760)

[2018-08-28 09:48:34 +0800] [1765] [INFO] Worker exiting (pid: 1765)

[2018-08-28 09:48:34 +0800] [1787] [INFO] Worker exiting (pid: 1787)

[2018-08-28 09:48:34 +0800] [1783] [INFO] Worker exiting (pid: 1783)

[2018-08-28 09:48:34 +0800] [1772] [INFO] Worker exiting (pid: 1772)

[2018-08-28 09:48:34 +0800] [1762] [INFO] Worker exiting (pid: 1762)

[2018-08-28 09:48:34 +0800] [1770] [INFO] Worker exiting (pid: 1770)

[2018-08-28 09:48:34 +0800] [1768] [INFO] Worker exiting (pid: 1768)

[2018-08-28 09:48:34 +0800] [1780] [INFO] Worker exiting (pid: 1780)

[2018-08-28 09:48:34 +0800] [1896] [INFO] Booting worker with pid: 1896

[2018-08-28 09:48:34 +0800] [1897] [INFO] Booting worker with pid: 1897

[2018-08-28 09:48:34 +0800] [1900] [INFO] Booting worker with pid: 1900

[2018-08-28 09:48:35 +0800] [1903] [INFO] Booting worker with pid: 1903

[2018-08-28 09:48:35 +0800] [1907] [INFO] Booting worker with pid: 1907

[2018-08-28 09:48:35 +0800] [1909] [INFO] Booting worker with pid: 1909

[2018-08-28 09:48:35 +0800] [1912] [INFO] Booting worker with pid: 1912

[2018-08-28 09:48:35 +0800] [1914] [INFO] Booting worker with pid: 1914

[2018-08-28 09:48:35 +0800] [1919] [INFO] Booting worker with pid: 1919

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1896)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1897)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1900)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1903)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1907)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1909)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1912)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1914)

[2018-08-28 09:49:05 +0800] [1745] [CRITICAL] WORKER TIMEOUT (pid:1919)

-》从

[CRITICAL] WORKER TIMEOUT

感觉是:

内部初始化的时间太长,超时了,导致gunicorn挂掉了

然后又继续重新初始化了。

难道gunicorn初始化app是有最大超时时间的?

flask gunicorn [CRITICAL] WORKER TIMEOUT

gunicorn CRITICAL WORKER TIMEOUT · Issue #1440 · benoitc/gunicorn

gunicorn … –timeout 30

好像是可以自己设置timeout的时间的

人家是把:

–worker-class gevent

换成:

–worker-class eventlet

好像就好了?

抽空试试“increasing the timeout and changing the worker class type”

CRITICAL WORKER TIMEOUT when running Flask app · Issue #1801 · benoitc/gunicorn

CRITICAL WORKER TIMEOUT when there is not enough memory , need better warning message · Issue #1646 · benoitc/gunicorn

Gunicorn worker timeout error – Stack Overflow

python – Gunicorn worker critical time out (using with flask) – Stack Overflow

频繁500错误,需要重启Gunicorn ,这是 Flask 的问题还是 Gunicorn 的问题? – 知乎

python – CRITICAL WORKER TIMEOUT error on gunicorn django – Stack Overflow

python – Gunicorn workers timeout no matter what – Stack Overflow

nginx – How to resolve the gunicorn critical worker timeout error? – Server Fault

用threading 解决 gunicorn worker timeout – hugo – ITeye博客

才注意到:

conf/gunicorn/gunicorn_config.py

中有:

timeout = 30                                    #超时

考虑到单个SearchBasedQA的初始化要4分钟以上,暂时改为更大的,比如:10分钟:

timeout = 600                                    #SearchBasedQA初始化耗时>4分钟,所以暂时设置比较大的超时时间:10分钟=600秒

然后重新去运行:

[root@x-general-01 robotDemo]# cd logs

[root@x-general-01 logs]# rm -rf *

[root@x-general-01 logs]# supervisorctl restart all

robotDemo: started

robotDemo_CeleryBeat: started

redis: started

gunicorn: started

robotDemo_CeleryWorker: started

[root@x-general-01 logs]# supervisorctl stauts

*** Unknown syntax: stauts

[root@x-general-01 logs]# supervisorctl status

gunicorn                         RUNNING   pid 15534, uptime 0:00:14

redis                            RUNNING   pid 15533, uptime 0:00:14

robotDemo                        RUNNING   pid 15531, uptime 0:00:14

robotDemo_CeleryBeat             RUNNING   pid 15532, uptime 0:00:14

robotDemo_CeleryWorker           RUNNING   pid 15535, uptime 0:00:14

然后刚初始化后,目前暂时log没有继续增加:

另外,如果后续想要看gunicorn的更加详细的日志,可以把:

loglevel = ‘info’

改为:

loglevel = ‘debug’

去看详细日志

目前的的日志中,有9个初始化:

initing SearchBasedQA

感觉是对的,然后就是继续等待初始化完成了。

希望的是:过了几分钟,比如4分钟后,就能看到后续的日志:

SearchBasedQA loaded

就对了。

过了会,还真的是:

log大小有变化,而且看起来是正常的大小,不是很大:

48K变到55K

然后下载下来看到的确有

SearchBasedQA loaded

[2018-08-28 10:48:27,198 INFO qa.py:29 <module>] [2018-08-28 10:48:27.198341] initing SearchBasedQA

[2018-08-28 10:48:27,306 INFO qa.py:26 <module>] aiContext=<DialogueManager.Context object at 0x7f828cb4f080>

[2018-08-28 10:48:27,737 INFO qa.py:29 <module>] [2018-08-28 10:48:27.737117] initing SearchBasedQA

[2018-08-28 10:48:27,811 INFO qa.py:29 <module>] [2018-08-28 10:48:27.811883] initing SearchBasedQA

[2018-08-28 10:48:28,483 INFO qa.py:29 <module>] [2018-08-28 10:48:28.483930] initing SearchBasedQA

[2018-08-28 10:48:28,740 INFO qa.py:29 <module>] [2018-08-28 10:48:28.740353] initing SearchBasedQA

[2018-08-28 10:48:28,826 INFO qa.py:29 <module>] [2018-08-28 10:48:28.826255] initing SearchBasedQA

[2018-08-28 10:48:31,790 INFO qa.py:29 <module>] [2018-08-28 10:48:31.790330] initing SearchBasedQA

[2018-08-28 10:55:11,788 INFO qa.py:31 <module>] [2018-08-28 10:55:11.788224] SearchBasedQA loaded

[2018-08-28 10:55:11,811 INFO asr.py:57 <module>] ASR init complete

[2018-08-28 10:55:11,826 DEBUG factory.py:89 register_extensions] api=<flask_restful.Api object at 0x7f828dd85518>

[2018-08-28 10:55:11,827 INFO factory.py:63 create_app] flask app extensions init completed

[2018-08-28 10:55:11,828 DEBUG app.py:32 <module>] app=<Flask ‘RobotQA’>

[2018-08-28 10:55:11,828 DEBUG app.py:34 <module>] log=<Logger flask.app (DEBUG)>

[2018-08-28 10:55:11,829 DEBUG app.py:35 <module>] settings.FLASK_ENV=production

[2018-08-28 10:55:11,830 DEBUG app.py:37 <module>] settings.DEBUG=False, settings.MONGODB_HOST=localhost, settings.FILE_URL_HOST=ip

[2018-08-28 10:55:12,618 INFO qa.py:31 <module>] [2018-08-28 10:55:12.618525] SearchBasedQA loaded

[2018-08-28 10:55:12,619 INFO asr.py:57 <module>] ASR init complete

[2018-08-28 10:55:12,623 DEBUG factory.py:89 register_extensions] api=<flask_restful.Api object at 0x7f828dd83358>

[2018-08-28 10:55:12,624 INFO factory.py:63 create_app] flask app extensions init completed

[2018-08-28 10:55:12,624 DEBUG app.py:32 <module>] app=<Flask ‘RobotQA’>

[2018-08-28 10:55:12,625 DEBUG app.py:34 <module>] log=<Logger flask.app (DEBUG)>

[2018-08-28 10:55:12,626 DEBUG app.py:35 <module>] settings.FLASK_ENV=production

[2018-08-28 10:55:12,626 DEBUG app.py:37 <module>] settings.DEBUG=False, settings.MONGODB_HOST=localhost, settings.FILE_URL_HOST=ip

[2018-08-28 10:55:13,927 INFO qa.py:31 <module>] [2018-08-28 10:55:13.927493] SearchBasedQA loaded

[2018-08-28 10:55:13,928 INFO asr.py:57 <module>] ASR init complete

可以看到此处的:

[2018-08-28 10:48:27,109 INFO qa.py:29 <module>] [2018-08-28 10:48:27.109262] initing SearchBasedQA

[2018-08-28 10:55:11,788 INFO qa.py:31 <module>] [2018-08-28 10:55:11.788224] SearchBasedQA loaded

-》算出:SearchBasedQA的初始化时间是:8分钟左右

【总结】

此处gunicorn的超时时间timeout(默认本身的)设置是30秒:

所以当内部代码有个SearchBasedQA初始化要好几分钟,超过30秒后,

gunicorn就会干掉进程,强制重启,导致gunicorn的log和Flask的app的log,看起来是在永远的无限循环的初始化。

解决办法是:

增加timeout到足够长时间,比如:

conf/gunicorn/gunicorn_config.py

timeout = 600                                    #SearchBasedQA初始化耗时>4分钟,所以暂时设置比较大的超时时间:10分钟=600秒

即可。

转载请注明:在路上 » 【已解决】线上环境通过gunicorn去运行Flask出错:CRITICAL WORKER TIMEOUT

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
84 queries in 0.193 seconds, using 22.24MB memory