
[Unsolved] Product demo deployed with gunicorn in the online environment fails to initialize and run properly

Flask crifan

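Background:

[Record] Deploying the product demo, merged with the search-based fallback dialog, to the online environment

While doing that, I had already deployed the merged product demo, with the fallback dialog, to the online environment. When I tried to run it, though, it did not seem to come up.

Check the log: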

[root@xxx-general-01 logs]# tail gunicorn_error.log

[2018-08-28 09:07:51 +0800] [9908] [ERROR] Retrying in 1 second.

[2018-08-28 09:07:55 +0800] [9908] [ERROR] Retrying in 1 second.

[2018-08-28 09:07:56 +0800] [9908] [ERROR] Can't connect to ('0.0.0.0', 32851)

It looks like the Flask port is already in use.
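One way to confirm that suspicion without reading process lists is to try binding the address yourself. The sketch below is only an illustration: `port_is_free` is a hypothetical helper name, and the host and port are the ones from the error line above.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # Same option most servers set, so sockets lingering in TIME_WAIT
        # are not reported as "in use".
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            sock.bind((host, port))
        except OSError:
            # Typically EADDRINUSE: another process is listening there.
            return False
        return True


if __name__ == "__main__":
    print(port_is_free("0.0.0.0", 32851))
```

If this prints False while the service is supposed to be stopped, some other process is still holding the port.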

So kill the earlier processes:

[root@xxx-general-01 logs]# ps aux | grep 32581

root      9934  0.0  0.0 112660   976 pts/0    S+   09:08   0:00 grep --color=auto 32581

[root@xxx-general-01 logs]# ps aux | grep flask

root      9942  0.0  0.0 112660   976 pts/0    S+   09:08   0:00 grep --color=auto flask

[root@xxx-general-01 logs]# ps aux | grep gunicorn

root      9843  0.2  0.1 215760 19528 ?        S    09:06   0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root      9861  0.8  0.3 313856 61060 ?        S    09:06   0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root      9943  4.4  0.1 203164 20080 ?        S    09:08   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root      9947  0.0  0.0 112660   976 pts/0    S+   09:08   0:00 grep --color=auto gunicorn

root     22426  0.5  0.2 468376 37348 ?        Sl   Aug27   3:20 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

[root@xxx-general-01 logs]# kill -9 9943 22426

-bash: kill: (9943) - No such process

[root@xxx-general-01 logs]# ps aux | grep gunicorn

root      9843  0.1  0.1 215760 19528 ?        S    09:06   0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root      9861  0.7  0.4 564488 75016 ?        Sl   09:06   0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root     10098 15.5  0.1 203296 20204 ?        S    09:11   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root     10140  0.0  0.2 463036 35416 ?        Rl   09:11   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root     10165  0.0  0.0 112664   976 pts/0    S+   09:11   0:00 grep --color=auto gunicorn

[root@xxx-general-01 logs]# kill -9 10098 10101 10104 10106 10108 10111 10119 10122 10133 10140

-bash: kill: (10098) - No such process

-bash: kill: (10101) - No such process

-bash: kill: (10104) - No such process

-bash: kill: (10106) - No such process

-bash: kill: (10108) - No such process

-bash: kill: (10111) - No such process

-bash: kill: (10119) - No such process

-bash: kill: (10122) - No such process

-bash: kill: (10133) - No such process

-bash: kill: (10140) - No such process

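None of those processes could be found, which suggests the gunicorn processes were being restarted with new PIDs between the ps and kill commands.

I suspected that the still-running programs were interfering, so first stop all the processes: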

[root@xx-general-01 logs]# supervisorctl stop all

robotDemo: stopped

robotDemo_CeleryBeat: stopped

redis: stopped

gunicorn: stopped

robotDemo_CeleryWorker: stopped

[root@xx-general-01 logs]#

[root@xx-general-01 logs]# ps aux | grep gunicorn

root     11296  0.0  0.0 112660   976 pts/0    S+   09:13   0:00 grep --color=auto gunicorn

That looks right.

[root@xx-general-01 logs]# ll

total 2700

-rw-r--r-- 1 root root      0 Aug 28 09:06 celery-beat-robotDemo_CeleryBeat-stderr.log

-rw-r--r-- 1 root root    928 Aug 28 09:12 celery-beat-robotDemo_CeleryBeat-stdout.log

-rw-r--r-- 1 root root    390 Aug 28 09:12 celery-worker-robotDemo_CeleryWorker-stderr.log

-rw-r--r-- 1 root root   1141 Aug 28 09:12 celery-worker-robotDemo_CeleryWorker-stdout.log

-rw-r--r-- 1 root root      0 Aug 28 09:06 gunicorn_access.log

-rw-r--r-- 1 root root 474925 Aug 28 09:12 gunicorn_error.log

-rw-r--r-- 1 root root      0 Aug 28 09:06 redis-redis-stderr.log

-rw-r--r-- 1 root root      0 Aug 28 09:06 redis-redis-stdout.log

-rw-r--r-- 1 root root 895356 Aug 28 09:12 RobotQA.log

-rw-r--r-- 1 root root 896297 Aug 28 09:12 supervisord-robotDemo-stderr.log

-rw-r--r-- 1 root root 473968 Aug 28 09:12 supervisord-robotDemo-stdout.log

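The log files now look right as well:

RobotQA.log has appeared.

Next, delete the logs and restart. The processes are right now, with 8 gunicorn processes spawned: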

[root@xx-general-01 logs]# rm -rf *

[root@xx-general-01 logs]# ll

total 0

[root@xx-general-01 logs]# supervisorctl restart all

robotDemo: started

robotDemo_CeleryBeat: started

redis: started

gunicorn: started

robotDemo_CeleryWorker: started

[root@xxx-general-01 logs]# supervisorctl status

gunicorn                         RUNNING   pid 11304, uptime 0:00:26

redis                            RUNNING   pid 11303, uptime 0:00:26

robotDemo                        RUNNING   pid 11789, uptime 0:00:02

robotDemo_CeleryBeat             RUNNING   pid 11302, uptime 0:00:26

robotDemo_CeleryWorker           RUNNING   pid 11305, uptime 0:00:26

[root@xx-general-01 logs]# ps aux | grep gunicorn

root     11304  1.2  0.1 215760 19540 ?        S    09:13   0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root     11332  4.0  0.3 313844 61048 ?        S    09:13   0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root     11922  0.0  0.1 203164 20084 ?        S    09:14   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root     11926  0.0  0.0 112660   972 pts/0    S+   09:14   0:00 grep --color=auto gunicorn

[root@xxx-general-01 logs]# ps aux | grep gunicorn

root     11304  0.9  0.1 215760 19540 ?        S    09:13   0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root     11332  3.0  0.3 313844 61048 ?        S    09:13   0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi

root     12056 11.5  0.1 203296 20200 ?        S    09:14   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root     12088  0.0  0.1 290348 22104 ?        Rl   09:14   0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app

root     12096  0.0  0.0 112660   976 pts/0    R+   09:14   0:00 grep --color=auto gunicorn

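But the page was still wrong: the qa API returned nothing.

Download the logs from the online environment to look at them locally:

Found some error logs: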

/Users/crifan/dev/dev_root/xxx/logs/gunicorn_error.log

[2018-08-28 09:13:55 +0800] [11341] [INFO] Booting worker with pid: 11341

[2018-08-28 09:13:57 +0800] [11316] [ERROR] Exception in worker process

Traceback (most recent call last):

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker

worker.init_process()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/gthread.py", line 104, in init_process

super(ThreadWorker, self).init_process()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/base.py", line 129, in init_process

self.load_wsgi()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi

self.wsgi = self.app.wsgi()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi

self.callable = self.load()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load

return self.load_wsgiapp()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp

return util.import_app(self.app_uri)

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/util.py", line 350, in import_app

__import__(module)

File "/root/xxx/web/server/robotDemo/app.py", line 28, in <module>

app = create_app(settings)

File "/root/xxx/web/server/robotDemo/factory.py", line 62, in create_app

register_extensions(app)

File "/root/xxx/web/server/robotDemo/factory.py", line 88, in register_extensions

api = create_rest_api(app)

File "/root/xxx/web/server/robotDemo/factory.py", line 96, in create_rest_api

from resources.qa import RobotQaAPI

File "/root/xxx/web/server/robotDemo/resources/qa.py", line 24, in <module>

from GenerateResponse import GenerateResponse

File "/root/xxx/nlp/dialog/GenerateResponse.py", line 8, in <module>

story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]

FileNotFoundError: [Errno 2] No such file or directory: '/root/xxx/nlp/dialog/data/reply.txt'

[2018-08-28 09:13:57 +0800] [11316] [INFO] Worker exiting (pid: 11316)

/Users/crifan/dev/dev_root/xxx/logs/supervisord-robotDemo-stderr.log

During handling of the above exception, another exception occurred:

Traceback (most recent call last):

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn", line 11, in <module>

sys.exit(run())

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 61, in run

WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 223, in run

super(Application, self).run()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 72, in run

Arbiter(self).run()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 232, in run

self.halt(reason=inst.reason, exit_status=inst.exit_status)

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 345, in halt

self.stop()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 393, in stop

time.sleep(0.1)

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 245, in handle_chld

self.reap_workers()

File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 525, in reap_workers

raise HaltServer(reason, self.WORKER_BOOT_ERROR)

gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>

So the failure is at:

File "/root/xxx/nlp/dialog/GenerateResponse.py", line 8, in <module>

story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]

So go look at the code:

GenerateResponse.py

import os

curFolderPath = os.path.abspath(os.path.dirname(__file__))

absFilePath = os.path.join(curFolderPath, "data/reply.txt")

story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]

The cause turned out to be:

during the last code upload, reply.txt in the data folder was accidentally lost.

So see:

[Solved] Excluding the data folder in the project root from git and fabric while keeping the data folder inside a subfolder
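Because the file is read at import time, a missing reply.txt takes down every gunicorn worker before the app even exists. A minimal sketch of a more defensive loader follows; `load_story_responses` is a hypothetical name, the UTF-8 encoding is an assumption, and the parsing is the same split on blank lines that GenerateResponse.py uses.

```python
import os


def load_story_responses(path):
    """Parse a reply file: blocks separated by a blank line, one response per line."""
    if not os.path.isfile(path):
        # Fail with a message that points at the deployment, not at the parser.
        raise FileNotFoundError(f"reply file missing, was it deployed? {path}")
    # The context manager closes the file; encoding is assumed to be UTF-8.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [block.split("\n") for block in text.split("\n\n")]
```

The error is still fatal, which is probably what you want for a required data file, but the message now says what to check on the server.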

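Then, back on the server:

Stop everything: supervisorctl stop all

Delete the logs

Restart: supervisorctl restart all

Check the result

The downloaded log shows:

9 occurrences of initing SearchBasedQA

while in the code: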

from nlp.search.qa.iqa import SearchBasedQA

log.info('[%s] initing SearchBasedQA', datetime.now())

searchBasedQa = SearchBasedQA(settings.SOLR_CORE)

log.info('[%s] SearchBasedQA loaded', datetime.now())

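the initialization of SearchBasedQA is itself genuinely time-consuming:

During debugging, a single SearchBasedQA initialization took 4 minutes.

-> So 9 SearchBasedQA initializations would probably take several tens of minutes.

-> So it seems I would have to wait 9×4=36 minutes before knowing whether initialization succeeded?

Also, the log actually contained 47 occurrences of initing SearchBasedQA:

If there really are 40-plus SearchBasedQA initializations,

-> wouldn't that mean waiting about 40×4=160 minutes, close to 3 hours,

before everything is fully initialized?

-> The initialization time itself needs to be shortened later, but this part should be improved now: construct SearchBasedQA only when the app is actually initialized, so that it is instantiated as few times as possible.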
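One common way to keep an expensive object from being built more than once per process is to construct it lazily behind a lock. The sketch below is not the project's code: the `SearchBasedQA` class here is a hypothetical stand-in for the real one in nlp.search.qa.iqa, and `get_search_based_qa` is an invented accessor name. The lock matters because the traceback shows gunicorn's threaded (gthread) worker.

```python
import threading


class SearchBasedQA:
    """Hypothetical stand-in for the real class in nlp.search.qa.iqa."""

    instances_created = 0

    def __init__(self, solr_core):
        SearchBasedQA.instances_created += 1
        self.solr_core = solr_core


_qa_instance = None
_qa_lock = threading.Lock()


def get_search_based_qa(solr_core):
    """Build SearchBasedQA on first use, then hand back the same object."""
    global _qa_instance
    if _qa_instance is None:
        with _qa_lock:
            # Re-check inside the lock so two threads cannot both construct it.
            if _qa_instance is None:
                _qa_instance = SearchBasedQA(solr_core)
    return _qa_instance
```

This only limits construction to once per process. Each gunicorn worker is a separate process, so several workers would still each pay the cost unless the app is loaded before forking.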

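Also, after waiting 10-plus minutes, the log had grown a lot:

Initialization still did not seem right.

Download the log and take a look.

It is even worse:

More than 300 initializations:

This feels like some kind of:

infinite loop

endless loop

repeated initialization

problem.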
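A quick way to tell slow initialization from a restart loop is to compare how many initializations started with how many finished. The sketch below assumes only the two marker strings from the log.info calls in the post. The sample lines are made up, and the helper names are hypothetical.

```python
def count_inits(log_lines, marker="initing SearchBasedQA"):
    """Count how many log lines contain the given marker."""
    return sum(1 for line in log_lines if marker in line)


def sequential_minutes(init_count, minutes_per_init=4):
    """Worst-case wait if every initialization ran one after another."""
    return init_count * minutes_per_init


# Made-up sample lines; only the marker text comes from the post.
sample = [
    "[2018-08-28 09:13:57] initing SearchBasedQA",
    "[2018-08-28 09:17:58] SearchBasedQA loaded",
    "[2018-08-28 09:18:01] initing SearchBasedQA",
]
starts = count_inits(sample)
done = count_inits(sample, marker="SearchBasedQA loaded")
print(starts, done, sequential_minutes(starts))
```

If the number of starts keeps climbing while the number of "loaded" lines stays near zero, the workers are most likely being killed and respawned before they finish, rather than initializing slowly.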

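Only then did I remember:

Earlier, when debugging locally by running gunicorn from PyCharm, I saw a similar symptom:

it kept initializing without ever stopping, as if initialization were looping forever.

Then see:

[Solved] Error running Flask through gunicorn in the online environment: CRITICAL WORKER TIMEOUT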
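The title of that post points at gunicorn's worker timeout: a worker that stays silent longer than `timeout` seconds (30 by default) is killed and replaced, so a 4-minute import can repeat indefinitely. The fragment below is a sketch of what a gunicorn config file might contain to avoid that. `timeout` and `preload_app` are real gunicorn settings, but the values are assumptions and not the author's actual configuration.

```python
# gunicorn_config.py (sketch; values are assumptions, not the author's settings)

# Seconds a worker may stay silent before the master kills and respawns it.
# Chosen here to sit well above the roughly 4-minute SearchBasedQA init.
timeout = 600

# Import the application once in the master before forking workers, so the
# expensive module-level initialization is not repeated in every worker.
preload_app = True
```

Which of the two settings is actually needed cannot be confirmed from the logs shown here. Preloading also means objects built in the master are shared across forks, so anything holding open connections should be created after the fork.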

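There were still problems after that:

requests from the web page still got no response, which later proved to be an error:

Then I saw that

gunicorn_error.log

had new log entries:

Then see:

[Solved] Flask hitting a mongo error on the online CentOS: pymongo.errors.ServerSelectionTimeoutError: localhost:32018: [Errno 111] Connection refused

After that, it could run internally, at least for now:

But a few other minor issues remain:

[Solved] ms tts in Flask returns 401, apparently because a token retrieval error prevents the audio file from being generated

Also, when time allows, I need to find out

why the logs of both the celery worker and beat are empty:

It looks as if celery is not running properly.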
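Spotting empty logs by eye in a directory listing is easy to get wrong, so a small check can help. This is only a sketch: `empty_log_files` is a hypothetical helper, and the directory you pass is whatever folder supervisor writes its logs to.

```python
import os


def empty_log_files(log_dir):
    """Return the sorted names of zero-byte regular files in log_dir."""
    names = []
    for name in os.listdir(log_dir):
        path = os.path.join(log_dir, name)
        if os.path.isfile(path) and os.path.getsize(path) == 0:
            names.append(name)
    return sorted(names)
```

An empty stderr log is normal for a healthy process. An empty stdout log for a celery worker that should be reporting tasks is the case worth investigating.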

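Please credit the source when reposting: 在路上 » [Unsolved] Product demo deployed with gunicorn in the online environment fails to initialize and run properly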
