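While working on:
[Record] Deploying the product demo, merged with the search-based fallback dialog, to the online environment
I had deployed the merged product demo, with its fallback dialog, to the online environment, but when I ran it, it apparently failed to start.
So I checked the log: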
[[email protected] logs]# tail gunicorn_error.log
[2018-08-28 09:07:51 +0800] [9908] [ERROR] Retrying in 1 second.
…
[2018-08-28 09:07:55 +0800] [9908] [ERROR] Retrying in 1 second.
[2018-08-28 09:07:56 +0800] [9908] [ERROR] Can't connect to ('0.0.0.0', 32851)
It looks like the Flask port is already in use.
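The repeated "Retrying in 1 second." followed by "Can't connect to" is consistent with gunicorn failing to bind its listening address. One way to confirm that a port is taken is to try binding it yourself. A minimal sketch, assuming a plain TCP bind test is representative; the helper name `is_port_in_use` is hypothetical:

```python
import socket


def is_port_in_use(port, host="0.0.0.0"):
    """Return True if binding (host, port) fails, i.e. something else holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
        except OSError:
            # Typically EADDRINUSE; other bind errors are also reported as "in use".
            return True
        return False


if __name__ == "__main__":
    # 32851 is the port from the gunicorn error log above.
    print(is_port_in_use(32851))
```

A result of True only says the bind failed; it does not say which process holds the port.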
So I tried to kill the earlier processes:
[[email protected] logs]# ps aux | grep 32581
root 9934 0.0 0.0 112660 976 pts/0 S+ 09:08 0:00 grep --color=auto 32581
[[email protected] logs]# ps aux | grep flask
root 9942 0.0 0.0 112660 976 pts/0 S+ 09:08 0:00 grep --color=auto flask
[[email protected] logs]# ps aux | grep gunicorn
root 9843 0.2 0.1 215760 19528 ? S 09:06 0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
…
root 9861 0.8 0.3 313856 61060 ? S 09:06 0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
root 9943 4.4 0.1 203164 20080 ? S 09:08 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
root 9947 0.0 0.0 112660 976 pts/0 S+ 09:08 0:00 grep --color=auto gunicorn
root 22426 0.5 0.2 468376 37348 ? Sl Aug27 3:20 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
[[email protected] logs]# kill -9 9943 22426
-bash: kill: (9943) - No such process
[[email protected] logs]# ps aux | grep gunicorn
root 9843 0.1 0.1 215760 19528 ? S 09:06 0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
…
root 9861 0.7 0.4 564488 75016 ? Sl 09:06 0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
root 10098 15.5 0.1 203296 20204 ? S 09:11 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
…
root 10140 0.0 0.2 463036 35416 ? Rl 09:11 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
root 10165 0.0 0.0 112664 976 pts/0 S+ 09:11 0:00 grep --color=auto gunicorn
[[email protected] logs]# kill -9 10098 10101 10104 10106 10108 10111 10119 10122 10133 10140
-bash: kill: (10098) - No such process
-bash: kill: (10101) - No such process
-bash: kill: (10104) - No such process
-bash: kill: (10106) - No such process
-bash: kill: (10108) - No such process
-bash: kill: (10111) - No such process
-bash: kill: (10119) - No such process
-bash: kill: (10122) - No such process
-bash: kill: (10133) - No such process
-bash: kill: (10140) - No such process
But none of those processes could be found.
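A likely reason for "No such process", offered here as an assumption rather than something the transcript proves: supervisor keeps restarting a gunicorn master that exits shortly after starting, so the PIDs printed by ps are already gone by the time kill runs. The same race can be reproduced in Python; `kill_if_alive` is a hypothetical helper name:

```python
import os
import signal


def kill_if_alive(pid, sig=signal.SIGTERM):
    """Send sig to pid; return False if the process no longer exists."""
    try:
        os.kill(pid, sig)
    except ProcessLookupError:
        # Same situation as bash's "kill: (PID) - No such process".
        return False
    return True
```

If this keeps returning False for freshly listed PIDs, stopping the supervising service first is the more reliable approach.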
I suspected that the still-running programs were interfering, so I stopped all processes first:
[[email protected] logs]# supervisorctl stop all
robotDemo: stopped
robotDemo_CeleryBeat: stopped
redis: stopped
gunicorn: stopped
robotDemo_CeleryWorker: stopped
[[email protected] logs]#
[[email protected] logs]# ps aux | grep gunicorn
root 11296 0.0 0.0 112660 976 pts/0 S+ 09:13 0:00 grep --color=auto gunicorn
That looks right.
[[email protected] logs]# ll
total 2700
-rw-r--r-- 1 root root 0 Aug 28 09:06 celery-beat-robotDemo_CeleryBeat-stderr.log
-rw-r--r-- 1 root root 928 Aug 28 09:12 celery-beat-robotDemo_CeleryBeat-stdout.log
-rw-r--r-- 1 root root 390 Aug 28 09:12 celery-worker-robotDemo_CeleryWorker-stderr.log
-rw-r--r-- 1 root root 1141 Aug 28 09:12 celery-worker-robotDemo_CeleryWorker-stdout.log
-rw-r--r-- 1 root root 0 Aug 28 09:06 gunicorn_access.log
-rw-r--r-- 1 root root 474925 Aug 28 09:12 gunicorn_error.log
-rw-r--r-- 1 root root 0 Aug 28 09:06 redis-redis-stderr.log
-rw-r--r-- 1 root root 0 Aug 28 09:06 redis-redis-stdout.log
-rw-r--r-- 1 root root 895356 Aug 28 09:12 RobotQA.log
-rw-r--r-- 1 root root 896297 Aug 28 09:12 supervisord-robotDemo-stderr.log
-rw-r--r-- 1 root root 473968 Aug 28 09:12 supervisord-robotDemo-stdout.log
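The log files now look right too:
RobotQA.log has appeared.
Then I deleted the logs and restarted. The processes looked right, with 8 gunicorn processes spawned: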
[[email protected] logs]# rm -rf *
[[email protected] logs]# ll
total 0
[[email protected] logs]# supervisorctl restart all
robotDemo: started
robotDemo_CeleryBeat: started
redis: started
gunicorn: started
robotDemo_CeleryWorker: started
[[email protected] logs]# supervisorctl status
gunicorn RUNNING pid 11304, uptime 0:00:26
redis RUNNING pid 11303, uptime 0:00:26
robotDemo RUNNING pid 11789, uptime 0:00:02
robotDemo_CeleryBeat RUNNING pid 11302, uptime 0:00:26
robotDemo_CeleryWorker RUNNING pid 11305, uptime 0:00:26
[[email protected] logs]# ps aux | grep gunicorn
root 11304 1.2 0.1 215760 19540 ? S 09:13 0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
…
root 11332 4.0 0.3 313844 61048 ? S 09:13 0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
root 11922 0.0 0.1 203164 20084 ? S 09:14 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
root 11926 0.0 0.0 112660 972 pts/0 S+ 09:14 0:00 grep --color=auto gunicorn
[[email protected] logs]# ps aux | grep gunicorn
root 11304 0.9 0.1 215760 19540 ? S 09:13 0:00 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
…
root 11332 3.0 0.3 313844 61048 ? S 09:13 0:01 /usr/bin/python3 /usr/bin/gunicorn --workers 4 --bind unix:/root/xxx/web/server/xxxCmsServer.sock xxxCmsServer.wsgi
root 12056 11.5 0.1 203296 20200 ? S 09:14 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
…
root 12088 0.0 0.1 290348 22104 ? Rl 09:14 0:00 /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/python3.6m /root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn -c conf/gunicorn/gunicorn_config.py app:app
root 12096 0.0 0.0 112660 976 pts/0 R+ 09:14 0:00 grep --color=auto gunicorn
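One hint is already visible in the supervisorctl status output above: robotDemo shows an uptime of 0:00:02 while every other program shows 0:00:26, and the robotDemo gunicorn PID changes between the two ps runs (11922, then 12056). Both suggest the program is being restarted repeatedly. A small sketch that flags such programs; it only handles the `H:MM:SS` uptime format shown here, and the function names are hypothetical:

```python
import re

# Matches lines such as: "robotDemo RUNNING pid 11789, uptime 0:00:02"
STATUS_RE = re.compile(
    r"^(\S+)\s+RUNNING\s+pid\s+(\d+),\s+uptime\s+(\d+):(\d{2}):(\d{2})$"
)


def parse_uptimes(status_text):
    """Map program name -> uptime in seconds for every RUNNING line."""
    uptimes = {}
    for line in status_text.splitlines():
        match = STATUS_RE.match(line.strip())
        if match:
            name, _pid, hours, minutes, seconds = match.groups()
            uptimes[name] = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
    return uptimes


def recently_restarted(uptimes, slack_seconds=10):
    """Names whose uptime lags the longest-running program by more than slack."""
    if not uptimes:
        return []
    oldest = max(uptimes.values())
    return sorted(name for name, up in uptimes.items() if oldest - up > slack_seconds)
```

Fed the five status lines above, `recently_restarted(parse_uptimes(text))` singles out robotDemo.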
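But the page still did not work: the qa API returned nothing.
So I downloaded the logs from the online environment to inspect them locally,
and found some errors in them: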
/Users/crifan/dev/dev_root/xxx/logs/gunicorn_error.log
[2018-08-28 09:13:55 +0800] [11341] [INFO] Booting worker with pid: 11341
[2018-08-28 09:13:57 +0800] [11316] [ERROR] Exception in worker process
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 583, in spawn_worker
worker.init_process()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/gthread.py", line 104, in init_process
super(ThreadWorker, self).init_process()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/base.py", line 129, in init_process
self.load_wsgi()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/workers/base.py", line 138, in load_wsgi
self.wsgi = self.app.wsgi()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 67, in wsgi
self.callable = self.load()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 52, in load
return self.load_wsgiapp()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 41, in load_wsgiapp
return util.import_app(self.app_uri)
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/util.py", line 350, in import_app
__import__(module)
File "/root/xxx/web/server/robotDemo/app.py", line 28, in <module>
app = create_app(settings)
File "/root/xxx/web/server/robotDemo/factory.py", line 62, in create_app
register_extensions(app)
File "/root/xxx/web/server/robotDemo/factory.py", line 88, in register_extensions
api = create_rest_api(app)
File "/root/xxx/web/server/robotDemo/factory.py", line 96, in create_rest_api
from resources.qa import RobotQaAPI
File "/root/xxx/web/server/robotDemo/resources/qa.py", line 24, in <module>
from GenerateResponse import GenerateResponse
File "/root/xxx/nlp/dialog/GenerateResponse.py", line 8, in <module>
story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]
FileNotFoundError: [Errno 2] No such file or directory: '/root/xxx/nlp/dialog/data/reply.txt'
[2018-08-28 09:13:57 +0800] [11316] [INFO] Worker exiting (pid: 11316)
/Users/crifan/dev/dev_root/xxx/logs/supervisord-robotDemo-stderr.log
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/bin/gunicorn", line 11, in <module>
sys.exit(run())
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/wsgiapp.py", line 61, in run
WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 223, in run
super(Application, self).run()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/app/base.py", line 72, in run
Arbiter(self).run()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 232, in run
self.halt(reason=inst.reason, exit_status=inst.exit_status)
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 345, in halt
self.stop()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 393, in stop
time.sleep(0.1)
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 245, in handle_chld
self.reap_workers()
File "/root/.local/share/virtualenvs/robotDemo-dwdcgdaG/lib/python3.6/site-packages/gunicorn/arbiter.py", line 525, in reap_workers
raise HaltServer(reason, self.WORKER_BOOT_ERROR)
gunicorn.errors.HaltServer: <HaltServer 'Worker failed to boot.' 3>
So the failure comes from:
File "/root/xxx/nlp/dialog/GenerateResponse.py", line 8, in <module>
story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]
So I looked at the code in
GenerateResponse.py
import os

# Resolve data/reply.txt relative to this module, not the working directory
curFolderPath = os.path.abspath(os.path.dirname(__file__))
absFilePath = os.path.join(curFolderPath, "data/reply.txt")
story_responses = [item.split('\n') for item in open(absFilePath).read().split('\n\n')]
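Because this runs at import time, a missing data file takes down every gunicorn worker during boot. A more defensive variant could fail with a clearer message and close the file properly. This is only a sketch: `load_story_responses` is a hypothetical name, UTF-8 encoding is an assumption, and unlike the original one-liner it skips empty blocks:

```python
import os


def load_story_responses(path):
    """Read blank-line-separated blocks, each split into its lines."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"reply data file missing: {path} (was it excluded during deployment?)"
        )
    with open(path, encoding="utf-8") as handle:
        blocks = handle.read().split("\n\n")
    # Skip blocks that are empty or whitespace-only (e.g. trailing blank lines).
    return [block.split("\n") for block in blocks if block.strip()]
```

A module could then call `load_story_responses(absFilePath)` in place of the inline list comprehension.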
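It turned out that
during the last code upload I had accidentally
left out reply.txt from the data folder.
So I went to:
[Solved] Excluding the data folder under the project root in git and fabric while keeping the data folder inside a subfolder
Then, back on the online server:
stop everything: supervisorctl stop all
delete the logs
restart: supervisorctl restart all
and check the result.
The downloaded log showed
9 occurrences of initing SearchBasedQA.
The relevant code is: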
from nlp.search.qa.iqa import SearchBasedQA
log.info('[%s] initing SearchBasedQA', datetime.now())
searchBasedQa = SearchBasedQA(settings.SOLR_CORE)
log.info(‘[%s] SearchBasedQA loaded’, datetime.now())
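Initializing SearchBasedQA here is indeed very slow:
during debugging, a single SearchBasedQA initialization took 4 minutes
-> so 9 SearchBasedQA initializations would presumably take tens of minutes if they run one after another.
-> so it seems I need to wait about 9 × 4 = 36 minutes before I can tell whether initialization succeeded?
Moreover, the log actually contained 47 occurrences of initing SearchBasedQA:
if there really are 40-odd SearchBasedQA initializations
-> would that not mean waiting at least 40 × 4 = 160 minutes, close to 3 hours,
before everything is fully initialized?
-> The initialization time itself needs to be shortened later, but the more immediate fix seems to be calling SearchBasedQA only when the app is actually initialized, to keep the number of SearchBasedQA initializations as low as possible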
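The idea of constructing the expensive object only once can be sketched with a lazily created, lock-protected singleton. `SlowResource` is a hypothetical stand-in for SearchBasedQA and `get_search_qa` is an invented accessor name; none of this is the project's actual code:

```python
import threading


class SlowResource:
    """Hypothetical stand-in for SearchBasedQA; counts constructions."""

    instances_created = 0

    def __init__(self, core):
        SlowResource.instances_created += 1
        self.core = core


_lock = threading.Lock()
_instance = None


def get_search_qa(core="demo_core"):
    """Create the resource on first use, then always return the same object."""
    global _instance
    if _instance is None:
        with _lock:
            # Re-check inside the lock so two threads cannot both construct it.
            if _instance is None:
                _instance = SlowResource(core)
    return _instance
```

This limits construction to once per process. With several gunicorn worker processes, each process would still build its own copy.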
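Also, after waiting more than 10 minutes, the log had grown a lot,
and initialization still did not look normal.
I downloaded the log to check,
and it was even worse:
more than 300 initializations.
It felt like some kind of:
infinite loop
endless loop
repeated initialization
problem.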
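Counting the two log messages separately helps tell "slow but progressing" from "restarting forever": if "initing SearchBasedQA" keeps growing while "SearchBasedQA loaded" stays near zero, initializations are being started but never finish. A small sketch, using the two message strings from the code above; the function names and the 4-minute default are illustrative assumptions:

```python
def count_marker(lines, marker):
    """Number of lines containing the marker text."""
    return sum(1 for line in lines if marker in line)


def summarize_inits(lines, minutes_per_init=4):
    """Compare started vs finished initializations in an iterable of log lines."""
    lines = list(lines)  # allow passing an open file object
    started = count_marker(lines, "initing SearchBasedQA")
    finished = count_marker(lines, "SearchBasedQA loaded")
    return {
        "started": started,
        "finished": finished,
        # Worst case if every initialization ran one after another.
        "sequential_minutes": started * minutes_per_init,
    }
```

For 9 started initializations this gives the same 36-minute estimate as above.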
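Only then did I remember:
when debugging locally earlier, running the app under gunicorn from PyCharm showed a similar symptom:
it kept initializing and never stopped, as if initialization were looping forever.
So I went to:
[Solved] Running Flask through gunicorn in the online environment fails with: CRITICAL WORKER TIMEOUT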
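The linked post is not reproduced here, so the following is only a guess at the kind of change involved, not the fix that was actually applied. gunicorn kills and restarts a worker that stays silent for longer than its `timeout` setting (30 seconds by default), which could produce this endless re-initialization when loading takes about 4 minutes. `timeout` and `preload_app` are real gunicorn settings; the values below are assumptions:

```python
# Sketch for conf/gunicorn/gunicorn_config.py; values are illustrative only.

# Measured during debugging: one SearchBasedQA initialization took about 4 minutes.
SLOW_INIT_SECONDS = 4 * 60

# Give workers twice the observed load time before they are considered hung.
timeout = SLOW_INIT_SECONDS * 2

# Import the application in the master before forking, so that module-level
# initialization runs once instead of once per worker.
preload_app = True
```

A large timeout also delays detection of workers that are genuinely stuck, so it is a trade-off rather than a free fix.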
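There was still a problem after that:
requests from the web page still got no response, which later turned out to be an error.
Then I saw that
gunicorn_error.log
had new entries.
I then dealt with that,
after which the demo could run internally for the time being.
There are still a few minor problems, though:
[Solved] The ms TTS in Flask returns 401; apparently a token retrieval error prevents the speech file from being generated
Also, when I have time, I need to find out
why the celery worker and beat logs are both empty:
it looks like celery is not running properly.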
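A quick way to spot services that write nothing is to list zero-byte log files in the logs directory. A minimal sketch; `empty_log_files` is a hypothetical helper, and an empty log is only a hint, not proof that the service is down:

```python
import os


def empty_log_files(log_dir, suffix=".log"):
    """Return the sorted names of zero-byte files in log_dir ending with suffix."""
    return sorted(
        name
        for name in os.listdir(log_dir)
        if name.endswith(suffix)
        and os.path.getsize(os.path.join(log_dir, name)) == 0
    )
```

Running it against the logs directory some time after a restart shows which programs have produced no output at all.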