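Troubleshooting:
After leaving the crawler running for a long time, I came back and found:
the run status was PAUSED
-> Could it have been caused by the screen locking?
Mainly, I saw these error messages: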
<code>FETCH_ERROR autohomeBrandData > https://www.autohome.com.cn/spec/29030/#pvareaid=2042128 46 seconds ago 2043.8+0.40ms +0
FETCH_ERROR autohomeBrandData > https://www.autohome.com.cn/spec/28679/#pvareaid=2042128 46 seconds ago 2044.0+0.39ms +0
</code>
But the URLs that failed all open normally in a browser, for example:
https://www.autohome.com.cn/spec/28679/#pvareaid=2042128
Looking at a single failed task:
http://0.0.0.0:5000/task/autohomeBrandData:96d3707c65d5cb10b34803af3dd40c1b
<code>ACTIVE autohomeBrandData.carModelSpecPage > https://www.autohome.com.cn/spec/9007/#pvareaid=2042128 (5 minutes ago updated )
taskid 96d3707c65d5cb10b34803af3dd40c1b
lastcrawltime 1526094255.333958 (5 minutes ago)
updatetime 1526094255.333979 (5 minutes ago)
exetime 1526115855.3339572 (May 12, 2018 at 9:04)

track.fetch 27513.09ms
{
  "content": "",
  "encoding": null,
  "error": "HTTP 599: Connection timed out after 27266 milliseconds",
  "headers": {},
  "ok": false,
  "redirect_url": null,
  "status_code": 599,
  "time": 27.513092756271362
}

track.process 0.44ms
Document is empty
s/base_handler.py", line 155, in _run_func
    ret = function(*arguments[:len(args) - 1])
  File "<autohomeBrandData>", line 162, in carModelSpecPage
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py", line 144, in doc
    elements = self.etree
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py", line 160, in etree
    self._elements = lxml.html.fromstring(self.content)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
{
  "exception": "Document is empty",
  "follows": 0,
  "logs": "s/base_handler.py\", line 155, in _run_func\n ret = function(*arguments[:len(args) - 1])\n File \"<autohomeBrandData>\", line 162, in carModelSpecPage\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py\", line 144, in doc\n elements = self.etree\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py\", line 160, in etree\n self._elements = lxml.html.fromstring(self.content)\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py\", line 876, in fromstring\n doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py\", line 765, in document_fromstring\n \"Document is empty\")\n lxml.etree.ParserError: Document is empty\n",
  "ok": false,
  "result": null,
  "time": 0.0004379749298095703
}

schedule
{
  "exetime": 1526115855.3339572,
  "retried": 3
}

fetch
{
  "fetch_type": "js",
  "save": {
    "品牌": "欧宝",
    "子品牌": "欧宝",
    "状态": "停售",
    "车型": "2011款 三门基本型",
    "车型ID": "9007",
    "车系": "欧宝Corsa",
    "车系ID": "753"
  }
}

process
{
  "callback": "carModelSpecPage"
}
</code>
So I need to figure out what is going on.
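The task detail above shows the callback trying to parse a response whose fetch had already failed (status 599, empty `content`), which is what makes lxml raise `Document is empty`. Below is a minimal sketch of a guard against that, assuming a response object that exposes `status_code` and `content` like the fields in the `track.fetch` output. `FakeResponse` and `should_parse` are hypothetical names used only for illustration; they are not pyspider APIs.

```python
from collections import namedtuple

# Hypothetical stand-in for a fetch result; it only mirrors two of the
# fields visible in the track.fetch output (status_code, content).
FakeResponse = namedtuple("FakeResponse", ["status_code", "content"])


def should_parse(response):
    """Return True only when the fetch succeeded and the body is non-empty."""
    if response.status_code != 200:
        return False
    content = response.content
    return bool(content and content.strip())


failed = FakeResponse(status_code=599, content="")
ok = FakeResponse(status_code=200, content="<html><body>spec</body></html>")
print(should_parse(failed))  # False
print(should_parse(ok))      # True
```

In a callback, a check like this would let the code skip parsing and log the URL instead of running into `ParserError`. It does not address why the fetch itself timed out.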
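I clicked to resume the run to see what would happen.
After a short wait it was able to continue:
and log output resumed in the terminal:
But it soon failed again:
The status was PAUSED again.
Then it resumed running once more.
But the log shows: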
<code>[I 180512 11:26:09 scheduler:906] task done autohomeBrandData:3f86597065a83874034d67801fa37cf1 https://www.autohome.com.cn/spec/32129/#pvareaid=2042128
[E 180512 11:26:09 tornado_fetcher:212] [599] autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128, HTTP 599: Failed to connect to 127.0.0.1 port 25555: Connection reset by peer 3.63s
[E 180512 11:26:09 processor:202] process autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128 -> [599] len:0 -> result:None fol:0 msg:0 err:ParserError('Document is empty',)
[I 180512 11:26:09 scheduler:959] task retry 0/3 autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128
[200] https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 23.898
[I 180512 11:26:09 tornado_fetcher:520] [200] autohomeBrandData:71e9d3ff8cccd373c8316c682cec0970 https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 23.90s
[I 180512 11:26:10 processor:202] process autohomeBrandData:71e9d3ff8cccd373c8316c682cec0970 https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 -> [200] len:259120 -> result:{'品牌': '福特 fol:0 msg:0 err:None
AutohomeResultWorker on_result: task={'taskid': '71e9d3ff8cccd373c8316c682cec0970', 'project': 'autohomeBrandData', 'url': 'https://www.autohome.com.cn/spec/32128/#pvareaid=2042128', 'status': 1, 'schedule': {'retried': 2, 'exetime': 1526095409.917301}, 'fetch': {'save': {'品牌': '福特', '子品牌': '长安福特', '车系': '翼虎', '车系ID': '2863', '车型': '2018款 EcoBoost 180 两驱豪翼型', '车型ID': '32128', '状态': '在售'}, 'fetch_type': 'js'}, 'process': {'callback': 'carModelSpecPage'}, 'tr
</code>
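To get a feel for how many fetches fail versus succeed, the fetcher lines in a log like the one above can be tallied by status code. This is a rough sketch that assumes the log keeps the `tornado_fetcher:<line>] [<status>]` shape seen here; `count_fetch_statuses` is a hypothetical helper, and the sample lines are shortened copies of the entries above.

```python
import re
from collections import Counter

# Matches the bracketed status code that follows the "tornado_fetcher:NNN]"
# prefix, e.g. "tornado_fetcher:212] [599] ...".
STATUS_RE = re.compile(r"tornado_fetcher:\d+\] \[(\d{3})\]")


def count_fetch_statuses(log_text):
    """Count fetcher status codes found in pyspider log text."""
    return Counter(STATUS_RE.findall(log_text))


# Shortened sample lines (task ids and URLs abbreviated for readability).
sample = (
    "[E 180512 11:26:09 tornado_fetcher:212] [599] autohomeBrandData:af74 "
    "https://www.autohome.com.cn/spec/20582/ 3.63s\n"
    "[I 180512 11:26:09 tornado_fetcher:520] [200] autohomeBrandData:71e9 "
    "https://www.autohome.com.cn/spec/32128/ 23.90s\n"
)
counts = count_fetch_statuses(sample)
print(counts["599"], counts["200"])
```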
So it looks like the data was saved normally after all?
Searched for: pyspider FETCH_ERROR HTTP 599 Connection timed out after milliseconds
https problems ? HTTP 599: Unknown · Issue #326 · binux/pyspider
pyspider Exception: HTTP 599: Resolving timed out after 20000 milliseconds | xdzhcs
pyspider: crawling any website gives HTTPError: HTTP 599: Resolving timed out (IT屋 developer community)
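My suspicion at this point:
I left pyspider running on the Mac and went out; after a long while the screen locked, and that dropped the network connection?
And that is why it PAUSED?
Never mind. I stopped pyspider and ran it again, making sure the computer does not lock the screen and stays online, to see whether the problem above shows up again.
Waiting to see whether it appears.
Then the earlier error showed up again: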
<code>dictKeyList= dict_keys(['品牌', '子品牌', '车系', '车系ID', '车型', '车型ID', '状态', '经销商参考价', '厂商指导价'])
dictValueList= dict_values(['斯柯达', '斯柯达(进口)', '速派(进口)', '357', '2010款 Combi 四驱', '7040', '停售', 0, 0])
keyListSql= `品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`
eachValueInSql= "斯柯达"
eachValueInSql= "斯柯达(进口)"
eachValueInSql= "速派(进口)"
eachValueInSql= "357"
eachValueInSql= "2010款 Combi 四驱"
eachValueInSql= "7040"
eachValueInSql= "停售"
eachValueInSql= 0
eachValueInSql= 0
valueListSql= "斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0
insertSql= INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0)
executeSql: sqlStr=INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0), actionDescription=Insert value to table tbl_autohome_car_info
cursor= <pymysql.cursors.DictCursor object at 0x111ecad30>
[I 180512 11:55:24 processor:202] process autohomeBrandData:ca522ee18e1592539931c927159e84f1 https://www.autohome.com.cn/spec/7040/#pvareaid=2042128 -> [200] len:226666 -> result:{'品牌': '斯柯 fol:0 msg:0 err:None
Execute sql INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0) occur error (1054, "Unknown column '品牌' in 'field list'") for Insert value to table tbl_autohome_car_info
insertOk=False
[I 180512 11:55:24 scheduler:959] task retry 0/3 autohomeBrandData:e0014ea9a7c519a8ce1269ec52fd06cc https://www.autohome.com.cn/spec/30996/#pvareaid=2042128
[E 180512 11:55:24 processor:202] process autohomeBrandData:27def1bc64a739b80a5c0e9d84d06077 https://www.autohome.com.cn/spec/21162/#pvareaid=2042128 -> [599] len:0 -> result:None fol:0 msg:0 err:ParserError('Document is empty',)
[E 180512 11:55:24 processor:202] process autohomeBrandData:0ac27b91a1984105143abaa48c3850d1 https://www.autohome.com.cn/spec/19830/#pvareaid=2042128 -> [599] len:0 -> result:None fol:0 msg:0 err:ParserError('Document is empty',)
</code>
Yet opening this URL in a browser:
https://www.autohome.com.cn/spec/30996/#pvareaid=2042128
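works fine.
First, change the rate/burst setting from
10/20
to:
5/20
to lower the number of URLs opened per second:
and see whether the error above still appears afterwards.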
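For intuition about what going from 10/20 to 5/20 changes: pyspider describes its rate/burst control as a token bucket, where rate is the number of tokens refilled per second and burst is the bucket capacity. The class below is a simplified illustration of that idea under those assumptions, not pyspider's own implementation; `TokenBucket` and `requests_allowed` are hypothetical names.

```python
class TokenBucket:
    """Simplified token bucket: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = float(burst)  # the bucket starts full
        self.last = now

    def try_take(self, now):
        """Refill for the elapsed time, then take one token if available."""
        elapsed = now - self.last
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def requests_allowed(rate, burst, seconds, ticks_per_second=100):
    """Count requests that pass when one is attempted on every tick."""
    bucket = TokenBucket(rate, burst)
    allowed = 0
    for i in range(1, seconds * ticks_per_second + 1):
        if bucket.try_take(i / ticks_per_second):
            allowed += 1
    return allowed


# Simulated time only; nothing here sleeps or touches the network.
print(requests_allowed(10, 20, seconds=10))
print(requests_allowed(5, 20, seconds=10))
```

With the same burst of 20, halving the rate roughly halves the sustained throughput, while the initial burst of requests stays the same size.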
It appeared again:
<code>I 180512 15:13:17 scheduler:959] task retry 0/3 autohomeBrandData:b09c3afbec138a5dc585792e4cc598e5 https://www.autohome.com.cn/spec/11422/#pvareaid=2042128
[E 180512 15:13:18 tornado_fetcher:212] [599] autohomeBrandData:9095b295684996611ba1d4098bbb28f2 https://www.autohome.com.cn/spec/14476/#pvareaid=2042128, HTTP 599: Connection timed out after 20000 milliseconds 20.00s
...
[I 180512 15:13:19 scheduler:959] task retry 0/3 autohomeBrandData:4dedfd01165f87670
</code>
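Later it still failed:
So it does not look like the cause is too many requests per second.
In the end I found that the problem was caused by using phantomjs: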
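One of the earlier fetcher errors was `Failed to connect to 127.0.0.1 port 25555`, and the tasks use `fetch_type` `js`, so a quick sanity check is whether anything is listening on that local port at all. The sketch below uses only the standard library; `port_is_open` is a hypothetical helper, and treating 25555 as the phantomjs port is an assumption based on that log line.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# 25555 is the local port that appears in the fetcher error; assumed here
# to be where the phantomjs fetcher listens.
print(port_is_open("127.0.0.1", 25555))
```

A `True` result only means something accepts connections on that port. It does not prove the phantomjs process is healthy, since the logged failure was a connection reset rather than a refusal.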
Please credit the source when reposting: 在路上 » [Unsolved] pyspider run error: FETCH_ERROR HTTP 599 Connection timed out after milliseconds