
[Unresolved] pyspider run fails: FETCH_ERROR HTTP 599 Connection timed out after milliseconds

pyspider crifan

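While working on:

[Solved] Saving data to MySQL in PySpider

I left the crawl running for a long time, and when I came back I found that the project status was PAUSED.

Could the screen lock have caused this?

More importantly, there were error messages: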

<code>1. FETCH_ERROR autohomeBrandData &gt; https://www.autohome.com.cn/spec/29030/#pvareaid=2042128 46 seconds ago 2043.8+0.40ms +0
2. FETCH_ERROR autohomeBrandData &gt; https://www.autohome.com.cn/spec/28679/#pvareaid=2042128 46 seconds ago 2044.0+0.39ms +0
</code>

However, the URLs that failed all open normally in a browser, for example:

https://www.autohome.com.cn/spec/28679/#pvareaid=2042128

Looking at the details of a single failed task:

http://0.0.0.0:5000/task/autohomeBrandData:96d3707c65d5cb10b34803af3dd40c1b

<code>ACTIVE autohomeBrandData.carModelSpecPage &gt; https://www.autohome.com.cn/spec/9007/#pvareaid=2042128 (5 minutes ago updated )
taskid
96d3707c65d5cb10b34803af3dd40c1b
lastcrawltime
1526094255.333958 (5 minutes ago)
updatetime
1526094255.333979 (5 minutes ago)
exetime
1526115855.3339572 (May 12, 2018 at 9:04)
track.fetch  27513.09ms
{ "content": "", "encoding": null, "error": "HTTP 599: Connection timed out after 27266 milliseconds", "headers": {}, "ok": false, "redirect_url": null, "status_code": 599, "time": 27.513092756271362 }
track.process  0.44ms
Document is empty
s/base_handler.py", line 155, in _run_func
    ret = function(*arguments[:len(args) - 1])
  File "&lt;autohomeBrandData&gt;", line 162, in carModelSpecPage
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py", line 144, in doc
    elements = self.etree
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py", line 160, in etree
    self._elements = lxml.html.fromstring(self.content)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py", line 876, in fromstring
    doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)
  File "/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py", line 765, in document_fromstring
    "Document is empty")
lxml.etree.ParserError: Document is empty
{ "exception": "Document is empty", "follows": 0, "logs": "s/base_handler.py\", line 155, in _run_func\n ret = function(*arguments[:len(args) - 1])\n File \"&lt;autohomeBrandData&gt;\", line 162, in carModelSpecPage\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py\", line 144, in doc\n elements = self.etree\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/pyspider/libs/response.py\", line 160, in etree\n self._elements = lxml.html.fromstring(self.content)\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py\", line 876, in fromstring\n doc = document_fromstring(html, parser=parser, base_url=base_url, **kw)\n File \"/Users/crifan/.local/share/virtualenvs/AutocarData-xI-iqIq4/lib/python3.6/site-packages/lxml/html/__init__.py\", line 765, in document_fromstring\n \"Document is empty\")\n lxml.etree.ParserError: Document is empty\n", "ok": false, "result": null, "time": 0.0004379749298095703 }
schedule
{ "exetime": 1526115855.3339572, "retried": 3 }
fetch
{ "fetch_type": "js", "save": { "品牌": "欧宝", "子品牌": "欧宝", "状态": "停售", "车型": "2011款 三门基本型", "车型ID": "9007", "车系": "欧宝Corsa", "车系ID": "753" } }
process
{ "callback": "carModelSpecPage" }
</code>
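
The task record above contains two linked failures: the fetch came back with status 599 and an empty body, and the callback then raised `lxml.etree.ParserError: Document is empty` when it tried to parse that empty body. Below is a minimal sketch of a guard against the second failure, assuming a fetch result shaped like the `track.fetch` dictionary shown above. `should_parse` and `describe` are hypothetical helper names, not part of pyspider.

```python
import json

# Fetch result copied from the track.fetch record above.
FAILED_FETCH = json.loads(
    '{ "content": "", "encoding": null, '
    '"error": "HTTP 599: Connection timed out after 27266 milliseconds", '
    '"headers": {}, "ok": false, "redirect_url": null, '
    '"status_code": 599, "time": 27.513092756271362 }'
)


def should_parse(fetch_result):
    """Return True only when the fetch succeeded and the body is non-empty.

    Hypothetical helper: it only inspects the "ok" and "content" keys.
    """
    return bool(fetch_result.get("ok")) and bool(fetch_result.get("content"))


def describe(fetch_result):
    """Summarise a fetch result without handing it to an HTML parser."""
    if should_parse(fetch_result):
        return "parse"
    return "skip: {}".format(fetch_result.get("error") or "empty body")


if __name__ == "__main__":
    print(describe(FAILED_FETCH))
```

In a callback, the same idea would mean checking the response's `status_code` and `content` before touching `response.doc`, so that a failed fetch is reported as a fetch failure rather than as a parser error.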

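So I needed to find out what was going on.

I clicked to resume the project. After a short wait it continued, and log output resumed in the terminal.

But it soon failed again, and the status was PAUSED once more.

I resumed it again, but the log showed: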

<code>[I 180512 11:26:09 scheduler:906] task done autohomeBrandData:3f86597065a83874034d67801fa37cf1 https://www.autohome.com.cn/spec/32129/#pvareaid=2042128
[E 180512 11:26:09 tornado_fetcher:212] [599] autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128, HTTP 599: Failed to connect to 127.0.0.1 port 25555: Connection reset by peer 3.63s
[E 180512 11:26:09 processor:202] process autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128 -&gt; [599] len:0 -&gt; result:None fol:0 msg:0 err:ParserError('Document is empty',)
[I 180512 11:26:09 scheduler:959] task retry 0/3 autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128
[200] https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 23.898
[I 180512 11:26:09 tornado_fetcher:520] [200] autohomeBrandData:71e9d3ff8cccd373c8316c682cec0970 https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 23.90s
[I 180512 11:26:10 processor:202] process autohomeBrandData:71e9d3ff8cccd373c8316c682cec0970 https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 -&gt; [200] len:259120 -&gt; result:{'品牌': '福特 fol:0 msg:0 err:None
AutohomeResultWorker on_result: task={'taskid': '71e9d3ff8cccd373c8316c682cec0970', 'project': 'autohomeBrandData', 'url': 'https://www.autohome.com.cn/spec/32128/#pvareaid=2042128', 'status': 1, 'schedule': {'retried': 2, 'exetime': 1526095409.917301}, 'fetch': {'save': {'品牌': '福特', '子品牌': '长安福特', '车系': '翼虎', '车系ID': '2863', '车型': '2018款 EcoBoost 180 两驱豪翼型', '车型ID': '32128', '状态': '在售'}, 'fetch_type': 'js'}, 'process': {'callback': 'carModelSpecPage'}, 'tr
</code>

So it looks as if data was still being saved normally?
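
The `tornado_fetcher` error line in the log above carries a different reason from the earlier timeout: `Failed to connect to 127.0.0.1 port 25555: Connection reset by peer`. That is a local address, not the target site, which suggests the failure may lie between the fetcher and a local service rather than with autohome.com.cn. The sketch below tallies fetcher error reasons from log lines so that different 599 causes can be told apart. The regular expression is an assumption based only on the line format visible in this excerpt, and `tally_fetch_errors` is a hypothetical helper.

```python
import re
from collections import Counter

# Lines copied from the log excerpt above.
LOG_LINES = [
    "[E 180512 11:26:09 tornado_fetcher:212] [599] autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128, HTTP 599: Failed to connect to 127.0.0.1 port 25555: Connection reset by peer 3.63s",
    "[I 180512 11:26:09 scheduler:959] task retry 0/3 autohomeBrandData:af7432f913d9923b98780ceb3d1cf953 https://www.autohome.com.cn/spec/20582/#pvareaid=2042128",
    "[I 180512 11:26:09 tornado_fetcher:520] [200] autohomeBrandData:71e9d3ff8cccd373c8316c682cec0970 https://www.autohome.com.cn/spec/32128/#pvareaid=2042128 23.90s",
]

# Matches only error-level ("E") lines emitted by tornado_fetcher.
FETCH_ERROR = re.compile(
    r"^\[E [^\]]+ tornado_fetcher:\d+\] "     # level, timestamp, module:line
    r"\[(?P<status>\d+)\] "                   # status code, e.g. 599
    r"(?P<task>\S+) (?P<url>\S+), "           # project:taskid and URL
    r"(?P<reason>.*) (?P<seconds>[\d.]+)s$"   # error text and elapsed seconds
)


def tally_fetch_errors(lines):
    """Count fetcher errors grouped by (status code, reason text)."""
    counts = Counter()
    for line in lines:
        match = FETCH_ERROR.match(line)
        if match:
            key = (int(match.group("status")), match.group("reason"))
            counts[key] += 1
    return counts


if __name__ == "__main__":
    for (status, reason), count in tally_fetch_errors(LOG_LINES).items():
        print(count, status, reason)
```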

Searching for "pyspider FETCH_ERROR HTTP 599 Connection timed out after milliseconds" turned up:

Problem with traceback on “HTTP 599: Connection timed out after 20001 milliseconds” · Issue #578 · binux/pyspider

https problems ? HTTP 599: Unknown · Issue #326 · binux/pyspider

pyspider Exception: HTTP 599: Resolving timed out after 20000 milliseconds | xdzhcs

pyspider – crawling any website: HTTPError: HTTP 599: Resolving timed out – IT屋 (a technology sharing community for programmers and software developers)

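My suspicion at this point:

I had left pyspider running on the Mac and gone out. After a long while the screen locked; did that cut off the network connection?

And is that why the project was PAUSED?

I stopped pyspider and ran it again, this time making sure the computer did not lock its screen and stayed connected to the network, to see whether the problem would recur.

Then I waited to see whether it would show up.

The same error appeared again: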

<code>dictKeyList= dict_keys(['品牌', '子品牌', '车系', '车系ID', '车型', '车型ID', '状态', '经销商参考价', '厂商指导价']) dictValueList= dict_values(['斯柯达', '斯柯达(进口)', '速派(进口)', '357', '2010款 Combi 四驱', '7040', '停售', 0, 0])
keyListSql= `品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`
eachValueInSql= "斯柯达"
eachValueInSql= "斯柯达(进口)"
eachValueInSql= "速派(进口)"
eachValueInSql= "357"
eachValueInSql= "2010款 Combi 四驱"
eachValueInSql= "7040"
eachValueInSql= "停售"
eachValueInSql= 0
eachValueInSql= 0
valueListSql= "斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0
insertSql= INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0)
executeSql: sqlStr=INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0), actionDescription=Insert value to table tbl_autohome_car_info
cursor= &lt;pymysql.cursors.DictCursor object at 0x111ecad30&gt;
[I 180512 11:55:24 processor:202] process autohomeBrandData:ca522ee18e1592539931c927159e84f1 https://www.autohome.com.cn/spec/7040/#pvareaid=2042128 -&gt; [200] len:226666 -&gt; result:{'品牌': '斯柯 fol:0 msg:0 err:None
Execute sql INSERT INTO tbl_autohome_car_info (`品牌`, `子品牌`, `车系`, `车系ID`, `车型`, `车型ID`, `状态`, `经销商参考价`, `厂商指导价`) VALUES ("斯柯达", "斯柯达(进口)", "速派(进口)", "357", "2010款 Combi 四驱", "7040", "停售", 0, 0) occur error (1054, "Unknown column '品牌' in 'field list'") for Insert value to table tbl_autohome_car_info
insertOk=False
[I 180512 11:55:24 scheduler:959] task retry 0/3 autohomeBrandData:e0014ea9a7c519a8ce1269ec52fd06cc https://www.autohome.com.cn/spec/30996/#pvareaid=2042128
[E 180512 11:55:24 processor:202] process autohomeBrandData:27def1bc64a739b80a5c0e9d84d06077 https://www.autohome.com.cn/spec/21162/#pvareaid=2042128 -&gt; [599] len:0 -&gt; result:None fol:0 msg:0 err:ParserError('Document is empty',)
[E 180512 11:55:24 processor:202] process autohomeBrandData:0ac27b91a1984105143abaa48c3850d1 https://www.autohome.com.cn/spec/19830/#pvareaid=2042128 -&gt; [599] len:0 -&gt; result:None fol:0 msg:0 err:ParserError('Document is empty',)
</code>

Yet opening this URL in a browser:

https://www.autohome.com.cn/spec/30996/#pvareaid=2042128

works normally.

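As a first step, I changed the rate/burst setting from 10/20 to 5/20 to lower the number of URLs fetched per second, and then watched whether the error above would still occur.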
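
To make the effect of that change concrete, here is a minimal sketch that assumes the usual token-bucket reading of a rate/burst pair: tokens are added at `rate` per second up to a cap of `burst`, and each request consumes one token. It is an illustration only, not pyspider's own code, and `TokenBucket` and `allowed_in_window` are hypothetical names.

```python
class TokenBucket:
    """Illustrative token bucket: `rate` tokens per second, capped at `burst`."""

    def __init__(self, rate, burst, now=0.0):
        self.rate = float(rate)
        self.burst = float(burst)
        self.tokens = float(burst)  # the bucket starts full
        self.last = now

    def _refill(self, now):
        # Add tokens for the time elapsed since the last call, up to the cap.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now

    def try_take(self, now):
        """Consume one token if available; return whether a request may go out."""
        self._refill(now)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def allowed_in_window(rate, burst, seconds, attempts_per_second=100):
    """Count requests that get through when many more are attempted."""
    bucket = TokenBucket(rate, burst)
    allowed = 0
    for step in range(seconds * attempts_per_second):
        now = step / attempts_per_second  # simulated clock, no real sleeping
        if bucket.try_take(now):
            allowed += 1
    return allowed


if __name__ == "__main__":
    print("10/20:", allowed_in_window(10, 20, 10))
    print(" 5/20:", allowed_in_window(5, 20, 10))
```

Under this model, halving the rate lowers the sustained request frequency but leaves the initial burst of 20 unchanged, so a short spike of requests at startup is still possible with 5/20.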

The error appeared again:

<code>[I 180512 15:13:17 scheduler:959] task retry 0/3 autohomeBrandData:b09c3afbec138a5dc585792e4cc598e5 https://www.autohome.com.cn/spec/11422/#pvareaid=2042128
[E 180512 15:13:18 tornado_fetcher:212] [599] autohomeBrandData:9095b295684996611ba1d4098bbb28f2 https://www.autohome.com.cn/spec/14476/#pvareaid=2042128, HTTP 599: Connection timed out after 20000 milliseconds 20.00s
...
[I 180512 15:13:19 scheduler:959] task retry 0/3 autohomeBrandData:4dedfd01165f87670
</code>

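Later runs still failed.

So this does not look like a problem caused by too many requests per second.

In the end I found that the problem was caused by using phantomjs:

[Unresolved] In PySpider, adding fetch_type js to crawl makes it call phantomjs internally, which causes a large number of errors: HTTP 599 Connection timed out after 20000 milliseconds 20.00s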
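
Since `fetch_type` js routes requests through a local phantomjs process, one quick check when 599 errors pile up is whether that local port accepts connections at all. The sketch below assumes phantomjs is reachable at 127.0.0.1 port 25555, the address that appears in the earlier "Failed to connect" log line. `port_is_open` is a hypothetical helper.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: 127.0.0.1:25555 is taken from the log line, not verified here.
    print(port_is_open("127.0.0.1", 25555))
```

A successful connection only shows that something is listening on the port. It does not prove that phantomjs is rendering pages correctly or keeping up with the request rate.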
