While working on:
【Unresolved】pyspider error: FETCH_ERROR HTTP 599 Connection timed out after milliseconds
along the way, I came across:
Command Line Interface | PhantomJS
“* --load-images=[true|false] load all inlined images (default is true). Also accepted: [yes|no].”
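I wanted to pass extra parameters to the phantomjs used by pyspider, for example:
--load-images=false
-》so that images are not loaded
-〉which might speed up phantomjs page loading
-》and reduce or even eliminate the problem here?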
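PhantomJS boolean switches take the `--name=value` form quoted above, with `true|false` (or `yes|no`) as the value. A minimal sketch that renders such a switch from a Python bool, assuming you assemble the phantomjs command line yourself; `phantomjs_flag` is a hypothetical helper, not part of pyspider:

```python
def phantomjs_flag(name: str, value: bool) -> str:
    """Render a PhantomJS boolean switch such as --load-images=false.

    PhantomJS accepts [true|false] (also [yes|no]) for these switches;
    this hypothetical helper always emits the true/false spelling.
    """
    # Accept the name with or without its leading dashes.
    name = name.lstrip("-")
    return f"--{name}={'true' if value else 'false'}"


if __name__ == "__main__":
    print(phantomjs_flag("load-images", False))  # --load-images=false
```

Keeping the value attached with `=` produces one argv token, which matches the form shown in the PhantomJS docs.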
Searched for:
pyspider phantomjs memory limit
and found:
pyspider爬虫学习-文档翻译-Command-Line.md – sijinge
“phantomjs
---------
```
Usage: run.py phantomjs [OPTIONS] [ARGS]...
Run phantomjs fetcher if phantomjs is installed.
Options:
--phantomjs-path TEXT phantomjs path
--port INTEGER phantomjs port
--auto-restart TEXT auto restart phantomjs if crashed
--help Show this message and exit.
```
#### ARGS
Addition args pass to phantomjs command line.”
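The ARGS description says extra arguments end up on the phantomjs command line. A sketch of how such a wrapper might assemble that argv, assuming the extra args are placed before the fetcher script and its port argument. The function names and `phantomjs_fetcher.js` are illustrative assumptions, not taken from pyspider's source; the only hard rule used is PhantomJS's own `phantomjs [switches] [options] [script] [arguments]` order:

```python
import subprocess
from typing import List, Sequence


def build_phantomjs_cmd(phantomjs_path: str, extra_args: Sequence[str],
                        fetcher_script: str, port: int) -> List[str]:
    # PhantomJS switches must come before the script path; anything after
    # the script is handed to the script itself (here: the port).
    return [phantomjs_path, *extra_args, fetcher_script, str(port)]


def start_phantomjs(cmd: List[str]) -> subprocess.Popen:
    # Launch without a shell so each list element is exactly one argv entry.
    return subprocess.Popen(cmd)


if __name__ == "__main__":
    cmd = build_phantomjs_cmd("phantomjs", ["--load-images=false"],
                              "phantomjs_fetcher.js", 23450)
    print(" ".join(cmd))
```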
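So it seems that when phantomjs is run on its own (as the separate `pyspider phantomjs` subcommand), extra parameters can be passed to phantomjs.
I went looking for a way to pass extra parameters to the phantomjs inside pyspider, such as not loading images: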
pyspider phantomjs pass extra param
pyspider phantomjs extra param
phantomjs 运行 (phantomjs run)
PhantomJS基础及示例 – 腾讯Web前端 IMWeb 团队社区 | blog | 团队博客
WebStorm用PhantomJS运行JavaScript程序 – CSDN博客
phantomjs pyspider 参数 (parameters)
phantomjs免安装怎么连上 pyspider_百度知道
Searched again:
pyspider phantomjs proxy
pyspider – 究竟怎么给phantomjs设置代理? – SegmentFault 思否
This looks like what I wanted:
Following it, the approach here is:
when running phantomjs, specify a config file:
<code># phantomjs pyspider -c config.json phantomjs </code>
and then add whatever parameters you want to set in that config file.
Apart from the options specific to the phantomjs subcommand:
<code>➜ AutocarData pyspider phantomjs --help
Usage: pyspider phantomjs [OPTIONS] [ARGS]...

  Run phantomjs fetcher if phantomjs is installed.

Options:
  --phantomjs-path TEXT  phantomjs path
  --port INTEGER         phantomjs port
  --auto-restart TEXT    auto restart phantomjs if crashed
  --help                 Show this message and exit.
</code>
all other parameters are supposedly passed on to phantomjs.
So I added:
phantomjs_config.json
<code>{
    "port": 23450,
    "auto-restart": true,
    "load-images": false,
    "debug": true
}
</code>
Then I first ran pyspider's phantomjs subcommand:
<code>➜ AutocarData ll
total 88
-rw-r--r-- 1 crifan staff 9.5K 5 12 12:19 AutohomeResultWorker.py
-rw-r--r-- 1 crifan staff 218B 5 5 22:53 Pipfile
-rw-r--r-- 1 crifan staff 5.7K 5 5 22:54 Pipfile.lock
drwxr-xr-x 5 crifan staff 160B 5 12 08:48 ResultWorker_refer
-rw-r--r-- 1 crifan staff 11K 5 15 22:28 autohomeBrandData.py
-rw-r--r-- 1 crifan staff 218B 5 17 11:03 config.json
drwxr-xr-x 8 crifan staff 256B 5 17 11:22 data
-rw-r--r-- 1 crifan staff 84B 5 17 11:46 phantomjs_config.json
➜ AutocarData cat phantomjs_config.json
{
    "port": 23450,
    "auto-restart": true,
    "load-images": false,
    "debug": true
}%
➜ AutocarData pyspider -c phantomjs_config.json phantomjs
phantomjs fetcher running on port 25555
</code>
Then I suddenly noticed something was wrong:
the port was not passed in. Previously, with:
<code>➜ AutocarData pyspider phantomjs --port 23450 --auto-restart true
phantomjs fetcher running on port 23450
</code>
the port took effect.
Here, however:
phantomjs fetcher running on port 25555
the port did not take effect.
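One plausible explanation, assuming pyspider hands the `-c` JSON to click as its `default_map` (not confirmed here): click looks up a subcommand's defaults only under the key named after that subcommand, and keys that are not declared options are simply ignored. A simplified pure-Python model of that lookup; `subcommand_defaults` is hypothetical and only mimics the behaviour, and the hyphen-to-underscore normalization is likewise an assumption:

```python
from typing import Any, Dict, Iterable


def subcommand_defaults(config: Dict[str, Any], subcommand: str,
                        known_options: Iterable[str]) -> Dict[str, Any]:
    """Simplified model of click-style default lookup for one subcommand.

    Only the nested dict stored under the subcommand's name is consulted,
    and keys that are not declared options of that subcommand are dropped.
    """
    section = config.get(subcommand)
    if not isinstance(section, dict):
        return {}
    # Assumed normalization: 'auto-restart' -> 'auto_restart'.
    known = {opt.replace("-", "_") for opt in known_options}
    normalized = {k.replace("-", "_"): v for k, v in section.items()}
    return {k: v for k, v in normalized.items() if k in known}


# Options listed by `pyspider phantomjs --help` (minus --help).
PHANTOMJS_OPTIONS = ["phantomjs-path", "port", "auto-restart"]

flat = {"port": 23450, "auto-restart": True, "load-images": False, "debug": True}
nested = {"phantomjs": dict(flat)}

print(subcommand_defaults(flat, "phantomjs", PHANTOMJS_OPTIONS))    # {}
print(subcommand_defaults(nested, "phantomjs", PHANTOMJS_OPTIONS))
# {'port': 23450, 'auto_restart': True}
```

Under this model, the flat phantomjs_config.json leaves the subcommand at its default port (25555 in the log above), while nesting the same keys under "phantomjs" would pick up port and auto-restart but silently drop load-images and debug.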
Never mind, I'll just specify the parameters on the command line instead.
pyspider phantomjs ARGS
“phantomjs
Usage: run.py phantomjs [OPTIONS] [ARGS]…
Run phantomjs fetcher if phantomjs is installed.
Options:
--phantomjs-path TEXT phantomjs path
--port INTEGER phantomjs port
--auto-restart TEXT auto restart phantomjs if crashed
--help Show this message and exit.
ARGS
Addition args pass to phantomjs command line.”
So it seems that:
only parameters given on the command line (not in the JSON config) are passed to phantomjs.
Then I ran pyspider itself:
<code>➜ AutocarData ll
total 88
-rw-r--r-- 1 crifan staff 9.5K 5 12 12:19 AutohomeResultWorker.py
-rw-r--r-- 1 crifan staff 218B 5 5 22:53 Pipfile
-rw-r--r-- 1 crifan staff 5.7K 5 5 22:54 Pipfile.lock
drwxr-xr-x 5 crifan staff 160B 5 12 08:48 ResultWorker_refer
-rw-r--r-- 1 crifan staff 11K 5 15 22:28 autohomeBrandData.py
-rw-r--r-- 1 crifan staff 218B 5 17 11:03 config.json
drwxr-xr-x 8 crifan staff 256B 5 17 11:22 data
-rw-r--r-- 1 crifan staff 84B 5 17 11:46 phantomjs_config.json
➜ AutocarData cat config.json
{
    "resultdb": "mysql+resultdb://root:<password>@<host>:3306/AutohomeResultdb",
    "result_worker":{
        "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
    },
    "phantomjs-proxy": "127.0.0.1:23450"
}%
➜ AutocarData pyspider -c config.json
</code>
So first, make sure the parameters can actually be passed in:
<code>➜ AutocarData pyspider phantomjs --port 23450 --auto-restart true --load-images false --debug true
Error: no such option: --load-images
</code>
Clearly, they cannot be passed this way.
So I put the parameters into the main config instead:
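The error comes from the option parser: `--load-images` looks like an option, and the phantomjs subcommand declares no such option. POSIX-style parsers treat a bare `--` as "end of options", after which everything is positional. click, which produced the `Error: no such option` message, documents the same `--` convention, but whether pyspider then forwards those values unchanged to phantomjs is an assumption not verified here. The sketch demonstrates the convention with the standard-library argparse as a stand-in for click; the parser only mirrors the options listed in `--help`:

```python
import argparse
from typing import Sequence


def make_parser() -> argparse.ArgumentParser:
    # Mirrors the options shown by `pyspider phantomjs --help` (argparse
    # used as a stand-in for click; 25555 is the default port seen above).
    parser = argparse.ArgumentParser(prog="pyspider phantomjs")
    parser.add_argument("--phantomjs-path", default="phantomjs")
    parser.add_argument("--port", type=int, default=25555)
    parser.add_argument("--auto-restart", default="false")
    # Extra positional ARGS meant for the phantomjs command line.
    parser.add_argument("args", nargs="*")
    return parser


def parse(argv: Sequence[str]) -> argparse.Namespace:
    return make_parser().parse_args(list(argv))


if __name__ == "__main__":
    ns = parse(["--port", "23450", "--auto-restart", "true",
                "--", "--load-images=false"])
    print(ns.port, ns.args)  # 23450 ['--load-images=false']
```

Without the `--`, the same parser rejects `--load-images` as an unrecognized option, just like the error above. For pyspider the analogous (untested) invocation would be `pyspider phantomjs --port 23450 --auto-restart true -- --load-images=false`.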
<code>➜ AutocarData cat config.json
{
    "resultdb": "mysql+resultdb://root:<password>@<host>:3306/AutohomeResultdb",
    "result_worker":{
        "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
    },
    "phantomjs-proxy": "127.0.0.1:23450",
    "phantomjs" : {
        "port": 23450,
        "auto-restart": true,
        "load-images": false,
        "debug": true
    }
}%
➜ AutocarData pyspider -c config.json phantomjs
phantomjs fetcher running on port 23450
</code>
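It looks like the parameters were passed in?
At least the output shows the fetcher running on port 23450.
Then I changed it to (dropping "phantomjs-proxy"):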
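Since the original problem was an HTTP 599 connection timeout, a quick way to confirm that something is actually listening on the `phantomjs-proxy` address (127.0.0.1:23450 in config.json) is a plain TCP connect with a short timeout. A minimal stdlib sketch; the helper names are hypothetical, and a successful connect only shows the port accepts connections, not that any phantomjs option took effect:

```python
import socket
from typing import Tuple


def parse_host_port(addr: str) -> Tuple[str, int]:
    # "127.0.0.1:23450" -> ("127.0.0.1", 23450)
    host, _, port = addr.rpartition(":")
    return host, int(port)


def is_listening(addr: str, timeout: float = 2.0) -> bool:
    host, port = parse_host_port(addr)
    try:
        # create_connection raises OSError (incl. timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    print(is_listening("127.0.0.1:23450"))
```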
<code>➜ AutocarData cat config.json
{
    "resultdb": "mysql+resultdb://root:<password>@<host>:3306/AutohomeResultdb",
    "result_worker":{
        "result_cls": "AutohomeResultWorker.AutohomeResultWorker"
    },
    "phantomjs" : {
        "port": 23450,
        "auto-restart": true,
        "load-images": false,
        "debug": true
    }
}%
➜ AutocarData pyspider -c config.json phantomjs
phantomjs fetcher running on port 23450
</code>
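Result: it does look like the parameters under "phantomjs" were correctly passed to phantomjs?
Please credit when reposting: 在路上 » 【Unresolved】How to pass extra parameters to phantomjs in pyspider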