最新消息:20210816 当前crifan.com域名已被污染,为防止失联,请关注(页面右下角的)公众号

【已解决】PySpider访问汽车之家报错:requests.exceptions.HTTPError HTTP 403 Forbidden

HTTP crifan 1674浏览 0评论
折腾:
【未解决】用Python爬取汽车之家的车型车系详细数据
期间,
去写代码调试
class Handler(BaseHandler):
    UserAgent_Mac_Chrome = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
    crawl_config = {
        "User-Agent": UserAgent_Mac_Chrome,
    }

    # @every(minutes=24 * 60)
    def on_start(self):
        autohomeEntryUrl = "https://www.autohome.com.cn/car/"
        self.crawl(autohomeEntryUrl, callback=self.carBrandListCallback)
还是会
HTTP 403: Forbidden
[E 200814 20:38:23 base_handler:203] HTTP 403: Forbidden
    Traceback (most recent call last):
      File "/Users/xxx/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider/libs/base_handler.py", line 196, in run_task
        result = self._run_task(task, response)
      File "/Users/xxx/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider/libs/base_handler.py", line 175, in _run_task
        response.raise_for_status()
      File "/Users/xxx/.pyenv/versions/3.6.5/lib/python3.6/site-packages/pyspider/libs/response.py", line 184, in raise_for_status
        raise http_error
    requests.exceptions.HTTPError: HTTP 403: Forbidden
看了看html是:
<html>
  <head>
    <title>
      ERROR: ACCESS DENIED
    </title>
  </head>
  <body>
    <center>
      <h1>
        ERROR: ACCESS DENIED
      </h1>
    </center>
    <hr>
    <center>
      Fri, 14 Aug 2020 12:38:23 GMT (taikoo/BC232_dx-jiangsu-xuzhou-4-cache-1)
    </center>
    </BODY>
  </HTML>
<!-- web cache -->
去看了看:
request headers还是不少的
尤其是cookie,感觉还是不容易解决。
在想,万一不好解决,就换:puppeteer
先去加上一些其他header试试
self.crawl – pyspider
写错了,改为:
    crawl_config = {
        "headers": {
            "User-Agent": UserAgent_Mac_Chrome,
        }
    }
结果:
还真就可以了。。。
【总结】
此处加上User-Agent:
class Handler(BaseHandler):
    UserAgent_Mac_Chrome = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36"
    crawl_config = {
        "headers": {
            "User-Agent": UserAgent_Mac_Chrome,
        }
    }


    # @every(minutes=24 * 60)
    def on_start(self):
        autohomeEntryUrl = "https://www.autohome.com.cn/car/"
        self.crawl(autohomeEntryUrl, callback=self.carBrandListCallback)
即可。

转载请注明:在路上 » 【已解决】PySpider访问汽车之家报错:requests.exceptions.HTTPError HTTP 403 Forbidden

发表我的评论
取消评论

表情

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址
89 queries in 0.186 seconds, using 22.11MB memory