
[Solved] page.querySelectorAll in pyppeteer returns no results when run directly

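Author: crifan
While working on:
[Unsolved] Extracting information from Baidu search results with pyppeteer
the following code had returned results earlier while debugging: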
    resultASelector = "h3[class^='t'] a"
    searchResultAList = await page.querySelectorAll(resultASelector)
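When run directly, with no breakpoint, the same code found nothing.
The call is awaited, and no warning was reported.
With a breakpoint set, after pausing for a moment, the elements were found.
The cause needed to be tracked down.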
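One plausible explanation, which the summary at the end of this post confirms, is a timing race: the query runs before the results have been rendered, and a breakpoint simply gives the page time to load. The sketch below simulates that with plain asyncio. FakePage and click_search are hypothetical stand-ins invented for this illustration, not pyppeteer classes or methods.

```python
import asyncio


class FakePage:
    """Hypothetical stand-in for a page whose results appear after a delay."""

    def __init__(self, load_seconds):
        self._load_seconds = load_seconds
        self._ready_at = None

    async def click_search(self):
        # Start the simulated load; results exist only after load_seconds.
        loop = asyncio.get_running_loop()
        self._ready_at = loop.time() + self._load_seconds

    async def querySelectorAll(self, selector):
        # Returns whatever is "in the DOM" right now; it does not wait.
        loop = asyncio.get_running_loop()
        if self._ready_at is not None and loop.time() >= self._ready_at:
            return ["<a>result 1</a>", "<a>result 2</a>"]
        return []


async def demo():
    page = FakePage(load_seconds=0.2)
    await page.click_search()
    # Queried straight away: the simulated results are not there yet.
    immediate = await page.querySelectorAll("h3[class^='t'] a")
    # This pause plays the role of the time spent stopped at a breakpoint.
    await asyncio.sleep(0.3)
    delayed = await page.querySelectorAll("h3[class^='t'] a")
    return immediate, delayed


immediate, delayed = asyncio.run(demo())
print(len(immediate), len(delayed))
```

Awaiting the query only waits for the query itself to finish, not for the page to finish loading, which is why adding await alone does not help.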
javascript – Using Puppeteer, how do you get text from an <h1> tag? – Stack Overflow
mentions page.waitForSelector.
Checking the documentation:
API Reference — Pyppeteer 0.0.25 documentation
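which lists:
“waitFor(selectorOrFunctionOrTimeout: Union[str, int, float], options: dict = None, *args, **kwargs) → Awaitable[T_co]”
and:
“waitForSelector(selector: str, options: dict = None, **kwargs) → Awaitable[T_co]”
but not the hoped-for:
waitForSelectorAll
-> so the only options appear to be:
waitForSelector or waitFor
to match the element found earlier in:
[Notes] Viewing the HTML source of elements on the Baidu home page with Chrome or Chromium
namely:
<span class="nums_text">百度为您找到相关结果约2,370,000个</span>
(the text reads: "Baidu found about 2,370,000 related results for you")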
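Since the documentation offers no waitForSelectorAll, one way to approximate it is to poll page.querySelectorAll until it returns something. The following is a minimal sketch, not part of pyppeteer: the helper name wait_for_selector_all and its timeout_seconds and poll_seconds parameters are assumptions made for this illustration. It only relies on the page object having an awaitable querySelectorAll method.

```python
import asyncio


async def wait_for_selector_all(page, selector, timeout_seconds=10.0, poll_seconds=0.5):
    """Poll page.querySelectorAll until it returns at least one element.

    Hypothetical helper, not part of pyppeteer.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_seconds
    while True:
        elements = await page.querySelectorAll(selector)
        if elements:
            return elements
        if loop.time() >= deadline:
            # Give up instead of looping forever when nothing ever matches.
            raise asyncio.TimeoutError(
                "no element matched %r within %s seconds" % (selector, timeout_seconds)
            )
        await asyncio.sleep(poll_seconds)
```

Inside a coroutine it could be used as: searchResultAList = await wait_for_selector_all(page, "h3[class^='t'] a"). The deadline is there so that a wrong selector produces an error rather than an endless wait.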
puppeteer querySelectorAll no result
puppeteer querySelector no result
javascript – Puppeteer page.evaluate querySelectorAll return empty objects – Stack Overflow
puppeteer: document.querySelector works on console but not on pupppeteer – Stack Overflow
javascript – Puppeteer’s page.evaluate() always returns empty object when used in Electron – Stack Overflow
javascript – Puppeteer with document.querySelector() returning null – Stack Overflow
node.js – Query selector with puppeteer returns empty array – Stack Overflow
await page.waitForSelector('div.landscape h3.title'); // <-- add this line //updated from page.waitFor that is getting deprecated in 2020
Trying it:
    await page.waitForSelector('span.nums_text')
Result:
running directly, without a breakpoint, worked.
Looking at the source:
/Users/crifan/.pyenv/versions/3.6.6/lib/python3.6/site-packages/pyppeteer/page.py
    def waitForSelector(self, selector: str, options: dict = None,
                        **kwargs: Any) -> Awaitable:
        """Wait until element which matches ``selector`` appears on page.
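However, waitForSelector can also fail when run without the debugger, as covered in:
[Workaround] pyppeteer: running waitForSelector directly, without debugging, raises: ElementHandleError Evaluation failed TypeError MutationObserver is not a constructor at pollMutation
The code used here instead: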
    # Method 2: poll until the element appears
    SingleWaitSeconds = 1
    while not await page.querySelector(SearchFoundWordsSelector):
        print("Still not found %s, wait %s seconds" % (SearchFoundWordsSelector, SingleWaitSeconds))
        await asyncio.sleep(SingleWaitSeconds)
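This waits for the element to appear.
After that, the desired elements can be found as expected.
[Summary]
Here,
await page.querySelectorAll(resultASelector)
returned an empty result
because the page had not finished loading the search results.
So after clicking the search button, wait until the page content has finished reloading.
The final code: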
    ################################################################################
    # Wait page reload complete
    ################################################################################
    SearchFoundWordsSelector = 'span.nums_text'
    SearchFoundWordsXpath = "//span[@class='nums_text']"

    # await page.waitForSelector(SearchFoundWordsSelector)
    # await page.waitFor(SearchFoundWordsSelector)
    # await page.waitForXPath(SearchFoundWordsXpath)
    # Note: all of the above raised: ElementHandleError Evaluation failed: TypeError: MutationObserver is not a constructor
    #   so use the following instead

    # # Method 1: just wait a fixed time
    # await page.waitFor(2000) # milliseconds

    # Method 2: poll until the element appears
    SingleWaitSeconds = 1
    while not await page.querySelector(SearchFoundWordsSelector):
        print("Still not found %s, wait %s seconds" % (SearchFoundWordsSelector, SingleWaitSeconds))
        await asyncio.sleep(SingleWaitSeconds)
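This waits for an element such as:
<span class="nums_text">百度为您找到相关结果约2,370,000个</span>
to appear on the page,
which indicates that the Baidu search results have finished loading.
After that, the search result elements can be queried as expected.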
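The two approaches tried in this post can also be combined: attempt waitForSelector first and fall back to polling only if it raises, for example with the MutationObserver error noted in the code comments. This is only a sketch under assumptions: the helper name wait_for_marker and its max_attempts and poll_seconds parameters are made up for illustration, and the broad except is a simplification that also swallows unrelated errors.

```python
import asyncio


async def wait_for_marker(page, selector, max_attempts=30, poll_seconds=1.0):
    """Wait for selector via waitForSelector, falling back to polling.

    Hypothetical helper, not part of pyppeteer.
    """
    try:
        # Preferred path; may raise, e.g. "MutationObserver is not a constructor".
        return await page.waitForSelector(selector)
    except Exception as error:  # broad on purpose: this is only a sketch
        print("waitForSelector failed (%s), falling back to polling" % error)

    # Fallback: poll querySelector a bounded number of times.
    for _ in range(max_attempts):
        element = await page.querySelector(selector)
        if element:
            return element
        await asyncio.sleep(poll_seconds)
    raise asyncio.TimeoutError(
        "%r did not appear after %d attempts" % (selector, max_attempts)
    )
```

Inside a coroutine it could be called as: await wait_for_marker(page, 'span.nums_text'). Bounding the number of attempts keeps the script from hanging if the marker element never shows up.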

When reposting, please credit: 在路上 » [Solved] page.querySelectorAll in pyppeteer returns no results when run directly
