While working on:
[Unsolved] Converting the HTML text obtained via PyQuery in PySpider into a plain-text string
I tried this code:
import xml

def htmlToString(htmlText):
    return ''.join(xml.etree.ElementTree.fromstring(htmlText).itertext())

description = htmlToString(descriptionText)
print("description=%s" % description)
which failed with:
[E 181011 11:53:18 base_handler:203] module 'xml' has no attribute 'etree'
    Traceback (most recent call last):
      File "/Users/crifan/.local/share/virtualenvs/crawler_scholastic_storybook-ttmbK5Yf/lib/python3.6/site-packages/pyspider/libs/base_handler.py", line 196, in run_task
        result = self._run_task(task, response)
      File "/Users/crifan/.local/share/virtualenvs/crawler_scholastic_storybook-ttmbK5Yf/lib/python3.6/site-packages/pyspider/libs/base_handler.py", line 176, in _run_task
        return self._run_func(function, response, task)
      File "/Users/crifan/.local/share/virtualenvs/crawler_scholastic_storybook-ttmbK5Yf/lib/python3.6/site-packages/pyspider/libs/base_handler.py", line 155, in _run_func
        ret = function(*arguments[:len(args) - 1])
      File "<ScholasticStorybook>", line 226, in singleBookCallback
      File "<ScholasticStorybook>", line 18, in htmlToString
    AttributeError: module 'xml' has no attribute 'etree'
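But I then found that:
xml.etree.ElementTree has been part of the standard library since Python 2.5 (since Python 3.3 it simply picks the C-accelerated implementation automatically)
-> so on Python 3.6 the module definitely exists, and at first glance the error:
AttributeError: module 'xml' has no attribute 'etree'
should not have appeared.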
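To rule out a missing module, a quick check like the one below can help. It is only a sketch, assuming a standard CPython install; it asks the import system whether xml.etree.ElementTree can be located, without relying on how xml was imported earlier.

```python
import importlib.util
import sys

# Ask the import machinery whether the submodule can be found at all.
spec = importlib.util.find_spec("xml.etree.ElementTree")

print("Python version:", sys.version_info[:2])
print("xml.etree.ElementTree available:", spec is not None)
```

If this reports the module as available, the AttributeError is about how the module was imported, not about whether it exists.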
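Searching for:
python AttributeError: module 'xml' has no attribute 'etree'
turned up the explanation:
-> import xml only runs the xml package's own __init__.py; it does not automatically import the package's submodules
-> so the submodule etree (and with it etree.ElementTree) was never loaded, and xml has no etree attribute
-> the fix is to import the submodule by its full dotted name.
So I changed the code to: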
import xml.etree.ElementTree as ElementTree

def htmlToString(htmlText):
    # return ''.join(xml.etree.ElementTree.fromstring(htmlText).itertext())
    # return ''.join(lxml.etree.ElementTree.fromstring(htmlText).itertext())
    # return ''.join(lxml.etree.fromstring(htmlText).itertext())
    return ''.join(ElementTree.fromstring(htmlText).itertext())
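That fixed the import problem.
[Summary]
On Python 3.6 the xml package does contain etree.ElementTree. The catch is that:
import xml
only imports the top-level package. Importing a package does not import its submodules unless the package's __init__.py does so itself, and xml's does not.
So xml.etree is never loaded, and accessing it raises:
AttributeError: module 'xml' has no attribute 'etree'
Solution:
Explicitly import the submodule you need, here etree.ElementTree: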
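The underlying behaviour is not specific to xml. The sketch below builds a throwaway package in a temporary directory (demo_pkg and sub are hypothetical names invented for this illustration) and shows that a submodule only becomes an attribute of its package once it has been imported explicitly.

```python
import importlib
import os
import sys
import tempfile

# Create demo_pkg/__init__.py (empty) and demo_pkg/sub.py on disk.
# "demo_pkg" and "sub" are made-up names, used only for this demo.
tmp_dir = tempfile.mkdtemp()
pkg_dir = os.path.join(tmp_dir, "demo_pkg")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("VALUE = 42\n")

# Make the temporary directory importable.
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()

import demo_pkg
before = hasattr(demo_pkg, "sub")  # package imported, submodule not loaded

import demo_pkg.sub
after = hasattr(demo_pkg, "sub")   # submodule is now bound on the package

print("before explicit import:", before)
print("after explicit import:", after)
```

This also explains why import xml sometimes appears to work: if any other code in the same process has already imported xml.etree, the attribute is already there.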
import xml.etree.ElementTree as ElementTree
and then use ElementTree directly:
''.join(ElementTree.fromstring(htmlText).itertext())
and it works.
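Put together, a minimal runnable version of the fix might look like the sketch below. The sample strings are made up for illustration. Note that ElementTree.fromstring is an XML parser, so it only accepts well-formed markup; real-world HTML with unclosed tags raises ParseError.

```python
import xml.etree.ElementTree as ElementTree

def htmlToString(htmlText):
    """Concatenate all text nodes of a well-formed (X)HTML fragment."""
    root = ElementTree.fromstring(htmlText)
    return ''.join(root.itertext())

# Made-up, well-formed input for illustration.
sample = "<p>Hello <b>world</b>!</p>"
description = htmlToString(sample)
print("description=%s" % description)

# Markup that is not well-formed XML is rejected by the parser.
try:
    htmlToString("<p>unclosed<br></p>")
    parse_failed = False
except ElementTree.ParseError as err:
    parse_failed = True
    print("not well-formed XML:", err)
```

For the well-formed sample this prints description=Hello world!, while the second call is rejected because the br tag is never closed.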
When reposting, please credit: 在路上 » [Solved] xml.etree.ElementTree error in Python: AttributeError: module 'xml' has no attribute 'etree'