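【Problem】
While tinkering with one of my Python scripts, I upgraded the chardet library it uses from 1.0.1 to 1.1. After that, running the script printed the following UnicodeWarning: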
```
LINE 810  : INFO     [0001] http://againinput4.blog.163.com/blog/static/172799491201091513711591
LINE 886  : INFO     Title = intro
libs/thirdparty\chardet\universaldetector.py:90: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if aBuf[:len(chunk)] == chunk:
LINE 1617 : INFO     Exporting items at last ...
```
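【Troubleshooting】
1. Switching back to the old chardet 1.0.1 made the script run normally again, and the warning went away.
2. I compared the code of the two versions:
(1) The relevant code in universaldetector.py of chardet 1.0.1: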
```python
def feed(self, aBuf):
    if self.done: return

    charmap = (
        # EF BB BF  UTF-8 with BOM
        ('\xEF\xBB\xBF', {'encoding': "UTF-8", 'confidence': 1.0}),
        # FF FE 00 00  UTF-32, little-endian BOM
        ('\xFF\xFE\x00\x00', {'encoding': "UTF-32LE", 'confidence': 1.0}),
        # 00 00 FE FF  UTF-32, big-endian BOM
        ('\x00\x00\xFE\xFF', {'encoding': "UTF-32BE", 'confidence': 1.0}),
        # FE FF 00 00  UCS-4, unusual octet order BOM (3412)
        (u'\xFE\xFF\x00\x00', {'encoding': "X-ISO-10646-UCS-4-3412", 'confidence': 1.0}),
        # 00 00 FF FE  UCS-4, unusual octet order BOM (2143)
        (u'\x00\x00\xFF\xFE', {'encoding': "X-ISO-10646-UCS-4-2143", 'confidence': 1.0}),
        # FF FE  UTF-16, little endian BOM
        ('\xFF\xFE', {'encoding': "UTF-16LE", 'confidence': 1.0}),
        # FE FF  UTF-16, big endian BOM
        ('\xFE\xFF', {'encoding': "UTF-16BE", 'confidence': 1.0}),
    )

    aLen = len(aBuf)
    if not aLen: return

    if not self._mGotData:
        # If the data starts with BOM, we know it is UTF
        for chunk, result in charmap:
            if aBuf[:len(chunk)] == chunk:
                self.result = result
                break
```
(2) The relevant code in universaldetector.py of chardet 1.1:
```python
def feed(self, aBuf):
    if self.done: return

    aLen = len(aBuf)
    if not aLen: return

    if not self._mGotData:
        # If the data starts with BOM, we know it is UTF
        if aBuf[:3] == '\xEF\xBB\xBF':
            # EF BB BF  UTF-8 with BOM
            self.result = {'encoding': "UTF-8", 'confidence': 1.0}
        elif aBuf[:4] == '\xFF\xFE\x00\x00':
            # FF FE 00 00  UTF-32, little-endian BOM
            self.result = {'encoding': "UTF-32LE", 'confidence': 1.0}
        elif aBuf[:4] == '\x00\x00\xFE\xFF':
            # 00 00 FE FF  UTF-32, big-endian BOM
            self.result = {'encoding': "UTF-32BE", 'confidence': 1.0}
        elif aBuf[:4] == '\xFE\xFF\x00\x00':
            # FE FF 00 00  UCS-4, unusual octet order BOM (3412)
            self.result = {'encoding': "X-ISO-10646-UCS-4-3412", 'confidence': 1.0}
        elif aBuf[:4] == '\x00\x00\xFF\xFE':
            # 00 00 FF FE  UCS-4, unusual octet order BOM (2143)
            self.result = {'encoding': "X-ISO-10646-UCS-4-2143", 'confidence': 1.0}
        elif aBuf[:2] == '\xFF\xFE':
            # FF FE  UTF-16, little endian BOM
            self.result = {'encoding': "UTF-16LE", 'confidence': 1.0}
        elif aBuf[:2] == '\xFE\xFF':
            # FE FF  UTF-16, big endian BOM
            self.result = {'encoding': "UTF-16BE", 'confidence': 1.0}
```
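For comparison, here is a minimal Python 3 sketch of the same BOM check, written so that every prefix is a byte string. It uses the standard library's codecs.BOM_* constants; the two unusual UCS-4 byte orders have no such constant, so they are spelled out as b'...' literals. The function name sniff_bom is hypothetical and not part of chardet; the result dict only mimics chardet's {'encoding': ..., 'confidence': ...} shape. As in both listings, the 4-byte BOMs are tested before the 2-byte UTF-16 ones, because FF FE 00 00 also begins with the UTF-16LE BOM FF FE.

```python
import codecs

# Hypothetical table mirroring the order used by chardet's listings:
# longer BOMs first, so FF FE 00 00 is not mistaken for the UTF-16LE BOM FF FE.
BOM_TABLE = (
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (b"\xFE\xFF\x00\x00", "X-ISO-10646-UCS-4-3412"),
    (b"\x00\x00\xFF\xFE", "X-ISO-10646-UCS-4-2143"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
)


def sniff_bom(buf):
    """Return a chardet-style result dict if buf starts with a known BOM, else None."""
    # Refuse text input instead of letting a bytes-vs-str comparison silently fail.
    if not isinstance(buf, bytes):
        raise TypeError("expected bytes, got %s" % type(buf).__name__)
    for bom, encoding in BOM_TABLE:
        if buf.startswith(bom):
            return {"encoding": encoding, "confidence": 1.0}
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xff\xfe\x00\x00h\x00\x00\x00"))  # {'encoding': 'UTF-32LE', 'confidence': 1.0}
```

Raising TypeError on str input is a deliberate choice in this sketch: it turns the silent "interpreting them as being unequal" outcome into an immediate, visible error.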
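At first glance, the code itself didn't seem to reveal much of a problem. Two details stand out on a closer look, though. First, the line quoted in the warning, `if aBuf[:len(chunk)] == chunk:`, only appears in the charmap-based listing, so the version labels on the two listings may be swapped. Second, in that listing the two UCS-4 entries are `u'...'` unicode literals, while `aBuf` and all the other entries are byte strings. In Python 2, comparing a byte string that contains non-ASCII bytes with a unicode string makes Python try to decode the byte string as ASCII; when that fails, it emits exactly this UnicodeWarning and treats the two values as unequal.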
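3. I also searched online for explanations of this error. Many answers said that before calling chardet you should first decode the string, with something like yourStr.decode("utf16-be"), and that this solves the problem.
I didn't feel like digging into that in detail here, since the warning only started appearing after upgrading chardet from 1.0.1 to 1.1.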
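The underlying issue behind that advice is mixing byte strings and text in a single comparison: whichever direction you convert, both sides need to be the same type. A small Python 3 illustration follows; the names as_bytes, buf and bom_as_text are hypothetical. In Python 3, bytes == str is simply False with no warning (unless Python runs with the -b flag, which emits a BytesWarning), so the mismatch fails silently instead of printing a UnicodeWarning. Encoding with latin-1 turns a str whose code points are all below 256 into bytes with the same values, which is presumably what a u'\xFE\xFF\x00\x00' literal was meant to match.

```python
def as_bytes(value):
    """Return value as bytes, encoding a str with latin-1.

    latin-1 maps code points U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    so each character becomes the byte with the same numeric value.
    Code points above U+00FF raise UnicodeEncodeError.
    """
    if isinstance(value, bytes):
        return value
    return value.encode("latin-1")


buf = b"\xFE\xFF\x00\x00payload"  # raw input bytes, e.g. read from a file
bom_as_text = "\xFE\xFF\x00\x00"   # str, like Python 2's u'\xFE\xFF\x00\x00'

mismatched = buf[:4] == bom_as_text         # bytes vs str: always False
matched = buf[:4] == as_bytes(bom_as_text)  # bytes vs bytes: compares the values
print(mismatched, matched)  # False True
```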
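【Summary】
So perhaps I need to handle encoding/decoding myself before calling chardet, or perhaps chardet itself is not written carefully enough.
But for my purposes, chardet only needs to be good enough. I have no interest in debugging it in detail right now and will revisit it if the need arises.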
So the workaround here is:
Don't upgrade chardet to 1.1 for now. The old chardet 1.0.1 is good enough, and it doesn't produce this warning.
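If downgrading is not an option, a different workaround (not the one used here) is to suppress just this warning with the standard warnings module. This is a minimal sketch, assuming chardet is imported as the top-level package chardet, so the warning is attributed to the module chardet.universaldetector; with a libs/thirdparty layout the actual module name may differ, which is why the pattern is a parameter. The function name ignore_chardet_unicode_warning is hypothetical. Note that this only hides the symptom: the affected BOM comparisons still evaluate as unequal.

```python
import warnings


def ignore_chardet_unicode_warning(module_pattern=r"chardet\.universaldetector"):
    """Ignore UnicodeWarning attributed to chardet's universaldetector module.

    module_pattern is an assumption: it is a regex that must match the start of
    the module name the warning is attributed to, which depends on how chardet
    is imported in your project.
    """
    warnings.filterwarnings(
        "ignore",
        category=UnicodeWarning,
        module=module_pattern,
    )


# Call once at startup, before feeding any data to chardet.
ignore_chardet_unicode_warning()
```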
When reposting, please credit: 在路上 » 【Solved】Python script error: libs/thirdparty\chardet\universaldetector.py:90: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode – interpreting them as being unequal