【Problem】
While working through:
【Unsolved】json.loads fails when decoding a string in Python: ValueError: Expecting property name: line 1 column 1 (char 1)
I ran into yet another error:
    LINE 106 : INFO photoInfoJson={id:'379879-87329678',owner:'379879',ownername:'shanshu',title:'IMG_3464',description:'',bucket:'shanshu',key:'CsFzMuHz',license:0,stats_notes: 0,albums: ['379879-18
    ... ...
    photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
【Solution process】
1. Following my own earlier post:
【Solved】Parsing a JSON file in Python fails: ValueError : No JSON object could be decoded -> Python's json library does not support UTF-8 with BOM
I tried adding the encoding argument:
    import json
    import logging
    import re

    # wrap bare keys in single quotes, then turn every single quote into a double quote
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    # Python 2: the second positional argument of json.loads is the encoding
    photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    logging.info("photoInfoDict=%s", photoInfoDict);
The problem remained.
I believed I already knew the cause (the json library does not support UTF-8 with a BOM), but here the input is a string, photoInfoJson, obtained at runtime,
so there is no file to convert to BOM-less UTF-8 with Notepad++ or a similar tool.
So I had to find another approach.
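Since the data arrives as a string rather than a file, a BOM can be dropped in code instead of in an editor. The following is only a minimal sketch, assuming Python 3 and raw bytes as input; the helper name `decode_without_bom` is made up for illustration. As the later steps show, the BOM was not the real cause here, so this merely rules that possibility out.

```python
import codecs
import json


def decode_without_bom(raw: bytes) -> str:
    """Decode UTF-8 bytes, dropping a leading BOM if one is present."""
    # The "utf-8-sig" codec skips the BOM when it is there and otherwise
    # behaves like plain "utf-8".
    return raw.decode("utf-8-sig")


with_bom = codecs.BOM_UTF8 + b'{"title": "IMG_3464"}'
text = decode_without_bom(with_bom)
print(json.loads(text))
```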
2. Decoded and re-encoded the string by hand:
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    # Python 2: str -> unicode -> UTF-8 encoded str
    photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    photoInfoJsonDoubleQuoteUtf8 = photoInfoJsonDoubleQuoteUni.encode("UTF-8");
    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    photoInfoDict = json.loads(photoInfoJsonDoubleQuoteUtf8);
    logging.info("photoInfoDict=%s", photoInfoDict);
The result was still the error:
ValueError: No JSON object could be decoded
3. Checked in code that the string really is UTF-8 already:
    print "type(photoInfoJson)=", type(photoInfoJson);  # type(photoInfoJson)= <type 'str'>
    print crifanLib.getStrPossibleCharset(photoInfoJson);  # utf-8
Yet for some reason it still could not be decoded.
4. Tried passing the single-quoted string directly:
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    #photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    #photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    #photoInfoJsonDoubleQuoteUtf8 = photoInfoJsonDoubleQuoteUni.encode("UTF-8");
    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    # keys and values are still single-quoted here
    photoInfoDict = json.loads(photoInfoJsonAddQuote);
    logging.info("photoInfoDict=%s", photoInfoDict);
As expected, this does produce the error:
ValueError: Expecting property name: line 1 column 1 (char 1)
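The quoting rules behind this step can be checked in isolation. The snippet below is a small sketch, assuming Python 3 (where the error messages are worded differently from the Python 2.7 ones quoted in this post); the sample strings and the helper `try_parse` are invented for illustration.

```python
import json

candidates = {
    "unquoted keys": "{id: 1}",
    "single quotes": "{'id': 1}",
    "double quotes": '{"id": 1}',
}


def try_parse(text):
    """Return the parsed value, or None if json.loads rejects the text."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


results = {name: try_parse(text) for name, text in candidates.items()}
for name, value in results.items():
    print(name, "->", value)
```

Only the double-quoted form is accepted, which is why the post first quotes the keys and then swaps the quote character.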
5. Going back once more to:
【Solved】Parsing a JSON file in Python fails: ValueError : No JSON object could be decoded -> Python's json library does not support UTF-8 with BOM
I tried converting the string to ANSI (GB18030):
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    # Python 2: UTF-8 str -> unicode -> GB18030 encoded str
    photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    photoInfoJsonAddQuoteAnsi = photoInfoJsonDoubleQuoteUni.encode("GB18030");
    print "type(photoInfoJson)=", type(photoInfoJsonAddQuoteAnsi);
    print crifanLib.getStrPossibleCharset(photoInfoJsonAddQuoteAnsi);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    photoInfoDict = json.loads(photoInfoJsonAddQuoteAnsi);
    logging.info("photoInfoDict=%s", photoInfoDict);
The problem remained.
6. According to the manual:
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]]) Deserialize s (a str or unicode instance containing a JSON document) to a Python object.
If s is a str instance and is encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
The other arguments have the same meaning as in load().
So I tried passing unicode instead:
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    #photoInfoJsonAddQuoteAnsi = photoInfoJsonDoubleQuoteUni.encode("GB18030");
    print "type(photoInfoJson)=", type(photoInfoJsonDoubleQuoteUni);
    #print crifanLib.getStrPossibleCharset(photoInfoJsonDoubleQuoteUni);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonAddQuoteAnsi);
    # pass the unicode object itself
    photoInfoDict = json.loads(photoInfoJsonDoubleQuoteUni);
    logging.info("photoInfoDict=%s", photoInfoDict);
The problem remained.
7. Next, tried the GB18030 string with the encoding named explicitly, to see whether that decodes:
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    photoInfoJsonAddQuoteAnsi = photoInfoJsonDoubleQuoteUni.encode("GB18030");
    #print "type(photoInfoJson)=", type(photoInfoJsonDoubleQuoteUni);
    #print crifanLib.getStrPossibleCharset(photoInfoJsonDoubleQuoteUni);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    # GB18030 bytes, with the matching encoding argument (Python 2)
    photoInfoDict = json.loads(photoInfoJsonAddQuoteAnsi, "GB18030");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuoteUni);
    logging.info("photoInfoDict=%s", photoInfoDict);
The problem remained.
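The manual passage quoted in step 6 comes down to: turn the bytes into text with the codec they were written in, then parse. Here is a hedged sketch of that idea, assuming Python 3 (where `str` is already Unicode, so no `encoding` argument is passed to `json.loads`); the one-key sample document is invented and merely reuses a tag name from the data above.

```python
import json

# Bytes as they might arrive in a non-UTF-8 encoding (GB18030 here).
raw = '{"name": "天平山赏红枫"}'.encode("gb18030")

# Decode with the codec the bytes were actually written in, then parse.
parsed = json.loads(raw.decode("gb18030"))
print(parsed["name"])
```

Getting the encoding right changes which characters come out, but it cannot repair text that is not valid JSON syntax, which is consistent with steps 5 to 7 making no difference.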
8. Hard-coded the original string on its own, to see whether it decodes correctly:
    #debug here write fix string json to test
    photoInfoJson = """{id:'379879-87329678',owner:'379879',ownername:'shanshu',title:'IMG_3464',description:'',bucket:'shanshu',key:'CsFzMuHz',license:0,stats_notes: 0,albums: ['379879-18
    1880',],tags:[{name:'20121202', author: '379879'},{name:'天平山赏红枫', author: '379879'}],owner:{id: 379879,username: 'shanshu',nickname: 'shanshu'}}""";
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    #photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    #photoInfoJsonAddQuoteAnsi = photoInfoJsonDoubleQuoteUni.encode("GB18030");
    #print "type(photoInfoJson)=", type(photoInfoJsonDoubleQuoteUni);
    #print crifanLib.getStrPossibleCharset(photoInfoJsonDoubleQuoteUni);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonAddQuoteAnsi, "GB18030");
    photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    logging.info("photoInfoDict=%s", photoInfoDict);
This time the result was a different error:
        photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
        return _default_decoder.decode(s)
      File "D:\tmp\dev_install_root\Python27_x64\lib\json\decoder.py", line 366, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "D:\tmp\dev_install_root\Python27_x64\lib\json\decoder.py", line 382, in raw_decode
        obj, end = self.scan_once(s, idx)
    ValueError: Invalid control character at: line 1 column 195 (char 195)
Very strange.
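So I looked at which character sits at position 195.
It turned out that there is a CR LF at that spot, in the middle of the album ID.
9. Removed the CR LF
and ran the code again; the problem remained.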
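Finding the character the parser complains about does not have to be done by eye. The sketch below assumes Python 3.5 or later, where `json.JSONDecodeError` carries a `pos` attribute; the helper name `locate_json_error` and the shortened sample string are made up for illustration.

```python
import json


def locate_json_error(text, window=10):
    """Return (position, surrounding text) for the first parse error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        start = max(0, err.pos - window)
        return err.pos, text[start:err.pos + window]
    return None


broken = '{"albums": ["379879-18\n1880"]}'  # raw line break inside a string
position, context = locate_json_error(broken)
print(position, repr(context))

# strict=False lets the parser accept control characters inside strings.
relaxed = json.loads(broken, strict=False)
print(relaxed)
```

Printing the context with `repr` makes an embedded line break visible as an escape sequence. Whether to strip the line break or to parse with `strict=False` depends on whether the break is meaningful in the data.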
10. Changed the test string to (description is no longer empty):
    photoInfoJson = """{id:'379879-87329678',owner:'379879',ownername:'shanshu',title:'IMG_3464',description:'xxx',bucket:'shanshu',key:'CsFzMuHz',license:0,stats_notes: 0,albums: ['379879-181880',],tags:[{name:'20121202', author: '379879'},{name:'天平山赏红枫', author: '379879'}],owner:{id: 379879,username: 'shanshu',nickname: 'shanshu'}}""";
The problem remained.
11. Finally changed the code to:
    #debug here write fix string json to test
    photoInfoJson = """{id:'379879-87329678',owner:'379879',ownername:'shanshu',title:'IMG_3464',description:'xxx',bucket:'shanshu',key:'CsFzMuHz',license:0,stats_notes: 0,albums: ['379879-181880'],tags:[{name:'20121202', author: '379879'},{name:'天平山赏红枫', author: '379879'}],owner:{id: 379879,username: 'shanshu',nickname: 'shanshu'}}""";
    photoInfoJsonAddQuote = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", photoInfoJson);
    logging.info("photoInfoJsonAddQuote=%s", photoInfoJsonAddQuote);
    photoInfoJsonDoubleQuote = photoInfoJsonAddQuote.replace("'", "\"");
    logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);

    #photoInfoJsonDoubleQuoteUni = photoInfoJsonDoubleQuote.decode("UTF-8");
    #photoInfoJsonAddQuoteAnsi = photoInfoJsonDoubleQuoteUni.encode("GB18030");
    #print "type(photoInfoJson)=", type(photoInfoJsonDoubleQuoteUni);
    #print crifanLib.getStrPossibleCharset(photoInfoJsonDoubleQuoteUni);

    #photoInfoJsonDoubleQuote = photoInfoJson.replace("'", '"');
    #logging.info("photoInfoJsonDoubleQuote=%s", photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote, "UTF-8");
    #photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    #photoInfoDict = json.loads(photoInfoJsonAddQuoteAnsi, "GB18030");
    photoInfoDict = json.loads(photoInfoJsonDoubleQuote);
    logging.info("photoInfoDict=%s", photoInfoDict);
And, at long last, it decoded...
The only change was from:
    albums: ['379879-181880',]
to:
    albums: ['379879-181880']
That is, removing the "extra" comma after the last element inside the list value was enough.
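The difference can be reproduced with a two-element comparison. This is a minimal sketch, assuming Python 3; `ast.literal_eval` is used here only to show what Python's own literal syntax accepts and is not part of the original code.

```python
import ast
import json

with_comma = '["379879-181880",]'
without_comma = '["379879-181880"]'

# Python's own literal syntax tolerates the trailing comma...
as_python = ast.literal_eval(with_comma)

# ...but the JSON parser does not.
try:
    json.loads(with_comma)
    json_accepts_trailing_comma = True
except ValueError:
    json_accepts_trailing_comma = False

as_json = json.loads(without_comma)
print(as_python, json_accepts_trailing_comma, as_json)
```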
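【Summary】
A trailing comma after the last element of a list is allowed by Python's own syntax;
as far as I remember, other languages such as C allow it too.
Annoyingly, the json library in Python 2.7.3 does not accept it. This is not a quirk of that version: the JSON grammar itself has no trailing commas, and the json module follows the grammar.
As a result, many people who hit a similar problem have no idea where to start...
In short:
- Cause:
The json library in Python 2.7.3 does not accept input like this:
    albums: ['379879-181880',]
It has to be written as:
    albums: ['379879-181880']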
- Solution:
A non-conforming string like the one above:
    {id:'379879-87329678',owner:'379879',ownername:'shanshu',title:'IMG_3464',description:'xxx',bucket:'shanshu',key:'CsFzMuHz',license:0,stats_notes: 0,albums: ['379879-181880',],tags:[{name:'20121202', author: '379879'},{name:'天平山赏红枫', author: '379879'}],owner:{id: 379879,username: 'shanshu',nickname: 'shanshu'}}
can be processed with:
    addedSingleQuoteJsonStr = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", orginalJsonStr);
    doubleQuotedJsonStr = addedSingleQuoteJsonStr.replace("'", "\"");
    #remove comma before end of list (applied to the double-quoted string)
    removedLastCommaInList = re.sub(r",\s*?]", "]", doubleQuotedJsonStr);
into a valid one that json.loads accepts:
    {"id":"379879-87329678","owner":"379879","ownername":"shanshu","title":"IMG_3464","description":"xxx","bucket":"shanshu","key":"CsFzMuHz","license":0,"stats_notes": 0,"albums": ["379879-181880"],"tags":[{"name":"20121202", "author": "379879"},{"name":"天平山赏红枫", "author": "379879"}],"owner":{"id": 379879,"username": "shanshu","nickname": "shanshu"}}
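The three substitutions can be wrapped into one helper. What follows is only a sketch under stated assumptions: it targets Python 3, the function name `loose_json_to_dict` is invented, and the sample is a shortened version of the string above. The approach is fragile: a value that contains a colon, an apostrophe, or the sequence `,]` would be mangled by these regexes.

```python
import json
import re


def loose_json_to_dict(loose_text):
    """Parse a JavaScript-style object literal using the three fixes above."""
    # 1. Wrap bare keys in single quotes: id: -> 'id':
    quoted_keys = re.sub(r"(,?)(\w+?)\s*?:", r"\1'\2':", loose_text)
    # 2. JSON strings need double quotes.
    double_quoted = quoted_keys.replace("'", '"')
    # 3. Drop the comma in front of a closing bracket.
    no_trailing_comma = re.sub(r",\s*?]", "]", double_quoted)
    return json.loads(no_trailing_comma)


sample = (
    "{id:'379879-87329678',title:'IMG_3464',license:0,stats_notes: 0,"
    "albums: ['379879-181880',],"
    "tags:[{name:'20121202', author: '379879'}],"
    "owner:{id: 379879,username: 'shanshu'}}"
)
info = loose_json_to_dict(sample)
print(info["albums"], info["owner"]["id"])
```

For input that is not under your control, a parser that understands relaxed object literals would be more robust than regex rewriting; the regexes are kept here because they are what this post arrived at.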
When reposting, please credit: 在路上 » 【Solved】json.loads fails when decoding a string in Python: ValueError: No JSON object could be decoded