Some notes on pitfalls when using json.loads in Python 2.
Before listing them, here is the description of json.loads from the Python documentation:
json.loads(s[, encoding[, cls[, object_hook[, parse_float[, parse_int[, parse_constant[, object_pairs_hook[, **kw]]]]]]]])
Deserialize s (a str or unicode instance containing a JSON document) to a Python object.
If s is a str instance and is encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified. Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
The other arguments have the same meaning as in load().
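For comparison, here is a minimal sketch that assumes Python 3.6 or later, so it differs from the Python 2 API quoted above. In Python 3, json.loads accepts a str, or bytes in UTF-8, UTF-16 or UTF-32 (the encoding is detected automatically). The encoding argument is gone; it was removed in Python 3.9. The sample JSON text is made up for illustration.

```python
import json

# Made-up sample JSON document held as a (Unicode) str.
text = '{"name": "张三", "age": 30}'

# Python 3.6+ also accepts bytes and auto-detects UTF-8, UTF-16 or UTF-32.
from_str = json.loads(text)
from_utf8 = json.loads(text.encode("utf-8"))
from_utf16 = json.loads(text.encode("utf-16"))  # BOM lets it be detected

print(from_str == from_utf8 == from_utf16)  # True
```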
1. If the input string is not UTF-8 encoded, specify its encoding with the encoding parameter
Given:
dataDict = json.loads(dataJsonStr)
where dataJsonStr is a JSON string. If it is not UTF-8 encoded, for example if it is GB2312, the code above fails. Change it to:
dataDict = json.loads(dataJsonStr, encoding="GB2312")
and it works.
This corresponds to the following sentence in the documentation quoted above:
If s is a str instance and is encoded with an ASCII based encoding other than UTF-8 (e.g. latin-1), then an appropriate encoding name must be specified
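A minimal Python 3 sketch of the same situation, assuming a made-up GB2312-encoded sample. Because Python 3's json.loads has no encoding argument, passing GB2312 bytes directly fails; the counterpart of encoding="GB2312" is to decode the bytes yourself before parsing.

```python
import json

# Made-up sample: a JSON document stored as GB2312-encoded bytes.
gb_bytes = '{"city": "北京"}'.encode("gb2312")

try:
    # Python 3 reads bytes input as UTF-8/16/32, so GB2312 bytes fail here.
    json.loads(gb_bytes)
except UnicodeDecodeError as exc:
    print("direct load failed:", type(exc).__name__)

# Decode with the actual encoding first, then parse the resulting str.
data = json.loads(gb_bytes.decode("gb2312"))
print(data["city"])  # 北京
```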
2. If the string to be parsed is not in an ASCII-based encoding, convert it to Unicode before calling json.loads
Take the earlier example again:
dataDict = json.loads(dataJsonStr, encoding="GB2312")
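Even when encoding has been set to a suitable value, json.loads can still fail if dataJsonStr contains characters from another encoding. In my case dataJsonStr was GB2312 text that also contained some Japanese characters, and json.loads still raised an error. Because such a dataJsonStr is not in an ASCII-based encoding, the fix is to convert dataJsonStr to Unicode first and then call json.loads.
The code: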
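# Python 2: decode the GB2312 byte string to unicode, then parse it
dataJsonStrUni = dataJsonStr.decode("GB2312")
dataDict = json.loads(dataJsonStrUni)
This corresponds to the following sentence in the documentation quoted above: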
Encodings that are not ASCII based (such as UCS-2) are not allowed and should be decoded to unicode first.
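Here is the same decode-then-parse approach sketched for Python 3 as a small reusable function. loads_encoded is a hypothetical helper name and is not part of the json module. The sample data mixing Chinese with Japanese kana is made up; GB2312 does include hiragana and katakana, so it survives the round trip.

```python
import json


def loads_encoded(raw: bytes, encoding: str):
    """Hypothetical helper: decode raw bytes with `encoding`, then parse.

    json.loads only ever sees a str, so the original byte encoding
    no longer matters to the JSON parser.
    """
    text = raw.decode(encoding)  # raises UnicodeDecodeError on bad bytes
    return json.loads(text)


# Made-up sample mixing Chinese with Japanese hiragana and katakana.
original = {"greeting": "你好", "japanese": "こんにちは", "katakana": "カタカナ"}
raw = json.dumps(original, ensure_ascii=False).encode("gb2312")

result = loads_encoded(raw, "gb2312")
print(result == original)  # True
```

Keeping decoding separate from parsing means a wrong encoding shows up as an error from raw.decode, before json.loads runs, rather than being mixed up with JSON syntax errors.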