此代码中是判断是否大于0.5来决定是否是可能的字符串类型。使用者可根据自己需要,改为自己想要的概率,比如0.8等。
#------------------------------------------------------------------------------ # get the possible(possiblility > 0.5) charset of input string def getStrPossibleCharset(inputStr) : possibleCharset = "ascii"; #possibleCharset = "UTF-8"; encInfo = chardet.detect(inputStr); #print "encInfo=",encInfo; if (encInfo['confidence'] > 0.5): possibleCharset = encInfo['encoding']; return possibleCharset; #return encInfo['encoding'];
例 2.28. getStrPossibleCharset的使用范例
validCharset = getStrPossibleCharset(dataJsonStr); logging.debug("Now try use the detected charset %s to decode it again", validCharset);