【Problem】
In Python, I wanted to parse the unicode string "Sep 5, 2010" into the corresponding time or datetime value:
parsedTime = time.strptime(dateStr, "%B %d, %Y")
It failed with:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)
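【Solution process】
1. Here, dateStr itself is already a unicode string.
Thinking it was an encoding problem, I spent a long time trying encoding conversions, including:
dateStr = dateStr.encode("utf-8")
and
dateStr = str(dateStr)
Neither helped.
2. Next I suspected that the original dateStr might accidentally contain the byte 0xe5, a special character outside the ASCII range (128 or above).
So I printed every character with: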
for c in dateStr:
    asciiVal = ord(c)
    print "c=", c, " asciiVal=", asciiVal
The printed output was:
c= S asciiVal= 83
c= e asciiVal= 101
c= p asciiVal= 112
c= asciiVal= 32
c= 5 asciiVal= 53
c= , asciiVal= 44
c= asciiVal= 32
c= 2 asciiVal= 50
c= 0 asciiVal= 48
c= 1 asciiVal= 49
c= 0 asciiVal= 48
In other words, there was no 0xe5 anywhere in the string.
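A Python 3 version of this check might look like the sketch below. The helper name non_ascii_chars is hypothetical; it reports the position and code point of every character outside ASCII, so an empty list confirms the string is pure ASCII. (On Python 3.7 and later, str.isascii() answers the same yes/no question.)

```python
def non_ascii_chars(text):
    """Hypothetical helper: return (index, character, code point) for each non-ASCII character."""
    return [(i, c, ord(c)) for i, c in enumerate(text) if ord(c) >= 128]


date_str = "Sep 5, 2010"

# Print every character with its code point, like the Python 2 loop above.
for c in date_str:
    print("c=%r code point=%d" % (c, ord(c)))

# An empty list means every character is plain ASCII.
print(non_ascii_chars(date_str))
```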
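3. Searching online for a long time only turned up advice about locale settings, so I also tried:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
It still did not help; the error stayed the same.
4. I also consulted: strftime("%B") returns a String unusable with unicode
and tried:
import locale
locale.setlocale(locale.LC_ALL, ('zh_CN', 'UTF8'))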
That did not solve the problem either.
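For context on why the locale matters at all here: the %B and %b directives are matched against the month names of the current LC_TIME locale, which the calendar module exposes as calendar.month_name and calendar.month_abbr. A minimal Python 3 sketch, assuming the default "C" locale (Python 3 leaves LC_TIME at "C" unless the program calls locale.setlocale); under a zh_CN locale these names would be localized and would no longer match English input such as "Sep".

```python
import calendar
import time

# Month names for the current LC_TIME locale; index 0 is an empty placeholder, so skip it.
full_names = list(calendar.month_name)[1:]
abbr_names = list(calendar.month_abbr)[1:]

# Under the default "C" locale, the ninth entries are "September" and "Sep".
print(full_names[8], abbr_names[8])

# "Sep" matches %b (abbreviated month names) and yields month 9.
print(time.strptime("Sep", "%b").tm_mon)
```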
5. In the end, by experimenting on my own, I switched to:
parsedTime = time.strptime(dateStr, "%b %d, %Y")
and that solved the problem.
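The working call can be checked in Python 3 as well. This sketch (assuming the default C locale) also shows datetime.strptime, which accepts the same format string but returns a datetime object instead of a time.struct_time:

```python
import time
from datetime import datetime

date_str = "Sep 5, 2010"

# %b matches the abbreviated month name "Sep".
parsed_time = time.strptime(date_str, "%b %d, %Y")
print(parsed_time.tm_year, parsed_time.tm_mon, parsed_time.tm_mday)  # 2010 9 5

# datetime.strptime uses the same directives and returns a datetime.
parsed_dt = datetime.strptime(date_str, "%b %d, %Y")
print(parsed_dt.date())  # 2010-09-05
```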
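The strptime format directives mean the following (the numbers in the Notes column refer to footnotes in the Python documentation, which are not reproduced here):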
Directive | Meaning | Notes |
%a | Locale’s abbreviated weekday name. | |
%A | Locale’s full weekday name. | |
%b | Locale’s abbreviated month name. | |
%B | Locale’s full month name. | |
%c | Locale’s appropriate date and time representation. | |
%d | Day of the month as a decimal number [01,31]. | |
%f | Microsecond as a decimal number [0,999999], zero-padded on the left | (1) |
%H | Hour (24-hour clock) as a decimal number [00,23]. | |
%I | Hour (12-hour clock) as a decimal number [01,12]. | |
%j | Day of the year as a decimal number [001,366]. | |
%m | Month as a decimal number [01,12]. | |
%M | Minute as a decimal number [00,59]. | |
%p | Locale’s equivalent of either AM or PM. | (2) |
%S | Second as a decimal number [00,61]. | (3) |
%U | Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0. | (4) |
%w | Weekday as a decimal number [0(Sunday),6]. | |
%W | Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0. | (4) |
%x | Locale’s appropriate date representation. | |
%X | Locale’s appropriate time representation. | |
%y | Year without century as a decimal number [00,99]. | |
%Y | Year with century as a decimal number. | |
%z | UTC offset in the form +HHMM or -HHMM (empty string if the object is naive). | (5) |
%Z | Time zone name (empty string if the object is naive). | |
%% | A literal ‘%’ character. | |
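Several of these directives can be combined in one format string. A Python 3 sketch (the timestamp is made-up sample data, and the weekday, month and AM/PM names assume the default C locale) that parses a full timestamp and formats it back with other directives from the table:

```python
from datetime import datetime

# Made-up sample: abbreviated weekday, day, abbreviated month, year, 24-hour time.
stamp = "Sun, 05 Sep 2010 14:30:00"
fmt = "%a, %d %b %Y %H:%M:%S"

dt = datetime.strptime(stamp, fmt)
print(dt)  # 2010-09-05 14:30:00

# Re-format with full names, a 12-hour clock with AM/PM, and the day of the year.
print(dt.strftime("%A %B %d, %Y, %I:%M %p"))  # Sunday September 05, 2010, 02:30 PM
print(dt.strftime("%j"))                      # 248
```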
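Only then did I notice that %b is the abbreviated English month name, while %B is the full name.
That is, %b parses "Sep" as September, whereas %B parses "September" as September.
Parsing "Sep" with %B is therefore bound to fail.
My earlier code parsed "May 6, 2012" correctly only because May's abbreviation and full name are the same word, so %B happened to match it.
So this error came down to my own carelessness.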
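Because May is the only month whose English abbreviation equals its full name, a %B-only format can look correct on test data. If the input may use either form, one option is to try both formats in turn. A Python 3 sketch; the helper name parse_month_day_year and the MONTH_FORMATS tuple are hypothetical, and English month names (the default C locale) are assumed:

```python
import time

# Hypothetical list of formats: full month name first, then the abbreviation.
MONTH_FORMATS = ("%B %d, %Y", "%b %d, %Y")


def parse_month_day_year(date_str):
    """Parse strings like "May 6, 2012", "Sep 5, 2010" or "September 5, 2010"."""
    for fmt in MONTH_FORMATS:
        try:
            return time.strptime(date_str, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized date: %r" % (date_str,))


for s in ("May 6, 2012", "Sep 5, 2010", "September 5, 2010"):
    t = parse_month_day_year(s)
    print(s, "->", t.tm_year, t.tm_mon, t.tm_mday)
```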
What is strange, though, is that strptime reported the error as:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)
instead of something saying that the month string could not be parsed.
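For comparison, Python 3 reports this mistake directly. A minimal sketch, assuming Python 3 with the default C locale for LC_TIME: the %B format raises a ValueError whose message says the time data does not match the format.

```python
import time

date_str = "Sep 5, 2010"

try:
    # %B expects a full month name such as "September", so "Sep" does not match.
    time.strptime(date_str, "%B %d, %Y")
except ValueError as exc:
    # Python 3 names the actual problem: the input does not fit the format.
    message = str(exc)
else:
    message = None

print(message)
```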
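【Summary】
The cause was passing the wrong directive, %B, to strptime while parsing "Sep", the abbreviated English name of September, so the parse failed.
Switching to %b, which matches abbreviated month names, parses "Sep" correctly.
When reposting, please credit: 在路上 » 【Solved】In Python, strptime fails with: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)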