【Solved】Python strptime error: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)

Python · crifan

【Problem】

In Python, I wanted to parse the unicode string "Sep 5, 2010" into the corresponding time or datetime value:

parsedTime = time.strptime(dateStr, "%B %d, %Y")

but the call failed with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)

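【Troubleshooting】

1. dateStr itself is a unicode string.

I assumed this was an encoding problem, so I spent a long time trying encoding conversions, including:

dateStr = dateStr.encode("utf-8")

dateStr = str(dateStr)

Neither helped.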

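2. Next I suspected that dateStr might accidentally contain the special character 0xe5 (a value outside the ASCII range, that is, 128 or above).

So I printed every character together with its code point:

for c in dateStr:
    asciiVal = ord(c)
    print "c=", c, " asciiVal=", asciiVal

The output was: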

c= S  asciiVal= 83
c= e  asciiVal= 101
c= p  asciiVal= 112
c=    asciiVal= 32
c= 5  asciiVal= 53
c= ,  asciiVal= 44
c=    asciiVal= 32
c= 2  asciiVal= 50
c= 0  asciiVal= 48
c= 1  asciiVal= 49
c= 0  asciiVal= 48

So the string does not contain 0xe5 at all.
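The same check can be written more compactly. The following is a minimal sketch for Python 3 (the original loop is Python 2); the helper name non_ascii_chars is hypothetical and not part of the original code.

```python
def non_ascii_chars(text):
    """Return (index, character, code point) for every non-ASCII character."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text) if ord(ch) > 127]


date_str = "Sep 5, 2010"

# An empty list means every character is plain ASCII.
print(non_ascii_chars(date_str))
```

An empty result confirms that the input string is not where the offending byte comes from.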

3. Searching online for a long time turned up only locale-related advice, so I also tried:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

That did not help either; the error was unchanged.

4. I also consulted: strftime("%B") returns a String unusable with unicode

and tried:

import locale
locale.setlocale(locale.LC_ALL, ('zh_CN', 'UTF8'))

but that did not solve the problem either.
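When locale is a suspect, it can help to look at the month names the current locale actually provides. The following is a minimal sketch for Python 3, not part of the original code; the variable names abbreviated and full are illustrative, and the English names are what I would expect only under the default "C" locale.

```python
import calendar
import locale

# Current LC_TIME setting of the process.
print(locale.getlocale(locale.LC_TIME))

# calendar.month_abbr and calendar.month_name expose the locale's month names;
# index 0 is an empty string, so the real months are 1..12.
abbreviated = [calendar.month_abbr[i] for i in range(1, 13)]
full = [calendar.month_name[i] for i in range(1, 13)]

print(abbreviated)
print(full)
```

Comparing the two lists with the input string shows quickly whether the input uses abbreviated or full month names.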

5. In the end I found the fix by experimenting myself:

parsedTime = time.strptime(dateStr, "%b %d, %Y")

which solved the problem.
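For readers on Python 3, the working call might look like the sketch below. It is not the original Python 2 code: it assumes the default English ("C") locale, and the names date_str, parsed_time and parsed_date are illustrative. Since the goal was a time or datetime value, it shows both variants.

```python
import time
from datetime import datetime

date_str = "Sep 5, 2010"

# time.strptime returns a time.struct_time.
parsed_time = time.strptime(date_str, "%b %d, %Y")
print(parsed_time.tm_year, parsed_time.tm_mon, parsed_time.tm_mday)

# datetime.strptime accepts the same format string and returns a datetime.
parsed_date = datetime.strptime(date_str, "%b %d, %Y")
print(parsed_date.isoformat())
```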

The strptime directives have the following meanings:

| Directive | Meaning | Notes |
|---|---|---|
| %a | Locale's abbreviated weekday name. | |
| %A | Locale's full weekday name. | |
| %b | Locale's abbreviated month name. | |
| %B | Locale's full month name. | |
| %c | Locale's appropriate date and time representation. | |
| %d | Day of the month as a decimal number [01,31]. | |
| %f | Microsecond as a decimal number [0,999999], zero-padded on the left | (1) |
| %H | Hour (24-hour clock) as a decimal number [00,23]. | |
| %I | Hour (12-hour clock) as a decimal number [01,12]. | |
| %j | Day of the year as a decimal number [001,366]. | |
| %m | Month as a decimal number [01,12]. | |
| %M | Minute as a decimal number [00,59]. | |
| %p | Locale's equivalent of either AM or PM. | (2) |
| %S | Second as a decimal number [00,61]. | (3) |
| %U | Week number of the year (Sunday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Sunday are considered to be in week 0. | (4) |
| %w | Weekday as a decimal number [0(Sunday),6]. | |
| %W | Week number of the year (Monday as the first day of the week) as a decimal number [00,53]. All days in a new year preceding the first Monday are considered to be in week 0. | (4) |
| %x | Locale's appropriate date representation. | |
| %X | Locale's appropriate time representation. | |
| %y | Year without century as a decimal number [00,99]. | |
| %Y | Year with century as a decimal number. | |
| %z | UTC offset in the form +HHMM or -HHMM (empty string if the object is naive). | (5) |
| %Z | Time zone name (empty string if the object is naive). | |
| %% | A literal '%' character. | |

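Only then did I notice that %b is the abbreviated English month name, while %B is the full name.

That is, %b parses Sep as September, and %B parses September as September.

Parsing Sep with %B is bound to fail.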
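The difference between the two directives can be checked directly. The following is a minimal sketch for Python 3 under the default English ("C") locale; the helper month_or_none is hypothetical. On Python 3 a mismatch between the input and the format is reported as a ValueError, which the helper turns into None.

```python
import time


def month_or_none(text, fmt):
    """Return the parsed month number, or None if text does not match fmt."""
    try:
        return time.strptime(text, fmt).tm_mon
    except ValueError:
        return None


# Matching pairs: abbreviated name with %b, full name with %B.
print(month_or_none("Sep 5, 2010", "%b %d, %Y"))
print(month_or_none("September 5, 2010", "%B %d, %Y"))

# Mismatched pairs: neither of these can be parsed.
print(month_or_none("Sep 5, 2010", "%B %d, %Y"))
print(month_or_none("September 5, 2010", "%b %d, %Y"))
```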

My earlier code could parse "May 6, 2012" correctly only because the abbreviation and the full name of May are both May, so parsing May with %B still yielded the fifth month.
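The May coincidence is easy to reproduce. The following is a minimal sketch for Python 3 under the default English ("C") locale, with illustrative variable names; it shows why a wrong directive can go unnoticed until a longer month name shows up.

```python
import time

date_str = "May 6, 2012"

# "May" is both the abbreviated and the full month name, so both formats match.
with_abbreviated = time.strptime(date_str, "%b %d, %Y")
with_full = time.strptime(date_str, "%B %d, %Y")

print(with_abbreviated.tm_mon, with_full.tm_mon)
```

Test data for date parsing should therefore include a month other than May.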

So this error was my own carelessness.

What still strikes me as odd is why strptime reports:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 202: ordinal not in range(128)

rather than an error saying that the month string could not be parsed.

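【Summary】

The error came from passing the wrong directive, %B, to strptime and then asking it to parse Sep, the abbreviated English name for September, which %B cannot parse.

Switching to %b, which parses abbreviated month names such as Sep, makes the parse succeed.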
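If the input may contain either form of the month name, one option is to try both formats in turn. The following is a minimal sketch for Python 3 under the default English ("C") locale. The helper parse_month_day_year and its formats parameter are hypothetical and go beyond the original fix, which simply switched to %b.

```python
from datetime import datetime


def parse_month_day_year(text, formats=("%b %d, %Y", "%B %d, %Y")):
    """Try each format in order and return the first successful parse."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; try the next one
    raise ValueError("unrecognized date: %r" % (text,))


print(parse_month_day_year("Sep 5, 2010"))
print(parse_month_day_year("September 5, 2010"))
```

Raising a ValueError when no format matches keeps the failure explicit for the caller.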
