
【Tutorial】Using Python's codecs to handle strings and files in various character encodings

Python · crifan · 10760 views

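【Background】

I have run into this many times: reading strings that are not plain ASCII from a file, or writing them to one.

I already knew that codecs is the tool for this.

Later I saw that more than one person had asked about it: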

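【Q&A】How do I solve the encoding problem when a Python crawler saves its results to a txt file?

How does Python read a file whose name contains special characters, for example xiân.txt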

They ran into similar problems but did not know how to handle them, so I am writing this tutorial to briefly explain how to use codecs.

【Using codecs in Python to handle files in various character encodings】

The complete example code is as follows:

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:
[Tutorial] Using Python's codecs to handle strings and files in various character encodings
https://www.crifan.com/tutorial_python_codecs_process_file_char_encoding/

Author:     Crifan Li
Version:    2013-10-20
"""

# print_function lets the same script run under both Python 2 and Python 3
from __future__ import print_function

import codecs


def python_codecs_demo():
    """Demo: use codecs to write and read a file in a specific encoding."""
    testStrUnicode = u"中文测试Unicode字符串"
    print("testStrUnicode=", testStrUnicode)
    # encode the unicode string into raw bytes, once per target encoding
    testStrUtf8 = testStrUnicode.encode("UTF-8")
    testStrGbk = testStrUnicode.encode("GBK")

    outputFilename = "outputFile.txt"

    print("------------ 1.UTF-8 write and read ------------")
    print("--- (1) write UTF-8 string into file ---")
    # 'a+': read, write, append
    # 'w' : truncate first, then write
    # 'b' : binary mode, because the string is already encoded into bytes
    outputFp = codecs.open(outputFilename, 'wb')
    outputFp.write(testStrUtf8)
    outputFp.flush()
    outputFp.close()
    print("--- (2) read out previously written UTF-8 content ---")
    readoutFp = codecs.open(outputFilename, 'r', 'UTF-8')
    # the result is already unicode, because we passed "UTF-8" to codecs.open
    readOutStrUnicodeFromUtf8 = readoutFp.read()
    readoutFp.close()
    print("readOutStrUnicodeFromUtf8=", readOutStrUnicodeFromUtf8)

    print("------------ 2.GBK write and read ------------")
    print("--- (1) write GBK string into file ---")
    # same modes as above: truncate, then write the already-encoded bytes
    outputFp = codecs.open(outputFilename, 'wb')
    outputFp.write(testStrGbk)
    outputFp.flush()
    outputFp.close()
    print("--- (2) read out previously written GBK content ---")
    readoutFp = codecs.open(outputFilename, 'r', 'GBK')
    # the result is already unicode, because we passed "GBK" to codecs.open
    readOutStrUnicodeFromGbk = readoutFp.read()
    readoutFp.close()
    print("readOutStrUnicodeFromGbk=", readOutStrUnicodeFromGbk)

    print("Note: ")
    print("1. more about encoding, please refer:")
    print(u"【详解】python中的文件操作模式")
    # this line appears in the recorded output, so it is printed here as well
    print("https://www.crifan.com/summary_python_file_operation_mode/")


if __name__ == "__main__":
    python_codecs_demo()

Output:

E:\dev_root\python\tutorial_summary\python_codecs_demo>python_codecs_demo.py
testStrUnicode= 中文测试Unicode字符串
------------ 1.UTF-8 write and read ------------
--- (1) write UTF-8 string into file ---
--- (2) read out previously written UTF-8 content ---
readOutStrUnicodeFromUtf8= 中文测试Unicode字符串
------------ 2.GBK write and read ------------
--- (1) write GBK string into file ---
--- (2) read out previously written GBK content ---
readOutStrUnicodeFromGbk= 中文测试Unicode字符串
Note:
1. more about encoding, please refer:
【详解】python中的文件操作模式
https://www.crifan.com/summary_python_file_operation_mode/

As shown in the screenshot:

[Image: summary_python_file_operation_mode output]
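
The demo encodes the string by hand and writes the resulting bytes to the file. As an alternative, codecs.open can be given the encoding on the write side too, so that it encodes on write and decodes on read. The following is only a minimal sketch of that idea, written in Python 3 syntax (the original script is in Python 2 style). The helper name round_trip and the temporary directory are assumptions made for illustration and are not part of the original demo.

```python
import codecs
import os
import tempfile


def round_trip(text, encoding):
    """Write text with codecs.open in the given encoding, then read it back.

    Returns the decoded text and the raw bytes that ended up on disk.
    """
    # hypothetical location: a fresh temporary directory, used only for this sketch
    path = os.path.join(tempfile.mkdtemp(), "outputFile.txt")

    # with an encoding passed in, the file object encodes the text on write ...
    with codecs.open(path, "w", encoding) as fp:
        fp.write(text)

    # ... and decodes the bytes back into text on read
    with codecs.open(path, "r", encoding) as fp:
        decoded = fp.read()

    # read the same file in binary mode to see the bytes actually stored
    with open(path, "rb") as fp:
        raw = fp.read()

    return decoded, raw


if __name__ == "__main__":
    text = "中文测试Unicode字符串"
    for enc in ("UTF-8", "GBK"):
        decoded, raw = round_trip(text, enc)
        # same text comes back, but the size on disk depends on the encoding
        print(enc, decoded == text, len(raw))
```

The text read back is identical in both cases, while the bytes on disk differ between UTF-8 and GBK. In Python 3 the built-in open() accepts an encoding argument as well, so open(path, "w", encoding="GBK") would be an equivalent choice.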

 

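Notes:

1. If you are not familiar with character encodings, see:

字符编码详解 (Character Encoding Explained)

2. If you are not familiar with file operation modes, see:

【详解】python中的文件操作模式 (Detailed explanation: file operation modes in Python)

3. If you are not familiar with string encodings in Python, see:

Python专题教程:字符串和字符编码 (Python topic tutorial: strings and character encodings)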
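
To make the string side of these notes concrete, here is a small hedged sketch of encoding and decoding in memory, without any file involved. It assumes Python 3, where str is the Unicode type and bytes holds encoded data (in the Python 2 style used by the demo script, the corresponding types are unicode and str). The variable names are chosen for illustration only.

```python
import codecs

text = "中文测试Unicode字符串"

# encode: text -> bytes; the resulting byte sequence depends on the encoding
utf8_bytes = text.encode("UTF-8")
gbk_bytes = text.encode("GBK")

# codecs.encode is the function form of the same operation
assert codecs.encode(text, "GBK") == gbk_bytes

# 14 characters, but a different number of bytes per encoding
print(len(text), len(utf8_bytes), len(gbk_bytes))  # 14 28 21

# decode: bytes -> text; the codec has to match the one used to encode
assert utf8_bytes.decode("UTF-8") == text
assert gbk_bytes.decode("GBK") == text

# decoding these GBK bytes as UTF-8 is rejected with UnicodeDecodeError
try:
    gbk_bytes.decode("UTF-8")
    mismatch_detected = False
except UnicodeDecodeError:
    mismatch_detected = True
print("mismatch detected:", mismatch_detected)
```

This is the same reason the demo passes "UTF-8" or "GBK" to codecs.open when reading: the bytes in the file only turn back into the original text when they are decoded with the encoding that produced them.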

 

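【Summary】

It still takes reading the official API documentation and plenty of practice to gradually and truly understand how encodings work.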

When reposting, please credit: 在路上 » 【Tutorial】Using Python's codecs to handle strings and files in various character encodings
