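【Background】
I have run into this many times: reading or writing files whose strings are not just plain ASCII.
I already knew that codecs is the tool for this.
Later I saw more than one person ask:
"How does Python read a file whose name contains special characters, e.g. xiân.txt?"
They hit a similar problem but did not know how to handle it, so I wrote this tutorial to briefly explain how to use codecs.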
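As a rough illustration of the question quoted above, here is a minimal Python 3 sketch that writes and reads a file named xiân.txt through codecs.open. The helper name roundtrip_special_filename, the temporary directory, and the sample text are assumptions made for this example; it also assumes the file system accepts non-ASCII file names.

```python
import codecs
import os
import tempfile


def roundtrip_special_filename(directory, filename="xiân.txt", text="中文测试"):
    """Write text to a file with a non-ASCII name, then read it back.

    Hypothetical helper for illustration only.
    """
    path = os.path.join(directory, filename)
    # The encoding argument controls the file *content*; the file name is
    # passed as an ordinary str, which is already Unicode in Python 3.
    with codecs.open(path, "w", encoding="utf-8") as fp:
        fp.write(text)
    with codecs.open(path, "r", encoding="utf-8") as fp:
        return fp.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp_dir:
        print(roundtrip_special_filename(tmp_dir))
```

In Python 2, the usual approach is to pass the file name as a unicode object (for example u"xiân.txt") so that it is not treated as a raw byte string.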
【Using codecs in Python to handle files in various character encodings】
The complete example code (Python 2) is as follows:
#!/usr/bin/python
# -*- coding: utf-8 -*-
"""
Function:
【Tutorial】Using Python's codecs to handle strings and files in various character encodings

Author:     Crifan Li
Version:    2013-10-20
Contact:    https://www.crifan.com/about/me
"""

import codecs

def python_codecs_demo():
    """demo how to use codecs to handle file with specific encoding"""
    testStrUnicode = u"中文测试Unicode字符串"
    print "testStrUnicode=", testStrUnicode
    # encode the unicode string into byte strings in two encodings
    testStrUtf8 = testStrUnicode.encode("UTF-8")
    testStrGbk = testStrUnicode.encode("GBK")

    outputFilename = "outputFile.txt"

    print "------------ 1.UTF-8 write and read ------------"
    print "--- (1) write UTF-8 string into file ---"
    # 'a+': read,write,append
    # 'w' : clear before, then write
    # no encoding is passed here, so the already-encoded bytes are written as-is
    outputFp = codecs.open(outputFilename, 'w')
    outputFp.write(testStrUtf8)
    outputFp.flush()
    outputFp.close()

    print "--- (2) read out previously written UTF-8 content ---"
    readoutFp = codecs.open(outputFilename, 'r', 'UTF-8')
    # here already is unicode, for we have passed "UTF-8" to codecs.open
    readOutStrUnicodeFromUtf8 = readoutFp.read()
    readoutFp.close()
    print "readOutStrUnicodeFromUtf8=", readOutStrUnicodeFromUtf8

    print "------------ 2.GBK write and read ------------"
    print "--- (1) write GBK string into file ---"
    # 'a+': read,write,append
    # 'w' : clear before, then write
    outputFp = codecs.open(outputFilename, 'w')
    outputFp.write(testStrGbk)
    outputFp.flush()
    outputFp.close()

    print "--- (2) read out previously written GBK content ---"
    readoutFp = codecs.open(outputFilename, 'r', 'GBK')
    # here already is unicode, for we have passed "GBK" to codecs.open
    readOutStrUnicodeFromGbk = readoutFp.read()
    readoutFp.close()
    print "readOutStrUnicodeFromGbk=", readOutStrUnicodeFromGbk

    print "Note: "
    print "1. more about encoding, please refer:"
    print u"【详解】python中的文件操作模式"

if __name__ == "__main__":
    python_codecs_demo()
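The listing above targets Python 2 (print statements, u"" literals, byte strings written through a file opened without an encoding). For readers on Python 3, the same write-then-read round trip might look like the sketch below. It is not the author's code: the helper names write_then_read and demo and the use of a temporary directory are assumptions for illustration, and the encoding is passed to codecs.open on write as well so that a str can be written directly.

```python
import codecs
import os
import tempfile


def write_then_read(path, text, encoding):
    """Write `text` in the given encoding, then read it back as str.

    Hypothetical helper for illustration only.
    """
    # With an encoding argument, codecs.open encodes str on write ...
    with codecs.open(path, "w", encoding=encoding) as fp:
        fp.write(text)
    # ... and decodes the file's bytes back into str on read.
    with codecs.open(path, "r", encoding=encoding) as fp:
        return fp.read()


def demo():
    """Round-trip the same text through UTF-8 and GBK files."""
    text = "中文测试Unicode字符串"
    results = {}
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "outputFile.txt")
        for encoding in ("UTF-8", "GBK"):
            results[encoding] = write_then_read(path, text, encoding)
    return results


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, "->", value)
```

In Python 3 the built-in open(path, mode, encoding=...) covers the same use case and is the more common choice; codecs.open is used here only to stay close to the tutorial.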
The output is:
E:\dev_root\python\tutorial_summary\python_codecs_demo>python_codecs_demo.py
------------ 1.UTF-8 write and read ------------
--- (1) write UTF-8 string into file ---
--- (2) read out previously written UTF-8 content ---
readOutStrUnicodeFromUtf8= 中文测试Unicode字符串
------------ 2.GBK write and read ------------
--- (1) write GBK string into file ---
--- (2) read out previously written GBK content ---
readOutStrUnicodeFromGbk= 中文测试Unicode字符串
Note:
1. more about encoding, please refer:
【详解】python中的文件操作模式
https://www.crifan.com/summary_python_file_operation_mode/
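Both reads succeed because the encoding passed to codecs.open matches the encoding of the bytes in the file. The Python 3 sketch below shows what can happen when they do not match: GBK bytes read back as UTF-8. The names read_with and mismatch_demo and the file name gbk_sample.txt are assumptions for this example.

```python
import codecs
import os
import tempfile


def read_with(path, encoding, errors="strict"):
    """Read a whole file, decoding it with the given encoding."""
    with codecs.open(path, "r", encoding=encoding, errors=errors) as fp:
        return fp.read()


def mismatch_demo():
    """Write GBK bytes, then read them back with matching and wrong encodings."""
    text = "中文测试Unicode字符串"
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "gbk_sample.txt")
        # Write the raw GBK bytes, as the tutorial does with testStrGbk.
        with open(path, "wb") as fp:
            fp.write(text.encode("GBK"))

        correct = read_with(path, "GBK")  # matching encoding
        try:
            read_with(path, "UTF-8")      # wrong encoding, strict mode
            strict_failed = False
        except UnicodeDecodeError:
            strict_failed = True
        # errors="replace" substitutes U+FFFD for undecodable bytes.
        lenient = read_with(path, "UTF-8", errors="replace")
    return correct, strict_failed, lenient


if __name__ == "__main__":
    correct, strict_failed, lenient = mismatch_demo()
    print("GBK read:", correct)
    print("UTF-8 strict read failed:", strict_failed)
    print("UTF-8 read with replacement:", lenient)
```

The errors="replace" mode keeps the program running but loses the original characters, so it is better suited to diagnostics than to real data handling.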
[Figure: screenshot of the console output]
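Notes:
1. If you are unfamiliar with character encodings, see:
2. If you are unfamiliar with file operation modes, see:
3. If you are unfamiliar with string encodings in Python, see: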
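To make the notes on character and string encodings a little more concrete, the following Python 3 sketch encodes the same text in UTF-8 and GBK with codecs.encode and decodes it again with codecs.decode. The helper name encode_both and the sample text are assumptions for illustration.

```python
import codecs


def encode_both(text):
    """Return the UTF-8 and GBK byte strings for the same text."""
    return codecs.encode(text, "utf-8"), codecs.encode(text, "gbk")


if __name__ == "__main__":
    utf8_bytes, gbk_bytes = encode_both("中文测试")
    # Same characters, different bytes: 3 bytes per character in UTF-8,
    # 2 bytes per character in GBK for these four Chinese characters.
    print(len(utf8_bytes), len(gbk_bytes))
    # Decoding each byte string with its own encoding yields the same text.
    print(codecs.decode(utf8_bytes, "utf-8") == codecs.decode(gbk_bytes, "gbk"))
```

The point is that text and bytes are different things: the encoding is only the rule for converting between them, which is why the reader of a file has to know which encoding the writer used.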
【Summary】
It still takes reading the official API documentation and plenty of practice to gradually and truly understand encodings.
When reposting, please credit: 在路上 » 【Tutorial】Using Python's codecs to handle strings and files in various character encodings