Setting encoding formats in Python: how to change a file's encoding

Latest recommended article published on 2024-05-02 10:21:54

weixin_39713538


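The script below walks the TRMD folder in the current directory recursively and re-encodes every file that chardet detects as UTF-8 into GBK, in place. It needs the third-party chardet package (pip install chardet).

    import os

    import chardet


    def strJudgeCode(data):
        # chardet.detect() takes bytes and returns a dict in which 'encoding' is the
        # guessed encoding and 'confidence' is how reliable the guess is, e.g.
        # {'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
        return chardet.detect(data)

    # (1) Detecting the encoding of a web page:
    #
    #     >>> import urllib.request
    #     >>> rawdata = urllib.request.urlopen('http://www.google.cn/').read()
    #     >>> import chardet
    #     >>> chardet.detect(rawdata)
    #     {'confidence': 0.98999999999999999, 'encoding': 'GB2312'}
    #
    # (2) Detecting the encoding of a file:
    #
    #     import chardet
    #     with open('c:\\111.txt', 'rb') as tt:
    #         ff = tt.readline()
    #     # read(5) also works here, but readlines() fails: it returns a list of
    #     # lines, while chardet.detect() needs bytes. A larger sample such as
    #     # tt.read() gives chardet more data and a more reliable guess.
    #     enc = chardet.detect(ff)
    #     print(enc['encoding'])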
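chardet returns a statistical guess. When the only question is the one converCode asks, namely whether a file is UTF-8, the standard library can answer it without chardet by attempting a strict decode. This is a swapped-in technique, not what the original script does, and is_valid_utf8 / is_valid_utf8_file are hypothetical names. Keep in mind that pure ASCII is also valid UTF-8, and that a short byte string in another encoding can occasionally decode as UTF-8 by coincidence.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (standard library only)."""
    try:
        data.decode('utf-8')  # strict mode: any malformed byte sequence raises
    except UnicodeDecodeError:
        return False
    return True


def is_valid_utf8_file(path: str) -> bool:
    """Hypothetical helper: read a file as raw bytes and test it with is_valid_utf8()."""
    with open(path, 'rb') as f:
        return is_valid_utf8(f.read())


if __name__ == '__main__':
    print(is_valid_utf8('中文'.encode('utf-8')))   # True
    print(is_valid_utf8('中文'.encode('gb2312')))  # False
```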

    def readFile(path):
        # Read raw bytes: chardet.detect() and bytes.decode() both need bytes, not str.
        # The with statement closes the file even if reading fails.
        with open(path, 'rb') as f:
            return f.read()

    def WriteFile(data, path):
        # Write bytes in binary mode so Python applies no further encoding
        with open(path, 'wb') as f:
            f.write(data)

    def converCode(path):
        file_con = readFile(path)
        result = strJudgeCode(file_con)
        # Only files detected as UTF-8 are converted; all others are left untouched
        if result['encoding'] == 'utf-8':
            # decode() turns bytes into str, encode() turns str into bytes:
            #   u = '中文'                # a str object
            #   b = u.encode('gb2312')   # encode u with GB2312, giving a bytes object
            #   u1 = b.decode('gb2312')  # decode with the same codec: '中文' again
            #   u2 = b.decode('utf-8')   # decode with a different codec: the original
            #                            # text cannot be recovered (here it raises
            #                            # UnicodeDecodeError)
            a_unicode = file_con.decode('utf-8')
            # Raises UnicodeEncodeError for characters GBK cannot represent (e.g. emoji)
            gbk_bytes = a_unicode.encode('gbk')
            WriteFile(gbk_bytes, path)
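The decode()/encode() notes in converCode can be checked with the standard library alone. A minimal sketch, where roundtrip_demo is a hypothetical name: it encodes '中文' as GB2312, decodes the bytes with the matching codec, and then with UTF-8, both strictly and with errors='replace'.

```python
def roundtrip_demo(text: str = '中文') -> dict:
    """Encode text as GB2312, then decode it with the right and the wrong codec."""
    raw = text.encode('gb2312')      # str -> bytes
    right = raw.decode('gb2312')     # same codec: the original text comes back
    try:
        wrong = raw.decode('utf-8')  # different codec: these bytes are not valid UTF-8
    except UnicodeDecodeError:
        wrong = None
    # errors='replace' never raises; undecodable bytes become U+FFFD
    lossy = raw.decode('utf-8', errors='replace')
    return {'raw': raw, 'right': right, 'wrong': wrong, 'lossy': lossy}


if __name__ == '__main__':
    result = roundtrip_demo()
    print(result['raw'])    # b'\xd6\xd0\xce\xc4'
    print(result['right'])  # 中文
    print(result['wrong'])  # None
```

Strict decoding is what makes the wrong codec visible; with errors='replace' the mistake silently turns into replacement characters instead.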

    def listDirFile(directory):
        # os.listdir() returns the names of the files and folders in the given path
        for name in os.listdir(directory):
            # os.path.join() builds a path with the current OS's separator, e.g.
            # os.path.join("home", "me", "mywork") returns "home/me/mywork" on Linux
            # and "home\me\mywork" on Windows
            filepath = os.path.join(directory, name)
            if os.path.isdir(filepath):  # os.path.isdir() checks whether the path is a directory
                listDirFile(filepath)    # recurse into subdirectories
            else:
                print(name)
                converCode(filepath)
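The separator behaviour described for os.path.join() can be checked on any machine: posixpath and ntpath are the standard-library modules implementing the POSIX and Windows path rules, and os.path is simply whichever of the two matches the running OS.

```python
import ntpath     # Windows path rules, importable on any OS
import os
import posixpath  # POSIX (Linux/macOS) path rules, importable on any OS

parts = ('home', 'me', 'mywork')

print(posixpath.join(*parts))  # home/me/mywork
print(ntpath.join(*parts))     # home\me\mywork
# os.path is posixpath on Linux/macOS and ntpath on Windows
print(os.path.join(*parts))
```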

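    if __name__ == '__main__':
        # The TRMD folder in the current directory. os.path.join() also avoids writing
        # '.\TRMD', where '\T' is an invalid escape sequence that recent Python versions warn about.
        listDirFile(os.path.join('.', 'TRMD'))
        # Notes:
        # u'string' denotes a Unicode string; the u prefix is required for Unicode literals
        # in Python 2 and optional in Python 3, where every str is Unicode.
        # A "# -*- coding: UTF-8 -*-" line at the top of a file tells Python that the source
        # text is UTF-8, so it reads the program as UTF-8 (the default in Python 3).
        # Putting u before a Chinese literal tells Python 2 that it is a Unicode string,
        # stored in Unicode form.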
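Since every Python 3 str is already Unicode, the decoding and encoding can also be delegated to open() through its encoding parameter, which is the Python 3 way of setting a file's encoding. A minimal sketch, assuming the input file is UTF-8 text; convert_utf8_to_gbk is a hypothetical name. Unlike converCode it does not consult chardet: a file that is not UTF-8 raises UnicodeDecodeError and is left unchanged, so a caller could catch that and skip the file.

```python
import os
import tempfile


def convert_utf8_to_gbk(path: str) -> None:
    """Hypothetical helper: rewrite a UTF-8 text file as GBK, in place."""
    # newline='' keeps the file's original line endings on both read and write
    with open(path, 'r', encoding='utf-8', newline='') as f:
        text = f.read()  # raises UnicodeDecodeError if the file is not valid UTF-8
    # Fail early on characters GBK cannot represent, before the file is truncated
    text.encode('gbk')
    with open(path, 'w', encoding='gbk', newline='') as f:
        f.write(text)


if __name__ == '__main__':
    # In Python 3 the u prefix changes nothing: both literals are the same str
    print(u'中文' == '中文', type(u'中文') is str)  # True True
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, 'sample.txt')
        with open(sample, 'w', encoding='utf-8', newline='') as f:
            f.write('中文\n')
        convert_utf8_to_gbk(sample)
        with open(sample, 'rb') as f:
            print(f.read())  # b'\xd6\xd0\xce\xc4\n'
```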
