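Encoding problems caused by multilingual source files
Many source files fail to compile after being moved to a different development environment (IDE) because of encoding problems.
Typical examples: .java files created in Eclipse that are not saved as UTF-8 show garbled Chinese text in Android Studio (AS). Likewise, when .c, .cpp, .h, or .hpp files contain text in languages other than Chinese or English and are not UTF-8, Visual Studio displays that text as garbage, which breaks the code layout and makes compilation fail.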
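To make the failure concrete, here is a minimal sketch of what happens to one Russian comment. The sample string and the wrong guess of latin-1 are assumptions chosen for illustration; they are not taken from any real project file.

```python
# A Russian comment as it might appear in a legacy source file (illustrative text).
original = "// Привет, мир"

# The bytes an editor configured for windows-1251 would write to disk.
legacy_bytes = original.encode("windows-1251")

# A tool that expects UTF-8 cannot decode these bytes strictly.
try:
    legacy_bytes.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Guessing the wrong single-byte code page "works" but yields garbage text.
garbled = legacy_bytes.decode("latin-1")

# Decoding with the correct source encoding, then encoding as UTF-8, is lossless.
utf8_bytes = legacy_bytes.decode("windows-1251").encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print("strict UTF-8 decode succeeded:", strict_ok)
print("wrong code page gives the original text:", garbled == original)
print("round trip via windows-1251 gives the original text:", restored == original)
```

The point of the sketch is that the bytes themselves are fine; only the encoding used to interpret them is wrong, so a one-time re-encode to UTF-8 fixes the file for every tool.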
Solution
Use Python to convert all files to UTF-8
Runtime environment: Python 3.7.4, with the third-party chardet package installed
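Without further ado, here is the full code. First, a note on usage:
dump_file_encode(source_dir) only reports the encoding of every file under source_dir and modifies nothing. It helps you work out which language's encoding the source files use.
convert(path)
converts all .c, .cpp, .h, and .hpp files under path to UTF-8; see the extension variable in the code for details. When a file's encoding cannot be detected, that is, in the branch
    elif src_file_encode is None:
        src_file_encode = 'windows-1251'
I force the encoding to windows-1251 (some of the source files I compile contain Russian). Change it as needed. The complete code follows: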
#!/usr/bin/env python
# -*- coding:utf-8 -*-
# author:Staney.Chan [staney_chan@126.com]
# datetime:2021/10/22 10:59
# description: batch-convert file encodings, e.g. from ANSI to UTF-8
import os
import sys
import codecs
import chardet

def get_file_extension(file):
    """Return the extension of file, including the leading dot."""
    (filepath, filename) = os.path.split(file)
    (shortname, extension) = os.path.splitext(filename)
    return extension

def get_file_encode(filename):
    """Detect the encoding of filename with chardet and return its result dict."""
    with open(filename, 'rb') as f:
        data = f.read()
    encoding_type = chardet.detect(data)
    # print(encoding_type)
    return encoding_type

def process_dir(root_path):
    for path, dirs, files in os.walk(root_path):
        for file in files:
            file_path = os.path.join(path, file)
            process_file(file_path, file_path)

def process_file(filename_in, filename_out):
    """
    filename_in : input file (full path + file name)
    filename_out : output file (full path + file name)
    Example encoding names: 'windows-1251', 'UTF-8-SIG'
    """
    extension = get_file_extension(filename_in).lower()
    if extension not in ('.c', '.h', '.cpp', '.hpp'):
        return
    # Encoding of the output file
    dest_file_encode = 'utf-8'
    encoding_type = get_file_encode(filename_in)
    src_file_encode = encoding_type['encoding']
    if src_file_encode is not None and src_file_encode.lower() == 'utf-8':
        # Already UTF-8, nothing to do
        return
    elif src_file_encode is None:
        # Detection failed; fall back to windows-1251 (change as needed)
        src_file_encode = 'windows-1251'
    # Use src_file_encode here: encoding_type['encoding'] may be None,
    # and concatenating None to a str raises TypeError
    print("[Convert]File:" + filename_in + " from:" + src_file_encode + " to:UTF-8")
    try:
        with codecs.open(filename=filename_in, mode='r', encoding=src_file_encode) as fi:
            data = fi.read()
        # newline='' keeps the original line endings; codecs.open does not
        # translate them, so text mode would write '\r\r\n' on Windows
        with open(filename_out, mode='w', encoding=dest_file_encode, newline='') as fo:
            fo.write(data)
        # Re-detect the written file to confirm the result
        with open(filename_out, 'rb') as f:
            data = f.read()
        print(chardet.detect(data))
    except Exception as e:
        print(e)

def dump_file_encode(root_path):
    for path, dirs, files in os.walk(root_path):
        for file in files:
            filename = os.path.join(path, file)
            with open(filename, 'rb') as f:
                data = f.read()
            encoding_type = chardet.detect(data)
            print("FILE:" + file + " ENCODE:" + str(encoding_type))

def convert(path):
    """
    Batch-convert file encodings
    path : input file or directory
    """
    # sys.argv[1], sys.argv[2]
    if os.path.isfile(path):
        process_file(path, path)
    elif os.path.isdir(path):
        process_dir(path)

if __name__ == '__main__':
    # convert(r'F:\OpenPapyrus-11.1.12\Src')
    dump_file_encode(r'C:\Users\Administrator\Desktop\cc')
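
Because the script overwrites files in place, it can be worth rehearsing the conversion on a throwaway file first. The sketch below does that using only the standard library. It swaps chardet for a simpler assumption: bytes that decode strictly as UTF-8 are left alone, and anything else is treated as windows-1251. The helper names convert_bytes_to_utf8 and demo, the sample text, and the file name sample.c are hypothetical and not part of the script above.

```python
import os
import tempfile


def convert_bytes_to_utf8(raw, fallback="windows-1251"):
    """Return UTF-8 bytes for raw (hypothetical helper, not from the script above)."""
    try:
        raw.decode("utf-8")  # valid UTF-8 (plain ASCII included): keep unchanged
        return raw
    except UnicodeDecodeError:
        # Assumption: every non-UTF-8 file uses the fallback encoding
        return raw.decode(fallback).encode("utf-8")


def demo():
    """Write a windows-1251 file into a temp dir, convert it in place, read it back."""
    text = "int x = 0; // счётчик\n"
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.c")
        # Binary mode throughout, so line endings are never translated
        with open(path, "wb") as f:
            f.write(text.encode("windows-1251"))
        with open(path, "rb") as f:
            raw = f.read()
        with open(path, "wb") as f:
            f.write(convert_bytes_to_utf8(raw))
        with open(path, "rb") as f:
            return f.read().decode("utf-8")


if __name__ == "__main__":
    print(demo() == "int x = 0; // счётчик\n")
```

Unlike chardet, the strict-decode check never guesses, so it cannot tell windows-1251 apart from other single-byte code pages. It is only suitable when the source encoding is known in advance, as with the Russian files mentioned earlier.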