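Many vendor-supplied programs ship with Chinese comments, which show up as garbled text in some environments. I spent half a day writing a script that batch-translates the comments in C source files into English, which both unifies the comment style and avoids the encoding problem.
The script runs on Python 3.x and needs only the following package: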
pip install translators
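Supported comment types
Reminder: back up your files before translating! If the network or anything else fails during translation, a file may be lost, so always make a backup first.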
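One way to make that backup is sketched below. It is only an illustration: the function name backup_sources and the directory arguments are hypothetical and not part of the original script. It copies every .c, .h and .cpp file into a separate directory while keeping the folder layout.

```python
import os
import shutil

SOURCE_EXTENSIONS = ('.c', '.h', '.cpp')  # the same extensions the script processes


def backup_sources(src_root, backup_root):
    """Copy every C/C++ source file under src_root into backup_root, keeping the tree layout."""
    copied = []
    backup_abs = os.path.abspath(backup_root)
    for root, dirs, files in os.walk(src_root):
        # Do not descend into the backup directory if it happens to live inside src_root
        dirs[:] = [d for d in dirs
                   if os.path.abspath(os.path.join(root, d)) != backup_abs]
        for name in files:
            if name.endswith(SOURCE_EXTENSIONS):
                src = os.path.join(root, name)
                dst = os.path.join(backup_root, os.path.relpath(src, src_root))
                os.makedirs(os.path.dirname(dst), exist_ok=True)
                shutil.copy2(src, dst)  # byte-for-byte copy, so the original encoding is kept
                copied.append(dst)
    return copied
```

Because the translation script walks the current directory recursively, it is safer to keep the backup directory outside the directory the script runs in; otherwise the backup copies would be translated as well.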
The following comment formats are supported (examples of well-formed input):
/*********** 第一个 ****************
* 注释1
* 注释2
**/
int a = 1; // 可转换为英文
int b = 2; /* 正确转换 */
int c = 3; /* 注释1
* 这样也支持
*/
// 转换部分
/* 均可以转换 */
int func(){
}
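To see how these forms are picked up, the sketch below reuses the two regular expressions from the script: one that finds comment fragments and one that checks for Chinese characters. The helper name chinese_comments and the sample lines are illustrative only.

```python
import re

COMMENT_RE = re.compile(r'//.*|/\* .*|\* .*')  # comment pattern copied from the script
CHINESE_RE = re.compile(r'[\u4e00-\u9fff]+')   # CJK ideograph range used by the script


def chinese_comments(line):
    """Return the comment fragments of one source line that contain Chinese characters."""
    return [c for c in COMMENT_RE.findall(line) if CHINESE_RE.search(c)]


samples = [
    "int a = 1; // 可转换为英文",
    "int b = 2; /* 正确转换 */",
    " * 这样也支持",
    "int func(){",
]
for s in samples:
    print(repr(s), '->', chinese_comments(s))
```

Note that the pattern expects a space after /* and after a leading *, so a comment written as /*注释*/ with no space is not matched by it.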
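Incorrect usage and cases that cannot be converted
The comments to be converted must follow a few formatting rules:
- For single-line comments, if a line starts directly with // or /*, the whole line is treated as a comment. The following form is therefore not allowed; do not put a statement after a comment on the same line: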
/* 中文注释 */ int a = 1 ;
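- For multi-line comments, every line must start with *; otherwise it is not translated. In the example below, only the line that starts with * is translated: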
/*
* 正确的语句: 此句进行翻译
错误: 此句话不进行翻译
******************/
- Avoid ";" in comments wherever possible; a comment that contains one is treated as a program statement and is not translated:
/*
* 这是多行注释; // 注意: 此处由于加上分号, 则为了避免有
*/
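- Do not move the semicolon to the next line, as in the example below; the correct form is int a = 1; followed by the comment: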
int a = 1 // comment : 正确写法是 int a = 1; + 注释
;
Otherwise the code before the comment may be translated along with it.
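The rules above can be checked before running the translation. The following is a rough heuristic sketch, not part of the original script; the name find_risky_lines and the wording of the messages are made up for illustration, and it does not handle comment markers inside string literals.

```python
import re

CHINESE_RE = re.compile(r'[\u4e00-\u9fff]')


def find_risky_lines(lines):
    """Return (line_number, reason) pairs for lines that break the rules listed above."""
    problems = []
    in_block = False  # True while inside a /* ... */ comment that spans several lines
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if in_block:
            has_chinese = bool(CHINESE_RE.search(text))
            if has_chinese and not text.startswith('*'):
                problems.append((number, "block-comment line does not start with '*'"))
            if has_chinese and ';' in text:
                problems.append((number, "';' inside a comment line"))
            if '*/' in text:
                in_block = False
            continue
        if text.startswith('/*') and re.search(r'\*/\s*\S', text):
            problems.append((number, "code follows a comment on the same line"))
        if text == ';':
            problems.append((number, "semicolon on its own line"))
        # A '/*' with no closing '*/' on the same line opens a multi-line comment
        if '/*' in text and '*/' not in text.split('/*', 1)[1]:
            in_block = True
    return problems
```

Calling find_risky_lines(open(path, encoding='gbk').read().splitlines()) on a file would list the lines worth fixing by hand before translating, assuming the file is GBK-encoded as in the script's default setting.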
Code
The conversion code is quite simple. I use bing as the translator API; for sentences where bing's output still contains some Chinese, the script switches to the google engine.
import os
import re
import time
import warnings
import concurrent.futures
import translators as ts
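open_encoding = 'gbk'               # encoding of the original source files
dst_encoding = 'utf-8'              # encoding used when writing the translated files
max_retry_time = 3                  # attempts per sentence before giving up
translator = 'bing'                 # primary translation engine
use_google_for_better_trans = True  # retranslate with google if Chinese remains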


def translate_comments(file_path):
    """Translate the Chinese comments of one file in place; return how many were translated."""
    cnt = 0
    with open(file_path, 'r', encoding=open_encoding) as f:
        lines = f.readlines()
    # NOTE: the file is truncated here, before translation starts, so back it up first
    with open(file_path, 'w', encoding=dst_encoding) as f:
        for line in lines:
            # Find comment fragments in the line ('* ...' continuation lines are captured too)
            comments = re.findall(r'//.*|/\* .*|\* .*', line)
            # Keep only the fragments that contain Chinese characters
            comments = [c for c in comments if re.search(r'[\u4e00-\u9fff]+', c)]
            for i in range(len(comments)):
                comment = comments[i]
                if re.match(r"^\s*//", comment):
                    # '//' comment: everything after the marker is comment text
                    comment = re.sub(r"^\s*//", '', comment)
                elif re.match(r"^\s*/\*", comment):
                    # '/* ... */' comment: strip the opener and, if present, the closer
                    comment = re.sub(r"^\s*/\*", '', comment)
                    comment = re.sub(r"\*/\s*$", '', comment)
                elif re.search(r"\*.*;", comment):
                    # '*' followed by a program statement: only its trailing comments are translated
                    sub_comments = re.findall(r"//.*|/\* .*", comment)
                    comment = ''  # drop this fragment and queue its sub-comments instead
                    for c in sub_comments:
                        c = c[2:]  # strip the leading '//' or '/*'
                        c = re.sub(r"\*/\s*$", '', c)
                        comments.append(c)
                elif re.match(r"^\s*\*", comment):
                    # '* text' continuation line, optionally ending in '*/'
                    comment = re.sub(r"^\s*\*", '', comment)  # fixed: the pattern was r"^s?\*"
                    comment = re.sub(r"\*/\s*$", '', comment)
                else:
                    warnings.warn("can't process comment: %s" % comment)
                comments[i] = comment
            for comment in filter(None, comments):
                cnt += 1
                translated_comment = translate_sentence(comment)
                line = line.replace(comment, translated_comment)
            f.write(line)
    return cnt


def translate_sentence(comment: str):
    """Translate one comment to English, retrying on errors and falling back to google."""
    engine = translator
    retry = 0
    while retry < max_retry_time:
        try:
            translated_comment = ts.translate_text(comment, translator=translator,
                                                   from_language='zh', to_language='en')
            break
        except Exception as e:
            warnings.warn(f"Error: {e}. Retrying after 15 seconds.")
            time.sleep(15)
            retry += 1
    if retry == max_retry_time:
        raise Exception("Exceeded retry limit of %d" % max_retry_time)
    # If Chinese characters remain in the result, translate again with google
    if use_google_for_better_trans and re.search(r'[\u4e00-\u9fff]+', translated_comment):
        retry = 0  # fixed: the fallback previously reused whatever retries bing had left
        while retry < max_retry_time:
            try:
                translated_comment = ts.translate_text(comment, translator='google',
                                                       from_language='auto', to_language='en')
                engine = 'google'
                break
            except Exception as e:
                warnings.warn(f"Error: {e}. Retrying after 25 seconds.")
                time.sleep(25)
                retry += 1
    # Log the result; comment this line out if the output is not needed
    print("engine ", engine, ":", comment, " ----> ", translated_comment)
    return translated_comment


def main():
    time1 = time.time()
    cnt_tot = 0
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        # Walk through the working directory
        for root, dirs, files in os.walk("."):
            for file in files:
                # Only .c, .h and .cpp files are processed
                if file.endswith(('.c', '.h', '.cpp')):
                    print("translating:", file)
                    file_path = os.path.join(root, file)
                    futures.append(executor.submit(translate_comments, file_path))
        for future in concurrent.futures.as_completed(futures):
            cnt_tot += future.result()
    time2 = time.time()
    print("translated %d sentences, time used is %s s" % (cnt_tot, time2 - time1))


if __name__ == "__main__":
    # ts.preaccelerate_and_speedtest()
    main()
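translate_comments opens the file for writing before the translation has finished, which is why a failed request can leave a file truncated. One possible variant is sketched below: it builds all new lines in memory, writes them to a temporary file, and swaps that file into place only when every line has succeeded. This is not the author's code; rewrite_file_safely and its transform parameter are hypothetical names, and transform stands for whatever per-line translation logic is plugged in.

```python
import os
import tempfile


def rewrite_file_safely(file_path, transform, src_encoding='gbk', dst_encoding='utf-8'):
    """Apply transform(line) to every line; replace the file only if all lines succeed."""
    with open(file_path, 'r', encoding=src_encoding) as f:
        lines = f.readlines()
    # If transform raises here, the original file has not been touched yet
    new_lines = [transform(line) for line in lines]

    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w', encoding=dst_encoding) as tmp:
            tmp.writelines(new_lines)
        os.replace(tmp_path, file_path)  # swap the finished file into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise
    return len(new_lines)
```

The temporary file is created in the same directory as the target so that os.replace stays on one filesystem. A backup is still worth making, since a translation can succeed and still produce text you do not want.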
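To test, take any batch of .c or .h files, run the script on them, and check that they still compile. A screenshot of the result is attached at the end:
# First back up the .c, .h and .cpp files to translate, then put them in the same directory as this script:
python trans.py # run the script
A before/after comparison image (the left side is before translation):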