Batch-translating C comments in a directory to English with Python

程序菜鸟一只

Last modified: 2024-07-25 17:19:46

Tags: python, C

First published: 2024-07-25 16:52:14

Article link: https://blog.csdn.net/sbsbsb666666/article/details/140693161

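Many vendors currently ship code with Chinese comments, which can show up as garbled text in some environments. I spent half a day writing a script that batch-translates the comments in C source files into English. This keeps the comments consistent and also avoids the garbled-text problem.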

The script runs on Python 3.x and needs only one additional package:

pip install translators

Supported comment types

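Reminder: back up your files before translating! The script overwrites each file in place, so if the network or anything else fails partway through, the file's contents may be lost. Always make a backup first!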
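
The backup step can itself be scripted. Below is a minimal sketch, not part of the original script: the helper name `backup_sources` and the directory arguments are hypothetical, and it only copies files with the same `.c`, `.h` and `.cpp` extensions that the translation script processes.

```python
import shutil
from pathlib import Path

# Same extensions the translation script walks over.
SOURCE_SUFFIXES = {".c", ".h", ".cpp"}

def backup_sources(src_root, backup_root):
    """Copy every C/C++ source file under src_root into backup_root,
    keeping the relative directory layout. Returns the number of files copied.
    (Hypothetical helper, for illustration only.)"""
    src_root = Path(src_root).resolve()
    backup_root = Path(backup_root).resolve()
    copied = 0
    for path in src_root.rglob("*"):
        if not path.is_file() or path.suffix not in SOURCE_SUFFIXES:
            continue
        if backup_root in path.parents:
            continue  # skip files that already live inside the backup
        target = backup_root / path.relative_to(src_root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied += 1
    return copied
```

A call such as `backup_sources(".", "../src_backup")` would do the job. It is probably best to keep the backup outside the working directory, because the translation script walks everything under `.` and would otherwise translate the backup copies too.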

The following comment formats are supported (typical well-formed examples):

/*********** 第一个 ****************
* 注释1 
* 注释2
**/
int a = 1;   // 可转换为英文
int b = 2;   /* 正确转换 */ 
int c = 3;   /* 注释1
			* 这样也支持
			*/

// 转换部分
/* 均可以转换 */
int func(){
}
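
To see why these formats work, it helps to run the script's comment pattern on a few of the sample lines. The sketch below reuses the two regular expressions from the script; the helper name `find_chinese_comments` is made up for this illustration.

```python
import re

# Both patterns are taken from the script further down.
COMMENT_RE = re.compile(r'//.*|/\* .*|\* .*')
CHINESE_RE = re.compile(r'[\u4e00-\u9fff]+')

def find_chinese_comments(line):
    """Return the comment fragments of one line that contain Chinese characters.
    (Hypothetical helper, for illustration only.)"""
    return [c for c in COMMENT_RE.findall(line) if CHINESE_RE.search(c)]

samples = [
    "int a = 1;   // 可转换为英文",
    "int b = 2;   /* 正确转换 */ ",
    "\t\t\t* 这样也支持",
    "int func(){",
]
for s in samples:
    print(repr(s), "->", find_chinese_comments(s))
```

Note that the pattern requires a space after `/*` and after a leading `*`, so a comment written as `/*注释*/` would not be picked up at all.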

Incorrect usage and cases that cannot be converted

Comments must follow a few formatting rules to be converted:

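For single-line comments: if a line starts with // or /*, the whole line is treated as a comment. The following format is therefore not allowed; that is, do not put a statement after a comment on the same line: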

/* 中文注释 */  int a = 1 ;

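For multi-line comments, every line must begin with *; otherwise it is not converted. In the example below, the line that starts with * is translated and the indented line without it is skipped: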

/*
 *   正确的语句: 此句进行翻译
     错误: 此句话不进行翻译
 ******************/

Avoid ";" in comments wherever possible. A comment containing one is treated as a program statement and is not translated:

/*
*         这是多行注释;     // 注意: 此处由于加上分号, 则为了避免有
*/

Do not move the semicolon to the next line. For example (the correct form is int a = 1; followed by the comment):

int a = 1   // comment : 正确写法是 int a = 1; + 注释
;

Otherwise the code in front of the comment may get translated along with it.
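
Some of these rules can be checked automatically before running the translation. The following is a rough sketch of such a pre-check, assuming the same comment pattern as the script; `check_line` and its warning messages are hypothetical, and the check works one line at a time, so it cannot detect a semicolon that was moved to the next line.

```python
import re

COMMENT_RE = re.compile(r'//.*|/\* .*|\* .*')   # pattern used by the script
CHINESE_RE = re.compile(r'[\u4e00-\u9fff]')

def check_line(line):
    """Return a list of warnings for one source line (empty if it looks safe).
    (Hypothetical helper, for illustration only.)"""
    problems = []
    if not CHINESE_RE.search(line):
        return problems  # nothing to translate on this line
    fragments = [c for c in COMMENT_RE.findall(line) if CHINESE_RE.search(c)]
    if not fragments:
        problems.append("Chinese text not recognized as a comment")
    for frag in fragments:
        # A /* ... */ comment that is closed before the end of the line
        # means code follows it and would be sent to the translator too.
        if frag.startswith("/*") and "*/" in frag and not re.search(r"\*/\s*$", frag):
            problems.append("code follows a /* */ comment on the same line")
        # A '*' fragment containing ';' is handled as code; only trailing
        # // or /* sub-comments are translated, the text before them is not.
        if frag.startswith("*") and ";" in frag:
            sub = re.search(r"//|/\* ", frag)
            head = frag[:sub.start()] if sub else frag
            if CHINESE_RE.search(head):
                problems.append("Chinese text before ';' is treated as code and skipped")
    return problems
```

Running `check_line` over every line of a file and printing the line numbers with non-empty results gives a quick list of comments to fix by hand first.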

The code

The script itself is quite simple. It uses bing as the engine for the translators API; for sentences where bing's output still contains some Chinese, it falls back to the google engine.
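
The fallback idea can be isolated from the network calls, which makes it easier to try out. This is only a sketch under stated assumptions: `call_with_retry` and `translate_with_fallback` are hypothetical names, the two engines are passed in as plain callables, and giving the fallback engine its own retry budget is a choice made for this sketch.

```python
import re
import time

CHINESE_RE = re.compile(r'[\u4e00-\u9fff]')

def call_with_retry(func, text, retries=3, delay=0.0):
    """Call func(text); retry on any exception and re-raise after the last attempt."""
    for attempt in range(1, retries + 1):
        try:
            return func(text)
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay)

def translate_with_fallback(text, primary, fallback, retries=3, delay=0.0):
    """Translate with `primary`; if Chinese remains in the result, try `fallback`.
    Returns (translation, engine_label). Hypothetical helper."""
    result = call_with_retry(primary, text, retries, delay)
    if CHINESE_RE.search(result):
        try:
            return call_with_retry(fallback, text, retries, delay), "fallback"
        except Exception:
            pass  # fallback engine unreachable: keep the primary result
    return result, "primary"

# Stub engines so the sketch runs offline.
print(translate_with_fallback("注释", lambda s: "comment", lambda s: "note"))
```

With the real library, the two callables would wrap `ts.translate_text(...)` using `translator='bing'` and `translator='google'`, as the script below does.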

import os
import re
import time
import warnings
import concurrent.futures
import translators as ts

open_encoding = 'gbk'
dst_encoding = 'utf-8'
max_retry_time = 3
translator = 'bing'
use_google_for_better_trans = True

def translate_comments(file_path):
    cnt = 0
    with open(file_path, 'r', encoding=open_encoding) as f:
        lines = f.readlines()
    with open(file_path, 'w', encoding=dst_encoding) as f:
        for line in lines:
            # Find comments in the line
            comments = re.findall(r'//.*|/\* .*|\* .*', line)   # continuation lines starting with '*' are captured here as well
            # remove the comments without Chinese Character
            comments = ['' if not re.search(r'[\u4e00-\u9fff]+', comment) else comment for comment in comments]
            comments = list(filter(None, comments))

            l = len(comments)
            for i in range(l):
                comment = comments[i]
                if (re.match(r"^\s*//", comment)) or (re.match(r"^\s*/\*", comment)):   # starts with // or /*
                    # the whole line is considered as comment,
                    if (re.match(r"^\s*//", comment)):
                        comment = re.sub(r"^\s*//",'',comment)
                    elif re.match(r"^\s*/\*", comment):
                        comment = re.sub(r"^/\*", '', comment)
                    if re.search(r"\*/\s*$", comment):
                        comment = re.sub(r"\*/\s*$",'',comment)
                elif (re.search(r"\*.*;", comment)):    # * + program sentence + comment (optional)
                    sub_comments = re.findall(r"//.*|/\* .*", comment)
                    sub_comments = list(filter(None, sub_comments))
                    comment = ''   # delete this comment and add sub comment
                    for c in sub_comments:
                        if re.match(r"//", c) or re.match(r"/\*", c):
                            c = c[2:]
                        if re.search(r"\*/\s*$", c):
                            c = re.sub(r"\*/\s*$", '', c)
                        comments.append(c)
                elif re.match(r"^\s*\*",comment):      #  * + Chinese Sentences  + (*/) optional
                    if not re.search(r"\*.*;", comment):
                        comment = re.sub(r"^\s*\*", '', comment)  # remove the leading * before the comment text
                        if re.search(r"\*/\s*$", comment):
                            comment = re.sub(r"\*/\s*$", '',comment)
                    # otherwise keep the string unchanged so that code is not destroyed
                else:
                    warnings.warn("can't process comment: %s"%(comment))
                comments[i] = comment

            comments = list(filter(None, comments))
            for comment in comments:
                cnt = cnt + 1
                translated_comment = translate_sentence(comment)
                line = line.replace(comment, translated_comment)
            f.write(line)
    return cnt

def translate_sentence(comment:str):
    engine = translator
    # Translate the comment to English
    retry = 0
    while retry < max_retry_time:
        try:
            translated_comment = ts.translate_text(comment, translator=translator, from_language='zh',
                                                   to_language='en')
            break
        except Exception as e:
            warnings.warn(f"Error: {e}. Retrying after 15 seconds.")
            time.sleep(15)
            retry = retry + 1
    if (retry == max_retry_time):
        raise Exception("Exceed retry times %d" % (max_retry_time))
    # if there is still Chinese in translated sentences, translate it with google
    if (re.search(r'[\u4e00-\u9fff]+', translated_comment) and use_google_for_better_trans):
        while retry < max_retry_time:
            try:
                translated_comment = ts.translate_text(comment, translator='google', from_language='auto',
                                                       to_language='en')
                engine = 'google'
                break
            except Exception as e:
                warnings.warn(f"Error: {e}. Retrying after 25 seconds.")
                time.sleep(25)
                retry = retry + 1
    # Replace the original comment with the translated comment
    print("engine ", engine, ":", comment, " ----> ", translated_comment)  # comment this if output not used
    return translated_comment

def main():
    time1 = time.time()
    cnt_tot = 0
    # Walk through the working directory
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = []
        for root, dirs, files in os.walk("."):
            for file in files:
                # Check if the file is a .c, .h or .cpp file
                if file.endswith(('.c', '.h', '.cpp')):
                    print("translating:", file)
                    file_path = os.path.join(root, file)
                    futures.append(executor.submit(translate_comments, file_path))
        for future in concurrent.futures.as_completed(futures):
            cnt_tot += future.result()
    time2 = time.time()
    print("translate %d sentences, time used is %s s"%(cnt_tot,time2-time1))

if __name__ == "__main__":
    # ts.preaccelerate_and_speedtest()
    main()
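
One possible way to reduce the risk of losing a file when a translation request fails is to translate everything first and only then swap the file in a single step. The sketch below is not the script's behaviour but a variation on it; `rewrite_file_safely` and the `transform_line` callback are hypothetical names, and the default encodings simply mirror `open_encoding` and `dst_encoding` above.

```python
import os
import tempfile

def rewrite_file_safely(file_path, transform_line, src_encoding="gbk", dst_encoding="utf-8"):
    """Apply transform_line to every line, then replace the file in one step.
    If transform_line raises, the original file is left untouched.
    (Hypothetical helper, for illustration only.)"""
    with open(file_path, "r", encoding=src_encoding) as f:
        lines = f.readlines()
    # All translation happens here, before anything is written to disk.
    new_lines = [transform_line(line) for line in lines]

    directory = os.path.dirname(os.path.abspath(file_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=dst_encoding) as tmp:
            tmp.writelines(new_lines)
        os.replace(tmp_path, file_path)  # swap the finished file into place
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
    return len(new_lines)
```

Here `transform_line` would be a function that does the per-line comment extraction and translation. A backup is still a good idea, since this only protects against failures during translation, not against bad translations.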

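To test, take any batch of .c or .h files, run the script on them, and check that they still compile. Usage and a screenshot of the result follow: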

# First back up the .c, .h and .cpp files you want to translate, then put them in the same directory as this script:
python trans.py   # run the script
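
Besides compiling, a quick scan for leftover Chinese characters shows which comments were skipped. This is a small sketch, assuming the translated files are now UTF-8 (the script's `dst_encoding`); the function name `find_remaining_chinese` is hypothetical.

```python
import os
import re

CHINESE_RE = re.compile(r'[\u4e00-\u9fff]')
SOURCE_SUFFIXES = ('.c', '.h', '.cpp')

def find_remaining_chinese(root=".", encoding="utf-8"):
    """Return (path, line_number, text) for every source line that still
    contains Chinese characters. (Hypothetical helper, for illustration only.)"""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.endswith(SOURCE_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            # errors="replace" keeps the scan going on undecodable bytes
            with open(path, "r", encoding=encoding, errors="replace") as f:
                for number, line in enumerate(f, start=1):
                    if CHINESE_RE.search(line):
                        hits.append((path, number, line.rstrip("\n")))
    return hits
```

Printing the result of `find_remaining_chinese(".")` after a run lists the lines that may need a manual fix, for example comments that break the formatting rules described earlier.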

[Image: screenshot of the test run]

Here is a before/after comparison (the left side is before translation):
[Image: before/after comparison]
