Reading and Writing Chinese Text Files in Python & Passing Command-Line Arguments

Latest recommended article published on 2022-08-09 14:18:01

hozhangel


Categories: Common Python Usage, Python Natural Language Processing

Article link: https://blog.csdn.net/ZHO9504/article/details/81714265


Cleaning redundant punctuation from text

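# encoding=utf-8
import re
import sys

outfile = 'result.txt'          # default output path
file = sys.argv[1]              # first argument: input file
if len(sys.argv) > 2:           # optional second argument: output file
    outfile = sys.argv[2]
print("Reading " + file + " now...\n")

lines = []
with open(file, 'r', encoding='utf-8') as f:    # open the input file
    for line in f:
        line = line.strip()     # strip() returns a new string; drops the newline
        # replace backticks with single quotes
        line, nu = re.subn(r'`', '\'', line)
        if nu > 0:
            print("replaced " + str(nu) + " backtick(s)")
        # collapse two adjacent quote marks ("", '" or '') into one double quote
        line, nu = re.subn(r'"\s*"|\'\s*"|\'\s*\'', '"', line)
        # each line is expected to look like: chinese|||english
        ch_en = line.split('|||')
        if len(ch_en) < 2:      # skip lines without the ||| separator
            continue
        ch = ch_en[0]
        en = ch_en[1]
        lines.append(ch + '|||' + en + '\n')

with open(outfile, 'w', encoding='utf-8') as g:    # write the cleaned lines
    for line in lines:
        g.write(line)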
# for line in lines:
    # try:
        # print(line)
    # except UnicodeEncodeError as e:
        # print('UnicodeEncodeError')
        # print("\n      Please open the " + outfile + "(current path)!!")
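
The per-line cleanup in the script above can be pulled out into a function, which makes it easy to try on a sample line before running it over a whole file. This is only a minimal sketch, assuming each line has the form `chinese|||english`; the name `clean_line`, the constant `_DOUBLED_QUOTES`, and the sample strings are hypothetical and not part of the original script.

```python
import re

# Two adjacent quote marks, optionally separated by whitespace: "", '" or ''.
_DOUBLED_QUOTES = re.compile(r'"\s*"|\'\s*"|\'\s*\'')


def clean_line(line):
    """Return the cleaned 'ch|||en' line, or None if the separator is missing.

    Hypothetical helper that mirrors the substitutions in the script above.
    """
    line = line.strip()                      # drop surrounding whitespace/newline
    line = line.replace('`', "'")            # backtick -> single quote
    line = _DOUBLED_QUOTES.sub('"', line)    # collapse doubled quote marks
    parts = line.split('|||')
    if len(parts) < 2:                       # no ||| separator on this line
        return None
    # Like the script, keep only the first two fields.
    return parts[0] + '|||' + parts[1]


if __name__ == '__main__':
    # prints: 你好 "世界|||hello " world
    print(clean_line('你好 ""世界|||hello `` world\n'))
```

Returning `None` for lines without the separator is a design choice made for this sketch: the caller can decide whether to skip such lines or report them.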
