Convert an entire text into one word per line

path = r'h:\kill a bird.txt'
path_over = r'h:\kill a bird new.txt'

# read the whole article; each line of the file becomes one list element
max_list_1 = []
with open(path, 'r') as f:
    lines = f.readlines()

# process each line: split it on whitespace into single words
for line in lines:
    # split() with no argument also drops the trailing newline and empty strings
    line_1 = line.split()
    for lin_word_1 in line_1:
        max_list_1.append(lin_word_1 + '\n')

temp_list = []
for lin_word_2 in max_list_1:
    if lin_word_2 != '\n':
        temp_list.append(lin_word_2)

# delete punctuation; str.replace() matches literal text only, so the
# regex-style patterns '\n{2,}', '\s+' and '\xa1+' never matched anything
strip_chars = str.maketrans('', '', ",.()!`';-\xa1")
max_list_3 = []
for lin_word_3 in temp_list:
    lin = lin_word_3.translate(strip_chars).strip()
    # skip entries that are empty once punctuation and whitespace are removed
    if lin:
        max_list_3.append(lin + '\n')

with open(path_over, 'w') as f_over:
    f_over.writelines(max_list_3)
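
For comparison, here is a minimal sketch of the same idea written as a single function that uses a regular expression for the punctuation step. The function name `words_one_per_line` and the sample string are hypothetical, and the set of stripped characters is assumed to be the one listed in the script above. It works on a string in memory rather than on the files, so it can be tried without touching `path` or `path_over`.

```python
import re


def words_one_per_line(text):
    """Return the words of `text`, one per line, with punctuation removed."""
    # Assumption: strip the same characters as the script above
    # (comma, period, parentheses, '!', backtick, apostrophe, ';', '-', '\xa1').
    cleaned = re.sub(r"[,.()!`';\-\xa1]", "", text)
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces and blank lines produce no empty words.
    words = cleaned.split()
    return "".join(word + "\n" for word in words)


sample = "Hello, world!  It's a well-known (old) story.\n\nThe end.\n"
result = words_one_per_line(sample)
print(result, end="")
```

Removing the hyphen joins hyphenated words (`well-known` becomes `wellknown`), which matches the behaviour of the original replacements; replacing it with a space instead would split such words in two.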
