Python: removing punctuation from a text file

I'm trying to remove a list of punctuation characters from my text file, but I have one problem with words joined by a hyphen. For example, if I have the word "post-trauma" I get "posttrauma", whereas I want to get "post" "trauma".

My code is:

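import re

punct = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']

# myFile and REMOVE_LIST are defined elsewhere in my script
with open(myFile, "r") as f:
    text = f.read()

remove = '|'.join(REMOVE_LIST)  # list of words to remove
regex = re.compile(r'(' + remove + r')', flags=re.IGNORECASE)
out = regex.sub("", text)

# collapse runs of whitespace into single spaces
delta = " ".join(out.split())

# drop every character listed in punct (this is where "-" disappears)
txt = "".join(c for c in delta if c not in punct)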

Is there a way to solve it?

Solution

I believe you can just call the built-in str.replace method on delta, so your last line would become the following:

txt = "".join(c for c in delta.replace("-", " ") if c not in punct )

This means all the hyphens in your text will become spaces, so the words will be treated as if they were separate.
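
As an alternative, the same idea can be sketched with str.translate, which handles the hyphen replacement and the deletion of the other characters in one pass. This is only a minimal sketch: the function name strip_punct and the sample sentence are made up for illustration, and it assumes the same punct list as in the question.

```python
# Punctuation list taken from the question.
PUNCT = ['!', '#', '"', '%', '$', '&', ')', '(', '+', '*', '-']


def strip_punct(text, punct=PUNCT):
    """Turn hyphens into spaces and delete the other listed characters.

    strip_punct is a hypothetical helper name, not part of the original code.
    """
    # Everything in the list except the hyphen is deleted outright.
    to_delete = "".join(c for c in punct if c != "-")
    # str.maketrans(x, y, z): map chars in x to chars in y, delete chars in z.
    table = str.maketrans("-", " ", to_delete)
    # Collapse any runs of whitespace left behind by the replacement.
    return " ".join(text.translate(table).split())


print(strip_punct("post-trauma (stress) & recovery!"))
# prints: post trauma stress recovery
```

Joining on split() at the end means a hyphen surrounded by spaces, as in "a - b", does not leave a double space behind.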
