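A recent assignment gave me a good chance to review the re library and regular expressions.
Task: take a passage from an English novel and replace the personal pronouns in it (I, you, she, etc.) with "**".
The function I wrote is as follows: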
def replaceWord(wordList, repl, contents):  # wordList: words to replace; repl: replacement text; contents: text to process
    import re
    # Join the words into one alternation, e.g. r'(?<!\S)(?:abc|bcd)(?!\S)'
    # re.escape keeps any special characters in the words literal
    words = '|'.join(re.escape(w) for w in wordList)
    # Require whitespace (or the start/end of the text) on both sides, so letters
    # inside longer words are left alone; the lookarounds consume no characters,
    # so the text and the replacement need no extra padding spaces
    pattern = r'(?<!\S)(?:' + words + r')(?!\S)'
    # Replace, ignoring case
    ans = re.sub(pattern, repl, contents, flags=re.I)
    # Return the processed text
    return ans
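One subtlety with checking for whitespace around each word: a pattern that consumes the surrounding spaces, such as `\s(word)\s`, cannot match two listed words in a row, because the space between them is already used up by the first match. The sketch below compares that approach with lookarounds, which only check for whitespace without consuming it. The names `WORDS`, `replace_consuming`, and `replace_lookaround` are made up for this illustration.

```python
import re

WORDS = ['I', 'you', 'he']  # hypothetical sample word list

def replace_consuming(text, repl='**'):
    # \s on both sides consumes the spaces, so the text is padded first
    # and the replacement has to put the spaces back
    pattern = r'\s(?:' + '|'.join(WORDS) + r')\s'
    return re.sub(pattern, ' ' + repl + ' ', ' ' + text + ' ', flags=re.I)

def replace_lookaround(text, repl='**'):
    # Lookarounds only check for whitespace (or the text boundary); they consume nothing
    pattern = r'(?<!\S)(?:' + '|'.join(WORDS) + r')(?!\S)'
    return re.sub(pattern, repl, text, flags=re.I)

sample = 'he said you I know'
print(repr(replace_consuming(sample)))   # the 'I' right after 'you' is missed
print(repr(replace_lookaround(sample)))  # all three listed words are replaced
```

The consuming version also returns the padding spaces it added at both ends, while the lookaround version leaves the rest of the text exactly as it was.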
Calling replaceWord gives the following result:
wordList = ['I', 'you', 'we', 'she', 'he', 'it', 'they']
contents = "I did not. They aren't here. I'm sorry. I need those papers very much. I may lose a large sum of money if I don't find them. I can't see what could have happened to them. I had them on my desk in the office yesterday, and I was looking at them when Mr. Johnson came along to see about buying some lumber6 from the pile in the yard next to my office."
print(replaceWord(wordList, '**', contents))
## Result:
## ** did not. ** aren't here. I'm sorry. ** need those papers very much. ** may lose a large sum of money if ** don't find them. ** can't see what could have happened to them. ** had them on my desk in the office yesterday, and ** was looking at them when Mr. Johnson came along to see about buying some lumber6 from the pile in the yard next to my office.
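In the result above, "I'm" is left unchanged, because the pronoun is followed by an apostrophe rather than whitespace. If contractions should be caught as well, one possible variant (an assumption on my part, not a requirement of the assignment) is to use the word boundary `\b` instead of whitespace checks. The function name `replace_at_word_boundaries` is hypothetical.

```python
import re

def replace_at_word_boundaries(word_list, repl, contents):
    # \b matches between a word character and a non-word character,
    # so the "I" in "I'm" counts as a whole word here
    pattern = r'\b(?:' + '|'.join(re.escape(w) for w in word_list) + r')\b'
    return re.sub(pattern, repl, contents, flags=re.I)

print(replace_at_word_boundaries(['I', 'they'], '**', "I'm sorry. They aren't here."))
# prints: **'m sorry. ** aren't here.
```

The trade-off is that `\b` also treats hyphens and other punctuation as boundaries, so a listed word inside a hyphenated compound would be replaced too.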