Python regex group substitution: replacing a match based on the length of a captured group
import re
strs='''
aaaaaa<a id="a_detail_0" href="javascript:;" partid='0' startIndex='1571' class=red>bbbbbbb</a>cccccc<a id="a_detail_0" href="javascript:;" partid='0' startIndex='1571' class=red>dddddd</a>eeeee
'''
pat=r'(<a id="a_detail_\d+" href="javascript:;" partid=\'\d+\' startIndex=\'\d+\' class=red>)(.+?)(</a>)'
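# Define a replacement function str_len: if the captured text is shorter than 100 characters, the match is kept unchanged; otherwise it is replaced by the text with the <a> tag removed
def str_len(matched):
    Strs = matched.group(2)  # the second group: the text inside the <a> tag
    if len(Strs) < 100:
        # short text: rebuild the full match so the <a> tag is preserved
        Strs = matched.group(1) + matched.group(2) + matched.group(3)
    return Strs  # return the replacement text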
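In the sample text above both link texts are well under 100 characters, so str_len would leave everything unchanged. To see both branches, here is a minimal sketch with a configurable threshold. The names make_replacer, PAT, and sample, and the simplified pattern, are assumptions made for illustration and are not part of the original code.

```python
import re

# Simplified pattern for illustration only: any <a ...>text</a> pair.
PAT = re.compile(r'(<a\b[^>]*>)(.+?)(</a>)')

def make_replacer(limit):
    """Build a re.sub callback that unwraps links whose text has >= limit characters."""
    def replacer(matched):
        text = matched.group(2)
        if len(text) < limit:
            return matched.group(0)  # short text: keep the whole match unchanged
        return text                  # long text: keep only the inner text
    return replacer

sample = 'x<a href="#">short</a>y<a href="#">a much longer link text</a>z'
result = PAT.sub(make_replacer(10), sample)
print(result)  # x<a href="#">short</a>ya much longer link textz
```

Returning matched.group(0) is equivalent to concatenating groups 1, 2, and 3 when the three groups cover the whole pattern, as they do here.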
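Regex substitution: the second argument of re.sub can be a plain string, a string containing a group reference such as \g<1>, or a function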
line = re.sub(pat,str_len,strs)
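The three kinds of replacement argument can be compared side by side. This is a small sketch on made-up input; the variable names and the text 'id=42 id=7' are assumptions chosen only for the demonstration.

```python
import re

text = 'id=42 id=7'
pat = r'id=(\d+)'

# 1) Plain string: every match is replaced by the same fixed text.
as_string = re.sub(pat, 'id=?', text)

# 2) String with a group reference: \g<1> inserts what group 1 captured.
as_group = re.sub(pat, r'num-\g<1>', text)

# 3) Function: called once per match object; its return value is the replacement.
as_func = re.sub(pat, lambda m: 'id=' + str(int(m.group(1)) * 2), text)

print(as_string)  # id=? id=?
print(as_group)   # num-42 num-7
print(as_func)    # id=84 id=14
```

A function is the form to reach for when the replacement depends on a condition, such as the length check in str_len, because a replacement string cannot express branching.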