Matching a List of Strings with Python Regex: Using Regular Expressions in Python to Find Nestable String Groups

Latest recommended article published on 2024-06-22 09:36:52

weixin_39858275

Tags: Python regex matching a list of strings

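I came across a small requirement online that calls for regular expressions. The original requirement reads:

Find the sentences in a text that contain "因为……所以" ("because ... so") and print each one aligned on those two words: the 3 characters before "因为", everything between the two words, and the 3 characters after "所以". If another "因为"/"所以" pair sits between the outer "因为" and "所以", it must be found as well and printed on its own row. The output format is:

<row number> <3 preceding characters> *因为* <everything in between> &所以& <3 following characters> (a punctuation mark counts as one character)

2 还不是 *因为* 这里好， &所以& 没有人

The implementation is as follows: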

# encoding: utf-8

import re

def getPairStriList(filename):
    pairStrList = []
    # '\u56e0\u4e3a' and '\u6240\u4ee5' are the Unicode escapes for "因为" and "所以"
    pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
    # Decode the file as UTF-8 on open; the with-block closes it afterwards
    with open(filename, 'r', encoding='utf-8') as textFile:
        for line in textFile:
            result = pattern.search(line)
            while result:
                resultStr = result.group()
                pairStrList.append(resultStr)
                # Search again inside the previous match to find a nested pair
                result = pattern.search(resultStr, 2, len(resultStr) - 2)
    # Format each matched string and prepend its sequence number
    for i in range(len(pairStrList)):
        pairStr = pairStrList[i]
        # The first "因为" always sits at index 3..4 of the match
        pairStr = pairStr[:3] + pairStr[3:5].replace('\u56e0\u4e3a', ' *\u56e0\u4e3a* ', 1) + pairStr[5:]
        # The last "所以" always sits 5 characters before the end of the match
        pairStr = pairStr[:-5] + pairStr[-5:].replace('\u6240\u4ee5', ' &\u6240\u4ee5& ', 1)
        pairStrList[i] = str(i + 1) + ' ' + pairStr
    return pairStrList

if __name__ == '__main__':
    pairStrList = getPairStriList('test.txt')
    for pairStr in pairStrList:
        print(pairStr)
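The repeated search is the part that finds nested pairs, and it is easier to follow on a single in-memory string. The following is a minimal Python 3 sketch of that idea, not the author's code: the helper names `find_nested_pairs` and `format_pair` and the sample sentence are made up for illustration. Passing `pos=2` and `endpos=len(text) - 2` to `Pattern.search` restricts the next search to the inside of the previous match, so the outer "所以" and its three trailing characters no longer fit and only an inner pair can match.

```python
import re

# Same pattern shape as the listing above:
# 3 characters, "因为", anything, "所以", 3 characters.
PATTERN = re.compile(r'.{3}因为.*所以.{3}')


def find_nested_pairs(sentence):
    """Return the outermost match followed by the matches nested inside it."""
    found = []
    match = PATTERN.search(sentence)
    while match:
        text = match.group()
        found.append(text)
        # Look strictly inside the previous match for a nested pair.
        match = PATTERN.search(text, 2, len(text) - 2)
    return found


def format_pair(text):
    """Mark the leading "因为" with * and the trailing "所以" with &."""
    # By construction, text[3:5] is "因为" and text[-5:-3] is "所以".
    return '{} *因为* {} &所以& {}'.format(text[:3], text[5:-5], text[-3:])


# Hypothetical sample sentence with one pair nested inside another.
sample = '他说，因为下雨，因为路滑，所以迟到，所以被罚了。'

for number, pair in enumerate(find_nested_pairs(sample), start=1):
    print(number, format_pair(pair))
```

Because `.*` is greedy, the first search spans from the first "因为" to the last "所以" that still has three characters after it; each later search can only shrink, which is why the loop terminates.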

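PS: Nested groups in Python regular expressions

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example demonstrates group nesting: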

#python 3.6
#蔡军生
#http://blog.csdn.net/caimouse/article/details/51749579
#
import re

def test_patterns(text, patterns):
    """Given source text and a list of patterns, look for
    matches for each pattern within the text and print
    them to stdout.
    """
    # Look for each pattern in the text and print the results
    for pattern, desc in patterns:
        print('{!r} ({})\n'.format(pattern, desc))
        print(' {!r}'.format(text))
        for match in re.finditer(pattern, text):
            s = match.start()
            e = match.end()
            prefix = ' ' * (s)
            print(
                ' {}{!r}{} '.format(prefix,
                                    text[s:e],
                                    ' ' * (len(text) - e)),
                end=' ',
            )
            print(match.groups())
            if match.groupdict():
                print('{}{}'.format(
                    ' ' * (len(text) - s),
                    match.groupdict()),
                )
        print()
    return

Example:

#python 3.6
#蔡军生
#http://blog.csdn.net/caimouse/article/details/51749579
#
from re_test_patterns_groups import test_patterns

test_patterns(
    'abbaabbba',
    [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')],
)

The output is as follows:

'a((a*)(b*))' (a followed by 0-n a and 0-n b)

 'abbaabbba'
 'abb'        ('bb', '', 'bb')
    'aabbb'   ('abbb', 'a', 'bbb')
         'a'  ('', '', '')
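The tuples in that output follow from how Python numbers groups: by the position of the opening parenthesis, so the outer group is 1 and the two inner groups are 2 and 3. As a small sketch of the same pattern, the version below gives the three groups names so they can also be read through `groupdict()`. The names `rest`, `a_run`, and `b_run` and the helper `describe` are assumptions introduced here for illustration; they do not appear in the original example.

```python
import re

# Same structure as r'a((a*)(b*))', with hypothetical names added.
# Named groups still receive numbers, in order of their opening parenthesis.
NESTED = re.compile(r'a(?P<rest>(?P<a_run>a*)(?P<b_run>b*))')


def describe(text):
    """Return (whole match, group 1, group 2, group 3) for every match."""
    rows = []
    for match in NESTED.finditer(text):
        rows.append((match.group(0), match.group(1),
                     match.group(2), match.group(3)))
    return rows


for whole, rest, a_run, b_run in describe('abbaabbba'):
    # The outer group always equals the two inner groups joined together.
    assert rest == a_run + b_run
    print(repr(whole), (rest, a_run, b_run))

# The same captures, looked up by name instead of by number.
print(NESTED.search('aab').groupdict())
```

An inner group that matches nothing still participates in the match, which is why the empty string `''` appears in the tuples rather than `None`.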

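Summary

That concludes this introduction to finding nestable string groups with regular expressions in Python. I hope it helps. If you have any questions, please leave a comment and I will reply promptly. Thank you also for your support of the 脚本之家 website!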
