Python regular expression format

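For me, Martijn's solution doesn't work, so I'm giving my own solution.

Note that I don't use re.IGNORECASE.

As a consequence, my regex captures the whole tail of

PC_bos_7659Ae_1450sp_rev_2_1_nc_woof

including "nc_woof", because the lowercase "nc" is not treated as the NC suffix. I don't know whether that is really what you want in this case.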
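A minimal sketch of the case-sensitivity point above, applying the same pattern as the first "eyquem" regex to a single line, once without and once with re.IGNORECASE. The variable names are illustrative, not from the original answer.

```python
import re

# Same pattern as the first "eyquem" regex, used here on a single line.
# The optional tail (?:[_-]+NC.*)? strips a suffix of separators + "NC".
pattern = re.compile(r'^\D+(\d{4}[a-zA-Z]{0,3})[_-]+(.+?)(?:[_-]+NC.*)?$')
pattern_nocase = re.compile(pattern.pattern, re.IGNORECASE)

line = 'PC_bos_7659Ae_1450sp_rev_2_1_nc_woof'

# Case-sensitive: lowercase "nc" is not the marker, so the lazy (.+?)
# has to extend to the end of the line.
case_sensitive = pattern.match(line).groups()

# With re.IGNORECASE, "_nc_woof" counts as the NC suffix and is dropped.
case_insensitive = pattern_nocase.match(line).groups()

print(case_sensitive)    # ('7659Ae', '1450sp_rev_2_1_nc_woof')
print(case_insensitive)  # ('7659Ae', '1450sp_rev_2_1')
```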

inputtext = """XY_efgh_1234_0040_rev_2_1_NC_asdf
XY_abcd_1122Ae_1150_rev2_1_NC
XY_efgh_0124e_50_NC
asdf_1980_2234a_2
XY_abcd_5098_2270_2_1_NC
PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH
PC_bos_7659Ae_1450sp_rev_2_1_nc_woof"""

print(inputtext)


import re

print("""\n----------------------------------------
WANTED
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('1980', '2234a_2')
('5098', '2270_2_1')
('7659Ae', '1450sp_rev_2_1')""")

print('----------- eyquem ----------------------')
# Raw strings so that \D and \d reach the regex engine unchanged.
ri = re.compile(r'^\D+'
                r'(\d{4}[a-zA-Z]{0,3})'
                r'[_-]+'
                r'(.+?)'
                r'(?:[_-]+NC.*)?$',
                re.MULTILINE)
for match in ri.findall(inputtext):
    print(match)

print('----------- Martijn ----------------------')
ro = re.compile(r"""
    ([0-9]{4}
     [A-Z]{0,3})
    [_-]{1,3}
    ([0-9]{2,4}
     [0-9A-Z_-]{0,16}?)
    (?:[-_]NC)?
    """,
    re.IGNORECASE | re.VERBOSE)
for match in ro.findall(inputtext):
    print(match)

Result:

----------------------------------------
WANTED
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('1980', '2234a_2')
('5098', '2270_2_1')
('7659Ae', '1450sp_rev_2_1')
----------- eyquem ----------------------
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('1980', '2234a_2')
('5098', '2270_2_1')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1_nc_woof')
----------- Martijn ----------------------
('1234', '0040')
('1122Ae', '1150')
('0124e', '50')
('1980', '2234')
('5098', '2270')
('7659Ae', '1450')
('7659Ae', '1450')
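Martijn's output stops right after the digits. A likely reason (my reading, not stated in the original answer): the lazy `{0,16}?` first tries zero characters, and the following `(?:[-_]NC)?` is optional, so the match already succeeds without growing the lazy part. The sketch below reproduces this on one line; the `$`-anchored `anchored` pattern is a hypothetical variant added only for comparison, not Martijn's code.

```python
import re

text = 'XY_efgh_1234_0040_rev_2_1_NC_asdf'

# Martijn's first pattern, written on one line without re.VERBOSE.
lazy = re.compile(r'([0-9]{4}[A-Z]{0,3})[_-]{1,3}([0-9]{2,4}[0-9A-Z_-]{0,16}?)(?:[-_]NC)?',
                  re.IGNORECASE)

# Hypothetical variant: the suffix may swallow the rest of the line and the
# end is anchored with $, so the lazy part must grow until the tail fits.
anchored = re.compile(r'([0-9]{4}[A-Z]{0,3})[_-]{1,3}([0-9]{2,4}[0-9A-Z_-]{0,16}?)(?:[-_]NC.*)?$',
                      re.IGNORECASE)

print(lazy.findall(text))      # [('1234', '0040')]
print(anchored.findall(text))  # [('1234', '0040_rev_2_1')]
```

The point is only that a lazy quantifier needs something after it that forces it to expand; without an anchor, "as little as possible" means zero.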

My regex can also be applied to the individual lines:

for s in inputtext.splitlines(True):
    print(ri.match(s).groups())

Same result.
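A small check of that claim, assuming the same `ri` pattern and a three-line excerpt of the input (the excerpt and variable names are illustrative): with re.MULTILINE, `^` and `$` match at every line boundary, so `findall` over the whole text and `match` on each line give the same tuples.

```python
import re

ri = re.compile(r'^\D+(\d{4}[a-zA-Z]{0,3})[_-]+(.+?)(?:[_-]+NC.*)?$', re.MULTILINE)

lines = ['XY_efgh_1234_0040_rev_2_1_NC_asdf',
         'XY_efgh_0124e_50_NC',
         'asdf_1980_2234a_2']
text = '\n'.join(lines)

# Whole text at once: re.MULTILINE lets ^ and $ match at each line boundary.
whole = ri.findall(text)

# One line at a time: splitlines(True) keeps the '\n', and $ still
# matches just before a trailing newline.
per_line = [ri.match(s).groups() for s in text.splitlines(True)]

print(whole == per_line)  # True
```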


EDIT

import re

inputtext = """XY_efgh_1234_0040_rev_2_1_NC_asdf
XY_abcd_1122Ae_1150_rev2_1_NC
XY_efgh_0124e_50_NC
XY_efgh_0228e_66-__NC
asdf_1980_2234a_2
asdf_2999_133a
XY_abcd_5098_2270_2_1_NC
XY_abcd_6099_33370_2_1_NC
XY_abcd_6099_3370abcd_2_1_NC
PC_bos_7659Ae_1450sp_rev_2_1_NC_GRAPH
PC_bos_7659Ae_1450sp_rev_2_1___NC_GRAPH
PC_bos_7659Ae_1450sp_rev_2_1_nc_woof_NC
PC_bos_7659Ae_1450sp_rev_2_1_anc_woof_NC
PC_bos_7659Ae_1450sp_rev_2_1_abNC_woof_NC"""

print('----------- Martijn 2 ------------')
ruu = re.compile(r"""
    ( [0-9]{4} [A-Z]{0,3} )
    [_-]{1,3}
    ( [0-9]{2,4} (?:[0-9A-Z_-](?!NC))* )
    """, re.IGNORECASE | re.VERBOSE)
for match in ruu.findall(inputtext):
    print(match)

print('----------- eyquem 2 ------------')
rii = re.compile(r'[_-]'
                 r'(\d{4}[A-Z]{0,3})'
                 r'[_-]{1,3}'
                 r'('
                 r'(?=\d{2,4}[A-Z]{0,3}(?![\dA-Z]))'  # 2-4 digits, up to 3 letters
                 r'(?:[0-9A-Z_-]+?)'
                 r')'
                 r'(?:[-_]+NC.*)?'                     # optional separators + NC suffix
                 r'(?![0-9A-Z_-])',
                 re.IGNORECASE)
for m in rii.findall(inputtext):
    print(m)

Result:

----------- Martijn 2 ------------
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('0228e', '66-_')
('1980', '2234a_2')
('2999', '133a')
('5098', '2270_2_1')
('6099', '33370_2_1')
('6099', '3370abcd_2_1')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1__')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1_')
('7659Ae', '1450sp_rev_2_1_a')
----------- eyquem 2 ------------
('1234', '0040_rev_2_1')
('1122Ae', '1150_rev2_1')
('0124e', '50')
('0228e', '66')
('1980', '2234a_2')
('2999', '133a')
('5098', '2270_2_1')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1')
('7659Ae', '1450sp_rev_2_1_anc_woof')
('7659Ae', '1450sp_rev_2_1_abNC_woof')

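Remarks:

- My regex does not capture '33370_2_1' or '3370abcd_2_1', because they don't fit the pattern "2 to 4 digits, possibly followed by at most 3 letters"; Martijn's solution does capture them.
- The ends of the parts my regex captures are clean; in Martijn's code they are not (for example '66-_' and '1450sp_rev_2_1__').
- Martijn's regex stops in front of every NC or nc sequence even when no underscore precedes it, that is, even when those letters are part of the wanted portion ('1450sp_rev_2_1_' and '1450sp_rev_2_1_a'). My regex keeps such portions ('1450sp_rev_2_1_anc_woof', '1450sp_rev_2_1_abNC_woof'). If this behavior of my regex isn't appropriate, tell me and I'll modify it.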
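The three remarks can be checked in isolation. The sketch below uses simplified fragments of the two "2" patterns (my own cut-downs for illustration, not the full regexes): `head` is the lookahead that opens the second group of eyquem 2, `martijn_tail` is the second group of Martijn 2, and `eyquem_tail` is the lazy group plus the optional NC suffix of eyquem 2.

```python
import re

# Lookahead opening the second group of "eyquem 2": 2 to 4 digits,
# optionally up to 3 letters, then no further digit or letter.
head = re.compile(r'(?=\d{2,4}[A-Z]{0,3}(?![\dA-Z]))', re.IGNORECASE)

# Second group of "Martijn 2": each character must not be followed by "NC".
martijn_tail = re.compile(r'[0-9]{2,4}(?:[0-9A-Z_-](?!NC))*', re.IGNORECASE)

# Tail of "eyquem 2": "NC..." is dropped only when separators precede it.
eyquem_tail = re.compile(r'([0-9A-Z_-]+?)(?:[-_]+NC.*)?(?![0-9A-Z_-])', re.IGNORECASE)

# A zero-width match object is still truthy, so bool() reports success.
fits = {s: bool(head.match(s))
        for s in ['50_NC', '1450sp_rev', '33370_2_1', '3370abcd_2_1']}
print(fits)
# {'50_NC': True, '1450sp_rev': True, '33370_2_1': False, '3370abcd_2_1': False}

tails = {s: (martijn_tail.match(s).group(), eyquem_tail.match(s).group(1))
         for s in ['66-__NC', '1450sp_rev_2_1_anc_woof_NC']}
print(tails)
# {'66-__NC': ('66-_', '66'), '1450sp_rev_2_1_anc_woof_NC': ('1450sp_rev_2_1_', '1450sp_rev_2_1_anc_woof')}
```

The fragments give the same tails as the full programs above, which suggests the differences come from these parts rather than from the surrounding prefix.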
