How to extract the substring between two markers?

Latest recommended article published on 2023-04-06 17:24:15

xfxf996

Tags: python string substring

Original link: https://oldbug.net/q/jA5L/How-to-extract-the-substring-between-two-markers

This article is translated from: How to extract the substring between two markers?

Let's say I have a string 'gfgfdAAA1234ZZZuijjk' and I want to extract just the '1234' part.

I only know the few characters directly before (AAA) and after (ZZZ) the part I am interested in (1234).

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

How can I do the same thing in Python?

Reply #1

Reference: https://stackoom.com/question/jA5L/如何提取两个标记之间的子字符串

Reply #2

Just in case somebody has to do the same thing that I did: I had to extract everything inside parentheses in a line. For example, if I have a line like 'US president (Barack Obama) met with ...' and I want to get only 'Barack Obama', this is the solution:

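import re

line = 'US president (Barack Obama) met with ...'
regex = r'.*\((.*?)\).*'  # raw string; \( and \) match literal parentheses
matches = re.search(regex, line)
line = matches.group(1) + '\n'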

I.e., you need to escape the parentheses with a backslash (\). Though this is more a matter of regular expressions than of Python.

Also, in some cases you may see an 'r' prefix before the regex definition. If there is no r prefix, you need to double the backslashes, as in a C string literal. Here is more discussion on that.
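To illustrate the r prefix, here is a small sketch comparing a raw-string pattern with its doubled-backslash equivalent. The variable names are mine and the pattern is a shortened version of the one above, so treat it as an illustration rather than the answerer's code.

```python
import re

line = 'US president (Barack Obama) met with ...'

# Raw string: the backslash reaches the regex engine unchanged.
raw_pattern = r'\((.*?)\)'
# Regular string: each backslash must be doubled to survive string parsing.
escaped_pattern = '\\((.*?)\\)'

# Both literals spell exactly the same pattern text.
same = raw_pattern == escaped_pattern
name = re.search(raw_pattern, line).group(1)
print(same, name)
```

Both spellings produce the same pattern; the raw form is simply easier to read.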

Reply #3

>>> s = '/tmp/10508.constantstring'
>>> s.split('/tmp/')[1].split('constantstring')[0].strip('.')
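The chained split above raises IndexError when the first marker is missing. As a hedged alternative, the sketch below uses str.partition, which reports a missing separator instead of failing. The helper name `between` is hypothetical, and passing '.constantstring' as the closing marker is my choice to avoid the extra strip('.').

```python
def between(text, before, after):
    """Return the text between the first `before` and the next `after`, or None."""
    # partition returns (head, sep, tail); sep is '' when the marker is missing.
    _, found_before, rest = text.partition(before)
    if not found_before:
        return None
    middle, found_after, _ = rest.partition(after)
    if not found_after:
        return None
    return middle

result = between('/tmp/10508.constantstring', '/tmp/', '.constantstring')
print(result)
```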

Reply #4

With sed it is possible to do something like this with a string:

echo "$STRING" | sed -e "s|.*AAA\(.*\)ZZZ.*|\1|"

And this will give me 1234 as a result.

You could do the same with the re.sub function using the same regex.

>>> re.sub(r'.*AAA(.*)ZZZ.*', r'\1', 'gfgfdAAA1234ZZZuijjk')
'1234'

In basic sed, capturing groups are written as \(..\), but in Python they are written as (..).
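One thing to keep in mind with the re.sub approach: when the pattern does not match, re.sub performs no substitution and returns the input unchanged. The sketch below shows both cases; the second input string is made up for illustration.

```python
import re

pattern = r'.*AAA(.*)ZZZ.*'

# Markers present: the whole string is replaced by the captured group.
hit = re.sub(pattern, r'\1', 'gfgfdAAA1234ZZZuijjk')

# Markers absent: nothing is substituted, so the input comes back unchanged.
miss = re.sub(pattern, r'\1', 'no markers here')

print(repr(hit), repr(miss))
```

If you need to tell "no match" apart from a real result, checking the return value of re.search is probably the safer route.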

Reply #5

>>> s = 'gfgfdAAA1234ZZZuijjk'
>>> start = s.find('AAA') + 3
>>> end = s.find('ZZZ', start)
>>> s[start:end]
'1234'

Then you can use regexps with the re module as well, if you want, but that's not necessary in your case.
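A caveat with the find-based slicing above: str.find returns -1 for a missing marker, so the slice can silently return the wrong text. A minimal sketch of the pitfall follows, together with a variant built on str.index, which raises ValueError instead. The helper name `extract` and the 'XXX' marker are hypothetical.

```python
s = 'gfgfdAAA1234ZZZuijjk'

# str.find returns -1 for a missing marker, so the slice silently goes wrong.
bad_start = s.find('XXX') + 3          # -1 + 3 == 2
silently_wrong = s[bad_start:s.find('ZZZ', bad_start)]

def extract(text, before, after):
    """Slice out the text between the markers; raise ValueError if one is missing."""
    start = text.index(before) + len(before)
    end = text.index(after, start)
    return text[start:end]

print(silently_wrong)
print(extract(s, 'AAA', 'ZZZ'))
```

Using len(before) rather than a hard-coded 3 also keeps the offset correct if the marker changes.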

Reply #6

import re
# Non-greedy (.*?) stops at the first ZZZ after AAA
print(re.search(r'AAA(.*?)ZZZ', 'gfgfdAAA1234ZZZuijjk').group(1))
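The `?` in `(.*?)` matters when the markers occur more than once, and re.search returns None when nothing matches. The following sketch, using a made-up input string, shows non-greedy versus greedy matching, collecting every occurrence with re.findall, and guarding against a failed search.

```python
import re

s = 'AAA12ZZZ--AAA34ZZZ'

# Non-greedy: stops at the first ZZZ after the first AAA.
lazy = re.search(r'AAA(.*?)ZZZ', s).group(1)
# Greedy: runs on to the last ZZZ in the string.
greedy = re.search(r'AAA(.*)ZZZ', s).group(1)
# findall returns the captured group of every non-overlapping match.
all_parts = re.findall(r'AAA(.*?)ZZZ', s)

# re.search returns None when nothing matches, so check before calling group().
m = re.search(r'AAA(.*?)ZZZ', 'no markers here')
missing = m.group(1) if m else None

print(lazy, greedy, all_parts, missing)
```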
