How should I write this regex in Python

Latest recommended article published on 2021-05-17 10:38:57

贫民窟的大富翁

Tags: python, regex lookahead

Article link: https://blog.csdn.net/weixin_31092009/article/details/113642814

I have this string:

st = "12345 hai how r u @3456? Awer12345 7890"

re.findall('([0-9]+)',st)

It should not return:

['12345', '3456', '12345', '7890']

I should get:

['12345','7890']

I only want the purely numeric values. A match should not contain any other characters, such as letters or special characters.

Original: https://stackoverflow.com/questions/9216382

Updated: 2019-12-16 16:50

Best answer

In [21]: re.findall(r'(?:^|\s)(\d+)(?=$|\s)', st)

Out[21]: ['12345', '7890']

Here,

(?:^|\s) is a non-capturing group that matches the start of the string or a whitespace character.

(\d+) is a capturing group that matches one or more digits.

(?=$|\s) is a lookahead assertion that matches the end of the string or a whitespace character, without consuming it.
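As a minimal sketch, the pattern from the answer can be checked against the string from the question. The second pattern, built from a negative lookbehind and a negative lookahead on \S, is an alternative that does not appear in the original answer; it assumes that "standalone" means "delimited by whitespace or by the string boundaries".

```python
import re

st = "12345 hai how r u @3456? Awer12345 7890"

# Pattern from the answer: a number preceded by start-of-string or
# whitespace, and followed by end-of-string or whitespace.
answer_pattern = re.compile(r'(?:^|\s)(\d+)(?=$|\s)')
standalone = answer_pattern.findall(st)

# Alternative (an assumption, not part of the original answer):
# no non-whitespace character directly before or after the digits.
lookaround_pattern = re.compile(r'(?<!\S)\d+(?!\S)')
standalone_alt = lookaround_pattern.findall(st)

print(standalone)
print(standalone_alt)
```

Because the trailing whitespace is only inspected by the lookahead and not consumed, it stays available as the leading delimiter of the next number, so adjacent numbers such as "1 2 3" are all found.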

2012-02-09

Related Q&A

...

>>> import re

>>> regex = re.compile('(?:red,(?P\w+)|(?P\w+),red)')

>>> string = "blue,red red,yellow blue,yellow red,green purple red, ..."

>>> for matches in regex.finditer(string):

... if matches.group('redf

...

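If you are using Python 2.7, use Unicode strings. I assume your "what I need" example is incorrect, or do you really want AAAAA for طس? If you read the strings from a file, decode them to Unicode first.

#!python2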

#coding: utf8

import re

# Note leading u

data = u'TX 35-L|М-21|A 1 طس|US-50|yeni sinop-erfelek yolu çevre yolu|Av Antônio Ribeiro'.split('|')

...

>>> import re

>>> strs = 'The output is\n1) python\nA dynamic language\neasy to learn\n2) C++\ndifficult to learn\n3244) PHP\neay to learn\n'

>>> re.findall(r'\d+\)\s[^\d]+',strs)

['1) python\nA dynamic language\neasy to learn\n',

'2) C++\ndifficult t

...

(Technically, multiline strings != multiline comments, but that is beside the point.)

The regex (['"])\1\1(.*?)\1{3} should work, but make sure to use re.DOTALL.

(['"]) finds ' or " and captures it in \1

\1\1 finds two more of the same quote

(.*?) grabs everything until...

\1{3} finds three of the same quote
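The regex above can be tried with a short sketch like the following. The sample text in `source` and the variable names are made up for illustration; only the pattern and the re.DOTALL flag come from the answer.

```python
import re

# Hypothetical input containing one triple-double-quoted and one
# triple-single-quoted string; the first one spans two lines.
source = 'a = """first\nblock"""\nb = \'\'\'second\'\'\'\n'

# Group 1 captures the quote character, group 2 the string body.
# re.DOTALL lets "." match newlines so multi-line bodies are found.
triple_quoted = re.compile(r"""(['"])\1\1(.*?)\1{3}""", re.DOTALL)

bodies = [m.group(2) for m in triple_quoted.finditer(source)]
print(bodies)
```

Using finditer and group(2) returns only the string bodies; re.findall would instead return (quote, body) tuples because the pattern has two groups.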

...

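You can use

date ([a-z]{3} \d{2}) at (\d{2}) ([PA]M)

See the demo. Compare your two alternatives:

date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([P][M])

date ([a-z]{3} [0-9]{2}) at ([0-9]{2}) ([A][M])

Note how similar they are. We only need to add one alternative for PM or AM, which can be done with the character class [PA] that matches P or A. Instead of [0-9], you can use the shorthand class \d (it is a bit shorter :), and don't forget to ...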
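A small sketch of the combined pattern follows. The sample strings are invented for illustration, since the original question's input is not shown here.

```python
import re

# Combined pattern: [PA] accepts either P or A before the literal M.
pattern = re.compile(r'date ([a-z]{3} \d{2}) at (\d{2}) ([PA]M)')

# Hypothetical inputs; the last one should not match.
samples = [
    "date feb 09 at 10 PM",
    "date mar 15 at 07 AM",
    "date mar 15 at 07 XM",
]

results = []
for sample in samples:
    match = pattern.search(sample)
    # Store the captured groups, or None when there is no match.
    results.append(match.groups() if match else None)

print(results)
```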

...

A solution using regex non-capturing groups and the regex.findall function:

import regex

...

fh = open('lines.txt', 'r')  # assuming 'lines.txt' is your input file

commlines = fh.read()

_sched_wakeup_pattern = regex.compile(r"""

comm=(?P[\S]+?)

\spid=(?P\d+)

\spri

...

[^\W\d]

Throw out non-word characters and throw out digits. Keep the rest, that is, letters and the underscore.
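As a quick illustration, the class can be applied to a made-up string. The input and the `+` quantifier are assumptions added here to show runs of matching characters; the answer itself gives only the character class.

```python
import re

# Hypothetical input mixing letters, digits, underscore and punctuation.
text = "abc123_def 45!gh"

# [^\W\d] is a double negative: "not a non-word character and not a
# digit", which leaves letters and the underscore.
runs = re.findall(r'[^\W\d]+', text)
print(runs)
```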

Read the documentation. re.findall returns the groups, if there are any. If you want the entire match, you must wrap the whole pattern in a group, or use re.finditer. See this question.
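The behaviour described above can be seen in a short sketch. The input string and patterns are invented for illustration.

```python
import re

text = "10-20 30-40"  # hypothetical input

# No groups: findall returns the whole match.
whole = re.findall(r'\d+-\d+', text)

# One group: findall returns only that group's text.
one_group = re.findall(r'(\d+)-\d+', text)

# Several groups: findall returns a tuple of groups per match.
two_groups = re.findall(r'(\d+)-(\d+)', text)

# finditer yields match objects, so group(0) is always the whole match,
# regardless of how many groups the pattern contains.
via_finditer = [m.group(0) for m in re.finditer(r'(\d+)-(\d+)', text)]

print(whole, one_group, two_groups, via_finditer)
```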

Try using \[xxxcixxx\[\[_'.*?'\] \[_'.*?'\]\]xxxcixxx\]

Demo: http://regexr.com/3d887