Advanced Regular Expression Usage: Advanced Use of Regex Groups

Regular expressions are well worth learning. Essentially, they are a tool for parsing text.

They are available in almost every programming language, albeit with some variation in syntax. Regular expressions are therefore knowledge you can transfer between different technologies. This article will focus on the Python version of regex, the re module.

I am building on articles I have read by Aphinya Dechalert and Sadrach Pierre, Ph.D. Links to the respective articles are at the bottom.

Regular expressions can be applied to parse data that follows no formal format. Formats such as JSON or XML have dedicated parsers, which you would normally use. Sometimes, however, you run into an unfamiliar format, and in such a scenario regular expressions can come to your rescue. Furthermore, regular expressions are extremely useful for parsing logs.

Today I will present some advanced uses of groups. Groups are used to capture information from strings, and they are enclosed in parentheses ().

import re

m = re.search("(Trump)", "Trump will win the election")
m.groups()

('Trump',)

This is a simple example that looks for a word in a string. The pattern inside the parentheses defines the text that is to be captured.
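
To make the difference between the whole match and the captured groups concrete, here is a minimal sketch with two groups. The sample sentence and the variable names are my own assumptions for illustration, not part of the original example.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Order 1234 shipped on 03/07/20"

# Two capturing groups: the order number and the date.
m = re.search(r"Order (\d+) shipped on (\d{2}/\d{2}/\d{2})", text)

whole_match = m.group(0)   # everything the pattern matched
order_id = m.group(1)      # content of the first pair of parentheses
ship_date = m.group(2)     # content of the second pair of parentheses
all_groups = m.groups()    # tuple of all captured groups, without group 0

print(whole_match)
print(order_id, ship_date)
print(all_groups)
```

Note that group(0) is always the full match, while groups() only returns what the parentheses captured.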

Look-ahead and look-behind assertions

We are moving on to a more advanced use of groups. Let's say you want to find something that is preceded (or followed) by a certain sequence of characters, but you do not want to include that sequence in the match. Such a case calls for look-behind and look-ahead assertions.

txt = "Date: 03/07/20 We saw him on 04/05/20"
m = re.search(r"(?<=Date: )(\d{2}/\d{2}/\d{2})", txt)
m.groups()

('03/07/20',)

In this example we are trying to capture the date at the beginning of the string. Specifically, we are looking for the date that follows the term "Date: ". In the pattern above this is specified by the look-behind assertion (?<=Date: ) at the beginning. The text (txt) contains two dates, but only the first one is preceded by "Date: ", so it is the only one the look-behind allows to match. The assertion can also be negated by writing (?<!...) instead. Had we used this in place of the positive assertion, the pattern would look for dates that are not preceded by the sequence "Date: ". Furthermore, there are look-ahead assertions, which make the match contingent on the sequence that follows it. Look-ahead exists in a positive form, (?=...), and a negative form, (?!...).
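
The four assertion types described above can be compared side by side. This is only a sketch on the same txt string; the helper variable date and the choice of " We" as the look-ahead context are assumptions I made for the illustration.

```python
import re

txt = "Date: 03/07/20 We saw him on 04/05/20"
date = r"\d{2}/\d{2}/\d{2}"  # helper pattern, introduced for this sketch

# Positive look-behind: dates directly preceded by "Date: ".
preceded = re.findall(r"(?<=Date: )" + date, txt)

# Negative look-behind: dates NOT directly preceded by "Date: ".
not_preceded = re.findall(r"(?<!Date: )" + date, txt)

# Positive look-ahead: dates directly followed by " We".
followed = re.findall(date + r"(?= We)", txt)

# Negative look-ahead: dates NOT directly followed by " We".
not_followed = re.findall(date + r"(?! We)", txt)

print(preceded)      # ['03/07/20']
print(not_preceded)  # ['04/05/20']
print(followed)      # ['03/07/20']
print(not_followed)  # ['04/05/20']
```

In every case the asserted text itself ("Date: " or " We") is checked but never becomes part of the match.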

Non-capturing groups

There are also so-called non-capturing groups, written (?:...). Their content must be present in the string for the pattern to match, yet it is not captured as a group. You are probably thinking this looks similar to the look-behind and look-ahead assertions. You would be correct, as non-capturing groups can handle most tasks you would otherwise solve with the assertions.

txt = "Date: 03/07/20 We saw him on 04/05/20"
m = re.search(r"(?:Date: )(\d{2}/\d{2}/\d{2})", txt)
m.groups()

('03/07/20',)

Here a non-capturing group replaces the look-behind assertion. Otherwise the example is the same, and groups() returns the same result as well.
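
Although groups() agrees in both cases, the overall match does not. The following sketch compares group(0) for the two patterns; the variable names are mine and only serve the illustration.

```python
import re

txt = "Date: 03/07/20 We saw him on 04/05/20"

# Non-capturing group: "Date: " is consumed and is part of the overall match.
m_nc = re.search(r"(?:Date: )(\d{2}/\d{2}/\d{2})", txt)

# Look-behind: "Date: " is only checked, not consumed.
m_lb = re.search(r"(?<=Date: )(\d{2}/\d{2}/\d{2})", txt)

print(m_nc.groups(), m_lb.groups())  # identical captured groups
print(repr(m_nc.group(0)))           # 'Date: 03/07/20'
print(repr(m_lb.group(0)))           # '03/07/20'
```

This distinction matters for any function that works with the whole match rather than with the captured groups.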

There are, however, some differences when using other functions. Namely, the two constructs behave differently with match and sub. The sub function replaces the text matched by the pattern with a new string.

m1 = re.sub(r"(?:Date: )(\d{2}/\d{2}/\d{2})", "02/02/20", txt)
m2 = re.sub(r"(?<=Date: )(\d{2}/\d{2}/\d{2})", "02/02/20", txt)
print(m1)
print(m2)

02/02/20 We saw him on 04/05/20
Date: 02/02/20 We saw him on 04/05/20

As you can see, the two behave differently with the sub method. The text matched by the non-capturing group is part of the overall match, so it is replaced along with the date. Therefore, when you want to swap something that is preceded by specified symbols, you are better off using look-behind assertions.
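
As an aside that is not taken from the original article: if a look-behind is not an option, one possible workaround is to capture the prefix and put it back in the replacement with a group reference. The sketch below assumes the same txt string and uses the \g<1> syntax of re.sub.

```python
import re

txt = "Date: 03/07/20 We saw him on 04/05/20"

# Capture the prefix in group 1 and re-insert it in the replacement.
# \g<1> is used instead of \1 so the digits that follow are not
# misread as part of the group number.
result = re.sub(r"(Date: )\d{2}/\d{2}/\d{2}", r"\g<1>02/02/20", txt)

print(result)  # Date: 02/02/20 We saw him on 04/05/20
```

The result matches what the look-behind version produces, at the cost of a slightly more verbose replacement string.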

The match function only looks for the pattern at the beginning of the string. Hence, if the pattern starts with a look-behind assertion such as (?<=Date: ), there will be no match, because there are no characters before the start of the string for the assertion to check.
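
A short sketch of this behaviour, again on the same txt string (the variable names are assumptions made for the example):

```python
import re

txt = "Date: 03/07/20 We saw him on 04/05/20"

# match anchors at position 0, where nothing precedes the date pattern,
# so the look-behind can never be satisfied.
with_lookbehind = re.match(r"(?<=Date: )(\d{2}/\d{2}/\d{2})", txt)

# The non-capturing group consumes "Date: " starting at position 0,
# so match succeeds.
with_non_capturing = re.match(r"(?:Date: )(\d{2}/\d{2}/\d{2})", txt)

print(with_lookbehind)              # None
print(with_non_capturing.groups())  # ('03/07/20',)
```

With search instead of match, both patterns find the date, as shown in the earlier examples.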

Groups can also contain alternatives. The "|" operator matches either the expression on its left or the one on its right. An annoying limitation of look-behind assertions in Python's re module is that they must have a fixed width, so all alternatives inside one must be of equal length. This forces you to use non-capturing groups in some scenarios. For instance, suppose that in some texts the date we are looking for is preceded by the sequence "Start: ", while other texts use the sequence "Date: ".

m = re.search(r"(?<=Date: |Start: )(\d{2}/\d{2}/\d{2})", txt)

re.error: look-behind requires fixed-width pattern

An error is raised because look-behind assertions require fixed-width patterns, and "Date: " (6 characters) and "Start: " (7 characters) differ in length. Consequently we should use a non-capturing group here.
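
The sketch below shows the non-capturing version at work. The second sample string is hypothetical, invented to exercise the "Start: " prefix. As an additional option that the original article does not cover, it also tries two separate fixed-width look-behinds joined by "|", which the re module accepts because each assertion has a fixed width on its own.

```python
import re

# The second sample is a made-up string for illustration.
samples = [
    "Date: 03/07/20 We saw him on 04/05/20",
    "Start: 01/02/20 kickoff meeting",
]
date = r"(\d{2}/\d{2}/\d{2})"

# A single look-behind with alternatives of different width is rejected.
try:
    re.compile(r"(?<=Date: |Start: )" + date)
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Non-capturing group: alternatives may have any width.
non_capturing = re.compile(r"(?:Date: |Start: )" + date)
nc_results = [non_capturing.search(s).groups() for s in samples]

# Alternative sketch: one fixed-width look-behind per prefix.
split_lookbehind = re.compile(r"(?:(?<=Date: )|(?<=Start: ))" + date)
lb_results = [split_lookbehind.search(s).group(0) for s in samples]

print(lookbehind_ok)
print(nc_results)
print(lb_results)
```

The split look-behind variant keeps the prefix out of the overall match, which is useful with sub, while the non-capturing group is the simpler choice when only the captured group is needed.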

Conclusion

We took a look at two useful concepts in regular expressions: look-around assertions and non-capturing groups. We will continue this series with an article focusing on named groups.

Translated from: https://medium.com/dev-genius/advanced-use-of-regex-groups-147ebfcbb139
