Simple RegEx Tricks for Beginners

cumian9828

Original article: https://www.freecodecamp.org/news/simple-regex-tricks-for-beginners-3acb3fa257cb/

Always wanted to learn Regular Expressions but got put off by their complexity? In this article, I will show you five easy-to-learn RegEx tricks which you can start using immediately in your favorite text editor.

Text Editor Setup

Almost every text editor supports Regular Expressions now. I will use Visual Studio Code for this tutorial, but you can use any editor you like. Note that you usually need to turn on RegEx mode with a toggle near the search input. In VS Code, this is the `.*` (Use Regular Expression) button on the right side of the search box.

1) `.`: Match Any Character

Let’s start simple. The dot symbol `.` matches any single character except a line break:

b.t

The RegEx above matches "bot", "bat", and any other three-character sequence that starts with b and ends with t. If you want to search for a literal dot, you need to escape it with `\`, so this RegEx matches only the exact text "b.t":

b\.t
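If you want to check these two patterns outside the editor, here is a small TypeScript sketch. It runs them through JavaScript's built-in `RegExp`, the flavor this article uses. The sample words are made up for illustration:

```typescript
// Unescaped dot: any single character (except a line break) between b and t.
const anyChar = /b.t/;
// Escaped dot: only a literal "." between b and t.
const literalDot = /b\.t/;

// Hypothetical sample words, chosen to show where the two patterns differ.
const samples: string[] = ["bot", "bat", "b.t", "b9t", "bt"];

for (const word of samples) {
  console.log(word, anyChar.test(word), literalDot.test(word));
}
// bot true false
// bat true false
// b.t true true
// b9t true false
// bt false false
```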

2) `.*`: Match Anything

Here `.` means “any character” and `*` means “the preceding item, repeated any number of times (including zero).” Together, `.*` means “any character, any number of times.” You can use it, for example, to find matches that start with or end in some text. Let’s suppose we have a TypeScript method with the following signature:

loadScript(scriptName: string, pathToFile: string)

We want to find all calls of this method where `pathToFile` points to any file in the folder “lua”. You can use the following Regular Expression for this:

loadScript.*lua

Which means: “match all text starting with “loadScript”, followed by anything up to the last occurrence of “lua” on the same line.”

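The same search can be sketched in TypeScript. The three `loadScript` calls below are invented sample lines, not taken from a real project. Because `.` does not match a line break, each match stays within a single line:

```typescript
// Hypothetical source text: three calls, one per line.
const greedySource = [
  'loadScript("init", "lua/init.lua");',
  'loadScript("ui", "js/ui.js");',
  'loadScript("enemy", "lua/ai/enemy.lua");',
].join("\n");

// Greedy ".*" runs to the end of the line, then backs off to the LAST "lua".
// The second line contains no "lua", so it produces no match.
const greedyMatches: string[] = greedySource.match(/loadScript.*lua/g) ?? [];

for (const m of greedyMatches) {
  console.log(m);
}
// loadScript("init", "lua/init.lua
// loadScript("enemy", "lua/ai/enemy.lua
```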

3) `?`: Non-Greedy Match

The `?` symbol after `.*` (and after some other RegEx sequences) means “match as little as possible.” In the previous example, the text “lua” appears twice in every match, and everything up to the second “lua” was matched. If you want to match everything up to the first occurrence of “lua” instead, use the following RegEx:

loadScript.*?lua

Which means: “match everything starting with “loadScript”, followed by anything up to the first occurrence of “lua”.”

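Here is a TypeScript sketch that puts the greedy and non-greedy versions side by side. It uses a few invented `loadScript` calls and relies only on JavaScript's built-in `RegExp`:

```typescript
// Hypothetical source text: three calls, one per line.
const lazySource = [
  'loadScript("init", "lua/init.lua");',
  'loadScript("ui", "js/ui.js");',
  'loadScript("enemy", "lua/ai/enemy.lua");',
].join("\n");

// Greedy ".*" stops at the LAST "lua" on each line.
const greedyResult: string[] = lazySource.match(/loadScript.*lua/g) ?? [];
// Non-greedy ".*?" stops at the FIRST "lua" on each line.
const lazyResult: string[] = lazySource.match(/loadScript.*?lua/g) ?? [];

for (let i = 0; i < lazyResult.length; i++) {
  console.log("greedy:", greedyResult[i]);
  console.log("lazy:  ", lazyResult[i]);
}
// greedy: loadScript("init", "lua/init.lua
// lazy:   loadScript("init", "lua
// greedy: loadScript("enemy", "lua/ai/enemy.lua
// lazy:   loadScript("enemy", "lua
```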

4) `( )` and `$`: Capture Groups and Backreferences

Okay, now we can match some text. But what if we want to change parts of the text we found? We often have to make use of capture groups for that.

Let’s suppose we changed our `loadScript` method, and now it needs another argument inserted between its two existing arguments. Let’s name this new argument `id`, so the new function signature looks like this: `loadScript(scriptName, id, pathToFile)`. We can’t do this with the normal replace feature of our text editor, but a Regular Expression is exactly what we need.

First, let’s match every call of the method with the following Regular Expression:

loadScript\(.*?,.*?\)

Which means: “match everything starting with “loadScript(”, followed by anything up to the first “,”, then anything up to the first “)”.”

The only parts that might look strange here are the `\` symbols. They escape the parentheses.

We need to escape `(` and `)` because they are special characters that RegEx uses to capture parts of the matched text. Here, however, we want to match actual parenthesis characters.

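To see why the escapes matter, this TypeScript sketch runs the pattern with and without `\` in front of the parentheses on one invented call. Without the escapes, `(` and `)` form a group and no longer match the literal parentheses, so the match stops early:

```typescript
// Hypothetical call taken as sample input.
const callLine = 'loadScript("init", "lua/init.lua");';

// Escaped: "\(" and "\)" match literal parentheses, so the whole call matches.
const escaped = /loadScript\(.*?,.*?\)/;
// Unescaped: "(" and ")" create a capture group instead of matching parentheses.
const unescaped = /loadScript(.*?,.*?)/;

console.log(callLine.match(escaped)?.[0]);
// loadScript("init", "lua/init.lua")
console.log(callLine.match(unescaped)?.[0]);
// loadScript("init",
```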

In the previous RegEx, each of the two arguments of the method call is matched by `.*?`. Let’s make each argument a separate capture group by putting `(` and `)` around it:

loadScript\((.*?),(.*?)\)

If you run this RegEx, you will see that nothing changed, because it matches the same text. But now we can refer to the first argument as `$1` and to the second argument as `$2`. This is called a backreference, and it will help us do what we want: add another argument in the middle of the call:

Search input:

loadScript\((.*?),(.*?)\)

Which means the same thing as the previous RegEx but maps arguments to capture groups 1 and 2 respectively.

Replace input:

loadScript($1,id,$2)

Which means: “replace every match with the text “loadScript(”, followed by capture group 1, “,id,”, capture group 2, and “)”.” Note that you do not need to escape parentheses in the replace input.

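The whole search and replace can be sketched in TypeScript with `String.prototype.replace`, which accepts the same `$1` and `$2` syntax in its replacement string. The two calls are invented samples. Note that capture group 2 keeps the space that followed the comma in the original call:

```typescript
// Hypothetical source text before the refactoring.
const beforeRefactor = [
  'loadScript("init", "lua/init.lua");',
  'loadScript("enemy", "lua/ai/enemy.lua");',
].join("\n");

// Group 1 = first argument, group 2 = second argument (with its leading space).
const callPattern = /loadScript\((.*?),(.*?)\)/g;

// Inspect the capture groups of the first call (non-global copy of the pattern).
const firstCall = /loadScript\((.*?),(.*?)\)/.exec(beforeRefactor);
if (firstCall) {
  console.log(firstCall[1]); // "init" (the quotes are part of the captured text)
  console.log(firstCall[2]); // "lua/init.lua" preceded by a space
}

// "$1" and "$2" in the replacement refer to the captured groups;
// the parentheses in the replacement string are plain text.
const afterRefactor = beforeRefactor.replace(callPattern, "loadScript($1,id,$2)");

console.log(afterRefactor);
// loadScript("init",id, "lua/init.lua");
// loadScript("enemy",id, "lua/ai/enemy.lua");
```

If you prefer a space after `id`, put it in the replacement string, for example `loadScript($1, id,$2)`.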

5) `[ ]`: Character Classes

You can list the characters you want to match at a specific position by placing `[` and `]` around them. This is called a character class. For example, the class `[0-9]` matches any single digit from 0 to 9. You can also list all digits explicitly: `[0123456789]` has the same meaning. You can use a dash with letters too: `[a-z]` matches any lowercase Latin letter, `[A-Z]` matches any uppercase Latin letter, and `[a-zA-Z]` matches both.

You can also use `*` after a character class, just like after `.`. In this case it means “match any number of consecutive characters from this class.”

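Here is a short TypeScript sketch of character classes, with and without `*`, on a made-up sentence. It uses only JavaScript's built-in `RegExp`:

```typescript
// Hypothetical sample text.
const classSample = "Order 66 shipped to Room B12 on day 7";

// [0-9] matches exactly one digit, so the global search finds each digit separately.
const singleDigits: string[] = classSample.match(/[0-9]/g) ?? [];
// [0-9][0-9]* = one digit followed by any number of further digits: whole numbers.
const numbers: string[] = classSample.match(/[0-9][0-9]*/g) ?? [];
// [A-Z][a-z]* = one uppercase Latin letter followed by any number of lowercase ones.
const capitalized: string[] = classSample.match(/[A-Z][a-z]*/g) ?? [];

console.log(singleDigits.join(" ")); // 6 6 1 2 7
console.log(numbers.join(" "));      // 66 12 7
console.log(capitalized.join(" "));  // Order Room B
```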

Last Word

You should know that there are several RegEx flavors. The one I discussed here is the JavaScript RegEx engine. Most modern engines are similar, but there may be some differences, usually in escape characters and backreference syntax (for example, some tools write `\1` instead of `$1`).

I urge you to open your text editor and start using some of these tricks right now. You will see that you can now complete many refactoring tasks much faster than before. Once you are comfortable with these tricks, you can start digging deeper into regular expressions.

Thank you for reading my article to the end. Add claps if you found it useful and subscribe for more updates. I will publish more articles on regular expressions, javascript, and programming in general.
