Practicing Regular Expressions with Search and Replace


Original article: https://www.sitepoint.com/practicing-regular-expressions/

If you’re just starting out with regular expressions (regex), the syntax can seem a bit puzzling at first (I would recommend Jason Pasnikowski’s article as a good starting point). One of the things that make it difficult to grasp regex in the beginning is the small number of times you have a chance to use them in your code, which in turn limits the amount of practice you have using them. Professionals in any capacity, be it sports, entertainment, or development always practice – some practice more than others.

So how can you practice using regex if you are limited to just using them in your code? The answer is to use one of the many utilities that use regex for search and replace. I’m sure everyone is familiar with the standard “find x and replace it with y” type of search and replace. Most IDEs and text editors have built-in regex engines to handle search and replace. In this article I’d like to walk through a series of exercises to help you practice using regex.

I’ll be using NetBeans for this article. Some editors might have slightly different regex behavior than what you see here, so if you’re using something other than NetBeans and it doesn’t work quite as you’d expect, be sure to read the documentation for your specific editor.

Word Boundaries

Let’s use the following code to start with for our examples; I’ve crafted it specifically to illustrate particular caveats of search and replace as you progress.

<div id="navigation">
 <a href="divebomb.php" title="All About Divebombs">Divebombs</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="endives.php" title="All About Endives">Endives</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="indivisible.php" title="Indivisible by Zero">Indivisible Numbers</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="division.php" title="All About Division">Divison</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="skydiving.php" title="All About Skydiving">Skydiving</a>&nbsp;&nbsp;|&nbsp;&nbsp;
</div>

This navigation code should ideally be an unordered list, not free anchors inside div tags. You can’t just replace the word “div” with “ul”, however, because divebomb would become ulebomb, endives would become enules, etc. You also can’t use “<div” because it would miss the closing div tag. You can manually replace the div tags with ul tags, or you can use the special sequence \b, which denotes a word boundary.

In the Search field, type: \bdiv\b

In the Replace field, type: ul

This only replaces the text “div” that was delimited by word boundaries. Word boundaries allow you to perform whole-word-only searches, so the word “div” in <div id="navigation"> and </div> both get matched while the substrings in the anchors are left alone.

Later you’ll also see \w, which matches a single “word” character (a letter, digit, or underscore) and therefore never matches whitespace or punctuation.
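Outside an editor, the same word-boundary replacement can be tried in a few lines of code. The following is only a sketch using Python's standard re module, which is an assumption on my part since the article itself works in NetBeans; the shortened html string is sample data based on the navigation block above.

```python
import re

# Shortened sample of the navigation markup (illustrative data only).
html = '''<div id="navigation">
 <a href="divebomb.php" title="All About Divebombs">Divebombs</a>
 <a href="endives.php" title="All About Endives">Endives</a>
</div>'''

# A plain literal replacement also rewrites "div" inside longer words.
naive = html.replace("div", "ul")

# \b matches only at a word boundary, so just the standalone "div" changes.
bounded = re.sub(r"\bdiv\b", "ul", html)

print(naive)
print(bounded)
```

Comparing the two printed results shows why the boundary matters: the naive version damages the file names, while the bounded version touches only the opening and closing tags.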

Groupings and Back References

Continuing with the modified code from the first example, let’s continue refactoring the list. Right now your code should look like this:

<ul id="navigation">
 <a href="divebomb.php" title="All About Divebombs">Divebombs</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="endives.php" title="All About Endives">Endives</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="indivisible.php" title="Indivisible by Zero">Indivisible Numbers</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="division.php" title="All About Division">Divison</a>&nbsp;&nbsp;|&nbsp;&nbsp;
 <a href="skydiving.php" title="All About Skydiving">Skydiving</a>&nbsp;&nbsp;|&nbsp;&nbsp;
</ul>

You can easily do a standard search and replace on the anchor tags without any of the issues that prevented you from doing so with div, but where is the fun in that? In the spirit of practice, let’s use regex to wrap the anchors in li tags.

To select the anchors, type the following in the Search field: (<a.*>)

In the Replace field, type: <li>$1</li>

Ignoring the parentheses in the search pattern for now, let’s break up the pattern and discuss each piece of it. The first piece is <a, which tells the regex engine to match a less-than symbol followed by the letter a. The next piece is .*>, which tells the engine to match any character zero or more times, followed by a greater-than symbol. Because .* is greedy, the match runs to the last greater-than symbol on the line, so each match covers a whole anchor from <a through the closing </a> in the block of code above.

The parentheses in the search pattern perform a special function: they group parts of the match so you can access them later. By adding the parentheses, you are telling the regex engine to store the matched text because you’ll need it later. You can access these groups by number.

The replace pattern tells the engine to replace whatever the search pattern matched with an opening li tag, followed by the contents of the first grouping, and a closing li tag. In this example there is only one group (because there is only one set of parentheses), so the $1 in the middle of the li tags indicates this is the group you want to use. (Some editors may use \1 instead of $1. If $1 doesn’t work, then undo your replacement and try the other variant.)
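To see the group and its back reference at work outside the editor, here is a minimal sketch in Python (an assumed choice of language; the article uses NetBeans). Python's re.sub is one of the engines that writes the back reference as \1 rather than $1. The line variable is sample data copied from the navigation code.

```python
import re

# One line of the navigation code (sample data from the article).
line = ('<a href="divebomb.php" title="All About Divebombs">Divebombs</a>'
        '&nbsp;&nbsp;|&nbsp;&nbsp;')

# The parentheses capture the whole anchor; \1 re-inserts it
# between the new opening and closing li tags.
wrapped = re.sub(r"(<a.*>)", r"<li>\1</li>", line)

print(wrapped)
```

The trailing entities and pipe stay outside the li tags because the greedy .* stops at the last greater-than symbol on the line, which is the end of </a>.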

You can have multiple groups, and groups can be nested, which you’ll see in just a moment. You’re going to modify the patterns you just used to add the li tags in order to create a more robust navigation. Undo the replacements you’ve just made. Usually something like Ctrl+Z works just fine, but if it doesn’t, here are the search and replace patterns to revert the code:

In the Search field, type: <li>(<a.*>)</li>

In the Replace field, type: $1

Multiple Groupings

Alright, now let’s wrap the anchor tags in li tags complete with class and id attributes for use with CSS. To accomplish this, you’ll use the following:

Search: (<a.*>(\w+).*</a>)

Replace: <li class="navEntry" id="$2">$1</li>

As in the second example’s search pattern, <a.*> matches the opening anchor tag. You’re asking the regex engine to find a string that begins with a less-than symbol, followed by the letter a, followed by a series of zero or more characters that ends with a greater-than symbol. With \w+ you are also asking the engine to look for a sequence of one or more word characters (letters, digits, or underscores), which stops at the first whitespace or symbol character. The parentheses around \w+ indicate you want to store the match as a group. Next you added .* to the pattern to match any other characters that may appear before the closing of the anchor tag. The result is that $1 will have the matched anchor string, and $2 will have the first word of the link’s text.

Breaking down the replacement, you begin with the opening li tag and its class attribute and value, followed by the id attribute. Instead of providing a literal id value, however, you have $2. This tells the regex engine you want to use the content stored in the second group from the search pattern, which in this case is whatever \w+ matched. Then you close the opening li tag, tell the regex engine you want to use the first grouping ($1 is the entire anchor tag), and finally add the closing li tag.
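The two-group search and replace can be sketched in Python as well (again an assumed language, with \1 and \2 standing in for the editor's $1 and $2). The line variable is sample data from the navigation code, chosen because its link text has two words.

```python
import re

# A line whose link text has two words (sample data from the article).
line = ('<a href="indivisible.php" title="Indivisible by Zero">'
        'Indivisible Numbers</a>&nbsp;&nbsp;|&nbsp;&nbsp;')

pattern = r"(<a.*>(\w+).*</a>)"

# Group 1 is the whole anchor; group 2 is the first word of the link text.
match = re.search(pattern, line)
first_word = match.group(2)

# \2 supplies the id value and \1 re-inserts the anchor inside the li tags.
item = re.sub(pattern, r'<li class="navEntry" id="\2">\1</li>', line)

print(first_word)
print(item)
```

The engine first lets <a.*> run to the end of </a>, finds no word character there, and backtracks to the previous greater-than symbol, which is why the second group lands on the first word of the link text.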

Be careful when you are determining which groups to replace. Consider the following hypothetical example (I’ve used group names instead of patterns to illustrate how grouping works):

(group1(group2))(group3)

Using the above gives you the following results:

$1 = group1group2
$2 = group2
$3 = group3

$1 contains both group1 and group2 because parentheses enclose both of them. This is true even though group2 is a group by itself. And then of course group3 is a group to itself.
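The numbering rule can be checked directly by treating the hypothetical pattern as a real one and matching it against the literal text it spells out. This is a small sketch in Python, an assumed language choice; groups are numbered by the position of their opening parenthesis, counting from the left.

```python
import re

# The hypothetical pattern, matched against the literal string it describes.
match = re.fullmatch(r"(group1(group2))(group3)", "group1group2group3")

# groups() returns the captured text for groups 1, 2, and 3 in order.
captured = match.groups()

print(captured)  # ('group1group2', 'group2', 'group3')
```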

To finish cleaning things up, you can remove the non-breaking space entities and the pipe character from the end of the lines and replace them with an empty string (the pipe needs to be preceded by a backslash in the expression because it has special meaning to the engine).

Search: &nbsp;&nbsp;\|&nbsp;&nbsp;

Leave the Replace field empty.
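To illustrate why the pipe needs the backslash, here is a brief sketch in Python (an assumed language; the tail string is sample data from the end of a navigation line, and the search pattern is assumed to be the two entities, an escaped pipe, and two more entities). An unescaped pipe is read as alternation, so the pattern would mean "two entities OR two entities" and the pipe character itself would survive.

```python
import re

# The end of one navigation line (sample data).
tail = "</a>&nbsp;&nbsp;|&nbsp;&nbsp;"

# Escaped pipe: matches a literal "|" between the entities.
escaped = re.sub(r"&nbsp;&nbsp;\|&nbsp;&nbsp;", "", tail)

# Unescaped pipe: alternation between two identical branches,
# so the entities are removed but the "|" character is left behind.
unescaped = re.sub(r"&nbsp;&nbsp;|&nbsp;&nbsp;", "", tail)

print(escaped)    # </a>
print(unescaped)  # </a>|
```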

Your code should now look like this: a nice, neat, well-structured list you can style with CSS.

<ul id="navigation">
 <li class="navEntry" id="Divebombs"><a href="divebomb.php" title="All About Divebombs">Divebombs</a></li>
 <li class="navEntry" id="Endives"><a href="endives.php" title="All About Endives">Endives</a></li>
 <li class="navEntry" id="Indivisible"><a href="indivisible.php" title="Indivisible by Zero">Indivisible Numbers</a></li>
 <li class="navEntry" id="Divison"><a href="division.php" title="All About Division">Divison</a></li>
 <li class="navEntry" id="Skydiving"><a href="skydiving.php" title="All About Skydiving">Skydiving</a></li>
</ul>

Summary

Thanks for taking some time to learn a little bit more about regular expressions and practicing with them using search and replace. I encourage anyone who is struggling to grasp the concepts to practice using search and replace in their editor because it’s convenient and generally provides immediate visual feedback. If necessary, you can copy and paste the content you’re working with into a blank file and experiment with it, running replacements and undoing them, until you get what you like.
