Basic Regular Expression Symbols: How to Use Basic Regular Expressions to Search Better and Save Time

Whether you’ve been searching with Grep or looking at programs that can batch rename files for you, you’ve probably wondered if there was an easier way to get your job done. Thankfully, there is, and it’s called “regular expressions.”

(Comic from XKCD.com)

What are Regular Expressions?

Regular expressions are patterns written in a very specific syntax that can stand for many different strings. Also known as “regex” or “regexp,” they are primarily used in search and file naming functions. One regex works like a formula that describes a whole set of possible matches, and the program searches for all of them at once. Alternatively, you can specify how a group of files should be renamed by giving a regex, and your software applies it to each file in turn. This way, you can rename multiple files in multiple folders very easily and efficiently, and you can move beyond the limitations of a simple numbering system.
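The following minimal sketch makes the renaming idea concrete using Python’s standard re module. The file names and the holiday_ prefix are hypothetical. The script only computes the new names and does not rename anything on disk.

```python
import re

# Hypothetical file names; nothing on disk is touched.
files = ["IMG_0001.jpg", "IMG_0002.jpg", "notes.txt", "IMG_0010.JPG"]

# One pattern describes every name we want to change: "IMG_", one or more
# digits (captured as group 1), then ".jpg" in any letter case.
pattern = re.compile(r"^IMG_([0-9]+)\.jpg$", re.IGNORECASE)

def new_name(name: str) -> str:
    # \1 in the replacement re-inserts the captured digits.
    return pattern.sub(r"holiday_\1.jpg", name)

# Only files that match the pattern get a new name.
renames = {old: new_name(old) for old in files if pattern.search(old)}

for old, new in renames.items():
    print(f"{old} -> {new}")
```

Here, notes.txt is left alone because it does not fit the pattern.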

Because the use of regular expressions relies on a special syntax, your program must be capable of reading and parsing them. Many batch file renaming programs for Windows and OS X support regexps. So do the cross-platform search tool grep (which we touched on in our Bash Scripting for Beginners Guide) and the Awk command-line tool for *Nix. In addition, many alternative file managers, launchers, and search tools use them, and they have a very important place in programming languages like Perl and Ruby. Other development environments like .NET, Java, and Python, as well as C++11, all provide standard libraries for using regular expressions. As you can imagine, they can be really useful when trying to minimize the amount of code you put into a program.

A Note About Escaping Characters

Before we show you with examples, we’d like to point something out. We’re going to be using the bash shell and the grep command to show you how to apply regular expressions. The problem is that sometimes we want to use special characters that need to be passed to grep, and the bash shell will interpret that character because the shell uses it as well. In these circumstances, we need to “escape” these characters. This can get confusing because this “escaping” of characters also occurs inside regexps. For example, if we want to enter this into grep:

\<

we’ll have to replace that with:

\\\<

Each character that bash would otherwise interpret (here, both the backslash and the <) gets its own escaping backslash. Alternatively, you can also use single quotes:

'\<'

Single quotes tell bash NOT to interpret what’s inside of them. We need these extra steps only because we are demonstrating on the command line. Your programs (especially GUI-based ones) often won’t require them. To keep things simple and straightforward, the actual regular expression will be given to you as quoted text, and you’ll see the escaped syntax in the command-line screenshots.
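One way to check what a command line actually hands to grep is Python’s shlex module, which splits a string using POSIX shell quoting rules. This is only an approximation of bash (for instance, shlex.split does not treat < as a redirection), but it reproduces the quoting behavior described above.

```python
import shlex

# Three ways of typing the same grep command at a POSIX-style shell.
commands = [
    r"grep \< file.txt",    # the shell consumes the single backslash
    r"grep \\\< file.txt",  # \\ becomes \ and \< becomes <, leaving \<
    r"grep '\<' file.txt",  # single quotes pass the text through untouched
]

# The second word of each command is the pattern grep would receive.
received = [shlex.split(cmd)[1] for cmd in commands]

for cmd, pattern in zip(commands, received):
    print(f"{cmd:22} -> grep receives: {pattern}")
```

Only the last two commands deliver the two characters \< to grep.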

How Do They Expand?

Regexps are a really concise way of stating terms so that your computer can expand them into multiple options. Let’s take a look at the following example:

tom[0123456789]

The square brackets, [ and ], tell the parsing engine that any ONE of the characters inside them may be used to match at that position. Whatever is inside those brackets is called a character set.

So, if we had a huge list of entries and we used this regex to search, the following terms would be matched:

  • tom0
  • tom1
  • tom2
  • tom3

and so on. However, the following terms would NOT be matched, and so would NOT show up in your results:

  • tom; there is no character at all after “tom”, and the set requires exactly one digit
  • tomato; the character after “tom” is a letter, not a digit
  • Tom; the regex is case sensitive!
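The same test can be sketched with Python’s re module. re.search() looks for the pattern anywhere in a string, which mirrors how grep decides whether to print a line. The sample lines are hypothetical.

```python
import re

# The character set [0123456789] stands for exactly ONE digit.
# ([0-9] is an equivalent, shorter way to write the same set.)
pattern = re.compile(r"tom[0123456789]")

lines = ["tom", "tom0", "tom1", "tom2", "tom3", "tomato", "Tom"]

# Keep only the lines where the pattern matches somewhere, as grep would.
matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['tom0', 'tom1', 'tom2', 'tom3']
```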

You can also search with a period (.), which matches any single character, as long as there is a character present in that position.

As you can see, grepping with

.tom

did not bring up lines where “tom” appears only at the very beginning. Even “green tomatoes” came in, because the space before “tom” counts as a character. Terms like “tomF” have no character before “tom” and were thus ignored.
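Here is the same experiment as a short Python sketch. The sample lines are made up to mirror the ones described above, with “atom” added as another line that matches.

```python
import re

# "." stands for any single character, so ".tom" needs SOME character
# directly in front of "tom".
pattern = re.compile(r".tom")

lines = ["tom", "tomF", "atom", "green tomatoes", "Tom"]

matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['atom', 'green tomatoes']
```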

Note: Grep’s default behavior is to return the whole line of text when some part of it matches your regex. Other programs may not do this. In grep, you can add the -o flag to print only the matching part of each line instead.
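In Python terms, this is roughly the difference between re.search(), which only tells you whether a line matches, and re.findall(), which returns just the matched pieces, much like grep -o. A minimal sketch with a made-up line:

```python
import re

pattern = re.compile(r".tom")
line = "green tomatoes and a red tomato"

# Default grep behavior: print the whole line if any part of it matches.
if pattern.search(line):
    print(line)

# grep -o behavior: print only the matched pieces, one per line.
pieces = pattern.findall(line)
for piece in pieces:
    print(repr(piece))  # repr() makes the leading space visible
```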

You can also specify alternation using a pipe (|), like here:

speciali(s|z)e

This will find both:

  • specialise
  • specialize

When using the grep command from bash, we need to escape the special characters (, |, and ) with backslashes. We also need the -E flag to get this to work and avoid ugly errors.

As we mentioned above, the backslashes are needed because we have to tell the bash shell to pass these characters to grep and not do anything with them itself. The -E flag tells grep to use extended regular expressions, in which the parentheses and pipe are special characters.
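Python’s re module treats parentheses and the pipe as special by default, much like grep -E, so the pattern needs no extra escaping there. The following sketch uses hypothetical words. Note that “specialised” also matches, because the pattern only has to occur somewhere in the string.

```python
import re

# (s|z) is a group offering two alternatives: "s" or "z".
pattern = re.compile(r"speciali(s|z)e")

words = ["specialise", "specialize", "specialice", "specialised"]

matches = [w for w in words if pattern.search(w)]
print(matches)  # ['specialise', 'specialize', 'specialised']

# fullmatch() demands that the whole string fits the pattern.
exact = [w for w in words if pattern.fullmatch(w)]
print(exact)  # ['specialise', 'specialize']
```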

You can search by exclusion by placing a caret (^) inside the square brackets, as the very first character of the set:

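tom[^F0-9]

Here 0-9 is a range covering every digit. Inside square brackets, a pipe is not an “or” but just another literal character, so writing the set as [^F|0-9] would also exclude a literal “|”. Listing F next to the 0-9 range is enough. If you’re using grep and bash, wrap the pattern in single quotes so bash does not try to treat the brackets as a filename pattern.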

Terms that were in the list but did NOT show up are:

  • tom0
  • tom5
  • tom9
  • tomF

These did not match our regex.
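The following Python sketch applies the exclusion pattern to hypothetical lines. The pattern is written without a pipe, which inside brackets would only exclude a literal “|”. The sketch also shows that a negated set still needs one character to be present, so a bare “tom” does not match either.

```python
import re

# [^F0-9] matches one character that is neither "F" nor a digit.
pattern = re.compile(r"tom[^F0-9]")

lines = ["tom0", "tom5", "tom9", "tomF", "tomato", "tomb", "tom"]

matches = [line for line in lines if pattern.search(line)]
print(matches)  # ['tomato', 'tomb']
```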

How Can I Search Based on Boundaries?

Often, we search based on boundaries. Sometimes we only want strings that appear at the beginning of a word, at the end of a word, or at the beginning or end of a line. This can easily be done using what we call anchors.

Using a caret (outside of brackets) allows you to designate the “beginning” of a line.

^tom

To search for the end of a line, use the dollar sign.

tom$

You can see that our search string comes BEFORE the anchor in this case.
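You can try both anchors in Python, where ^ and $ have the same meaning when each line is tested separately, as grep does. The sample lines are hypothetical.

```python
import re

lines = ["tom went home", "this is tom", "atom", "tomato"]

# ^ pins the match to the start of the line, $ to its end.
starts_with_tom = [line for line in lines if re.search(r"^tom", line)]
ends_with_tom = [line for line in lines if re.search(r"tom$", line)]

print(starts_with_tom)  # ['tom went home', 'tomato']
print(ends_with_tom)    # ['this is tom', 'atom']
```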

You can also search for matches that appear at the beginning or end of words, not whole lines.

\<tom

tom\>

As we mentioned in the note at the beginning of this article, we need to escape these special characters because we’re using bash, which means typing \\\<tom and tom\\\> at the prompt. Alternatively, you can also use single quotes:

'\<tom'

'tom\>'

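The results are the same. Make sure you use single quotes, and not double quotes. Inside double quotes, bash still interprets some characters (such as $, backquotes, and certain backslash sequences), so single quotes are the safest choice.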
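Python’s re module does not treat \< and \> as word anchors. However, its \b word boundary placed before or after a word character gives the same result, so \btom behaves like \<tom and tom\b like tom\>. A sketch with hypothetical lines:

```python
import re

lines = ["tom", "tomato", "atom", "my tomcat", "phantom menace"]

# \b marks a boundary between a word character and a non-word character
# (or the start/end of the string).
word_start = [line for line in lines if re.search(r"\btom", line)]  # like '\<tom'
word_end = [line for line in lines if re.search(r"tom\b", line)]    # like 'tom\>'

print(word_start)  # ['tom', 'tomato', 'my tomcat']
print(word_end)    # ['tom', 'atom', 'phantom menace']
```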

Other Resources For Advanced Regexps

We’ve only hit the tip of the iceberg here. You can also search for monetary amounts marked by a currency symbol, and search for any one of three or more alternative terms. Things can get really complicated. If you’re interested in learning more about regular expressions, then please take a look at the following sources.
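As a taste of both ideas, here is a Python sketch that pulls dollar amounts out of a made-up sentence and matches any one of three words. It uses some syntax this article does not cover: the quantifiers +, ?, and {2}, and Python’s non-capturing group (?:...).

```python
import re

text = "Lunch was $12.50, coffee $3, and the book cost $24.99."

# \$ is a literal dollar sign (a bare $ would mean "end of line").
# [0-9]+ is one or more digits; (?:\.[0-9]{2})? is an optional cents part.
# (?:...) groups without capturing, so findall() returns whole matches.
prices = re.findall(r"\$[0-9]+(?:\.[0-9]{2})?", text)
print(prices)  # ['$12.50', '$3', '$24.99']

# Three alternatives in one group; \b keeps them from matching inside longer words.
fruit = re.findall(r"\b(?:apple|pear|plum)\b", "apple, pear, plum, peach, pineapple")
print(fruit)  # ['apple', 'pear', 'plum']
```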

  • Zytrax.com has a few pages with specific examples of why things do and do not match.

  • Regular-Expressions.info also has a killer guide to a lot of the more advanced stuff, as well as a handy reference page.

  • Gnu.org has a page dedicated to using regexps with grep.

You can also build and test your regular expressions using a free online tool called RegExr (originally Flash-based, now a regular web app). It updates the results as you type and can be used in most browsers.

Do you have a favorite use for regular expressions? Know of a great batch renamer that uses them? Maybe you just want to brag about your grep-fu. Contribute your thoughts by commenting!

Translated from: https://www.howtogeek.com/69451/how-to-use-basic-regular-expressions-to-search-better-and-save-time/
