Regular Expressions for the Rest of Us

Sooner or later you'll run across a regular expression. With their cryptic syntax, confusing documentation and massive learning curve, most developers settle for copying and pasting them from StackOverflow and hoping they work. But what if you could decode regular expressions and harness their power? In this article, I'll show you why you should take a second look at regular expressions, and how you can use them in the real world.

Why Regular Expressions?

Why bother with regular expressions at all? Why should you care?

  • Matching: Regular expressions are great at determining if a string matches some format, such as a phone number, email or credit card number.

  • Replacement: Regular expressions make it easy to find and replace patterns in a string. For example, text.replace(/\s+/g, " ") replaces all chunks of whitespace in text, such as " \n\t ", with a single space.

  • Extraction: It's easy to extract pieces of information from a pattern with regular expressions. For example, name.match(/^(Mr|Ms|Mrs|Dr)\.?\s/i)[1] extracts a person's title from a string, such as "Mr" from "Mr. Schropp".

  • Portability: Almost every major language has a regular expression library. The syntax is mostly standardized, so you don't have to worry about relearning regexes when you switch languages.

  • Coding: When writing code, you can use regular expressions to search through files with tools such as find and replace in Atom or ack in the command line.

  • Clear and Concise: If you're comfortable with regular expressions, you can perform some pretty tricky operations with a very small amount of code.

  • Fame and Glory: Regular expressions will give you superpowers.
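
To make the first three bullets concrete, here's a minimal sketch of matching, replacement and extraction in JavaScript. The sample strings and the variable names (hasTitle, collapsed, title) are made up for illustration.

```javascript
// Matching: does the string start with a title such as "Dr" or "Mrs."?
const hasTitle = /^(Mr|Ms|Mrs|Dr)\.?\s/i.test("Dr Who");

// Replacement: collapse every run of whitespace into a single space.
const collapsed = "too   many \n\t spaces".replace(/\s+/g, " ");

// Extraction: capture group 1 holds whichever title matched.
const title = "Mr. Schropp".match(/^(Mr|Ms|Mrs|Dr)\.?\s/i)[1];

console.log(hasTitle, collapsed, title);
```

Keep in mind that match returns null when nothing matches, so real code should check the result before indexing into it.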

How to Write Regular Expressions

The best way to learn regular expressions is by using an example. Let's say you're building a web page with a phone number input. Because you're a rockstar developer, you decide to display a checkmark when the phone number is valid and an X when it's invalid.

See the Pen Regular Expression Demo by Landon Schropp (@LandonSchropp) on CodePen.

<input id="phone-number" type="text">
<label class="valid" for="phone-number"><img src="check.svg"></label>
<label class="invalid" for="phone-number"><img src="x.svg"></label>


input:not([data-validation="valid"]) ~ label.valid,
input:not([data-validation="invalid"]) ~ label.invalid {
  display: none;
}


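$("input").on("input blur", function(event) {
  // A valid number shows the checkmark right away, even while typing.
  if (isPhoneNumber($(this).val())) {
    $(this).attr({ "data-validation": "valid" });
    return;
  }

  // Only flag the value as invalid once the user leaves the field.
  if (event.type === "blur") {
    $(this).attr({ "data-validation": "invalid" });
  }
  else {
    // Still typing: hide both icons.
    $(this).removeAttr("data-validation");
  }
});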


With the above code, whenever a person types or pastes a valid number into the input, the check image is displayed. When the user blurs the input and the value is invalid, the error X is displayed.

Since you know that phone numbers are made up of ten digits, your first pass at isPhoneNumber looks like this:

function isPhoneNumber(string) {
  return /\d\d\d\d\d\d\d\d\d\d/.test(string);
}


This function contains a regular expression literal, written between the / characters, made up of ten \d tokens, each of which matches one digit. The test method returns true if the regex matches the string and false if it doesn't. If you run isPhoneNumber("5558675309"), it returns true! Woohoo!
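
If you'd like to poke at this first version yourself, a quick sketch like the following works. The sample inputs and the results variable are made up for illustration.

```javascript
function isPhoneNumber(string) {
  // Ten \d tokens in a row: ten consecutive digits somewhere in the string.
  return /\d\d\d\d\d\d\d\d\d\d/.test(string);
}

// One valid number, one that's too short, and one with no digits at all.
const results = ["5558675309", "555867", "call me"].map(isPhoneNumber);

console.log(results);
```

Only the first input contains ten digits in a row, so only the first result is true.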

However, writing ten \d's is a little redundant. Luckily, you can use curly braces to accomplish the same thing.

function isPhoneNumber(string) {
  return /\d{10}/.test(string);
}


Sometimes, when people type in phone numbers, they start with a leading 1. Wouldn't it be nice if your regex could handle those cases? It can, with the ? character!

function isPhoneNumber(string) {
  return /1?\d{10}/.test(string);
}


The ? symbol means zero or one, so now isPhoneNumber returns true for both "5558675309" and "15558675309"!
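
One way to see what the ? actually changes is to look at the text each pattern matches rather than just the true/false result. This is only a sketch, and the variable names are made up for illustration.

```javascript
const input = "15558675309";

// Without the optional 1, the regex grabs the first ten digits it finds.
const withoutOptional = input.match(/\d{10}/)[0];

// With 1?, the leading 1 is consumed first, then the ten digits after it.
const withOptional = input.match(/1?\d{10}/)[0];

// The number still matches when the leading 1 is absent.
const noLeadingOne = /1?\d{10}/.test("5558675309");

console.log(withoutOptional, withOptional, noLeadingOne);
```

Because neither pattern is tied to the start or end of the string, both versions find a match in the eleven-digit input. The difference is that 1?\d{10} covers the whole number, while \d{10} stops one digit short.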

So far, isPhoneNumber is pretty good, but you're missing one key thing: regexes are more than happy to match just part of a string. As it stands, isPhoneNumber("555555555555555555") returns true because that string contains a run of ten digits. You can fix this problem by using the ^ and $ anchors.

function isPhoneNumber(string) {
  return /^1?\d{10}$/.test(string);
}


Roughly, ^ matches the beginning of the string and $ matches the end, so now your regex will match the whole phone number.
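
Here's a small side-by-side sketch of the anchored and unanchored patterns. The variable names are made up for illustration.

```javascript
const unanchored = /1?\d{10}/;
const anchored = /^1?\d{10}$/;
const tooLong = "555555555555555555"; // eighteen digits

// The unanchored pattern is satisfied by any ten-digit run inside the string.
const unanchoredResult = unanchored.test(tooLong);

// The anchored pattern has to account for every character, start to finish.
const anchoredResult = anchored.test(tooLong);
const stillValid = anchored.test("15558675309");

console.log(unanchoredResult, anchoredResult, stillValid);
```

The eighteen-digit string passes the unanchored check but fails the anchored one, while a well-formed eleven-digit number still passes.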

Getting Serious

You released your page, and it's a smashing success, but there's one major problem. In the U.S., there are many common ways to write a phone number:

  • (234) 567-8901

  • 234-567-8901

  • 234.567.8901

  • 234/567-8901

  • 234 567 8901

  • +1 (234) 567-8901

  • 1-234-567-8901

While your users could leave out the punctuation, it's much easier for them to type out a formatted number.

While you could write a regular expression to handle all of those formats, it's probably a bad idea. Even if you nail every format in this list, it's very easy to miss one. Besides, you really only care about the data, not how it's formatted. So, instead of worrying about punctuation, why not strip it out?

function isPhoneNumber(string) {
  return /^1?\d{10}$/.test(string.replace(/\D/g, ""));
}


The replace function replaces every match of \D, which matches any non-digit character, with an empty string. The g, or global, flag tells replace to replace all matches of the regular expression instead of just the first.
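
To see the stripping step on its own, here's a sketch that runs it over the formats listed earlier. The variable names are made up for illustration.

```javascript
const formats = [
  "(234) 567-8901",
  "234-567-8901",
  "234.567.8901",
  "234/567-8901",
  "234 567 8901",
  "+1 (234) 567-8901",
  "1-234-567-8901"
];

// With the g flag, every non-digit character is removed.
const digitsOnly = formats.map(function(format) {
  return format.replace(/\D/g, "");
});

// Without the g flag, only the first non-digit character is removed.
const firstOnly = "234-567-8901".replace(/\D/, "");

console.log(digitsOnly, firstOnly);
```

Every format collapses to either 2345678901 or 12345678901, which is exactly what the anchored regex expects.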

Getting Even More Serious

Everybody loves your phone number page, and you're the king of the water cooler at work. However, being the pro that you are, you want to take things one step further.

The North American Numbering Plan is the phone number standard used in the U.S., Canada, and twenty-three other countries. This system has a few simple rules:

  1. A phone number ((234) 567-8901) is broken up into three pieces: The area code (234), the exchange code (567) and the subscriber number (8901).

  2. For the area code and exchange code, the first digit can be 2 through 9 and the second and third digits can be 0 through 9.

  3. The exchange code cannot have 1 as the third digit if 1 is also the second digit.

Your regex already works for the first rule, but it breaks the second and third. For now, let's only worry about the second rule. The new regular expression needs to look something like the following:

/^1?<AREA CODE><EXCHANGE CODE><SUBSCRIBER NUMBER>$/


The subscriber number is easy; it's four digits.

/^1?<AREA CODE><EXCHANGE CODE>\d{4}$/


The area code is a little trickier. You need a digit between 2 and 9, followed by two more digits. To accomplish that, you can use a character set! A character set lets you specify a group of characters to choose from.

/^1?[23456789]\d\d<EXCHANGE CODE>\d{4}$/


That's great, but it's annoying to type out all the characters between 2 and 9. Clean it up with a character range.

/^1?[2-9]\d\d<EXCHANGE CODE>\d{4}$/


That's better! Since the exchange code follows the same pattern as the area code, you could duplicate that part of your regex to finish off the number.

/^1?[2-9]\d\d[2-9]\d\d\d{4}$/


But wouldn't it be nice if you didn't have to copy and paste the area code section of your regex? You can simplify it by using a group! Groups are formed by wrapping part of the pattern in parentheses.

/^1?([2-9]\d\d){2}\d{4}$/


Now, [2-9]\d\d is contained in a group and {2} specifies that that group should occur twice.
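
A few quick checks show the group doing its job. This is a sketch with made-up digit strings, and the variable names are for illustration only.

```javascript
const pattern = /^1?([2-9]\d\d){2}\d{4}$/;

const valid = pattern.test("2345678901");            // area 234, exchange 567
const withCountryCode = pattern.test("12345678901"); // same number, leading 1
const badAreaCode = pattern.test("0345678901");      // area code starts with 0
const badExchange = pattern.test("2341678901");      // exchange starts with 1

console.log(valid, withCountryCode, badAreaCode, badExchange);
```

The first two checks pass. The last two fail because each repetition of the group has to start with a digit from 2 through 9.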

That's it! Here's what the final isPhoneNumber function looks like:

function isPhoneNumber(string) {
  return /^1?([2-9]\d\d){2}\d{4}$/.test(string.replace(/\D/g, ""));
}
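
The function above stops at the second rule. If you ever wanted to cover the third rule as well, one possible approach is a negative lookahead, which rejects an exchange code whose second and third digits are both 1. The sketch below is an assumption about how you might do it, not part of the demo, and the name isStrictPhoneNumber is made up for illustration.

```javascript
function isStrictPhoneNumber(string) {
  // Strip the punctuation first, exactly as isPhoneNumber does.
  const digits = string.replace(/\D/g, "");

  // (?!\d11) looks ahead without consuming anything and fails the match
  // if the next three digits (the exchange code) end in 11.
  // The group is written out twice because the lookahead only applies
  // to the exchange code, not the area code.
  return /^1?[2-9]\d\d(?!\d11)[2-9]\d\d\d{4}$/.test(digits);
}

const ordinary = isStrictPhoneNumber("(234) 567-8901");
const emergency = isStrictPhoneNumber("(234) 911-8901");
const onlyThirdDigit = isStrictPhoneNumber("234-551-8901");

console.log(ordinary, emergency, onlyThirdDigit);
```

Whether the extra strictness is worth it is a judgment call: the regex gets harder to read, and it still can't tell you whether a number is real.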


When to Avoid Regular Expressions

Regular expressions are great, but there are some problems you just shouldn't tackle with them.

  • Don't be too strict. There's little value in being too strict with regular expressions. For phone numbers, even if we did match all of the rules in NANP, there's still no way to know if a phone number is real. If I rattled off the number (555) 555-5555, it matches the pattern but it's not a real phone number.

  • Don't write an HTML parser. While it's fine to use regexes to parse simple things, they're not useful for parsing entire languages. Without getting too technical, you're not going to have a good time parsing non-regular languages with regular expressions.

  • Don't use them for really complicated strings. The full regex for emails is 6,318 characters long. A simple, imperfect one looks like this: /^[^@]+@[^@]+\.[^@\.]+$/. As a general rule of thumb, if your regular expression is longer than a line of code, it might be time to look for another solution.
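
To get a feel for what "simple, imperfect" means for the email pattern, here's a sketch that runs it against a few strings. The addresses and variable names are made up for illustration.

```javascript
const simpleEmail = /^[^@]+@[^@]+\.[^@\.]+$/;

const typical = simpleEmail.test("landon@example.com");
const missingDot = simpleEmail.test("landon@example");     // no dot after the @
const twoAtSigns = simpleEmail.test("a@b@example.com");    // more than one @
const withSpaces = simpleEmail.test("not an email@x.y");   // spaces slip through

console.log(typical, missingDot, twoAtSigns, withSpaces);
```

The pattern catches the obvious mistakes but happily accepts an address with spaces in it, which is the kind of trade-off you accept in exchange for a regex that fits on one line.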

Wrapping Up

In this article, you've learned when to use regular expressions and when to avoid them, and you've experienced the process of writing one. Hopefully regular expressions seem a bit less ominous, and maybe even intriguing. If you use a regex to solve a tricky problem, let me know in the comments!

If you'd like to read more about regular expressions, check out the excellent MDN Regular Expressions Guide.

Original article: https://davidwalsh.name/regular-expressions-rest
