A quick and simple guide to JavaScript Regular Expressions

Interested in learning JavaScript? Get my ebook at jshandbook.com

Introduction to Regular Expressions

A regular expression (also called regex for short) is a fast way to work with strings of text.

By formulating a regular expression with a special syntax, you can:

  • search for text in a string
  • replace substrings in a string
  • and extract information from a string

Almost every programming language features some implementation of regular expressions. There are small differences between implementations, but the general concepts apply almost everywhere.

Regular expressions date back to the 1950s, when they were formalized as a conceptual search pattern for string processing algorithms.

Implemented in UNIX tools like grep and sed, and in popular text editors, regexes grew in popularity. They were introduced into the Perl programming language, and later into many others as well.

JavaScript, along with Perl, is one of the programming languages that has support for regular expressions built directly into the language.

Hard but useful

Regular expressions can seem like absolute nonsense to the beginner, and many times to the professional developer as well, if you don’t invest the time necessary to understand them.

Cryptic regular expressions are hard to write, hard to read, and hard to maintain or modify.

But sometimes a regular expression is the only sane way to perform some string manipulation, so it’s a very valuable tool in your pocket.

This tutorial aims to introduce you to JavaScript Regular Expressions in a simple way, and to give you all the information you need to read and create regular expressions.

The rule of thumb is that simple regular expressions are simple to read and write, while complex regular expressions can quickly turn into a mess if you don’t deeply grasp the basics.

What does a Regular Expression look like?

In JavaScript, a regular expression is an object, which can be defined in two ways.

The first is by instantiating a new RegExp object using the constructor:

const re1 = new RegExp('hey')

The second is using the regular expression literal form:

const re1 = /hey/

You know that JavaScript has object literals and array literals? It also has regex literals.

In the example above, hey is called the pattern. In the literal form it’s delimited by forward slashes, while with the object constructor it’s not: the pattern is passed as a plain string.

This is the first important difference between the two forms, but we’ll see others later.
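One more difference between the two forms can be sketched right away. With the constructor the pattern is an ordinary string, so every backslash has to be doubled, but in exchange the pattern can be assembled at runtime. The variable names below are only illustrative.

```javascript
// Literal form: backslashes are written once.
const literal = /\d+/

// Constructor form: the pattern is a string, so the backslash is escaped.
const constructed = new RegExp('\\d+')

// Both objects hold the same pattern.
console.log(literal.source === constructed.source) // true

// The constructor helps when part of the pattern is only known at runtime.
const word = 'hey' // hypothetical input
const dynamic = new RegExp('^' + word + '$')
console.log(dynamic.test('hey')) // true
console.log(dynamic.test('hey you')) // false
```

When the pattern is fixed, the literal form is usually easier to read.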

How does it work?

The regular expression we defined as re1 above is a very simple one. It searches for the string hey, without any limitation. The string can contain lots of text with hey somewhere in the middle, and the regex is satisfied. It could also contain just hey, and the regex would be satisfied as well.

That’s pretty simple.

You can test the regex using RegExp.test(String), which returns a boolean:

re1.test('hey') //✅
re1.test('blablabla hey blablabla') //✅
re1.test('he') //❌
re1.test('blablabla') //❌

In the above example, we just checked whether each string satisfies the regular expression pattern stored in re1.

This is the simplest it can be, but now you already know lots of concepts about regexes.

Anchoring

/hey/

matches hey wherever it appears inside the string.

If you want to match strings that start with hey, use the ^ operator:

/^hey/.test('hey') //✅
/^hey/.test('bla hey') //❌

If you want to match strings that end with hey, use the $ operator:

/hey$/.test('hey') //✅
/hey$/.test('bla hey') //✅
/hey$/.test('hey you') //❌

Combine those to match strings that are exactly hey, and nothing else:

/^hey$/.test('hey') //✅

To match a string that starts with one substring and ends with another, you can use .*, which matches any character repeated 0 or more times:

/^hey.*joe$/.test('hey joe') //✅
/^hey.*joe$/.test('heyjoe') //✅
/^hey.*joe$/.test('hey how are you joe') //✅
/^hey.*joe$/.test('hey joe!') //❌

Match items in ranges

Instead of matching a particular string, you can choose to match any character in a range, like:

/[a-z]/ //a, b, c, ... , x, y, z
/[A-Z]/ //A, B, C, ... , X, Y, Z
/[a-c]/ //a, b, c
/[0-9]/ //0, 1, 2, 3, ... , 8, 9

These regexes match strings that contain at least one of the characters in those ranges:

/[a-z]/.test('a') //✅
/[a-z]/.test('1') //❌
/[a-z]/.test('A') //❌
/[a-c]/.test('d') //❌
/[a-c]/.test('dc') //✅

Ranges can be combined:

/[A-Za-z0-9]/
/[A-Za-z0-9]/.test('a') //✅
/[A-Za-z0-9]/.test('1') //✅
/[A-Za-z0-9]/.test('A') //✅

Matching a range item multiple times

You can check if a string contains one and only one character in a range by anchoring the range with ^ and $:

/^[A-Za-z0-9]$/
/^[A-Za-z0-9]$/.test('A') //✅
/^[A-Za-z0-9]$/.test('Ab') //❌

Negating a pattern

The ^ character at the beginning of a pattern anchors it to the beginning of a string.

Used as the first character inside a range, it negates it, so:

/[^A-Za-z0-9]/.test('a') //❌
/[^A-Za-z0-9]/.test('1') //❌
/[^A-Za-z0-9]/.test('A') //❌
/[^A-Za-z0-9]/.test('@') //✅

Meta characters

  • \d matches any digit, equivalent to [0-9]
  • \D matches any character that’s not a digit, equivalent to [^0-9]
  • \w matches any alphanumeric character plus the underscore, equivalent to [A-Za-z_0-9]
  • \W matches any character not matched by \w, equivalent to [^A-Za-z_0-9]
  • \s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
  • \S matches any character that’s not whitespace
  • \0 matches the NUL character (U+0000)
  • \n matches a newline character
  • \t matches a tab character
  • \uXXXX matches the Unicode character with the 4-digit hexadecimal code XXXX; the longer \u{XXXXX} form requires the u flag
  • . matches any character that is not a newline char (e.g. \n) (unless you use the s flag, explained later on)
  • [^] matches any character, including newline characters. It’s useful on multiline strings.

Regular expression choices

If you want to search for one string or another, use the | operator.

/hey|ho/.test('hey') //✅
/hey|ho/.test('ho') //✅

Quantifiers

Say you have this regex that checks if a string has one digit in it, and nothing else:

/^\d$/

You can use the ? quantifier to make it optional, thus requiring zero or one:

/^\d?$/

But what if you want to match multiple digits?

You can do it in 4 ways, using +, *, {n} and {n,m}. Let’s look at these one by one.

+

Match one or more (>=1) items

/^\d+$/
/^\d+$/.test('12') //✅
/^\d+$/.test('14') //✅
/^\d+$/.test('144343') //✅
/^\d+$/.test('') //❌
/^\d+$/.test('1a') //❌

*

Match 0 or more (>= 0) items

/^\d*$/
/^\d*$/.test('12') //✅
/^\d*$/.test('14') //✅
/^\d*$/.test('144343') //✅
/^\d*$/.test('') //✅
/^\d*$/.test('1a') //❌

{n}

Match exactly n items

/^\d{3}$/
/^\d{3}$/.test('123') //✅
/^\d{3}$/.test('12') //❌
/^\d{3}$/.test('1234') //❌
/^[A-Za-z0-9]{3}$/.test('Abc') //✅

{n,m}

Match between n and m times:

/^\d{3,5}$/
/^\d{3,5}$/.test('123') //✅
/^\d{3,5}$/.test('1234') //✅
/^\d{3,5}$/.test('12345') //✅
/^\d{3,5}$/.test('123456') //❌

m can be omitted to have an open ending, so you have at least n items:

/^\d{3,}$/
/^\d{3,}$/.test('12') //❌
/^\d{3,}$/.test('123') //✅
/^\d{3,}$/.test('12345') //✅
/^\d{3,}$/.test('123456789') //✅

Optional items

Following an item with ? makes it optional:

/^\d{3}\w?$/
/^\d{3}\w?$/.test('123') //✅
/^\d{3}\w?$/.test('123a') //✅
/^\d{3}\w?$/.test('123ab') //❌

Groups

Using parentheses, you can create groups of characters: (...)

This example matches exactly 3 digits followed by one or more alphanumeric characters:

/^(\d{3})(\w+)$/
/^(\d{3})(\w+)$/.test('123') //❌
/^(\d{3})(\w+)$/.test('123s') //✅
/^(\d{3})(\w+)$/.test('123something') //✅
/^(\d{3})(\w+)$/.test('1234') //✅

Repetition characters put after a group’s closing parenthesis refer to the whole group:

/^(\d{2})+$/
/^(\d{2})+$/.test('12') //✅
/^(\d{2})+$/.test('123') //❌
/^(\d{2})+$/.test('1234') //✅

Capturing groups

So far, we’ve seen how to test strings and check if they contain a certain pattern.

A very cool feature of regular expressions is the ability to capture parts of a string, and put them into an array.

You can do so using Groups, and in particular Capturing Groups.

By default, a Group is a Capturing Group. Now, instead of using RegExp.test(String), which just returns a boolean telling whether the pattern is satisfied, we use either String.match(RegExp) or RegExp.exec(String).

As long as the regex does not use the g flag, they behave the same way, and return an Array with the whole matched string in the first item, followed by the content of each matched group.

If there is no match, they return null:

'123s'.match(/^(\d{3})(\w+)$/) //Array [ "123s", "123", "s" ]
/^(\d{3})(\w+)$/.exec('123s') //Array [ "123s", "123", "s" ]
'hey'.match(/(hey|ho)/) //Array [ "hey", "hey" ]
/(hey|ho)/.exec('hey') //Array [ "hey", "hey" ]
/(hey|ho)/.exec('ha!') //null

When a group is matched multiple times, only the last match is put in the result array:

'123456789'.match(/(\d)+/) //Array [ "123456789", "9" ]
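Because both methods return null when nothing matches, it is worth guarding before reading the groups. A minimal sketch, where splitCode is a hypothetical helper name and not part of any API:

```javascript
// Hypothetical helper: split a code such as "123abc" into its two parts.
function splitCode(input) {
  const result = /^(\d{3})(\w+)$/.exec(input)
  if (result === null) {
    return null // no match, nothing to destructure
  }
  // Skip the whole match (index 0) and pick up the two groups.
  const [, digits, rest] = result
  return { digits, rest }
}

console.log(splitCode('123abc')) // { digits: '123', rest: 'abc' }
console.log(splitCode('12abc')) // null
```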

Optional groups

A capturing group can be made optional by using (...)?. If it’s not found, the resulting array slot will contain undefined:

/^(\d{3})(\s)?(\w+)$/.exec('123 s') //Array [ "123 s", "123", " ", "s" ]
/^(\d{3})(\s)?(\w+)$/.exec('123s') //Array [ "123s", "123", undefined, "s" ]

Reference matched groups

Every group that’s matched is assigned a number. $1 refers to the first, $2 to the second, and so on. This will be useful when we talk later on about replacing parts of a string.

Named capturing groups

This is an ES2018 feature.

A group can be assigned a name, rather than just being assigned a slot in the resulting array:

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')

// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';
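Named groups pair nicely with destructuring, and they can also be referenced as $<name> in a replacement string. The sketch below reuses the date pattern from above; the variable names are only illustrative.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

const match = dateRe.exec('2015-01-02')
// Guard against null before destructuring the named groups.
const { year, month, day } = match ? match.groups : {}
console.log(year, month, day) // 2015 01 02

// In a replacement string, a named group is written as $<name>.
const european = '2015-01-02'.replace(dateRe, '$<day>/$<month>/$<year>')
console.log(european) // 02/01/2015
```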

Using match and exec without groups

When the pattern has no groups, the resulting array contains a single item, the matched string. When it has groups, the first item is the whole matched string, followed by one item per group:

/hey|ho/.exec('hey') // [ "hey" ]
/(hey).(ho)/.exec('hey ho') // [ "hey ho", "hey", "ho" ]
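The two methods do diverge once the g flag (covered in the Flags section) is set: match returns every matched substring at once, while exec returns one match per call and remembers its position. A minimal sketch with illustrative variable names:

```javascript
const text = 'hey ho hey'

// With the g flag, match() returns every matched substring.
const all = text.match(/hey|ho/g)
console.log(all) // [ 'hey', 'ho', 'hey' ]

// With the g flag, exec() returns one match per call and advances lastIndex.
const re = /hey|ho/g
const found = []
let m
while ((m = re.exec(text)) !== null) {
  found.push({ value: m[0], index: m.index })
}
console.log(found)
// [ { value: 'hey', index: 0 }, { value: 'ho', index: 4 }, { value: 'hey', index: 7 } ]
```

The exec loop is more verbose, but it keeps the index (and any groups) of each match.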

Noncapturing groups

Since by default groups are Capturing Groups, you need a way to ignore some groups in the resulting array. This is possible using Noncapturing Groups, which start with (?: instead of a plain parenthesis:

'123s'.match(/^(\d{3})(?:\s)(\w+)$/) //null
'123 s'.match(/^(\d{3})(?:\s)(\w+)$/) //Array [ "123 s", "123", "s" ]

Flags

You can use the following flags on any regular expression:

  • g: matches the pattern multiple times
  • i: makes the regex case insensitive
  • m: enables multiline mode. In this mode, ^ and $ match the start and end of each line. Without it, even on multiline strings they match only the beginning and end of the whole string.
  • u: enables support for unicode (introduced in ES6/ES2015)
  • s: (new in ES2018) short for single line, it causes the . to match newline characters as well.

Flags can be combined, and in regex literals they are added after the closing slash:

/hey/ig.test('HEy') //✅

or as the second parameter with RegExp object constructors:

new RegExp('hey', 'ig').test('HEy') //✅
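A quick way to see what the m, s and i flags change is to run the same pattern with and without them. This is only a sketch; the sample string is made up.

```javascript
const lines = 'first line\nsecond line'

// Without m, ^ only matches at the very start of the string.
console.log(/^second/.test(lines)) // false
// With m, ^ also matches right after each line break.
console.log(/^second/m.test(lines)) // true

// Without s, the dot stops at a newline; with s it crosses it.
console.log(/first.*second/.test(lines)) // false
console.log(/first.*second/s.test(lines)) // true

// i ignores case.
console.log(/FIRST/i.test(lines)) // true
```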

Inspecting a regex

Given a regex, you can inspect its properties:

  • source: the pattern string
  • multiline: true with the m flag
  • global: true with the g flag
  • ignoreCase: true with the i flag
  • lastIndex: the index at which the next match will start (used when the g flag is set)

/^(\w{3})$/i.source //"^(\\w{3})$"
/^(\w{3})$/i.multiline //false
/^(\w{3})$/i.lastIndex //0
/^(\w{3})$/i.ignoreCase //true
/^(\w{3})$/i.global //false
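lastIndex is the one property that changes while a regex is in use. With the g flag, each successful test or exec call moves it forward, and a failed call resets it to 0, as this small sketch shows (the names are illustrative):

```javascript
const digitRe = /\d/g
const input = 'a1b2'

console.log(digitRe.lastIndex) // 0
console.log(digitRe.test(input)) // true, matched "1"
console.log(digitRe.lastIndex) // 2
console.log(digitRe.test(input)) // true, matched "2"
console.log(digitRe.lastIndex) // 4
console.log(digitRe.test(input)) // false, no digit left
console.log(digitRe.lastIndex) // 0, reset after the failed match

// flags returns the flags as a string.
console.log(digitRe.flags) // g
```

This is why reusing one global regex across different strings can give surprising results.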

Escaping

These characters are special:

  • \
  • /
  • [ ]
  • ( )
  • { }
  • ?
  • +
  • *
  • |
  • .
  • ^
  • $

They are special because they have a meaning of their own in the regular expression pattern. If you want to use them inside the pattern as matching characters, you need to escape them, by prepending a backslash:

/^\\$/ // /^\\$/.test('\\') ✅
/^\^$/ // /^\^$/.test('^') ✅
/^\$$/ // /^\$$/.test('$') ✅
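Escaping matters most when a pattern is built from text you do not control. JavaScript has long lacked a widely available built-in for this, so a small helper is a common approach. The sketch below is one possible version; escapeRegExp is a hypothetical name, and it escapes exactly the characters listed above.

```javascript
// Hypothetical helper: escape every special character in a string so that it
// is matched literally inside a RegExp built at runtime.
function escapeRegExp(text) {
  // '$&' in the replacement stands for the matched character itself.
  return text.replace(/[\\\/\[\](){}?+*|.^$]/g, '\\$&')
}

const price = '$9.99 (approx.)'
console.log(escapeRegExp(price)) // \$9\.99 \(approx\.\)

const sentence = 'It costs $9.99 (approx.) today'
// Escaped, the text is found literally.
console.log(new RegExp(escapeRegExp(price)).test(sentence)) // true
// Unescaped, $ ( ) and . keep their special meaning and the match fails.
console.log(new RegExp(price).test(sentence)) // false
```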

String boundaries

\b and \B let you inspect whether a match sits at the beginning or at the end of a word:

  • \b matches the position at the beginning or end of a word (a word boundary)
  • \B matches any position that is not at the beginning or end of a word

Example:

'I saw a bear'.match(/\bbear/) //Array ["bear"]
'I saw a beard'.match(/\bbear/) //Array ["bear"]
'I saw a beard'.match(/\bbear\b/) //null
'cool_bear'.match(/\bbear\b/) //null

The last one is null because _ counts as a word character (see \w), so there is no boundary before bear.
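A typical use of \b is checking for a standalone word. Here is a minimal sketch; containsWord is a hypothetical helper, and it assumes the word contains no regex special characters:

```javascript
// Hypothetical helper: does `text` contain `word` as a standalone word?
// Assumes `word` contains no regex special characters.
function containsWord(text, word) {
  return new RegExp('\\b' + word + '\\b').test(text)
}

console.log(containsWord('my cat sleeps', 'cat')) // true
console.log(containsWord('concatenate', 'cat')) // false
console.log(containsWord('cat_food', 'cat')) // false, _ is a word character

// \B is the opposite: the match must start inside a word.
console.log(/\Bcat/.test('concatenate')) // true
console.log(/\Bcat/.test('my cat sleeps')) // false
```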

Replace, using Regular Expressions

We already saw how to check if a string contains a pattern.

We also saw how to extract parts of a string to an array, matching a pattern.

Let’s see how to replace parts of a string based on a pattern.

The String object in JavaScript has a replace() method, which can be used without regular expressions to perform a single replacement on a string:

"Hello world!".replace('world', 'dog') //Hello dog!
"My dog is a good dog!".replace('dog', 'cat') //My cat is a good dog!

This method also accepts a regular expression as argument:

"Hello world!".replace(/world/, 'dog') //Hello dog!

To replace multiple occurrences in a string, add the g flag (since ES2021 there is also String.replaceAll()):

"My dog is a good dog!".replace(/dog/g, 'cat') //My cat is a good cat!

Groups let us do more fancy things, like moving around parts of a string:

"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!') // "world: Hello!!!"

Instead of using a string you can use a function, to do even fancier things. It receives the same values that String.match(RegExp) or RegExp.exec(String) would return: the matched string first, then one argument per group, so the number of arguments depends on the number of groups:

"Hello, world!".replace(/(\w+), (\w+)!/, (matchedString, first, second) => {
  console.log(first);
  console.log(second);

  return `${second.toUpperCase()}: ${first}!!!`
})
//"WORLD: Hello!!!"
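Combined with the g flag, a replacer function runs once per match, which makes small text transformations easy. As a sketch, toCamelCase below is a hypothetical helper that turns snake_case names into camelCase:

```javascript
// Hypothetical helper: turn snake_case into camelCase.
function toCamelCase(name) {
  // The replacer receives the whole match first, then each captured group.
  return name.replace(/_([a-z])/g, (match, letter) => letter.toUpperCase())
}

console.log(toCamelCase('last_index_of')) // lastIndexOf
console.log(toCamelCase('plain')) // plain
```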

Greediness

Regular expressions are said to be greedy by default.

What does it mean?

Take this regex:

/\$(.+)\s?/

It is supposed to extract a dollar amount from a string:

/\$(.+)\s?/.exec('This costs $100')[1] //100

but if we have more words after the number, it freaks out:

/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1] //100 and it is less than $200

Why? Because the .+ after the $ sign matches any character, and it won’t stop until it reaches the end of the string. Then it finishes off, because \s? makes the ending space optional.

To fix this, we need to tell the regex to be lazy, and perform the least amount of matching possible. We can do so using the ? symbol after the quantifier:

/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1] //100

I removed the ? after \s. Otherwise it matched only the first digit, since the space was optional.

So, ? means different things based on its position, because it can be both a quantifier and a lazy mode indicator.
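The same contrast shows up whenever a delimiter appears more than once in the input. A small sketch with a made-up string:

```javascript
const html = '<b>one</b> and <b>two</b>'

// Greedy: .+ runs to the last </b> it can find.
console.log(html.match(/<b>(.+)<\/b>/)[1]) // one</b> and <b>two

// Lazy: .+? stops at the first </b>.
console.log(html.match(/<b>(.+?)<\/b>/)[1]) // one

// The same ? works after the other quantifiers too.
console.log('aaaa'.match(/a{2,}/)[0]) // aaaa
console.log('aaaa'.match(/a{2,}?/)[0]) // aa
```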

Lookaheads: match a string depending on what follows it

Use ?= to match a string that’s followed by a specific substring:

/Roger(?= Waters)/
/Roger(?= Waters)/.test('Roger is my dog') //false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician') //true

?! performs the inverse operation, matching if a string is not followed by a specific substring:

/Roger(?! Waters)/
/Roger(?! Waters)/.test('Roger is my dog') //true
/Roger(?! Waters)/.test('Roger Waters is a famous musician') //false
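A lookahead checks what follows without making it part of the match. That makes it handy for matching positions rather than text, as in this sketch; groupThousands is a hypothetical helper that expects a string made only of digits.

```javascript
// The text inside the lookahead is checked but not included in the match.
console.log('Roger Waters'.match(/Roger(?= Waters)/)[0]) // Roger

// Hypothetical helper: add thousands separators to a string of digits.
function groupThousands(digits) {
  // Match each position followed by one or more complete groups of three
  // digits up to the end of the string. \B skips the very start.
  return digits.replace(/\B(?=(\d{3})+$)/g, ',')
}

console.log(groupThousands('1234567')) // 1,234,567
console.log(groupThousands('1000')) // 1,000
console.log(groupThousands('123')) // 123
```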

Lookbehinds: match a string depending on what precedes it

This is an ES2018 feature.

Lookaheads use the ?= symbol. Lookbehinds use ?<=.

/(?<=Roger) Waters/
/(?<=Roger) Waters/.test('Pink Waters is my dog') //false
/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //true

A lookbehind is negated using ?<!:

/(?<!Roger) Waters/
/(?<!Roger) Waters/.test('Pink Waters is my dog') //true
/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //false
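Lookbehinds are useful for pulling out a value that comes after a marker without capturing the marker itself. A minimal sketch, using a made-up receipt string:

```javascript
const receipt = 'Items: 3, total: $42, tax: $7'

// Digits preceded by a dollar sign; the $ itself is not part of the match.
console.log(receipt.match(/(?<=\$)\d+/g)) // [ '42', '7' ]

// Digits not preceded by a dollar sign or by another digit.
console.log(receipt.match(/(?<![$\d])\d+/g)) // [ '3' ]
```

The \d inside the negative lookbehind keeps the regex from matching the 2 of 42 on its own.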

Regular expressions and Unicode

The u flag is mandatory when working with Unicode strings. In particular, this applies when you might need to handle characters in astral planes (the ones outside the Basic Multilingual Plane, that is, beyond the first 65,536 Unicode code points).

Emojis are a good example, but they’re not the only one.

If you don’t add that flag, this simple regex that should match one character will not work, because for JavaScript that emoji is represented internally by 2 characters (see Unicode in JavaScript):

/^.$/.test('a') //✅
/^.$/.test('🐶') //❌
/^.$/u.test('🐶') //✅

So, always use the u flag.

Unicode characters, just like normal characters, can be used in ranges:

/[a-z]/.test('a') //✅
/[1-9]/.test('1') //✅
/[🐶-🦊]/u.test('🐺') //✅
/[🐶-🦊]/u.test('🐛') //❌

JavaScript checks the internal code representation, so 🐶 < 🐺 < 🦊 because \u{1F436} < \u{1F43A} < \u{1F98A}. Check the full Emoji list to get those codes, and to find out the order (tip: the macOS Emoji picker has some emojis in a mixed order, so don’t count on it).
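The "2 characters" can be observed directly. The sketch below writes the dog face emoji as a code point escape, so the example does not depend on how this page is encoded:

```javascript
const dog = '\u{1F436}' // the dog face emoji, written as a code point escape

console.log(dog.length) // 2, two UTF-16 code units
console.log([...dog].length) // 1, one code point
console.log(dog.codePointAt(0).toString(16)) // 1f436

// Without u the dot sees a single code unit; with u it sees the code point.
console.log(/^.$/.test(dog)) // false
console.log(/^.$/u.test(dog)) // true

// With the u flag, \u{...} can be used inside the pattern as well.
console.log(/^\u{1F436}$/u.test(dog)) // true
```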

Unicode property escapes

As we saw above, in a regular expression pattern you can use \d to match any digit, \s to match any whitespace character, \w to match any alphanumeric character, and so on.

Unicode property escapes are an ES2018 feature that extends this concept to all Unicode characters, introducing \p{} and its negation \P{}.

Any Unicode character has a set of properties. For example Script determines the language family, ASCII is a boolean that’s true for ASCII characters, and so on. You can put the property name between the curly braces, and the regex will check for that to be true:

/^\p{ASCII}+$/u.test('abc') //✅
/^\p{ASCII}+$/u.test('ABC@') //✅
/^\p{ASCII}+$/u.test('ABC🙃') //❌

ASCII_Hex_Digit is another boolean property that checks if the string only contains valid hexadecimal digits:

/^\p{ASCII_Hex_Digit}+$/u.test('0123456789ABCDEF') //✅
/^\p{ASCII_Hex_Digit}+$/u.test('h') //❌

There are many other boolean properties, which you just check by adding their name between the curly braces, including Uppercase, Lowercase, White_Space, Alphabetic, Emoji and more:

/^\p{Lowercase}$/u.test('h') //✅
/^\p{Uppercase}$/u.test('H') //✅
/^\p{Emoji}+$/u.test('H') //❌
/^\p{Emoji}+$/u.test('🙃🙃') //✅

In addition to those binary properties, you can check any of the Unicode character properties to match a specific value. In this example, I check if the string is written in the Greek or Latin alphabet:

/^\p{Script=Greek}+$/u.test('ελληνικά') //✅
/^\p{Script=Latin}+$/u.test('hey') //✅

Read more about all the properties you can use directly in the proposal.
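Property escapes combine with everything seen so far, including the g flag and the negated \P{} form. A small sketch; countUppercase is a hypothetical helper, and the Greek letters are written as escapes so the example does not depend on the page encoding:

```javascript
// Greek capital alpha and beta followed by a small gamma, written as escapes.
const greek = '\u0391\u0392\u03B3'

console.log(/^\p{Script=Greek}+$/u.test(greek)) // true
console.log(/^\p{Script=Greek}+$/u.test('abc')) // false

// Hypothetical helper: count uppercase letters in any script.
function countUppercase(text) {
  const matches = text.match(/\p{Uppercase}/gu)
  return matches === null ? 0 : matches.length
}

console.log(countUppercase('Hello World ' + greek)) // 4
console.log(countUppercase('123')) // 0

// \P{...} negates the property.
console.log(/^\P{ASCII}+$/u.test(greek)) // true
```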

Examples

Supposing a string has only one number you need to extract, /\d+/ should do it:

'Test 123123329'.match(/\d+/) // Array [ "123123329" ]
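When the string may hold several numbers, or none at all, the g flag and a null check cover both cases. A sketch, with extractNumbers as a hypothetical helper name:

```javascript
// Hypothetical helper: collect every run of digits as a number.
function extractNumbers(text) {
  const matches = text.match(/\d+/g)
  return matches === null ? [] : matches.map(Number)
}

console.log(extractNumbers('3 apples and 12 pears')) // [ 3, 12 ]
console.log(extractNumbers('no digits here')) // []
```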

Match an email address

A simplistic approach is to check non-space characters before and after the @ sign, using \S:

/(\S+)@(\S+)\.(\S+)/
/(\S+)@(\S+)\.(\S+)/.exec('copesc@gmail.com') //["copesc@gmail.com", "copesc", "gmail", "com"]

This is a simplistic example, however, as many invalid emails are still satisfied by this regex.

Capture text between double quotes

Suppose you have a string that contains something in double quotes, and you want to extract that content.

The best way to do so is by using a capturing group, because we know the match starts and ends with ", and we can easily target it, but we also want to remove those quotes from our result.

We’ll find what we need in result[1]:

const hello = 'Hello "nice flower"'
const result = /"([^"]*)"/.exec(hello)
//Array [ "\"nice flower\"", "nice flower" ]
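If the string holds more than one quoted part, String.matchAll (added in ES2020, and requiring the g flag) gives access to the group of every match. A minimal sketch with a made-up sentence:

```javascript
const sentence = 'She said "hello" and then "goodbye"'

// matchAll returns an iterator of match arrays; m[1] is the captured group.
const quoted = Array.from(sentence.matchAll(/"([^"]*)"/g), (m) => m[1])
console.log(quoted) // [ 'hello', 'goodbye' ]
```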

Get the content inside an HTML tag

For example get the content inside a span tag, allowing any number of attributes inside the tag:

/<span\b[^>]*>(.*?)<\/span>/
/<span\b[^>]*>(.*?)<\/span>/.exec('test') // null
/<span\b[^>]*>(.*?)<\/span>/.exec('<span>test</span>') // ["<span>test</span>", "test"]
/<span\b[^>]*>(.*?)<\/span>/.exec('<span class="x">test</span>') // ['<span class="x">test</span>', "test"]

Translated from: https://www.freecodecamp.org/news/a-quick-and-simple-guide-to-javascript-regular-expressions-48b46a68df29/
