A Guide to JavaScript Regular Expressions

Introduction to Regular Expressions

A regular expression (also called a regex) is a compact and efficient way to work with strings.

By formulating a regular expression with a special syntax, you can

  • search for text in a string
  • replace substrings in a string
  • extract information from a string

Almost every programming language implements regular expressions. There are small differences between implementations, but the general concepts apply almost everywhere.

Regular expressions date back to the 1950s, when they were formalized as a conceptual search pattern for string processing algorithms.

Implemented in UNIX tools like grep and sed, and in popular text editors, regexes grew in popularity and were introduced in the Perl programming language, and later in many others.

JavaScript, along with Perl, is one of the programming languages that have regular expression support built directly into the language.

Hard but useful

Regular expressions can look like absolute nonsense to the beginner, and often to the professional developer too, if one does not invest the time necessary to understand them.

Cryptic regular expressions are hard to write, hard to read, and hard to maintain or modify.

But sometimes a regular expression is the only sane way to perform some string manipulation, so it’s a very valuable tool to have in your pocket.

This tutorial aims to introduce you to JavaScript regular expressions in a simple way, and to give you the information you need to read and create them.

The rule of thumb is that simple regular expressions are simple to read and write, while complex regular expressions can quickly turn into a mess if you don’t deeply grasp the basics.

What does a regular expression look like?

In JavaScript, a regular expression is an object, which can be defined in two ways.

The first is by instantiating a new RegExp object using the constructor:

const re1 = new RegExp('hey')

The second is using the regular expression literal form:

const re1 = /hey/

You know that JavaScript has object literals and array literals? It also has regex literals.

In the example above, hey is called the pattern. In the literal form it’s delimited by forward slashes, while with the object constructor it’s passed as a plain string.

This is the first important difference between the two forms, but we’ll see others later.
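As a small illustration of the two forms, the sketch below builds the same pattern both ways, and also from a string chosen at runtime. The variable names (fromLiteral, fromConstructor, word, dynamic) are made up for this example.

```javascript
// The same pattern, written as a literal and built with the constructor.
const fromLiteral = /hey/
const fromConstructor = new RegExp('hey')

// The constructor accepts any string, so the pattern can be decided at runtime.
// `word` is a hypothetical value that could come from user input.
const word = 'hey'
const dynamic = new RegExp(word)

console.log(fromLiteral.source)              // hey
console.log(fromConstructor.source)          // hey
console.log(dynamic.test('well, hey there')) // true
```

The literal form is usually easier to read when the pattern is fixed; the constructor is the one to reach for when the pattern has to be assembled from other values.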

How does it work?

The regular expression we defined as re1 above is a very simple one. It searches for the string hey, without any limitation: the string can contain lots of text with hey somewhere in the middle, and the regex is satisfied. The string can also contain just hey, and the regex is satisfied as well.

That’s pretty simple.

You can test the regex using RegExp.test(String), which returns a boolean:

re1.test('hey')                     //✅
re1.test('blablabla hey blablabla') //✅


re1.test('he')        //❌
re1.test('blablabla') //❌

In the above example we just checked whether each string satisfies the regular expression pattern stored in re1.

This is the simplest a regex can be, but you already know several important concepts about regexes.

Anchoring

/hey/

matches hey wherever it appears inside the string.

If you want to match strings that start with hey, use the ^ operator:

/^hey/.test('hey')     //✅
/^hey/.test('bla hey') //❌

If you want to match strings that end with hey, use the $ operator:

/hey$/.test('hey')     //✅
/hey$/.test('bla hey') //✅
/hey$/.test('hey you') //❌

Combine the two to match strings that are exactly hey, and nothing else:

/^hey$/.test('hey') //✅

To match a string that starts with one substring and ends with another, you can use .*, which matches any character repeated 0 or more times:

/^hey.*joe$/.test('hey joe')             //✅
/^hey.*joe$/.test('heyjoe')              //✅
/^hey.*joe$/.test('hey how are you joe') //✅
/^hey.*joe$/.test('hey joe!')            //❌

Match items in ranges

Instead of matching a particular string, you can choose to match any character in a range, like:

/[a-z]/ //a, b, c, ... , x, y, z
/[A-Z]/ //A, B, C, ... , X, Y, Z
/[a-c]/ //a, b, c
/[0-9]/ //0, 1, 2, 3, ... , 8, 9

These regexes match strings that contain at least one of the characters in those ranges:

/[a-z]/.test('a')  //✅
/[a-z]/.test('1')  //❌
/[a-z]/.test('A')  //❌

/[a-c]/.test('d')  //❌
/[a-c]/.test('dc') //✅

Ranges can be combined:

/[A-Za-z0-9]/
/[A-Za-z0-9]/.test('a') //✅
/[A-Za-z0-9]/.test('1') //✅
/[A-Za-z0-9]/.test('A') //✅

Matching a range item multiple times

You can check if a string contains one and only one character in a range, by starting the regex with ^ and ending it with the $ char:

/^[A-Z]$/.test('A')  //✅
/^[A-Z]$/.test('AB') //❌
/^[A-Z]$/.test('Ab') //❌
/^[A-Za-z0-9]$/.test('1')  //✅
/^[A-Za-z0-9]$/.test('A1') //❌

Negating a pattern

The ^ character at the beginning of a pattern anchors it to the beginning of a string.

Used as the first character inside a range, it negates the range, so:

/[^A-Za-z0-9]/.test('a') //❌
/[^A-Za-z0-9]/.test('1') //❌
/[^A-Za-z0-9]/.test('A') //❌
/[^A-Za-z0-9]/.test('@') //✅

Meta characters

  • \d matches any digit, equivalent to [0-9]
  • \D matches any character that’s not a digit, equivalent to [^0-9]
  • \w matches any alphanumeric character (plus underscore), equivalent to [A-Za-z_0-9]
  • \W matches any non-alphanumeric character, equivalent to [^A-Za-z_0-9]
  • \s matches any whitespace character: spaces, tabs, newlines and Unicode spaces
  • \S matches any character that’s not whitespace
  • \0 matches the NUL character (U+0000)
  • \n matches a newline character
  • \t matches a tab character
  • \uXXXX matches the Unicode character with the 4-digit hexadecimal code XXXX; code points above FFFF use the \u{XXXXX} form, which requires the u flag
  • . matches any character that is not a line terminator (e.g. \n), unless you use the s flag, explained later on
  • [^] matches any character, including newline characters. It’s useful on multiline strings
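To make the list above more concrete, here is a minimal sketch that tries several of these meta characters on tiny sample strings. The `checks` object and its keys are just names chosen for this example.

```javascript
// Each entry tests one meta character against a small sample string.
const checks = {
  digit: /\d/.test('a1'),            // true: the string contains a digit
  nonDigit: /^\D+$/.test('abc'),     // true: only non-digit characters
  word: /^\w+$/.test('snake_case1'), // true: letters, underscore and a digit
  nonWord: /\W/.test('a-b'),         // true: '-' is not a word character
  space: /\s/.test('a\tb'),          // true: a tab counts as whitespace
  dot: /^.$/.test('\n'),             // false: . does not match a newline
  any: /^[^]$/.test('\n')            // true: [^] matches a newline too
}

console.log(checks)
```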

Regular expressions choices

If you want to search for one string or another, use the | operator.

/hey|ho/.test('hey') //✅
/hey|ho/.test('ho')  //✅
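One thing worth keeping in mind when combining | with anchors: the | operator applies to everything on each side of it, anchors included. The sketch below shows the difference, using parentheses (covered in the Groups section) to limit the choice. The names `loose` and `strict` are made up for this example.

```javascript
// Without parentheses the anchors belong to one alternative each:
// "starts with hey" OR "ends with ho".
const loose = /^hey|ho$/

// With parentheses the anchors wrap the whole choice:
// exactly "hey" or exactly "ho".
const strict = /^(hey|ho)$/

console.log(loose.test('hey there'))  // true
console.log(strict.test('hey there')) // false
console.log(strict.test('ho'))        // true
```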

Quantifiers

Say you have this regex, which checks if a string has one digit in it, and nothing else:

/^\d$/

You can use the ? quantifier to make it optional, thus requiring zero or one:

/^\d?$/

But what if you want to match multiple digits?

You can do it in 4 ways, using +, *, {n} and {n,m}.

+

Match one or more (>=1) items

/^\d+$/

/^\d+$/.test('12')     //✅
/^\d+$/.test('14')     //✅
/^\d+$/.test('144343') //✅
/^\d+$/.test('')       //❌
/^\d+$/.test('1a')     //❌

*

Match 0 or more (>= 0) items

/^\d*$/

/^\d*$/.test('12')     //✅
/^\d*$/.test('14')     //✅
/^\d*$/.test('144343') //✅
/^\d*$/.test('')       //✅
/^\d*$/.test('1a')     //❌

{n}

Match exactly n items

/^\d{3}$/

/^\d{3}$/.test('123')  //✅
/^\d{3}$/.test('12')   //❌
/^\d{3}$/.test('1234') //❌

/^[A-Za-z0-9]{3}$/.test('Abc') //✅

{n,m}

Match between n and m times:

/^\d{3,5}$/

/^\d{3,5}$/.test('123')    //✅
/^\d{3,5}$/.test('1234')   //✅
/^\d{3,5}$/.test('12345')  //✅
/^\d{3,5}$/.test('123456') //❌

m can be omitted to leave the upper bound open, matching at least n items:

/^\d{3,}$/

/^\d{3,}$/.test('12')        //❌
/^\d{3,}$/.test('123')       //✅
/^\d{3,}$/.test('12345')     //✅
/^\d{3,}$/.test('123456789') //✅

Optional items

Following an item with ? makes it optional:

/^\d{3}\w?$/

/^\d{3}\w?$/.test('123')   //✅
/^\d{3}\w?$/.test('123a')  //✅
/^\d{3}\w?$/.test('123ab') //❌

Groups

Using parentheses, you can create groups of characters: (...)

This example matches exactly 3 digits followed by one or more alphanumeric characters:

/^(\d{3})(\w+)$/

/^(\d{3})(\w+)$/.test('123')          //❌
/^(\d{3})(\w+)$/.test('123s')         //✅
/^(\d{3})(\w+)$/.test('123something') //✅
/^(\d{3})(\w+)$/.test('1234')         //✅

Repetition characters placed after a group’s closing parenthesis apply to the whole group:

/^(\d{2})+$/

/^(\d{2})+$/.test('12')   //✅
/^(\d{2})+$/.test('123')  //❌
/^(\d{2})+$/.test('1234') //✅

Capturing Groups

So far, we’ve seen how to test strings and check if they contain a certain pattern.

A very cool feature of regular expressions is the ability to capture parts of a string, and put them into an array.

You can do so using Groups, and in particular Capturing Groups.

By default, a Group is a Capturing Group. Now, instead of using RegExp.test(String), which just returns a boolean telling whether the pattern is satisfied, we use one of

  • String.match(RegExp)
  • RegExp.exec(String)

When the regex does not have the g flag, they behave the same way, and return an Array with the whole matched string in the first item, followed by the content of each matched group.

If there is no match, they return null:

'123s'.match(/^(\d{3})(\w+)$/)
//Array [ "123s", "123", "s" ]

/^(\d{3})(\w+)$/.exec('123s')
//Array [ "123s", "123", "s" ]

'hey'.match(/(hey|ho)/)
//Array [ "hey", "hey" ]

/(hey|ho)/.exec('hey')
//Array [ "hey", "hey" ]

/(hey|ho)/.exec('ha!')
//null

When a group is matched multiple times, only the last match is put in the result array:

'123456789'.match(/(\d)+/)
//Array [ "123456789", "9" ]

Optional groups

A capturing group can be made optional by using (...)?. If it’s not found, the resulting array slot will contain undefined:

/^(\d{3})(\s)?(\w+)$/.exec('123 s') //Array [ "123 s", "123", " ", "s" ]
/^(\d{3})(\s)?(\w+)$/.exec('123s') //Array [ "123s", "123", undefined, "s" ]

Reference matched groups

Every group that’s matched is assigned a number. $1 refers to the first, $2 to the second, and so on. This will be useful when we later talk about replacing parts of a string.
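As a quick preview of how these numbered references are used, here is a minimal sketch with String.replace (covered in detail in the section on replacing). The sample name and the `swapped` variable are made up for this example.

```javascript
// $1 and $2 in the replacement string refer to the first and second group.
const swapped = 'John Smith'.replace(/(\w+) (\w+)/, '$2, $1')

console.log(swapped) // Smith, John
```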

Named Capturing Groups

This is an ES2018 feature.

A group can be given a name, rather than just being assigned a slot in the result array:

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const result = re.exec('2015-01-02')

// result.groups.year === '2015';
// result.groups.month === '01';
// result.groups.day === '02';
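Named groups also combine nicely with destructuring, and they can be referenced in a replacement string with the $<name> syntax. The sketch below reuses the date pattern from above; the variable names (dateRe, european) are chosen just for this example.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

// Pull the named groups straight into variables.
const { year, month, day } = dateRe.exec('2015-01-02').groups

// Refer to the groups by name inside a replacement string.
const european = '2015-01-02'.replace(dateRe, '$<day>/$<month>/$<year>')

console.log(year, month, day) // 2015 01 02
console.log(european)         // 02/01/2015
```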


Using match and exec without groups

When the pattern has no groups, the result array contains a single item, the whole matched string. With groups, the whole match comes first, followed by each group:

/hey|ho/.exec('hey')   // [ "hey" ]

/(hey).(ho)/.exec('hey ho') // [ "hey ho", "hey", "ho" ]

Noncapturing Groups

Since by default groups are Capturing Groups, you need a way to ignore some groups in the resulting array. This is possible using Noncapturing Groups, written as (?:...):

'123s'.match(/^(\d{3})(?:\s)(\w+)$/)
//null
'123 s'.match(/^(\d{3})(?:\s)(\w+)$/)
//Array [ "123 s", "123", "s" ]

Flags

You can use the following flags on any regular expression:

  • g: matches the pattern multiple times
  • i: makes the regex case insensitive
  • m: enables multiline mode. In this mode, ^ and $ match the start and end of each line. Without it, they match only the start and end of the whole string, even when the string spans multiple lines.
  • u: enables support for unicode (introduced in ES6/ES2015)
  • s: (new in ES2018) short for single line, it causes the . to match new line characters as well

Flags can be combined, and they are added after the closing slash in regex literals:

/hey/ig.test('HEy') //✅

or as the second parameter with RegExp object constructors:

new RegExp('hey', 'ig').test('HEy') //✅
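To see what each flag changes in practice, here is a minimal sketch that runs a few patterns against the same three-line string. The sample text and variable names are made up for this example.

```javascript
const text = 'one\nTwo\nthree'

// g: String.match returns every match instead of only the first one.
const allWords = text.match(/\w+/g)

// i: upper and lower case are treated as the same.
const hasTwo = /two/i.test(text)

// m: ^ matches at the start of every line, not only at the start of the string.
const lineStarts = text.match(/^\w/gm)

// s: the dot also matches newline characters.
const dotWithoutS = /one.Two/.test(text)
const dotWithS = /one.Two/s.test(text)

console.log(allWords)    // [ 'one', 'Two', 'three' ]
console.log(hasTwo)      // true
console.log(lineStarts)  // [ 'o', 'T', 't' ]
console.log(dotWithoutS) // false
console.log(dotWithS)    // true
```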

Inspecting a regex

Given a regex, you can inspect its properties:

  • source the pattern string
  • multiline true with the m flag
  • global true with the g flag
  • ignoreCase true with the i flag
  • lastIndex the index at which the next match starts (only relevant with the g flag, or the sticky y flag)

    /^(\w{3})$/i.source     //"^(\\w{3})$"
    /^(\w{3})$/i.multiline  //false
    /^(\w{3})$/i.lastIndex  //0
    /^(\w{3})$/i.ignoreCase //true
    /^(\w{3})$/i.global     //false
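lastIndex is the one property in this list that changes while the regex is being used. A minimal sketch, assuming a regex with the g flag (the `stateful` name is made up for this example):

```javascript
// With the g flag, each call resumes the search from lastIndex.
const stateful = /a/g

const first = stateful.test('banana')    // true, match found at index 1
const afterFirst = stateful.lastIndex    // 2, the search resumes from here

stateful.test('banana')                  // match at index 3
stateful.test('banana')                  // match at index 5
const fourth = stateful.test('banana')   // false, no more matches
const afterFourth = stateful.lastIndex   // 0, reset after a failed match

console.log(first, afterFirst, fourth, afterFourth) // true 2 false 0
```

This is why reusing the same g-flagged regex object across different strings can give surprising results: the second string is searched from wherever the previous search stopped.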

Escaping

These characters are special:

  • \
  • /
  • [ ]
  • ( )
  • { }
  • ?
  • +
  • *
  • |
  • .
  • ^
  • $

They are special because they are control characters that have a meaning in the regular expression pattern, so if you want to use them inside the pattern as literal characters to match, you need to escape them by prepending a backslash:

/^\\$/
/^\^$/ // /^\^$/.test('^') ✅
/^\$$/ // /^\$$/.test('$') ✅
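Escaping matters most when a pattern is built from text you don’t control. The sketch below uses a hypothetical helper, escapeRegExp (not a built-in function, just a name chosen here), that prepends a backslash to each of the special characters listed above before the text is passed to the RegExp constructor.

```javascript
// Hypothetical helper: put a backslash in front of every special character.
// In the replacement string, $& stands for the matched character.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&')
}

const input = '1+1=2?'

// Unescaped, + and ? act as quantifiers, so the literal text does not match.
const unescaped = new RegExp('^' + input + '$').test(input)

// Escaped, every character is matched literally.
const escaped = new RegExp('^' + escapeRegExp(input) + '$').test(input)

console.log(escapeRegExp(input)) // 1\+1=2\?
console.log(unescaped)           // false
console.log(escaped)             // true
```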

String boundaries

\b and \B let you check whether a position in the string is at the edge of a word:

  • \b matches the position at the beginning or end of a word, where a word is a run of \w characters
  • \B matches any position that is not at the beginning or end of a word

Example:

'I saw a bear'.match(/\bbear/)    //Array ["bear"]
'I saw a beard'.match(/\bbear/)   //Array ["bear"]
'I saw a beard'.match(/\bbear\b/) //null
'cool_bear'.match(/\bbear\b/)     //null

Replacing using Regular Expressions

We already saw how to check if a string contains a pattern.

We also saw how to extract parts of a string to an array, matching a pattern.

Let’s see how to replace parts of a string based on a pattern.

The String object in JavaScript has a replace() method, which can be used without regular expressions to perform a single replacement on a string:

"Hello world!".replace('world', 'dog') //Hello dog!
"My dog is a good dog!".replace('dog', 'cat') //My cat is a good dog!

This method also accepts a regular expression as argument:

"Hello world!".replace(/world/, 'dog') //Hello dog!

Add the g flag to replace every occurrence in the string (since ES2021, String.replaceAll() can also do this with a plain string pattern):

"My dog is a good dog!".replace(/dog/g, 'cat') //My cat is a good cat!

Groups let us do fancier things, like moving parts of a string around:

"Hello, world!".replace(/(\w+), (\w+)!/, '$2: $1!!!')
// "world: Hello!!!"

Instead of using a string you can pass a function as the second argument, to do even fancier things. It receives the whole matched string first, followed by one argument per capturing group (and then the match offset and the original string), so the number of arguments depends on the number of groups:

"Hello, world!".replace(/(\w+), (\w+)!/, (matchedString, first, second) => {
  console.log(first);
  console.log(second);

  return `${second.toUpperCase()}: ${first}!!!`
})
//"WORLD: Hello!!!"
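To round off the replacement techniques, here is a minimal sketch comparing ways of replacing every occurrence in a string. It assumes an engine that supports String.replaceAll (ES2021); the sample sentence and variable names are made up for this example.

```javascript
const sentence = 'My dog is a good dog!'

// A regex with the g flag replaces every occurrence.
const withFlag = sentence.replace(/dog/g, 'cat')

// replaceAll (ES2021) does the same with a plain string pattern.
const withReplaceAll = sentence.replaceAll('dog', 'cat')

// A replacer function combined with the g flag runs once per match.
const shouted = sentence.replace(/dog/g, (match) => match.toUpperCase())

console.log(withFlag)       // My cat is a good cat!
console.log(withReplaceAll) // My cat is a good cat!
console.log(shouted)        // My DOG is a good DOG!
```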

Greediness

Regular expressions are said to be greedy by default.

What does it mean?

Take this regex

/\$(.+)\s?/

It is supposed to extract a dollar amount from a string

/\$(.+)\s?/.exec('This costs $100')[1]
//100

but if we have more words after the number, it goes wrong:

/\$(.+)\s?/.exec('This costs $100 and it is less than $200')[1]
//100 and it is less than $200

Why? Because after the $ sign the regex matches any character with .+, which is greedy and won’t stop until it reaches the end of the string. The match can then finish there because \s? makes the trailing space optional.

To fix this, we need to tell the regex to be lazy, and perform the least amount of matching possible. We can do so using the ? symbol after the quantifier:

/\$(.+?)\s/.exec('This costs $100 and it is less than $200')[1]
//100

I removed the ? after \s, otherwise the lazy .+? would have matched only the first digit, since the space was optional.

So, ? means different things based on its position, because it can be both a quantifier and a lazy mode indicator.

Lookaheads: match a string depending on what follows it

Use ?= to match a string that’s followed by a specific substring:

/Roger(?= Waters)/

/Roger(?= Waters)/.test('Roger is my dog') //false
/Roger(?= Waters)/.test('Roger is my dog and Roger Waters is a famous musician') //true

?! performs the inverse operation, matching if a string is not followed by a specific substring:

/Roger(?! Waters)/

/Roger(?! Waters)/.test('Roger is my dog') //true
/Roger(?! Waters)/.test('Roger Waters is a famous musician') //false
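Because a lookahead checks what follows without consuming it, it is handy for inserting text at a position. The sketch below, a common idiom rather than anything specific to this tutorial, adds thousands separators to a string of digits; addThousandsSeparators is a hypothetical helper name.

```javascript
// Insert a comma at every position inside the number that is followed by
// one or more complete groups of three digits and then no further digit.
function addThousandsSeparators(digits) {
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',')
}

console.log(addThousandsSeparators('1234567')) // 1,234,567
console.log(addThousandsSeparators('1000'))    // 1,000
console.log(addThousandsSeparators('123'))     // 123
```

The \B keeps a comma from being inserted at the very start of the number, and the negative lookahead (?!\d) makes sure the groups of three are counted from the right.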

Lookbehinds: match a string depending on what precedes it

This is an ES2018 feature.

Lookaheads use the ?= symbol. Lookbehinds use ?<=.

/(?<=Roger) Waters/

/(?<=Roger) Waters/.test('Pink Waters is my dog') //false
/(?<=Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //true

A lookbehind is negated using ?<!:

/(?<!Roger) Waters/

/(?<!Roger) Waters/.test('Pink Waters is my dog') //true
/(?<!Roger) Waters/.test('Roger is my dog and Roger Waters is a famous musician') //false

Regular Expressions and Unicode

The u flag is needed when working with Unicode strings, in particular when you might need to handle characters in astral planes, the ones that are not included in the first 65,536 Unicode code points (the Basic Multilingual Plane).

Like Emojis, for example, but not just those.

If you don’t add that flag, this simple regex that should match one character will not work, because for JavaScript that emoji is represented internally by 2 code units (see Unicode in JavaScript):

/^.$/.test('a') //✅
/^.$/.test('🐶') //❌
/^.$/u.test('🐶') //✅
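The two-code-unit representation can be observed directly. A minimal sketch, with variable names made up for this example:

```javascript
const dog = '🐶'

// length counts UTF-16 code units, so an astral character counts as 2.
const codeUnits = dog.length

// Spreading a string iterates by code point, so the emoji counts as 1.
const codePoints = [...dog].length

// The code point in hexadecimal, as used in the \u{...} escape.
const hex = dog.codePointAt(0).toString(16)

// With the u flag, \u{...} matches a full code point.
const byEscape = /^\u{1F436}$/u.test(dog)

console.log(codeUnits, codePoints, hex, byEscape) // 2 1 1f436 true
```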

So, use the u flag whenever the input can contain such characters.

Unicode characters, just like normal characters, can be used in ranges:

/[a-z]/.test('a')  //✅
/[1-9]/.test('1')  //✅

/[🐶-🦊]/u.test('🐺')  //✅
/[🐶-🦊]/u.test('🐛')  //❌

JavaScript checks the internal code point values, so 🐶 < 🐺 < 🦊 because \u{1F436} < \u{1F43A} < \u{1F98A}. Check the full Emoji list to get those codes, and to find out the order (tip: the macOS Emoji picker shows some emojis in a mixed order, don’t count on it)

Unicode property escapes

As we saw above, in a regular expression pattern you can use \d to match any digit, \s to match any whitespace character, \w to match any alphanumeric character, and so on.

Unicode property escapes are an ES2018 feature that extends this concept to all Unicode characters by introducing \p{} and its negation \P{}.

Any Unicode character has a set of properties. For example Script identifies the writing system the character belongs to, ASCII is a boolean that’s true for ASCII characters, and so on. You can put a property inside the curly braces, and the regex will check for it to be true:

/^\p{ASCII}+$/u.test('abc')   //✅
/^\p{ASCII}+$/u.test('ABC@')  //✅
/^\p{ASCII}+$/u.test('ABC🙃') //❌

ASCII_Hex_Digit is another boolean property, which checks if the string only contains valid hexadecimal digits:

/^\p{ASCII_Hex_Digit}+$/u.test('0123456789ABCDEF') //✅
/^\p{ASCII_Hex_Digit}+$/u.test('h')                //❌

There are many other boolean properties, which you check by putting their name inside the curly braces, including Uppercase, Lowercase, White_Space, Alphabetic, Emoji and more:

/^\p{Lowercase}$/u.test('h') //✅
/^\p{Uppercase}$/u.test('H') //✅

/^\p{Emoji}+$/u.test('H')   //❌
/^\p{Emoji}+$/u.test('🙃🙃') //✅

In addition to those binary properties, you can check any of the Unicode character properties against a specific value. In this example, I check if the string is written in the Greek or Latin alphabet:

/^\p{Script=Greek}+$/u.test('ελληνικά') //✅
/^\p{Script=Latin}+$/u.test('hey') //✅
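One practical use of property escapes is matching letters from any alphabet, which \w cannot do because it only covers ASCII letters. The sketch below uses \p{L}, the short form of the General_Category value Letter; the `anyLetters` name and the sample words are made up for this example.

```javascript
// \p{L} matches a letter from any script.
const anyLetters = /^\p{L}+$/u

const accented = anyLetters.test('h\u00e9llo') // true: é is a letter
const cyrillic = anyLetters.test('привет')     // true
const withDigits = anyLetters.test('abc123')   // false: digits are not letters

// For comparison, \w only knows about ASCII letters, digits and underscore.
const asciiOnly = /^\w+$/.test('h\u00e9llo')   // false

console.log(accented, cyrillic, withDigits, asciiOnly) // true true false false
```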

Read more about all the properties you can use directly on the TC39 proposal.

Examples

Extract a number from a string

Supposing a string has only one number you need to extract, /\d+/ should do it:

'Test 123123329'.match(/\d+/)
// Array [ "123123329" ]

Match an email address

A simplistic approach is to check for non-space characters before and after the @ sign, using \S:

/(\S+)@(\S+)\.(\S+)/

/(\S+)@(\S+)\.(\S+)/.exec('copesc@gmail.com')
//["copesc@gmail.com", "copesc", "gmail", "com"]

This is a simplistic example, however, as many invalid emails still satisfy this regex.

Capture text between double quotes

Suppose you have a string that contains something in double quotes, and you want to extract that content.

The best way to do so is by using a capturing group, because we know the match starts and ends with ", and we can easily target it, but we also want to remove those quotes from our result.

We’ll find what we need in result[1]:

const hello = 'Hello "nice flower"'
const result = /"([^"]*)"/.exec(hello)
//Array [ "\"nice flower\"", "nice flower" ]
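If the string can contain several quoted parts, the same idea extends to all of them. A minimal sketch, assuming an engine with String.matchAll (ES2020); the sample line and variable names are made up for this example.

```javascript
const line = 'She said "hi" and then "bye"'

// matchAll needs the g flag and yields one match array per occurrence;
// index 1 of each match holds the captured text without the quotes.
const quoted = [...line.matchAll(/"([^"]*)"/g)].map((match) => match[1])

console.log(quoted) // [ 'hi', 'bye' ]
```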

Get the content inside an HTML tag

For example, get the content inside a span tag, allowing any number of attributes inside the tag:

/<span\b[^>]*>(.*?)<\/span>/

/<span\b[^>]*>(.*?)<\/span>/.exec('test')
// null
/<span\b[^>]*>(.*?)<\/span>/.exec('<span>test</span>')
// ["<span>test</span>", "test"]
/<span\b[^>]*>(.*?)<\/span>/.exec('<span class="x">test</span>')
// ["<span class=\"x\">test</span>", "test"]

Translated from: https://flaviocopes.com/javascript-regular-expressions/
