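Go's regexp package follows the RE2 regular-expression syntax (strictly speaking it supports only a subset of RE2, not all of it). This article explains the regular-expression syntax that Go supports.
1. Regex Flags
1.1 Greedy and non-greedy matching:
The matcher scans forward one character at a time. A greedy repetition keeps trying to extend the match even after it has already succeeded. A non-greedy repetition stops as soon as the match is complete.
Be careful with patterns such as \d*?: zero repetitions is already a valid match, so the pattern succeeds with an empty string at every position (including the gaps between digits) and never consumes a digit. \w*? behaves the same way.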
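The difference is easiest to see side by side. The sketch below is only an illustration: the helper names firstMatch and allMatches are made up for this example, and the only library calls used are regexp.MustCompile, FindString, and FindAllString.

```go
package main

import (
	"fmt"
	"regexp"
)

// firstMatch returns the leftmost match of pattern in s ("" if there is none).
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// allMatches returns every non-overlapping match of pattern in s.
func allMatches(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	s := "<a><b>"
	fmt.Printf("%q\n", firstMatch(`<.+>`, s))  // greedy: "<a><b>"
	fmt.Printf("%q\n", firstMatch(`<.+?>`, s)) // non-greedy: "<a>"

	// A lazy star is satisfied by zero repetitions, so every match is empty.
	fmt.Printf("%q\n", allMatches(`\d*?`, "123")) // ["" "" "" ""]
	fmt.Printf("%q\n", allMatches(`\d*`, "123"))  // ["123"]
}
```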
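Flag syntax: (?U-sm) turns the U (ungreedy) flag on and turns the s and m flags off. A flag group at the start of the pattern applies to the whole pattern; inside a group it applies only to the rest of that group.
(?U) ungreedy: repetitions match as little as possible (the meanings of x* and x*? are swapped). Off by default, so repetitions are greedy.
(?s) single-line: . also matches a newline, so a multi-line text can be matched as if it were one line. Off by default, so . does not match a newline.
(?m) multi-line: ^ and $ also match at the start and end of every line. Off by default, so ^ and $ match only at the start and end of the whole text.
(?i) case-insensitive. Off by default, so matching is case-sensitive.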
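A minimal sketch of each flag switched off and on. The helpers matches and first are hypothetical names for this example only; the sample text is invented.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	text := "first\nSecond"

	// (?i): case-insensitive.
	fmt.Println(matches(`second`, text), matches(`(?i)second`, text)) // false true
	// (?s): dot also matches the newline.
	fmt.Println(matches(`first.Second`, text), matches(`(?s)first.Second`, text)) // false true
	// (?m): ^ and $ match at line boundaries.
	fmt.Println(matches(`^Second$`, text), matches(`(?m)^Second$`, text)) // false true
	// (?U): repetitions prefer fewer.
	fmt.Printf("%q %q\n", first(`a+`, "aaa"), first(`(?U)a+`, "aaa")) // "aaa" "a"
	// Scoped flag: only the "go" part is case-insensitive.
	fmt.Println(matches(`(?i:go)lang`, "GOlang"), matches(`(?i:go)lang`, "GOLANG")) // true false
}
```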
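There is no global flag in the syntax; whether to find one match or all matches is decided in code.
1.2 In Go, global matching is selected through the API, not through a flag
Global (find all matches):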
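// Needs imports "fmt" and "regexp".
// Flags: U = ungreedy, s = single-line, i = case-insensitive, m = multi-line.
var re = regexp.MustCompile(`(?Usim)m+`)
var str = `mmmsdf`
// FindAllStringIndex returns the [start, end) byte offsets of every match.
for _, loc := range re.FindAllStringIndex(str, -1) {
	fmt.Println(str[loc[0]:loc[1]], "found at index", loc[0])
}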
Non-global (first match only):
var re = regexp.MustCompile(`(?s).`)
var str = `mmxx`
// FindStringIndex returns nil when there is no match.
if loc := re.FindStringIndex(str); loc != nil {
	fmt.Println(str[loc[0]:loc[1]], "found at index", loc[0])
}
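2. Single Characters
- . any character, excluding the newline unless the s flag is set; [.] matches only a literal .
- [xyz] character class: x, y, or z
- [^xyz] negated character class: any character except x, y, and z
- \d digit
- \D non-digit
- [[:alpha:]] ASCII letters, [a-zA-Z]
- [[:^alpha:]] anything that is not an ASCII letter
- \pN Unicode number (one-letter class name)
- \p{Greek} Greek script character
- \PN not a Unicode number
- \P{Greek} not a Greek script character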
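A short sketch of how a few of these single-character forms behave. The helper findAll and the sample strings are invented for the example; "\u0663" is the Arabic-Indic digit three, used here only because it is a Unicode number that is not an ASCII digit.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	mixed := "a1\u0663" // 'a', ASCII digit 1, Arabic-Indic digit three

	fmt.Println(len(findAll(`\d`, mixed)))  // 1: \d is ASCII-only
	fmt.Println(len(findAll(`\pN`, mixed))) // 2: \pN covers all Unicode numbers

	// \p{Greek} matches by script; the Latin letters are skipped.
	fmt.Printf("%q\n", findAll(`\p{Greek}+`, "abc \u03b1\u03b2\u03b3"))

	// [.] is a literal dot, while a bare . is any character except newline.
	fmt.Printf("%q\n", findAll(`[.]`, "a.b")) // ["."]
	fmt.Printf("%q\n", findAll(`.`, "a\nb"))  // ["a" "b"]
}
```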
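3. Composites
xy x followed by y
x|y x or y
4. Repetitions
x* zero or more x, prefer more (where there is no x it matches the empty string). With the U flag it prefers fewer, so on its own every match is empty.
x+ one or more x, prefer more
x? zero or one x, prefer one (where there is no x it matches the empty string). With the U flag it prefers zero, so on its own every match is empty.
x{n,m} n to m x, prefer more (prefer fewer with the U flag)
x{n,} n or more x, prefer more (prefer fewer with the U flag)
x{n} exactly n x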
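4.1 Ungreedy Repetitions
x*? zero or more x, prefer fewer
x+? one or more x, prefer fewer
x?? zero or one x, prefer zero
x{n,m}? n to m x, prefer fewer
x{n,}? n or more x, prefer fewer
x{n}? exactly n x
x{} same as x* in vim; Go does not support this meaning
x{-} same as x*? in vim; Go does not support this meaning
x{-n} same as x{n}? in vim; Go does not support this meaning
x= same as x? in vim; Go does not support this meaning
Note:
In the counted forms x{n,m}, x{n}, and x{n,} the repetition count cannot exceed 1000. Unlimited repetitions (x*, x+) are not subject to this limit.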
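The 1000 limit can be checked directly, since regexp.Compile returns an error instead of panicking. This is a small sketch; compiles and first are hypothetical helper names.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// first returns the leftmost match of pattern in s.
func first(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

func main() {
	fmt.Println(compiles(`a{1000}`))   // true: 1000 is still allowed
	fmt.Println(compiles(`a{1001}`))   // false: count above 1000
	fmt.Println(compiles(`a{2,1001}`)) // false: upper bound above 1000

	// Counted repetitions follow the same greedy / ungreedy rule.
	fmt.Printf("%q %q\n", first(`a{2,3}`, "aaaa"), first(`a{2,3}?`, "aaaa")) // "aaa" "aa"
}
```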
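4.2 Possessive repetitions
Go repetitions are greedy by default; the possessive + suffix is not supported.
x*+ zero or more x, possessive NOT SUPPORTED
x++ one or more x, possessive NOT SUPPORTED
x?+ zero or one x, possessive NOT SUPPORTED
x{n,m}+ n to m x, possessive NOT SUPPORTED
x{n,}+ n or more x, possessive NOT SUPPORTED
x{n}+ exactly n x, possessive NOT SUPPORTED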
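As a quick check, the sketch below feeds the possessive forms to regexp.Compile and collects the ones it rejects. The helper name rejected is made up for this example, and the error text is printed as-is because its exact wording may differ between Go versions.

```go
package main

import (
	"fmt"
	"regexp"
)

// rejected returns the patterns that regexp.Compile refuses.
func rejected(patterns []string) []string {
	var out []string
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	possessive := []string{`x*+`, `x++`, `x?+`, `x{2}+`}
	fmt.Println(len(rejected(possessive)) == len(possessive)) // true

	// The plain and ungreedy forms are accepted.
	fmt.Println(len(rejected([]string{`x*`, `x+?`, `x??`, `x{2}?`}))) // 0

	if _, err := regexp.Compile(`x*+`); err != nil {
		fmt.Println("compile error:", err)
	}
}
```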
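5. Grouping
(re) numbered capturing group
(?P<name>re) named & numbered capturing group
(?<name>re) named & numbered capturing group; accepted since Go 1.22, rejected by older releases
(?'name're) named & numbered capturing group NOT SUPPORTED
(?:re) non-capturing group
(?flags) sets flags for the rest of the current group; non-capturing. Syntax as in (?U-ism)
(?flags:re) sets flags for re only; non-capturing
(?#text) comment, like // or /* */ in code; takes no part in matching NOT SUPPORTED
(?|x|y|z) branch numbering reset: in (a)|(b)|(c) the letters a, b, c are in groups 1, 2, 3, whereas inside (?|...) they would all be in group 1 NOT SUPPORTED
(?>re) possessive (atomic) match of re, non-capturing NOT SUPPORTED. (?-U) only restores the default greedy behaviour; it is not an atomic group.
re@> possessive match of re NOT SUPPORTED vim. (?-U) only restores the default greedy behaviour.
%(re) non-capturing group NOT SUPPORTED vim; use (?:re) instead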
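A sketch of reading named groups back out of a match. The pattern, the variable dateRE, and the helper namedGroups are invented for illustration; FindStringSubmatch, SubexpNames, and NumSubexp are the standard regexp methods.

```go
package main

import (
	"fmt"
	"regexp"
)

// dateRE has two named capturing groups and one non-capturing group.
var dateRE = regexp.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})(?:-\d{2})?`)

// namedGroups maps each group name to its captured text, or returns nil if
// re does not match s.
func namedGroups(re *regexp.Regexp, s string) map[string]string {
	m := re.FindStringSubmatch(s)
	if m == nil {
		return nil
	}
	out := make(map[string]string)
	for i, name := range re.SubexpNames() {
		if name != "" { // index 0 and unnamed groups have an empty name
			out[name] = m[i]
		}
	}
	return out
}

func main() {
	g := namedGroups(dateRE, "released 2024-02-06")
	fmt.Println(g["year"], g["month"]) // 2024 02
	fmt.Println(dateRE.NumSubexp())    // 2: (?:...) is not counted
}
```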
6. Empty strings
^ at beginning of text, or of line when the m flag is set
$ at end of text (like \z), or of line when the m flag is set
\A at beginning of text
\b at a word boundary (\w on one side, non-\w or the text edge on the other)
\B not at a word boundary
\G at beginning of the subtext being searched NOT SUPPORTED
\G at end of the last match NOT SUPPORTED
\Z at end of text, or before a newline at end of text NOT SUPPORTED
\z at end of text
(?=re) positive lookahead, e.g. the (?=\d) in (?<=\d)ab(?=\d) NOT SUPPORTED
(?!re) negative lookahead, e.g. ab(?!\d) NOT SUPPORTED
(?<=re) positive lookbehind, e.g. the (?<=\d) in (?<=\d)ab(?=\d) NOT SUPPORTED
(?<!re) negative lookbehind, e.g. (?<!\d)ab NOT SUPPORTED
re& NOT SUPPORTED vim
re@= NOT SUPPORTED vim
re@! NOT SUPPORTED vim
re@<= NOT SUPPORTED vim
re@<! NOT SUPPORTED vim
\zs NOT SUPPORTED vim
\ze NOT SUPPORTED vim
\%^ NOT SUPPORTED vim
\%$ NOT SUPPORTED vim
\%V NOT SUPPORTED vim
\%# NOT SUPPORTED vim
\%'m NOT SUPPORTED vim
\%23l NOT SUPPORTED vim
\%23c NOT SUPPORTED vim
\%23v NOT SUPPORTED vim
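The sketch below exercises \b, ^ and $, and shows one possible stand-in for the unsupported (?<=\d)ab(?=\d): match the surrounding digits and keep only the captured group. The helper names are hypothetical, and the stand-in is an assumption of this example rather than an exact equivalent, because the digits are consumed and cannot be shared by two adjacent matches.

```go
package main

import (
	"fmt"
	"regexp"
)

// count returns the number of non-overlapping matches of pattern in s.
func count(pattern, s string) int {
	return len(regexp.MustCompile(pattern).FindAllStringIndex(s, -1))
}

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

// betweenDigits approximates (?<=\d)ab(?=\d) with a capturing group.
func betweenDigits(s string) []string {
	re := regexp.MustCompile(`\d(ab)\d`)
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1]) // keep only the text inside the group
	}
	return out
}

func main() {
	fmt.Println(count(`\bgo\b`, "go gopher ago go")) // 2
	fmt.Println(count(`^\w+$`, "one\ntwo"))          // 0
	fmt.Println(count(`(?m)^\w+$`, "one\ntwo"))      // 2
	fmt.Println(compiles(`(?<=\d)ab(?=\d)`))         // false
	fmt.Printf("%q\n", betweenDigits("x1ab2 yab3"))  // ["ab"]
	// Limitation: in "1ab2ab3" the shared digit 2 is consumed by the first match.
	fmt.Println(len(betweenDigits("1ab2ab3"))) // 1
}
```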
7. Escape sequences
\a bell (== \007)
\f form feed (== \014)
\t horizontal tab (== \011)
\n newline (== \012)
\r carriage return (== \015)
\v vertical tab character (== \013)
\* literal *, and likewise for any other punctuation character
\123 octal character code (up to three digits)
\x7F hex character code (exactly two digits)
\x{10FFFF} hex character code (any number of digits)
\C match a single byte NOT SUPPORTED
\Q...\E literal text ... even if ... contains punctuation
\1 backreference NOT SUPPORTED
\b backspace NOT SUPPORTED (in Go \b is a word boundary; use «\010»)
\cK control char ^K NOT SUPPORTED (use «\001» etc)
\e escape NOT SUPPORTED (use «\033»)
\g1 backreference NOT SUPPORTED
\g{1} backreference NOT SUPPORTED
\g{+1} backreference NOT SUPPORTED
\g{-1} backreference NOT SUPPORTED
\g{name} named backreference NOT SUPPORTED
\g<name> subroutine call NOT SUPPORTED
\g'name' subroutine call NOT SUPPORTED
\k<name> named backreference NOT SUPPORTED
\k'name' named backreference NOT SUPPORTED
\lX lowercase «X» NOT SUPPORTED
\ux uppercase «x» NOT SUPPORTED
\L...\E lowercase text «...» NOT SUPPORTED
\K reset beginning of «$0» NOT SUPPORTED
\N{name} named Unicode character NOT SUPPORTED
\R line break NOT SUPPORTED
\U...\E upper case text «...» NOT SUPPORTED
\X extended Unicode sequence NOT SUPPORTED
\%d123 decimal character 123 NOT SUPPORTED vim
\%xFF hex character FF NOT SUPPORTED vim
\%o123 octal character 123 NOT SUPPORTED vim
\%u1234 Unicode character 0x1234 NOT SUPPORTED vim
\%U12345678 Unicode character 0x12345678 NOT SUPPORTED vim
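A short sketch of the supported escapes next to an unsupported backreference. The helpers matches and compiles are hypothetical; regexp.QuoteMeta is the standard-library function that escapes metacharacters, shown here as an alternative to writing \Q...\E by hand.

```go
package main

import (
	"fmt"
	"regexp"
)

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	// \Q...\E turns . and * into literal characters.
	fmt.Println(matches(`^\Qa.b*c\E$`, "a.b*c")) // true
	fmt.Println(matches(`^\Qa.b*c\E$`, "aXbbc")) // false
	fmt.Println(matches(`^a.b*c$`, "aXbbc"))     // true

	// QuoteMeta produces an escaped pattern for arbitrary literal text.
	fmt.Println(regexp.QuoteMeta("a.b*c")) // a\.b\*c

	// Hex (\x41), braced hex (\x{4E2D}) and octal (\101) character codes.
	fmt.Println(matches(`^\x41\x{4E2D}\101$`, "A\u4e2dA")) // true

	// Backreferences are rejected at compile time.
	fmt.Println(compiles(`(a)\1`)) // false
}
```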
8. Character class elements
x single character
A-Z character range (inclusive)
\d Perl character class
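[:foo:] ASCII class «foo»; only has this meaning inside a bracket expression, as in [[:alpha:]]. Written on its own, [:foo:] is an ordinary class that matches :, f, or o
\p{Foo} Unicode character class «Foo»
\pF Unicode character class «F» (one-letter name)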
[\d] digits (== \d)
[^\d] not digits (== \D)
[\D] not digits (== \D)
[^\D] not not digits (== \d)
[[:name:]] named ASCII class inside character class (== [:name:])
[^[:name:]] named ASCII class inside negated character class (== [:^name:])
[\p{Name}] named Unicode property inside character class (== \p{Name})
[^\p{Name}] named Unicode property inside negated character class (== \P{Name})
[[:alnum:]] alphanumeric (== [0-9A-Za-z])
[[:alpha:]] alphabetic (== [A-Za-z])
[[:ascii:]] ASCII (== [\x00-\x7F])
[[:blank:]] blank (== [\t ])
[[:cntrl:]] control (== [\x00-\x1F\x7F])
[[:digit:]] digits (== [0-9])
[[:graph:]] graphical (== [!-~] == [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{|}~])
[[:lower:]] lower case (== [a-z])
[[:print:]] printable (== [ -~] == [ [:graph:]])
[[:punct:]] punctuation (== [!-/:-@[-`{-~])
[[:space:]] whitespace (== [\t\n\v\f\r ])
[[:upper:]] upper case (== [A-Z])
[[:word:]] word characters (== [0-9A-Za-z_])
[[:xdigit:]] hex digit (== [0-9A-Fa-f])
Perl character classes (all ASCII-only)
\d digits (== [0-9])
\D not digits (== [^0-9])
\s whitespace (== [\t\n\f\r ])
\S not whitespace (== [^\t\n\f\r ])
\w word characters (== [0-9A-Za-z_])
\W not word characters (== [^0-9A-Za-z_])
\h horizontal space NOT SUPPORTED
\H not horizontal space NOT SUPPORTED
\v vertical space NOT SUPPORTED (in Go \v is the vertical tab escape, \013)
\V not vertical space NOT SUPPORTED
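The class elements above can be tried out with a sketch like the following. The helpers findAll and matches and the sample strings are made up for this example; the comments restate what the tables above say, for instance that \s leaves out \v while [[:space:]] includes it.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

// matches reports whether pattern matches anywhere in s.
func matches(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	fmt.Printf("%q\n", findAll(`[[:alpha:]]+`, "go1.22rc"))  // ["go" "rc"]
	fmt.Printf("%q\n", findAll(`[^[:alpha:]]+`, "go1.22rc")) // ["1.22"]

	// A Unicode property and a Perl class can share one bracket expression.
	fmt.Println(len(findAll(`[\p{Greek}\d]+`, "x \u03b1\u03b212 y"))) // 1

	// \s omits the vertical tab; [[:space:]] includes it.
	fmt.Println(matches(`\s`, "\v"), matches(`[[:space:]]`, "\v")) // false true

	// \w is ASCII-only, so an accented letter needs \pL.
	fmt.Println(matches(`^\w+$`, "caf\u00e9"), matches(`^[\pL\d_]+$`, "caf\u00e9")) // false true

	// Outside brackets, [:foo:] is just the set {':', 'f', 'o'}.
	fmt.Printf("%q\n", findAll(`[:foo:]+`, "of:x")) // ["of:"]
}
```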
Reference
DouXiang Tech🐘
Unicode character class names--general category
C other
Cc control
Cf format
Cn unassigned code points NOT SUPPORTED
Co private use
Cs surrogate
L letter
LC cased letter NOT SUPPORTED
L& cased letter NOT SUPPORTED
Ll lowercase letter
Lm modifier letter
Lo other letter
Lt titlecase letter
Lu uppercase letter
M mark
Mc spacing mark
Me enclosing mark
Mn non-spacing mark
N number
Nd decimal number
Nl letter number
No other number
P punctuation
Pc connector punctuation
Pd dash punctuation
Pe close punctuation
Pf final punctuation
Pi initial punctuation
Po other punctuation
Ps open punctuation
S symbol
Sc currency symbol
Sk modifier symbol
Sm math symbol
So other symbol
Z separator
Zl line separator
Zp paragraph separator
Zs space separator
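General categories are used through \p{...} (or \pX for one-letter names). A minimal sketch, with a made-up helper findAll and invented sample strings; "\u00dc" is Ü and "\u20ac" is the euro sign.

```go
package main

import (
	"fmt"
	"regexp"
)

// findAll returns every non-overlapping match of pattern in s.
func findAll(pattern, s string) []string {
	return regexp.MustCompile(pattern).FindAllString(s, -1)
}

func main() {
	// Lu: uppercase letters, including non-ASCII ones.
	fmt.Println(len(findAll(`\p{Lu}`, "Go And \u00dcber"))) // 3

	// Sc: currency symbols.
	fmt.Println(len(findAll(`\p{Sc}`, "$5 or \u20ac4"))) // 2

	// P: any punctuation (one-letter name, so braces are optional).
	fmt.Printf("%q\n", findAll(`\pP+`, "wait... what?!")) // ["..." "?!"]

	// \P{L}: everything that is not a letter.
	fmt.Printf("%q\n", findAll(`\P{L}+`, "ab12cd")) // ["12"]
}
```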
Unicode character class names--scripts
Adlam
Ahom
Anatolian_Hieroglyphs
Arabic
Armenian
Avestan
Balinese
Bamum
Bassa_Vah
Batak
Bengali
Bhaiksuki
Bopomofo
Brahmi
Braille
Buginese
Buhid
Canadian_Aboriginal
Carian
Caucasian_Albanian
Chakma
Cham
Cherokee
Chorasmian
Common
Coptic
Cuneiform
Cypriot
Cypro_Minoan
Cyrillic
Deseret
Devanagari
Dives_Akuru
Dogra
Duployan
Egyptian_Hieroglyphs
Elbasan
Elymaic
Ethiopic
Georgian
Glagolitic
Gothic
Grantha
Greek
Gujarati
Gunjala_Gondi
Gurmukhi
Han
Hangul
Hanifi_Rohingya
Hanunoo
Hatran
Hebrew
Hiragana
Imperial_Aramaic
Inherited
Inscriptional_Pahlavi
Inscriptional_Parthian
Javanese
Kaithi
Kannada
Katakana
Kawi
Kayah_Li
Kharoshthi
Khitan_Small_Script
Khmer
Khojki
Khudawadi
Lao
Latin
Lepcha
Limbu
Linear_A
Linear_B
Lisu
Lycian
Lydian
Mahajani
Makasar
Malayalam
Mandaic
Manichaean
Marchen
Masaram_Gondi
Medefaidrin
Meetei_Mayek
Mende_Kikakui
Meroitic_Cursive
Meroitic_Hieroglyphs
Miao
Modi
Mongolian
Mro
Multani
Myanmar
Nabataean
Nag_Mundari
Nandinagari
New_Tai_Lue
Newa
Nko
Nushu
Nyiakeng_Puachue_Hmong
Ogham
Ol_Chiki
Old_Hungarian
Old_Italic
Old_North_Arabian
Old_Permic
Old_Persian
Old_Sogdian
Old_South_Arabian
Old_Turkic
Old_Uyghur
Oriya
Osage
Osmanya
Pahawh_Hmong
Palmyrene
Pau_Cin_Hau
Phags_Pa
Phoenician
Psalter_Pahlavi
Rejang
Runic
Samaritan
Saurashtra
Sharada
Shavian
Siddham
SignWriting
Sinhala
Sogdian
Sora_Sompeng
Soyombo
Sundanese
Syloti_Nagri
Syriac
Tagalog
Tagbanwa
Tai_Le
Tai_Tham
Tai_Viet
Takri
Tamil
Tangsa
Tangut
Telugu
Thaana
Thai
Tibetan
Tifinagh
Tirhuta
Toto
Ugaritic
Vai
Vithkuqi
Wancho
Warang_Citi
Yezidi
Yi
Zanabazar_Square
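Script names go inside \p{...} in the same way. The sketch below guesses the script of a string from a small, hand-picked candidate list; the list, the helper names scriptOf and compiles, and the sample strings are assumptions of this example. Which script names are accepted depends on the Unicode tables shipped with the Go release in use, so only long-established scripts are tried here.

```go
package main

import (
	"fmt"
	"regexp"
)

// candidates is a hypothetical shortlist, not the full table above.
var candidates = []string{"Latin", "Greek", "Cyrillic", "Han"}

// scriptOf returns the first candidate script that covers every character
// of s, or "" if none does.
func scriptOf(s string) string {
	for _, name := range candidates {
		re := regexp.MustCompile(`^\p{` + name + `}+$`)
		if re.MatchString(s) {
			return name
		}
	}
	return ""
}

// compiles reports whether Go's regexp package accepts pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(scriptOf("hello"))        // Latin
	fmt.Println(scriptOf("\u03b1\u03b2")) // Greek
	fmt.Println(scriptOf("\u4e2d\u6587")) // Han
	fmt.Println(scriptOf("abc123") == "") // true: digits belong to Common
	fmt.Println(compiles(`\p{Klingon}`))  // false: unknown class name
}
```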
Vim character classes
\i identifier character NOT SUPPORTED vim
\I «\i» except digits NOT SUPPORTED vim
\k keyword character NOT SUPPORTED vim
\K «\k» except digits NOT SUPPORTED vim
\f file name character NOT SUPPORTED vim
\F «\f» except digits NOT SUPPORTED vim
\p printable character NOT SUPPORTED vim
\P «\p» except digits NOT SUPPORTED vim
\s whitespace character (== [ \t]) NOT SUPPORTED vim
\S non-white space character (== [^ \t]) NOT SUPPORTED vim
\d digits (== [0-9]) vim
\D not «\d» vim
\x hex digits (== [0-9A-Fa-f]) NOT SUPPORTED vim
\X not «\x» NOT SUPPORTED vim
\o octal digits (== [0-7]) NOT SUPPORTED vim
\O not «\o» NOT SUPPORTED vim
\w word character vim
\W not «\w» vim
\h head of word character NOT SUPPORTED vim
\H not «\h» NOT SUPPORTED vim
\a alphabetic NOT SUPPORTED vim
\A not «\a» NOT SUPPORTED vim
\l lowercase NOT SUPPORTED vim
\L not lowercase NOT SUPPORTED vim
\u uppercase NOT SUPPORTED vim
\U not uppercase NOT SUPPORTED vim
\_x «\x» plus newline, for any «x» NOT SUPPORTED vim
Vim flags
\c ignore case NOT SUPPORTED vim
\C match case NOT SUPPORTED vim
\m magic NOT SUPPORTED vim
\M nomagic NOT SUPPORTED vim
\v verymagic NOT SUPPORTED vim
\V verynomagic NOT SUPPORTED vim
\Z ignore differences in Unicode combining characters NOT SUPPORTED vim
Magic
(?{code}) arbitrary Perl code NOT SUPPORTED perl
(??{code}) postponed arbitrary Perl code NOT SUPPORTED perl
(?n) recursive call to regexp capturing group «n» NOT SUPPORTED
(?+n) recursive call to relative group «+n» NOT SUPPORTED
(?-n) recursive call to relative group «-n» NOT SUPPORTED
(?C) PCRE callout NOT SUPPORTED pcre
(?R) recursive call to entire regexp (== (?0)) NOT SUPPORTED
(?&name) recursive call to named group NOT SUPPORTED
(?P=name) named backreference NOT SUPPORTED
(?P>name) recursive call to named group NOT SUPPORTED
(?(cond)true|false) conditional branch NOT SUPPORTED
(?(cond)true) conditional branch NOT SUPPORTED
(*ACCEPT) make regexps more like Prolog NOT SUPPORTED
(*COMMIT) NOT SUPPORTED
(*F) NOT SUPPORTED
(*FAIL) NOT SUPPORTED
(*MARK) NOT SUPPORTED
(*PRUNE) NOT SUPPORTED
(*SKIP) NOT SUPPORTED
(*THEN) NOT SUPPORTED
(*ANY) set newline convention NOT SUPPORTED
(*ANYCRLF) NOT SUPPORTED
(*CR) NOT SUPPORTED
(*CRLF) NOT SUPPORTED
(*LF) NOT SUPPORTED
(*BSR_ANYCRLF) set \R convention NOT SUPPORTED pcre
(*BSR_UNICODE) NOT SUPPORTED pcre
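Since unsupported constructs fail at compile time, a pattern taken from another engine can be screened before use. A rough sketch, with a hypothetical helper unsupported and a handful of patterns picked from the tables above:

```go
package main

import (
	"fmt"
	"regexp"
)

// unsupported returns the patterns that regexp.Compile rejects, paired
// with the error text reported by the regexp package.
func unsupported(patterns []string) map[string]string {
	out := make(map[string]string)
	for _, p := range patterns {
		if _, err := regexp.Compile(p); err != nil {
			out[p] = err.Error()
		}
	}
	return out
}

func main() {
	patterns := []string{
		`(?R)`,            // recursive call to the entire regexp
		`(?P<n>a)(?P=n)`,  // named backreference
		`(?(1)a|b)`,       // conditional branch
		`(*FAIL)`,         // backtracking control verb
		`(?#text)`,        // comment
		`(?P<n>a)(?:b)c+`, // plain RE2 syntax, accepted
	}
	bad := unsupported(patterns)
	for _, p := range patterns {
		if msg, ok := bad[p]; ok {
			fmt.Printf("%-18s rejected: %s\n", p, msg)
		} else {
			fmt.Printf("%-18s ok\n", p)
		}
	}
}
```

Using regexp.Compile rather than regexp.MustCompile matters here: MustCompile panics on a bad pattern, which is fine for fixed patterns but not for patterns that come from outside.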