Syntax of Regular Expressions (Part 1)

Important note
Below is a description of the regular expressions implemented in the freeware library TRegExpr. Please note that the library is widely used in many free and commercial software products. The author of the TRegExpr library cannot answer direct questions from those products' users. Please send your questions to the product's support first.
 
Introduction
Regular expressions are a widely used method of specifying patterns of text to search for. Special metacharacters allow you to specify, for instance, that a particular string you are looking for occurs at the beginning or end of a line, or contains n recurrences of a certain character.
 
Regular expressions look ugly to novices, but they are really a very simple (well, usually simple ;) ), handy and powerful tool.
 
I recommend that you play with regular expressions using RegExp Studio; it will help you understand the main concepts. Moreover, many predefined examples with comments are included in the repository of the R.e. visual debugger.
 
Let's start our learning trip!
 
Simple matches
 
Any single character matches itself, unless it is a metacharacter with a special meaning described below.
 
A series of characters matches that series of characters in the target string, so the pattern "bluh" would match "bluh" in the target string. Quite simple, eh?
 
You can cause characters that normally function as metacharacters or escape sequences to be interpreted literally by "escaping" them, that is, by preceding them with a backslash "\". For instance, the metacharacter "^" matches the beginning of the string, but "\^" matches the character "^", "\\" matches "\", and so on.
 
Examples:
 
 foobar           matches the string 'foobar'
 \^FooBarPtr      matches '^FooBarPtr'
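
The two examples above can be tried out in a short sketch. It uses Python's `re` module rather than TRegExpr (an assumption made only so the snippet is runnable; literal matching and backslash escaping work the same way for these patterns). The sample strings are made up for illustration.

```python
import re

# Plain characters match themselves anywhere in the target string.
plain = re.search(r"foobar", "xx foobar yy")

# Unescaped, "^" is an anchor, so this pattern cannot match in mid-string.
anchored = re.search(r"^FooBarPtr", "see ^FooBarPtr here")

# Escaped with a backslash, "^" is an ordinary character.
literal_caret = re.search(r"\^FooBarPtr", "see ^FooBarPtr here")

# A doubled backslash in the pattern matches one literal backslash.
literal_backslash = re.search(r"\\", r"C:\temp")

print(plain.group())
print(anchored)
print(literal_caret.group())
print(literal_backslash.group())
```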
 
Note for C++ Builder users
Please read the FAQ answer to the question "Why do many r.e. work wrong in Borland C++ Builder?"
 
Escape sequences
 
Characters may be specified using an escape sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, \xnn, where nn is a string of hexadecimal digits, matches the character whose ASCII value is nn. If you need a wide (Unicode) character code, you can use '\x{nnnn}', where 'nnnn' is one or more hexadecimal digits.
 
 
 \xnn      char with hex code nn
 \x{nnnn}  char with hex code nnnn (one byte for plain text and two bytes for Unicode)
 \t        tab (HT/TAB), same as \x09
 \n        newline (NL), same as \x0a
 \r        carriage return (CR), same as \x0d
 \f        form feed (FF), same as \x0c
 \a        alarm (bell) (BEL), same as \x07
 \e        escape (ESC), same as \x1b
 
 
Examples:
 
 foo\x20bar    matches 'foo bar' (note the space in the middle)
 \tfoobar      matches 'foobar' preceded by a tab
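
As a rough check of the table and examples above, here is a sketch in Python's `re` module (an assumption; the text describes TRegExpr). Python's `re` does not accept the `\x{nnnn}` form or `\e`, so those two rows are left out of the sketch.

```python
import re

# \x20 is the space character, so the pattern matches 'foo bar'.
space_by_hex = re.fullmatch(r"foo\x20bar", "foo bar")

# \t requires a tab before 'foobar'.
leading_tab = re.fullmatch(r"\tfoobar", "\tfoobar")
missing_tab = re.fullmatch(r"\tfoobar", "foobar")

# Each short escape, the \xnn form the table lists for it, and the character.
equivalents = [
    (r"\t", r"\x09", "\t"),
    (r"\n", r"\x0a", "\n"),
    (r"\r", r"\x0d", "\r"),
    (r"\f", r"\x0c", "\f"),
    (r"\a", r"\x07", "\a"),
]
all_agree = all(
    re.fullmatch(short, ch) and re.fullmatch(hex_form, ch)
    for short, hex_form, ch in equivalents
)

print(bool(space_by_hex), bool(leading_tab), missing_tab, all_agree)
```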
 
Character classes
You can specify a character class by enclosing a list of characters in [], which will match any one character from the list.
 
If the first character after the "[" is "^", the class matches any character not in the list.
 
Examples:
 foob[aeiou]r    finds strings 'foobar', 'foober' etc. but not 'foobbr', 'foobcr' etc.
 foob[^aeiou]r   finds strings 'foobbr', 'foobcr' etc. but not 'foobar', 'foober' etc.
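
The difference between a class and its negation can be seen by running both patterns over the same words. This is a sketch in Python's `re` module (an assumption; the behaviour for these two patterns is the same as described above), and the word list is just the four strings named in the examples.

```python
import re

words = ["foobar", "foober", "foobbr", "foobcr"]

vowel = re.compile(r"foob[aeiou]r")       # fifth letter must be a vowel
non_vowel = re.compile(r"foob[^aeiou]r")  # fifth letter must not be a vowel

with_vowel = [w for w in words if vowel.fullmatch(w)]
without_vowel = [w for w in words if non_vowel.fullmatch(w)]

print(with_vowel)
print(without_vowel)
```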
 
 
Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive.
 
If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]", you may place it at the start of the list or escape it with a backslash.
 
 
Examples:
 [-az]       matches 'a', 'z' and '-'
 [az-]       matches 'a', 'z' and '-'
 [a\-z]      matches 'a', 'z' and '-'
 [a-z]       matches all twenty-six lowercase characters from 'a' to 'z'
 [\n-\x0D]   matches any of #10, #11, #12, #13
 [\d-t]      matches any digit, '-' or 't'
 []-a]       matches any char from ']'..'a'
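
One way to see exactly what each class above contains is to test it against every ASCII character. The sketch below does that with Python's `re` module (an assumption; the text describes TRegExpr), and `class_members` is a hypothetical helper written for this illustration. The `[\d-t]` example is left out because engines differ in how they treat a shorthand such as `\d` next to "-".

```python
import re

def class_members(pattern):
    """Return every ASCII character matched by a one-character pattern."""
    return "".join(
        ch for ch in map(chr, range(128)) if re.fullmatch(pattern, ch)
    )

leading_dash = class_members(r"[-az]")      # '-' first: literal
trailing_dash = class_members(r"[az-]")     # '-' last: literal
escaped_dash = class_members(r"[a\-z]")     # '-' escaped: literal
lowercase = class_members(r"[a-z]")         # '-' between two chars: range
control_range = class_members(r"[\n-\x0D]") # codes 10 through 13
bracket_range = class_members(r"[]-a]")     # ']' first: literal, then a range

print(repr(leading_dash), repr(lowercase), repr(bracket_range))
```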
 
 
Metacharacters
 
Metacharacters are special characters which are the essence of regular expressions. There are different types of metacharacters, described below.
 
Metacharacters - line separators
 
 ^       start of line
 $       end of line
 \A      start of text
 \Z      end of text
 .       any character in line
 
Examples:
 ^foobar      matches the string 'foobar' only if it is at the beginning of the line
 foobar$      matches the string 'foobar' only if it is at the end of the line
 ^foobar$     matches the string 'foobar' only if it is the only string in the line
 
 foob.r       matches strings like 'foobar', 'foobbr', 'foob1r' and so on
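
The anchor examples above can be checked with single-line inputs. This is a sketch in Python's `re` module (an assumption; the text describes TRegExpr), and the sample strings are invented for illustration.

```python
import re

# "^" pins the match to the start of the input.
at_start = re.search(r"^foobar", "foobar baz")
not_at_start = re.search(r"^foobar", "baz foobar")

# "$" pins the match to the end of the input.
at_end = re.search(r"foobar$", "baz foobar")
not_at_end = re.search(r"foobar$", "foobar baz")

# With both anchors, the whole input must be 'foobar'.
whole = re.search(r"^foobar$", "foobar")
not_whole = re.search(r"^foobar$", "foobar baz")

# "." stands for any single character within the line.
dot_hits = [s for s in ["foobar", "foobbr", "foob1r", "foobr"]
            if re.fullmatch(r"foob.r", s)]

print(bool(at_start), not_at_start, bool(at_end), not_at_end)
print(bool(whole), not_whole, dot_hits)
```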
 
The "^" metacharacter by default is only guaranteed to match at the beginning of the input string/text, and the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$".
 
You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m.
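
The effect of multi-line mode can be illustrated with a three-line buffer. The sketch uses Python's `re` module, where the `re.MULTILINE` flag plays the role of the /m modifier (an assumption; how the modifier is switched on in TRegExpr itself is not shown here).

```python
import re

text = "first\nfoobar\nlast"

# By default "^" and "$" only match at the ends of the whole text,
# so 'foobar' on the middle line is not found.
default_mode = re.search(r"^foobar$", text)

# In multi-line mode "^" also matches after each line separator
# and "$" before each one.
multi_line = re.search(r"^foobar$", text, re.MULTILINE)

# Every line start becomes a valid position for "^".
line_starts = re.findall(r"^\w+", text, re.MULTILINE)

print(default_mode)
print(multi_line.group())
print(line_starts)
```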