Regular Expression Syntax Explained (Part 1)


码农天天向上


Syntax of Regular Expressions (1)

Important note

The following describes the regular expressions implemented in the freeware library TRegExpr. Please note that the library is widely used in many free and commercial software products. The author of the TRegExpr library cannot answer direct questions from users of these products. Please send your questions to the product's support first.

Introduction

Regular expressions are a widely used method of specifying patterns of text to search for. Special metacharacters allow you to specify, for instance, that a particular string you are looking for occurs at the beginning or end of a line, or contains n recurrences of a certain character.

Regular expressions look ugly to novices, but they are really a very simple (well, usually simple ;) ), handy and powerful tool.

I recommend that you play with regular expressions using RegExp Studio; it will help you understand the main concepts. Moreover, many predefined examples with comments are included in the repository of the R.e. visual debugger.

Let's start our learning trip!

Simple matches

Any single character matches itself, unless it is a metacharacter with a special meaning described below.

A series of characters matches that same series of characters in the target string, so the pattern "bluh" would match "bluh" in the target string. Quite simple, eh?

You can cause characters that normally function as metacharacters or escape sequences to be interpreted literally by escaping them, that is, by preceding them with a backslash "\". For instance, the metacharacter "^" matches the beginning of the string, but "\^" matches the character "^", "\\" matches "\", and so on.

Examples:

foobar matches string 'foobar'

\^FooBarPtr matches '^FooBarPtr'
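The literal-match and escaping rules above can be tried out with a short sketch. It uses Python's `re` module rather than TRegExpr, on the assumption that both engines treat plain characters and backslash escapes the same way; the sample strings and variable names are made up for illustration.

```python
import re

# A plain series of characters matches itself anywhere in the target string.
literal = re.search(r"bluh", "blah bluh blih")
literal_text = literal.group(0)

# An unescaped "^" is an anchor; "\^" matches the caret character itself.
escaped = re.search(r"\^FooBarPtr", "type ^FooBarPtr")
escaped_text = escaped.group(0)

# "\\" in the pattern matches a single backslash in the target string.
backslash = re.search(r"\\", "C:\\temp")
backslash_pos = backslash.start()

print(literal_text, escaped_text, backslash_pos)
```

Raw string literals (`r"..."`) keep Python from interpreting the backslashes before the regex engine sees them.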

Note for C++ Builder users

Please read the FAQ answer to the question "Why many r.e. work wrong in Borland C++ Builder?"

Escape sequences

Characters may be specified using an escape sequence syntax much like that used in C and Perl: "\n" matches a newline, "\t" a tab, etc. More generally, \xnn, where nn is a string of hexadecimal digits, matches the character whose ASCII value is nn. If you need a wide (Unicode) character code, you can use \x{nnnn}, where nnnn is one or more hexadecimal digits.

\xnn char with hex code nn

\x{nnnn} char with hex code nnnn (one byte for plain text and two bytes for Unicode)

\t tab (HT/TAB), same as \x09

\n newline (NL), same as \x0a

\r carriage return (CR), same as \x0d

\f form feed (FF), same as \x0c

\a alarm (bell) (BEL), same as \x07

\e escape (ESC), same as \x1b

Examples:

foo\x20bar matches 'foo bar' (note the space in the middle)

\tfoobar matches 'foobar' preceded by a tab
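As a rough check of the escape table, the sketch below runs a few of the sequences through Python's `re` module, which is a different engine from TRegExpr but accepts the same \xnn, \t, \n, \r, \f and \a escapes. The \e and \x{nnnn} forms are left out because Python's `re` does not accept them. All variable names are illustrative.

```python
import re

# \x20 is the space character, so this pattern matches "foo bar".
space_match = re.search(r"foo\x20bar", "say foo bar")
space_text = space_match.group(0)

# \t is a tab; \x09 names the same character.
tab_match = re.search(r"\tfoobar", "x\tfoobar")
tab_start = tab_match.start()
same_tab = re.fullmatch(r"\x09", "\t") is not None

# Character codes matched by \n, \r, \f and \a.
pairs = [(r"\n", "\n"), (r"\r", "\r"), (r"\f", "\f"), (r"\a", "\a")]
codes = [ord(re.fullmatch(pattern, char).group(0)) for pattern, char in pairs]

print(space_text, tab_start, same_tab, codes)
```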

Character classes

You can specify a character class by enclosing a list of characters in [], which will match any one character from the list.

If the first character after the "[" is "^", the class matches any character not in the list.

Examples:

foob[aeiou]r finds strings 'foobar', 'foober' etc. but not 'foobbr', 'foobcr' etc.

foob[^aeiou]r finds strings 'foobbr', 'foobcr' etc. but not 'foobar', 'foober' etc.
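The two class examples can be reproduced with a minimal sketch in Python's `re` module (used here in place of TRegExpr; bracket classes and negation with "^" are written the same way in both). The word list is invented for the demonstration.

```python
import re

words = ["foobar", "foober", "foobbr", "foobcr"]

vowel = re.compile(r"foob[aeiou]r")       # fifth character must be a vowel
non_vowel = re.compile(r"foob[^aeiou]r")  # fifth character must not be a vowel

vowel_hits = [w for w in words if vowel.fullmatch(w)]
non_vowel_hits = [w for w in words if non_vowel.fullmatch(w)]

print(vowel_hits)
print(non_vowel_hits)
```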

Within a list, the "-" character is used to specify a range, so that a-z represents all characters between "a" and "z", inclusive.

If you want "-" itself to be a member of a class, put it at the start or end of the list, or escape it with a backslash. If you want "]", you may place it at the start of the list or escape it with a backslash.

Examples:

[-az] matches 'a', 'z' and '-'

[az-] matches 'a', 'z' and '-'

[a\-z] matches 'a', 'z' and '-'

[a-z] matches all twenty-six lowercase letters from 'a' to 'z'

[\n-\x0D] matches any of #10, #11, #12, #13.

[\d-t] matches any digit, '-' or 't'.

[]-a] matches any char from ']'..'a'.
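To see which characters each of these classes accepts, here is a small sketch using Python's `re` module as a stand-in for TRegExpr. The helper `members` and the candidate list are hypothetical names introduced for this example. The `[\d-t]` case is skipped because Python rejects a range that starts with a class escape such as \d, so that example is specific to TRegExpr's parser.

```python
import re

def members(pattern, candidates):
    """Return the candidate characters that the one-character class matches."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

candidates = ["a", "m", "z", "-", "]", "^", "\n", "\r"]

start_dash = members(r"[-az]", candidates)       # "-" first: literal dash
end_dash = members(r"[az-]", candidates)         # "-" last: literal dash
escaped_dash = members(r"[a\-z]", candidates)    # escaped dash: literal dash
lower_range = members(r"[a-z]", candidates)      # range from "a" to "z"
control_range = members(r"[\n-\x0D]", candidates)  # codes 10 through 13
bracket_range = members(r"[]-a]", candidates)    # range from "]" to "a"

print(start_dash, lower_range, bracket_range)
```

The last class includes "^" because its code (0x5E) lies between "]" (0x5D) and "a" (0x61).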

Metacharacters

Metacharacters are special characters which are the essence of regular expressions. There are different types of metacharacters, described below.

Metacharacters - line separators

^ start of line

$ end of line

\A start of text

\Z end of text

. any character in line

Examples:

^foobar matches string 'foobar' only if it's at the beginning of the line

foobar$ matches string 'foobar' only if it's at the end of the line

^foobar$ matches string 'foobar' only if it's the only string in the line

foob.r matches strings like 'foobar', 'foobbr', 'foob1r' and so on
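The anchor and dot examples can be checked with the following sketch. It relies on Python's `re` module instead of TRegExpr, assuming the default (single-line) behaviour of "^", "$", "\A" and "." is comparable; the test strings are invented.

```python
import re

# "^" and "\A" only succeed when the match begins at the start of the text.
at_start = re.search(r"^foobar", "foobar baz") is not None
not_at_start = re.search(r"^foobar", "baz foobar") is not None
text_start = re.search(r"\Afoobar", "foobar baz") is not None

# "$" only succeeds when the match ends at the end of the text.
at_end = re.search(r"foobar$", "baz foobar") is not None

# "^...$" together require the whole line to be exactly "foobar".
whole_line = re.search(r"^foobar$", "foobar") is not None
not_whole_line = re.search(r"^foobar$", "foobar baz") is not None

# "." stands for any single character within the line.
samples = ["foobar", "foobbr", "foob1r", "foobr"]
dot_hits = [s for s in samples if re.fullmatch(r"foob.r", s)]

print(at_start, not_at_start, text_start, at_end, whole_line, dot_hits)
```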

By default, the "^" metacharacter is only guaranteed to match at the beginning of the input string/text, and the "$" metacharacter only at the end. Embedded line separators will not be matched by "^" or "$".

You may, however, wish to treat a string as a multi-line buffer, such that "^" will match after any line separator within the string, and "$" will match before any line separator. You can do this by switching on the modifier /m.
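The effect of multi-line mode can be sketched as follows. The example uses Python's `re.MULTILINE` flag as an analogue of the /m modifier; this is an assumption about equivalent behaviour, not TRegExpr's own API, and the sample text is made up.

```python
import re

text = "first\nfoobar\nlast"

# Default mode: "^" and "$" only match at the ends of the whole text,
# so a line embedded in the middle is not found.
default_hits = re.findall(r"^foobar$", text)

# Multi-line mode: "^" matches after each line separator and "$" before one.
multi_hits = re.findall(r"^foobar$", text, re.MULTILINE)

# Offsets where "^" matches in multi-line mode (the start of each line).
line_starts = [m.start() for m in re.finditer(r"^", text, re.MULTILINE)]

print(default_hits, multi_hits, line_starts)
```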
