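An Introduction to the Qt Regular Expression Documentation

Regular expressions in Qt are described in detail in the documentation for the QRegExp class; readers who want the original text can look it up in the Qt documentation. This post mainly walks through the introduction of that documentation.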

 

Introduction

 

Regexps are built up from expressions, quantifiers, and assertions. The simplest expression is a character, e.g. x or 5. An expression can also be a set of characters enclosed in square brackets. [ABCD] will match an A or a B or a C or a D. We can write this same expression as [A-D], and an expression to match any capital letter in the English alphabet is written as [A-Z].

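In short: a regexp is assembled from expressions, quantifiers, and assertions. A single character such as x or 5 is the simplest expression; a bracketed set such as [ABCD] matches any one of its members, and a range such as [A-D] or [A-Z] is shorthand for listing them.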
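The character-set idea can be tried out with a small sketch. It uses the C++ standard library's std::regex (ECMAScript grammar) as a stand-in for QRegExp, which is an assumption made for portability; the function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `text` is exactly one of the listed characters A, B, C, D.
// std::regex_match succeeds only when the whole string matches.
bool matchesListedSet(const std::string& text) {
    static const std::regex pattern("[ABCD]");
    return std::regex_match(text, pattern);
}

// Same set written as a range; it should accept the same inputs.
bool matchesRange(const std::string& text) {
    static const std::regex pattern("[A-D]");
    return std::regex_match(text, pattern);
}
```

Both functions accept "A" through "D" and reject anything else, including strings longer than one character.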

A quantifier specifies the number of occurrences of an expression that must be matched. x{1,1} means match one and only one x. x{1,5} means match a sequence of x characters that contains at least one x but no more than five.

A quantifier sets how many times the preceding expression must occur: x{1,1} matches exactly one x, and x{1,5} matches a run of one to five x characters.
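A minimal sketch of the {1,5} quantifier, again using std::regex as a stand-in for QRegExp (an assumption); the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// True if `text` consists of one to five 'x' characters and nothing else.
bool isOneToFiveX(const std::string& text) {
    static const std::regex pattern("x{1,5}");
    return std::regex_match(text, pattern);  // whole-string match
}
```

An empty string fails the lower bound of one, and six x characters fail the upper bound of five.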

Note that in general regexps cannot be used to check for balanced brackets or tags. For example, a regexp can be written to match an opening html <b> and its closing </b>, if the <b> tags are not nested, but if the <b> tags are nested, that same regexp will match an opening <b> tag with the wrong closing </b>. For the fragment <b>bold <b>bolder</b></b>, the first <b> would be matched with the first </b>, which is not correct. However, it is possible to write a regexp that will match nested brackets or tags correctly, but only if the number of nesting levels is fixed and known. If the number of nesting levels is not fixed and known, it is impossible to write a regexp that will not fail.

Put differently, a regexp cannot in general verify that brackets or tags are balanced. A pattern that pairs <b> with </b> works while the tags are not nested, but on <b>bold <b>bolder</b></b> it pairs the first <b> with the first </b>, which belongs to the inner tag. Nesting can be handled only when the depth is fixed and known in advance; for arbitrary depth, no regexp is reliable.
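The mismatch on nested tags can be reproduced with a short sketch. It assumes std::regex in place of QRegExp and uses the lazy quantifier .*? that the ECMAScript grammar offers (QRegExp, as far as I know, expresses non-greedy matching through setMinimal(true) instead); the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns the first "<b>...</b>" span found in `html`, or an empty string.
// The lazy ".*?" stops at the first closing tag it can reach.
std::string firstBoldSpan(const std::string& html) {
    static const std::regex pattern("<b>.*?</b>");
    std::smatch match;
    if (std::regex_search(html, match, pattern)) {
        return match.str();
    }
    return std::string();
}
```

On the nested fragment from the text, the returned span ends at the inner closing tag, so the outer <b> is paired with the wrong </b>.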

Suppose we want a regexp to match integers in the range 0 to 99. At least one digit is required, so we start with the expression [0-9]{1,1}, which matches a single digit exactly once. This regexp matches integers in the range 0 to 9. To match integers up to 99, increase the maximum number of occurrences to 2, so the regexp becomes [0-9]{1,2}. This regexp satisfies the original requirement to match integers from 0 to 99, but it will also match integers that occur in the middle of strings. If we want the matched integer to be the whole string, we must use the anchor assertions, ^ (caret) and $ (dollar). When ^ is the first character in a regexp, it means the regexp must match from the beginning of the string. When $ is the last character of the regexp, it means the regexp must match to the end of the string. The regexp becomes ^[0-9]{1,2}$. Note that assertions, e.g. ^ and $, do not match characters but locations in the string.

To summarize the steps: [0-9]{1,1} matches a single digit (0 to 9), and raising the maximum to two gives [0-9]{1,2}, which covers 0 to 99 but also matches digits found in the middle of a longer string. Anchoring the pattern as ^[0-9]{1,2}$ requires the match to span the whole string. The assertions ^ and $ match positions in the string, not characters.

Example:

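    // Allow up to two digits, optionally followed by one uppercase letter.
    QRegExp regExp("[0-9]{0,2}[A-Z]{0,1}");
    // QRegExpValidator accepts input only if the whole text matches the pattern.
    testLineEdit->setValidator(new QRegExpValidator(regExp, this));

In the example above, testLineEdit accepts zero to two digits (0-9), optionally followed by a single uppercase letter (A-Z).

The difference between "^[0-9]{1,2}$" and "[0-9]{1,1}" shows up on a string such as "a12b": ^[0-9]{1,2}$ does not match it, because the whole string would have to be one or two digits, whereas [0-9]{1,1} does find a match inside it (the digit 1).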
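The contrast between searching inside a string and matching the whole string can be sketched as follows. std::regex stands in for QRegExp here (an assumption), and both function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Unanchored: true if one or two digits occur anywhere in `text`.
bool containsDigits(const std::string& text) {
    static const std::regex pattern("[0-9]{1,2}");
    return std::regex_search(text, pattern);
}

// Anchored with ^ and $: true only if `text` is an integer written
// with one or two digits, i.e. in the range 0 to 99.
bool isZeroTo99(const std::string& text) {
    static const std::regex pattern("^[0-9]{1,2}$");
    return std::regex_search(text, pattern);
}
```

For "a12b", containsDigits reports a match while isZeroTo99 does not; for "42" both succeed.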

If you have seen regexps described elsewhere, they may have looked different from the ones shown here. This is because some sets of characters and some quantifiers are so common that they have been given special symbols to represent them. [0-9] can be replaced with the symbol \d. The quantifier to match exactly one occurrence, {1,1}, can be replaced with the expression itself, i.e. x{1,1} is the same as x. So our 0 to 99 matcher could be written as ^\d{1,2}$. It can also be written ^\d\d{0,1}$, i.e. From the start of the string, match a digit, followed immediately by 0 or 1 digits. In practice, it would be written as ^\d\d?$. The ? is shorthand for the quantifier {0,1},  i.e. 0 or 1 occurrences. ? makes an expression optional. The regexp ^\d\d?$ means From the beginning of the string, match one digit, followed immediately by 0 or 1 more digit, followed immediately by end of string.

In other words, common sets and quantifiers have shorthand: \d stands for [0-9], an expression with no quantifier is implicitly {1,1}, and ? stands for {0,1}, which makes the preceding expression optional. The 0 to 99 matcher can therefore be written as ^\d{1,2}$, as ^\d\d{0,1}$, or, as is usual in practice, as ^\d\d?$.
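A quick way to convince yourself that the three spellings are interchangeable is to run them side by side. This sketch assumes std::regex as a stand-in for QRegExp; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Tests `text` against the three spellings of the 0-to-99 matcher and
// returns one result per spelling, in the order they appear in the text.
std::vector<bool> matchAllForms(const std::string& text) {
    static const std::regex forms[] = {
        std::regex(R"(^\d{1,2}$)"),
        std::regex(R"(^\d\d{0,1}$)"),
        std::regex(R"(^\d\d?$)"),
    };
    std::vector<bool> results;
    for (const std::regex& form : forms) {
        results.push_back(std::regex_search(text, form));
    }
    return results;
}
```

For any input, the three entries of the returned vector should agree with one another.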

To write a regexp that matches one of the words 'mail' or 'letter' or 'correspondence' but does not match words that contain these words, e.g., 'email', 'mailman', 'mailer', and 'letterbox', start with a regexp that matches 'mail'. Expressed fully, the regexp is m{1,1}a{1,1}i{1,1}l{1,1}, but because a character expression is automatically quantified by {1,1}, we can simplify the regexp to mail, i.e., an 'm' followed by an 'a' followed by an 'i' followed by an 'l'. Now we can use the vertical bar |, which means or, to include the other two words, so our regexp for matching any of the three words becomes mail|letter|correspondence. Match 'mail' or 'letter' or 'correspondence'. While this regexp will match one of the three words we want to match, it will also match words we don't want to match, e.g., 'email'. To prevent the regexp from matching unwanted words, we must tell it to begin and end the match at word boundaries. First we enclose our regexp in parentheses, (mail|letter|correspondence). Parentheses group expressions together, and they identify a part of the regexp that we wish to capture. Enclosing the expression in parentheses allows us to use it as a component in more complex regexps. It also allows us to examine which of the three words was actually matched. To force the match to begin and end on word boundaries, we enclose the regexp in \b word boundary assertions: \b(mail|letter|correspondence)\b. Now the regexp means: Match a word boundary, followed by the regexp in parentheses, followed by a word boundary. The \b assertion matches a position in the regexp, not a character. A word boundary is any non-word character, e.g., a space, newline, or the beginning or ending of a string.

Step by step: mail is the simplified form of m{1,1}a{1,1}i{1,1}l{1,1}; the vertical bar | means "or", giving mail|letter|correspondence, which still matches inside words such as 'email'. Parentheses group the alternatives and capture which word matched, and wrapping the group in \b assertions, \b(mail|letter|correspondence)\b, forces the match to start and end at word boundaries. \b matches a position rather than a character: the point between a word character and a non-word character (such as a space or newline), or the beginning or end of the string.
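The word-boundary pattern and its capture group can be exercised with a small sketch, assuming std::regex as a stand-in for QRegExp; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Returns which of the three words occurs in `text` as a whole word
// (the content of capture group 1), or an empty string if none does.
std::string matchedWord(const std::string& text) {
    static const std::regex pattern(R"(\b(mail|letter|correspondence)\b)");
    std::smatch match;
    if (std::regex_search(text, match, pattern)) {
        return match[1].str();
    }
    return std::string();
}
```

Because of the \b assertions, "email", "mailman", and "letterbox" produce no match, while "a letter arrived" yields "letter".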

If we want to replace ampersand characters with the HTML entity &amp;, the regexp to match is simply &. But this regexp will also match ampersands that have already been converted to HTML entities. We want to replace only ampersands that are not already followed by amp;. For this, we need the negative lookahead assertion, (?!__). The regexp can then be written as &(?!amp;), i.e. Match an ampersand that is not followed by amp;.

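This paragraph is about HTML escaping. A plain & pattern would also hit the & at the start of an existing &amp; entity, so replacing every match would turn &amp; into &amp;amp;. The negative lookahead (?!amp;) makes the match succeed only when the & is not immediately followed by amp;, and the lookahead itself consumes no characters.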
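The negative lookahead can be tried with std::regex_replace. This is a sketch that assumes std::regex (ECMAScript grammar) in place of QRegExp; the function name is hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Replaces every '&' that is not already the start of "&amp;" with "&amp;".
// The lookahead (?!amp;) inspects the following text without consuming it.
std::string escapeAmpersands(const std::string& text) {
    static const std::regex bareAmpersand("&(?!amp;)");
    return std::regex_replace(text, bareAmpersand, "&amp;");
}
```

Running the function on its own output changes nothing, since every remaining & is followed by amp;.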

If we want to count all the occurrences of 'Eric' and 'Eirik' in a string, two valid solutions are \b(Eric|Eirik)\b and \bEi?ri[ck]\b. The word boundary assertion '\b' is required to avoid matching words that contain either name, e.g. 'Ericsson'. Note that the second regexp matches more spellings than we want: 'Eric', 'Erik', 'Eiric' and 'Eirik'.

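That is, both \b(Eric|Eirik)\b and \bEi?ri[ck]\b count the occurrences of 'Eric' and 'Eirik', and the \b assertions keep words that merely contain either name, such as 'Ericsson', from being counted. The second pattern is looser than intended: it also accepts 'Erik' and 'Eiric'.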
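Counting the matches of each pattern shows the difference between the two solutions. The sketch below assumes std::regex as a stand-in for QRegExp; the function name and the sample sentence are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of `pattern` in `text`.
std::size_t countMatches(const std::string& text, const std::string& pattern) {
    const std::regex re(pattern);
    const std::sregex_iterator first(text.begin(), text.end(), re);
    const std::sregex_iterator last;  // default-constructed: end of sequence
    return static_cast<std::size_t>(std::distance(first, last));
}
```

On the sentence "Eric met Eirik, Erik and Eiric at Ericsson.", the first pattern counts only 'Eric' and 'Eirik', while the second also counts 'Erik' and 'Eiric'; neither counts 'Ericsson'.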

 

 
