Mastering regexp Pattern Matching the Easy Way

IC-CAD

Tags: python linux programming language

Article link: https://blog.csdn.net/m0_59557249/article/details/137026465

1. Return values of regexp

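The simplest way to use regexp is without any options: regexp + {pattern} + the string to match. In this form regexp returns a boolean: 1 if the pattern matches somewhere in the string, 0 otherwise.

Note: it is best to wrap the pattern in braces. Braces stop Tcl from performing its own backslash, variable, and command substitution on the pattern, so characters such as \, $ and [ reach the regular-expression engine unchanged.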
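A small sketch of why the braces matter (the variable name text is only for illustration). Inside double quotes Tcl applies its own backslash substitution before regexp ever sees the pattern, and \d is not a Tcl escape sequence, so it collapses to a plain d:

```tcl
set text "123"

# Braced: the regex engine receives \d+ unchanged
puts [regexp {\d+} $text]    ;# 1

# Double-quoted: Tcl first turns "\d" into a plain "d",
# so the pattern becomes d+ and finds no letter d in "123"
puts [regexp "\d+" $text]    ;# 0

# Doubling the backslash also works, but is harder to read
puts [regexp "\\d+" $text]   ;# 1
```

The same problem hits [ inside double quotes: Tcl would try to run [a-z] as a command, which is why braces are the safer habit.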

set string "123 abc 456 def"
puts [regexp {\d+} $string]
puts [regexp -all {\d+} $string]
puts [regexp -inline {\d+} $string]
puts [regexp -all -inline {\d+} $string]
# Output:
1
2
123
123 456

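As shown above, with no options regexp returns 1. \d+ matches one or more digits: \d matches any single digit (0-9), and + is a quantifier meaning "one or more of the preceding element". So \d+ matches a run of digits of any length, such as "1", "12", or "123". With -all, regexp applies the pattern repeatedly across the whole string and, without other options, returns the number of matches; here two substrings match, so it returns 2. The -inline option returns the matched text itself instead of 1 or 0, but only the first match; to get every match, combine it with -all.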
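Because -inline returns an ordinary Tcl list, the matches can be processed directly. A minimal sketch (the variable names numbers and total are illustrative) that adds up every number found in the string used above:

```tcl
set string "123 abc 456 def"

# -all -inline returns an ordinary Tcl list of every match
set numbers [regexp -all -inline {\d+} $string]
puts [llength $numbers]      ;# 2

# Because it is a list, it can be fed straight into foreach
set total 0
foreach n $numbers {
    incr total $n
}
puts $total                  ;# 579
```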

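To store the matched text in variables, regexp also accepts variable names after the string to match. The first variable receives the text matched by the whole pattern, and each following variable receives the text matched by one parenthesized capture group, in order:

regexp {pattern} string matchVar subMatchVar1 subMatchVar2 ...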
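To make the roles of the variables concrete, here is a minimal sketch; the input "width=42" and the variable names whole, key and value are hypothetical:

```tcl
set line "width=42"

# whole <- entire match, key <- group 1, value <- group 2
if {[regexp {(\w+)=(\d+)} $line whole key value]} {
    puts "whole=$whole key=$key value=$value"
}
# prints: whole=width=42 key=width value=42
```

Wrapping the call in if is a common pattern: the variables are only meaningful when regexp returned 1.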

Example 1:

# Match the first substring with lowercase letters only
set sample "Where there is a will, There is a way."
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"
# Output:
Result: 1 match: here

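[a-z] is a character set that matches any single lowercase English letter from a to z.

+ is a quantifier that matches the preceding element one or more times.

Therefore [a-z]+ matches a run of consecutive lowercase letters of any length, such as "a", "abc", or "xyz". Because the first letter, W, is uppercase, the first matched substring is here. regexp returns 1 as soon as anything matches, so $result is 1.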
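To confirm where that first match sits, the -indices option stores the start and end character positions instead of the text. A short sketch, assuming Tcl 8.5 or later for the {*} expansion syntax (the variable name range is illustrative):

```tcl
set sample "Where there is a will, There is a way."

# -indices stores the first and last character positions of the match
regexp -indices {[a-z]+} $sample range
puts $range                            ;# 1 4

# Index 0 is the uppercase W, so the first lowercase run starts at index 1
puts [string range $sample {*}$range]  ;# here
```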

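Some readers may wonder why the later words, there, is, a and so on, were not matched even though they are all lowercase. Two things combine here. First, a single match ends at the space after here, because a space is not a lowercase letter. Second, without any option regexp stops after the first match and never looks at the rest of the string. Now let's see what happens when we add the -all option. (You might expect the space to stop matching altogether. Each individual match does end there, but -all simply starts a new search right after the previous match.)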
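To see why -all reaches the later words, it helps to emulate it by hand. The sketch below is only an illustration, not how regexp works internally: it calls regexp repeatedly with -start, resuming just past the previous match. It assumes Tcl 8.5 or later for lassign, and the variable names start and words are hypothetical.

```tcl
set sample "Where there is a will, There is a way."
set start 0
set words {}

# Keep searching from just past the previous match until nothing is left.
# With -start, -indices still reports positions from the start of the string.
while {[regexp -start $start -indices {[a-z]+} $sample range]} {
    lassign $range first last
    lappend words [string range $sample $first $last]
    set start [expr {$last + 1}]
}

puts [llength $words]   ;# 9
puts $words             ;# here there is a will here is a way
```

Each iteration still stops at a space or punctuation mark; the loop, like -all, just begins the next search from there.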

# Match every run of lowercase letters with -all
set sample "Where there is a will, There is a way."
set result [regexp -all {[a-z]+} $sample match]
puts "Result: $result match: $match"
# Output:
Result: 9 match: way

With -all, regexp finds 9 matching substrings, so it returns 9, and the variable match holds the last match. Note that -inline and match variables cannot be used together; combining them raises an error:

# Match the first substring with lowercase letters only
set sample "Where there is a will, There is a way."
set result [regexp -inline {[a-z]+} $sample match]
puts "Result: $result match: $match"
# Error:
regexp match variables not allowed when using -inline
    while executing
"regexp -inline {[a-z]+} $sample match"
    invoked from within
"set result [regexp -inline {[a-z]+} $sample match]"

What happens if we combine -all with -inline?

set sample "Where there is a will, There is a way."
set result [regexp -inline -all {[a-z]+} $sample]
puts "Result: $result"
# Output:
Result: here there is a will here is a way

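This time regexp returns a list of every matched substring.

What if we want to match and extract the first two words? How should the pattern be written?

Example 2: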

# Match the first two words, the first one allows uppercase
set sample "Where there is a will, There is a way."
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2 ]
puts "Result: $result\nallMatched: $match\nfirstMatched: $sub1\nsecondMatched: $sub2"

# \n prints each result on its own line
# Output:
Result: 1
allMatched: Where there
firstMatched: Where
secondMatched: there

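The regular expression ([A-Za-z]+) +([a-z]+) matches a sequence of two words made of English letters, separated by at least one space. Its parts mean the following:

([A-Za-z]+): matches one or more uppercase or lowercase English letters. The parentheses () make this a capture group, so the text it matches can be extracted on its own (here into sub1). A-Za-z covers all uppercase and lowercase English letters, and the + quantifier requires at least one letter.

A space followed by + ( +): matches one or more space characters. The + right after the space means the space must appear at least once, which separates the two words.

([a-z]+): matches one or more lowercase English letters. Again, the parentheses make this a second capture group (here stored in sub2). a-z covers all lowercase English letters, and the + quantifier requires at least one letter.

So this pattern matches a string such as "Hello world": "Hello" is captured by the first group and "world" by the second. Because the second group accepts lowercase letters only, "Hello World" would not match. The space and + between the two groups guarantee at least one space between the words, which makes this structure useful for matching space-separated word sequences.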
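When a pattern contains capture groups, -all -inline returns, for every match, the whole match followed by each group, all flattened into one list. A sketch that walks that list three elements at a time with foreach (the names result, whole, w1 and w2 are illustrative):

```tcl
set sample "Where there is a will, There is a way."
set result [regexp -all -inline {([A-Za-z]+) +([a-z]+)} $sample]

# Each match contributes 3 elements: whole match, group 1, group 2
foreach {whole w1 w2} $result {
    puts "$w1 | $w2"
}
# prints:
# Where | there
# is | a
# There | is
# a | way
```

"will" is skipped because it is followed by a comma rather than a space, so the pattern cannot match there.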

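Alternatively, use \s+, which matches one or more whitespace characters of any kind (spaces, tabs, newlines). Regular expressions involve many special characters, far too many to cover in one go; more in the next post! ^_^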
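A quick sketch of the difference, using a hypothetical input in which a tab rather than a space separates the two words:

```tcl
# A tab (not a space) separates the two words here
set sample "Where\tthere"

# " +" only accepts literal space characters, so this fails
puts [regexp {([A-Za-z]+) +([a-z]+)} $sample]              ;# 0

# \s+ accepts any whitespace, including the tab
puts [regexp {([A-Za-z]+)\s+([a-z]+)} $sample match w1 w2] ;# 1
puts "$w1 / $w2"                                           ;# Where / there
```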

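2. Special character sequences

Tcl regular expressions support several special character sequences, usually starting with a backslash \, that stand for particular classes of characters or special matching behavior. These sequences make regular expressions both powerful and flexible. The table below summarizes the commonly used ones.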

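Table: Special character sequences
Sequence	Meaning
\d	Matches any digit character, equivalent to [[:digit:]] (for ASCII text, [0-9]).
\D	Matches any non-digit character, equivalent to [^[:digit:]] (for ASCII text, [^0-9]).
\s	Matches any whitespace character, including space, tab, newline, and so on.
\S	Matches any non-whitespace character.
\w	Matches any word character (letters, digits, and underscore), equivalent to [[:alnum:]_] (for ASCII text, [A-Za-z0-9_]).
\W	Matches any non-word character, equivalent to [^[:alnum:]_] (for ASCII text, [^A-Za-z0-9_]).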
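A small sketch that runs each class from the table over one hypothetical string, so the results can be compared side by side:

```tcl
# A tab sits between "apples," and "x9"
set text "id_7: 42 apples,\tx9"

puts [regexp -all -inline {\d+} $text]   ;# 7 42 9
puts [regexp -all -inline {\w+} $text]   ;# id_7 42 apples x9
puts [regexp -all -inline {\S+} $text]   ;# id_7: 42 apples, x9
puts [regexp -all {\s} $text]            ;# 3  (two spaces and one tab)
puts [regexp -all {\W} $text]            ;# 5  (":", two spaces, ",", tab)
```

Note how \w+ keeps the underscore in id_7 but drops the colon and comma, while \S+ keeps everything except the whitespace.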