Regular Expressions in Tcl

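There are two main commands:

1      regexp ?switches? exp string ?matchVar? ?subMatchVar subMatchVar ...?

Matches a regular expression against a string.

2      regsub ?switches? exp string subSpec ?varName?

Performs regular-expression-based substitution in a string.

Here are a few commonly used switches:

-all: matches or replaces every occurrence in the string and returns the total number of matches or replacements.

-nocase: treats upper-case letters in the string as lower case, so matching is case-insensitive.

-indices: the match variables no longer hold the matched characters but their indices in the string.

--: marks the end of the switches; whatever follows is the regular expression pattern, even if it begins with -.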
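The switches above can be tried side by side. The following is a minimal sketch; the sample text and the variable names (text, plain, folded, span, dash) are made up for illustration.

```tcl
set text "Tcl tcl TCL"

# -all returns the number of matches instead of just 0 or 1.
set plain  [regexp -all {tcl} $text]          ;# only the lower-case spelling
set folded [regexp -all -nocase {tcl} $text]  ;# all three spellings

# -indices stores a first/last index pair instead of the matched text.
regexp -indices {TCL} $text span

# -- ends the switch list, so a pattern starting with "-" is not
# mistaken for a switch.
set dash [regexp -- {-v} "run -v now"]

puts "plain=$plain folded=$folded span=$span dash=$dash"
```

Run in tclsh, this should print plain=1 folded=3 span=8 10 dash=1.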

 

Example:

(bin) 29 % set e1 {[zhou1020]}

[zhou1020]

(bin) 30 % regexp {\[([a-z]+)([0-9]+)\]} $e1 matchstring sub1 sub2

1

(bin) 31 % puts $matchstring

[zhou1020]

(bin) 32 % puts $sub1

zhou

(bin) 33 % puts $sub2

1020

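To explain:

matchstring holds the entire substring matched by the regular expression;

sub1 holds the string matched by the first subexpression of the regular expression (a subexpression is the part inside parentheses, here ([a-z]+));

sub2 holds the string matched by the second subexpression (here ([0-9]+)).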

 

With -indices, the variables receive indices into the string instead of the matched text.

(bin) 34 % regexp -indices {\[([a-z]+)([0-9]+)\]} $e1 matchstring sub1 sub2

1

(bin) 35 % puts $matchstring

0 9

(bin) 36 % puts $sub1

1 4

(bin) 37 % puts $sub2

5 8
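Each index pair can be passed to string range to recover the matched text. This is a small sketch that reuses the e1 value from the session above; the variable names whole, name, number, first, last and recovered are illustrative.

```tcl
set e1 {[zhou1020]}
regexp -indices {\[([a-z]+)([0-9]+)\]} $e1 whole name number

# Each variable holds a two-element list: first index and last index.
set first [lindex $name 0]
set last  [lindex $name 1]

# string range uses inclusive indices, so the pair can be used directly.
set recovered [string range $e1 $first $last]
puts "name spans $first..$last -> $recovered"
```

This should print name spans 1..4 -> zhou.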

 

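Some points to note:

1.        -- indicates that what follows is the pattern, not another switch;

2.        regsub replaces only the first match; to replace every match, add the -all option;

3.        Using regular expressions in expect:

expect -re "a*": the -re option signals that a regular expression follows, so * means zero or more repetitions of the preceding character. The pattern can match the empty string, a, aa, aaa, and so on.

Compare this with expect "a*": the default is glob pattern mode, equivalent to expect -gl "a*", where * matches any sequence of characters. That pattern matches a, aa, ab, ac, and so on.

4.        Two uses of [ ]

1      Command substitution: the contents are a Tcl command

2      Inside a regular expression: the contents are a range of characters, e.g. [0-9] or [a-zA-Z]

5.        Two uses of ^

(1)   Anchors the match at the beginning of the string

(2)   Inside [] in a regular expression, negates the set. For example, [^0-9] matches any character other than the digits 0-9.

6.        Quantifiers

*: zero or more times

+: one or more times

?: zero or one time.

7.        A common pitfall

.*: matches any string. (Since . matches any character and * means zero or more repetitions, .* matches any string.)

In addition, * is greedy: it tries to match the longest possible string. To turn greediness off, append ? to it.

.*\n: matches as much as possible, ending with a newline.

Consider the following example: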

(bin) 44 % set a {1111

2222

3333

}

1111

2222

3333

 

(bin) 45 %

(bin) 45 % regexp .*\n $a match

1

(bin) 46 % puts $match

1111

2222

3333

 

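As you can see, the match did not stop at the newline that ends the first line; it greedily extended to the last newline.

To get only the 1111 of the first line, turn off the greediness of *.

(bin) 47 % regexp .*?\n $a match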

1

(bin) 48 % puts $match

1111

 

(bin)

Alternatively, regexp "\[^\n]*\n" $a match achieves the same result.

(bin) 49 % regexp "\[^\n]*\n" $a match

1

(bin) 50 % puts $match

1111

 

(bin) 51 %

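8.        Backslash escapes

A tab character is not made up of several spaces, so a pattern of spaces will not match it.

Matching a single +: the Tcl interpreter turns the two backslashes into one \, and the regular expression engine then reads \+ as a literal +:

(bin) 51 % regexp "\\+" {+} e1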

1

(bin) 52 % puts $e1

+

With braces, a single \ is enough:

(bin) 53 % regexp {\+} {+} e1

1

(bin) 54 % puts $e1

+

(bin) 55 %

 

Matching a single \: with braces you need two backslashes; with double quotes you need four.

(bin) 55 % regexp {\\} "\\" e1

1

(bin) 56 % puts $e1

\

(bin) 57 % regexp "\\\\" "\\" e1

1

(bin) 58 % puts $e1

\

(bin) 59 %
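To see why the two spellings are equivalent, it helps to look at the pattern string after Tcl has processed it. The sketch below uses illustrative variable names (braced, quoted, subject) and only compares the strings that reach the regular expression engine.

```tcl
set braced {\\}       ;# braces: no Tcl substitution, two characters survive
set quoted "\\\\"     ;# quotes: Tcl halves the four backslashes to two
set subject "\\"      ;# the one-character string being searched

puts "pattern lengths: [string length $braced] [string length $quoted]"
puts "same pattern:    [string equal $braced $quoted]"
puts "matches:         [regexp $braced $subject] [regexp $quoted $subject]"
```

Both patterns arrive at the regular expression engine as the same two-character string, which the engine reads as one literal backslash.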

Regular Expressions 101

 

Tcl also supports string operations known as regular expressions. Several commands can access these methods with a -regexp argument; see the man pages for which commands support regular expressions.

There are also two explicit commands for parsing regular expressions.

regexp ?switches? exp string ?matchVar? ?subMatch1 ... subMatchN?

Searches string for the regular expression exp. If a parameter matchVar is given, then the substring that matches the regular expression is copied to matchVar. If subMatchN variables exist, then the parenthetical parts of the matching string are copied to the subMatch variables, working from left to right.

regsub ?switches? exp string subSpec varName

Searches string for substrings that match the regular expression exp and replaces them with subSpec. The resulting string is copied into varName.

Regular expressions can be expressed in just a few rules.

^

Matches the beginning of a string

$

Matches the end of a string

.

Matches any single character

*

Matches any count (0-n) of the previous character

+

Matches any count, but at least 1 of the previous character

[...]

Matches any character of a set of characters

[^...]

Matches any character *NOT* a member of the set of characters following the ^.

(...)

Groups a set of characters into a subSpec.

Regular expressions are similar to the globbing that was discussed in lessons 16 and 18. The main difference is in the way that sets of matched characters are handled. In globbing the only way to select sets of unknown text is the * symbol. This matches to any quantity of any character.

In regular expression parsing, the * symbol matches zero or more occurrences of the character immediately preceding the *. For example a* would match a, aaaaa, or an empty string. If the character directly before the * is a set of characters within square brackets, then the * will match any quantity of all of these characters. For example, [a-c]* would match aa, abc, aabcabc, or again, an empty string.

The + symbol behaves roughly the same as the *, except that it requires at least one character to match. For example, [a-c]+ would match a, abc, or aabcabc, but not an empty string.
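The difference between the glob * and the regular expression * and + can be checked directly. This is a minimal sketch using string match for the glob side; the variable names are illustrative, and the ^ and $ anchors are added so that regexp, like string match, has to cover the whole string.

```tcl
# Glob (string match): * stands for any run of characters.
set glob_abc [string match a* abc]

# Regular expression: * repeats the preceding character or set.
set re_abc   [regexp {^a*$} abc]
set re_empty [regexp {^a*$} ""]
set re_set   [regexp {^[a-c]*$} aabcabc]

# + needs at least one repetition, so the empty string fails.
set plus_empty [regexp {^[a-c]+$} ""]

puts "$glob_abc $re_abc $re_empty $re_set $plus_empty"
```

This should print 1 0 1 1 0: the glob a* accepts abc, while the regular expression a* accepts only runs of a, including the empty one.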

Regular expression parsing is more powerful than globbing. With globbing you can use square brackets to enclose a set of characters any of which will be a match. Regular expression parsing also includes a method of selecting any character not in a set. If the first character after the [ is a caret (^), then the regular expression parser will match any character not in the set of characters between the square brackets. A caret can be included in the set of characters to match (or not) by placing it in any position other than the first.
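The two roles of the caret inside square brackets can be seen on a short string. A small sketch, with made-up variable names (str, negated, literal):

```tcl
set str "abc^def"

# Caret first: negated set. Matches the first run of characters
# that are not a, b or c.
regexp {[^a-c]+} $str negated

# Caret not first: an ordinary member of the set.
regexp {[a-c^]+} $str literal

puts "negated=$negated literal=$literal"
```

This should print negated=^def literal=abc^.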

The regexp command is similar to the string match command in that it matches an exp against a string. It is different in that it can match a portion of a string, instead of the entire string, and will place the characters matched into the matchVar variable.

If a match is found to the portion of a regular expression enclosed within parentheses, regexp will copy the subset of matching characters to the corresponding subMatch argument. This can be used to parse simple strings.

Regsub will copy the contents of the string to a new variable, substituting the characters that match exp with the characters in subSpec. If subSpec contains a & or \0, then those characters will be replaced by the characters that matched exp. If the number following a backslash is 1-9, then that backslash sequence will be replaced by the appropriate portion of exp that is enclosed within parentheses.
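The & and backslash-digit forms can be sketched as follows. The sample sentence is the one used in the example of this lesson; the patterns and the variable names marked, swapped, shouted and n are illustrative.

```tcl
set sample "Where there is a will, There is a way."

# & (or \0) stands for the whole matched text.
regsub {will} $sample {<&>} marked

# \1 and \2 refer to the first and second parenthesized groups.
regsub {(is) (a)} $sample {\2 \1} swapped

# Without -all only the first match is replaced; with a varName,
# regsub returns the number of replacements it made.
set n [regsub -all {is} $sample {IS} shouted]

puts $marked
puts $swapped
puts "$n replacements: $shouted"
```

The subSpec arguments are braced so that Tcl passes the backslashes through to regsub unchanged.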

Note that the exp argument to regexp or regsub is processed by the Tcl substitution pass. Therefore quite often the expression is enclosed in braces to prevent any special processing by Tcl.

Example

set sample "Where there is a will, There is a way."
 
#
# Match the first substring with lowercase letters only
#
set result [regexp {[a-z]+} $sample match]
puts "Result: $result match: $match"
 
#
# Match the first two words, the first one allows uppercase
set result [regexp {([A-Za-z]+) +([a-z]+)} $sample match sub1 sub2 ]
puts "Result: $result Match: $match 1: $sub1 2: $sub2"
 
#
# Replace a word
#
regsub "way" $sample "lawsuit" sample2
puts "New: $sample2"
 
#
# Use the -all option to count the number of "words"
#
puts "Number of words: [regexp -all {[^ ]+} $sample]"

 

 

More Quoting Hell - Regular Expressions 102

 

The regular expression (exp) in the two regular expression parsing commands is evaluated by the Tcl parser during the Tcl substitution phase. This can provide a great deal of power, and also requires a great deal of care.

These examples show some of the trickier aspects of regular expression evaluation.

The points to remember as you read the examples are:

  • A left square bracket ([) has meaning to the substitution phase, and to the regular expression parser.
  • A set of parentheses, a plus sign, and a star have meaning to the regular expression parser, but not the Tcl substitution phase.
  • A backslash sequence (\n, \t, etc) has meaning to the Tcl substitution phase, but not to the regular expression parser.
  • A backslash escaped character (\[) has no special meaning to either the Tcl substitution phase or the regular expression parser.

The phase at which a character has meaning affects how many escapes are necessary to match the character you wish to match. An escape can be either enclosing the phrase in braces, or placing a backslash before the escaped character.

To pass a left bracket to the regular expression parser to evaluate as a range of characters takes 1 escape. To have the regular expression parser match a literal left bracket takes 2 escapes (one to escape the bracket in the Tcl substitution phase, and one to escape the bracket in the regular expression parsing.). If you have the string placed within quotes, then a backslash that you wish passed to the regular expression parser must also be escaped with a backslash.
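The escape counts described above can be checked with a short sketch. The sample string and the variable names (line, digits, quoted, braced) are made up for illustration; -> is just an ordinary variable name used to discard the full match.

```tcl
set line {size [42]}

# Range of characters: inside quotes, one backslash keeps Tcl from
# treating the left bracket as the start of a command.
regexp "(\[0-9]+)" $line -> digits

# Literal left bracket: the regular expression parser has to receive a
# backslash followed by the bracket, so both are escaped inside quotes.
regexp "\\\[(\[0-9]+)]" $line quoted

# Braces switch off the Tcl substitution phase, which leaves only the
# escape meant for the regular expression parser.
regexp {\[([0-9]+)]} $line braced

puts "digits=$digits quoted=$quoted braced=$braced"
```

This should print digits=42 quoted=[42] braced=[42]. The braced form is usually easier to read, which is why patterns are so often written in braces.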

Note: You can copy the code and run it in tclsh or wish to see the effects.

Example

 
#
# Examine an overview of UNIX/Linux disks
#
set list1 [list \
{/dev/wd0a        17086    10958     5272    68%    /}\
{/dev/wd0f       179824   127798    48428    73%    /news}\
{/dev/wd0h      1249244   967818   218962    82%    /usr}\
{/dev/wd0g        98190    32836    60444    35%    /var}]
 
foreach line $list1 {
    regexp {[^ ]* *([0-9]+)[^/]*(/[a-z]*)} $line match size mounted;
    puts "$mounted is $size blocks"
}
 
 
#
# Extracting a hexadecimal value ...
#
# The pattern expects a tab between the prompt text and the bracket
set line "Interrupt Vector?\t\[32(0x20)\]"
regexp "\[^\t]+\t\\\[\[0-9]+\\(0x(\[0-9a-fA-F]+)\\)]" $line match hexval
puts "Hex Default is: 0x$hexval"
 
#
# Matching the special characters as if they were ordinary
#
set str2 "abc^def"
regexp "\[^a-f]*def" $str2 match
puts "using \[^a-f] the match is: $match"

regexp "\[a-f^]*def" $str2 match
puts "using \[a-f^] the match is: $match"

regsub {\^} $str2 " is followed by: " str3
puts "$str2 with the ^ substituted is: \"$str3\""

regsub "(\[a-f]+)\\^(\[a-f]+)" $str2 "\\2 follows \\1" str3
puts "$str2 is converted to \"$str3\""