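8. Regular expressions
8.1 Regular expressions
A regular expression is a way of processing strings. It works line by line and, with the help of a few special symbols, lets the user easily search for, delete, or replace a particular string.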
8.2 Basic regular expressions
8.2.1 How the locale affects regular expressions
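When using regular expressions, pay close attention to the locale of the current environment; otherwise your matches may differ from someone else's. Many of the exercises below use the 『 』 locale setting.
Special symbol    Meaning
[:alnum:]     English letters in either case and digits, i.e. 0-9, A-Z, a-z
[:alpha:]     any English letter in either case, i.e. A-Z, a-z
[:blank:]     the space and [Tab] characters
[:cntrl:]     control characters, including CR, LF, Tab, Del, and so on
[:digit:]     digits only, i.e. 0-9
[:graph:]     every character except whitespace (space and [Tab])
[:lower:]     lowercase letters, i.e. a-z
[:print:]     any character that can be printed
[:punct:]     punctuation symbols, such as " ' ? ! ; : # $ ...
[:upper:]     uppercase letters, i.e. A-Z
[:space:]     any character that produces whitespace, including space, [Tab], CR, and so on
[:xdigit:]    hexadecimal digits, i.e. 0-9, A-F, a-f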
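As a rough illustration of the classes in the table above, the sketch below runs grep on three made-up lines. The sample text and the variable names are assumptions for illustration only; LC_ALL=C is set on each grep so that the result does not depend on the caller's locale.

```shell
# Hypothetical sample input: three short lines, the last one only spaces.
sample=$(printf '%s\n' 'abc' 'ABC 123' '   ')

# Lines containing at least one digit.
digits=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '[[:digit:]]')

# Lines containing at least one uppercase letter.
upper=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '[[:upper:]]')

# Lines made up only of blanks (spaces or tabs).
blank=$(printf '%s\n' "$sample" | LC_ALL=C grep -n '^[[:blank:]]*$')

echo "$digits"   # 2:ABC 123
echo "$upper"    # 2:ABC 123
echo "$blank"    # 3: followed by three spaces
```

Note that a class name such as [:digit:] is only valid inside a bracket expression, which is why it is written with double brackets.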
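8.2.2 Some advanced grep options
1. grep syntax
[dmtsai@study ~]$ grep [-A] [-B] [--color=auto] 'search string' filename
Options and arguments:
-A : may be followed by a number; stands for "after": besides the matching line, the n lines that follow it are printed as well;
-B : may be followed by a number; stands for "before": besides the matching line, the n lines that precede it are printed as well;
--color=auto : highlights the matched part of each line in color
2. Usage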
[root@localhost tmp]# alias
alias cp='cp -i'
alias egrep='egrep --color=auto'
alias fgrep='fgrep --color=auto'
alias grep='grep --color=auto'
alias l.='ls -d .* --color=auto'
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
alias which='alias | /usr/bin/which --tty-only --read-alias --show-dot --show-tilde'
[root@localhost tmp]# alias | grep "mv"
alias mv='mv -i'
[root@localhost tmp]# alias | grep -A1 -B2 "mv"
alias ll='ls -l --color=auto'
alias ls='ls --color=auto'
alias mv='mv -i'
alias rm='rm -i'
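The same -A/-B behavior can be reproduced without relying on the local alias list. This is a minimal sketch with five made-up input lines, assuming a grep that supports -A and -B (GNU and BSD grep do; they are not part of POSIX grep).

```shell
# Hypothetical input: five lines, one word each.
ctx=$(printf '%s\n' one two three four five | grep -n -B1 -A2 'three')

echo "$ctx"
# 2-two
# 3:three
# 4-four
# 5-five
```

With -n, the matching line is marked with ":" after the line number, while context lines are marked with "-", which makes it easy to tell them apart.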
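8.2.3 Basic regular expression exercises
RE character    Meaning and example
^word    Meaning: the string being searched for (word) is at the beginning of the line.
Example: find lines that begin with # and show their line numbers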
grep -n '^#' regular_express.txt
word$    Meaning: the string being searched for (word) is at the end of the line.
Example: print the lines that end with ! and show their line numbers
grep -n '!$' regular_express.txt
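.    Meaning: stands for exactly one arbitrary character, which must be present.
Example: the matched string may be (eve) (eae) (eee) (e e), but not just (ee). That is, there must be exactly one character between the two e's, and a space counts as a character.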
grep -n 'e.e' regular_express.txt
\    Meaning: the escape character; it removes the special meaning of a special symbol.
Example: find lines that contain a single quote '
grep -n \' regular_express.txt
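*    Meaning: repeats the preceding RE character zero or more times.
Example: find strings such as (es) (ess) (esss). Note that because * allows zero repetitions, es also matches the search string. Also, because * repeats "the preceding RE character", it must come immediately after an RE character. For example, "any run of characters" is written ".*".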
grep -n 'ess*' regular_express.txt
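[list]    Meaning: a character set; list the characters you want to match inside it.
Example: find lines that contain (gl) or (gd). Note that a [] stands for only one character to be matched. For example, "a[afl]y" matches aay, afy, or aly, i.e. [afl] means a or f or l.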
grep -n 'g[ld]' regular_express.txt
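[n1-n2]    Meaning: a character set given as a range of characters to match.
Example: find lines that contain any uppercase letter. Note that inside a character set [] the minus sign - has a special meaning: it stands for all consecutive characters between the two given characters. Whether characters count as consecutive depends on the character encoding, so your locale has to be set correctly (in bash, check that the LANG and LANGUAGE variables are correct). For example, all uppercase letters are written [A-Z], and any digit is written [0-9].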
grep -n '[A-Z]' regular_express.txt
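[^list]    Meaning: a character set listing the characters or ranges that must not match.
Example: the matched string may be (oog) or (ood) but not (oot). When ^ appears inside [], it means "negated selection". For example, "not an uppercase letter" is written [^A-Z]. Note, however, that if you search with grep -n '[^A-Z]' regular_express.txt, every line of the file is listed. Why? Because [^A-Z] means "a character that is not an uppercase letter", and every line contains such a character; the first line, "Open Source", for instance, contains the lowercase letters p, e, n, and so on.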
grep -n 'oo[^t]' regular_express.txt
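\{n,m\}    Meaning: n to m consecutive occurrences of the preceding RE character.
Meaning: \{n\} stands for exactly n consecutive occurrences of the preceding RE character.
Meaning: \{n,\} stands for n or more consecutive occurrences of the preceding RE character.
Example: strings with 2 to 3 o's between g and g, i.e. (goog) (gooog)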
grep -n 'go\{2,3\}g' regular_express.txt
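To make the escape character and the "." entry from the table more concrete, here is a small sketch on made-up input. The four sample lines and the variable names are assumptions for illustration, not part of regular_express.txt.

```shell
# Hypothetical sample input.
lines=$(printf '%s\n' "the symbol '*' is a star" 'no star here' 'a.b' 'axb')

# \* matches a literal asterisk, not "zero or more of the previous character".
star=$(printf '%s\n' "$lines" | grep -n '\*')

# \. matches a literal period only.
dot=$(printf '%s\n' "$lines" | grep -n 'a\.b')

# An unescaped . matches any single character, so both a.b and axb match.
any=$(printf '%s\n' "$lines" | grep -n 'a.b')

echo "$star"   # 1:the symbol '*' is a star
echo "$dot"    # 3:a.b
echo "$any"    # 3:a.b and 4:axb
```

Quoting the pattern in single quotes keeps the shell from interpreting the backslash or the asterisk before grep sees them.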
1. Test file
[root@localhost tmp]# cat test
"Open Source" is a good mechanism to develop programs.
apple is my favorite food.
Football game is not use feet only.
this dress doesn't fit me.
However, this dress is about $ 3183 dollars.^M
GNU is free air not free beer.^M
Her hair is very beauty.^M
I can't finish the test.^M
Oh! The soup taste good.^M
motorcycle is cheap than car.
This window is clear.
the symbol '*' is represented as start.
Oh! My god!
The gd software is a library for drafting programs.^M
You are the best is mean you are the no. 1.
The world is the same with "glad".
I like dog.
google is the best tools for search keyword.
goooooogle yes!
go! go! Let's go.
# I am VBird
#Note: the file also contains an empty line (line 22)
2. Searching for a specific string
[root@localhost tmp]# grep -n "the" test #find lines containing "the"; -n shows line numbers
8:I can't finish the test.^M
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world is the same with "glad".
18:google is the best tools for search keyword.
[root@localhost tmp]# grep -vn "the" test #-v: find lines that do not contain "the"
[root@localhost tmp]# grep -in "the" test #-i: ignore case
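Since the -v and -i runs above are listed without output, the following sketch shows their effect on three made-up lines, using -c to count matching lines. The sample text and variable names are assumptions for illustration.

```shell
# Hypothetical sample input.
text=$(printf '%s\n' 'The cat' 'the dog' 'a bird')

# Lines containing lowercase "the".
with_the=$(printf '%s\n' "$text" | grep -c 'the')

# -v inverts the selection: lines that do NOT contain "the".
without_the=$(printf '%s\n' "$text" | grep -vc 'the')

# -i ignores case, so "The" matches as well.
any_case=$(printf '%s\n' "$text" | grep -ic 'the')

echo "$with_the $without_the $any_case"   # 1 2 2
```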
3. Using brackets [] to search for a set of characters
[root@localhost tmp]# grep -n 't[ae]st' test #find lines containing test or tast
8:I can't finish the test.^M
9:Oh! The soup taste good.^M
[root@localhost tmp]# grep -n 'oo' test #find lines containing oo
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@localhost tmp]# grep -n '[^g]oo' test #oo preceded by a character other than g
2:apple is my favorite food.
3:Football game is not use feet only.
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@localhost tmp]# grep -n '[^a-z]oo' test #oo preceded by a character other than a lowercase letter
3:Football game is not use feet only.
#Note: the following command has the same effect:
[root@localhost tmp]# grep -n '[^[:lower:]]oo' test
[root@localhost tmp]# grep -n '[0-9]' test #lines containing a digit
5:However, this dress is about $ 3183 dollars.^M
15:You are the best is mean you are the no. 1.
#Note: the following command has the same effect:
[root@localhost tmp]# grep -n '[[:digit:]]' test
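The negated sets above are easy to misread: [^g]oo does not mean "the line has no goo", it means "somewhere there is an oo preceded by a character that is not g". A small sketch with four made-up words (an assumption for illustration):

```shell
# Hypothetical sample input, one word per line.
words=$(printf '%s\n' good food Football goooooogle)

# oo preceded by any character other than g.
not_g=$(printf '%s\n' "$words" | LC_ALL=C grep '[^g]oo')

# oo preceded by any character other than a lowercase letter.
not_lower=$(printf '%s\n' "$words" | LC_ALL=C grep '[^[:lower:]]oo')

echo "$not_g"       # food, Football, goooooogle (one per line)
echo "$not_lower"   # Football
```

goooooogle is selected by the first pattern because, inside the long run of o's, there is an "oo" whose preceding character is another o rather than g.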
4. Start-of-line and end-of-line characters ^ $
[root@localhost tmp]# grep -n '^the' test #lines starting with the
12:the symbol '*' is represented as start.
[root@localhost tmp]# grep -n '^[0-9]' test #lines starting with a digit
#Note: the following command has the same effect:
[root@localhost tmp]# grep -n '^[[:digit:]]' test
[root@localhost tmp]# grep -n '^[a-z]' test #lines starting with a lowercase letter
2:apple is my favorite food.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
12:the symbol '*' is represented as start.
18:google is the best tools for search keyword.
19:goooooogle yes!
20:go! go! Let's go.
#Note: the following command has the same effect:
[root@localhost tmp]# grep -n '^[[:lower:]]' test
[root@localhost tmp]# grep -n '^[^a-zA-Z]' test #lines whose first character is not a letter
1:"Open Source" is a good mechanism to develop programs.
21:# I am VBird
[root@localhost tmp]# grep -n '\.$' test #lines ending with .
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
4:this dress doesn't fit me.
10:motorcycle is cheap than car.
11:This window is clear.
12:the symbol '*' is represented as start.
15:You are the best is mean you are the no. 1.
16:The world is the same with "glad".
17:I like dog.
18:google is the best tools for search keyword.
20:go! go! Let's go.
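Lines 5 to 9 and 14 are missing from the output above even though they visibly end with a period. In the file listing they carry a trailing ^M, which is how a carriage return (DOS-style line ending) is usually displayed, so the last character before the newline is CR rather than ".". The sketch below reproduces this on two made-up lines; using tr to strip the CR is one possible workaround, not something the original notes prescribe.

```shell
# Hypothetical input: the first line ends in CR LF, the second in LF only.
dos_count=$(printf 'beer.\r\nbeer.\n' | grep -c '\.$')

# Deleting the carriage returns lets both lines end in a period.
unix_count=$(printf 'beer.\r\nbeer.\n' | tr -d '\r' | grep -c '\.$')

echo "$dos_count $unix_count"   # 1 2
```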
[root@localhost tmp]# grep -n '^$' test #empty lines
22:
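[root@localhost tmp]# grep -v '^$' test | grep -v '^#' #drop empty lines and comment lines (no -n here: the line-number prefix would keep '^#' from matching in the second grep)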
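Filtering out empty lines and comment lines can also be done in a single grep by giving two patterns with -e. The sketch below uses a made-up five-line configuration snippet (an assumption for illustration) and also shows what happens when -n is added to the first grep of a two-stage pipeline.

```shell
# Hypothetical config-like input with a comment and two empty lines.
conf=$(printf '%s\n' '# comment' '' 'key=value' '' 'other=1')

# With -v and several -e patterns, a line is kept only if it matches none of them.
active=$(printf '%s\n' "$conf" | grep -v -e '^$' -e '^#')

# Pitfall: -n in the first stage prefixes every line with "N:",
# so '^#' in the second stage no longer matches the comment line.
broken=$(printf '%s\n' "$conf" | grep -vn '^$' | grep -v '^#')

echo "$active"   # key=value, other=1 (one per line)
echo "$broken"   # still includes "1:# comment"
```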
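5. Any single character . and the repetition character *
. (period): stands for exactly one arbitrary character, which must be present;
* (asterisk): means "repeat the preceding character zero or more times"; it only has meaning in combination with the character before it
[root@localhost tmp]# grep -n 'g..d' test #exactly two characters between g and d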
1:"Open Source" is a good mechanism to develop programs.
9:Oh! The soup taste good.^M
16:The world is the same with "glad".
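[root@localhost tmp]# grep -n 'ooo*' test #two or more o's
[root@localhost tmp]# grep -n 'goo*g' test #at least one o between two g's
[root@localhost tmp]# grep -n 'g*g' test #at least one g
[root@localhost tmp]# grep -n 'g.*g' test #a g, then any characters, then another g
[root@localhost tmp]# grep -n '[0-9][0-9]*' test #lines containing a number
#Note: this selects the same lines as the following command, because a line with at least one digit matches both patterns: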
[root@localhost tmp]# grep -n '[0-9]' test
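The two digit patterns select the same lines, but they do not match the same text within a line. A rough way to see this is the -o option, which prints only the matched parts; it is available in GNU and BSD grep but is not part of POSIX grep, so treat its use here as an assumption about the environment. The sample line is made up.

```shell
# Hypothetical sample line with two numbers in it.
line='about $ 3183 dollars, no. 1'

# One digit per match.
single=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[0-9]')

# One whole run of digits per match.
runs=$(printf '%s\n' "$line" | LC_ALL=C grep -o '[0-9][0-9]*')

echo "$single"   # 3, 1, 8, 3, 1 (one per line)
echo "$runs"     # 3183, 1 (one per line)
```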
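6. Limiting the number of consecutive RE characters with {} (written \{ \} in basic regular expressions)
[root@localhost tmp]# grep -n 'o\{2\}' test #lines containing two consecutive o's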
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
#Note: the following command selects the same lines:
[root@localhost tmp]# grep -n 'ooo*' test
[root@localhost tmp]# grep -n 'o\{2,5\}' test #lines containing 2 to 5 consecutive o's
1:"Open Source" is a good mechanism to develop programs.
2:apple is my favorite food.
3:Football game is not use feet only.
9:Oh! The soup taste good.^M
18:google is the best tools for search keyword.
19:goooooogle yes!
[root@localhost tmp]# grep -n 'o\{5,\}' test #lines containing 5 or more consecutive o's
19:goooooogle yes!
[root@localhost tmp]# grep -n 'go\{2,\}g' test #2 or more o's between g and g
18:google is the best tools for search keyword.
19:goooooogle yes!
#Note: the following command selects the same lines:
[root@localhost tmp]# grep -n 'gooo*g' test
[root@localhost tmp]# grep -n 'go\{2,5\}g' test #2 to 5 o's between g and g
18:google is the best tools for search keyword.
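The upper bound of an interval only excludes longer runs when the pattern is delimited on both sides, which is why o\{2,5\} above still lists goooooogle while go\{2,5\}g does not. A small sketch on four made-up words (an assumption for illustration):

```shell
# Hypothetical sample input: g, then 1, 2, 3 or 6 o's, then g.
words=$(printf '%s\n' gog goog gooog goooooog)

# Exactly one o between the g's.
exactly_one=$(printf '%s\n' "$words" | grep 'go\{1\}g')

# Two to three o's between the g's; the six-o word is excluded.
two_to_three=$(printf '%s\n' "$words" | grep 'go\{2,3\}g')

# Two or more o's between the g's.
at_least_two=$(printf '%s\n' "$words" | grep 'go\{2,\}g')

# Without the surrounding g's, any run of six o's also contains a run of two.
unanchored=$(printf '%s\n' "$words" | grep 'o\{2,3\}')

echo "$exactly_one"    # gog
echo "$two_to_three"   # goog, gooog (one per line)
```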