The orre command on Linux, from 鳥哥的 Linux 私房菜 (VBird's Linux book)

Latest recommended article published 2022-01-28 15:11:17

未知数Swendy

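11.2.3 Basic Regular Expression Practice

The easiest way to understand regular expressions is to practice with them. So before summarizing the special symbols, let's work through the file below. The exercises assume two things:

The locale has been set with "export LANG=C; export LC_ALL=C";

grep has been aliased to "grep --color=auto".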

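Download this chapter's practice file from the link below. Note that VBird edited this file on MS Windows and processed it specially, so although it is a plain-text file, it contains some special characters that Windows software often inserts on its own, such as the carriage return (^M) at the end of a line.

You could save the text below as regular_express.txt with vi, but downloading the file from the link below is the recommended approach:

If your Linux system can reach the Internet directly, fetch the file with the following command: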

wget http://linux.vbird.org/linux_basic/0330regularex/regular_express.txt

The contents of the file are as follows:

[dmtsai@study ~]$ vi regular_express.txt

"Open Source" is a good mechanism to develop programs.

apple is my favorite food.

Football game is not use feet only.

this dress doesn't fit me.

However, this dress is about $ 3183 dollars.^M

GNU is free air not free beer.^M

Her hair is very beauty.^M

I can't finish the test.^M

Oh! The soup taste good.^M

motorcycle is cheap than car.

This window is clear.

the symbol '*' is represented as start.

Oh! My god!

The gd software is a library for drafting programs.^M

You are the best is mean you are the no. 1.

The world is the same with "glad".

I like dog.

google is the best tools for search keyword.

goooooogle yes!

go! go! Let's go.

# I am VBird

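The file has 22 lines in total, and the last line is empty. Now let's go through the examples one by one.

Example 1: Searching for a specific string

Searching for a specific string is simple. Suppose we want the lines that contain the string the from the file above. The simplest way is: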

[dmtsai@study ~]$ grep -n 'the' regular_express.txt

8:I can't finish the test.

12:the symbol '*' is represented as start.

15:You are the best is mean you are the no. 1.

16:The world is the same with "glad".

18:google is the best tools for search keyword.

What about an inverted selection, that is, printing a line only when it does not contain the string 'the'? Just use:

[dmtsai@study ~]$ grep -vn 'the' regular_express.txt

The screen shows every line except lines 8, 12, 15, 16, and 18.

Next, to match the string the regardless of case:

[dmtsai@study ~]$ grep -in 'the' regular_express.txt

8:I can't finish the test.

9:Oh! The soup taste good.

12:the symbol '*' is represented as start.

14:The gd software is a library for drafting programs.

15:You are the best is mean you are the no. 1.

16:The world is the same with "glad".

18:google is the best tools for search keyword.

Besides the two additional lines (9 and 14), line 16 now also has its The matched.
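The three options used in this example (-n aside) can be compared side by side. What follows is a minimal sketch, assuming a hypothetical three-line sample file created with mktemp rather than the chapter's regular_express.txt. It uses grep -c to count matching lines instead of printing them.

```shell
#!/bin/sh
# Hypothetical sample data; not the chapter's practice file.
export LC_ALL=C
sample=$(mktemp)
printf '%s\n' 'I like the test.' 'The soup is hot.' 'No match here.' > "$sample"

exact=$(grep -c 'the' "$sample")      # case-sensitive: only line 1 matches
inverted=$(grep -vc 'the' "$sample")  # -v: lines WITHOUT 'the' (lines 2 and 3)
nocase=$(grep -ic 'the' "$sample")    # -i: 'the' and 'The' (lines 1 and 2)

echo "exact=$exact inverted=$inverted nocase=$nocase"
rm -f "$sample"
```

Note that the exact and inverted counts always add up to the number of lines in the file, since -v selects exactly the lines the plain search rejects.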

Example 2: Using brackets [] to search for a character set

To search for the two words test and taste, notice that they share the pattern 't?st'. The search can then be written as:

[dmtsai@study ~]$ grep -n 't[ae]st' regular_express.txt

8:I can't finish the test.

9:Oh! The soup taste good.

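Got it? No matter how many characters appear inside [], the brackets stand for exactly one character, so the example above asks for only two strings: 'tast' or 'test'.

To find lines that contain oo, use: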

[dmtsai@study ~]$ grep -n 'oo' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

2:apple is my favorite food.

3:Football game is not use feet only.

9:Oh! The soup taste good.

18:google is the best tools for search keyword.

19:goooooogle yes!

But what if I don't want a g in front of oo? Use the negated character set [^]:

[dmtsai@study ~]$ grep -n '[^g]oo' regular_express.txt

2:apple is my favorite food.

3:Football game is not use feet only.

18:google is the best tools for search keyword.

19:goooooogle yes!

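In other words, I want oo, but the character before it must not be g. Compare the two listings and you will see that lines 1 and 9 are gone, because their oo is preceded by g. Lines 2 and 3 are no surprise, since both foo and Foo are acceptable. But line 18 clearly contains the goo of google! Don't forget that the same line also contains the too of tools, so it is listed as well. That is, line 18 contains something we don't want (goo), but because it also contains something we do want (too), the line still matches.

Line 19 matches for a similar reason: inside goooooogle, an oo can be preceded by another o, for example go(ooo)oogle, so this line qualifies too.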
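To see which part of a line satisfies [^g]oo, it can help to print only the matched text. The sketch below assumes a grep that supports the -o option (GNU grep does; this option is not introduced in the section itself) and feeds it two lines copied from the practice file through printf.

```shell
#!/bin/sh
export LC_ALL=C
# Two lines copied from regular_express.txt (lines 18 and 1).
line18='google is the best tools for search keyword.'
line1='"Open Source" is a good mechanism to develop programs.'

# -o prints only the matched text; "|| true" keeps going when nothing matches.
hit18=$(printf '%s\n' "$line18" | grep -o '[^g]oo' || true)
hit1=$(printf '%s\n' "$line1" | grep -o '[^g]oo' || true)

echo "line 18 matched: '$hit18'"   # the too of tools, not the goo of google
echo "line 1 matched:  '$hit1'"    # empty: the only oo here follows a g
```

The match on line 18 comes from tools, which is why the line is printed even though google alone would have been rejected.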

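Next, suppose I don't want a lowercase letter before oo. I could write [^abcd....z]oo, but that is inconvenient. Because the lowercase letters are encoded consecutively in ASCII, the pattern can be shortened to: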

[dmtsai@study ~]$ grep -n '[^a-z]oo' regular_express.txt

3:Football game is not use feet only.

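That is, when the characters in a set are consecutive, such as uppercase letters, lowercase letters, or digits, they can be written as [a-z], [A-Z], [0-9], and so on. What if the set should include both letters and digits? Just write them all together: [a-zA-Z0-9]. For example, to get the lines that contain a digit: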

[dmtsai@study ~]$ grep -n '[0-9]' regular_express.txt

5:However, this dress is about $ 3183 dollars.

15:You are the best is mean you are the no. 1.

Because the locale affects the encoding order, hyphen ranges are not the only option. You can also get the previous two results with the following forms:

[dmtsai@study ~]$ grep -n '[^[:lower:]]oo' regular_express.txt

# [:lower:] means a-z here; see the table two subsections earlier

[dmtsai@study ~]$ grep -n '[[:digit:]]' regular_express.txt

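What is going on up there? Don't panic; take it apart. We know [:lower:] means a-z, so [a-z] is naturally [[:lower:]]. When VBird first met regular expressions, the double brackets were completely baffling. Once you see the nesting (the outer brackets form the set, and the inner [:lower:] names a class inside it), it becomes much clearer.

Does that make [], [^], the - inside [], and the special keywords from the earlier table clear? ^_^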
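As a quick check that the range form and the class form select the same lines under the C locale, here is a small sketch. The three-line sample is hypothetical and is created with mktemp; only the patterns come from the text above.

```shell
#!/bin/sh
export LC_ALL=C
sample=$(mktemp)
# Hypothetical sample lines, not the chapter's practice file.
printf '%s\n' 'Football game' 'apple food' 'room 101' > "$sample"

by_range=$(grep -n '[^a-z]oo' "$sample")        # oo preceded by a non-lowercase char
by_class=$(grep -n '[^[:lower:]]oo' "$sample")  # same set, written as a POSIX class
digits=$(grep -n '[[:digit:]]' "$sample")       # any line containing a digit

echo "range:  $by_range"
echo "class:  $by_class"
echo "digits: $digits"
rm -f "$sample"
```

Only Football qualifies for the first two patterns: in food the oo follows a lowercase f, and in room the oo follows a lowercase r.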

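Example 3: Line-start and line-end anchors ^ $

In Example 1 we found lines that contain the anywhere. What if the should be listed only when it appears at the start of a line? That calls for an anchor: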

[dmtsai@study ~]$ grep -n '^the' regular_express.txt

12:the symbol '*' is represented as start.

Only line 12 remains, because it is the only line that begins with the. What about listing the lines that begin with a lowercase letter? Like this:

[dmtsai@study ~]$ grep -n '^[a-z]' regular_express.txt

2:apple is my favorite food.

4:this dress doesn't fit me.

10:motorcycle is cheap than car.

12:the symbol '*' is represented as start.

18:google is the best tools for search keyword.

19:goooooogle yes!

20:go! go! Let's go.

You can see that every line caught begins with a lowercase letter. The command above can also be written as:

[dmtsai@study ~]$ grep -n '^[[:lower:]]' regular_express.txt

Good. And if I don't want the line to begin with an English letter, it can be written like this:

[dmtsai@study ~]$ grep -n '^[^a-zA-Z]' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

21:# I am VBird

# The command can also be: grep -n '^[^[:alpha:]]' regular_express.txt

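Notice that the ^ symbol means different things inside and outside the character-set brackets []. Inside [] it means negation; outside [] it anchors the match at the start of the line. Keep the two apart!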
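The two roles of ^ can be put next to each other in one short run. This is only a sketch on a hypothetical four-line sample created with mktemp; the patterns are the ones used above.

```shell
#!/bin/sh
export LC_ALL=C
sample=$(mktemp)
# Hypothetical sample lines, not the chapter's practice file.
printf '%s\n' 'the start' 'not the start' '# comment' 'Upper' > "$sample"

# ^ outside []: anchor at the start of the line.
anchored=$(grep -n '^the' "$sample")
# First ^ anchors; the ^ inside [] negates the set: first char is not a letter.
negated=$(grep -n '^[^a-zA-Z]' "$sample")

echo "anchored: $anchored"
echo "negated:  $negated"
rm -f "$sample"
```

Line 2 contains the but not at the start, so the anchored pattern skips it; only the # line begins with a non-letter.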

Thinking in the other direction, how do I find the lines that end with a period (.)?

[dmtsai@study ~]$ grep -n '\.$' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

2:apple is my favorite food.

3:Football game is not use feet only.

4:this dress doesn't fit me.

10:motorcycle is cheap than car.

11:This window is clear.

12:the symbol '*' is represented as start.

15:You are the best is mean you are the no. 1.

16:The world is the same with "glad".

17:I like dog.

18:google is the best tools for search keyword.

20:go! go! Let's go.

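Note that the period has another meaning (introduced below), so it must be escaped with a backslash (\) to remove its special meaning. You may wonder, though: lines 5~9 also end with a period, so why aren't they printed? This comes down to how Windows software handles line endings. Use cat -A to look at those lines and you will see: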

[dmtsai@study ~]$ cat -An regular_express.txt | head -n 10 | tail -n 6

5 However, this dress is about $ 3183 dollars.^M$

6 GNU is free air not free beer.^M$

7 Her hair is very beauty.^M$

8 I can't finish the test.^M$

9 Oh! The soup taste good.^M$

10 motorcycle is cheap than car.$

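Chapter 9 discussed how line endings differ between Linux and Windows. The listing above shows that lines 5~9 have the Windows line ending (^M$), whereas a normal Linux line ends as line 10 does ($). So the . is not immediately before the $, and lines 5~9 are not caught. (Line 14 of the file carries the same ^M and is skipped for the same reason.) Does that clarify what ^ and $ mean?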
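The effect of the trailing ^M can be reproduced on a tiny file. The sketch below is an illustration under assumptions: the two-line sample is hypothetical, and removing carriage returns with tr -d '\r' is one common way to normalize such a file, not something this section prescribes.

```shell
#!/bin/sh
export LC_ALL=C
sample=$(mktemp)
# Line 1 ends with CR+LF (Windows style), line 2 with LF only.
printf 'dos line.\r\nunix line.\n' > "$sample"

# The CR sits between the period and the line end, so only line 2 matches.
plain=$(grep -c '\.$' "$sample")
# After deleting every CR, both lines end with a period.
stripped=$(tr -d '\r' < "$sample" | grep -c '\.$')

echo "as-is: $plain  after tr: $stripped"
rm -f "$sample"
```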

Now, without looking at the answer below, think about it: how would I find the blank lines, that is, lines with nothing typed on them?

[dmtsai@study ~]$ grep -n '^$' regular_express.txt

22:

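Because the pattern contains only the line start and the line end (^$), it finds the empty lines. Next, suppose you know that in a shell script or a configuration file, blank lines and lines beginning with # (comments) carry no settings. If you want to print the contents for someone else's reference, you can omit them to save precious paper. How would you do that? Take /etc/rsyslog.conf as an example and check the output yourself: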

[dmtsai@study ~]$ cat -n /etc/rsyslog.conf

# On CentOS 7 the output has 91 lines, many of them blank lines or comment lines beginning with #

[dmtsai@study ~]$ grep -v '^$' /etc/rsyslog.conf | grep -v '^#'

# The result is only 14 lines. The first -v '^$' means 'no blank lines',

# and the second -v '^#' means 'no lines beginning with #'.

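Doesn't that save a lot of space? You might also ask why we don't simply discard every line that contains a #. We can't, because some comments are written after a setting on the same line. If you removed every line containing #, you would remove those settings too, and that would be a serious mistake.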
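The difference between filtering on '^#' and filtering on any '#' shows up clearly on a small configuration file. This sketch uses a hypothetical four-line file (the option names are invented for illustration) instead of /etc/rsyslog.conf. The single-command form with two -e patterns is an alternative to the pipeline, not something taken from the text.

```shell
#!/bin/sh
export LC_ALL=C
conf=$(mktemp)
# Hypothetical config: a comment line, a blank line, and two settings.
printf '%s\n' '# full-line comment' '' 'option_a = 1   # inline comment' 'option_b = 2' > "$conf"

# The pipeline from the text: drop blank lines, then lines starting with #.
two_step=$(grep -v '^$' "$conf" | grep -v '^#')
# Same filter in one grep: -v keeps lines that match neither -e pattern.
one_step=$(grep -v -e '^$' -e '^#' "$conf")
# Naive filter: dropping every line containing # also loses option_a.
naive=$(grep -v '#' "$conf" | grep -v '^$')

printf 'kept by ^# filter:\n%s\n' "$two_step"
printf 'kept by naive filter:\n%s\n' "$naive"
rm -f "$conf"
```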

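Example 4: Any single character . and the repetition character *

In Chapter 10 (bash) we learned that the wildcard * stands for any characters (zero or more). Regular expressions are not wildcards, however; the two are different. In a regular expression, . means that exactly one arbitrary character must be present. The two symbols mean the following in a regular expression:

. (period): exactly one arbitrary character must be present;

* (asterisk): repeat the previous character zero or more times; it works in combination with the character before it.

That is hard to grasp in the abstract, so let's practice. Suppose I need strings of the form g??d, that is, four characters beginning with g and ending with d. I can do this: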

[dmtsai@study ~]$ grep -n 'g..d' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

9:Oh! The soup taste good.

16:The world is the same with "glad".

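Because there must be exactly two characters between g and d, the god on line 13 and the gd on line 14 are not listed. Next, how do I list lines with oo, ooo, oooo, and so on, that is, at least two consecutive o's? Is it o*, oo*, or ooo*? You can try each one yourself, but the output takes too much space @_@, so here is the explanation.

Because * means repeating the preceding RE character zero or more times, o* means an empty string, or one or more o's. Since the empty string is allowed (the character may be absent altogether), grep -n 'o*' regular_express.txt prints every line of the file.

What about oo*? The first o must be present, and the second o may repeat zero or more times, so any line containing o, oo, ooo, oooo, and so on is listed.

By the same logic, a string with at least two o's requires ooo*, that is: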

[dmtsai@study ~]$ grep -n 'ooo*' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

2:apple is my favorite food.

3:Football game is not use feet only.

9:Oh! The soup taste good.

18:google is the best tools for search keyword.

19:goooooogle yes!

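Does the meaning of * make sense now? Here is an exercise: I want a string that begins and ends with g, with at least one o and nothing else between the two g's, that is, gog, goog, gooog, and so on. How?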

[dmtsai@study ~]$ grep -n 'goo*g' regular_express.txt

18:google is the best tools for search keyword.

19:goooooogle yes!

Understood? One more: to find a string that starts with g and ends with g, where the characters in between are optional, what should I use? Is it g*g?

[dmtsai@study ~]$ grep -n 'g*g' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

3:Football game is not use feet only.

9:Oh! The soup taste good.

13:Oh! My god!

14:The gd software is a library for drafting programs.

16:The world is the same with "glad".

17:I like dog.

18:google is the best tools for search keyword.

19:goooooogle yes!

20:go! go! Let's go.

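Why did the test return so many lines? It is not strange at all. In g*g, the g* means an empty string, or one or more g's, and it is followed by another g, so the whole RE matches g, gg, ggg, gggg, and so on. Any line that contains at least one g therefore matches.

So how do we express g....g? Use the any-character symbol . and write g.*g. Because * repeats the preceding character zero or more times and . is any character, .* means zero or more arbitrary characters.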

[dmtsai@study ~]$ grep -n 'g.*g' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

14:The gd software is a library for drafting programs.

18:google is the best tools for search keyword.

19:goooooogle yes!

20:go! go! Let's go.

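Because the pattern means a string that starts with g and ends with g, with any characters in between, lines 1, 14, and 20 are accepted as well. Using .* to stand for any run of characters is very common, so make sure you understand it and get used to it.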
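The patterns discussed in this example can be compared by counting how many lines each one selects. The sketch below assumes a hypothetical seven-line sample created with mktemp; count is a helper defined here for illustration, not a standard command.

```shell
#!/bin/sh
export LC_ALL=C
sample=$(mktemp)
# Hypothetical sample lines chosen to separate the patterns.
printf '%s\n' 'gd' 'god' 'good' 'goooogle' 'google go' 'go! go!' 'xyz' > "$sample"

# Hypothetical helper: number of lines in the sample matching pattern $1.
count() { grep -c "$1" "$sample"; }

n_star=$(count 'o*')     # 7: o* can match the empty string, so every line
n_one=$(count 'oo*')     # 5: at least one o
n_two=$(count 'ooo*')    # 3: at least two consecutive o's
n_gog=$(count 'goo*g')   # 2: g, one or more o, g (goooogle, google go)
n_gg=$(count 'g*g')      # 6: any line with at least one g
n_gany=$(count 'g.*g')   # 3: a g, anything, then another g

echo "o*=$n_star oo*=$n_one ooo*=$n_two goo*g=$n_gog g*g=$n_gg g.*g=$n_gany"
rm -f "$sample"
```

The line go! go! is what separates g.*g from goo*g: it has two g's with other characters between them, but no run of o's directly joining them.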

Another one: how do I find the lines that contain any number? Since only digits are involved, it becomes:

[dmtsai@study ~]$ grep -n '[0-9][0-9]*' regular_express.txt

5:However, this dress is about $ 3183 dollars.

15:You are the best is mean you are the no. 1.

grep -n '[0-9]' regular_express.txt gives the same result, but VBird hopes you understand what the RE in the command above means.
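Both patterns select the same lines, but they do not match the same text. One way to see this is to print only the matches, assuming a grep with the -o option (available in GNU grep; not introduced in this section). The input is line 5 of the practice file without its trailing ^M.

```shell
#!/bin/sh
export LC_ALL=C
# Line 5 of regular_express.txt, without the trailing carriage return.
line='However, this dress is about $ 3183 dollars.'

# [0-9] matches one digit at a time: -o prints four separate matches.
single_count=$(printf '%s\n' "$line" | grep -o '[0-9]' | grep -c '')
# [0-9][0-9]* matches the whole run of digits as one string.
whole=$(printf '%s\n' "$line" | grep -o '[0-9][0-9]*')

echo "single-digit matches: $single_count"
echo "whole-number match:   $whole"
```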

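Example 5: Limiting the repetition count of an RE character with {}

In the previous example we used . together with an RE character and * to match zero to infinitely many repetitions. What if I want to limit the number of repetitions to a range? For instance, how do I find a run of two to five consecutive o's? This calls for the interval operator {}. In basic regular expressions (the syntax grep uses by default), however, { takes effect only when it is preceded by a backslash, so you must write \{ and \} for the interval to work.

As for the syntax of {}, to find a string with two o's: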

[dmtsai@study ~]$ grep -n 'o\{2\}' regular_express.txt

1:"Open Source" is a good mechanism to develop programs.

2:apple is my favorite food.

3:Football game is not use feet only.

9:Oh! The soup taste good.

18:google is the best tools for search keyword.

19:goooooogle yes!

This looks no different from ooo*, since line 19, with its many o's, still appears. So let's change the search string. To find g followed by 2 to 5 o's and then another g, write:

[dmtsai@study ~]$ grep -n 'go\{2,5\}g' regular_express.txt

18:google is the best tools for search keyword.

Good! Line 19 is finally excluded (because it has 6 o's). And what if I want goooo....g with 2 or more o's? Besides gooo*g, it can also be:

[dmtsai@study ~]$ grep -n 'go\{2,\}g' regular_express.txt

18:google is the best tools for search keyword.

19:goooooogle yes!

And there it is!
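The interval forms can be checked together on a small sample. This is a sketch on three hypothetical lines created with mktemp. The last command uses grep -E (extended regular expressions), which this section does not cover; it is included only to show that the braces need no backslashes there.

```shell
#!/bin/sh
export LC_ALL=C
sample=$(mktemp)
# One, two, and six o's between the two g's.
printf '%s\n' 'gog' 'google' 'goooooogle' > "$sample"

two=$(grep -c 'o\{2\}' "$sample")      # any line containing oo: 2 lines
range=$(grep 'go\{2,5\}g' "$sample")   # 2 to 5 o's between g and g
open=$(grep -c 'go\{2,\}g' "$sample")  # 2 or more o's: 2 lines
ere=$(grep -E 'go{2,5}g' "$sample")    # same range in extended syntax

echo "o{2}: $two lines; {2,5}: $range; {2,}: $open lines; -E: $ere"
rm -f "$sample"
```

The unanchored o\{2\} still selects the six-o line because any two adjacent o's satisfy it. The count only becomes a real limit once the g on each side pins down where the run starts and ends.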
