Regular Expressions: Study Notes

Full regular expressions are composed of two types of characters. The special characters are called metacharacters, while the rest are called literals, or normal text characters. It might help to consider regular expressions as their own language, with literal text acting as the words and metacharacters as the grammar.

The egrep command interprets the first command-line argument as a regular expression, and any remaining arguments as the file(s) to search. Note, however, that in a command such as egrep 'cat' file.txt, the single quotes are not part of the regular expression; they are needed by the command shell.
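To make the "pattern first, then the text to search" idea concrete, here is a minimal sketch in Python rather than egrep itself. The helper name egrep_like and the sample text are made up for illustration, and Python's re module is only an approximation of egrep's regex flavor.

```python
import re


def egrep_like(pattern, text):
    """Hypothetical helper: return the lines of `text` that contain a match."""
    regex = re.compile(pattern)
    return [line for line in text.splitlines() if regex.search(line)]


sample = "cat\ndog\nconcatenate\n"
matching_lines = egrep_like(r"cat", sample)
print(matching_lines)  # ['cat', 'concatenate']
```

Like egrep, the helper reports a line as soon as the pattern matches anywhere in it, which is why "concatenate" is included.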

The metacharacters ^ and $ represent the start and end, respectively, of the line of text as it is being checked.
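The effect of the two anchors can be sketched in Python's re module, which is assumed here to behave like egrep for these simple cases. The sample lines are invented for illustration.

```python
import re

lines = ["cat", "catalog", "tomcat", "the cat sat"]

# ^cat: "cat" must appear at the start of the line.
starts_with_cat = [s for s in lines if re.search(r"^cat", s)]
# cat$: "cat" must appear at the end of the line.
ends_with_cat = [s for s in lines if re.search(r"cat$", s)]
# ^cat$: the line must consist of "cat" and nothing else.
only_cat = [s for s in lines if re.search(r"^cat$", s)]

print(starts_with_cat)  # ['cat', 'catalog']
print(ends_with_cat)    # ['cat', 'tomcat']
print(only_cat)         # ['cat']
```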

[...], usually called a character class, lets you list the characters you want to allow at that point in the match. Within a character class, the character-class metacharacter '-' indicates a range of characters. Note that a dash is a metacharacter only within a character class; otherwise it matches the normal dash character. If you use [^...] instead of [...], the class matches any character that isn't listed. The metacharacter . is a shorthand for a character class that matches any character. It can be convenient when you want to have an "any character here" placeholder in your expression.
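A short sketch of character classes, ranges, negated classes, and the dot, again using Python's re module as a stand-in for egrep. The word list is invented for illustration.

```python
import re

words = ["gray", "grey", "griy", "gr-y", "H1", "H7", "Hx"]

# [ae] allows either "a" or "e" at that position.
gray_or_grey = [w for w in words if re.search(r"^gr[ae]y$", w)]
# Inside a class, "-" denotes a range: [1-6] is any one digit from 1 to 6.
headings = [w for w in words if re.search(r"^H[1-6]$", w)]
# [^ae] matches any single character that is NOT "a" or "e".
not_a_or_e = [w for w in words if re.search(r"^gr[^ae]y$", w)]
# . matches any single character (in Python, any character except a newline).
any_char = [w for w in words if re.search(r"^gr.y$", w)]

print(gray_or_grey)  # ['gray', 'grey']
print(headings)      # ['H1']
print(not_a_or_e)    # ['griy', 'gr-y']
print(any_char)      # ['gray', 'grey', 'griy', 'gr-y']
```

Note that a negated class still has to match one character: gr[^ae]y does not match "gry".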

A very convenient metacharacter is |, which means "or". Parentheses are often required to limit the scope of the alternation, because without them the expression means something different: gr(a|e)y matches "gray" or "grey", while gra|ey matches "gra" or "ey". Case-insensitive matching is not part of the regular-expression language, but it is a related, useful feature that many tools provide. egrep's command-line option "-i" tells it to do a case-insensitive match. A common problem is that a regular expression that matches the word you want can often also match where the "word" is embedded within a larger word. You can use the metasequences \< and \> if your version of egrep happens to support them. You can think of them as word-based versions of ^ and $ that match the position at the start and end of a word.
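The three ideas in this paragraph can be sketched in Python's re module, with two substitutions that are assumptions of this example rather than egrep behavior: the re.IGNORECASE flag stands in for egrep's -i option, and \b (a generic word boundary) stands in for \< and \>, which Python's re does not support.

```python
import re

# With parentheses, the alternation covers only the middle letter.
grouped = re.compile(r"^gr(a|e)y$")
# Without them, the pattern means "gra" OR "ey".
ungrouped = re.compile(r"gra|ey")

grouped_matches_grey = bool(grouped.search("grey"))      # True
grouped_matches_they = bool(grouped.search("they"))      # False
ungrouped_matches_they = bool(ungrouped.search("they"))  # True ("ey" is found)

# re.IGNORECASE plays the role of egrep's -i option here.
case_insensitive = bool(re.search(r"from", "FROM: alice", re.IGNORECASE))  # True

# \b marks a word boundary, so "cat" must stand alone as a word.
embedded = bool(re.search(r"cat", "concatenate"))         # True
whole_word = bool(re.search(r"\bcat\b", "concatenate"))   # False
standalone = bool(re.search(r"\bcat\b", "the cat sat"))   # True
```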

The metacharacter ? means optional. It is placed after a single character, or after a string surrounded by parentheses, and marks that item as allowed to appear at that point in the expression without being required for a successful match. Similar to the question mark are + and *. The metacharacter + means "one or more of the immediately preceding item", and * means "any number, including none, of the item". Some versions of egrep support a metasequence for providing your own minimum and maximum number of repetitions: {min,max}, placed after the item.
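The quantifiers can be compared side by side in a small Python sketch. The patterns and test strings are invented for illustration, and Python's re module is assumed to agree with egrep on these basic quantifiers.

```python
import re

# ? after a single character: the "u" is optional.
optional_char = re.compile(r"^colou?r$")
# ? after a parenthesized group: the whole "th" is optional.
optional_group = re.compile(r"^July? 4(th)?$")
# + requires at least one "b"; * allows none at all.
one_or_more = re.compile(r"^ab+c$")
zero_or_more = re.compile(r"^ab*c$")
# {3,5} requires between three and five digits.
bounded = re.compile(r"^[0-9]{3,5}$")

print(bool(optional_char.search("color")))     # True
print(bool(optional_char.search("colour")))    # True
print(bool(optional_group.search("Jul 4")))    # True
print(bool(optional_group.search("July 4th"))) # True
print(bool(one_or_more.search("ac")))          # False
print(bool(zero_or_more.search("ac")))         # True
print(bool(bounded.search("12")))              # False
print(bool(bounded.search("12345")))           # True
```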

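Backreferencing is a regular-expression feature that allows you to match new text that is the same as some text matched earlier in the expression. Parentheses remember the text matched by the subexpression they enclose, and the special metasequence \1 matches that same text again (\2, \3, and so on refer to the second, third, and later sets of parentheses). To find doubled words, for example, we parenthesize the expression for the first word and replace the second word with \1: \<([a-zA-Z]+) +\1\>.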
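The doubled-word search can be sketched in Python as follows. Because Python's re module does not support \< and \>, this example substitutes \b for both; that substitution is an assumption of the sketch, not part of the egrep expression.

```python
import re

# ([a-zA-Z]+) captures the first word; \1 must then match the same text again.
double_word = re.compile(r"\b([a-zA-Z]+) +\1\b")

found = double_word.search("this is is a test")
print(found.group(0))  # is is
print(found.group(1))  # is

# The trailing \b prevents a false match when the second "word" is only
# the beginning of a longer word.
no_match = double_word.search("the theory")
print(no_match)  # None
```

Without the capturing parentheses there is nothing for \1 to refer to, which is why the group around the first word is required.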