Python --- Regular Expression 1

Referenced from:

1. https://docs.python.org/3.4/howto/regex.html

2. https://fishc.com.cn/thread-57073-1-3.html

 

Simple pattern

 

Matching characters

Most letters and characters will simply match themselves. However, there are some exceptions (metacharacters) to this rule.

Here is the list of metacharacters: 

.   ^   $   *   +   ?   { }   [ ]   \   |   ( )

[] are used for specifying a character class, which is a set of characters that you wish to match.

Metacharacters are not active inside classes. For example, [akm$] will match any of the characters 'a', 'k', 'm', or '$'. '$' is usually a metacharacter, but inside a character class it’s stripped of its special nature.

You can match the characters not listed within the class by complementing the set. This is indicated by including a '^' as the first character of the class; a '^' anywhere else in the class simply matches the '^' character. For example, [^5] will match any character except '5'.
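As a minimal sketch of the character-class rules above (the sample strings and variable names are made up for illustration), `re.findall` can be used to list every character a class matches:

```python
import re

# [akm$] matches any one of 'a', 'k', 'm', or a literal '$'.
in_class = re.findall(r"[akm$]", "make $5")
print(in_class)   # ['m', 'a', 'k', '$']

# [^5] is a complemented class: any single character except '5'.
not_five = re.findall(r"[^5]", "a5b55c")
print(not_five)   # ['a', 'b', 'c']

# A '^' that is not the first character of the class is a literal caret.
caret = re.findall(r"[5^]", "5^2")
print(caret)      # ['5', '^']
```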

 

\d

Matches any decimal digit; this is equivalent to the class [0-9].

\D

Matches any non-digit character; this is equivalent to the class [^0-9].

\s

Matches any whitespace character; this is equivalent to the class [ \t\n\r\f\v].

\S

Matches any non-whitespace character; this is equivalent to the class [^ \t\n\r\f\v].

\w

Matches any alphanumeric character; this is equivalent to the class [a-zA-Z0-9_].

\W

Matches any non-alphanumeric character; this is equivalent to the class [^a-zA-Z0-9_].

  
  

These sequences can be included inside a character class. For example, [\s,.] is a character class that will match any whitespace character, or ',' or '.'.
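The following short example illustrates the predefined sequences, both on their own and inside a character class. The input strings are invented for the demonstration.

```python
import re

# \d picks out the decimal digits.
digits = re.findall(r"\d", "Room 42B")
print(digits)        # ['4', '2']

# \S+ matches runs of non-whitespace characters.
words = re.findall(r"\S+", "a b\tc")
print(words)         # ['a', 'b', 'c']

# [\s,.] matches a whitespace character, a comma, or a period,
# so it can serve as a simple separator pattern for re.split.
parts = re.split(r"[\s,.]+", "one, two. three")
print(parts)         # ['one', 'two', 'three']
```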

The final metacharacter in this section is '.'. It matches anything except a newline character, and there’s an alternate mode (re.DOTALL) where it will match even a newline.  '.' is often used where you want to match “any character”.
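A small sketch of the difference re.DOTALL makes, using a made-up two-line string:

```python
import re

text = "a\nb"

# By default '.' does not match a newline, so there is no match.
default_match = re.search(r"a.b", text)
print(default_match)                # None

# With re.DOTALL, '.' also matches the newline between 'a' and 'b'.
dotall_match = re.search(r"a.b", text, re.DOTALL)
print(dotall_match is not None)     # True
```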

 

Repeating things

The first metacharacter for repeating things that we’ll look at is *.  * doesn’t match the literal character *; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once.

Another repeating metacharacter is +, which matches one or more times. Pay careful attention to the difference between * and +: * matches zero or more times, so whatever’s being repeated may not be present at all, while + requires at least one occurrence. For example, ca+t will match cat (1 a) and caaat (3 a’s), but won’t match ct.
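To make the contrast between * and + concrete, here is a small sketch that tests the same three strings against ca*t and ca+t. It uses `re.fullmatch` so that the whole string has to match; the list names are only illustrative.

```python
import re

samples = ["ct", "cat", "caaat"]

# ca*t allows zero or more 'a' characters, so "ct" matches too.
star = [bool(re.fullmatch(r"ca*t", s)) for s in samples]
print(star)   # [True, True, True]

# ca+t needs at least one 'a', so "ct" is rejected.
plus = [bool(re.fullmatch(r"ca+t", s)) for s in samples]
print(plus)   # [False, True, True]
```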

There are two more repeating qualifiers. The question mark character, ?, matches either once or zero times; you can think of it as marking something as being optional. For example, home-?brew matches either homebrew or home-brew.
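The home-?brew example can be checked directly. This sketch adds one extra string, home--brew, as an assumed counterexample to show that ? allows at most one hyphen.

```python
import re

samples = ["homebrew", "home-brew", "home--brew"]

# '-?' makes the hyphen optional: zero or one occurrence.
optional = [bool(re.fullmatch(r"home-?brew", s)) for s in samples]
print(optional)   # [True, True, False]
```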

The most complicated repeated qualifier is {m,n}, where m and n are decimal integers. This qualifier means there must be at least m repetitions, and at most n. For example, a/{1,3}b will match a/b, a//b, and a///b. It won’t match ab, which has no slashes, or a////b, which has four.
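A quick way to see the {m,n} bounds in action is to run the pattern over strings with zero to four slashes. This is only a sketch; the list names are illustrative.

```python
import re

# Strings with 0, 1, 2, 3, and 4 slashes between 'a' and 'b'.
samples = ["ab", "a/b", "a//b", "a///b", "a////b"]

# a/{1,3}b requires between one and three slashes.
bounded = [bool(re.fullmatch(r"a/{1,3}b", s)) for s in samples]
print(bounded)   # [False, True, True, True, False]
```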

You can omit either m or n; in that case, a reasonable value is assumed for the missing value. Omitting m is interpreted as a lower limit of 0, while omitting n results in an upper bound of infinity. Strictly speaking, the HOWTO notes that the real upper bound is an implementation limit of about 2 billion repetitions, but that might as well be infinity.
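The open-ended forms can be sketched the same way. The patterns a/{,2}b and a/{2,}b below are chosen for illustration and are not taken from the HOWTO.

```python
import re

samples = ["ab", "a/b", "a//b", "a///b"]

# {,2}: lower bound omitted, so zero to two slashes are allowed.
at_most_two = [bool(re.fullmatch(r"a/{,2}b", s)) for s in samples]
print(at_most_two)    # [True, True, True, False]

# {2,}: upper bound omitted, so two or more slashes are allowed.
at_least_two = [bool(re.fullmatch(r"a/{2,}b", s)) for s in samples]
print(at_least_two)   # [False, False, True, True]
```

Seen this way, {0,} behaves like *, {1,} like +, and {0,1} like ?.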
