Basic patterns for matching single characters
Source: Google's Python Class
https://developers.google.com/edu/python/regular-expressions?hl=zh-CN
The power of regular expressions is that they can specify patterns, not just fixed characters. Here are the most basic patterns which match single chars:
- a, X, 9, < -- ordinary characters just match themselves exactly. The meta-characters which do not match themselves because they have special meanings are: . ^ $ * + ? { [ ] \ | ( ) (details below)
- . (a period) -- matches any single character except newline '\n'
- \w -- (lowercase w) matches a "word" character: a letter or digit or underbar [a-zA-Z0-9_]. Note that although "word" is the mnemonic for this, it only matches a single word char, not a whole word. \W (upper case W) matches any non-word character.
- \b -- boundary between word and non-word
- \s -- (lowercase s) matches a single whitespace character -- space, newline, return, tab, form feed [ \n\r\t\f]. \S (upper case S) matches any non-whitespace character.
- \t, \n, \r -- tab, newline, return
- \d -- decimal digit [0-9] (some older regex utilities do not support \d, but they all support \w and \s)
- ^ = start, $ = end -- match the start or end of the string
- \ -- inhibit the "specialness" of a character. So, for example, use \. to match a period or \\ to match a backslash. If you are unsure if a character has special meaning, such as '@', you can put a backslash in front of it, \@, to make sure it is treated just as a character.
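The single-character patterns listed above can be tried out with a short script. This is only a minimal sketch in Python 3; the sample string and the variable names are made up for illustration and do not come from the original class material.

```python
import re

text = 'cat_9 costs $3.50 today'  # hypothetical sample string

# \w matches one word character, so \w+ grabs a run of them.
word = re.search(r'\w+', text).group()            # 'cat_9'

# \d matches a single decimal digit.
digit = re.search(r'\d', text).group()            # '9'

# \s matches a single whitespace character.
space = re.search(r'\s', text).group()            # ' '

# . matches any character except newline; \. matches only a literal period.
any_char = re.search(r'3.5', text).group()        # '3.5'
literal_dot = re.search(r'\d\.\d', text).group()  # '3.5'

# \$ matches a literal dollar sign instead of meaning "end of string".
dollar = re.search(r'\$\d', text).group()         # '$3'

# \b matches the boundary between a non-word and a word character.
whole_word = re.search(r'\bto\w*', text).group()  # 'today'
```

Each call uses re.search, so only the first (leftmost) match in the string is returned.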
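To match a string that starts with a given character, use ^; for example, r'^p' matches a string starting with p. To match a string that ends with a given character, use $; for example, r'm$' matches a string ending with m.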
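import re

text = 'http://www.google.com'
# ^ anchors at the start; \w+ stops at ':' because ':' is not a word character
match = re.search(r'^ht\w+', text)
print(match.group())  # http
# ^ and $ anchor both ends, so the whole string has to fit the pattern
match = re.search(r'^h[\w@/:.]+m$', text)
print(match.group())  # http://www.google.com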
Repetition
Things get more interesting when you use +, * and ? to specify repetition in the pattern:
- + -- 1 or more occurrences of the pattern to its left, e.g. 'i+' = one or more i's
- * -- 0 or more occurrences of the pattern to its left
- ? -- match 0 or 1 occurrences of the pattern to its left
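The three repetition operators can be compared side by side. The following is a small sketch in Python 3; the test strings are arbitrary examples chosen for illustration.

```python
import re

# + needs at least one 'i' after the 'p'.
plus = re.search(r'pi+', 'piiig').group()          # 'piii'

# * also accepts zero occurrences, so 'p' alone is enough.
star = re.search(r'pi*', 'pg').group()             # 'p'
no_plus = re.search(r'pi+', 'pg')                  # None: there is no 'i' after 'p'

# ? makes the character to its left optional (0 or 1 times).
british = re.search(r'colou?r', 'colour').group()  # 'colour'
american = re.search(r'colou?r', 'color').group()  # 'color'
```

Note that re.search returns None when nothing matches, so calling .group() on the result is only safe after checking it.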
Leftmost & Largest
First the search finds the leftmost match for the pattern, and second it tries to use up as much of the string as possible -- i.e. + and * go as far as possible (the + and * are said to be "greedy").
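Both rules can be seen in a couple of lines. This is a minimal Python 3 sketch with made-up input strings.

```python
import re

# Leftmost: the first run of i's wins, even though a longer run comes later.
leftmost = re.search(r'i+', 'piigiiii').group()          # 'ii'

# Greedy: .* runs to the last '>' it can reach, not the first one.
greedy = re.search(r'<.*>', '<b>bold</b> text').group()  # '<b>bold</b>'
```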
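In other words: the search first settles on the leftmost match, and then tries to make that match as long as possible.
Square Brackets
Square brackets indicate a set of characters, so [abc] matches 'a' or 'b' or 'c'.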
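import re

text = 'purple alice-b@google.com monkey dishwasher'
# [\w-]+ matches the username (word chars or dashes); [\w.]+ matches the host (word chars or dots)
match = re.search(r'[\w-]+@[\w.]+', text)
if match:
    print(match.group())  # alice-b@google.com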
Group Extraction
The "group" feature of a regular expression allows you to pick out parts of the matching text. Suppose for the emails problem that we want to extract the username and host separately. To do this, add parentheses ( ) around the username and host in the pattern, like this: r'([\w.-]+)@([\w.-]+)'.
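A short sketch of how the groups might be read back in Python 3, using the pattern above; the variable names are chosen here for illustration.

```python
import re

text = 'purple alice-b@google.com monkey dishwasher'
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
if match:
    whole = match.group()      # 'alice-b@google.com' (the whole match)
    username = match.group(1)  # 'alice-b' (first parenthesized group)
    host = match.group(2)      # 'google.com' (second parenthesized group)
    print(whole, username, host)
```

Groups are numbered from 1 in the order their opening parentheses appear; group() with no argument still returns the full match.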
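In short, put parentheses around the parts of the pattern you want to extract.
Negation with ^
- [^0-9] -- anything except a digit
- [^a] -- any character that is not a. The ^ only negates inside square brackets; outside them, ^a means an a at the start of the string.
- [^ab] -- any character that is not a and also not b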
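The difference between ^ inside and outside square brackets can be checked with a few searches. This is a minimal Python 3 sketch; the input strings are invented for the example.

```python
import re

# [^ab] matches one character that is neither 'a' nor 'b'.
first_other = re.search(r'[^ab]', 'abba-cab').group()  # '-'

# [^0-9]+ matches a run of characters that are not digits.
non_digits = re.search(r'[^0-9]+', 'room 42').group()  # 'room '

# Outside square brackets, ^ still means "start of string".
at_start = re.search(r'^ab', 'abba').group()           # 'ab'
not_at_start = re.search(r'^ba', 'abba')               # None: 'ba' is not at the start
```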
Substitution
re.sub(pattern, repl, string) -- returns the string obtained by replacing the leftmost non-overlapping occurrences of pattern in string with the replacement repl. If the pattern isn't found, string is returned unchanged.
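import re

p = re.compile(r'mo')  # precompiled pattern
s = 'molmormo'
# Replace every occurrence of 'mo' with 'mi'; same as re.sub(r'mo', 'mi', s)
s1 = p.sub('mi', s)
print(s1)  # milmirmi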
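The "non-overlapping" and "returned unchanged" parts of that description are easy to confirm. The sketch below is Python 3 with arbitrary sample strings; it also uses the optional count argument of re.sub, which the notes above do not mention.

```python
import re

# Non-overlapping: 'aaaa' holds two separate 'aa' matches, not three.
twice = re.sub(r'aa', 'X', 'aaaa')                     # 'XX'

# No match: the original string comes back unchanged.
unchanged = re.sub(r'zz', 'X', 'molmormo')             # 'molmormo'

# The optional count argument limits how many matches are replaced.
first_only = re.sub(r'mo', 'mi', 'molmormo', count=1)  # 'milmormo'
```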