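1. Special symbols and characters used in regular expressions
Notation | Description | Example regex |
literal | match the literal string value | foo |
re1|re2 | match regular expression re1 or re2 | foo|bar |
. | match any single character (except a newline) | b.b |
^ | match the start of the string | ^Dear |
$ | match the end of the string | /bin/*sh$ |
* | match zero or more occurrences of the preceding regex | [A-Za-z0-9]* |
+ | match one or more occurrences of the preceding regex | [a-z]+\.com |
? | match zero or one occurrence of the preceding regex | goo? |
{N} | match exactly N occurrences of the preceding regex | [0-9]{3} |
{M,N} | match from M to N occurrences of the preceding regex | [0-9]{5,9} |
[…] | match any single character from the character class | [aeiou] |
[..x-y..] | match any single character in the range from x to y | [0-9], [A-Za-z] |
[^…] | match any single character NOT in the character class, including any ranges it contains | [^aeiou], [^A-Za-z0-9_] |
(*|+|?|{})? | "non-greedy" version of any of the repetition symbols above | .*?[a-z] |
(…) | match the enclosed regular expression and save it as a subgroup | ([0-9]{3})?, f(oo|u)bar |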
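Special characters |
\d | match any decimal digit, same as [0-9] for ASCII text (\D is the inverse of \d: any non-digit) | data\d.txt |
\w | match any alphanumeric character, same as [A-Za-z0-9_] for ASCII text (\W is the inverse of \w) | [A-Za-z_]\w+ |
\s | match any whitespace character, same as [ \n\t\r\v\f] (\S is the inverse of \s) | of\sthe |
\b | match a word boundary (\B is the inverse of \b) | \bThe\b |
\N | match the saved subgroup numbered N (see (…) above) | price:\16 |
\c | match the special character c literally (that is, cancel its special meaning) | \., \\, \* |
\A (\Z) | match the start (end) of the string | \ADear |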
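A few rows of the table can be checked directly with the re module. The snippet below is only a sketch: the test strings (such as 'bob' and 'python.com') and the variable names are made up for illustration and do not come from the table.

```python
import re

# Each entry pairs a pattern from the table with an illustrative string
# that it should match (the strings are assumptions, not from the table).
samples = [
    (r'foo|bar', 'bar'),             # alternation
    (r'b.b', 'bob'),                 # any single character except newline
    (r'^Dear', 'Dear Sir'),          # anchored at the start of the string
    (r'[a-z]+\.com', 'python.com'),  # one or more letters, then a literal dot
    (r'[0-9]{3}', '555'),            # exactly three digits
    (r'f(oo|u)bar', 'fubar'),        # grouping combined with alternation
    (r'\bThe\b', 'The end'),         # word boundaries
    (r'data\d.txt', 'data5.txt'),    # digit shorthand
]
matched = [re.search(pattern, text) is not None for pattern, text in samples]

# Greedy vs. non-greedy: .* takes as much as it can, .*? as little as it can.
greedy = re.search(r'<.*>', '<a><b>').group()
lazy = re.search(r'<.*?>', '<a><b>').group()

# \1 refers back to whatever the first subgroup matched.
doubled = re.search(r'(\w)\1', 'hello').group()
```

Raw strings (the r prefix) keep Python from interpreting backslashes before the regex engine sees them, which matters for sequences such as \b and \1.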
2. Python's re module: core functions and methods
Module function
compile(pattern, flags=0) | compile RE pattern with any optional flags and return a regex object |
re module functions and regex object methods
match(pattern, string, flags=0) | attempt to match RE pattern to string with optional flags; return match object on success, None on failure |
search(pattern, string, flags=0) | search for first occurrence of RE pattern within string with optional flags; return match object on success, None on failure |
findall(pattern, string) | look for all (non-overlapping) occurrences of pattern in string; return a list of matches (new as of Python 1.5.2) |
split(pattern, string, maxsplit=0) | split string into a list using the RE pattern as the delimiter and return the list of pieces, splitting at most maxsplit times (the default, 0, splits at every occurrence) |
sub(pattern, repl, string, count=0) | replace occurrences of the RE pattern in string with repl, substituting all occurrences unless count is given (see also subn(), which additionally returns the number of substitutions made) |
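The functions above can be tried on one record in the format of the sample log lines used later in these notes. This is a minimal sketch; the variable names are illustrative, and the keyword names maxsplit and count are the ones the current re module uses for the "at most this many" arguments.

```python
import re

line = 'Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7'

# match() only succeeds at the start of the string; search() scans forward.
at_start = re.match(r'\d+-\d+-\d+', line)    # None: the line starts with 'Thu'
anywhere = re.search(r'\d+-\d+-\d+', line)   # finds the trailing field

# findall() returns every non-overlapping match as a list of strings.
clock_parts = re.findall(r'\d\d', '15:40:25')

# split() cuts the string at each delimiter match; maxsplit caps the cuts.
fields = re.split(r'::', line)
first_only = re.split(r'::', line, maxsplit=1)

# sub() replaces matches; subn() also reports how many replacements it made.
masked = re.sub(r'\d', '#', '15:40:25')
masked_once, n = re.subn(r'\d', '#', '15:40:25', count=1)

# A compiled regex object offers the same operations as methods.
sep = re.compile(r'::')
same_fields = sep.split(line)
```

Compiling is mostly worthwhile when the same pattern is applied many times, for example once per line of a large file.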
Match object methods
group(num=0) | return entire match (or specific subgroup num) |
groups() | return all matching subgroups in a tuple (empty if there weren't any) |
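As a small illustration of group() and groups(), the sketch below pulls the e-mail address out of one sample record. The pattern is an assumption that fits these sample addresses only; it is not a general e-mail matcher.

```python
import re

line = 'Mon Jan 31 07:58:42 1994::knqz@cofegju.edu::759974322-4-7'

# Three subgroups: user name, domain name, top-level domain.
m = re.search(r'(\w+)@(\w+)\.(com|edu|gov|net)', line)

whole = m.group()     # the entire match (same as m.group(0))
user = m.group(1)     # first subgroup
domain = m.group(2)   # second subgroup
parts = m.groups()    # all subgroups as a tuple

# A pattern without parentheses has no subgroups, so groups() is empty.
no_groups = re.search(r'\d+', 'abc 123').groups()
```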
3. Using regular expressions
Sample data (the contents of data.log):
Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12
Mon Dec 12 15:09:50 1977::goprf@pivcuqfecxxf.edu::250758590-5-12
Thu Apr 13 15:40:25 1972::uqkrtf@dqtnunm.com::71998825-6-7
Mon Jan 31 07:58:42 1994::knqz@cofegju.edu::759974322-4-7
Mon Nov 16 12:30:48 1970::edlprre@omwmoaqhqjb.net::27577848-7-11
Fri Jul 5 01:07:39 1996::uhjwm@gffttoky.edu::836500059-5-8
Python script
import re

# Scan each record and print the single digit that sits between two hyphens.
with open('data.log') as f:
    for line in f:
        line = line.rstrip()
        # result = re.findall(r'.*(h).*(h).*', line)
        # result = re.split(r'::|\n', line)
        # result = re.match(r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun)', line)
        # result = re.search(r'.+?(\d+-\d+-\d+)', line)
        result = re.search(r'-(\d)-', line)
        # findall() and split() return lists, so test their result with:
        # if len(result) != 0:
        if result is not None:
            # print(result)
            # print(result.group())
            print(result.group(1))
The commented-out statements show some other common uses of the re module.
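The individual calls can also be combined into one pattern that takes a whole record apart. The following is a sketch under assumptions: record_re and parse_record are hypothetical names, and the pattern only assumes what the sample lines show (a weekday, an address of the form name@host.tld, and three hyphen-separated integers, all delimited by '::').

```python
import re

sample = 'Tue Nov 7 18:38:48 1995::kmfxps@vixmsfphpvxh.gov::815740728-6-12'

# Hypothetical helper pattern: weekday, e-mail address, three trailing integers.
record_re = re.compile(
    r'^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) .*::'
    r'(\w+@\w+\.\w+)::'
    r'(\d+)-(\d+)-(\d+)$'
)

def parse_record(line):
    """Return (weekday, email, (n1, n2, n3)), or None if the line does not fit."""
    m = record_re.match(line.rstrip())
    if m is None:
        return None
    weekday, email = m.group(1), m.group(2)
    numbers = tuple(int(x) for x in m.groups()[2:])
    return weekday, email, numbers

parsed = parse_record(sample)
```

Returning None for lines that do not fit mirrors how match() and search() themselves signal failure, so callers can use the same `is not None` test as in the script above.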