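re.match(pattern, string, flags=0) tries to match the pattern at the start of the string only
re.search(pattern, string, flags=0) scans the whole string and returns the first successful match
re.sub(pattern, repl, string, count=0, flags=0) replaces matches in the string with repl (count=0 replaces every match)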
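A small sketch of how the three calls differ. The sample string `text` is made up for illustration and is not taken from the log used below.

```python
import re

# Hypothetical sample string, chosen so the first token is not a number.
text = "error 42 at line 7"

# re.match only succeeds if the pattern matches at position 0.
at_start = re.match(r"\d+", text)      # no match: the string starts with "error"

# re.search scans forward and stops at the first match.
first_hit = re.search(r"\d+", text)

# re.sub replaces every match unless count limits it.
all_masked = re.sub(r"\d+", "N", text)
one_masked = re.sub(r"\d+", "N", text, count=1)

print(at_start)
print(first_hit.group())
print(all_masked)
print(one_masked)
```

Both `re.match` and `re.search` return `None` when nothing matches, so check the result before calling `.group()`.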
>>> s='112.90.239.137 112.90.239.137 1526446118 [26/Nov/2015:00:00:47 +0800] 23 "GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1" "api.map.baidu.com" 200 76 gzip:116pct. "-" "BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1" "jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)" map apimap 16555290153476373216 10.46.234.22 "9904758605881922946"'
>>> import re
>>> res=re.compile(r"(.*) (.*) (.*) \[(.*)\] (.*) \"(.*)\" \"(.*)\" (.*) (.*) (.*) \"(.*)\" \"(.*)\" \"(.*)\" (.*) (.*) (.*) (.*) \"(.*)\"")
>>> res is None
False
>>> res.search(s).groups()
('112.90.239.137', '112.90.239.137', '1526446118', '26/Nov/2015:00:00:47 +0800', '23', 'GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1', 'api.map.baidu.com', '200', '76', 'gzip:116pct.', '-', 'BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1', 'jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)', 'map', 'apimap', '16555290153476373216', '10.46.234.22', '9904758605881922946')
>>> re.sub('(<b>)|(</b>)', '', s)
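The greedy `(.*)` groups in the session above depend on backtracking to land on the right separators. A more defensive sketch, assuming the same leading field layout, uses `\S+` and `[^"]*` so a field cannot spill into its neighbour. The log line here is shortened and made up, and the group names (`client`, `remote`, `cost` and so on) are guesses at what the fields mean, not names from the original log format.

```python
import re

# Shortened, made-up line with the same leading layout as the sample above.
line = ('1.2.3.4 1.2.3.4 1526446118 [26/Nov/2015:00:00:47 +0800] 23 '
        '"GET /ping?x=1 HTTP/1.1" "example.com" 200 76')

# \S+ stops at whitespace and [^"]* stops at the closing quote.
# Group names are hypothetical labels for illustration only.
LOG_RE = re.compile(
    r'(?P<client>\S+) (?P<remote>\S+) (?P<id>\S+) '
    r'\[(?P<time>[^\]]*)\] (?P<cost>\S+) '
    r'"(?P<request>[^"]*)" "(?P<host>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

m = LOG_RE.match(line)
fields = m.groupdict() if m else {}

print(fields.get("request"))
print(fields.get("status"))
```

Named groups make the result a dict, so later code does not depend on counting positions in a long tuple.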
grep:
-v, --invert-match select non-matching lines
-i, --ignore-case ignore case distinctions
-f, --file=FILE obtain PATTERN from FILE
-w, --word-regexp force PATTERN to match only whole words
-o, --only-matching show only the part of a line matching PATTERN
-P, --perl-regexp PATTERN is a Perl regular expression
-n, --line-number print line number with output lines
-H, --with-filename print the file name for each match
-B, --before-context=NUM print NUM lines of leading context
-A, --after-context=NUM print NUM lines of trailing context
-C, --context=NUM print NUM lines of output context
-a, --text equivalent to --binary-files=text
-s, --no-messages suppress error messages
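To make a few of the options above concrete, here is a rough Python analogue of `-i`, `-v`, `-o`, `-w` and `-n` over a list of lines. It is a sketch for illustration, not a reimplementation of grep: the function name `grep`, its keyword arguments, and the sample lines are all made up, and `-w` is approximated with `\b`.

```python
import re

# Made-up input lines standing in for a file.
lines = ["Error: disk full", "ok", "error code 17", "terror alert"]

def grep(pattern, lines, ignore_case=False, invert=False,
         only_matching=False, word=False, line_number=False):
    """Hypothetical helper mimicking grep -i / -v / -o / -w / -n."""
    if word:
        pattern = r"\b(?:" + pattern + r")\b"      # -w: whole words only
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for num, line in enumerate(lines, start=1):
        hits = [m.group() for m in rx.finditer(line)]
        if bool(hits) == invert:                   # -v flips the selection
            continue
        # -o prints each matched part; otherwise print the whole line.
        pieces = hits if only_matching and not invert else [line]
        for piece in pieces:
            out.append(f"{num}:{piece}" if line_number else piece)
    return out

print(grep("error", lines, ignore_case=True))
print(grep("error", lines, ignore_case=True, word=True))
print(grep("error", lines, invert=True))
print(grep(r"\d+", lines, only_matching=True, line_number=True))
```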
regexp:
- . (dot) - matches any single character.
- ? - the preceding character matches 0 or 1 times only.
- * - the preceding character matches 0 or more times.
- + - the preceding character matches 1 or more times.
- {n} - the preceding character matches exactly n times.
- {n,m} - the preceding character matches at least n times and not more than m times.
- [agd] - the character is one of those included within the square brackets.
- [^agd] - the character is not one of those included within the square brackets.
- [c-f] - the dash within the square brackets operates as a range. In this case it means either the letters c, d, e or f.
- () - allows us to group several characters to behave as one.
- | (pipe symbol) - the logical OR operation.
- ^ - matches the beginning of the line.
- $ - matches the end of the line.
- \s - matches anything which is considered whitespace. This could be a space, tab, line break etc.
- \S - matches the opposite of \s, that is anything which is not considered whitespace.
- \d - matches anything which is considered a digit, i.e. 0-9 (it is effectively a shortcut for [0-9]).
- \D - matches the opposite of \d, that is anything which is not considered a digit.
- \w - matches anything which is considered a word character, that is [A-Za-z0-9_]. Note the inclusion of the underscore character '_'. This is because in programming and other areas we regularly use the underscore as part of, say, a variable or function name.
- \W - matches the opposite of \w, that is anything which is not considered a word character.
- Tab - represented in regular expressions as \t
- Carriage return - represented in regular expressions as \r
- Line feed (or newline) - represented in regular expressions as \n
- Windows - uses the sequence \r\n (in that order)
- Mac OS (version 9 and below) - uses the sequence \r
- Unix/Linux and OS X - use the sequence \n
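- \< - represents the beginning of a word (GNU grep syntax; Python's re does not support it, use \b instead).
- \> - represents the end of a word (GNU grep syntax; Python's re does not support it, use \b instead).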
- \b - represents either the beginning or end of a word.
- ( ) - group part of the regular expression.
- \1 \2 etc - refer to something matched by a previous grouping.
- | - match what is on either the left or right of the pipe symbol.
- (?=x) - positive lookahead.
- (?!x) - negative lookahead.
- (?<=x) - positive lookbehind.
- (?<!x) - negative lookbehind.
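The syntax listed above can be tried out with Python's re module. This is a minimal sketch with made-up sample strings; it uses \b for word boundaries because Python's re does not support \< and \>.

```python
import re

# Quantifiers and character classes.
optional = re.findall(r"colou?r", "color colour colouur")   # ? = 0 or 1 times
ranged = re.findall(r"[c-f]{2,3}", "abcdefg")               # 2 to 3 chars from c..f
negated = re.findall(r"[^agd]+", "badge")                   # runs without a, g, d

# Shorthand classes.
digits = re.findall(r"\d+", "26/Nov/2015")
words = re.findall(r"\w+", "my_var = foo(1)")

# Anchor, grouping and alternation.
method = re.match(r"^(GET|POST)\s", "GET /ag/coord HTTP/1.1").group(1)

# Word boundaries: "cat" inside "concat" is not a whole word.
whole_word = re.findall(r"\bcat\b", "cat concat cat.")

# Back-reference: \1 repeats whatever group 1 matched.
doubled = re.findall(r"\b(\w+) \1\b", "it is is fine")

# Lookarounds test a position without consuming text.
before_px = re.findall(r"\d+(?=px)", "10px 20em 30px")       # positive lookahead
not_foobar = re.findall(r"foo(?!bar)\w*", "foobar foobaz")   # negative lookahead
after_dollar = re.findall(r"(?<=\$)\d+", "$5 and 7")         # positive lookbehind
no_dollar = re.findall(r"(?<!\$)\b\d+", "$5 and 7")          # negative lookbehind

for name in ("optional", "ranged", "negated", "digits", "words", "method",
             "whole_word", "doubled", "before_px", "not_foobar",
             "after_dollar", "no_dollar"):
    print(name, globals()[name])
```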