regular expression, grep (python, linux)

weixin_30535167

Original link: http://www.cnblogs.com/yaoyaohust/p/10363200.html

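This article covers regular expressions in Python, including basic re module functions such as match, search and sub, and uses a concrete example to show how to parse a complex log line. It also summarizes regular-expression syntax, explaining what the metacharacters and special symbols do.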

https://docs.python.org/2/library/re.html

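re.match(pattern, string, flags=0) tries to match the pattern at the start of the string only.

re.search(pattern, string, flags=0) scans the whole string and returns the first successful match.

re.sub(pattern, repl, string, count=0, flags=0) replaces matches of the pattern in the string; count=0 replaces all of them.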
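The difference between the three functions is easiest to see side by side. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

text = "error: disk full; error: retry"

# match() only succeeds if the pattern matches at position 0.
m1 = re.match(r"disk", text)    # None: "disk" is not at the start
m2 = re.match(r"error", text)   # matches at position 0

# search() scans the string and returns the first match anywhere.
m3 = re.search(r"disk", text)

# sub() replaces matches; count limits the number of replacements (0 = all).
once = re.sub(r"error", "warn", text, count=1)
every = re.sub(r"error", "warn", text)

print(m1, m2.group(), m3.start())
print(once)
print(every)
```

Both match() and search() return None when nothing matches, so check the result before calling .group() on it.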

>>> import re

>>> s='112.90.239.137 112.90.239.137 1526446118 [26/Nov/2015:00:00:47 +0800] 23 "GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1" "api.map.baidu.com" 200 76 gzip:116pct. "-" "BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1" "jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)" map apimap 16555290153476373216 10.46.234.22 "9904758605881922946"'

>>> res=re.compile(r"(.*) (.*) (.*) \[(.*)\] (.*) \"(.*)\" \"(.*)\" (.*) (.*) (.*) \"(.*)\" \"(.*)\" \"(.*)\" (.*) (.*) (.*) (.*) \"(.*)\"")

>>> res is None

False

>>> res.search(s).groups()

('112.90.239.137', '112.90.239.137', '1526446118', '26/Nov/2015:00:00:47 +0800', '23', 'GET /ag/coord/convert?_appName=jiakaobaodianxingui&_appUser=632e76c53b4f3c9ffe90b8c4c61bd5b0&_cityCode=330300&_cityName=%E6%B8%A9%E5%B7%9E&_device=iPhone&_firstTime=2015-10-28%2018%3A49%3A05&_gpsType=baidu&_idfa=D0DD23E5-B407-449B-B005-0B52C6C2CBF3&_idfv=85A23658-2DD4-490D-B8AE-767842401821&_imei=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_j=1.0&_jail=false&_latitude=27.610026844605&_launch=45&_longitude=120.56419068644&_network=wifi&_openUuid=c09b0b9b9759e72eaf0fd6e3eb38e55113d74cdd&_pkgName=cn.mucang.ios.jiakaobaodianPromise&_platform=iphone&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8-%E9%A9%BE%E7%85%A7%E8%80%83%E8%AF%95&_productCategory=jiakaobaodian&_renyuan=mucang&_screenDip=2&_screenHeight=1136&_screenWidth=640&_system=iPhone%20OS&_systemVersion=9.0.2&_vendor=appstore&_version=5.9.0&from=0&to=4&x=120.5576965508963&y=27.61254659188421 HTTP/1.1', 'api.map.baidu.com', '200', '76', 'gzip:116pct.', '-', 'BAIDUID=C328D2934E2C6EDF8E185FAC44EB168D:FG=1', 'jiakaobaodianPromise/5.9.0 (iPhone; iOS 9.0.2; Scale/2.00)', 'map', 'apimap', '16555290153476373216', '10.46.234.22', '9904758605881922946')

>>> re.sub('(<b>)|(</b>)', '', s)
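A long chain of greedy (.*) groups works on the line above but is hard to read and can backtrack heavily on malformed input. One possible alternative, sketched here on a shortened, hypothetical log line (the field names ip, time, request, status and size are assumptions, not part of the original log format), is to use named groups and negated character classes:

```python
import re

# Hypothetical, shortened log line; not the real format used above.
line = '10.0.0.1 [26/Nov/2015:00:00:47 +0800] "GET /index.html HTTP/1.1" 200 76'

log_re = re.compile(
    r'(?P<ip>\S+) '            # first field: no whitespace
    r'\[(?P<time>[^\]]+)\] '   # everything up to the closing bracket
    r'"(?P<request>[^"]*)" '   # everything up to the closing quote
    r'(?P<status>\d{3}) '      # three-digit status code
    r'(?P<size>\d+)'           # response size
)

m = log_re.match(line)
fields = m.groupdict() if m else {}
print(fields)
```

groupdict() returns the captured fields by name, which avoids counting positions in a long tuple.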

grep:

http://ryanstutorials.net/linuxtutorial/grep.php

-v, --invert-match select non-matching lines

-i, --ignore-case ignore case distinctions

-f, --file=FILE obtain PATTERN from FILE

-w, --word-regexp force PATTERN to match only whole words

-o, --only-matching show only the part of a line matching PATTERN

-P, --perl-regexp PATTERN is a Perl regular expression

-n, --line-number print line number with output lines

-H, --with-filename print the file name for each match

-B, --before-context=NUM print NUM lines of leading context

-A, --after-context=NUM print NUM lines of trailing context

-C, --context=NUM print NUM lines of output context

-a, --text equivalent to --binary-files=text

-s, --no-messages suppress error messages
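To connect these options back to the re module, here is a rough Python sketch of what -i, -v, -n and -o do. The grep() helper is hypothetical and only imitates the behaviour on a list of strings; it is not how grep itself is implemented and it ignores edge cases such as patterns that match the empty string.

```python
import re

def grep(pattern, lines, ignore_case=False, invert=False,
         line_number=False, only_matching=False):
    """Hypothetical helper imitating grep's -i, -v, -n and -o on a list of lines."""
    rx = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    out = []
    for num, line in enumerate(lines, start=1):
        matches = [m.group() for m in rx.finditer(line)]
        if bool(matches) == invert:
            continue  # skip non-matching lines, or matching lines under -v
        # -o prints each matched part on its own; otherwise print the whole line
        parts = matches if only_matching and not invert else [line]
        for part in parts:
            out.append(f"{num}:{part}" if line_number else part)
    return out

lines = ["Error: disk full", "all good", "error again, ERROR twice"]

insensitive = grep("error", lines, ignore_case=True)            # like grep -i
inverted = grep("error", lines, invert=True)                    # like grep -v
numbered_parts = grep("error", lines, ignore_case=True,
                      only_matching=True, line_number=True)     # like grep -ion

print(insensitive)
print(inverted)
print(numbered_parts)
```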

regexp:

http://ryanstutorials.net/regular-expressions-tutorial/

https://www.debuggex.com/

. (dot) - any single character (in most tools, excluding the newline by default).
? - the preceding character or group matches 0 or 1 times only.
* - the preceding character or group matches 0 or more times.
+ - the preceding character or group matches 1 or more times.
{n} - the preceding character or group matches exactly n times.
{n,m} - the preceding character or group matches at least n times and not more than m times.
[agd] - the character is one of those included within the square brackets.
[^agd] - the character is not one of those included within the square brackets.
[c-f] - the dash within the square brackets operates as a range. In this case it means either the letters c, d, e or f.
() - allows us to group several characters to behave as one.
| (pipe symbol) - the logical OR operation.
^ - matches the beginning of the line.
$ - matches the end of the line.
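The metacharacters above can be checked quickly in Python. This is a small sketch with made-up patterns and strings; re.fullmatch requires the whole string to match, which keeps each case unambiguous.

```python
import re

examples = [
    # (pattern, text, should the whole text match?)
    (r"h.t",       "hat",     True),    # . : any single character
    (r"colou?r",   "color",   True),    # ? : "u" zero or one time
    (r"colou?r",   "colouur", False),
    (r"ab*c",      "ac",      True),    # * : zero or more "b"
    (r"ab+c",      "ac",      False),   # + : at least one "b"
    (r"a{3}",      "aaa",     True),    # {n} : exactly three
    (r"a{2,3}",    "aaaa",    False),   # {n,m} : two or three, not four
    (r"[agd]og",   "dog",     True),    # one of a, g, d
    (r"[^agd]og",  "dog",     False),   # anything except a, g, d
    (r"[c-f]x",    "ex",      True),    # range c..f
    (r"(ab)+",     "ababab",  True),    # group repeated as a unit
    (r"cat|dog",   "dog",     True),    # alternation
]
results = [bool(re.fullmatch(p, t)) for p, t, _ in examples]
expected = [e for _, _, e in examples]

# ^ and $ match at line boundaries only when re.MULTILINE is set;
# by default they anchor to the start and end of the whole string.
per_line = re.findall(r"^\w+$", "one\ntwo", re.MULTILINE)
whole = re.findall(r"^\w+$", "one\ntwo")

print(results == expected, per_line, whole)
```

grep works line by line, so ^ and $ there always refer to the current line; in Python the re.MULTILINE flag is needed to get the same behaviour on a multi-line string.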

\s - matches anything which is considered whitespace. This could be a space, tab, line break etc.
\S - matches the opposite of \s, that is anything which is not considered whitespace.
\d - matches anything which is considered a digit, i.e. 0-9 (it is effectively a shortcut for [0-9]).
\D - matches the opposite of \d, that is anything which is not considered a digit.
\w - matches anything which is considered a word character. That is [A-Za-z0-9_]. Note the inclusion of the underscore character '_'. This is because in programming and other areas we regularly use the underscore as part of, say, a variable or function name.
\W - matches the opposite of \w, that is anything which is not considered a word character.
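A short sketch of the shorthand classes in Python, using a made-up sample string. Note that for str patterns in Python 3, \d, \w and \s also match non-ASCII Unicode digits, letters and whitespace; passing re.ASCII restricts them to the ranges listed above.

```python
import re

s = "user_1 paid 42 USD\ton 2015-11-26"

digits = re.findall(r"\d+", s)        # runs of digits
words = re.findall(r"\w+", s)         # runs of word characters (underscore included)
tokens = re.split(r"\s+", s)          # split on spaces and the tab
non_space = re.findall(r"\S+", s)     # same tokens, found instead of split
only_digits = re.sub(r"\D", "", "2015-11-26")   # drop every non-digit
non_word = re.findall(r"\W", "a-b c")           # characters that are not word characters

print(digits)
print(words)
print(tokens)
print(only_digits, non_word)
```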

Tab - represented in regular expressions as \t
Carriage return - represented in regular expressions as \r
Line feed (or newline) - represented in regular expressions as \n
Windows - uses the sequence \r\n (in that order)
Mac OS (version 9 and below) - uses the sequence \r
Unix/Linux and OSX - uses the sequence \n
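When input may come from any of these platforms, one common approach is to split on, or normalize, all three line-ending styles. A minimal sketch with a made-up string:

```python
import re

text = "unix\nwindows\r\nold mac\rlast"

# Put \r\n first so a Windows line ending is not split into two breaks.
lines = re.split(r"\r\n|\r|\n", text)

# Normalize everything to \n.
normalized = re.sub(r"\r\n|\r", "\n", text)

print(lines)
print(repr(normalized))
```

For plain splitting, str.splitlines() handles these three sequences as well (plus a few other separators), so the regular expression is mainly useful when normalizing or when combined with other patterns.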

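\< - represents the beginning of a word (supported by GNU grep; Python's re module does not support it, use \b instead).
\> - represents the end of a word (supported by GNU grep; Python's re module does not support it, use \b instead).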
\b - represents either the beginning or end of a word.
( ) - group part of the regular expression.
\1 \2 etc - refer to something matched by a previous grouping.
| - match what is on either the left or right of the pipe symbol.
(?=x) - positive lookahead.
(?!x) - negative lookahead.
(?<=x) - positive lookbehind.
(?<!x) - negative lookbehind.
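Word boundaries, backreferences and lookaround are easier to follow with concrete input. The strings below are made up for illustration; this is a sketch of typical usage in Python's re module rather than an exhaustive reference.

```python
import re

# \b: match "cat" only as a whole word.
whole_words = re.findall(r"\bcat\b", "cat concat cats cat.")

# \1: backreference to group 1, here used to find a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test").group()

prices = "10 USD, 20 EUR, 30 USD"
# (?=x): digits followed by " USD", without consuming " USD".
usd = re.findall(r"\d+(?= USD)", prices)
# (?!x): whole numbers NOT followed by " USD".
not_usd = re.findall(r"\b\d+\b(?! USD)", prices)

cost = "cost $15, 7 items, $3"
# (?<=x): digits preceded by a dollar sign.
dollars = re.findall(r"(?<=\$)\d+", cost)
# (?<!x): numbers NOT preceded by a dollar sign.
plain = re.findall(r"(?<!\$)\b\d+", cost)

print(whole_words, doubled)
print(usd, not_usd)
print(dollars, plain)
```

Lookarounds check the surrounding text without including it in the match, which is why the results contain only the digits. With grep, these constructs need the -P (Perl regular expression) option.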
