Python Regular Expressions

qilin2016

Article link: https://blog.csdn.net/zhoudi2010/article/details/53160028

Written with reference to the Python 3.6 library documentation and Google for Education.

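'.'
    Matches any single character except '\n'.
'*'
    Matches 0 or more occurrences of the pattern to its left.
    ab* matches 'a', 'ab', or 'a' followed by any number of 'b' characters.
'+'
    Matches 1 or more occurrences of the pattern to its left.
    ab+ matches 'a' followed by one or more 'b' characters; it does not match 'a' alone.
'?'
    Matches 0 or 1 occurrences of the pattern to its left.
    ab? matches 'a' or 'ab'.
{n}
    Matches exactly n occurrences of the pattern to its left, no more and no fewer.
{n,m}
    Matches from n to m occurrences of the pattern to its left, as many as possible.
    a{4,}b matches 'aaaab', or more 'a' characters followed by a 'b', but not 'aaab'.
    {n,m}? is like {n,m} but matches as few occurrences as possible. For 'aaaaaa', a{3,5} matches five 'a' characters, while a{3,5}? matches only three.
\w
    Matches one letter, digit, or underscore: [a-zA-Z0-9_]. For str patterns in Python 3 it also matches word characters from other scripts unless re.ASCII is set.
\d
    Matches one digit: [0-9]. For str patterns in Python 3 it also matches other Unicode decimal digits unless re.ASCII is set.
[]
    Denotes a set of characters.
    [amk] matches 'a', 'm', or 'k'.
    [0-5][0-9] matches every two-digit number from 00 to 59.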
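
The quantifier and character-set rules above can be checked with a few lines of Python. This is only a minimal sketch; the sample strings and variable names are made up for illustration and do not come from the referenced material.

```python
import re

# {n} repeats the element to its left exactly n times.
exact = re.search(r'a{3}', 'aaaaaa').group()

# {n,m} is greedy: it takes as many repetitions as allowed (up to m).
greedy = re.search(r'a{3,5}', 'aaaaaa').group()

# {n,m}? is lazy: it stops as soon as the minimum n is satisfied.
lazy = re.search(r'a{3,5}?', 'aaaaaa').group()

# a{4,}b needs at least four 'a' characters before the 'b'.
too_short = re.search(r'a{4,}b', 'aaab')
long_enough = re.search(r'a{4,}b', 'aaaab').group()

# Each [...] set matches exactly one character from the set.
minutes = re.search(r'[0-5][0-9]', 'time 12:47').group()

print(exact, greedy, lazy, too_short, long_enough, minutes)
```

Note that `too_short` is `None`, so it has no `.group()` to call.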

^ matches the start of the string, and $ matches the end.
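
Used together, the two anchors force the pattern to cover the whole string rather than a fragment of it. The sketch below assumes a hypothetical helper named `is_minute` (not part of the `re` module) that accepts only a two-digit value from 00 to 59.

```python
import re

def is_minute(text):
    """Return True if text is exactly two digits in the range 00-59.

    is_minute is a hypothetical helper written for this illustration.
    """
    # ^ pins the match to the start of the string and $ to the end,
    # so extra characters on either side make the search fail.
    return re.search(r'^[0-5][0-9]$', text) is not None

print(is_minute('59'))
print(is_minute('60'))
print(is_minute('059'))
```

One caveat: `$` also matches just before a single trailing newline, so `'59\n'` would be accepted by this helper.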

Some examples:

import re

match = re.search(r'iii', 'piiig')  # found, match.group() == "iii"
match = re.search(r'igs', 'piiig')  # not found, match == None

## . = any char but \n
match = re.search(r'..g', 'piiig')  # found, match.group() == "iig"

## \d = digit char, \w = word char
match = re.search(r'\d\d\d', 'p123g')  # found, match.group() == "123"
match = re.search(r'\w\w\w', '@@abcd!!')  # found, match.group() == "abc"

## i+ = one or more i's, as many as possible
match = re.search(r'pi+', 'piiig')  # found, match.group() == "piii"

## search() returns the leftmost match, so the later run of four i's is not used
match = re.search(r'i+', 'piigiiii')  # found, match.group() == "ii"

## \s* = zero or more whitespace chars
## Here look for 3 digits, possibly separated by whitespace.
match = re.search(r'\d\s*\d\s*\d', 'xx1 2   3xx')  # found, match.group() == "1 2   3"
match = re.search(r'\d\s*\d\s*\d', 'xx12  3xx')  # found, match.group() == "12  3"
match = re.search(r'\d\s*\d\s*\d', 'xx123xx')  # found, match.group() == "123"

## ^ = matches the start of string, so this fails:
match = re.search(r'^b\w+', 'foobar')  # not found, match == None
## but without the ^ it succeeds:
match = re.search(r'b\w+', 'foobar')  # found, match.group() == "bar"

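About [] and ()
Here we want to extract an email address from the text: purple alice-ba@google.com monkey dishwasher
With [], the text before and after @ is matched against the set \w, ., and -. Inside the brackets, . and - stand for the literal characters (the - is literal because it comes last in the set).
To extract the username and the host separately, use () for group extraction. In Example 4, match.group(1) and match.group(2) return the two parts directly.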

## Example 1
import re

text = 'purple alice-ba@google.com monkey dishwasher'
match = re.search(r'\w+@\w+', text)  ## \w matches neither '-' nor '.'
if match:
    print(match.group())  ## 'ba@google'

## Example 2
match = re.search(r'[\w.-]@[\w.-]', text)  ## without `+`, each set matches only one character
if match:
    print(match.group())  ## 'a@g'

## Example 3
match = re.search(r'[\w.-]+@[\w.-]+', text)
if match:
    print(match.group())  ## 'alice-ba@google.com'

## Example 4
match = re.search(r'([\w.-]+)@([\w.-]+)', text)
if match:
    print(match.group())   ## 'alice-ba@google.com' (the whole match)
    print(match.group(1))  ## 'alice-ba' (the username, group 1)
    print(match.group(2))  ## 'google.com' (the host, group 2)
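
The group extraction idea can be wrapped in a small function. The following is a sketch only: `split_email` is a hypothetical helper name, and the pattern is the simple one from the examples, not a full email validator.

```python
import re

def split_email(text):
    """Return (username, host) for the first email-like token, or None.

    split_email is a hypothetical helper written for this illustration.
    """
    match = re.search(r'([\w.-]+)@([\w.-]+)', text)
    if match is None:
        return None
    # groups() returns all captured groups as a tuple: (group 1, group 2).
    return match.groups()

print(split_email('purple alice-ba@google.com monkey dishwasher'))
print(split_email('no address here'))
```

Returning `None` when nothing matches keeps the caller from hitting an `AttributeError` on a missing match object.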

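About (?...)
(?=...) is a lookahead assertion: it matches only if ... matches next, without consuming any of the string.
(?!...) is a negative lookahead assertion: it looks ahead in the same way, but matches only if ... does not match next.
(?<=...) is a positive lookbehind assertion (note the extra <): it matches only if the current position is preceded by a match for ....
(?<!...) is a negative lookbehind assertion: the same check with the logic negated, so it matches only if the current position is not preceded by a match for ....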

import re

m = re.search('abc(?=def)', 'abcef abcdef')
# Match: the second 'abc' is followed by 'def' (positive lookahead)

m = re.search('abc(?=def)', 'abcef abcfef')
# No match: no 'abc' is followed by 'def'

m = re.search('abc(?!def)', 'abcef')
# Match: 'abc' is not followed by 'def'

m = re.search('abc(?!def)', 'abcdef')
# No match: 'abc' is followed by 'def' (negative lookahead)

m = re.search('(?<=abc)def', 'abcdef')
# Match: 'def' matches and is preceded by 'abc'

m = re.search('(?<!abc)def', 'abcdef')
# No match: 'def' matches, but it is preceded by 'abc'
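
A point that is easy to miss is that lookaround assertions only test a condition; the asserted text is not part of the match. The sketch below illustrates this with made-up sample strings (the price text is an assumption for illustration, not from the referenced material).

```python
import re

# Lookahead: 'def' must follow, but only 'abc' is part of the match.
m = re.search(r'abc(?=def)', 'abcef abcdef')
lookahead_text = m.group()
lookahead_span = m.span()   # position of the second 'abc'

# Lookbehind: take digits only when a dollar sign comes right before them.
price = re.search(r'(?<=\$)\d+', 'qty 3, price $42').group()

# Negative lookbehind: the first run of digits not preceded by '$'.
quantity = re.search(r'(?<!\$)\d+', 'qty 3, price $42').group()

print(lookahead_text, lookahead_span, price, quantity)
```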

About (?P<name>...) and (?P=name)

import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")
m.group('first_name')
# 'Malcolm'
m.group('last_name')
# 'Reynolds'

# "(?P=name)" is a backreference to a named group:
pat = re.compile(r'(?P<K>a)\w(c)(?P=K)')
# (?P=K) refers to the text matched by group K (group 1), which is 'a'
pat.search('abcdef')
# None (no match): the pattern is effectively 'a\wca', so the 4th character must be 'a'
pat.search('abcadef').group()
# Match: the 4th character equals the text of group 1, which is 'a'
# 'abca'
pat.search('abcadef').groups()
# ('a', 'c')
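
Named groups can also be read all at once, and a named backreference is handy for matching paired delimiters. This is a minimal sketch; the group name `quote` and the sample strings are chosen for illustration.

```python
import re

m = re.match(r"(?P<first_name>\w+) (?P<last_name>\w+)", "Malcolm Reynolds")

# groupdict() returns every named group as a dictionary.
by_name = m.groupdict()

# Named groups are still numbered, so both lookups return the same text.
same = m.group(1) == m.group('first_name')

# (?P=quote) requires the closing quote to be the same character as the opening one.
pat = re.compile(r"(?P<quote>['\"]).*?(?P=quote)")
quoted = pat.search("say 'hi' now").group()
mismatched = pat.search("say 'hi\" now")

print(by_name, same, quoted, mismatched)
```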

The difference between search() and match()
search() scans the whole string for a match, while match() only matches at the beginning of the string.

import re
a = re.search(r'[1-5][0-9]', "653")
a.group()  # '53'
a = re.match(r'[1-5][0-9]', "653")
a  # None: '6' is not in [1-5]; calling a.group() here would raise AttributeError
re.match("c", "abcdef")    # No match
re.search("c", "abcdef")   # Match
re.search("^c", "abcdef")  # No match
re.search("^a", "abcdef")  # Match
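
The difference remains even in multiline mode: match() still tries only the very start of the string, while search() with ^ can match at the start of any line. The sketch below also shows one way to guard against a None result before calling .group(); the variable names are illustrative.

```python
import re

text = 'A\nB\nX'

# match() only tries position 0, even with re.MULTILINE.
at_start = re.match(r'X', text, re.MULTILINE)

# search() with ^ and re.MULTILINE can match at the start of any line.
at_line = re.search(r'^X', text, re.MULTILINE)

# Guard against None before calling .group().
m = re.match(r'[1-5][0-9]', '653')
result = m.group() if m else None

print(at_start, at_line.group(), result)
```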

re.sub()

import re

phone = "2004-959-559 # This is Phone Number"

# Delete Python-style comments
num = re.sub(r'#.*$', "", phone)
print("Phone Num : ", num)

# Remove anything other than digits
num = re.sub(r'\D', "", phone)
print("Phone Num : ", num)

Phone Num :  2004-959-559
Phone Num :  2004959559
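
re.sub() can do more than delete text. The sketch below reuses the same phone string and shows three variations: removing the comment together with the whitespace before it, limiting the number of replacements with count, and reusing captured groups in the replacement via \1, \2, \3. The output layout chosen here is arbitrary and only for illustration.

```python
import re

phone = "2004-959-559 # This is Phone Number"

# Strip the comment together with the whitespace in front of it.
cleaned = re.sub(r'\s*#.*$', '', phone)

# count=1 replaces only the first occurrence.
first_dash_only = re.sub(r'-', ' ', phone, count=1)

# \1, \2, \3 in the replacement refer to the captured groups.
reordered = re.sub(r'(\d{4})-(\d{3})-(\d{3})', r'(\1) \2 \3', phone)

print(cleaned)
print(first_dash_only)
print(reordered)
```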

re.match()

import re

line = "Cats are smarter than dogs"
matchObj = re.match(r'(.*) are (.*?) .*', line, re.M|re.I)

if matchObj:
    print("matchObj.group() : ", matchObj.group())
    print("matchObj.group(1) : ", matchObj.group(1))
    print("matchObj.group(2) : ", matchObj.group(2))
else:
    print("No match!!")

matchObj.group() :  Cats are smarter than dogs
matchObj.group(1) :  Cats
matchObj.group(2) :  smarter
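
Group 2 is 'smarter' only because (.*?) is lazy. The sketch below compares it with a greedy second group on the same line, and shows what re.I contributes; re.M only changes how ^ and $ behave, so it has no visible effect on this pattern. The variable names are illustrative.

```python
import re

line = "Cats are smarter than dogs"

# Lazy second group: stops at the first space that lets the rest match.
lazy = re.match(r'(.*) are (.*?) .*', line)

# Greedy second group: extends to the last space that lets the rest match.
greedy = re.match(r'(.*) are (.*) .*', line)

# re.I makes the match case-insensitive.
case_insensitive = re.match(r'cats', line, re.I)

print(lazy.group(2))
print(greedy.group(2))
print(case_insensitive.group())
```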

re.search()

import re

line = "Cats are smarter than dogs"

searchObj = re.search(r'(.*) are (.*?) .*', line, re.M|re.I)

if searchObj:
    print("searchObj.group() : ", searchObj.group())
    print("searchObj.group(1) : ", searchObj.group(1))
    print("searchObj.group(2) : ", searchObj.group(2))
else:
    print("Nothing found!!")

searchObj.group() :  Cats are smarter than dogs
searchObj.group(1) :  Cats
searchObj.group(2) :  smarter

re.split()

re.split(r'[_,.]', 'aa_100.txt')  # ['aa', '100', 'txt']
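
Two options of re.split() are worth knowing: maxsplit limits how many splits are made, and a capturing group in the pattern keeps the delimiters in the result. A minimal sketch on the same file name, with illustrative variable names:

```python
import re

name = 'aa_100.txt'

# maxsplit=1 splits only at the first delimiter.
limited = re.split(r'[_,.]', name, maxsplit=1)

# Wrapping the set in (...) keeps each delimiter in the result list.
with_delims = re.split(r'([_,.])', name)

print(limited)
print(with_delims)
```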

To be completed:
findall(); sub(); fullmatch();
boundaries
