python正则表达式笔记

最新推荐文章于 2021-01-29 17:17:21 发布
a599174211
最新推荐文章于 2021-01-29 17:17:21 发布
阅读量218
点赞数
分类专栏： Python
本文链接：https://blog.csdn.net/a599174211/article/details/82287983
版权
Python 专栏收录该内容
30 篇文章 2 订阅
订阅专栏

 # 常见正则表达式和特殊符号
     
   ###  表示法                                    描述                            正则表达式示例
   
   ###  literal                         匹配文本字符串的字面值literal               fool
   ### re1|re2                        匹配正则表达式re1或者re2                     foot|foot2
   ###   .                                  匹配任何字符串(除了\n之外)                   b.b
   ###   ^                                  匹配字符串起始位置                             ^Dear
   ###   $                             匹配字符串的结束位置                      dear$
   ###   *                                匹配0次或多次前面出现的正则表达式       [ab]*
   ###  +                                 匹配1次或多次前面出现的正则表达式      [ab]+
   ###  ?                                 匹配0次或1次前面出现的的正则表达式    [ab]?
   ###  {n}                              匹配n次前面出现的正则表达式                 {3}
   ###  {n,m}                          匹配n-m次前面出现的正则表达式             {3,5}
   ###  [...]                              匹配来自字符集的任意单一字符                [abc]
   ###  [x-y]                            匹配x~y范围中的任意单一字符                 [0-9],[A-Za-z]
   ###  [^...]                            不匹配此字符集中出现的任何一个字符     [^a-z],[^aeiou]
                                   包括某一范围的字符(如果在此字符集中出现)
   ###  (*|+|?|{})?                   用于匹配上面频繁出现的/重复出现的符号的  .*?[a-z]
                                   非贪婪版本(*、+、？、{})
   ###  (....)                              匹配封闭的正则表达式,然后另存为子组     ([0-9]{3})?,f(oo|u)bar
   
   ##   特殊字符
   
   ###  \d                                 匹配任何十进制的数字,与[0-9]一致,            data\d+.txt
                                   (\D与\d相反)不匹配任何非数值型的数字
   ###  \w                                 匹配任何字母数字字符,与[A-Za-z0-9]相同  [A-Za-z0-9]\w+
                                   (\W与\w相反)
   ###  \s                                匹配任何空格字符,与[\n \t \r \v \f]相同,与(\S)相反   of\she
   ###  \b                           匹配任何单词边界(\B与之相反)                          \bThe\b
   ###  \N                           匹配以保存的子组N                                             price:\16
   ###  \c                           逐子匹配任何特殊字符c(即按照字面意义匹配    \.,\*,\\
                               不匹配特殊含义)
   ###  \A(\Z)                     匹配字符串的起始(结束)                                       \ADear
   
   ##  拓展表示法
   ### (?iLmsux)           在正则表达式中嵌入一个或者多个特殊'标记'参数   (?x),(? im)
                            (或者通过函数/方法)
   ### (?:...)                    表示一个匹配不用保存的分组                                  (?:\w+\.)*
   ###  (?P<name>...)     由一个仅有name表示而不是数字ID标识的正则分配匹配           (?P<data>)
   
   ### (?P=name)          在同一个字符串中匹配由(?P<name)分组的之前文本 (?P=name)
   ### (?#...)                  表示注释所有内容都被忽略                                        (?#comment)
   ### (?=...)                 匹配条件是如果...出现之后的位置,而不是使用输入字符串   (?=.com)
                            称之为正向断言 
   ### (?!...)                   匹配条件是如果...不出现在之后的位置,二部使用输入          (?!.net)
                            字符串:称之为负向前视断言
   ### (?<=...)              匹配条件是如果...出现在之前的位置,而不是使用输入字符串  (?<=800-)
                           称之为正向后视断言 
   ### (?<!...)               匹配条件是如果...不出现在之前的位置,而不使用输入字符串  (?!<192\.168\.)
                           称之为负向后视断言
   ### (?(id/name)Y/N)   如果分组所提供的id或者name(名称)存在,就返回正则表达式的  (?(1)Y|X)
                             条件匹配Y,如果不存在,就返回N;|N为可选项

# 匹配对象以及group()贺groups()方法,使用match()方法匹配字符串



```python
#使用match()方法匹配字符串
import re
m = re.match('wangzhen', 'wangzhen')
if m is not None: print(m.group())
#特点从字符串开始到结束匹配(从左到右)
m = re.match('food', 'food is good')
if m is not None :
   print( m.group())
else :
    print('匹配失败')
m = re.match('food', 'check is good food')
if m is not None :
    print(m.group())
else :
    print('匹配失败')

#匹配多个字符
bt = 'bd|ba|bc'
m = re.match(bt, 'ba')
print(m.group())
```

    wangzhen
    food
    匹配失败
    ba
    

# 使用search()在一个字符串中查找模式(搜索与匹配)


```python
#search()无需向match一样从左到右匹配,可以从中间搜索
m = re.search('food', 'check is good food')
if m is not None: print('搜索成功{}'.format(m.group()))

#匹配多个字符串
bt = 'ba|bb|bc'
m = re.search(bt, 'is good ba check')
if m is not None : print('搜索成功{}'.format(m.group()))
    
#.除了换行符之外的所有字符
rd = '.end'
m = re.search(rd, '\nend')
if m is not None : 
    print('搜索成功{}'.format(m.group()))
else :
    print('搜索失败')

```

    搜索成功food
    搜索成功ba
    搜索失败
    

# 创建字符集([])


```python
m = re.match('[ab][cd][12][3-5]', 'ad24')
print(m.group())
```

    ad24
    

# 重复/特许字符以及分组


```python
#email正则表达式
emai1 = '\w+@(\w+\.)?\w+\.com'
emai2 = '\w+@(\w+\.)*\w+\.com'
print(re.match(emai1, 'wangzhen@163.com').group())
print(re.match(emai2, 'wangzhen@xxx.163.com').group())
```

    wangzhen@163.com
    wangzhen@xxx.163.com
    


```python
#连接符-
t = '\d\d\d-\w\w\w'
print(re.match(t,'123-wan').group())
#()表示存为一个子组
t = '(\d\d\d)-(\w\w\w)'
m = re.match(t, '123-wan')
print('m.group(1){}'.format(m.group(1)))
print('m.group(2){}'.format(m.group(2)))
print('m.groups(){}'.format(m.groups()))
groupss = '(a(b))'
m = re.match(groupss, 'ab')
print('m.group(1):{}'.format(m.group(1)))
print('m.group(2):{}'.format(m.group(2)))
```

    123-wan
    m.group(1)123
    m.group(2)wan
    m.groups()('123', 'wan')
    m.group(1):ab
    m.group(2):b
    

# 匹配字符串的起始贺结尾以及单词边界


```python
#起始^
m = re.match('^the', 'the hello')
if m is not None : print('m.group:{}'.format(m.group()))
#结尾$
m = re.search('the$', 'the')
if m is not None: print('m.group$:{}'.format(m.group()))
#边界
m = re.search('\bthe','hellothe world')#有边界 搜索失败
if m is not None : print('m.group{}'.format(m.group()))
m = re.search('\Bthe','helloothe world')#无边界 搜索成功
if m is not None: print('\Bm.group(){}'.format(m.group()))
```

    m.group:the
    m.group$:the
    \Bm.group()the
    

# 使用findall()和finditer()查找每一次出现的位置
#### -findal()总是返回一个列表,如果findal()没有找到匹配的部分,就返回一个空列表
#### -匹配成功,列表将包含所有成功匹配的部分(从左到右按顺序排列)


```python
#匹配一个
print(re.findall('car', 'car'))
```

    ['car']
    


```python
#匹配多个
print(re.findall('[ab]', 'abcaaa'))
```

    ['a', 'b', 'a', 'a', 'a']
    


```python
#finditer()函数必findall()函数类似,但是更节省内存的变体
#finditer()返回的是一个迭代器

s = 'This and that'
b = re.findall(r'(th\w+) and (th\w+)', s, re.I)

a= [i.groups() for i in re.finditer(r'(th\w+) and (th\w+)', s, re.I)]
print(a)

```

    [('This', 'that')]
    


```python
lista = [1,2,2,2,2,4,4,4,4,4,4,5,5,5,5,5,6,6,7,7,7,8,8,9]
listb = []
for i in lista:
    listb.append(lista.count(i))
print(listb)
print(lista)
dicta = dict(zip(listb, lista))
print('dicta{}'.format(dicta))
listc = []
for i in dicta :
    liste = []
    liste.append(i)
    liste.append(dicta[i])
    listc.append(liste)
print(listc)
listc.sort(reverse=True)
print('listc:{}'.format(listc))
for i in listc :
    print('{}出现了{}次'.format(i[1],i[0]))
#

# set1 = set(listc)
# print('set1{}'.format(set1))
# listd = list(set1)
# # listd = [i for i in set1]
# # listd.sort(reverse=True)
# listd.sort(reverse=True)
# print(listd)
# print(listd[0:3])
# help(list.sort)
```

    [1, 4, 4, 4, 4, 6, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 2, 2, 3, 3, 3, 2, 2, 1]
    [1, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 6, 6, 7, 7, 7, 8, 8, 9]
    dicta{1: 9, 4: 2, 6: 4, 5: 5, 2: 8, 3: 7}
    [[1, 9], [4, 2], [6, 4], [5, 5], [2, 8], [3, 7]]
    listc:[[6, 4], [5, 5], [4, 2], [3, 7], [2, 8], [1, 9]]
    4出现了6次
    5出现了5次
    2出现了4次
    7出现了3次
    8出现了2次
    9出现了1次
    

# 使用sub()和 subn()搜索与替换
   #### --这两个函数用于实现搜索和替换功能:两者几乎一样,都是将某字符串中所有匹配正则表达式的部分进行某种形式的替换
   #### -- 但是subn()还返回一个表示替换的总数,替换后的字符串和表示替换总数的数字一起作为一个拥有两个元素的元组返回


```python
s = 'hello'
sub1 = re.sub(r'x', s, 'x wangzhen x wang')
print(sub1)
sub2 = re.subn(r'x', s, 'x wangzhen x wang')
print(sub2)
```

    hello wangzhen hello wang
    ('hello wangzhen hello wang', 2)
    

# 在限定的模式上使用split()分隔字符串


```python
import re 
data = (
        'mountain view, CA 94040',
        'Sunnyvale, CA',
        'Los Altos, 94023',
        'Cupertino 95014',
        'Palo Alto CA',)
for i in data :
    print(re.split(r',|(?=(?:\d{5} | [A-Z]{2}))', i))   
```

    ['mountain view', ' CA 94040']
    ['Sunnyvale', ' CA']
    ['Los Altos', ' 94023']
    ['Cupertino 95014']
    ['Palo Alto CA']
    

    E:\Anaconda3\lib\re.py:212: FutureWarning: split() requires a non-empty pattern match.
      return _compile(pattern, flags).split(string, maxsplit)