The re module: regular expressions in Python

  • What does re stand for? Regular Expression.
  • The match function: re.match(pattern, string, flags=0) tries to match the pattern starting at the beginning of the string
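    • pattern: the regular expression to match
      • Expression      Description
        ordinary chars  match themselves exactly
        []              matches any one of the characters listed inside the brackets
        .               matches any single character except the newline '\n'
        \w              matches a word character: letter, digit, or underscore
        \W              matches any non-word character
        \b              matches the boundary between a word character and a non-word character
        \s              matches a single whitespace character: space, newline, carriage return, tab
        \S              matches any non-whitespace character
        \t, \n, \r      tab, newline, carriage return
        \d              matches a decimal digit, [0-9]
        \D              matches any non-digit
        ^               matches the start of the string
        $               matches the end of the string
        \               escape character
        Matching repeated characters:
        *               the preceding element occurs zero or more times
        +               the preceding element occurs at least once
        ?               the preceding element occurs once or not at all
        {m}             the preceding element occurs exactly m times
        {m,n}           the preceding element occurs m to n times; {,n} means 0 to n times; {m,} means m or more times
        ()              groups a sub-pattern and captures its match, which group() can then retrieve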
      • A few examples:
      • import re

        # Raw strings (r"...") keep backslashes such as \S and \w intact.
        pattern1 = r"senior"
        pattern2 = r"sen.\S\w"
        pattern3 = r"sen[.abcd123]\S\w"

        string1 = "senior sister is really pretty!"
        string2 = "senior sister is really pretty!"
        string3 = "senior sister is really pretty!"

        result1 = re.match(pattern=pattern1, string=string1)
        result2 = re.match(pattern=pattern2, string=string2)
        # "i" is not in [.abcd123], so pattern3 fails and match returns None.
        result3 = re.match(pattern=pattern3, string=string3)
        print("result1:", result1)
        print("result2:", result2)
        print("result3:", result3)

        result1: <re.Match object; span=(0, 6), match='senior'>
        result2: <re.Match object; span=(0, 6), match='senior'>
        result3: None
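The examples above do not exercise the quantifiers from the table, so here is a minimal sketch of ?, + and {m,n}. The patterns, sample words and variable names are made up for illustration; each pattern ends with $ so that the whole string has to fit, not just a prefix.

```python
import re

optional_u = r"colou?r$"       # ? : the "u" may appear once or not at all
some_digits = r"\d+$"          # + : at least one digit
short_word = r"[a-z]{3,5}$"    # {m,n} : between 3 and 5 lowercase letters

print(re.match(optional_u, "color") is not None)    # True
print(re.match(optional_u, "colour") is not None)   # True
print(re.match(optional_u, "colouur") is not None)  # False, two "u"s
print(re.match(some_digits, "2021") is not None)    # True
print(re.match(some_digits, "") is not None)        # False, + needs one digit
print(re.match(short_word, "re") is not None)       # False, only 2 letters
print(re.match(short_word, "regex") is not None)    # True
```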
        
      • A more complete example
      • import re

        # Five capturing groups: a literal "i" followed by four words.
        pattern1 = r"(i)\s(\w*)\s(\w*)\s(\w*)\s(\w*)"

        string1 = "I miss my best friend cyx who now in Beijing engineering!"
        string2 = "I find that she resonate with me less and less times than before."
        string3 = "maybe she is not as powerful and passionate as before."

        # re.I lets the lowercase "i" in the pattern match the capital "I".
        flags1 = re.I

        result1 = re.match(pattern=pattern1, string=string1, flags=flags1)
        result2 = re.match(pattern=pattern1, string=string2, flags=flags1)
        result3 = re.match(pattern=pattern1, string=string3, flags=flags1)

        print("result1:", result1)
        print("result2:", result2)
        print("result3:", result3)

        result1: <re.Match object; span=(0, 21), match='I miss my best friend'>
        result2: <re.Match object; span=(0, 24), match='I find that she resonate'>
        result3: None
        
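    • string: the string to be matched
    • flags: optional modifiers
      • Flag        Meaning
        flags=re.I  ignore case
        flags=re.L  make \w, \W, \b, \B and case-insensitive matching depend on the current locale (bytes patterns only)
        flags=re.M  multi-line mode: ^ and $ also match at the start and end of each line
        flags=re.S  . matches any character, including the newline
        flags=re.U  Unicode matching; already the default for str patterns in Python 3
        flags=re.X  verbose mode: ignore whitespace and comments after # in the pattern
      • import re

        pattern1 = r"Senior"
        string1 = "senior sister is really clever!"
        flags1 = re.I
        result1 = re.match(pattern=pattern1, string=string1, flags=flags1)
        print("result1:", result1)

        result1: <re.Match object; span=(0, 6), match='senior'>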
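The example above only covers re.I. As a rough sketch of the other commonly used flags in the table (re.S, re.M, re.X) and of combining flags with |, something like the following should work. The two-line sample text and the variable names are assumptions made up for illustration.

```python
import re

text = "first line\nsecond line"

# Without re.S the dot stops at the newline; with re.S it crosses it.
no_dotall = re.match(r".*", text)
with_dotall = re.match(r".*", text, flags=re.S)

# re.M lets ^ match at the start of every line, not only at the string start.
second = re.search(r"^second", text, flags=re.M)

# Flags are combined with the bitwise OR operator.
combined = re.search(r"^SECOND", text, flags=re.I | re.M)

# re.X ignores whitespace and # comments inside the pattern itself.
verbose = re.match(r"""
    (\w+)   # first word
    \s      # one whitespace character
    (\w+)   # second word
    """, text, flags=re.X)

print(no_dotall.group())     # first line
print(second.span())         # (11, 17)
print(combined.group())      # second
print(verbose.groups())      # ('first', 'line')
```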
    • Working with the return value
      • Continuing from the more complete example above:
      • >>>result1.group()
        'I miss my best friend'
        
        >>>result1.group(1)
        'I'
        
        >>>result1.group(2)
        'miss'
        
        >>>result1.group(3)
        'my'
        
        >>>result1.group(4)
        'best'
        
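      • If we drop the parentheses around the second word, it is still matched but no longer captured, so the later group numbers shift
      • pattern1 = r"(i)\s\w*\s(\w*)\s(\w*)\s(\w*)"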
        ...
        ...
        ...
        >>> result1.group()
        'I miss my best friend'
        >>> result1.group(1)
        'I'
        
        >>> result1.group(2)
        'my'
        >>> result1.group(3)
        'best'
        >>> result1.groups()
        ('I', 'my', 'best', 'friend')
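One thing the session above does not show: re.match returns None when nothing matches, and calling group() on None raises AttributeError. A minimal sketch of guarding against that follows; the helper name first_two_words and its pattern are hypothetical, chosen only for illustration.

```python
import re

def first_two_words(sentence):
    """Return the first two words as a tuple, or None if there is no match."""
    match = re.match(r"(\w+)\s(\w+)", sentence)
    if match is None:          # re.match returns None on failure
        return None
    return match.groups()

print(first_two_words("I miss my best friend"))    # ('I', 'miss')
print(first_two_words("...no word at the start"))  # None

match = re.match(r"(\w+)\s(\w+)", "I miss my best friend")
print(match.group(0))      # I miss (group(0) is the same as group())
print(match.group(1, 2))   # ('I', 'miss')
print(match.span(2))       # (2, 6), where group 2 starts and ends
```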
        

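      • Alternation and grouping in match

        • Syntax         Function
          |              matches either the expression on its left or the one on its right
          (ab)           treats the characters in the parentheses as one group
          \num           refers back to the text matched by group number num
          (?P<name>...)  gives the group a name
          (?P=name)      refers back to the text matched by the group named name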
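A small sketch of the three entries that are easiest to get wrong: |, \num and (?P=name). The sample strings and variable names here are made up for illustration and are not part of the original notes.

```python
import re

# | : either alternative may match.
either = re.match(r"cat|dog", "dog food")

# \1 : the same text that group 1 matched has to appear again.
doubled = re.match(r"(\w+)\s\1", "bye bye love")

# (?P<tag>...) names a group; (?P=tag) requires the same text again.
tag_pattern = r"<(?P<tag>\w+)>.*</(?P=tag)>"
tagged = re.match(tag_pattern, "<b>bold</b>")
mismatched = re.match(tag_pattern, "<b>bold</i>")

print(either.group())        # dog
print(doubled.group())       # bye bye
print(tagged.group("tag"))   # b
print(mismatched)            # None, the closing tag differs
```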
        • Example
        • import re

          # pattern1 names its second group; pattern2 names its first group.
          pattern1 = r"(.*)\s(?P<name>\w*)\s(\w*).(\w*)\s(\w*)"
          pattern2 = r"(?P<name>.*)\s(\w*)\s(\w*)\s(\w*)\s(\w*)"
          pattern3 = r"(Even)\s(\w*)\s(\w*)\s(\w*)\s(\w*)"

          string1 = "In the past,we overcome many hard times."
          string2 = "At times,I get a lot of courage from this closed friendship."
          string3 = "Even she admitted I ranked high in her few friends."

          flags1 = re.I

          result1 = re.match(pattern=pattern1, string=string1, flags=flags1)
          result2 = re.match(pattern=pattern2, string=string2, flags=flags1)
          result3 = re.match(pattern=pattern3, string=string3, flags=flags1)
          print("result1:", result1)
          print("result2:", result2)
          print("result3:", result3)

          result1: <re.Match object; span=(0, 39), match='In the past,we overcome many hard times'>
          result2: <re.Match object; span=(0, 59), match='At times,I get a lot of courage from this closed >
          result3: <re.Match object; span=(0, 26), match='Even she admitted I ranked'>
          >>> result1.group("name")
          'many'
          >>> result1.group(2)
          'many'
          
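        • Look closely and you will see that the named group in result1 is genuinely useful: group("name") returns 'many' without having to count parentheses
        • Look closely and you will also see why the first group swallows so much of the sentence: quantifiers such as * are greedy by default, so .* takes as many characters as it can and gives back only what the rest of the pattern needs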
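To make the greedy behaviour concrete, here is a minimal sketch that compares .* with its non-greedy form .*? on string1 from the example above, and then prints all groups of the original pattern1. The shortened two-group patterns are assumptions introduced only for this comparison.

```python
import re

text = "In the past,we overcome many hard times."

# Greedy: .* takes as much as it can and backs off to the LAST usable space.
greedy = re.match(r"(.*)\s(\w*)", text)
# Non-greedy: .*? takes as little as it can and stops at the FIRST space.
lazy = re.match(r"(.*?)\s(\w*)", text)

print(greedy.groups())  # ('In the past,we overcome many hard', 'times')
print(lazy.groups())    # ('In', 'the')

# The full pattern1 from the example: the bare "." consumes the "d" of "hard".
original = re.match(r"(.*)\s(?P<name>\w*)\s(\w*).(\w*)\s(\w*)", text, flags=re.I)
print(original.groups())
# ('In the past,we overcome', 'many', 'har', '', 'times')
```

Note how the third group comes out as 'har': the unescaped . in pattern1 matches any character, not a literal period.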

  • The re.search function
    • re.search scans the whole string and returns the first successful match, or None if there is none
    • Example
    • import re

      pattern1 = r"\d+"
      pattern2 = r"\w*"
      pattern3 = r"\w{4}"

      string1 = "From 2021,we become closed friends."
      string2 = "In 2022.10.1,I write this artical."
      string3 = "I have to face future and do more works in Physics."

      flags1 = re.I

      result1 = re.search(pattern=pattern1, string=string1, flags=flags1)
      result2 = re.search(pattern=pattern2, string=string2, flags=flags1)
      result3 = re.search(pattern=pattern3, string=string3, flags=flags1)

      print("result1:", result1)
      print("result2:", result2)
      print("result3:", result3)

      result1: <re.Match object; span=(5, 9), match='2021'>
      result2: <re.Match object; span=(0, 2), match='In'>
      result3: <re.Match object; span=(2, 6), match='have'>
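The practical difference between match and search is easiest to see on the same input. A minimal sketch, reusing string1 from the example above (the variable names are made up for illustration):

```python
import re

text = "From 2021,we become closed friends."

at_start = re.match(r"\d+", text)     # None: the string starts with "From"
anywhere = re.search(r"\d+", text)    # finds "2021" further along
anchored = re.search(r"^\d+", text)   # None: ^ pins search to the start

print(at_start)          # None
print(anywhere.group())  # 2021
print(anchored)          # None
```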
      

  • The re.findall function
    • Returns a list of all non-overlapping substrings that the pattern matches
    • Example
    • import re

      pattern1 = r"\d+"
      pattern2 = r"\w*"
      pattern3 = r"\w{4}"

      string1 = "From 2021,we become closed friends."
      string2 = "In 2022.10.1,I write this artical."
      string3 = "I have to face future and do more works in Physics."

      flags1 = re.I

      result1 = re.findall(pattern=pattern1, string=string1, flags=flags1)
      # \w* can match the empty string, so result2 contains '' entries.
      result2 = re.findall(pattern=pattern2, string=string2, flags=flags1)
      result3 = re.findall(pattern=pattern3, string=string3, flags=flags1)

      print("result1:", result1)
      print("result2:", result2)
      print("result3:", result3)

      result1: ['2021']
      result2: ['In', '', '2022', '', '10', '', '1', '', 'I', '', 'write', '', 'this', '', 'artical', '', '']
      result3: ['have', 'face', 'futu', 'more', 'work', 'Phys']
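Two details of findall are worth a short sketch: using + instead of * avoids the empty strings seen in result2, and a pattern with capturing groups makes findall return the groups rather than the whole match. The patterns below are assumptions chosen for illustration, applied to string2 from the example.

```python
import re

text = "In 2022.10.1,I write this artical."

# \w+ needs at least one character, so no empty strings are returned.
words = re.findall(r"\w+", text)

# One group: findall returns a list of that group's text.
before_dot = re.findall(r"(\d+)\.", text)

# Several groups: findall returns a list of tuples, one tuple per match.
date_parts = re.findall(r"(\d+)\.(\d+)\.(\d+)", text)

print(words)       # ['In', '2022', '10', '1', 'I', 'write', 'this', 'artical']
print(before_dot)  # ['2022', '10']
print(date_parts)  # [('2022', '10', '1')]
```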

  • The re.finditer function
    • Like findall, but returns an iterator that yields match objects instead of a list of strings
    • import re

      pattern1 = r"\d+"
      pattern2 = r"\w*"
      pattern3 = r"\w{4}"

      string1 = "From 2021,we become closed friends."
      string2 = "In 2022.10.1,I write this artical."
      string3 = "I have to face future and do more works in Physics."

      flags1 = re.I

      result1 = re.finditer(pattern=pattern1, string=string1, flags=flags1)
      result2 = re.finditer(pattern=pattern2, string=string2, flags=flags1)
      result3 = re.finditer(pattern=pattern3, string=string3, flags=flags1)

      # Printing an iterator shows only the object; call next() to get matches.
      print("result1:", result1)
      print("result2:", result2)
      print("result3:", result3)

      result1: <callable_iterator object at 0x0000019B7873CAC0>
      result2: <callable_iterator object at 0x0000019B7873C610>
      result3: <callable_iterator object at 0x0000019B787D26B0>
      
      >>> next(result2)
      <re.Match object; span=(0, 2), match='In'>
      >>> next(result2)
      <re.Match object; span=(2, 2), match=''>
      >>> next(result2)
      <re.Match object; span=(3, 7), match='2022'>
      >>> next(result2)
      <re.Match object; span=(7, 7), match=''>
      >>> next(result2)
      <re.Match object; span=(8, 10), match='10'>
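Calling next() by hand works, but in practice the iterator is usually consumed with a for loop. A minimal sketch on string2 from the example, using \d+ so that no empty matches show up (the found list is a hypothetical helper for illustration):

```python
import re

text = "In 2022.10.1,I write this artical."

found = []
for match in re.finditer(r"\d+", text):
    # Each item is a full match object, so span() and group() are available.
    found.append((match.span(), match.group()))
    print(match.span(), match.group())

# An iterator can be consumed only once; afterwards it is exhausted.
iterator = re.finditer(r"\d+", text)
first_pass = [m.group() for m in iterator]
second_pass = [m.group() for m in iterator]
print(first_pass)   # ['2022', '10', '1']
print(second_pass)  # []
```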
      

  • The re.compile function
    • Compiles a pattern into a regular expression object whose methods include match and search (as well as findall, finditer, sub and split)
    • Example
    • import re

      pattern1 = r"(.*)\s(?P<name>\w*)\s(\w*).(\w*)\s(\w*)"
      pattern2 = r"(?P<name>.*)\s(\w*)\s(\w*)\s(\w*)\s(\w*)"
      pattern3 = r"(.*)\s(\w*)\s(\w*)\s(\w*)\s(\w*)"

      string1 = "I know that maybe today or tomorrow I will say goodbye and go and for always.All is over and long gone."
      string2 = "We seperate and no longer reunit.I leave a message here that I did really want do everything for her."
      string3 = "And this time,what I should do is to forsake her and never disturb her."

      flags1 = re.I

      # Flags are given once at compile time, not on each match call.
      prog1 = re.compile(pattern1, flags=flags1)
      prog2 = re.compile(pattern2, flags=flags1)
      prog3 = re.compile(pattern3, flags=flags1)

      result1 = prog1.match(string=string1)
      result2 = prog2.match(string=string2)
      result3 = prog3.match(string=string3)

      print("result1:", result1)
      print("result2:", result2)
      print("result3:", result3)

      result1: <re.Match object; span=(0, 102), match='I know that maybe today or tomorrow I will say go>
      result2: <re.Match object; span=(0, 100), match='We seperate and no longer reunit.I leave a messag>
      result3: <re.Match object; span=(0, 70), match='And this time,what I should do is to forsake her >
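The main reason to compile is reuse: one pattern object can be applied to many strings and offers the same operations as the module-level functions. A minimal sketch, assuming a hypothetical four-digit year pattern and a made-up list of sentences:

```python
import re

year = re.compile(r"\d{4}")   # compiled once, reused below

sentences = [
    "From 2021,we become closed friends.",
    "In 2022.10.1,I write this artical.",
    "No year here.",
]

years = []
for sentence in sentences:
    match = year.search(sentence)          # same as re.search(pattern, ...)
    years.append(match.group() if match else None)

print(years)                                # ['2021', '2022', None]
print(year.pattern)                         # \d{4}
print(year.findall("2021 and 2022"))        # ['2021', '2022']
print(year.sub("YYYY", sentences[0]))       # From YYYY,we become closed friends.
```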
    

  • The re.sub function
    • sub -> substitute
    • repl -> the replacement
    • re.sub(pattern, repl, string, count=0, flags=0)
  • The re.subn function
    • Works like re.sub but returns a tuple (new string, number of substitutions)
  • The re.split function
    • Splits the string wherever the pattern matches
    • re.split(pattern, string, maxsplit=0, flags=0)
  • Example
    • import re

      pattern1 = r"\d+"
      pattern2 = r"\w{4}"
      pattern3 = r"\w{4}"

      string1 = "Ok.I do Physics from 2021 9,but like computer science.Even Feynman love striptease."
      string2 = "I hope I can do something and be someone in the future."
      string3 = "However,this may be impossible.I will tell you next time."

      flags1 = re.I

      # sub replaces every run of digits; subn also reports how many it replaced.
      result1 = re.sub(pattern=pattern1, repl="being admitted into SCU", string=string1, flags=flags1)
      result2 = re.subn(pattern=pattern2, repl="GAUSS", string=string2, flags=flags1)
      # maxsplit=3 stops after three splits; the rest of the string stays whole.
      result3 = re.split(pattern=pattern3, string=string3, maxsplit=3, flags=flags1)

      print("result1:", result1)
      print("result2:", result2)
      print("result3:", result3)

      result1: Ok.I do Physics from being admitted into SCU being admitted into SCU,but like computer science.Even Feynman love striptease.
      result2: ('I GAUSS I can do GAUSSGAUSSg and be GAUSSone in the GAUSSre.', 5)
      result3: ['', 'ver,', ' may be ', 'ssible.I will tell you next time.']
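The example above uses a fixed replacement string. As a rough sketch of the other options: repl may contain backreferences such as \1, repl may be a function that receives the match object, and count limits the number of replacements. All sample strings and variable names here are made up for illustration.

```python
import re

# Backreferences in repl reorder the captured groups.
swapped = re.sub(r"(\d+)\.(\d+)\.(\d+)", r"\3/\2/\1", "2022.10.1")

# count=1 replaces only the first match.
only_first = re.sub(r"\d+", "#", "a1b22c333", count=1)

# A function as repl is called once per match and returns the replacement.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "a1b22c333")

# split on any run of dots, commas or whitespace.
parts = re.split(r"[.,\s]+", "In 2022.10.1,I write")

print(swapped)     # 1/10/2022
print(only_first)  # a#b22c333
print(doubled)     # a2b44c666
print(parts)       # ['In', '2022', '10', '1', 'I', 'write']
```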
      
