python攻城狮007-- 正则表达式

最新推荐文章于 2021-08-21 19:16:43 发布

MINUS大大

最新推荐文章于 2021-08-21 19:16:43 发布

阅读量285

点赞数

分类专栏： # Python

本文链接：https://blog.csdn.net/weixin_45625553/article/details/103342059

版权

Python 专栏收录该内容

32 篇文章 0 订阅

订阅专栏

之前五篇
001-函数技巧\变量\包\标准库
 002-高阶函数技巧-lambda\filter\map\reduce 函数
 003–文件の读写
 004–面向对象
 005–装饰器
 006–迭代器

文章目录

@[toc]

正则表达式内容
什么是正则表达式?
正则表达式中的符号:
一些在线正则表达式网站:
使用正则表达式
基本的匹配的
匹配同类型的
边界匹配
指定匹配
正则表达式分组
贪婪模式 vs 非贪婪模式
实战

在python中用正则表达式
学习目标
用`compile`编译
用`findall`方法查找
用`search`方法
用`group` 与 `groups`
用`split`切割
用`sub`替换

正则表达式内容

应用:爬虫用\ 输入框用(正则表达式验证) \ 邮件输入验证

入门及应用
正则表达式进阶
案例(搜东西,正则表达式找出,保存)
综合项目实战

什么是正则表达式?

正则表达式 regex 一些由字符和特殊符号组成的字符串
能按照某种模式匹配一系列有相似特征的字符串

正则表达式中的符号:

一些在线正则表达式网站:

使用正则表达式

多重写一起时候,不要加空格(如果是需要空格的除外)

基本的匹配的

简单匹配: abc => abc
多个匹配: abc | 12c ==> abc , 12c
匹配任意字符: . ==>abc , 12c
使用* 匹配0次或者多次(指的是*之前的内容) 如字符串aabc ,然后a*bc则可以匹配出aabc.
使用+匹配1次或者多次
使用?匹配0次或者1次
使用{N}匹配指定出现n次的
使用{M,N}匹配指定出现m到n次的,最大化优先

匹配同类型的

\d匹配数字的
\w匹配数字或字母
\s匹配任何空格字符串(包括tab,空格,换行)

边界匹配

^用于匹配以什么开头的,如^he.
$$用于匹配结尾
\的转义匹配.如点.

指定匹配

[]指定要匹配集合如[abc]{2}
[^]指定不要匹配内容,如[^abc]{2}

正则表达式分组

需要理解一层一层的嵌套组合起来
(\d{1,3}.){3}\d{1,3} 匹配ip地址
使用\1 \2反向引用,如下例子:

he loves her lover.
he likes her liker.

he (l..e)s her \1r    可以匹配两句话
he (?<name>l..e)s her \k<name>r    也可以匹配两句话

贪婪模式 vs 非贪婪模式

贪婪匹配—在整个表达式匹配成功前提下,尽可能多
非贪婪匹配—在整个表达式匹配成功前提下,以最少的匹配字符
默认是贪婪的
贪婪:表达式 ab.+c
非贪婪:表达式ab.*?c

实战

身份证号码
电话号码
邮箱

在python中用正则表达式

学习目标

python模块

import re

search()的使用
findall()的使用
group()与groups()及groupdict()的使用
split()正则分割
Sub()正则替换
掌握compile()和match()函数的使用

用`compile`编译

import re
# 通过compile,将正则表达式编译
# re.I  忽略大小写
pattern = re.compile(r'hello',re.I)
# print(dir(pattern))
# 通过match进行匹配
rest = pattern.match('Hello,world')
print(rest)   #<re.Match object; span=(0, 5), match='Hello'>
print(dir(rest))
print('string:',rest.string)
print('re:',rest.re)
print('group:',rest.group)

用`findall`方法查找

findall(pattern , string [,flage])
查找字符串中所有(非重复)出现的正则表达式模式,并返回一个匹配列表
下例使用了编译和不编译直接匹配两种方式

import re

content='one1Two12three33four444five5six698'
# 使用编译对象
p=re.compile(r'\d+')
p1=re.compile(r'[a-z]+',re.I)
rest=p.findall(content)
rest1=p1.findall(content)
print(rest)
# >>>['1', '12', '33', '444', '5', '698']
print(rest1)
# >>>['one', 'two', 'three', 'four', 'five', 'six']

# 不编译
re.findall(r'[a-z]+',content,re.I)
# >>>['one', 'two', 'three', 'four', 'five', 'six']

用`search`方法

search(pattern,string ,flags=0)
使用可选标记搜索字符串中第一次出现的正则表达式模式.如果匹配成功,则返回匹配对象 ; 如果失败,则返回None
代码方式理解如下

import re 
content ='hello, world'
# 通过编译对象再匹配
p=re.compile(r'world')
rest = p.search(content)
print(rest)
# >>><re.Match object; span=(7, 12), match='world'>

# 也可以直接用
rest_func = re.search(r'world',content,re.I)
print(rest_func)
# >>><re.Match object; span=(7, 12), match='world'>

# 使用match
rest_match=p.match(content)
print(rest_match)
# >>>None

# match是从头找 和 search全部依次找 区别是有的

用`group` 与 `groups`

group(num),不带num返回整个匹配对象 ,带num返回指定匹配对象
groups(): 返回一个包含所有匹配子组的元组(没有返回空元组)

import re

def test_group():
    content='hello,world!'
    p=re.compile(r'world')
    rest = p.search(content)
    print(rest)
    # >>><re.Match object; span=(6, 11), match='world'>

    # 判断,匹配到再打印
    if rest:
        # group的使用
        print(rest.group())
        # >>>world

        # groups的使用
        print(rest.groups())
def test_id_card():
    """身份证号码正则匹配"""
#     p = re.compile(r'(\d{6})(\d{4})((\d{2})(\d{2}))(\d{2})(\d{1})([0-9]|X)')
    p = re.compile(r'(\d{6})(?P<year>\d{4})((?P<month>\d{2})(?P<day>\d{2}))(\d{2})(\d{1})([0-9]|X)')
    ## 准备两个身份证号码
    id1='430656199610015493'
    id2='110229198909220036'
    id3='11012920090923304X'
    id4='110129199909220044'
    
    rest1=p.search(id1)
    rest3=p.search(id3)
    print(rest1.group())  
    # >>>430656199610015493
    print(rest3.group(2)) 
    # >>>2009
    print(rest3.groups()) 
    # >>>('110129', '2009', '0923', '09', '23', '30', '4', 'X')
    print(rest1.groupdict())
    # >>>{'year': '1996', 'month': '10', 'day': '01'}
        
if __name__=='__main__':
    test_group()
    test_id_card()
    
"""
函数执行结果如下:(上边已经分条粘贴过)
<re.Match object; span=(6, 11), match='world'>
world
()
430656199610015493
2009
('110129', '2009', '0923', '09', '23', '30', '4', 'X')
{'year': '1996', 'month': '10', 'day': '01'}
"""

用`split`切割

split(pattern, string, max=0)
split函数将字符串分割为列表,然后返回成功匹配的列表,分割最多操作max次(默认值是分割所有匹配位置)

import re
"""
使用正则分割字符串
"""
ss='one1Two12three3333four444five5six698'

p=re.compile(r'\d+',re.I)
rest = p.split(ss )
print(rest)
#>>>['one ', 'Two', 'three', 'four', 'five', 'six', '']

用`sub`替换

sub(pattern,repl ,string,max=0)
使用repl替换string中每一个匹配的子串后返回替换后字符串,最多操作max次(默认是所有)

import re
"""
使用正则表达式替换
"""
s='one1Two12three3333four444five5six698'
# 目标效果: one@Two@three@four@five@six@

# 使用正则------------------------------------
p =re.compile(r'\d+')
rest = p.sub('@',s)
print(rest)
# >>>one@Two@three@four@five@six@

# 使用字符串替换-效率低------------------------------------
rest_origin=s.replace('1','@').replace('2','@').replace('3333','@')
print(rest_origin)
# >>>one@Two@@three@four444five5six698

# 再玩一个 正则换位------------------------------------
s2= 'hello world'
p2=re.compile(r'(\w+) (\w+)')
rest_pos=p2.sub(r'\2 \1',s2)
print(rest_pos)
# >>>world hello

# 希望替换并改成大写------------------------------------
def f(m):
    """使用函数进行替换规则改变 """
    return m.group(2).upper()+' '+ m.group(1).upper()


rest_change = p2.sub(f,s2)

print(rest_change)
# >>>WORLD HELLO

# 使用几名函数进行替换:lambda------------------------------------
rest_lamb = p2.sub(lambda m : m.group(2).upper()+' '+m.group(1),s2)
print(rest_lamb)
# >>>WORLD hello

写在最后: 这篇文章有点儿长, 整理笔记累死人

MINUS大大

关注

0
点赞
踩
1

收藏

觉得还不错? 一键收藏
打赏
0
评论
python攻城狮007-- 正则表达式

之前五篇001-函数技巧\变量\包\标准库002-高阶函数技巧-lambda\filter\map\reduce 函数003–文件の读写004–面向对象005–装饰器006–迭代器文章目录@[toc]正则表达式内容什么是正则表达式?正则表达式中的符号:一些在线正则表达式网站:使用正则表达式基本的匹配的匹配同类型的边界匹配指定匹配正则表达式分组贪婪模式 vs 非贪婪模式实战在py...
复制链接

扫一扫