1. Character matching
- Characters
import re

text = "Hello World! 123alialiya啊123alialiya啊 Hello"

# Plain literal match
data_list = re.findall("23a", text)
print(data_list)

# Single-character class: any one of a, b, c, d, e
data_list = re.findall("[abcde]", text)
print(data_list)
data_list = re.findall("23[abcde]", text)
print(data_list)

# [^a-z]: any character except a lowercase letter
data_list = re.findall("[^a-z]", text)
print(data_list)

# [0-9]: one digit
data_list = re.findall("1[0-9]", text)
print(data_list)

# . matches any character except a newline
data_list = re.findall("a.i", text)
print(data_list)

# Greedy matching: .+ takes as many characters as possible
data_list = re.findall("a.+i", text)
print(data_list)

# Non-greedy matching: .+? takes as few characters as possible
data_list = re.findall("a.+?i", text)
print(data_list)

# \w: a letter, digit or underscore (Unicode letters such as Chinese characters included); + repeats greedily
data_list = re.findall(r"a\w+ya", text)
print(data_list)

# \d: one digit
data_list = re.findall(r"\dali", text)
print(data_list)

# \d+ is greedy: one or more digits
data_list = re.findall(r"\d+ali", text)
print(data_list)

# \s: any whitespace character (space, tab, newline, ...)
# ?: the preceding item may appear at most once (0 or 1 time)
# +? together: one or more, non-greedy
data_list = re.findall(r"\s\w+?", text)
print(data_list)
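To see the greedy/non-greedy difference and the Unicode behaviour of \w in isolation, here is a small sketch. The sample strings are made up for illustration; re.ASCII is the standard flag that restricts \w, \d and \s to ASCII characters.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # hypothetical sample input

# Greedy: .+ runs on to the LAST '>' that still allows a match
greedy = re.findall(r"<.+>", html)
# Non-greedy: .+? stops at the FIRST '>' that completes a match
lazy = re.findall(r"<.+?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

mixed = "abc啊123 x_y"  # hypothetical sample input
# In Python 3, \w is Unicode-aware by default, so Chinese characters are word characters
unicode_words = re.findall(r"\w+", mixed)
# With re.ASCII, \w only matches [a-zA-Z0-9_]
ascii_words = re.findall(r"\w+", mixed, re.ASCII)
print(unicode_words)  # ['abc啊123', 'x_y']
print(ascii_words)    # ['abc', '123', 'x_y']
```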
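- Quantifiers (how many times an item repeats)
"""
*      repeat 0 or more times
+      repeat 1 or more times
?      repeat 0 or 1 time
{n}    repeat exactly n times
{n,}   repeat n or more times
{n,m}  repeat n to m times
"""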
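- Parentheses (groups)
# Extract only a specific part of each match
import re

text = "Hello World! 123alialiya啊123alialiya啊 Hello"
data_list = re.findall(r"1(\d{2}\w{3})", text)
print(data_list)  # ['23ali', '23ali']
# With nested groups, findall returns a tuple with one entry per group
data_list = re.findall(r"!\s(1(\d{2}\w{3}))", text)
print(data_list)  # [('123ali', '23ali')]

# Extract a specific part + alternatives
# Inside [...] every character is already an alternative, so no "|" is needed
# (a "|" inside brackets would match a literal pipe character)
data_list = re.findall(r"([\s\w]123)", text)
print(data_list)  # [' 123', '啊123']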
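2. Start and end anchors
- Use case: require that the user's input starts and ends with specified content, i.e. the whole string must match the pattern
- ^ matches the start of the string
- $ matches the end of the string
- Example:
import re

text1 = "aliali@qq.comali"
text2 = "1234567@qq.com"
# "\." matches a literal dot; an unescaped "." would match any character
email_list = re.findall(r"^\d+@\w+\.com$", text1)
print(email_list)  # []
# re.ASCII restricts \d and \w to ASCII characters
email_list = re.findall(r"^\d+@\w+\.com$", text2, re.ASCII)
print(email_list)  # ['1234567@qq.com']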
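Two details are easy to miss with ^ and $: in Python, $ also matches just before a trailing newline, and with re.MULTILINE both anchors apply to every line. re.fullmatch is a common way to insist that the entire string matches. The inputs below are hypothetical.

```python
import re

# $ also matches right before a final "\n", so this still succeeds
with_newline = re.findall(r"^\d+$", "123\n")
# fullmatch must consume the entire string, newline included, so it fails
strict = re.fullmatch(r"\d+", "123\n")

# With re.MULTILINE, ^ and $ match at the start and end of each line
per_line = re.findall(r"^\d+$", "12\nab\n34", re.MULTILINE)

# An unescaped "." matches any character, so "qqxcom" slips through
loose = re.fullmatch(r"\d+@\w+.com", "1@qqxcom")
tight = re.fullmatch(r"\d+@\w+\.com", "1@qqxcom")

print(with_newline)        # ['123']
print(strict)              # None
print(per_line)            # ['12', '34']
print(loose is not None)   # True
print(tight)               # None
```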
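3. Special characters
- In regular expressions, characters such as * . \ { } ( ) (as well as + ? [ ] ^ $ |) have special meanings; to match them literally, they must be escaped
- Escape character: \ (in Python, write patterns as raw strings such as r"\." so the backslash reaches the regex engine unchanged)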
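The effect of escaping is clearest when a special character is matched by mistake. re.escape is the standard helper that backslash-escapes special characters in a literal string; the sample strings are hypothetical.

```python
import re

# Without escaping, "1+1" means "one or more 1s followed by a 1"
unescaped = re.findall("1+1", "1+1=2, 11=11")
# re.escape escapes "+", so it is matched literally
literal = re.findall(re.escape("1+1"), "1+1=2, 11=11")

# A raw string keeps the backslash for the regex engine: \. is a literal dot
numbers = re.findall(r"\d+\.\d+", "pi is 3.14, not 3x14")

print(unescaped)          # ['11', '11']
print(literal)            # ['1+1']
print(numbers)            # ['3.14']
print(re.escape("a.b"))   # a\.b
```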
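4. Common functions in the re module
- findall: returns all matches as a list; if the pattern has more than one group, each element is a tuple of the groups
import re

text = "dsf130429191912015219k13042919591219521Xkk"
# Split each 18-character ID-number-like string into six groups;
# the last character is a digit or X
data_list = re.findall(r"(\d{6})(\d{4})(\d{2})(\d{2})(\d{3})([0-9]|X)", text)
print(data_list)  # [('130429', '1919', '12', '01', '521', '9'), ('130429', '1959', '12', '19', '521', 'X')]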
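The shape of findall's result depends on the number of groups in the pattern: no group gives whole matches, one group gives that group's text, and two or more give tuples. A sketch on a made-up string:

```python
import re

s = "a1b22c333"  # hypothetical sample input
no_group = re.findall(r"[a-z]\d+", s)        # whole matches
one_group = re.findall(r"[a-z](\d+)", s)     # only the group's text
two_groups = re.findall(r"([a-z])(\d+)", s)  # one tuple per match

print(no_group)    # ['a1', 'b22', 'c333']
print(one_group)   # ['1', '22', '333']
print(two_groups)  # [('a', '1'), ('b', '22'), ('c', '333')]
```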
- match: matches only at the beginning of the string; returns a Match object on success, otherwise None
import re

text = "逗2B最逗3B欢乐"
data = re.match(r"逗\dB", text)
if data:
    content = data.group()  # "逗2B"
    print(content)
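match anchors only the start; re.fullmatch additionally requires the match to reach the end, and the Match object reports where it matched via span(). A short sketch with hypothetical inputs:

```python
import re

m = re.match(r"\d+", "123abc")
print(m.group(), m.span())   # 123 (0, 3)

# match only looks at the start of the string
not_at_start = re.match(r"\d+", "abc123")
# fullmatch also requires the match to cover the whole string
not_whole = re.fullmatch(r"\d+", "123abc")
print(not_at_start, not_whole)  # None None
```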
- search: scans the whole string and returns the first match as a Match object, otherwise None
import re

text = "大小逗2B最逗3B欢乐"
data = re.search(r"逗\dB", text)
if data:
    print(data.group())  # "逗2B"
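The difference between match and search shows up as soon as the target is not at position 0. Using the same string as above:

```python
import re

text = "大小逗2B最逗3B欢乐"
# match fails because the string does not start with "逗"
at_start = re.match(r"逗\dB", text)
# search scans forward and stops at the first hit
first = re.search(r"逗\dB", text)

print(at_start)                     # None
print(first.group(), first.span())  # 逗2B (2, 5)
```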
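- sub: replaces the parts of the string that match
import re

text = "逗2B最逗3B欢乐"
# count=1 replaces only the first match; without count, every match is replaced
# (pass count as a keyword: positional count is deprecated since Python 3.13)
data = re.sub(r"\dB", "沙雕", text, count=1)
print(data)  # 逗沙雕最逗3B欢乐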
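Beyond a fixed string, the replacement can refer back to groups (\1) or be a function that receives each Match object, and re.subn also reports how many replacements were made. A sketch on the same text; the transformations are arbitrary examples:

```python
import re

text = "逗2B最逗3B欢乐"
# \1 in the replacement refers to the first captured group
lower_b = re.sub(r"(\d)B", r"\1b", text)
# The replacement can be a function that receives each Match object
doubled = re.sub(r"\d", lambda m: str(int(m.group()) * 2), text)
# subn returns (new_string, number_of_replacements)
result, count = re.subn(r"\dB", "沙雕", text)

print(lower_b)        # 逗2b最逗3b欢乐
print(doubled)        # 逗4B最逗6B欢乐
print(result, count)  # 逗沙雕最逗沙雕欢乐 2
```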
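- split: splits the string at each match and returns a list
import re

text = "逗2B最逗3B欢乐"
# maxsplit=1 splits only at the first match; without it, every match is a split point
# (pass maxsplit as a keyword: positional maxsplit is deprecated since Python 3.13)
data = re.split(r"\dB", text, maxsplit=1)
print(data)  # ['逗', '最逗3B欢乐']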
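Two split behaviours worth knowing: a capturing group in the pattern keeps the separators in the result, and a character class with + splits on runs of mixed separators. The second input string is hypothetical.

```python
import re

text = "逗2B最逗3B欢乐"
# A capturing group keeps the separators in the output list
with_seps = re.split(r"(\dB)", text)
# Split on any run of commas, semicolons and whitespace
fields = re.split(r"[,;\s]+", "a, b;c  d")

print(with_seps)  # ['逗', '2B', '最逗', '3B', '欢乐']
print(fields)     # ['a', 'b', 'c', 'd']
```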
- finditer: returns an iterator of Match objects
import re

text = "逗2B最逗3B欢乐"
data = re.finditer(r"\dB", text)
for item in data:
    print(item.group())  # 2B, then 3B

data = re.finditer(r"(?P<xx>\dB)", text)  # named group "xx"
for item in data:
    print(item.groupdict())  # {'xx': '2B'}, then {'xx': '3B'}
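Because finditer yields full Match objects rather than strings, positions and named groups are available for every hit. A sketch on the same text; the group name "num" is an arbitrary choice:

```python
import re

text = "逗2B最逗3B欢乐"
# Collect the start index, the named group, and the group dict for each match
hits = [(m.start(), m.group("num"), m.groupdict())
        for m in re.finditer(r"(?P<num>\d)B", text)]
print(hits)  # [(1, '2', {'num': '2'}), (5, '3', {'num': '3'})]
```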
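- Example: check whether a string is a valid integer or decimal number, allowing leading and trailing spaces. For example, '.12', '12', '1.2' and '1e+2' are valid numbers, while '.', '2 3' and '3e' are not
import re

def is_number(s):
    # Optional spaces, optional sign, then "digits[.digits]" or ".digits",
    # an optional exponent, and optional trailing spaces
    pattern = r"^ *[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)? *$"
    # re.match returns a Match object if the pattern matches at the start, otherwise None
    return re.match(pattern, s) is not None

print(is_number(" -234.3e+3 "))  # True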
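To check the rule against every example listed above, here is a sketch that compiles the same pattern once and applies it with fullmatch, which makes ^ and $ unnecessary. The names NUMBER_RE and is_number are hypothetical helpers for this illustration.

```python
import re

# Same rules as the pattern above; fullmatch replaces the ^ and $ anchors
NUMBER_RE = re.compile(r" *[+-]?(\d+\.?\d*|\.\d+)([eE][+-]?\d+)? *")

def is_number(s: str) -> bool:
    return NUMBER_RE.fullmatch(s) is not None

valid = [".12", "12", "1.2", "1e+2", " -234.3e+3 "]
invalid = [".", "2 3", "3e"]
valid_results = [is_number(s) for s in valid]
invalid_results = [is_number(s) for s in invalid]
print(valid_results)    # [True, True, True, True, True]
print(invalid_results)  # [False, False, False]
```

Compiling once is a design choice for repeated calls; the re module also caches recently used patterns, so the difference is small for occasional use.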