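1. What is a regular expression?
A regular expression is a logical formula for operating on strings: predefined special characters, and combinations of them, form a "pattern string" that expresses a filtering rule to apply to strings.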
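2. Using regular expressions in Python to filter specified strings
Requirement: extract the desired content from the string I say Good not food (the pattern can be adjusted as needed)
Implementation:
On Redhat Linux 8 with Python already installed, open the Python interpreter (type python3 at the command line)
Import the re module, which provides regular expression support (type import re). The dir(re) command lists the attributes and methods of the module; the most commonly used one is findall, which returns all matches as a list and is used here to filter out the specified content
Goal: extract the two words Good and food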
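The setup steps above can also be written as a small script instead of typing them into the interpreter. This is a minimal sketch; the variable names (public_names, found, missing) are illustrative only and not part of the re module.

```python
import re

# dir(re) lists the names defined by the module; findall is one of them.
public_names = [name for name in dir(re) if not name.startswith("_")]
print("findall" in public_names)  # True

text = "I say Good not food"

# findall returns a list with every non-overlapping match of the pattern.
found = re.findall(r"food", text)
print(found)  # ['food']

# When nothing matches, the result is an empty list rather than an error.
missing = re.findall(r"mood", text)
print(missing)  # []
```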
- Matching a single character
.
A dot matches any single character (except a newline, by default)
>>> re.findall(".ood","I say Good not food")
['Good', 'food']
>>>
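As a hedged illustration of the dot's one limit, the sketch below shows that it does not match a newline unless the re.DOTALL flag is passed. The two-line test string "a\nb" is made up for this example.

```python
import re

text = "I say Good not food"

# "." stands for exactly one character of any kind.
dot_matches = re.findall(r".ood", text)
print(dot_matches)  # ['Good', 'food']

# By default "." does not match a newline character.
no_newline = re.findall(r"a.b", "a\nb")
print(no_newline)  # []

# With the re.DOTALL flag, "." matches a newline as well.
with_newline = re.findall(r"a.b", "a\nb", flags=re.DOTALL)
print(with_newline)  # ['a\nb']
```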
[ ]
Square brackets define a character set: the pattern matches any one of the characters listed inside them
>>> re.findall("[Gf]ood","I say Good not food")
['Good', 'food']
>>>
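Character sets also accept ranges and negation, which the example above does not show. The following sketch reuses the same sentence; the range and negated patterns are additions for illustration.

```python
import re

text = "I say Good not food"

# [Gf] matches exactly one character that is either "G" or "f".
set_matches = re.findall(r"[Gf]ood", text)
print(set_matches)  # ['Good', 'food']

# A range such as [a-z] matches any one lowercase ASCII letter.
range_matches = re.findall(r"[a-z]ood", text)
print(range_matches)  # ['food']

# A leading ^ inside the brackets negates the set: any one character except "G".
negated_matches = re.findall(r"[^G]ood", text)
print(negated_matches)  # ['food']
```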
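\d
Matches a single digit (the pattern is written as a raw string, r"...", so that Python passes the backslash to re unchanged)
>>> re.findall(r"\d","I am 18")
['1', '8']
>>> re.findall(r"\d\d","I am 18")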
['18']
>>>
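Patterns that contain a backslash are best written as raw strings. In an ordinary string literal, "\d" is an invalid escape sequence; it still works, but Python 3.6 and later report a DeprecationWarning for it and Python 3.12 and later a SyntaxWarning. A minimal sketch of the equivalent spellings:

```python
import re

# In a raw string the backslash is kept as-is, so r"\d" is the two
# characters backslash and "d", the same as the escaped form "\\d".
same_pattern = (r"\d" == "\\d")
print(same_pattern)  # True

digits = re.findall(r"\d", "I am 18")
print(digits)  # ['1', '8']

two_digits = re.findall(r"\d\d", "I am 18")
print(two_digits)  # ['18']
```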
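\w
Matches a single word character: 0-9, a-z, A-Z and the underscore _ (for str patterns in Python 3, other Unicode letters and digits match as well)
>>> re.findall(r"\w","I say Good not food")
['I', 's', 'a', 'y', 'G', 'o', 'o', 'd', 'n', 'o', 't', 'f', 'o', 'o', 'd']
>>> re.findall(r"\w","c d!1_")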
['c', 'd', '1', '_']
>>>
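For str patterns in Python 3, \w is Unicode-aware by default. The sketch below, with a made-up sample string, shows how the re.ASCII flag restricts it to 0-9, a-z, A-Z and the underscore.

```python
import re

sample = "c d!1_中"

# Default: Unicode letters and digits count as word characters.
unicode_words = re.findall(r"\w", sample)
print(unicode_words)  # ['c', 'd', '1', '_', '中']

# re.ASCII limits \w to [a-zA-Z0-9_].
ascii_words = re.findall(r"\w", sample, flags=re.ASCII)
print(ascii_words)  # ['c', 'd', '1', '_']
```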
\s
Matches a single whitespace character (space, tab, newline and similar)
>>> re.findall(r"\s","c d!1_")
[' ']
>>>
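A short sketch with a made-up sample string that contains a space, a tab and a newline shows that \s covers all three, and that the uppercase form \S matches the opposite.

```python
import re

sample = "a b\tc\nd"

# \s matches the space, the tab and the newline.
whitespace = re.findall(r"\s", sample)
print(whitespace)  # [' ', '\t', '\n']

# \S is the complement: any single character that is not whitespace.
non_whitespace = re.findall(r"\S", sample)
print(non_whitespace)  # ['a', 'b', 'c', 'd']
```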
- Matching a string of characters
Literal matching
Matching is case-sensitive
>>> re.findall("good","I say Good not food")
[]
>>> re.findall("Good","I say Good not food")
['Good']
>>>
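If case should not matter, the re.IGNORECASE flag can be passed to findall. This is a minimal sketch of that option, not something the session above uses.

```python
import re

text = "I say Good not food"

# Without a flag, "good" does not match "Good".
case_sensitive = re.findall(r"good", text)
print(case_sensitive)  # []

# re.IGNORECASE makes the comparison case-insensitive.
case_insensitive = re.findall(r"good", text, flags=re.IGNORECASE)
print(case_insensitive)  # ['Good']
```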
Using special characters:
Alternation (" | ")
Matches either of two different strings
>>> re.findall("Good|food","I say Good not food")
['Good', 'food']
>>>
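The | operator applies to everything on either side of it, so grouping is needed to alternate over only part of a pattern. The sketch below is illustrative; note that findall returns only the group's text once a capturing group is present.

```python
import re

text = "I say Good not food"

# "|" has the lowest precedence: this means "G" or "food".
loose = re.findall(r"G|food", text)
print(loose)  # ['G', 'food']

# A non-capturing group (?:...) limits the alternation to the first letter.
grouped = re.findall(r"(?:G|f)ood", text)
print(grouped)  # ['Good', 'food']

# With a capturing group, findall returns the group's content instead.
captured = re.findall(r"(G|f)ood", text)
print(captured)  # ['G', 'f']
```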
" * "
Matches zero or more occurrences of the preceding character
>>> re.findall("go*gle","I like google not ggle goooogle and gogle")
['google', 'ggle', 'goooogle', 'gogle']
>>>
" + "
Matches one or more occurrences of the preceding character
>>> re.findall("go+gle","I like google not ggle goooogle and gogle")
['google', 'goooogle', 'gogle']
>>>
" ? "
Matches zero or one occurrence of the preceding character
>>> re.findall("go?gle","I like google not ggle goooogle and gogle")
['ggle', 'gogle']
>>>
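The three quantifiers can be compared side by side by building the pattern in a loop. This is a small sketch; the results dictionary is just a convenience for the comparison.

```python
import re

text = "I like google not ggle goooogle and gogle"

# Build "go*gle", "go+gle" and "go?gle" and collect the matches of each.
results = {q: re.findall("go" + q + "gle", text) for q in "*+?"}

for quantifier, matches in results.items():
    print(quantifier, matches)
# * ['google', 'ggle', 'goooogle', 'gogle']
# + ['google', 'goooogle', 'gogle']
# ? ['ggle', 'gogle']
```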
" {} "
Specifies how many times the preceding character must occur
>>> re.findall("go{2}gle","I like google not ggle goooogle and gogle")  # preceding character occurs exactly 2 times
['google']
>>> re.findall("go{2,6}gle","I like google not ggle goooogle and gogle")  # preceding character occurs at least 2 and at most 6 times
['google', 'goooogle']
>>> re.findall("go{2,3}gle","I like google not ggle goooogle and gogle")  # preceding character occurs at least 2 and at most 3 times
['google']
>>>
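Either bound inside the braces may be left out. The sketch below shows the open-ended forms and how they relate to the *, + and ? quantifiers; the variable names are illustrative.

```python
import re

text = "I like google not ggle goooogle and gogle"

# {2,} means "at least 2 times", with no upper limit.
at_least_two = re.findall(r"go{2,}gle", text)
print(at_least_two)  # ['google', 'goooogle']

# {,1} means "at most 1 time", which is the same as "?".
at_most_one = re.findall(r"go{,1}gle", text)
print(at_most_one)  # ['ggle', 'gogle']

# {0,} is equivalent to "*" and {1,} is equivalent to "+".
same_as_star = re.findall(r"go{0,}gle", text) == re.findall(r"go*gle", text)
same_as_plus = re.findall(r"go{1,}gle", text) == re.findall(r"go+gle", text)
print(same_as_star, same_as_plus)  # True True
```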
" ^ "
Anchors the match at the start: matches only if the string begins with the given pattern
>>> re.findall("^I like","I like google not ggle goooogle and gogle")
['I like']
>>>
" $ "
Anchors the match at the end: matches only if the string ends with the given pattern
>>> re.findall("gogle$","I like google not ggle goooogle and gogle")
['gogle']
>>>
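By default ^ and $ refer to the start and end of the whole string. With the re.MULTILINE flag they also match at the start and end of each line. A minimal sketch, using a made-up two-line string:

```python
import re

text = "I like google\nI like gogle"

# Default: ^ matches only at the very start of the string.
start_default = re.findall(r"^I like", text)
print(start_default)  # ['I like']

# re.MULTILINE: ^ also matches right after each newline.
start_multiline = re.findall(r"^I like", text, flags=re.MULTILINE)
print(start_multiline)  # ['I like', 'I like']

# Default: "google" ends the first line, not the string, so $ fails.
end_default = re.findall(r"google$", text)
print(end_default)  # []

# re.MULTILINE: $ also matches just before each newline.
end_multiline = re.findall(r"google$", text, flags=re.MULTILINE)
print(end_multiline)  # ['google']
```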
" () "
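Grouping and capturing: parentheses save the text they match, and \number refers back to that saved group
Write jerry only once in the pattern and still match the string jerryjerry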
>>> re.findall("jerry","my name is jerryjerry")
['jerry', 'jerry']
>>> test = re.search(r"(jerry)\1","my name is jerryjerry")  # \1 repeats group 1; the match object is assigned to a variable
>>> test.group()
'jerryjerry'
>>>
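The session above uses re.search rather than re.findall for a reason: when a pattern contains a capturing group, findall returns the group's text instead of the whole match. The sketch below shows the difference and one way, via re.finditer, to collect whole matches anyway.

```python
import re

text = "my name is jerryjerry"
pattern = r"(jerry)\1"  # \1 must repeat exactly what group 1 matched

# findall returns the captured group, not the full match.
groups_only = re.findall(pattern, text)
print(groups_only)  # ['jerry']

# search returns a match object: group(0) is the whole match,
# group(1) is the first captured group.
match = re.search(pattern, text)
whole = match.group(0)
first = match.group(1)
print(whole, first)  # jerryjerry jerry

# finditer yields match objects, so the whole match is still available.
all_whole = [m.group(0) for m in re.finditer(pattern, text)]
print(all_whole)  # ['jerryjerry']
```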