Python Study Notes (4): Regular Expressions, Closures, and Generators


weixin_39844963


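(1) Regular Expressions

Basic rules:

^ matches the beginning of a string.

$ matches the end of a string.

\b matches a word boundary.

\d matches any single digit.

\D matches any single non-digit character.

x? matches an optional x character. In other words, it matches zero or one x.

x* matches zero or more x.

x+ matches one or more x.

x{n,m} matches x at least n times, but no more than m times.

(a|b|c) matches exactly one of a, b, or c.

(x) is a group; it remembers the string it matched. You can get the remembered values by calling the groups() method of the match object returned by re.search.

[abc] matches any one of a, b, or c.

[^abc] matches any single character except a, b, or c.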
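To make the rules above concrete, here is a small sketch that exercises each of them with Python's re module. The sample strings and the helper name `matches` are made up for illustration and are not from the original notes.

```python
import re

def matches(pattern, text):
    """Return True if pattern is found anywhere in text."""
    return re.search(pattern, text) is not None

# ^ and $ anchor the match to the start and end of the string.
only_digits = matches(r'^\d+$', '2024')            # True
digits_with_text = matches(r'^\d+$', 'year 2024')  # False

# \b is a word boundary, so 'cat' inside 'concatenate' is not a whole word.
whole_word = matches(r'\bcat\b', 'a cat sat')      # True
inside_word = matches(r'\bcat\b', 'concatenate')   # False

# x?, x*, x+ and x{n,m} control how often x may repeat.
optional_u = matches(r'^colou?r$', 'color')        # True
zero_b = matches(r'^ab*$', 'a')                    # True
one_or_more_b = matches(r'^ab+$', 'a')             # False
too_many_x = matches(r'^x{2,3}$', 'xxxx')          # False

# (x) remembers what it matched; groups() returns the remembered values.
date_parts = re.search(r'(\d{4})-(\d{2})', 'released 2024-05').groups()

# [abc] matches one listed character, [^abc] one character that is not listed.
listed = re.findall(r'[abc]', 'aXbYc')
not_listed = re.findall(r'[^abc]', 'aXbYc')

print(date_parts, listed, not_listed)
```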

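Case 1: Roman numerals

In Roman numerals, there are seven characters that are combined in various ways to represent numbers.

I = 1

V = 5

X = 10

L = 50

C = 100

D = 500

M = 1000

The following are some general rules for constructing Roman numerals:

Most of the time, characters are additive. I is 1, II is 2, and III is 3. VI is 6 (literally, "5 and 1"), VII is 7, and VIII is 8.

The tens characters (I, X, C, and M) can be repeated up to three times. At 4, you must subtract from the next higher fives character. You cannot write 4 as IIII; it is written as IV ("1 less than 5"). 40 is written as XL ("10 less than 50"), 41 as XLI, 42 as XLII, 43 as XLIII, and 44 as XLIV ("10 less than 50, then 1 less than 5").

Sometimes the notation works the other way around. To represent an intermediate number, you subtract from the next higher tens character. For example, 9 is subtracted from 10: 8 is VIII, but 9 is IX ("1 less than 10"), not VIIII (the I character cannot be repeated four times). 90 is XC, and 900 is CM.

The fives characters cannot be repeated within a number. 10 is always written as X, never as VV. 100 is always C, never LL.

Roman numerals are read from left to right, so the order of characters matters a great deal. DC is 600, while CD is a completely different number, 400 ("100 less than 500"). CI is 101; IC is not a valid Roman numeral (you cannot subtract 1 directly from 100; you have to write XCIX, "10 less than 100, then 1 less than 10").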
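The construction rules above can be sketched as a small integer-to-Roman converter. This is an illustrative sketch, not code from the original notes: the names `ROMAN_NUMERAL_MAP` and `to_roman` are assumptions, and the 1 to 3999 range is assumed because M may appear at most three times.

```python
# Symbol/value pairs, largest first. The subtractive forms (CM, CD, XC, XL,
# IX, IV) are listed explicitly, so the greedy loop never emits IIII or VV.
ROMAN_NUMERAL_MAP = (
    ('M', 1000), ('CM', 900), ('D', 500), ('CD', 400),
    ('C', 100), ('XC', 90), ('L', 50), ('XL', 40),
    ('X', 10), ('IX', 9), ('V', 5), ('IV', 4), ('I', 1),
)

def to_roman(n):
    """Convert an integer in the assumed range 1..3999 to a Roman numeral."""
    if not 0 < n < 4000:
        raise ValueError('number out of range (must be 1..3999)')
    result = ''
    for numeral, value in ROMAN_NUMERAL_MAP:
        # Take the largest symbol that still fits, as often as it fits.
        while n >= value:
            result += numeral
            n -= value
    return result

for number in (4, 9, 44, 99, 400, 600):
    print(number, to_roman(number))
```

Because the table is ordered from largest to smallest, the left-to-right ordering described above falls out of the loop automatically.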

The pattern can be built up place by place, starting from the thousands:

pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'
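As a quick check of this pattern, the sketch below runs it against a few valid and invalid candidates taken from the rules discussed earlier. The helper name `is_roman` is an assumption made for this example.

```python
import re

pattern = '^M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})$'

def is_roman(text):
    """Return True if text matches the Roman numeral pattern."""
    return re.search(pattern, text) is not None

for candidate in ('MCMXC', 'XLIV', 'IIII', 'VV', 'IC', 'MMMM'):
    print(candidate, is_roman(candidate))

# Caveat: every part of the pattern is optional, so the empty string matches too.
empty_matches = is_roman('')
```

If the empty string should be rejected, a separate check for it is needed before applying the pattern.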


The same pattern written as a verbose regular expression:

>>> import re
>>> pattern = '''
    ^                   # beginning of string
    M{0,3}              # thousands - 0 to 3 Ms
    (CM|CD|D?C{0,3})    # hundreds - 900 (CM), 400 (CD), 0-300 (0 to 3 Cs),
                        #            or 500-800 (D, followed by 0 to 3 Cs)
    (XC|XL|L?X{0,3})    # tens - 90 (XC), 40 (XL), 0-30 (0 to 3 Xs),
                        #        or 50-80 (L, followed by 0 to 3 Xs)
    (IX|IV|V?I{0,3})    # ones - 9 (IX), 4 (IV), 0-3 (0 to 3 Is),
                        #        or 5-8 (V, followed by 0 to 3 Is)
    $                   # end of string
    '''
>>> re.search(pattern, 'M', re.VERBOSE)

Note that to use a verbose regular expression you must pass the re.VERBOSE flag. Without it, Python treats the whitespace and the comments as part of the pattern.
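The effect of the flag can be seen by searching with the same verbose pattern twice, once with re.VERBOSE and once without. This is a minimal sketch; the variable names and the sample numeral 'MCMLXXXIX' are chosen for this example only.

```python
import re

verbose_pattern = '''
    ^                   # beginning of string
    M{0,3}              # thousands
    (CM|CD|D?C{0,3})    # hundreds
    (XC|XL|L?X{0,3})    # tens
    (IX|IV|V?I{0,3})    # ones
    $                   # end of string
    '''

# With the flag, whitespace and "# ..." comments are ignored.
with_flag = re.search(verbose_pattern, 'MCMLXXXIX', re.VERBOSE)

# Without the flag, the newlines, spaces and comment text are literal
# characters that the subject string would have to contain.
without_flag = re.search(verbose_pattern, 'MCMLXXXIX')

print(with_flag is not None, without_flag is not None)
```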

Case 2: Phone number formats

The pattern has to handle all of the following formats:

800-555-1212

800 555 1212

800.555.1212

(800) 555-1212

1-800-555-1212

800-555-1212-1234

800-555-1212x1234

800-555-1212 ext. 1234

work 1-(800) 555.1212 #1234

>>> import re
>>> phonePattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')
>>> phonePattern.search('800-555-1212').groups()
('800', '555', '1212', '')
>>> phonePattern.search('80055512121234').groups()
('800', '555', '1212', '1234')
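To see that the same pattern copes with every format listed for this case, the following sketch loops over all of them. The helper `parse_phone` and the names `phone_pattern` and `samples` are assumptions introduced for this example.

```python
import re

phone_pattern = re.compile(r'(\d{3})\D*(\d{3})\D*(\d{4})\D*(\d*)$')

samples = [
    '800-555-1212',
    '800 555 1212',
    '800.555.1212',
    '(800) 555-1212',
    '1-800-555-1212',
    '800-555-1212-1234',
    '800-555-1212x1234',
    '800-555-1212 ext. 1234',
    'work 1-(800) 555.1212 #1234',
]

def parse_phone(text):
    """Return (area code, trunk, rest, extension), or None if nothing matches."""
    match = phone_pattern.search(text)
    return match.groups() if match else None

for sample in samples:
    print(sample, '->', parse_phone(sample))
```

Returning None instead of calling groups() directly avoids an AttributeError when the input contains no phone number at all.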


>>> phonePattern = re.compile(r'''
                # don't match beginning of string, number can start anywhere
    (\d{3})     # area code is 3 digits (e.g. '800')
    \D*         # optional separator is any number of non-digits
    (\d{3})     # trunk is 3 digits (e.g. '555')
    \D*         # optional separator
    (\d{4})     # rest of number is 4 digits (e.g. '1212')
    \D*         # optional separator
    (\d*)       # extension is optional and can be any number of digits
    $           # end of string
    ''', re.VERBOSE)
>>> phonePattern.search('work 1-(800) 555.1212 #1234').groups()
('800', '555', '1212', '1234')


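Case 3: Plural nouns

If a word ends in S, X, or Z, add ES. Bass becomes basses, fax becomes faxes, and waltz becomes waltzes.

If a word ends in a noisy H, add ES; if it ends in a silent H, just add S. What is a noisy H? One that combines with other letters to make a sound you can hear. So coach becomes coaches and rash becomes rashes, because you can hear the CH and SH sounds when you say them. But cheetah becomes cheetahs, because the H is silent.

If a word ends in a Y that sounds like I, change the Y to IES; if the Y is combined with a vowel to make some other sound, just add S. So vacancy becomes vacancies, but day becomes days.

If none of these rules apply, just add S and hope for the best.

import re

def plural(noun):
    if re.search('[sxz]$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeioudgkprt]h$', noun):
        return re.sub('$', 'es', noun)
    elif re.search('[^aeiou]y$', noun):
        return re.sub('y$', 'ies', noun)
    else:
        return noun + 's'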


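>>> re.sub('([^aeiou])y$', r'\1ies', 'vacancy')
'vacancies'

The \1 means "hey, that first group you remembered? Put it right here." In this example, the group remembers the c before the y; during substitution, c is replaced with c and y is replaced with ies. (If there is more than one remembered group, you can use \2, \3, and so on.)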
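A short sketch of backreferences with more than one remembered group follows. The "Doe, Jane" sample string is invented for illustration; the vacancy and day cases come from the pluralization rules discussed for this case.

```python
import re

# Two groups: \1 is the word before the comma, \2 the word after it.
swapped = re.sub(r'(\w+), (\w+)', r'\2 \1', 'Doe, Jane')

# One group: the consonant before the final y is kept via \1.
vacancies = re.sub(r'([^aeiou])y$', r'\1ies', 'vacancy')

# 'day' ends in a vowel plus y, so the pattern does not match
# and re.sub returns the string unchanged.
day = re.sub(r'([^aeiou])y$', r'\1ies', 'day')

print(swapped, vacancies, day)
```

Writing the replacement as a raw string (r'...') keeps Python from interpreting \1 as an escape sequence before re.sub sees it.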

(2) Closures

The technique of using the values of outside parameters within a dynamically built function is called a closure.
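As a minimal illustration of that definition, the sketch below builds two functions that each remember a different outer parameter. The names `make_suffix_adder`, `add_s`, and `add_es` are hypothetical and do not appear in the original notes.

```python
def make_suffix_adder(suffix):
    # `suffix` is a parameter of the outer function ...
    def add_suffix(word):
        # ... yet the inner function can still read it after
        # make_suffix_adder has returned. That captured value is the closure.
        return word + suffix
    return add_suffix

add_s = make_suffix_adder('s')
add_es = make_suffix_adder('es')

print(add_s('day'), add_es('fax'))

# The captured value is stored on the function object itself.
captured = add_es.__closure__[0].cell_contents
```

Each call to the outer function creates an independent inner function with its own captured value, which is what the pluralization rules later in this section rely on.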

As a first step, each rule is split into a pair of functions, one that tests whether the rule applies and one that applies it:

import re

def match_sxz(noun):
    return re.search('[sxz]$', noun)

def apply_sxz(noun):
    return re.sub('$', 'es', noun)

def match_h(noun):
    return re.search('[^aeioudgkprt]h$', noun)

def apply_h(noun):
    return re.sub('$', 'es', noun)

def match_y(noun):
    return re.search('[^aeiou]y$', noun)

def apply_y(noun):
    return re.sub('y$', 'ies', noun)

def match_default(noun):
    return True

def apply_default(noun):
    return noun + 's'

# The rules data structure: a sequence of function pairs.
rules = ((match_sxz, apply_sxz),
         (match_h, apply_h),
         (match_y, apply_y),
         (match_default, apply_default)
         )

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

A list of patterns:

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

How is this function called?

patterns = \
  (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's')
  )

rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

patterns is a tuple of tuples of strings.

rules is a list of tuples, where each tuple is a pair of functions.
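Putting the pieces together, the following self-contained sketch builds the rules from the pattern table and runs a few words through them. The `plural` function is the same loop used with the hand-written function pairs earlier; the sample words are chosen for illustration.

```python
import re

def build_match_and_apply_functions(pattern, search, replace):
    # Both inner functions close over pattern, search and replace.
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

patterns = (
    ('[sxz]$', '$', 'es'),
    ('[^aeioudgkprt]h$', '$', 'es'),
    ('(qu|[^aeiou])y$', 'y$', 'ies'),
    ('$', '$', 's'),   # fallback: '$' matches every string
)

rules = [build_match_and_apply_functions(pattern, search, replace)
         for (pattern, search, replace) in patterns]

def plural(noun):
    # The first rule whose match function succeeds wins.
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

for word in ('bass', 'coach', 'cheetah', 'vacancy', 'day', 'soliloquy'):
    print(word, '->', plural(word))
```

Adding a new rule now means adding one tuple of strings to `patterns`; no new function has to be written.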

A file of patterns

Putting the rules in a separate file makes them easier to maintain.

[sxz]$ $ es
[^aeioudgkprt]h$ $ es
[^aeiou]y$ y$ ies
$ $ s

import re

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

rules = []
# The with statement creates a context: when the with block ends,
# Python closes the file automatically.
with open('plural4-rules.txt', encoding='utf-8') as pattern_file:
    for line in pattern_file:
        # None splits on any run of whitespace (spaces or tabs);
        # 3 is the maximum number of splits to perform.
        pattern, search, replace = line.split(None, 3)
        rules.append(build_match_and_apply_functions(
                pattern, search, replace))
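To try the file-based approach without a rules file on disk, the sketch below first writes the four rules to a temporary file and then loads them. The helper `load_rules`, the file name `plural-rules-demo.txt`, and the blank-line check are assumptions added for this example.

```python
import os
import re
import tempfile

def build_match_and_apply_functions(pattern, search, replace):
    def matches_rule(word):
        return re.search(pattern, word)
    def apply_rule(word):
        return re.sub(search, replace, word)
    return (matches_rule, apply_rule)

def load_rules(path):
    """Read whitespace-separated (pattern, search, replace) rows from path."""
    loaded = []
    with open(path, encoding='utf-8') as pattern_file:
        for line in pattern_file:
            if not line.strip():
                continue  # assumption: blank lines are skipped
            pattern, search, replace = line.split(None, 3)
            loaded.append(build_match_and_apply_functions(
                pattern, search, replace))
    return loaded

RULES_TEXT = (
    '[sxz]$ $ es\n'
    '[^aeioudgkprt]h$ $ es\n'
    '[^aeiou]y$ y$ ies\n'
    '$ $ s\n'
)

# Write the demo rules to a throwaway directory, then load them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    rules_path = os.path.join(tmp_dir, 'plural-rules-demo.txt')
    with open(rules_path, 'w', encoding='utf-8') as rules_file:
        rules_file.write(RULES_TEXT)
    rules = load_rules(rules_path)

def plural(noun):
    for matches_rule, apply_rule in rules:
        if matches_rule(noun):
            return apply_rule(noun)

print(plural('fax'), plural('vacancy'), plural('day'))
```

A row with more than three columns would make the tuple unpacking fail with a ValueError, so the file format has to stay at exactly three fields per line.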

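(3) Generators

>>> def make_counter(x):
...     print('entering make_counter')
...     while True:
...         print('-' * 50)
...         yield x
...         print('incrementing x %d' % (x))
...         x = x + 1
...
>>> counter = make_counter(2)   # only creates the generator; the body has not run yet
>>> counter                     # a generator object is returned
<generator object make_counter at 0x...>
>>> next(counter)               # next() takes a generator object and returns its next value
entering make_counter
--------------------------------------------------
2
>>> next(counter)
incrementing x 2
--------------------------------------------------
3                               # execution is paused at the yield again

* "yield" pauses a function. "next()" resumes it from where it paused.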
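The pause-and-resume behaviour can also be observed without any printing, by checking what next() returns and what happens once the function body finishes. This is a small sketch; the generator `countdown` is a made-up example, not part of the original notes.

```python
def countdown(start):
    n = start
    while n > 0:
        yield n    # pause here and hand n to the caller
        n -= 1     # execution resumes here on the following next() call

gen = countdown(2)
first = next(gen)     # runs the body up to the first yield
second = next(gen)    # resumes after the yield, loops, and yields again

# When the body returns, next() raises StopIteration.
try:
    next(gen)
    finished = False
except StopIteration:
    finished = True

print(first, second, finished)
```

A for loop handles StopIteration internally, which is why generators can be iterated without any explicit exception handling.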

A Fibonacci generator

def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

>>> for n in fib(1000):
...     print(n, end=' ')
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987
>>> list(fib(1000))   # passing a generator to list() iterates through the whole generator and returns a list of all its values
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
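One consequence of this design is worth a short sketch: a generator object can be consumed only once, so a second pass needs a fresh call to the generator function. The variable names below are chosen for illustration.

```python
def fib(max):
    a, b = 0, 1
    while a < max:
        yield a
        a, b = b, a + b

numbers = fib(1000)
total = sum(numbers)       # sum() walks through every value
leftover = list(numbers)   # the same generator is now exhausted

fresh = list(fib(20))      # calling fib() again starts from the beginning

print(total, leftover, fresh)
```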
