[Volume 1] Regex Part 4 |> Exercises

acgtog1543

Tags: ruby python

Original link: http://www.cnblogs.com/Ruby517/p/5800907.html

Reference: Core Python Programming (3rd edition), p. 39

1-1 Recognize the following strings: "bat", "bit", "but", "hat", "hit", or "hut"

# coding: utf-8

# Import the re module; "re" stands for regex (regular expression)
import re

url = "https://www.baidu.com/baidu?tn=monline_3_dg&ie=utf-8&wd=bat%E7%"
# Use the pipe symbol "|" (alternation) to try each word in turn.
# Only the six words from the exercise are listed.
print(re.findall(r"bat|bit|but|hat|hit|hut", url))
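Because all six words share the shape "b or h, then a, i or u, then t", the alternation can also be written with character classes. The following is a minimal sketch, assuming Python 3 (for `re.fullmatch`); the name `WORD_RE` and the sample list are illustrative and not part of the original post.

```python
import re

# Hypothetical name: compact form of bat|bit|but|hat|hit|hut
WORD_RE = re.compile(r"[bh][aiu]t")

samples = ["bat", "bit", "but", "hat", "hit", "hut", "com", "bats"]
# fullmatch() succeeds only if the whole string matches the pattern
matched = [w for w in samples if WORD_RE.fullmatch(w)]
print(matched)  # the six target words; "com" and "bats" are rejected
```

Using `fullmatch` rather than `findall` checks each candidate as a whole word, so longer strings such as "bats" are not accepted.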

1-2 Match any pair of words separated by a single space, that is, a first name and a last name

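# coding: utf-8

import re

txt = "My name is D_R Adams, her name is J.K Rowling, his name is Tomas Smith!"
# The left side of the pipe matches names such as "D_R ..." and "J.K ...";
# the right side matches ordinary names such as "Tomas Smith".
# [._] restricts the separator to "." or "_" (a bare "." would match any character).
print(re.findall(r"[A-Z][._][A-Z] \w+|[A-Z]\w+ [A-Z]\w+", txt))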
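The exercise itself asks only for a generic pair of words, so a simpler check is possible when the input is a single candidate string. This is a rough sketch, assuming Python 3 and assuming that a "word" means a run of ASCII letters; `PAIR_RE` and `is_word_pair` are hypothetical names.

```python
import re

# Hypothetical pattern: two runs of letters separated by exactly one space
PAIR_RE = re.compile(r"[A-Za-z]+ [A-Za-z]+")

def is_word_pair(text):
    """Return True if text is exactly two words separated by one space."""
    return PAIR_RE.fullmatch(text) is not None

print(is_word_pair("Tomas Smith"))   # True
print(is_word_pair("Tomas  Smith"))  # False: two spaces
print(is_word_pair("Tomas"))         # False: only one word
```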

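1-6 Match simple web domain names that begin with "www" and end with ".com". Extra credit:

your regex may also support other top-level domains such as .edu, .net, and so on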

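# coding: utf-8

import re

url = "http://www.baidu.com/, http://www.google.cn/, http://www.foothill.edu"

# With nested parentheses, if only the outermost group is wanted (we normally wrap
# the part we want in parentheses), add ?: to the inner group so it is not saved.
# Dots are escaped to match a literal "."; [\w-]+ keeps each match inside one domain.
print(re.findall(r"(www\.[\w-]+\.(?:com|cn|edu))", url))
# print(re.findall(r"www\.[\w-]+\.com|www\.[\w-]+\.cn|www\.[\w-]+\.edu", url))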
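The effect of `?:` on `re.findall` is easiest to see side by side: when a pattern contains a capturing group, `findall` returns the group's text instead of the whole match. The sketch below is illustrative only; the shortened `text` string and the variable names are assumptions made for the example.

```python
import re

text = "http://www.baidu.com/, http://www.foothill.edu"

# Capturing inner group: findall returns only what the group matched
captured = re.findall(r"www\.\w+\.(com|edu)", text)
# Non-capturing inner group: findall returns the whole match
whole = re.findall(r"www\.\w+\.(?:com|edu)", text)

print(captured)  # ['com', 'edu']
print(whole)     # ['www.baidu.com', 'www.foothill.edu']
```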

1-11 Match the set of all valid email addresses

# coding: utf-8

import re

email = "QQ mailbox such as 1111@qq.com, 1234@163.com is WangYi mailbox, and google mailbox: 2222@gmail.com"

for each_line in email.split(","):
    # Simplified pattern: local part, "@", then dot-separated domain labels
    a = re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", each_line)
    # join() turns the list into a string
    print("".join(a))
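To check a single candidate string rather than search inside a longer text, the same idea can be wrapped in a small validator. This is a sketch under stated assumptions: Python 3, and a deliberately simplified pattern that is much narrower than the full address syntax in RFC 5322. `EMAIL_RE` and `looks_like_email` are hypothetical names.

```python
import re

# Simplified, hypothetical pattern: local part, "@", and a domain
# with at least one dot. Real address syntax is broader than this.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def looks_like_email(address):
    """Return True if the whole string has the shape local@domain.tld."""
    return EMAIL_RE.fullmatch(address) is not None

print(looks_like_email("1111@qq.com"))     # True
print(looks_like_email("no-at-sign.com"))  # False: no "@"
print(looks_like_email("a@b"))             # False: domain has no dot
```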

If you only want to separate the text from the email addresses:

# coding: utf-8
import re
email = "QQ mailbox such as 1111@qq.com, 1234@163.com is WangYi mailbox, and google mailbox: 2222@gmail.com"

for each_line in email.split(","):
    # (?= ) is a lookahead: split on a space that is followed by digits and "@".
    # The space is consumed by the split; the lookahead text is kept.
    print(re.split(r" (?=\d+@)", each_line.strip()))
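Another way to separate text from addresses is to put the address pattern in a capturing group, because `re.split` then keeps the captured text in the result list. A minimal sketch, assuming addresses of the "digits@domain" form used in this sample; the shortened `email` string and the names `parts` and `addresses` are illustrative.

```python
import re

email = "QQ mailbox such as 1111@qq.com, 1234@163.com is WangYi mailbox"

# A capturing group in the pattern makes re.split() keep the matched
# addresses in the result, between the surrounding pieces of text.
parts = re.split(r"(\d+@[\w.]+)", email)
addresses = parts[1::2]  # captured groups sit at the odd indexes

print(parts)
print(addresses)  # ['1111@qq.com', '1234@163.com']
```

Unlike the lookahead version, this also separates an address that sits at the very start of a segment.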

1-13 type(). The built-in function type() returns a type object. Create a regex that extracts int from <type 'int'> and str from <type 'str'>

# coding: utf-8

import re

string = type("Hello, world!")     # str: string
integer = type(123)                # int: integer
f = type(3.14)                     # float: floating-point number

def PiPei(a):
    # string, integer and f are type objects, not str instances,
    # so they must be converted with str() here before matching.
    # Python 2 prints <type 'int'>; Python 3 prints <class 'int'>.
    print(re.findall(r"(?:type|class) '(\w+)'", str(a)))

PiPei(string)
PiPei(integer)
PiPei(f)
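When only one name is expected, `re.search` with `group(1)` returns it directly instead of a one-element list. The sketch below assumes it should accept both the Python 2 form `<type 'int'>` and the Python 3 form `<class 'int'>`; `TYPE_RE` and `type_name` are hypothetical names.

```python
import re

# Accept both "<type 'int'>" (Python 2) and "<class 'int'>" (Python 3)
TYPE_RE = re.compile(r"<(?:type|class) '(\w+)'>")

def type_name(obj):
    """Return the name extracted from str(type(obj)), or None if no match."""
    m = TYPE_RE.search(str(type(obj)))
    return m.group(1) if m else None

print(type_name(123))   # int
print(type_name("hi"))  # str
print(type_name(3.14))  # float
```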

------------------------------------------------------------------------------------------------------------------------------

For the answers to 19-27, see:

Regex Part 3: Data Generator -> http://www.cnblogs.com/Ruby517/p/5802984.html

------------------------------------------------------------------------------------------------------------------------------

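1-28 The area code (the first of the three groups of digits, together with the hyphen that follows it) is optional; that is, the regex should match 800-555-1212 and also 555-1212.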

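# coding: utf-8

import re

num1 = "555-1212"
num2 = "800-555-1212"

# "?" matches the preceding item 0 or 1 times, which makes the area code optional.
# (?: ) is a non-capturing group: we want the text matched by the whole
# regular expression, so the parenthesized group does not need to be saved.
pattern = r"(?:\d{3}-)?\d{3}-\d{4}"
print(re.findall(pattern, num1))
print(re.findall(pattern, num2))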
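The choice of quantifier on the area-code group matters: `+` allows the three-digit group to repeat any number of times, while `?` allows it at most once. The comparison below is a small sketch, assuming Python 3; `LOOSE_RE`, `STRICT_RE`, and the third sample string are made up for illustration.

```python
import re

LOOSE_RE = re.compile(r"(?:\d{3}-)+\d{4}")         # "+": one or more 3-digit groups
STRICT_RE = re.compile(r"(?:\d{3}-)?\d{3}-\d{4}")  # "?": area code 0 or 1 times

samples = ["555-1212", "800-555-1212", "111-222-333-4444"]
loose = [s for s in samples if LOOSE_RE.fullmatch(s)]
strict = [s for s in samples if STRICT_RE.fullmatch(s)]

print(loose)   # all three samples
print(strict)  # only the first two
```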

1-29 Support area codes written with parentheses or with a hyphen (and still optional); make the regex match 800-555-1212 as well as (800) 555-1212

# coding: utf-8

import re

n1 = "555-1212"
n2 = "800-555-1212"
n3 = "(800) 555-1212"

# We want the text matched by the whole regular expression, so the
# area-code group is written as (?: ) and is not saved.
# "|" chooses between the "(800) " and "800-" forms; "?" matches the
# preceding group 0 or 1 times, so the area code stays optional.
pattern = r"(?:\(\d{3}\) |\d{3}-)?\d{3}-\d{4}"
print(re.findall(pattern, n1))
print(re.findall(pattern, n2))
print(re.findall(pattern, n3))
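If the area code itself is needed, capturing groups can be placed inside the non-capturing alternation. This is a rough sketch, assuming Python 3; `PHONE_RE` and `split_phone` are hypothetical names, not part of the original exercise answer.

```python
import re

# Groups 1 and 2 capture the area code in its two spellings;
# group 3 captures the local number. The outer (?: ) is not captured.
PHONE_RE = re.compile(r"(?:\((\d{3})\) |(\d{3})-)?(\d{3}-\d{4})")

def split_phone(text):
    """Return (area_code_or_None, local_number), or None if text is not a phone number."""
    m = PHONE_RE.fullmatch(text)
    if m is None:
        return None
    return (m.group(1) or m.group(2), m.group(3))

print(split_phone("(800) 555-1212"))  # ('800', '555-1212')
print(split_phone("800-555-1212"))    # ('800', '555-1212')
print(split_phone("555-1212"))        # (None, '555-1212')
print(split_phone("800 555-1212"))    # None
```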

Reposted from: https://www.cnblogs.com/Ruby517/p/5800907.html
