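I. Objectives
1. Master the use of regular expression metacharacters.
2. Understand Python's regular expression module, re.
II. Environment
A computer with Python 3.x and PyCharm.
III. Tasks and Requirements
1. Match URLs
Given the following batch of URLs: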
http://www.interoem.com/messageinfo.asp?id=35
http://3995503.com/class/class09/news_show.asp?id=14
http://lib.wzmc.edu.cn/news/onews.asp?id=769
http://www.zy-ls.com/alfx.asp?newsid=377&id=6http://www.fincm.com/newslist.asp?id=415
Expected result after applying the regular expression:
http://www.interoem.com/
http://3995503.com/
http://lib.wzmc.edu.cn/
http://www.zy-ls.com/
http://www.fincm.com/
Source code:
import re

# The URLs are concatenated into a single string with no separators.
a = ('http://www.interoem.com/messageinfo.asp?id=35'
     'http://3995503.com/class/class09/news_show.asp?id=14'
     'http://lib.wzmc.edu.cn/news/onews.asp?id=769'
     'http://www.zy-ls.com/alfx.asp?newsid=377&id=6http://www.fincm.com/newslist.asp?id=415')
# Non-greedy match: 'http://' followed by as few characters as possible
# up to and including the next '/'.
pattern = r'http://.+?/'
matches = re.findall(pattern, a)  # renamed from str to avoid shadowing the built-in
for m in matches:
    print(m)
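To make the role of the non-greedy quantifier concrete, here is a minimal sketch that wraps the match in a function and compares it with a greedy variant. The function name extract_site_roots and the variable urls are hypothetical names chosen for illustration; the URL data is taken from the task statement.

```python
import re

def extract_site_roots(text):
    """Return every 'http://host/' prefix found in text."""
    # .+? is non-greedy: it stops at the first '/' after the scheme.
    return re.findall(r'http://.+?/', text)

# The five URLs from the task, concatenated without separators.
urls = ('http://www.interoem.com/messageinfo.asp?id=35'
        'http://3995503.com/class/class09/news_show.asp?id=14'
        'http://lib.wzmc.edu.cn/news/onews.asp?id=769'
        'http://www.zy-ls.com/alfx.asp?newsid=377&id=6'
        'http://www.fincm.com/newslist.asp?id=415')

for root in extract_site_roots(urls):
    print(root)

# Greedy variant for comparison: .+ runs to the last '/' in the string,
# so the whole input collapses into a single match.
greedy = re.findall(r'http://.+/', urls)
print(len(greedy))
```

The non-greedy version yields one entry per URL, while the greedy version swallows everything between the first 'http://' and the final slash.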
2. Match valid IP addresses
(Format: pattern = 'regular expression'
example = input('Please enter an IP address: ')
print(re.findall(pattern, example))
)
Source code:
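import re

# Earlier attempts, kept for reference only. The first makes \.{3} repeat just the dot and
# leaves the later alternations ungrouped; the second writes 25\d[0-4] where 25[0-5] is needed.
# pattern = r"((\d{1,2})|(1\d{1,2})|(2[0-4]\d)|(25[0-5]))\.{3}(\d{1,2})|(1\d{1,2})|(2[0-4]\d)|(25[0-5])(\d{1,2})|(1\d{1,2})|(2[0-4]\d)|(25[0-5])(\d{1,2})|(1\d{1,2})|(2[0-4]\d)|(25[0-5])"
# pattern1 = r"(((\d{1,2})|(1\d{1,2})|(2[0-4]\d)|(25\d[0-4]))\.*){6,7}"
# Each octet is 250-255, 200-249, or 0-199; ^ and $ force the whole input to be one dotted quad.
pattern1 = r"^((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})(\.((2(5[0-5]|[0-4]\d))|[0-1]?\d{1,2})){3}$"
example = input('Please enter an IP address: ')
# example = "255.255.1.1"
# Prints a Match object for a valid address, or None otherwise.
print(re.search(pattern1, example))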
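The octet logic can also be written once and reused, which makes the pattern easier to check by eye. The following is a minimal sketch under stated assumptions: the names OCTET, IPV4 and is_valid_ipv4 are hypothetical, and, unlike the pattern in the source code, this variant deliberately rejects octets with leading zeros such as 01.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99 without a leading zero.
OCTET = r'(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)'
# Four octets separated by literal dots. {{3}} becomes {3} in the f-string.
IPV4 = re.compile(rf'{OCTET}(?:\.{OCTET}){{3}}')

def is_valid_ipv4(text):
    """Return True if text is exactly one dotted-quad IPv4 address."""
    # fullmatch requires the pattern to cover the entire string,
    # so no explicit ^ and $ anchors are needed.
    return IPV4.fullmatch(text) is not None

for candidate in ('255.255.1.1', '192.168.0.1', '256.1.1.1', '1.2.3', '01.2.3.4'):
    print(candidate, is_valid_ipv4(candidate))
```

The groups are non-capturing, written (?:...), because re.findall returns the captured groups rather than the whole match when a pattern contains capturing groups; with the format suggested in the task, print(re.findall(pattern, example)), capturing groups would print tuples of octet fragments instead of the address.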
3. Match all valid email addresses (same format as above)
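import re

# (com|cn|net) is an alternation; [com,cn,net] is a character class that also
# accepts suffixes such as "moc". {1,19} requires a non-empty local part.
pattern = r"^[0-9a-zA-Z_]{1,19}@[0-9a-zA-Z]{1,13}\.(com|cn|net)$"
# mailbox = "1765211652@qq.com"
mailbox = input('Please enter an email address: ')
print(re.search(pattern, mailbox))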
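The difference between a character class and an alternation group matters for the domain suffix in this task. The sketch below compares the two side by side; the names CHAR_CLASS and ALTERNATION and the example.* addresses are hypothetical, while 1765211652@qq.com is the sample address from the source code.

```python
import re

# [com,cn,net] is a SET of characters (c, o, m, n, e, t and the comma),
# so {1,3} accepts any 1 to 3 of them in any order.
CHAR_CLASS = re.compile(
    r'^[0-9a-zA-Z_]{1,19}@[0-9a-zA-Z]{1,13}\.[com,cn,net]{1,3}$')
# (?:com|cn|net) accepts exactly one of the three listed suffixes.
ALTERNATION = re.compile(
    r'^[0-9a-zA-Z_]{1,19}@[0-9a-zA-Z]{1,13}\.(?:com|cn|net)$')

addresses = ('1765211652@qq.com', 'user@example.cn',
             'user@example.moc', 'user@example.,,')
for address in addresses:
    # Columns: address, character-class result, alternation result.
    print(address,
          bool(CHAR_CLASS.search(address)),
          bool(ALTERNATION.search(address)))
```

Both patterns accept the two well-formed addresses, but only the character-class version also accepts the suffixes "moc" and ",,".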
4. Open test.txt, use a regular expression to extract the numbers from its text, and save them to test1.txt.
(The contents written to test1.txt should look like:
29384845
223444444422
323455111)
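Source code:
import re

# pattern = r'(\d{1,10})[-,*,/](\d{1,11})'
pattern = r'\d{1,11}'  # a run of 1 to 11 digits
with open('test.txt', encoding='utf-8') as f:
    d = f.read()
print(d)
match = re.findall(pattern, d)
# Write the digit groups back to back and start a new line after every
# second group, so that each pair of groups forms one number in test1.txt.
# The with statement closes the output file, which the original never did.
with open('test1.txt', 'w') as f2:
    for i, out in enumerate(match, start=1):
        f2.write(out)
        if i % 2 == 0:
            f2.write('\n')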
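Pairing the digit groups two at a time only works when every number in test.txt is split into exactly two groups. A line-based variant avoids that dependency. This is a minimal sketch, assuming each line of the input holds one number interrupted by non-digit characters; the function name extract_numbers and the sample input are hypothetical, since the real contents of test.txt are not shown in the report.

```python
import re
import tempfile
from pathlib import Path

def extract_numbers(src, dst):
    """Join the digit runs on each line of src; write one number per line to dst."""
    with open(src, encoding='utf-8') as fin, \
         open(dst, 'w', encoding='utf-8') as fout:
        for line in fin:
            # \d+ matches each maximal run of digits on the line.
            digits = ''.join(re.findall(r'\d+', line))
            if digits:  # skip lines that contain no digits at all
                fout.write(digits + '\n')

# Hypothetical sample input, invented for illustration only.
sample = '2938-4845\n22344*4444422\nno digits here\n323/455111\n'

# Work in a temporary directory so no real files are touched.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / 'test.txt'
    dst = Path(tmp) / 'test1.txt'
    src.write_text(sample, encoding='utf-8')
    extract_numbers(src, dst)
    result = dst.read_text(encoding='utf-8')

print(result)
```

Processing line by line keeps each output number tied to one input line, regardless of how many separators the line contains.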