Contents
Course review:
- Python basics
- Functions
Today's outline:
- Modules
  - db.py
  - excel.py
  - app.py
- Custom modules
- Built-in modules
import random
num = random.randint(10, 99)
- Third-party modules
  - Download
  - Use
1. Custom modules
1.1 Modules and packages
- Module: a .py file
- Package: a folder
1.2 Import issues
1.2.1 Paths
- When import db runs, the interpreter searches the directories in sys.path, in order:
import db
[
    '/Users/wupeiqi/PycharmProjects/day03',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests-2.26.0-py3.9.egg',
    '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/charset_normalizer-2.0.7-py3.9.egg',
    '/Applications/PyCharm.app/Contents/plugins/python/helpers/pycharm_matplotlib_backend'
]
- To make another directory searchable, append it to sys.path:
import sys
sys.path.append(r'C:\Users\86136\Desktop')
After the append, the Python files in that directory can be found by the interpreter.
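The effect of appending to sys.path can be tried without touching a real project. The sketch below is only an illustration: the temporary directory and the module name demo_helper_mod are made up for the demo and are not part of the course files.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding a tiny module (hypothetical name).
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "demo_helper_mod.py"), "w", encoding="utf-8") as f:
    f.write("def get_email():\n    return 'demo@example.com'\n")

# Before the append, the directory is not on the search path.
found_before = extra_dir in sys.path

# After the append, the interpreter can locate modules stored there.
sys.path.append(extra_dir)
importlib.invalidate_caches()
demo_helper_mod = importlib.import_module("demo_helper_mod")
email = demo_helper_mod.get_email()
print(found_before, email)
```

Appending affects only the running process; the next run starts again from the default sys.path.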
1.2.2 Import styles
- import: generally used for .py files that sit in the same directory as the current .py file.
app.py
db.py
import db
- from xxx import: used when files are nested inside folders.
# import utils.xx.helper
# data = utils.xx.helper.get_email()
# print(data)
from utils.xx import helper
data = helper.get_email()
print(data)
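Common problem:
No matter how deeply your code is nested, an import statement must spell out the path from a directory listed in sys.path down to the module.
├── __pycache__
├── app.py
├── db.py
└── utils
    └── xx
        ├── excel.py
        └── helper.py
# excel.py
import helper                # wrong
from utils.xx import helper  # correct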
1.3 Main file
The main function is the function that runs automatically when the program is executed.
def run():
    pass

if __name__ == '__main__':
    run()
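Summary
- When the program runs and imports a module, where does it look?
import sys
print(sys.path)
  - [First] the directory that contains the .py file being run
  - [Then] the Python installation directories
- Telling the main file apart from other files
- Importing modules
import file
from xx import folder/file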
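2. Third-party modules
These are .py files or folders written by other people; we download them and then use them in our code.
The tool for downloading and installing third-party modules is pip.
- Open a terminal, or the Terminal panel in PyCharm
- Enter the install command
/Library/Frameworks/Python.framework/Versions/3.9/bin/pip3.9 install requests
If the directory that contains pip is added to the PATH environment variable:
/Library/Frameworks/Python.framework/Versions/3.9/bin
then the prefix is no longer needed when installing third-party modules:
pip3.9 install requests
Windows users
C:\python39
    - Scripts
        - pip.exe
        - pip3.9.exe
    - Lib
        - site-packages
            - requests
            - openpyxl
    - python.exe
C:\python39\Scripts\pip install requests
Add the Scripts directory to the PATH environment variable:
C:\python39\Scripts
then the prefix is no longer needed when installing third-party modules:
pip3.9 install requests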
2.1 Common pip problems
- Upgrading pip
- Slow downloads: switch to a mirror inside China (the Douban mirror)
  - One-off
pip install requests
pip install requests -i https://pypi.douban.com/simple/
  - Permanent configuration
# Run the following command in a terminal
pip config set global.index-url https://pypi.douban.com/simple/
# Install a module
pip install openpyxl
  - Other mirrors:
Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
University of Science and Technology of China: https://pypi.mirrors.ustc.edu.cn/simple/
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple/
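2.2 Installing from source
- Download the source package
- Unpack it
- In a terminal, change into the unpacked directory
python3.9 setup.py build
python3.9 setup.py install
2.3 wheel packages
- Let pip read wheel packages directly
pip3.9 install wheel
- pip can then read and install a .whl package directly
pip3.9 install requests2-2.16.0-py2.py3-none-any.whl
Note: when a package fails to install because of compilation problems, try installing it from a wheel.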
Case studies
- Monitoring an API endpoint.
import requests

# The response contains a lot of content
res = requests.get("https://api.luffycity.com/api/v1/course/actual/?limit=12&offset=0&category_id=9999")
# As a string
# print(res.text)
# String -> dict
data_dict = res.json()
for item in data_dict["data"]['result']:
    print(item['name'])
- Sending alert messages through WeCom (企业微信)
  - Create a group
  - Add a bot and obtain its webhook URL
  - Send a request to the webhook URL
import requests

web_hook_url = 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=e69fqwe3c-3572-4d7e-bbd9-34442c086e837'
res = requests.post(
    url=web_hook_url,
    json={
        "msgtype": "text",
        "text": {
            "content": "xxxxx报警信息",
            "mentioned_list": ["@all"]
        }
    }
)
res.close()
What is the requests module for?
It lets our code send requests to an address, in place of a browser.
- Step 1 (the hard part): analyze the whole request.
- Step 2: reproduce the analyzed request in code.
import requests

res = requests.post(
    url="https://api.luffycity.com/api/v1/auth/password/login/?loginWay=password",
    json={"username": "alex", "password": "123123"}
)
# token = the user's credential
print(res.text)
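3. Built-in modules
Functionality that ships with Python itself.
3.1 hashlib
Computes an MD5 hash (digest) of data; these notes refer to this as MD5 "encryption".
import hashlib

data_string = "中国移动"
obj = hashlib.md5()
obj.update(data_string.encode('utf-8'))
result = obj.hexdigest()
print(result)
What is it used for? Storing passwords.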
- User registration
import hashlib

def encrypt(data_string):
    obj = hashlib.md5()
    obj.update(data_string.encode('utf-8'))
    result = obj.hexdigest()
    return result

def run():
    # Prompt for a username and password
    user = input("请输入用户:")
    pwd = input("请输入密码:")
    password = encrypt(pwd)
    line = "{}|{}\n".format(user, password)
    # Saving to a database: connect, send, save
    # Saving to a file: open the file, write the content, close the file
    file_object = open("db.txt", mode='a', encoding='utf-8')
    file_object.write(line)
    file_object.close()

if __name__ == '__main__':
    run()
- User login
The user enters a username and a plaintext password; the password is hashed with MD5; the username and hashed password are then compared against the stored records.
import hashlib

def encrypt(data_string):
    obj = hashlib.md5()
    obj.update(data_string.encode('utf-8'))
    result = obj.hexdigest()
    return result

def run():
    # User login
    user = input("请输入用户:")
    pwd = input("请输入密码:")
    password = encrypt(pwd)
    # Open the file, read the content, close the file
    file_object = open("db.txt", mode='r', encoding='utf-8')
    data_string = file_object.read()
    file_object.close()
    data_list = data_string.strip().split('\n')
    for item in data_list:
        name, passwd = item.split('|')  # [username, hashed password]
        if name == user and password == passwd:
            print("登录成功")
            break

if __name__ == '__main__':
    run()
Adding a "salt" to MD5 means, in essence, mixing a string of our own into the data before hashing.
import hashlib

data_string = "admin"
obj = hashlib.md5("asdfa99F99123ks@ldjlkjlksdf".encode('utf-8'))
obj.update(data_string.encode('utf-8'))
result = obj.hexdigest()
print(result)
Recommendation: whenever you use MD5, add a salt.
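A minimal sketch of what the salt changes, reusing the illustrative salt string from the example above. The encrypt helper with a salt parameter is an assumption made for this demo; it mirrors the encrypt function used earlier but is not taken from the course code.

```python
import hashlib

SALT = "asdfa99F99123ks@ldjlkjlksdf"  # illustrative salt copied from the notes

def encrypt(data_string, salt=SALT):
    """Return the hex MD5 digest of salt followed by data_string."""
    obj = hashlib.md5(salt.encode("utf-8"))
    obj.update(data_string.encode("utf-8"))
    return obj.hexdigest()

# Unsalted digest of the same password, for comparison
plain = hashlib.md5("admin".encode("utf-8")).hexdigest()
salted = encrypt("admin")
# Passing the salt to the constructor and the data to update()
# is equivalent to hashing the concatenated bytes in one go.
concatenated = hashlib.md5((SALT + "admin").encode("utf-8")).hexdigest()
print(plain)
print(salted)
```

Because the salted digest differs from the plain one, a lookup table built for unsalted MD5 values of common passwords no longer matches the stored value.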
3.2 json module
- Deserialization: a JSON-formatted string => Python data types
import json data_string = '{"code":0,"data":{"count":2,"next":null,"previous":null,"result":[{"id":395,"name":"Word+PDF自动化办公","cover":"//hcdn2.luffycity.com/media/frontend/course/%E5%88%97%E8%A1%A8%E5%9B%BE_s4ZkvP5.png","course_img":"//hcdn2.luffycity.com/media/frontend/course/列表图_s4ZkvP5.png","hours":"2","numbers":"19","complete_numbers":19,"course_slogan":"Word+PDF自动化办公","learn_number":1007,"first_section_id":null,"teacher":{"id":32,"image":"//hcdn2.luffycity.com/media/frontend/activity/bo_1544078972.8352425.png","videos":[],"name":"波波","role":0,"title":"前百度数据挖掘工程师","signature":"曾就职于百度,任爬虫开发工程师","brief":"曾就职于百度,任爬虫开发工程师,擅长数据分析和爬虫技术,喜欢研究各种爬虫奇淫巧技,专治各种反爬取疑难杂症。"},"free_sections":[{"id":39668,"name":"Word自动化办公-写入指定样式的数据","free_trail":false},{"id":39681,"name":"PDF自动化办公-文档加密","free_trail":false},{"id":39664,"name":"Word自动化办公-环境安装介绍","free_trail":false},{"id":39669,"name":"Word自动化办公-单独设置文本样式","free_trail":false}],"payment_info":{"has_price":true,"price":19.9,"origin_price":59.0,"valid_period":10000,"is_promotion":true,"promotion_name":"限时折扣","promotion_price":"19.90","promotion_end_date":"2022-04-30 
23:59:59"},"is_buy":false,"is_new":true,"learning_path_name":"ai"},{"id":396,"name":"Yuan老师带你深入浅出学Django","cover":"//hcdn2.luffycity.com/media/frontend/course/%E5%88%97%E8%A1%A8%E5%9B%BE_q8n8F4H.png","course_img":"//hcdn2.luffycity.com/media/frontend/course/列表图_q8n8F4H.png","hours":"5","numbers":"11","complete_numbers":34,"course_slogan":"Yuan老师带你深入浅出学Django","learn_number":1063,"first_section_id":null,"teacher":{"id":20,"image":"//hcdn2.luffycity.com/media/frontend/activity/yuanhao%403x_1517450106.3359919.png","videos":[{"title":"web应用程序1","img":"/static/frontend/degree_course_detail/苑昊-3_1533097716.7837143.jpeg","vid":"ff0a0f81dae94a3a941d01b873eac9ab","video_time":"19:02","play_count":654},{"title":"django基础介绍","img":"/static/frontend/degree_course_detail/苑昊-1_1533097717.4351325.jpeg","vid":"3b9b08e1aa924fa8addb1ca771488c8a","video_time":"09:30","play_count":456}],"name":"Avrion","role":0,"title":"路飞学城高级讲师","signature":"擅长Python开发/生物图像自动识别及处理技术","brief":"路飞学城高级讲师,曾参与新加坡南洋理工大学大数据医疗相关项目,就职过多家互联网企业,有着多年开发经验,精通java,python,go等编程语言,Uric开源软件作者,致力于人工智能与大数据方向,对机器学习,深度学习等算法有深度研究。"},"free_sections":[{"id":39684,"name":"Django开篇","free_trail":false},{"id":40831,"name":"模板继承","free_trail":false},{"id":40838,"name":"django的查询API","free_trail":false},{"id":40849,"name":"图书管理系统案例","free_trail":false}],"payment_info":{"has_price":true,"price":79.0,"origin_price":399.0,"valid_period":10000,"is_promotion":true,"promotion_name":"限时折扣","promotion_price":"79.00","promotion_end_date":"2022-04-30 23:59:59"},"is_buy":false,"is_new":true,"learning_path_name":"python"}]}}' data_dict = json.loads(data_string) print(data_dict) print(data_dict['code'])
- Serialization: Python data types => a JSON-formatted string
import json

data_dict = {"k1": 123, "k2": 456, 'k3': True, 'k4': [11, 22, 33]}
data_string = json.dumps(data_dict)
print(data_string)
What is JSON? It is a format, and it exists in the form of a string.
v1 = "{'k1':123,'k2':456}"                     -> a string, but not JSON (JSON strings use double quotes)
v2 = '{"k1":123,"k2":456}'                     -> a string, and JSON
v3 = '{"k1":123,"k2":456,"k3":True}'           -> a string, but not JSON (JSON writes true)
v4 = '{"k1":123,"k2":456,"k3":true}'           -> a string, and JSON
v5 = '{"k1":123,"k2":456,"k3":(11,22,33)}'     -> a string, but not JSON (JSON has no tuples)
v6 = '{"k1":123,"k2":456,"k3":[11,22,33]}'     -> a string, and JSON
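The distinction above can be checked mechanically: json.loads raises json.JSONDecodeError for text that is not valid JSON. The helper name is_json and the sample labels are made up for this sketch.

```python
import json

def is_json(text):
    """Return True if text parses as JSON, otherwise False."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True

# The six example strings from the notes, labeled for the demo
samples = {
    "v1": "{'k1':123,'k2':456}",
    "v2": '{"k1":123,"k2":456}',
    "v3": '{"k1":123,"k2":456,"k3":True}',
    "v4": '{"k1":123,"k2":456,"k3":true}',
    "v5": '{"k1":123,"k2":456,"k3":(11,22,33)}',
    "v6": '{"k1":123,"k2":456,"k3":[11,22,33]}',
}
results = {name: is_json(text) for name, text in samples.items()}
print(results)
```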
import json

# A tuple is written out as a JSON array
data_dict = {"k1": 123, "k2": 456, 'k3': True, 'k4': (11, 22, 33)}
data_string = json.dumps(data_dict)
print(data_string)

import json

data_string = '{"k1":123,"k2":456}'
v1 = json.loads(data_string)
print(v1)
json cannot serialize every data type; the supported conversions are:
+-------------------+---------------+
| Python            | JSON          |
+===================+===============+
| dict              | object        |
+-------------------+---------------+
| list, tuple       | array         |
+-------------------+---------------+
| str               | string        |
+-------------------+---------------+
| int, float        | number        |
+-------------------+---------------+
| True              | true          |
+-------------------+---------------+
| False             | false         |
+-------------------+---------------+
| None              | null          |
+-------------------+---------------+
import json
import datetime

ctime = datetime.datetime.now()
ctime_string = ctime.strftime("%Y-%m-%d %H:%M:%S")  # returns a string
# data_dict = {"k1": 123, "k2": ctime}  # raises TypeError: datetime is not JSON serializable
data_dict = {"k1": 123, "k2": ctime_string}
v1 = json.dumps(data_dict)
print(v1)
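Converting each unsupported value by hand gets tedious. As an alternative sketch, json.dumps accepts a default callable that it invokes for any object it cannot serialize; the function name to_jsonable and the fixed timestamp are assumptions chosen for the demo.

```python
import datetime
import json

def to_jsonable(value):
    """Fallback that json.dumps calls for objects it cannot serialize itself."""
    if isinstance(value, (datetime.datetime, datetime.date)):
        return value.strftime("%Y-%m-%d %H:%M:%S")
    raise TypeError("Cannot serialize {!r}".format(type(value)))

# A fixed datetime keeps the output repeatable
data_dict = {"k1": 123, "k2": datetime.datetime(2022, 4, 14, 19, 18, 17)}
data_string = json.dumps(data_dict, default=to_jsonable)
print(data_string)  # {"k1": 123, "k2": "2022-04-14 19:18:17"}
```

Loading the string back with json.loads returns the timestamp as a plain str; turning it into a datetime again is a separate strptime step.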
Case: a small crawler
import requests
import json

# The response contains a lot of content
res = requests.get("https://api.luffycity.com/api/v1/course/actual/?limit=12&offset=0&category_id=9999")
# As a string (JSON-formatted)
print(res.text)
# Convert the JSON string fetched from another site into Python data types
data_dict = json.loads(res.text)
print(data_dict)
Case: a website
pip3.9 install flask
import json
from flask import Flask

app = Flask(__name__)

# http://127.0.0.1:5000/login
@app.route("/login")
def login():
    return "登录"

# http://127.0.0.1:5000/users
@app.route("/users")
def users():
    data_dict = {"code": 0, "data": [11, 22, 33, 44]}
    return json.dumps(data_dict)

if __name__ == '__main__':
    app.run()
3.3 os module
- Absolute paths
import os

# /Users/wupeiqi/PycharmProjects/day03/db.txt
v1 = os.path.abspath("db.txt")
print(v1)

v2 = os.path.abspath(__file__)
print(v2)
- Joining paths
import os

# mac
# file_path = "files/account.txt"
# win
# file_path = r"files\naccount.txt"
# Join the parts with the separator of the current operating system
file_path = os.path.join("files", "xx", "fff", "account.txt")
print(file_path)
- Checking whether a file or folder exists
import os

file_path_1 = os.path.join("files", "account.txt")
file_path_2 = os.path.join("files", "xx", "fff", "account.txt")

v1 = os.path.exists(file_path_1)  # True/False
print(v1)

v2 = os.path.exists(file_path_2)  # True/False
print(v2)
- Creating folders
import os

folder_path = os.path.join("file", "db", "work")
if not os.path.exists(folder_path):
    os.makedirs(folder_path)
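Case: user registration. Create one file per user: the file name is the username and the content is the password. All data goes under the db directory.
import os

def run():
    # 1. Read the username and password
    user = input("用户名:")
    pwd = input("密码:")

    # 2. Create the folder first
    folder_path = os.path.abspath("db")
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    # 3. Write the content inside that directory
    # r: read mode
    # a: append mode (creates the file if it does not exist)
    # w: write mode (empties the existing file, then writes)
    file_path = os.path.join(folder_path, "{}.txt".format(user))
    file_object = open(file_path, mode="w", encoding='utf-8')
    # Write the content
    file_object.write(pwd)
    # Close the file
    file_object.close()

if __name__ == '__main__':
    run()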
- os.listdir: lists what a directory contains, one level only
import os

# name_list = os.listdir("/Users/wupeiqi/PycharmProjects/day03/db")
# print(name_list)  # ['wupeiqi.txt', 'root.txt']
name_list = os.listdir("/Users/wupeiqi/PycharmProjects/mtb/mtb")
print(name_list)  # ['asgi.py', '__init__.py', '__pycache__', 'local_settings.py', 'celery.py', 'settings.py', 'urls.py', 'wsgi.py']
- os.walk: lists what a directory contains, visiting every subdirectory
import os

obj = os.walk("/Users/wupeiqi/PycharmProjects/mtb/amazon")
for in_folder, b, file_list in obj:
    # in_folder = /Users/wupeiqi/PycharmProjects/mtb/mtb/__pycache__   the directory being visited
    # b = ['__pycache__']   all folders inside that directory
    # file_list = ['asgi.py', '__init__.py', 'local_settings.py', 'celery.py', 'settings.py', 'urls.py', 'wsgi.py']   all files inside that directory
    for name in file_list:
        file_abs_path = os.path.join(in_folder, name)
        print(file_abs_path)

import os

obj = os.walk("/Users/wupeiqi/PycharmProjects/mtb/amazon")
for in_folder, b, file_list in obj:
    for name in file_list:
        # xxxxx.txt
        # xxxxx.png
        ext = name.split(".")[-1]
        if ext == "pyc":
            file_abs_path = os.path.join(in_folder, name)
            print(file_abs_path)
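The extension filter above can be tried on any machine by building a small throwaway tree first. This is only a sketch: the helper name find_by_ext and the file names are invented for the demo, and os.path.splitext is swapped in for the split(".") approach.

```python
import os
import tempfile

def find_by_ext(root, ext):
    """Return sorted paths of files under root whose extension equals ext, e.g. ".pyc"."""
    matches = []
    for in_folder, _folders, file_list in os.walk(root):
        for name in file_list:
            # splitext("a.b.pyc") -> ("a.b", ".pyc")
            if os.path.splitext(name)[1] == ext:
                matches.append(os.path.join(in_folder, name))
    return sorted(matches)

with tempfile.TemporaryDirectory() as root:
    # Build root/app.py and root/pkg/__pycache__/app.cpython-39.pyc
    cache_dir = os.path.join(root, "pkg", "__pycache__")
    os.makedirs(cache_dir)
    for path in (os.path.join(root, "app.py"),
                 os.path.join(cache_dir, "app.cpython-39.pyc")):
        open(path, "w").close()
    # Report paths relative to the temporary root
    found = [os.path.relpath(p, root) for p in find_by_ext(root, ".pyc")]
print(found)
```

os.path.splitext returns an empty extension for a name without a dot, whereas name.split(".")[-1] returns the whole name in that case.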
3.4 random
import random

# A random integer between 100000 and 999999, both ends included
v1 = random.randint(100000, 999999)
print(v1)

import random

num_list = [11, 22, 33, 44]
num = random.choice(num_list)
print(num)

xxxx = '0123456789'
print("".join([random.choice(xxxx) for i in range(5)]))  # pick 5 random digits

import random

def f1():
    print("f1")

def f2():
    print("f2")

def f3():
    print("f3")

def f4():
    print("f4")

num_list = [f1, f2, f3, f4]
func = random.choice(num_list)
func()
Note: this is used quite often.
import random

num_list = [99, 11, 8, 22, 18, 33, 44]
random.shuffle(num_list)
print(num_list)
Case: a deck of cards, then drawing a card.
import random

color_list = ["黑桃", "红桃", "梅花", "方片"]
num_list = range(1, 14)  # 1, 2, 3, ..., 13 (14 is excluded)

# poke_list = [ ("黑桃",1), ("黑桃",2), ..., ("黑桃",13) ]
poke_list = []
for color in color_list:
    # color = "黑桃"
    for num in num_list:
        # num = 1, 2, 3, ..., 13
        group = (color, num)
        poke_list.append(group)

# Shuffle
random.shuffle(poke_list)

# Draw a card
data = random.choice(poke_list)
print(data)
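random.choice draws one card and may return the same card on repeated calls. To deal several distinct cards, one option is random.sample, sketched below; the hand size of 5 and the seed value are arbitrary choices for the demo.

```python
import random

color_list = ["黑桃", "红桃", "梅花", "方片"]
poke_list = [(color, num) for color in color_list for num in range(1, 14)]

# A seeded generator is used only so that repeated runs behave the same
rng = random.Random(2022)

# sample() picks 5 distinct positions, so no card is dealt twice
hand = rng.sample(poke_list, 5)
remaining = [card for card in poke_list if card not in hand]
print(hand)
print(len(remaining))
```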
3.5 Working with time
- time
- datetime
3.5.1 time
import time

# Timestamp: seconds since 1970-01-01 00:00:00 UTC
v1 = time.time()
print(v1)

import time

while True:
    print(123)
    # Pause for 5 seconds
    time.sleep(5)
3.5.2 datetime
- As datetime objects
import datetime

# Local time
v1 = datetime.datetime.now()
print(v1)

# Current time in a fixed UTC+7 time zone
tz = datetime.timezone(datetime.timedelta(hours=7))
v2 = datetime.datetime.now(tz)
print(v2)

# UTC time
v3 = datetime.datetime.utcnow()
print(v3)
Advantage: adding and subtracting time is easy.
from datetime import datetime, timedelta

# Local time
v1 = datetime.now()
print(v1)

v2 = v1 + timedelta(days=527, hours=19)
print(v2)
- As strings
v1 = "2022-04-14 19:18:17"
v1 = "2022-04-14"
v1 = "2022年04月14日"
Converting between the two forms:
- datetime -> string
from datetime import datetime, timedelta

# Local time
v1 = datetime.now()
print(v1)

# Convert the datetime object to a string
v2 = v1.strftime("%Y%m%d%H%M%S")
print(v2, type(v2))
- string -> datetime
from datetime import datetime, timedelta

v1 = "2022-04-14"
v2 = datetime.strptime(v1, "%Y-%m-%d")
v3 = v2 + timedelta(days=10)
print(v3)
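Putting both conversions together, here is a small round trip: parse a string, shift it with timedelta, format it back, and subtract two datetime objects. The timestamp and the FMT constant are arbitrary values chosen for the demo.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"

# string -> datetime
start = datetime.strptime("2022-04-14 19:18:17", FMT)
# datetime arithmetic
end = start + timedelta(days=10, hours=5)
# datetime -> string
end_string = end.strftime(FMT)
# Subtracting two datetime objects yields a timedelta
gap = end - start
print(end_string)             # 2022-04-25 00:18:17
print(gap.days, gap.seconds)  # 10 18000
```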
Cases
- Prompt the user for a phone number, record the sign-up time (the current time), and write the entry to a file in append (a) mode.
15566666666,2022-04-14
15566666666,2022-04-14
15566666666,2022-04-14
15566666666,2022-04-14
from datetime import datetime

def run():
    phone = input("手机号:")
    ctime_string = datetime.now().strftime("%Y-%m-%d")
    line = "{},{}\n".format(phone, ctime_string)
    # Open the file and append the line
    f = open("data.txt", encoding='utf-8', mode='a')
    f.write(line)
    f.close()

if __name__ == '__main__':
    run()
- Working with folders
Let users register, with one file per day. Design:
database
    20220414.txt
    20220415.txt
    20220416.txt
import os
from datetime import datetime

def run():
    phone = input("手机号:")
    # File name follows the design above, e.g. 20220414.txt
    ctime_string = datetime.now().strftime("%Y%m%d")
    # Append mode creates a missing file but not a missing folder
    os.makedirs("database", exist_ok=True)
    file_path = os.path.join("database", "{}.txt".format(ctime_string))
    f = open(file_path, encoding='utf-8', mode='a')
    f.write(phone + "\n")
    f.close()

if __name__ == '__main__':
    run()
3.6 ini format
my.ini
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-bin=py-mysql-bin
character-set-server=utf8
collation-server=utf8_general_ci
log-error=/var/log/mysqld.log
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[client]
default-character-set=utf8
Reading data
import configparser

# The parser analyzes the file content for us
parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

# v1 = parser.sections()
# print(v1)  # ['mysqld', 'mysqld_safe', 'client']
#
# v2 = parser.items("mysqld_safe")
# for k, v in v2:
#     print(k, v)

v3 = parser.get("mysqld", "socket")
print(v3)
Deleting
import configparser

parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

# Remove in memory
# parser.remove_option("mysqld", "socket")
parser.remove_section("mysqld_safe")
# Write the result back to the file
with open('my.ini', encoding='utf-8', mode='w') as f:
    parser.write(f)
Modifying or adding
import configparser

parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

parser.add_section("group")
parser.set("group", "datadir", "xxxxx")
# Write the result back to the file
with open('my.ini', encoding='utf-8', mode='w') as f:
    parser.write(f)
In later projects, an ini file like this can serve as the project's configuration file.
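When an ini file is used as project configuration, typed getters and fallback values are often handy. The sketch below parses a shortened copy of my.ini from a string so that it runs without a file on disk; the port option and its fallback value "3306" are assumptions added for the demo.

```python
import configparser

# Shortened copy of my.ini, held in a string for the demo
INI_TEXT = """
[mysqld]
datadir=/var/lib/mysql
symbolic-links=0

[client]
default-character-set=utf8
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

datadir = parser.get("mysqld", "datadir")
# getint converts the stored text "0" to the integer 0
symlinks = parser.getint("mysqld", "symbolic-links")
# fallback is returned when the option is missing, instead of raising an error
port = parser.get("mysqld", "port", fallback="3306")
has_client = parser.has_section("client")
print(datadir, symlinks, port, has_client)
```

Every value in an ini file is stored as text, so conversions such as getint, getfloat, and getboolean are explicit.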
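3.7 Regular expressions
- Regular expressions
- The re module
import re

text = "楼主太手机号也可18731255799牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
# Extract the mobile numbers from the text
# Pattern for a mobile number: 1[3589]\d{9}
# (inside [], | is a literal character, so [3|5|8|9] would also accept "|")
data = re.findall(r"1[3589]\d{9}", text)
print(data)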
3.7.1 Characters
- wupeiqi
Matches the literal text wupeiqi.
import re

text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
data_list = re.findall("wupeiqi", text)
print(data_list)  # ['wupeiqi', 'wupeiqi']  can be used to count how often a substring occurs
- [abc]
Matches one character: a, b, or c.
import re

text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqaceiqiff"
data_list = re.findall("[abc]", text)
print(data_list)  # ['b', 'a', 'a', 'a', 'b', 'b', 'a', 'c']

import re

text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqcceiqiff"
data_list = re.findall("q[abc]", text)
print(data_list)  # ['qa', 'qc']
- [^abc]
Matches any one character other than a, b, and c.
import re

text = "你wffbbupceiqiff"
data_list = re.findall("[^abc]", text)
print(data_list)  # ['你', 'w', 'f', 'f', 'u', 'p', 'e', 'i', 'q', 'i', 'f', 'f']
- [a-z]
Matches any one character from a to z ([0-9] works the same way).
import re

text = "alexrootrootadmin"
data_list = re.findall("t[a-z]", text)
print(data_list)  # ['tr', 'ta']
- .
Stands for any one character except a newline.
import re

text = "alexraotrootadmir9on"
data_list = re.findall("r.o", text)
print(data_list)  # ['rao', 'roo', 'r9o']

import re

text = "alexraotrootadmin"
data_list = re.findall("r.+o", text)  # greedy match
print(data_list)  # ['raotroo']

import re

text = "alexraotrootadmin"
data_list = re.findall("r.+?o", text)  # non-greedy match
print(data_list)  # ['rao', 'roo']
- \w
Stands for a letter, digit, or underscore (Chinese characters match as well).
import re

text = "北京武沛alex齐北 京武沛alex齐"
data_list = re.findall(r"武\wa", text)
print(data_list)  # ['武沛a', '武沛a']

import re

text = "北京武沛alex齐北 京武沛alex齐"
data_list = re.findall(r"武\w+x", text)
print(data_list)  # ['武沛alex', '武沛alex']
- \d
Stands for a digit.
import re

text = "root-ad32min-add3-admd1in"
data_list = re.findall(r"d\d", text)
print(data_list)  # ['d3', 'd3', 'd1']

import re

text = "root-ad32min-add3-admd1in"
data_list = re.findall(r"d\d+", text)
print(data_list)  # ['d32', 'd3', 'd1']
About characters in a regular expression (by default each stands for exactly one character):
Literal, e.g. a, b, c
Range, e.g. [a-z] [0-9] [A-Z]
Digit, e.g. \d
Letter, digit, or underscore (also Chinese characters), e.g. \w
Any character, e.g. .
Expressing counts:
10 digits               \d\d\d\d\d\d\d\d\d\d
10 digits               \d{10}
1 or more digits        \d+
0 or 1 digit            \d?
any number of digits    \d*
3.7.2 Quantifiers
- *
Repeats 0 or more times.
import re

text = "他是大B个,确实是个大2B。大22B。大29B。"
data_list = re.findall("大2*B", text)
print(data_list)  # ['大B', '大2B', '大22B']

import re

text = "他是大B个,确实是个大2B。大22B。大29B。"
data_list = re.findall(r"大\d*B", text)
print(data_list)  # ['大B', '大2B', '大22B', '大29B']
- +
Repeats 1 or more times.
import re

text = "他是大B个,确实是个大2B,大3B,大66666B。"
data_list = re.findall(r"大\d+B", text)
print(data_list)  # ['大2B', '大3B', '大66666B']
- ?
Repeats 0 or 1 time.
import re

text = "他是大B个,确实是个大2B,大3B,大66666B。"
data_list = re.findall(r"大\d?B", text)
print(data_list)  # ['大B', '大2B', '大3B']
- {n}
Repeats exactly n times.
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"151312\d{5}", text)
print(data_list)  # ['15131255789']
- {n,}
Repeats n or more times.
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"\d{9,}", text)
print(data_list)  # ['442662578', '15131255789']
- {n,m}
Repeats between n and m times.
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"\d{10,15}", text)
print(data_list)  # ['15131255789']
Quantifiers at a glance:
? + * {}
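To compare the quantifiers side by side, the sketch below applies each of them to the same short text. The sample text and the dictionary layout are made up for the demo.

```python
import re

text = "a1 b22 c333 d4444 e"

# One letter followed by a quantified run of digits
patterns = [
    r"[a-e]\d*",      # 0 or more digits
    r"[a-e]\d+",      # 1 or more digits
    r"[a-e]\d?",      # 0 or 1 digit
    r"[a-e]\d{3}",    # exactly 3 digits
    r"[a-e]\d{2,}",   # 2 or more digits
    r"[a-e]\d{2,3}",  # 2 to 3 digits
]
results = {pattern: re.findall(pattern, text) for pattern in patterns}
for pattern, found in results.items():
    print(pattern, found)
```

With {3} and {2,3}, the match for d4444 stops after three digits (d444), because a quantifier never consumes more than its upper bound.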
3.7.3 Greedy matching
import re

text = "alexraotrootadmin"
data_list = re.findall("r.+o", text)  # greedy: match as much as possible
print(data_list)  # ['raotroo']

import re

text = "alexraotrootadmin"
data_list = re.findall("r.+?o", text)  # non-greedy: match as little as possible
print(data_list)  # ['rao', 'roo']
3.7.4 Groups
- Extracting part of a match
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"15131(2\d{5})", text)
print(data_list)  # ['255789']

import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"15(1\d)1(2\d{5})", text)
print(data_list)  # [('13', '255789')]

import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
data_list = re.findall(r"(15(1\d)1(2\d{5}))", text)
print(data_list)  # [('15131255789', '13', '255789')]
- Alternation (or) combined with extraction
import re

text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
# 151312\d{5}
# 15131r\w+太
data_list = re.findall(r"15131(2\d{5}|r\w+太)", text)
print(data_list)  # ['root太', '255789']

import re

text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
# 151312\d{5}
# 15131r\w+太
data_list = re.findall(r"(15131(2\d{5}|r\w+太))", text)
print(data_list)  # [('15131root太', 'root太'), ('15131255789', '255789')]
Cases
- Matching QQ numbers
[1-9]\d{4,}
- ID card numbers
import re

text = "dsf130429191912015219k13042919591219521Xkk"
data_list = re.findall(r"\d{17}[\dX]", text)  # [abc]
print(data_list)  # ['130429191912015219', '13042919591219521X']

import re

text = "dsf130429191912015219k13042919591219521Xkk"
data_list = re.findall(r"\d{17}(\d|X)", text)
print(data_list)  # ['9', 'X']

import re

text = "dsf130429191912015219k13042919591219521Xkk"
data_list = re.findall(r"(\d{17}(\d|X))", text)
print(data_list)  # [('130429191912015219', '9'), ('13042919591219521X', 'X')]

import re

text = "dsf130429191912015219k13042919591219521Xkk"
data_list = re.findall(r"\d{6}(\d{4})(\d{2})(\d{2})\d{3}[\dX]", text)
print(data_list)  # [('1919', '12', '01'), ('1959', '12', '19')]

import re

text = "dsf130429191912015219k13042919591219521Xkk"
data_list = re.findall(r"(\d{6}(\d{4})(\d{2})(\d{2})\d{3}[\dX])", text)
print(data_list)  # [('130429191912015219', '1919', '12', '01'), ('13042919591219521X', '1959', '12', '19')]
- Mobile numbers
import re

text = "我的手机号是15133377892,你的手机号是1171123啊?"
data_list = re.findall(r"1[3-9]\d{9}", text)
print(data_list)  # ['15133377892']
- Email addresses
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
email_list = re.findall(r"\w+@\w+\.\w+", text)
print(email_list)  # ['442662578@qq.com和xxxxx']  \w also matches Chinese characters

import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
email_list = re.findall(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+", text)
print(email_list)  # ['442662578@qq.com', 'xxxxx@live.com']

import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
# re.ASCII restricts \w to ASCII letters, digits, and underscore
email_list = re.findall(r"\w+@\w+\.\w+", text, re.ASCII)
print(email_list)  # ['442662578@qq.com', 'xxxxx@live.com']

import re

text = "楼主太牛44266-2578@qq.com逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
email_list = re.findall(r"(\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)", text, re.ASCII)
print(email_list)  # [('44266-2578@qq.com', '-2578', '', ''), ('442662578@qq.com', '', '', ''), ('xxxxx@live.com', '', '', '')]
3.7.5 The re module
- findall: returns every match in the text
import re

text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
email_list = re.findall(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+", text)
print(email_list)  # ['442662578@qq.com', 'xxxxx@live.com']
- match: matches only at the beginning of the text
import re

text = "大小逗2B最逗3B欢乐"
data = re.match(r"逗\dB", text)
print(data)  # None

import re

text = "逗2B最逗3B欢乐"
data = re.match(r"逗\dB", text)
print(data)  # a Match object
v1 = data.group()
print(v1)
- search: scans the whole text and returns only the first successful match
import re

text = "大小逗2B最逗3B欢乐"
data = re.search(r"逗\dB", text)
if data:
    print(data.group())
3.7.6 Validation
Start anchor ^
End anchor $
import re

text = input("请输入手机号:")
mtc = re.match(r"^1[3-9]\d{9}$", text)
if mtc:
    print("格式正确", text)
else:
    print("格式错误")
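As an alternative sketch for validation, re.fullmatch requires the whole string to match, so the ^ and $ anchors are not needed. The function name is_valid_phone and the test inputs are assumptions made for this demo.

```python
import re

PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")

def is_valid_phone(text):
    """Return True only if the entire text is an 11-digit mobile number."""
    return PHONE_PATTERN.fullmatch(text) is not None

inputs = ("15133377892", "1171123", "15133377892abc", "x15133377892")
checks = [is_valid_phone(t) for t in inputs]
print(checks)  # [True, False, False, False]

# One difference from the ^...$ form: $ also matches just before a trailing
# newline, so the anchored pattern accepts "15133377892\n"; fullmatch does not.
anchored_accepts_newline = re.match(r"^1[3-9]\d{9}$", "15133377892\n") is not None
fullmatch_accepts_newline = is_valid_phone("15133377892\n")
print(anchored_accepts_newline, fullmatch_accepts_newline)  # True False
```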
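3.8 Multiprocessing
- Creating a process
import multiprocessing

def worker():
    """This function runs in the child process."""
    print('Worker')

if __name__ == '__main__':
    # Create the child process
    p = multiprocessing.Process(target=worker)
    # Start the child process
    p.start()
    # Wait for the child process to finish
    p.join()
- Process pool
import multiprocessing

def worker(num):
    """This function runs in a child process."""
    print('Worker %d' % num)

if __name__ == '__main__':
    # Create a pool of 4 worker processes
    pool = multiprocessing.Pool(4)
    # Distribute the 10 tasks across the pool and wait for the results
    pool.map(worker, range(10))
    # Stop accepting new tasks
    pool.close()
    # Wait for the worker processes to exit
    pool.join()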
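- Queues: communication between processes
import multiprocessing

def producer(q):
    """This function runs in the producer process."""
    for i in range(10):
        q.put(i)

def consumer(q):
    """This function runs in the consumer process."""
    while True:
        item = q.get()
        # None is the agreed signal to stop
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    # Create the queue
    q = multiprocessing.Queue()
    # Create the producer process
    p1 = multiprocessing.Process(target=producer, args=(q,))
    # Create the consumer process
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    # Start both processes
    p1.start()
    p2.start()
    # Wait for the producer to finish
    p1.join()
    # Send the stop signal to the consumer
    q.put(None)
    p2.join()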
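3.9 urllib
from urllib.parse import unquote, quote

txt = '你好'
# URL-encode
new_txt = quote(txt)
print(new_txt)  # %E4%BD%A0%E5%A5%BD

# URL-decode (the variable is not named str, to avoid shadowing the built-in)
encoded_txt = '%E4%BD%A0%E5%A5%BD'
print(unquote(encoded_txt))  # 你好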
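Encoding is most often needed when building a query string. As a sketch, urllib.parse.urlencode encodes a whole dictionary of parameters and parse_qs reverses it; the parameter names limit and q are made up for the demo.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

# Encode and decode a single value
encoded = quote("你好")
decoded = unquote(encoded)

# Encode a dictionary of parameters into a query string
query = urlencode({"limit": 12, "q": "你好"})
# Parse the query string back; every value comes back as a list of strings
parsed = parse_qs(query)

print(encoded)  # %E4%BD%A0%E5%A5%BD
print(query)    # limit=12&q=%E4%BD%A0%E5%A5%BD
print(parsed)   # {'limit': ['12'], 'q': ['你好']}
```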