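python03 - Modules

Contents

Course review:

1. Custom modules

1.1 Modules and packages

1.2 Import issues

1.2.1 Paths

1.2.2 Import styles

1.3 Main file

Summary

2. Third-party modules

2.1 Common pip problems

2.2 Installing from source

2.3 Wheel packages

Examples

3. Built-in modules

3.1 hashlib

3.2 json module

Example: a small crawler

Example: a website

3.3 os module

3.4 random

3.5 Time-related modules

3.5.1 time

3.5.2 datetime

Examples

3.6 INI format

3.7 Regular expressions

3.7.1 Characters

3.7.2 Quantifiers

3.7.3 Greedy matching

3.7.4 Groups

Examples

3.7.5 The re module

3.7.6 Validation

3.8 Multiprocessing

3.9 urllib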

Course review:

  • Python basics

  • Functions

Today's overview:

  • Modules

    • Custom modules

      - db.py
      - excel.py
      - app.py
    • Built-in modules

      import random

      num = random.randint(10,99)
    • Third-party modules

      - Download
      - Use

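1. Custom modules

1.1 Modules and packages

  • Module: a .py file

  • Package: a folder

1.2 Import issues

1.2.1 Paths
  • import db

        When import db runs, the interpreter searches the directories listed in sys.path, in order. For example: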
[
 '/Users/wupeiqi/PycharmProjects/day03', 
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python39.zip',
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9',
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages',
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/requests-2.26.0-py3.9.egg',
 '/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/charset_normalizer-2.0.7-py3.9.egg',
 '/Applications/PyCharm.app/Contents/plugins/python/helpers/pycharm_matplotlib_backend'
]
  • sys.path.append(r'C:\Users\86136\Desktop')

        Once the directory has been added, the interpreter can find the Python files inside it.
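
A minimal sketch of the idea above, assuming a throwaway directory: `extra_dir` and the module name `demo_mod_day03` are made up for illustration and are not part of the course project. It writes a tiny module into a directory that is not on the search path, appends the directory to sys.path, and then imports the module.

```python
import os
import sys
import tempfile

# Hypothetical directory holding extra modules; a temp dir stands in for it here.
extra_dir = tempfile.mkdtemp()

# Write a tiny module into that directory.
with open(os.path.join(extra_dir, "demo_mod_day03.py"), "w", encoding="utf-8") as f:
    f.write("VALUE = 42\n")

# Only append the directory if it is not already on the search path.
if extra_dir not in sys.path:
    sys.path.append(extra_dir)

import demo_mod_day03  # found because extra_dir is now on sys.path

print(demo_mod_day03.VALUE)
```

Appending (rather than inserting at position 0) keeps the directory of the running script and the standard library ahead of the new directory in the search order.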

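1.2.2 Import styles
  • import: typically used to import a .py file located in the same directory as the current .py file.

    app.py
    db.py
    import db
  • from xxx import: used when files are nested inside folders

    # import utils.xx.helper
    # data = utils.xx.helper.get_email()
    # print(data)

    from utils.xx import helper
    data = helper.get_email()
    print(data)

Common problem:

No matter how deep your code sits in the directory tree, an import must be written as the path from a directory in sys.path down to the module.

├── __pycache__
├── app.py
├── db.py
└── utils
    └── xx
        ├── excel.py
        └── helper.py
# excel.py

import helper                 # wrong when app.py is the file being run
from utils.xx import helper   # correct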

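1.3 Main file

Main function: the function that is executed automatically when the program runs. The condition __name__ == '__main__' is true only when the file is run directly, not when it is imported by another file.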

def run():
    pass


if __name__ == '__main__':
    run()

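Summary

  • When a program runs and imports a module, where does Python look?

    import sys
    print(sys.path)
    - [First] the directory containing the .py file being run
    - [Then] the directories under the Python installation
  • Telling the main file apart from the other files

  • Importing modules

    import          file
    from xx import  folder/file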

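2. Third-party modules

These are .py files or folders written by other people; we download them and then use them in our code.

The tool used to download and install third-party modules: pip

  • Open a terminal or PyCharm's Terminal

  • Enter the install command

    /Library/Frameworks/Python.framework/Versions/3.9/bin/pip3.9  install  requests
    If the Python bin directory is added to the PATH environment variable:
       /Library/Frameworks/Python.framework/Versions/3.9/bin
    then later installs no longer need the path prefix:
    	pip3.9  install  requests

    Windows users

    C:\python39
    	- Scripts
    		- pip.exe
    		- pip3.9.exe
    	- Lib
    		- site-packages
    			- requests
    			- openpyxl
    	- python.exe
    C:\python39\Scripts\pip  install requests
    Add the Scripts directory to the PATH environment variable:
    	C:\python39\Scripts
    then later installs no longer need the path prefix:
    	pip3.9  install  requests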

2.1 Common pip problems

  • Upgrading pip

  • Slow downloads: switch to a mirror inside China (e.g. the Douban mirror)

    • One-off

      pip install requests

      pip install requests -i https://pypi.douban.com/simple/
    • Permanent configuration

      # Run the following command in a terminal
      pip config set global.index-url https://pypi.douban.com/simple/

      # Install a module
      pip install openpyxl

Other mirrors:

Alibaba Cloud: http://mirrors.aliyun.com/pypi/simple/
Tsinghua University: https://pypi.tuna.tsinghua.edu.cn/simple/
University of Science and Technology of China (USTC): https://pypi.mirrors.ustc.edu.cn/simple/

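2.2 Installing from source

  • Download the source package

  • Extract it

  • Enter the extracted directory in a terminal

    python3.9 setup.py build
    python3.9 setup.py install

2.3 Wheel packages

  • Install the wheel package so that pip can work with wheel files directly

    pip3.9 install wheel
  • pip can then read a .whl file and install it directly

    pip3.9 install requests2-2.16.0-py2.py3-none-any.whl

Note: if a package fails to install because of compilation problems, try installing a prebuilt wheel instead.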

Examples

  1. Monitoring an API endpoint.

    import requests

    # The response contains a lot of content
    res = requests.get("https://api.luffycity.com/api/v1/course/actual/?limit=12&offset=0&category_id=9999")

    # Response body as a string
    # print(res.text)

    # Parse the JSON response body into a dict
    data_dict = res.json()
    for item in data_dict["data"]['result']:
        print(item['name'])
  2. Sending alert messages through WeCom (企业微信)

    - Create a group chat
    - Add a bot and get its webhook URL
    - Send a request to the webhook URL
    import requests

    web_hook_url = 'https://qyapi.weixin.qq.com/cgi-bin/webhook/send?key=e69fqwe3c-3572-4d7e-bbd9-34442c086e837'
    res = requests.post(
        url=web_hook_url,
        json={
            "msgtype": "text",
            "text": {
                "content": "xxxxx alert message",
                "mentioned_list": ["@all"]
            }
        }

    )
    res.close()

What is the requests module for?

It lets our code send a request to an address (taking the place of a browser).
- Step 1 (the hard part): analyze the whole request.
- Step 2: reproduce the analyzed request in code.
import requests

res = requests.post(
    url="https://api.luffycity.com/api/v1/auth/password/login/?loginWay=password",
    json={"username": "alex", "password": "123123"}
)

# token = the user's credential
print(res.text)

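3. Built-in modules

Functionality that Python itself ships with.

3.1 hashlib

Computes an MD5 digest of data. Strictly speaking, MD5 is a one-way hash rather than encryption: the digest cannot be decrypted back into the original data.

import hashlib

data_string = "中国移动"

obj = hashlib.md5()
obj.update(data_string.encode('utf-8'))
result = obj.hexdigest()
print(result)

What is it good for? Storing passwords.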

  • User registration

    import hashlib


    def encrypt(data_string):
        # Return the hexadecimal MD5 digest of the string
        obj = hashlib.md5()
        obj.update(data_string.encode('utf-8'))
        result = obj.hexdigest()
        return result


    def run():
        # Prompt for a username and password
        user = input("Username: ")
        pwd = input("Password: ")
        password = encrypt(pwd)

        line = "{}|{}\n".format(user, password)

        # Saving to a database: connect, send, save
        # Saving to a file: open the file, write the content, close the file
        with open("db.txt", mode='a', encoding='utf-8') as file_object:
            file_object.write(line)


    if __name__ == '__main__':
        run()
  • User login

    The user enters a username & password (plaintext)

    Hash the password with MD5

    Compare the username + hashed password against the stored records

    import hashlib


    def encrypt(data_string):
        # Return the hexadecimal MD5 digest of the string
        obj = hashlib.md5()
        obj.update(data_string.encode('utf-8'))
        result = obj.hexdigest()
        return result


    def run():
        # User login
        user = input("Username: ")
        pwd = input("Password: ")
        password = encrypt(pwd)

        # Open the file, read its content, close the file
        with open("db.txt", mode='r', encoding='utf-8') as file_object:
            data_string = file_object.read()

        data_list = data_string.strip().split('\n')
        for item in data_list:
            name, passwd = item.split('|')  # [username, hashed password]
            if name == user and password == passwd:
                print("Login successful")
                break
        else:
            # The loop ended without finding a matching record
            print("Login failed")


    if __name__ == '__main__':
        run()

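Adding a "salt" to MD5 essentially means mixing a custom string of our own into the data being hashed.

import hashlib

data_string = "admin"

# The salt is fed into the hash before the actual data
obj = hashlib.md5("asdfa99F99123ks@ldjlkjlksdf".encode('utf-8'))
obj.update(data_string.encode('utf-8'))
result = obj.hexdigest()

print(result)

Recommendation: whenever you hash with MD5, add a salt.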
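
The notes use salted MD5; as a hedged alternative, the sketch below swaps in PBKDF2-HMAC-SHA256 from the same hashlib module, with a random per-user salt. The function names `hash_password` and `verify_password` and the iteration count are illustrative assumptions, not part of the course code.

```python
import hashlib
import hmac
import os


def hash_password(password, salt=None, iterations=100_000):
    """Return (salt, digest) for the given password."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password, salt, expected_digest, iterations=100_000):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt, iterations)
    return hmac.compare_digest(digest, expected_digest)


salt, stored = hash_password("123123")
print(verify_password("123123", salt, stored))
print(verify_password("wrong", salt, stored))
```

Both the salt and the digest have to be stored next to the username, because the salt is needed again to verify a login attempt.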

3.2 json module

  • Deserialization: JSON-formatted string => Python data types

    import json
    
    data_string = '{"code":0,"data":{"count":2,"next":null,"previous":null,"result":[{"id":395,"name":"Word+PDF自动化办公","cover":"//hcdn2.luffycity.com/media/frontend/course/%E5%88%97%E8%A1%A8%E5%9B%BE_s4ZkvP5.png","course_img":"//hcdn2.luffycity.com/media/frontend/course/列表图_s4ZkvP5.png","hours":"2","numbers":"19","complete_numbers":19,"course_slogan":"Word+PDF自动化办公","learn_number":1007,"first_section_id":null,"teacher":{"id":32,"image":"//hcdn2.luffycity.com/media/frontend/activity/bo_1544078972.8352425.png","videos":[],"name":"波波","role":0,"title":"前百度数据挖掘工程师","signature":"曾就职于百度,任爬虫开发工程师","brief":"曾就职于百度,任爬虫开发工程师,擅长数据分析和爬虫技术,喜欢研究各种爬虫奇淫巧技,专治各种反爬取疑难杂症。"},"free_sections":[{"id":39668,"name":"Word自动化办公-写入指定样式的数据","free_trail":false},{"id":39681,"name":"PDF自动化办公-文档加密","free_trail":false},{"id":39664,"name":"Word自动化办公-环境安装介绍","free_trail":false},{"id":39669,"name":"Word自动化办公-单独设置文本样式","free_trail":false}],"payment_info":{"has_price":true,"price":19.9,"origin_price":59.0,"valid_period":10000,"is_promotion":true,"promotion_name":"限时折扣","promotion_price":"19.90","promotion_end_date":"2022-04-30 
23:59:59"},"is_buy":false,"is_new":true,"learning_path_name":"ai"},{"id":396,"name":"Yuan老师带你深入浅出学Django","cover":"//hcdn2.luffycity.com/media/frontend/course/%E5%88%97%E8%A1%A8%E5%9B%BE_q8n8F4H.png","course_img":"//hcdn2.luffycity.com/media/frontend/course/列表图_q8n8F4H.png","hours":"5","numbers":"11","complete_numbers":34,"course_slogan":"Yuan老师带你深入浅出学Django","learn_number":1063,"first_section_id":null,"teacher":{"id":20,"image":"//hcdn2.luffycity.com/media/frontend/activity/yuanhao%403x_1517450106.3359919.png","videos":[{"title":"web应用程序1","img":"/static/frontend/degree_course_detail/苑昊-3_1533097716.7837143.jpeg","vid":"ff0a0f81dae94a3a941d01b873eac9ab","video_time":"19:02","play_count":654},{"title":"django基础介绍","img":"/static/frontend/degree_course_detail/苑昊-1_1533097717.4351325.jpeg","vid":"3b9b08e1aa924fa8addb1ca771488c8a","video_time":"09:30","play_count":456}],"name":"Avrion","role":0,"title":"路飞学城高级讲师","signature":"擅长Python开发/生物图像自动识别及处理技术","brief":"路飞学城高级讲师,曾参与新加坡南洋理工大学大数据医疗相关项目,就职过多家互联网企业,有着多年开发经验,精通java,python,go等编程语言,Uric开源软件作者,致力于人工智能与大数据方向,对机器学习,深度学习等算法有深度研究。"},"free_sections":[{"id":39684,"name":"Django开篇","free_trail":false},{"id":40831,"name":"模板继承","free_trail":false},{"id":40838,"name":"django的查询API","free_trail":false},{"id":40849,"name":"图书管理系统案例","free_trail":false}],"payment_info":{"has_price":true,"price":79.0,"origin_price":399.0,"valid_period":10000,"is_promotion":true,"promotion_name":"限时折扣","promotion_price":"79.00","promotion_end_date":"2022-04-30 23:59:59"},"is_buy":false,"is_new":true,"learning_path_name":"python"}]}}'
    
    
    data_dict = json.loads(data_string)
    print(data_dict)
    print(data_dict['code'])

  • Serialization: Python data types => JSON-formatted string

    import json

    data_dict = {"k1": 123, "k2": 456, 'k3': True, 'k4': [11, 22, 33]}

    data_string = json.dumps(data_dict)

    print(data_string)

What is JSON? It is a data format, and it exists in the form of a string.

v1 = "{'k1':123,'k2':456}"      ->  a string, but not JSON (JSON strings must use double quotes)
v2 = '{"k1":123,"k2":456}'      ->  a string, and valid JSON
v3 = '{"k1":123,"k2":456,"k3":True}'      ->  a string, but not JSON (JSON uses lowercase true)
v4 = '{"k1":123,"k2":456,"k3":true}'      ->  a string, and valid JSON
v5 = '{"k1":123,"k2":456,"k3":(11,22,33)}'      ->  a string, but not JSON (JSON has no tuple syntax)
v6 = '{"k1":123,"k2":456,"k3":[11,22,33]}'      ->  a string, and valid JSON

import json
# A Python tuple is serialized as a JSON array
data_dict = {"k1": 123, "k2": 456, 'k3': True, 'k4': (11, 22, 33)}
data_string = json.dumps(data_dict)
print(data_string)

import json
data_string = '{"k1":123,"k2":456}'
v1 = json.loads(data_string)
print(v1)

json cannot serialize every data type. By default it supports only the following:

    +-------------------+---------------+
    | Python            | JSON          |
    +===================+===============+
    | dict              | object        |
    +-------------------+---------------+
    | list, tuple       | array         |
    +-------------------+---------------+
    | str               | string        |
    +-------------------+---------------+
    | int, float        | number        |
    +-------------------+---------------+
    | True              | true          |
    +-------------------+---------------+
    | False             | false         |
    +-------------------+---------------+
    | None              | null          |
    +-------------------+---------------+

A datetime object, for example, has to be converted to a string first:

import json
import datetime

ctime = datetime.datetime.now()
ctime_string = ctime.strftime("%Y-%m-%d %H:%M:%S")  # returns a string
# data_dict = {"k1": 123, "k2": ctime} would raise a TypeError
data_dict = {"k1": 123, "k2": ctime_string}

v1 = json.dumps(data_dict)
print(v1)
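
As an alternative to converting each value by hand, json.dumps accepts a `default` callable that is invoked for objects it cannot serialize. A minimal sketch, where the helper name `to_jsonable` and the choice of ISO 8601 output are assumptions made for illustration:

```python
import json
from datetime import datetime, date


def to_jsonable(value):
    """Fallback used by json.dumps for types it cannot serialize itself."""
    if isinstance(value, (datetime, date)):
        return value.isoformat()
    raise TypeError("Cannot serialize {!r}".format(type(value)))


data_dict = {"k1": 123, "k2": datetime(2022, 4, 14, 19, 18, 17)}
data_string = json.dumps(data_dict, default=to_jsonable)
print(data_string)  # {"k1": 123, "k2": "2022-04-14T19:18:17"}
```

Raising TypeError for unknown types keeps the standard behavior for anything the helper does not handle.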

Example: a small crawler
import requests
import json

# The response contains a lot of content
res = requests.get("https://api.luffycity.com/api/v1/course/actual/?limit=12&offset=0&category_id=9999")

# Response body as a string (a JSON-formatted string)
print(res.text)

# Convert the JSON string fetched from the other site into Python data types
data_dict = json.loads(res.text)
print(data_dict)

Example: a website
pip3.9 install flask

import json
from flask import Flask

app = Flask(__name__)


# http://127.0.0.1:5000/login
@app.route("/login")
def login():
    return "Login"


# http://127.0.0.1:5000/users
@app.route("/users")
def users():
    data_dict = {"code": 0, "data": [11, 22, 33, 44]}
    return json.dumps(data_dict)


if __name__ == '__main__':
    app.run()

3.3 os module

  • Absolute paths

    import os

    # /Users/wupeiqi/PycharmProjects/day03/db.txt
    v1 = os.path.abspath("db.txt")
    print(v1)

    v2 = os.path.abspath(__file__)
    print(v2)
  • Joining paths

    import os

    # macOS
    # file_path = "files/account.txt"

    # Windows
    # file_path = r"files\account.txt"

    # Join path components with the separator of the current OS
    file_path = os.path.join("files", "xx", "fff", "account.txt")
    print(file_path)
  • Checking whether a file or folder exists

    import os
    file_path_1 = os.path.join("files", "account.txt")
    file_path_2 = os.path.join("files", "xx", "fff", "account.txt")

    v1 = os.path.exists(file_path_1) # True/False
    print(v1)

    v2 = os.path.exists(file_path_2) # True/False
    print(v2)
  • Creating folders

    import os

    folder_path = os.path.join("file", "db", "work")
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

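Example: user registration that creates one file per user, where the file name is the username and the content is the password. All data must be stored under the db directory.

import os


def run():
    # 1. Ask for a username and password
    user = input("Username: ")
    pwd = input("Password: ")

    # 2. Create the folder first
    folder_path = os.path.abspath("db")
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    # 3. Write the content into a file in that directory
    #    r: read mode
    #    a: append mode (creates the file if it does not exist)
    #    w: write mode, empties the existing file and then writes
    file_path = os.path.join(folder_path, "{}.txt".format(user))
    with open(file_path, mode="w", encoding='utf-8') as file_object:
        # Write the content; the file is closed when the block ends
        file_object.write(pwd)


if __name__ == '__main__':
    run()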

  • os.listdir: lists what is directly inside a directory (one level only)

    import os

    # name_list = os.listdir("/Users/wupeiqi/PycharmProjects/day03/db")
    # print(name_list)
    # ['wupeiqi.txt', 'root.txt']


    name_list = os.listdir("/Users/wupeiqi/PycharmProjects/mtb/mtb")
    print(name_list)

    # ['asgi.py', '__init__.py', '__pycache__', 'local_settings.py', 'celery.py', 'settings.py', 'urls.py', 'wsgi.py']
  • os.walk: lists the files in a directory and in every subdirectory below it

    import os

    obj = os.walk("/Users/wupeiqi/PycharmProjects/mtb/amazon")

    for in_folder, folder_list, file_list in obj:
        # in_folder = /Users/wupeiqi/PycharmProjects/mtb/mtb/__pycache__   the directory currently being visited
        # folder_list = ['__pycache__']   all folders in that directory
        # file_list = ['asgi.py', '__init__.py', 'local_settings.py', 'celery.py', 'settings.py', 'urls.py', 'wsgi.py']   all files in that directory
        for name in file_list:
            file_abs_path = os.path.join(in_folder, name)
            print(file_abs_path)

    import os

    obj = os.walk("/Users/wupeiqi/PycharmProjects/mtb/amazon")

    for in_folder, folder_list, file_list in obj:
        for name in file_list:
            # xxxxx.txt
            # xxxxx.png
            ext = name.split(".")[-1]
            if ext == "pyc":
                file_abs_path = os.path.join(in_folder, name)
                print(file_abs_path)
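
The extension filter above can be wrapped in a reusable function. This is a sketch under assumptions: the function name `find_files` and the temporary directory tree are invented for the demo, and os.path.splitext is used instead of splitting on "." so that a file without any extension is not mistaken for a match.

```python
import os
import tempfile


def find_files(root, ext):
    """Return sorted paths under root whose extension equals ext (e.g. ".pyc")."""
    matches = []
    for folder, _sub_folders, file_names in os.walk(root):
        for name in file_names:
            if os.path.splitext(name)[1] == ext:
                matches.append(os.path.join(folder, name))
    return sorted(matches)


# Build a small throwaway tree so the sketch can run anywhere.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "pkg", "__pycache__"))
for rel in ["app.py", os.path.join("pkg", "db.py"), os.path.join("pkg", "__pycache__", "db.pyc")]:
    open(os.path.join(root, rel), "w").close()

result = find_files(root, ".pyc")
print(result)
```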

3.4 random

import random

# A random integer between 100000 and 999999 (both ends included)
v1 = random.randint(100000, 999999)
print(v1)

import random

num_list = [11, 22, 33, 44]

# Pick one random element
num = random.choice(num_list)
print(num)

digits = '0123456789'
print("".join([random.choice(digits) for i in range(5)]))  # pick 5 random digits

import random

def f1():
    print("f1")

def f2():
    print("f2")

def f3():
    print("f3")

def f4():
    print("f4")


# Functions can be stored in a list and picked at random as well
func_list = [f1, f2, f3, f4]
func = random.choice(func_list)
func()

Note: this technique is quite useful.

import random

num_list = [99, 11, 8, 22, 18, 33, 44]
random.shuffle(num_list)  # shuffles the list in place
print(num_list)

Example: a deck of playing cards and drawing a card.

import random

color_list = ["Spades", "Hearts", "Clubs", "Diamonds"]
num_list = range(1, 14)  # 1, 2, 3, 4 .. 13 (14 is not included)

# poke_list = [ ("Spades",1),("Spades",2),("Spades",13) ]
poke_list = []
for color in color_list:
    # color = "Spades"
    for num in num_list:
        # num = 1, 2, 3 ... 13
        group = (color, num)
        poke_list.append(group)

# Shuffle the deck
random.shuffle(poke_list)

# Draw a card
data = random.choice(poke_list)
print(data)
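
random.choice leaves the drawn card in the deck, so the same card can be drawn twice. A hedged sketch of two ways to deal without repeats, assuming English suit names and a fixed seed (used only so that runs are repeatable):

```python
import random

suits = ["Spades", "Hearts", "Clubs", "Diamonds"]
deck = [(suit, rank) for suit in suits for rank in range(1, 14)]

rng = random.Random(2022)   # fixed seed only so runs are repeatable
hand = rng.sample(deck, 5)  # five distinct cards; the deck itself is unchanged
print(hand)

# Alternative: shuffle a copy and pop, which removes dealt cards from the deck.
shuffled = deck[:]
rng.shuffle(shuffled)
first_card = shuffled.pop()
print(first_card, len(shuffled))
```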

3.5 Time-related modules

  • time

  • datetime

3.5.1 time
import time

# Timestamp (seconds since the epoch)
v1 = time.time()
print(v1)

import time

while True:
    print(123)
    # Pause for 5 seconds
    time.sleep(5)

3.5.2 datetime
  • As a datetime object

    import datetime

    # Local time
    v1 = datetime.datetime.now()
    print(v1)

    # Time in the UTC+7 time zone
    tz = datetime.timezone(datetime.timedelta(hours=7))
    v2 = datetime.datetime.now(tz)
    print(v2)

    # UTC time (utcnow is deprecated since Python 3.12;
    # datetime.datetime.now(datetime.timezone.utc) is the recommended replacement)
    v3 = datetime.datetime.utcnow()
    print(v3)

    Advantage: adding and subtracting time is easy

    from datetime import datetime, timedelta

    # Local time
    v1 = datetime.now()
    print(v1)

    v2 = v1 + timedelta(days=527,hours=19)
    print(v2)
  • As a string

    v1 = "2022-04-14 19:18:17"
    v1 = "2022-04-14"
    v1 = "2022年04月14日"

Converting between the two forms:

  • datetime -> string

    from datetime import datetime, timedelta

    # Local time
    v1 = datetime.now()
    print(v1)

    # Convert the datetime object to a string
    v2 = v1.strftime("%Y%m%d%H%M%S")

    print(v2, type(v2))
  • string -> datetime

    from datetime import datetime, timedelta

    v1 = "2022-04-14"

    v2 = datetime.strptime(v1, "%Y-%m-%d")

    v3 = v2 + timedelta(days=10)
    print(v3)
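
The two conversions can be combined with date arithmetic. The sketch below uses fixed sample timestamps (chosen for illustration) to show that subtracting two datetime objects gives a timedelta and that a value survives the trip through a string and back.

```python
from datetime import datetime, timedelta

fmt = "%Y-%m-%d %H:%M:%S"

start = datetime.strptime("2022-04-14 19:18:17", fmt)
end = start + timedelta(days=10, hours=5)

# Subtracting two datetime objects yields a timedelta.
delta = end - start
print(delta.days, delta.seconds)  # 10 18000

# Round trip: datetime -> string -> datetime gives back the same value.
text = end.strftime(fmt)
print(text)  # 2022-04-25 00:18:17
print(datetime.strptime(text, fmt) == end)  # True
```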

Examples
  1. Ask the user for a phone number, record the sign-up time (the current time), and append the entry to a file (mode a).

    15566666666,2022-04-14
    15566666666,2022-04-14
    15566666666,2022-04-14
    15566666666,2022-04-14

    from datetime import datetime


    def run():
        phone = input("Phone number: ")
        ctime_string = datetime.now().strftime("%Y-%m-%d")

        line = "{},{}\n".format(phone, ctime_string)

        # Open the file and append the line to it
        with open("data.txt", encoding='utf-8', mode='a') as f:
            f.write(line)


    if __name__ == '__main__':
        run()
  2. Working with folders

    Let users register, with one file per day.

    Layout: database
    	20220414.txt
    	20220415.txt
    	20220416.txt

    import os
    from datetime import datetime


    def run():
        phone = input("Phone number: ")
        # File name such as 20220414.txt, matching the layout above
        ctime_string = datetime.now().strftime("%Y%m%d")

        # Make sure the database folder exists before writing
        os.makedirs("database", exist_ok=True)
        file_path = os.path.join("database", "{}.txt".format(ctime_string))

        with open(file_path, encoding='utf-8', mode='a') as f:
            f.write(phone + "\n")


    if __name__ == '__main__':
        run()

3.6 INI format

my.ini

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-bin=py-mysql-bin
character-set-server=utf8
collation-server=utf8_general_ci
log-error=/var/log/mysqld.log
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[client]
default-character-set=utf8

Reading data

import configparser

# The parser reads the file and parses its content automatically
parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

# v1 = parser.sections()
# print(v1)  # ['mysqld', 'mysqld_safe', 'client']
#
# v2 = parser.items("mysqld_safe")
# for k, v in v2:
#     print(k, v)

v3 = parser.get("mysqld","socket")
print(v3)

Deleting

import configparser

parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

# This only removes the data in memory
# parser.remove_option("mysqld", "socket")
parser.remove_section("mysqld_safe")

# Write the result back to the file to make the change permanent
with open('my.ini', encoding='utf-8', mode='w') as f:
    parser.write(f)

Modifying or adding

import configparser

parser = configparser.ConfigParser()
parser.read("my.ini", encoding='utf-8')

parser.add_section("group")
parser.set("group", "datadir", "xxxxx")

with open('my.ini', encoding='utf-8', mode='w') as f:
    parser.write(f)

In later development, an INI file like this can serve as the project's configuration file.
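
When an INI file is used as project configuration, typed getters and fallbacks are often handy. A minimal sketch, assuming an inline sample instead of my.ini; the `port` option and its default `3306` are hypothetical and do not appear in the file shown earlier.

```python
import configparser

# Inline sample so the sketch does not depend on a file on disk.
ini_text = """
[mysqld]
datadir=/var/lib/mysql
symbolic-links=0

[client]
default-character-set=utf8
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

print(parser.has_section("mysqld_safe"))              # False
print(parser.getint("mysqld", "symbolic-links"))      # 0
print(parser.get("client", "port", fallback="3306"))  # 3306 (hypothetical option, fallback used)
```

Values are always stored as strings; getint, getfloat and getboolean convert them on read.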

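3.7 Regular expressions

  • Regular expressions

  • The re module

import re

text = "楼主太手机号也可18731255799牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"

# Extract the phone numbers from the text
# Pattern for a phone number: 1[3589]\d{9}
# (inside [...] the | character is a literal, so it must not be used as a separator)
data = re.findall(r"1[3589]\d{9}", text)
print(data)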

3.7.1 Characters
  • wupeiqi matches the literal text wupeiqi

    import re

    text = "你好wupeiqi,阿斯顿发wupeiqasd 阿士大夫能接受的wupeiqiff"
    data_list = re.findall(r"wupeiqi", text)
    print(data_list) # ['wupeiqi', 'wupeiqi']  can be used to count how often a substring occurs

  • [abc] matches the character a, b, or c.

    import re

    text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqaceiqiff"
    data_list = re.findall(r"[abc]", text)
    print(data_list) # ['b', 'a', 'a', 'a', 'b', 'b', 'a', 'c']

    import re

    text = "你2b好wupeiqi,阿斯顿发awupeiqasd 阿士大夫a能接受的wffbbupqcceiqiff"
    data_list = re.findall(r"q[abc]", text)
    print(data_list) # ['qa', 'qc']
  • [^abc] matches any character other than a, b, and c.

    import re

    text = "你wffbbupceiqiff"
    data_list = re.findall(r"[^abc]", text)
    print(data_list)  # ['你', 'w', 'f', 'f', 'u', 'p', 'e', 'i', 'q', 'i', 'f', 'f']
  • [a-z] matches any character from a to z ([0-9] works the same way).

    import re

    text = "alexrootrootadmin"
    data_list = re.findall(r"t[a-z]", text)
    print(data_list)  # ['tr', 'ta']
  • . stands for any character except a newline.

    import re

    text = "alexraotrootadmir9on"
    data_list = re.findall(r"r.o", text)
    print(data_list) # ['rao', 'roo', 'r9o']

    import re

    text = "alexraotrootadmin"
    data_list = re.findall(r"r.+o", text) # greedy match
    print(data_list) # ['raotroo']

    import re

    text = "alexraotrootadmin"
    data_list = re.findall(r"r.+?o", text) # non-greedy match
    print(data_list) # ['rao', 'roo']
  • \w stands for a letter, digit, or underscore (Chinese characters match too).

    import re

    text = "北京武沛alex齐北  京武沛alex齐"
    data_list = re.findall(r"武\wa", text)
    print(data_list) # ['武沛a', '武沛a']

    import re

    text = "北京武沛alex齐北  京武沛alex齐"
    data_list = re.findall(r"武\w+x", text)
    print(data_list) # ['武沛alex', '武沛alex']
  • \d stands for a digit

    import re

    text = "root-ad32min-add3-admd1in"
    data_list = re.findall(r"d\d", text)
    print(data_list) # ['d3', 'd3', 'd1']

    import re

    text = "root-ad32min-add3-admd1in"
    data_list = re.findall(r"d\d+", text)
    print(data_list) # ['d32', 'd3', 'd1']

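Characters in a regular expression (by default each one stands for exactly 1 character):

Literal, e.g. a, b, c
Range, e.g. [a-z]  [0-9]  [A-Z]
Digit, e.g. \d
Letter, digit, or underscore (also Chinese characters), e.g. \w
Any character, e.g. .
Expressing counts:
	10 digits            \d\d\d\d\d\d\d\d\d\d
	10 digits            \d{10}
	1 or more digits     \d+
	0 or 1 digit         \d?
	any number of digits \d*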

3.7.2 Quantifiers
  • * repeats 0 or more times

    import re

    text = "他是大B个,确实是个大2B。大22B。大29B。"
    data_list = re.findall(r"大2*B", text)
    print(data_list)  # ['大B', '大2B', '大22B']

    import re

    text = "他是大B个,确实是个大2B。大22B。大29B。"
    data_list = re.findall(r"大\d*B", text)
    print(data_list)  # ['大B', '大2B', '大22B', '大29B']
  • + repeats 1 or more times

    import re

    text = "他是大B个,确实是个大2B,大3B,大66666B。"
    data_list = re.findall(r"大\d+B", text)
    print(data_list) # ['大2B', '大3B', '大66666B']
  • ? repeats 0 or 1 time

    import re

    text = "他是大B个,确实是个大2B,大3B,大66666B。"
    data_list = re.findall(r"大\d?B", text)
    print(data_list) # ['大B', '大2B', '大3B']
  • {n} repeats exactly n times

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"151312\d{5}", text)
    print(data_list) # ['15131255789']
  • {n,} repeats n or more times

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"\d{9,}", text)
    print(data_list) # ['442662578', '15131255789']

  • {n,m} repeats between n and m times

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"\d{10,15}", text)
    print(data_list) # ['15131255789']

Quantifier summary:

?     0 or 1 time
+     1 or more times
*     0 or more times
{}    an explicit count: {n}, {n,}, {n,m}

3.7.3 Greedy matching
import re

text = "alexraotrootadmin"
data_list = re.findall(r"r.+o", text) # greedy: matches as much as possible
print(data_list) # ['raotroo']

import re

text = "alexraotrootadmin"
data_list = re.findall(r"r.+?o", text)  # non-greedy: matches as little as possible
print(data_list)  # ['rao', 'roo']

3.7.4 Groups
  • Extracting part of a match (when the pattern contains groups, findall returns the groups instead of the whole match)

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"15131(2\d{5})", text)
    print(data_list)  # ['255789']

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"15(1\d)1(2\d{5})", text)
    print(data_list)  # [('13', '255789')]

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    data_list = re.findall(r"(15(1\d)1(2\d{5}))", text)
    print(data_list)  # [('15131255789', '13', '255789')]
  • Alternation (or) + extracting part of a match

    import re

    text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    # 151312\d{5}
    # 15131r\w+太
    data_list = re.findall(r"15131(2\d{5}|r\w+太)", text)
    print(data_list)  # ['root太', '255789']

    import re

    text = "楼主15131root太牛15131alex逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    # 151312\d{5}
    # 15131r\w+太
    data_list = re.findall(r"(15131(2\d{5}|r\w+太))", text)
    print(data_list)  # [('15131root太', 'root太'), ('15131255789', '255789')]

Examples
  1. Matching a QQ number with a regular expression

    [1-9]\d{4,}
  2. ID card numbers

    import re

    text = "dsf130429191912015219k13042919591219521Xkk"
    data_list = re.findall(r"\d{17}[\dX]", text) # character class, like [abc]
    print(data_list) # ['130429191912015219', '13042919591219521X']

    import re

    text = "dsf130429191912015219k13042919591219521Xkk"
    data_list = re.findall(r"\d{17}(\d|X)", text)
    print(data_list) # ['9', 'X']

    import re

    text = "dsf130429191912015219k13042919591219521Xkk"
    data_list = re.findall(r"(\d{17}(\d|X))", text)
    print(data_list) # [('130429191912015219', '9'), ('13042919591219521X', 'X')]

    import re

    text = "dsf130429191912015219k13042919591219521Xkk"
    # Inside [...] the | character is a literal, so the class is written [\dX]
    data_list = re.findall(r"\d{6}(\d{4})(\d{2})(\d{2})\d{3}[\dX]", text)
    print(data_list)  # [('1919', '12', '01'), ('1959', '12', '19')]

    import re

    text = "dsf130429191912015219k13042919591219521Xkk"
    data_list = re.findall(r"(\d{6}(\d{4})(\d{2})(\d{2})\d{3}[\dX])", text)
    print(data_list)
    # [('130429191912015219', '1919', '12', '01'), ('13042919591219521X', '1959', '12', '19')]
  3. Phone numbers

    import re

    text = "我的手机号是15133377892,你的手机号是1171123啊?"
    data_list = re.findall(r"1[3-9]\d{9}", text)
    print(data_list)  # ['15133377892']
  4. Email addresses

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    email_list = re.findall(r"\w+@\w+\.\w+",text)
    print(email_list) # ['442662578@qq.com和xxxxx']   \w also matches Chinese characters

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    email_list = re.findall(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+", text)
    print(email_list) # ['442662578@qq.com', 'xxxxx@live.com']

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    # re.ASCII restricts \w to ASCII letters, digits, and underscore
    email_list = re.findall(r"\w+@\w+\.\w+", text, re.ASCII)
    print(email_list)  # ['442662578@qq.com', 'xxxxx@live.com']

    import re

    text = "楼主太牛44266-2578@qq.com逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    email_list = re.findall(r"(\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*)", text, re.ASCII)
    print(email_list)
    # [('44266-2578@qq.com', '-2578', '', ''), ('442662578@qq.com', '', '', ''), ('xxxxx@live.com', '', '', '')]
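
For the ID card example above, positional groups can be replaced by named groups, which makes the extracted fields self-describing. This is a sketch: the group names year, month, and day are chosen here for illustration, and re.finditer is used so that each match object is available.

```python
import re

text = "dsf130429191912015219k13042919591219521Xkk"

# (?P<name>...) defines a named group.
pattern = re.compile(r"\d{6}(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})\d{3}[\dX]")

for m in pattern.finditer(text):
    # m.group() is the whole match; named groups are read by name.
    print(m.group(), m.group("year"), m.group("month"), m.group("day"))

birthdays = [m.groupdict() for m in pattern.finditer(text)]
print(birthdays)
```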

3.7.5 The re module
  • findall: returns all matches

    import re

    text = "楼主太牛逼了,在线想要 442662578@qq.com和xxxxx@live.com谢谢楼主,手机号也可15131255789,搞起来呀"
    email_list = re.findall(r"[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+\.[a-zA-Z0-9_-]+", text)
    print(email_list) # ['442662578@qq.com', 'xxxxx@live.com']
  • match: matches only at the beginning of the text

    import re

    text = "大小逗2B最逗3B欢乐"
    data = re.match(r"逗\dB", text)
    print(data) # None

    import re

    text = "逗2B最逗3B欢乐"
    data = re.match(r"逗\dB", text)
    print(data) # a match object
    v1 = data.group()
    print(v1)
  • search: scans the whole text and returns only the first successful match

    import re

    text = "大小逗2B最逗3B欢乐"
    data = re.search(r"逗\dB", text)
    if data:
        print(data.group())

3.7.6 Validation
Start anchor ^
End anchor $
import re

text = input("Enter a phone number: ")

mtc = re.match(r"^1[3-9]\d{9}$", text)
if mtc:
    print("Valid format", text)
else:
    print("Invalid format")
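
One detail about the anchors: $ also matches just before a trailing newline, so a string such as "15133377892\n" passes the check above. A hedged sketch of a stricter check using re.fullmatch, which requires the entire string to match; the function name `is_valid_phone` is an illustrative assumption.

```python
import re

PHONE_PATTERN = re.compile(r"1[3-9]\d{9}")


def is_valid_phone(text):
    """True only if the entire string is 11 digits starting with 13 to 19."""
    return PHONE_PATTERN.fullmatch(text) is not None


print(is_valid_phone("15133377892"))    # True
print(is_valid_phone("15133377892\n"))  # False
print(is_valid_phone("1171123"))        # False
```

Compiling the pattern once is a small convenience when the same check runs many times.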

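3.8 Multiprocessing

  • Creating a process
import multiprocessing

def worker():
    """This function runs in the child process."""
    print('Worker')

if __name__ == '__main__':
    # Create the child process
    p = multiprocessing.Process(target=worker)
    # Start the child process
    p.start()
    # Wait for the child process to finish
    p.join()

  • Process pool
import multiprocessing

def worker(num):
    """This function runs in a child process."""
    print('Worker %d' % num)

if __name__ == '__main__':
    # Create a pool of 4 processes
    pool = multiprocessing.Pool(4)
    # Distribute the tasks over the pool; map blocks until all are done
    pool.map(worker, range(10))
    # Close the pool so that no new tasks are accepted
    pool.close()
    # Wait for the processes in the pool to exit
    pool.join()

  • Queue: communication between processes
import multiprocessing

def producer(q):
    """This function runs in the producer process."""
    for i in range(10):
        q.put(i)

def consumer(q):
    """This function runs in the consumer process."""
    while True:
        item = q.get()
        # None is the agreed signal to stop
        if item is None:
            break
        print(item)

if __name__ == '__main__':
    # Create the queue
    q = multiprocessing.Queue()
    # Create the producer process
    p1 = multiprocessing.Process(target=producer, args=(q,))
    # Create the consumer process
    p2 = multiprocessing.Process(target=consumer, args=(q,))
    # Start both processes
    p1.start()
    p2.start()
    # Wait for the producer to finish
    p1.join()
    # Send the stop signal to the consumer
    q.put(None)
    p2.join()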

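3.9 urllib

from urllib.parse import unquote, quote

txt = '你好'
# URL-encode
new_txt = quote(txt)
print(new_txt)  # %E4%BD%A0%E5%A5%BD
# URL-decode (the variable is not named str, which would shadow the built-in type)
encoded = '%E4%BD%A0%E5%A5%BD'
print(unquote(encoded))  # 你好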
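
Whole query strings can be encoded and decoded with urlencode and parse_qs from the same urllib.parse module. A minimal sketch; the host `example.com` and the parameter names are placeholders, not a real API.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical query parameters; non-ASCII values are percent-encoded as UTF-8.
params = {"limit": 12, "offset": 0, "q": "你好"}
query = urlencode(params)
print(query)  # limit=12&offset=0&q=%E4%BD%A0%E5%A5%BD

# Placeholder URL, used only to show parsing in the other direction.
url = "https://example.com/api/?" + query
parsed = parse_qs(urlparse(url).query)
print(parsed["q"])  # ['你好']
```

parse_qs returns a list for every key because a query string may repeat a parameter.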
