python（010）


一起看日落啊

Category: python语言学习 (Python language learning)
Tags: python

Article link: https://blog.csdn.net/weixin_44642263/article/details/122448475

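The shutil module

shutil provides high-level file operations: copying, deleting, moving, archiving, and unpacking.

copyfile(): copies only the file's contents; permissions, ownership, timestamps, and other metadata are not copied

copy(): copies the file's contents and its permission bits

copy2(): copies the contents and permission bits, and also tries to preserve other metadata such as timestamps

copytree(): recursively copies a directory, including its subdirectories, files, and their metadata

rmtree(): recursively deletes a directory and everything in it

move(): recursively moves a file or directory; it can also be used to rename

make_archive(): creates an archive file (for example zip or tar)

unpack_archive(): unpacks an archive file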

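import shutil

# Copy a single file three ways: contents only, contents + permissions, contents + metadata
shutil.copyfile("FileTest01.py", "FileTest01_copy.py")
shutil.copy("FileTest02.py", "FileTest02_copy.py")
shutil.copy2("FileTest03.py", "FileTest03_copy.py")

# Copy a whole directory tree (by default the destination must not already exist)
source = "C:\\Users\\HENG\\Desktop\\PythonCode\\Day01"
aim = "Day01"
shutil.copytree(source, aim)

# Copy a tree while skipping files that match the given patterns
source = "D:\\文档\\锐旗设计375套PPT模版锦集"
aim = "ppt"
shutil.copytree(source, aim, ignore=shutil.ignore_patterns("*.doc", "*.jpg", "*.zip"))

# Delete a tree, rename a directory, then archive it and unpack the archive
shutil.rmtree("C:\\Users\\HENG\\Desktop\\PythonCode2")
shutil.move("Day01", "Day01_1")
shutil.make_archive("code01", "zip", base_dir="Day01_1")
shutil.unpack_archive("code01.zip", "code01", "zip")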
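
The calls above depend on paths that exist only on the author's machine. As a minimal sketch, the same functions can be tried inside a temporary directory; the helper name shutil_demo and the file names (Day01, a.py, notes.doc) are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def shutil_demo():
    """Exercise several shutil calls in a throwaway directory.

    Returns the sorted names of the files found after unpacking the archive.
    """
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir = root / "Day01"  # hypothetical sample directory
        src_dir.mkdir()
        (src_dir / "a.py").write_text("print('a')\n")
        (src_dir / "notes.doc").write_text("skip me\n")

        # copyfile copies contents only; copy2 also tries to keep metadata
        shutil.copyfile(src_dir / "a.py", root / "a_copy.py")
        shutil.copy2(src_dir / "a.py", root / "a_copy2.py")

        # copytree with ignore_patterns leaves *.doc files out of the copy
        copied = root / "Day01_copy"
        shutil.copytree(src_dir, copied, ignore=shutil.ignore_patterns("*.doc"))

        # make_archive returns the path of the archive it created
        archive = shutil.make_archive(
            str(root / "code01"), "zip", root_dir=str(root), base_dir="Day01_copy"
        )
        out_dir = root / "unpacked"
        shutil.unpack_archive(archive, str(out_dir), "zip")

        return sorted(p.name for p in out_dir.rglob("*") if p.is_file())


print(shutil_demo())
```

Only a.py should survive the round trip, since notes.doc is filtered out by ignore_patterns before the archive is built.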

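2.2 Pattern Matching and Regular Expressions

Regular expressions are a technique used mainly to check whether text matches a pattern.

Example: validating a phone number such as 415-555-4242 (3 digits, a hyphen, 3 digits, a hyphen, 4 digits)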

phone = "415-555-4242"
def isPhoneNumber(phone):
    if len(phone) != 12:
        return False
    for i in range(0,3):
        if not phone[i].isdecimal():  # is this character a decimal digit?
            return False
    if phone[3] != '-':
        return False
    for i in range(4,7):
        if not phone[i].isdecimal():
            return False
    if phone[7] != '-':
        return False
    for i in range(8,12):
        if not phone[i].isdecimal():
            return False
    return True
print(isPhoneNumber(phone))
print(isPhoneNumber("123-123-123"))
print(isPhoneNumber("123-abc-123a"))

Whenever the validation rule changes, more code has to be added.

How can the same check be done with a regular expression?

import re
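# Define the matching rule first; \d stands for a digit
regex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')  # the r prefix makes a raw string, so \ is not treated as an escape character
# The target text
text = "My phone number is 123-123-1122 and yours is 321-321-3322"
# search() scans the target text for the pattern and returns a Match object
# for the first occurrence, or None if nothing matches
match = regex.search(text)
# Get the matched text
print(match.group())  # 123-123-1122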
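
search() finds the pattern anywhere in the text, so it is not quite the same check as isPhoneNumber(), which validates the whole string. A minimal sketch of the whole-string check, assuming re.fullmatch is acceptable here (the function name is_phone_number is made up for illustration):

```python
import re

# Same pattern as above; fullmatch() succeeds only if the entire string fits it.
PHONE_RE = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')


def is_phone_number(text):
    """Return True if text is exactly ddd-ddd-dddd."""
    return PHONE_RE.fullmatch(text) is not None


print(is_phone_number("415-555-4242"))
print(is_phone_number("123-123-123"))
print(is_phone_number("123-abc-123a"))
print(is_phone_number("call 415-555-4242"))  # extra text around the number is rejected
```

Changing the rule now means editing one pattern string instead of adding more loops.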

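Written this way, however, the pattern is hard to read.
Grouping with parentheses

Suppose we want to split the phone number into three parts: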

import re
regex = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')
match = regex.search("My phone number is 123-123-1122")
print(match.group())   # the whole text of the first match, scanning left to right
print(match.group(1))  # 123
print(match.group(2))  # 123
print(match.group(3))  # 1122
print(match.groups())  # all groups as a tuple: ('123', '123', '1122')


Matching multiple groups with the pipe

Use | when you want to match any one of several expressions.

import re
regex = re.compile(r'Batman|Tina Fey')  # "or": whichever alternative occurs first in the text is returned
match1 = regex.search("Batman and Tina Fey")
print(match1.group())

match2 = regex.search("Tina Fey and Batman")
print(match2.group())

When you want to match any one of Batman, Batmobile, Batcopter, or Batbat:

import re
regex = re.compile(r'Bat(man|mobile|copter|bat)(haha|xixi|lala)')  # two groups, both required
match1 = regex.search("Batbatxixi and Batmanhaha")
print(match1.group())   # Batbatxixi
print(match1.group(1))  # bat
print(match1.group(2))  # xixi
print(match1.groups())  # ('bat', 'xixi'): the shared prefix "Bat" is not a group, so only the two groups appear

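Optional matching with the question mark

Sometimes part of the pattern is optional. A ? after a group means the group may appear zero times or once.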

import re
regex = re.compile(r'Bat(wo|ko)?man(aa|bb)')
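match1 = regex.search("Batman")
print(match1 is None)  # True: (aa|bb) is required, so nothing matches; calling match1.group() here would raise AttributeError
match1 = regex.search("Batwoman")
print(match1 is None)  # True, for the same reason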
match1 = regex.search("Batwomanbb")
print(match1.group())
print(match1.groups())
match1 = regex.search("Batmanaa")
print(match1.group())
print(match1.groups())
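
Because search() returns None when nothing matches, it is usually safer to test the result before calling group(). A minimal sketch, using the same pattern as above (the helper name describe is made up for illustration):

```python
import re

BAT_RE = re.compile(r'Bat(wo|ko)?man(aa|bb)')


def describe(text):
    """Return (matched text, groups) or None when the pattern does not match."""
    match = BAT_RE.search(text)
    if match is None:
        return None
    return match.group(), match.groups()


for sample in ["Batman", "Batwomanbb", "Batmanaa"]:
    print(sample, "->", describe(sample))
```

An optional group that did not take part in the match shows up as None inside the groups tuple.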

Making the country code optional

import re
"""
country code - area code - subscriber number
123-123-1122
123-1122
"""
regex = re.compile(r'(\d\d\d-)?(\d\d\d)-(\d\d\d\d)')
match1 = regex.search("123-123-1212")
print(match1.groups())
match1 = regex.search("123-1212")
print(match1.groups())

Matching zero or more times with the star

import re
regex = re.compile(r'Bat(wo)*man')  # * means the group may appear zero times or many times
match1 = regex.search("The Adventures Batman")
print(match1.group())
match1 = regex.search("The Adventures Batwoman")
print(match1.group())
match1 = regex.search("The Adventures Batwowowoman")
print(match1.group())

Matching one or more times with the plus

import re
regex = re.compile(r'Bat(wo)+man')
match1 = regex.search("The Adventures Batman")
print(match1 == None)
match1 = regex.search("The Adventures Batwoman")
print(match1.group())
match1 = regex.search("The Adventures Batwowowoman")
print(match1.group())

Matching a specific number of repetitions with braces

{n}: exactly n times

import re
regex = re.compile(r'\d{3}-\d{3}-\d{4}')
match = regex.search("123-123-1122")
print(match.group())

{n,m}: at least n times and at most m times

import re
regex = re.compile(r'(Ha){3,5}')
match = regex.search("HaHaHa")
print(match.group())
match = regex.search("HaHaHaHa")
print(match.group())
match = regex.search("HaHaHaHaHa")
print(match.group())

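Greedy and non-greedy matching

(Ha){3,5} is greedy by default (Python regular expressions are greedy by default): it matches as much as it can. To make the match non-greedy, put a ? after the closing brace: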

import re
regex = re.compile(r'(Ha){3,5}?')
match = regex.search("HaHaHa")
print(match.group())
match = regex.search("HaHaHaHa")
print(match.group())
match = regex.search("HaHaHaHaHa")
print(match.group())

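In every case only three repetitions (HaHaHa) are matched.
So the question mark has two meanings in a regular expression: it marks a non-greedy match, or it marks an optional group.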
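
A side-by-side comparison makes the difference easier to see. This is a minimal sketch that runs both forms of the pattern on the same input:

```python
import re

text = "HaHaHaHaHa"

# Greedy: take as many repetitions as allowed (up to 5)
greedy = re.search(r'(Ha){3,5}', text).group()
# Non-greedy: stop as soon as the minimum (3) is satisfied
lazy = re.search(r'(Ha){3,5}?', text).group()

print(greedy)
print(lazy)
```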

The findall method

search returns the first match, scanning left to right; findall returns all matches as a list.

import re
regex = re.compile(r'\d{3}-\d{3}-\d{4}')
lst = regex.findall("My number is 123-123-1212 and yours is 321-321-3322")
print(lst)
regex = re.compile(r'(\d{3})-(\d{3})-(\d{4})')  # with groups in the pattern, findall returns a list of tuples
lst = regex.findall("My number is 123-123-1212 and yours is 321-321-3322")
print(lst)
print(lst[1][1])
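
The shape of what findall returns depends on how many groups the pattern contains. A minimal sketch comparing zero, one, and several groups on the same text:

```python
import re

text = "My number is 123-123-1212 and yours is 321-321-3322"

# No groups: a list of the full matched strings
no_groups = re.findall(r'\d{3}-\d{3}-\d{4}', text)
# One group: a list of just that group's text
one_group = re.findall(r'(\d{3})-\d{3}-\d{4}', text)
# Several groups: a list of tuples, one tuple per match
many_groups = re.findall(r'(\d{3})-(\d{3})-(\d{4})', text)

print(no_groups)
print(one_group)
print(many_groups)
```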

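Character classes

\d stands for a digit 0~9, roughly equivalent to (0|1|2|3|4|5|6|7|8|9). (For Python 3 str patterns, \d also matches other Unicode decimal digits unless the re.ASCII flag is set.)

\d: any single digit from 0 to 9
\D: the opposite of \d, any character that is not a digit
\w: any letter, digit, or underscore (a "word" character); not quite the same as an identifier, because an identifier cannot start with a digit
\W: the opposite of \w, any character that is not a letter, digit, or underscore
\s: a space, tab, or newline (a "whitespace" character)
\S: the opposite of \s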

import re
regex = re.compile(r'\w+')
lst = regex.findall("haha xixi 123baba mama321")
print(lst)  # all four words are printed: ['haha', 'xixi', '123baba', 'mama321']
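
To see each shorthand class at work, here is a minimal sketch on a made-up sample string. The last two lines illustrate a Python-specific detail that is an addition to the notes above: for str patterns \d also accepts non-ASCII decimal digits unless re.ASCII is passed.

```python
import re

sample = "Room 42, floor_3\tOK"  # made-up sample text

digits = re.findall(r'\d+', sample)      # runs of digits
non_digits = re.findall(r'\D+', sample)  # runs of everything else
words = re.findall(r'\w+', sample)       # runs of letters, digits, underscores
spaces = re.findall(r'\s', sample)       # individual whitespace characters

# U+FF13 is the fullwidth digit three
unicode_digit = re.fullmatch(r'\d', '\uff13') is not None
ascii_only = re.fullmatch(r'\d', '\uff13', re.ASCII) is not None

print(digits, non_digits, words, spaces)
print(unicode_digit, ascii_only)
```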

Custom character classes

Square brackets define a custom character class.

import re
regex = re.compile(r'[a-zA-Z_]+\w+')
lst = regex.findall("haha xixi 123baba mama321")
print(lst)

import re
regex = re.compile(r'0x[0-9A-F]{4}')
lst = regex.findall("0xAK47 0xABCD 0x1A2B 0x1234 0xA1W3")
print(lst)

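A ^ right after the opening bracket negates the class: it matches any character that is not in the class.

import re
regex = re.compile(r'[^aeiouAEIOU]{3}')  # negation: three consecutive characters that are not vowels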
lst = regex.findall("Refrigerator is good but cool")
print(lst)

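The caret and dollar sign characters

A ^ at the start of a regular expression means the match must begin at the start of the text; a $ at the end means the match must finish at the end of the text.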

import re
regex = re.compile(r'^Hello')
lst = regex.findall("HelloWorld My Name is Hellow!")  # ['Hello']: only the occurrence at the start matches
print(lst)
regex = re.compile(r'\d$')  # the text must end with a digit; the result is ['2']
lst = regex.findall("My age is 12")
print(lst)
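
Using ^ and $ together turns a pattern into a whole-string check. The sketch below shows that, and also one caveat that is an addition to the notes above: in Python, $ also matches just before a trailing newline, whereas re.fullmatch does not allow it. The helper name is_all_digits is made up for illustration.

```python
import re


def is_all_digits(text):
    """True if text consists only of digits, checked with ^ and $."""
    return re.search(r'^\d+$', text) is not None


print(is_all_digits("12345"))
print(is_all_digits("12a45"))
# Caveat: $ also matches just before a final newline
print(is_all_digits("123\n"))
# fullmatch is stricter and rejects the trailing newline
print(re.fullmatch(r'\d+', "123\n") is None)
```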

The wildcard character

. matches any single character except a newline

import re
regex = re.compile(r'.at')
lst = regex.findall("The cat in the hat sat on the flat mat")  # "flat" is not returned whole: . matches exactly one character, so the result contains "lat"
print(lst)

Matching everything with dot-star

import re
regex = re.compile(r'name:(.*),age:(.*);')  # * means any number of repetitions
lst = regex.findall("name:张三,age:12;"
                    "name:李四,age:24;")
print(lst)
# [('张三,age:12;name:李四', '24')]

The greedy match goes too far here, so switch to a non-greedy one:

import re
regex = re.compile(r'name:(.*?),age:(.*?);')
lst = regex.findall("name:张三,age:12;"
                    "name:李四,age:24;")
print(lst)

import re
regex = re.compile(r'<.*>')
lst = regex.findall("<html><head></head><body></body></html>")  # greedy: with no line breaks in the text, everything is matched as a single item
print(lst)
# ['<html><head></head><body></body></html>']
regex = re.compile(r'<.*?>')
lst = regex.findall("<html><head></head><body></body></html>")
print(lst)
# ['<html>', '<head>', '</head>', '<body>', '</body>', '</html>']
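
Since . does not match a newline, a .*? pattern cannot see across lines unless the re.DOTALL flag is used. A minimal sketch with a made-up two-line snippet:

```python
import re

text = "<p>line one\nline two</p>"  # made-up sample containing a newline

# Without DOTALL, . stops at the newline, so the pattern cannot reach </p>
without_flag = re.findall(r'<p>(.*?)</p>', text)
# With DOTALL, . matches the newline as well
with_flag = re.findall(r'<p>(.*?)</p>', text, re.DOTALL)

print(without_flag)
print(with_flag)
```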

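? matches zero or one of the preceding group
* matches zero or more of the preceding group
+ matches one or more of the preceding group
{n} matches exactly n of the preceding group
{n,} matches n or more of the preceding group
{,m} matches zero to m of the preceding group
{n,m} matches at least n and at most m of the preceding group
{n,m}? or *? or +? performs a non-greedy match of the preceding group
^spam means the string must begin with spam
spam$ means the string must end with spam
. matches any character except a newline
[abc] matches any one character inside the brackets
[^abc] matches any one character that is not inside the brackets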
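
The summary above can be checked mechanically. The following is a small sketch that runs a table of made-up (pattern, text, expected) cases through re.fullmatch:

```python
import re

# (pattern, text, whether the whole text should match)
cases = [
    (r'colou?r', 'color', True),    # ? : zero or one
    (r'colou?r', 'colour', True),
    (r'ab*', 'a', True),            # * : zero or more
    (r'ab+', 'a', False),           # + : one or more
    (r'a{2}', 'aaa', False),        # {n} : exactly n
    (r'a{2,}', 'aaaa', True),       # {n,} : n or more
    (r'a{,2}', 'aaa', False),       # {,m} : zero to m
    (r'a{2,3}', 'aaa', True),       # {n,m} : n to m
    (r'[abc]', 'b', True),          # one character from the set
    (r'[^abc]', 'b', False),        # one character not in the set
]

results = [re.fullmatch(pattern, text) is not None for pattern, text, _ in cases]

for (pattern, text, expected), got in zip(cases, results):
    print(f"{pattern!r:12} {text!r:10} expected={expected} got={got}")
```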

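Extracting all IP addresses from the output of ipconfig in cmd

# pyperclip reads the most recently copied clipboard content. It is a third-party package,
# so it may be missing the first time: in the IDE settings, search for Python -> Python Interpreter,
# click +, search for pyperclip, and click Install.
import pyperclip, re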
regex = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
text = str(pyperclip.paste())
lst = regex.findall(text)
for ip in lst:
    print(ip)

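You should be able to read and understand a pattern like this, but in practice you do not need to write everything yourself: searching online (for example on Baidu) for "IP address validation regular expression" is enough.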
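
The pattern \d{1,3} accepts values such as 999, so it extracts candidates rather than valid addresses. One way to tighten this without a longer regex is a numeric range check afterwards. This is a minimal sketch; the sample text and the function name extract_valid_ips are made up for illustration, and a string is used in place of the clipboard so the example runs on its own.

```python
import re

IP_CANDIDATE = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')


def extract_valid_ips(text):
    """Return dotted quads from text whose four parts are all in 0..255."""
    valid = []
    for candidate in IP_CANDIDATE.findall(text):
        if all(0 <= int(part) <= 255 for part in candidate.split('.')):
            valid.append(candidate)
    return valid


# Hypothetical text standing in for copied ipconfig output
sample = "IPv4 Address: 192.168.1.10  Mask: 255.255.255.0  Bogus: 999.1.1.1"
print(extract_valid_ips(sample))
```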
Extracting image URLs from a web page
(Covered in the day10 video, second half, starting at 76:00.)

import pyperclip,re
"""
allimg/200206/30-200206141A70-L.jpg
allimg/150723/9-150H31916410-L.jpg
"""
url = "https://img.tupianzj.com/uploads/allimg"
regex = re.compile(r'allimg(.*)jpg')
text = str(pyperclip.paste())  # pyperclip.paste() is the "paste" half of copy and paste
lst = regex.findall(text)
for item in lst:
    print(url + item + "jpg")

When scraping data, keep the matched range as narrow as possible.
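
To illustrate why a narrow pattern matters, the sketch below compares the greedy allimg(.*)jpg with a tighter pattern on a single line that contains two image paths. The HTML fragment is made up; the two paths are the sample paths from the docstring above with the stray spaces removed, and the tighter pattern is only one possible choice.

```python
import re

BASE_URL = "https://img.tupianzj.com/uploads/allimg"

# Hypothetical HTML fragment with two image paths on one line
sample = ('<img src="/uploads/allimg/200206/30-200206141A70-L.jpg" alt="x.jpg">'
          '<img src="/uploads/allimg/150723/9-150H31916410-L.jpg">')

# Greedy: .* runs from the first "allimg" to the last "jpg" on the line
greedy = re.findall(r'allimg(.*)jpg', sample)

# Narrow: only word characters, slashes, and hyphens, then a literal ".jpg"
narrow = re.findall(r'allimg(/[\w/-]+\.jpg)', sample)

print(len(greedy))
for path in narrow:
    print(BASE_URL + path)
```

The greedy version returns one oversized match, while the narrow version returns one entry per image.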
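Commonly used regular expressions

Expressions for validating numbers

Purpose	Expression
Digits	`^[0-9]*$`
An n-digit number	`^\d{n}$`
A number with at least n digits	`^\d{n,}$`
A number with m to n digits	`^\d{m,n}$`
Zero, or a number that does not start with zero	`^(0|[1-9][0-9]*)$`
A number not starting with zero, with at most two decimal places	`^([1-9][0-9]*)+(\.[0-9]{1,2})?$`
A positive or negative number with 1-2 decimal places	`^(\-)?\d+(\.\d{1,2})?$`
Positive numbers, negative numbers, and decimals	`^(\-|\+)?\d+(\.\d+)?$`
A positive real number with two decimal places	`^[0-9]+(\.[0-9]{2})?$`
A positive real number with 1~3 decimal places	`^[0-9]+(\.[0-9]{1,3})?$`
A nonzero positive integer	`^[1-9]\d*$`
A nonzero negative integer	`^-[1-9]\d*$`
A non-negative integer	`^\d+$`
A non-positive integer	`^(-[1-9]\d*|0)$`
A non-negative floating-point number	`^\d+(\.\d+)?$`
A non-positive floating-point number	`^((-\d+(\.\d+)?)|(0+(\.0+)?))$`
A positive floating-point number	`^([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$`
A negative floating-point number	`^-([1-9]\d*\.\d*|0\.\d*[1-9]\d*)$`
A floating-point number	`^(-?\d+)(\.\d+)?$`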
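
Patterns copied from a list like this are worth testing before use. As a minimal sketch, the code below checks the "zero or not starting with zero" pattern and shows why the decimal point must be written as \. (an unescaped . matches any character). The helper name matches and the test strings are made up for illustration.

```python
import re

INTEGER = re.compile(r'^(0|[1-9][0-9]*)$')
# Unescaped dot: accepts any character where the decimal point should be
TWO_DECIMALS_LOOSE = re.compile(r'^[0-9]+(.[0-9]{2})?$')
# Escaped dot: accepts only a literal "."
TWO_DECIMALS = re.compile(r'^[0-9]+(\.[0-9]{2})?$')


def matches(regex, text):
    """True if the compiled regex matches text."""
    return regex.search(text) is not None


print(matches(INTEGER, "0"), matches(INTEGER, "10"), matches(INTEGER, "012"))
print(matches(TWO_DECIMALS_LOOSE, "12x34"))  # accepted by mistake
print(matches(TWO_DECIMALS, "12x34"), matches(TWO_DECIMALS, "12.34"))
```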

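Expressions for validating characters

Purpose	Expression
Chinese characters	`^[\u4e00-\u9fa5]{0,}$`
English letters and digits	`^[A-Za-z0-9]+$`
Any characters, length 3-20	`^.{3,20}$`
A string made up of the 26 English letters	`^[A-Za-z]+$`
A string made up of the 26 uppercase English letters	`^[A-Z]+$`
A string made up of the 26 lowercase English letters	`^[a-z]+$`
A string made up of digits and the 26 English letters	`^[A-Za-z0-9]+$`
A string made up of digits, the 26 English letters, or underscores	`^\w+$`
Chinese, English letters, digits, and underscores	`^[\u4E00-\u9FA5A-Za-z0-9_]+$`
Chinese, English letters, and digits, without underscores or other symbols	`^[\u4E00-\u9FA5A-Za-z0-9]+$`
Input containing none of the characters `^%&’,;=?$\”`	`[^%&’,;=?$\x22]+`
Input containing no `~` (or double quote)	`[^~\x22]+`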

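Expressions for special cases

Purpose	Expression
Email address	`^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$`
Domain name	`[a-zA-Z0-9][-a-zA-Z0-9]{0,62}(\.[a-zA-Z0-9][-a-zA-Z0-9]{0,62})+\.?`
Internet URL	`[a-zA-Z]+://[^\s]*` or `^http://([\w-]+\.)+[\w-]+(/[\w./?%&=-]*)?$`
Mobile phone number	`^(13[0-9]|14[57]|15[0-35-9]|18[0-35-9])\d{8}$`
Domestic (China) landline number	`\d{3}-\d{8}|\d{4}-\d{7}` (e.g. 0511-4405222, 021-87888822)
ID card number	`^(\d{15}|\d{18})$` (15 or 18 digits)
Short ID number	`^([0-9]){7,18}(x|X)?$` or `^\d{8,18}|[0-9x]{8,18}|[0-9X]{8,18}?$` (digits, optionally ending with the letter x)
Valid account name	`^[a-zA-Z][a-zA-Z0-9_]{4,15}$` (starts with a letter, 5-16 characters, letters, digits, and underscores allowed)
Password	`^[a-zA-Z]\w{5,17}$` (starts with a letter, 6~18 characters long, only letters, digits, and underscores)
Strong password	`^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$` (must contain uppercase letters, lowercase letters, and digits; no special characters; 8-10 characters long)
Date format	`^\d{4}-\d{1,2}-\d{1,2}`
The 12 months of a year (01~09 and 1~12)	`^(0?[1-9]|1[0-2])$`
The 31 days of a month (01~09 and 1~31)	`^((0?[1-9])|((1|2)[0-9])|30|31)$`
XML file name	`^([a-zA-Z]+-?)+[a-zA-Z0-9]+\.[xX][mM][lL]$`
Double-byte characters	`[^\x00-\xff]` (includes Chinese characters; can be used to compute string length, counting 2 for a double-byte character and 1 for an ASCII character)
Blank lines	`\n\s*\r` (can be used to delete blank lines)
HTML tags	`<(\S*?)[^>]*>.*?</\1>|<.*? />` (still cannot handle complex nested tags)
Leading and trailing whitespace	`^\s*|\s*$` or `(^\s*)|(\s*$)` (can be used to strip whitespace, including spaces, tabs, and form feeds, from the start and end of a line)
Tencent QQ number	`[1-9][0-9]{4,}` (QQ numbers start from 10000)
Chinese postal code	`[1-9]\d{5}(?!\d)` (Chinese postal codes are 6 digits)
IP address extraction	`\d+\.\d+\.\d+\.\d+` (useful when extracting IP addresses)
IP address validity check	`((?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d))`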
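
Two of these entries are easy to get subtly wrong, so here is a minimal sketch that exercises them. It assumes the strong-password pattern uses lookaheads of the form (?=.*X) and restricts the body to letters and digits, as its description requires, and it writes the IP pattern with single backslashes, as a Python raw string expects. The function names and test strings are made up for illustration.

```python
import re

# Lookaheads require at least one digit, one lowercase and one uppercase letter;
# the body allows only letters and digits, 8 to 10 characters in total.
STRONG_PASSWORD = re.compile(r'^(?=.*\d)(?=.*[a-z])(?=.*[A-Z])[a-zA-Z0-9]{8,10}$')

# Each octet is limited to 0..255
VALID_IP = re.compile(
    r'(?:(?:25[0-5]|2[0-4]\d|[01]?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|[01]?\d?\d)'
)


def is_strong_password(text):
    return STRONG_PASSWORD.search(text) is not None


def is_valid_ip(text):
    # fullmatch: the whole string must be an address, not just contain one
    return VALID_IP.fullmatch(text) is not None


for pw in ["Passw0rd", "password1", "Passw0rd!", "Pw0"]:
    print(pw, is_strong_password(pw))
for ip in ["192.168.1.1", "256.1.1.1", "1.2.3"]:
    print(ip, is_valid_ip(ip))
```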

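Notice: Please use this knowledge properly and lawfully, and do not use it for any illegal purpose. Obey laws and regulations and take responsibility for your own actions. This post is for learning and discussion only; anyone who uses it for illegal purposes bears all consequences themselves.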
