Day 32: Python basics, regular expressions, and fetching website images with Python

想成为前端工程师滴小小白

Last modified: 2022-10-23 08:56:12

Category: Network Security. Tags: python

First published: 2022-10-22 19:37:07

Article link: https://blog.csdn.net/qq_45169358/article/details/127439533

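Regular expressions:
A regular expression is a logical formula for operating on strings. Predefined special characters, and combinations of them, form a "pattern string" that expresses a filtering rule applied to other strings.
Using the re module:
Sample string: I say Good not food
import re
dir(re)  # list the names that the re module provides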

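Matching a single character:
.	the dot matches any single character (except a newline by default)
re.findall(r".ood", "I say Good not food")  # ['Good', 'food']
[]	matches any one of the characters listed inside the brackets
re.findall(r"[Gf]ood", "I say Good not food")  # ['Good', 'food']
\d	matches a single digit
re.findall(r"\d", "I am 40")  # ['4', '0']
\w	matches a single word character; for ASCII text that is [0-9a-zA-Z_]
re.findall(r"\w", "I am 40")  # ['I', 'a', 'm', '4', '0']
\s	matches a whitespace character such as a space or a tab
re.findall(r"\s", "I am 40")  # [' ', ' ']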
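As a quick check of the single-character patterns above, the following sketch runs each one and stores the result. Raw strings (r"...") are used so that the backslashes reach the regex engine unchanged; the variable names are illustrative only.

```python
import re

sentence = "I say Good not food"
text = "I am 40"

any_char = re.findall(r".ood", sentence)     # any character followed by "ood"
char_set = re.findall(r"[Gf]ood", sentence)  # "G" or "f" followed by "ood"
digits = re.findall(r"\d", text)             # each digit separately
word_chars = re.findall(r"\w", text)         # letters, digits and underscore
spaces = re.findall(r"\s", text)             # whitespace characters

print(any_char)    # ['Good', 'food']
print(char_set)    # ['Good', 'food']
print(digits)      # ['4', '0']
print(word_chars)  # ['I', 'a', 'm', '4', '0']
print(spaces)      # [' ', ' ']
```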
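Matching a whole string: write it out literally.
Literal match:
re.findall("good", "I say Good not food")  # [] because a literal match is case-sensitive
The | separator, which matches either of two different strings:
re.findall("Good|food", "I say Good not food")  # ['Good', 'food']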
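The case-sensitivity of literal matching can be seen side by side with alternation in the sketch below. The re.IGNORECASE flag is not used in the original notes; it is shown here only as one standard way to relax the case requirement.

```python
import re

text = "I say Good not food"

exact = re.findall(r"good", text)                       # case-sensitive literal
either = re.findall(r"Good|food", text)                 # either alternative
ignore_case = re.findall(r"good", text, re.IGNORECASE)  # case-insensitive literal

print(exact)        # []
print(either)       # ['Good', 'food']
print(ignore_case)  # ['Good']
```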
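*: the character to its left occurs 0 or more times
re.findall("go*gle", "I like google not ggle goooogle and gooooooogle")  # ['google', 'ggle', 'goooogle', 'gooooooogle']
+: the character to its left occurs 1 or more times
re.findall("go+gle", "I like google not ggle goooogle and gooooooogle")  # ['google', 'goooogle', 'gooooooogle']
?: the character to its left occurs 0 or 1 times
re.findall("go?gle", "I like google not ggle goooogle and gooooooogle")  # ['ggle']
{}: sets how many times the character to its left occurs, either exactly {n} or within a range {m,n}
re.findall("go{2}gle", "I like google not ggle goooogle and gooooooogle")  # ['google']
re.findall("go{2,10}gle", "I like google not ggle goooogle and gooooooogle")  # ['google', 'goooogle', 'gooooooogle']
re.findall("go{2,3}gle", "I like google not ggle goooogle and gooooooogle")  # ['google']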
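To compare the quantifiers directly, the sketch below applies several patterns to the same sentence and collects the matches in a dictionary. The names patterns and results are illustrative.

```python
import re

text = "I like google not ggle goooogle and gooooooogle"
patterns = [r"go*gle", r"go+gle", r"go?gle", r"go{2}gle", r"go{2,3}gle"]

# Map each pattern to the list of substrings it matches.
results = {p: re.findall(p, text) for p in patterns}

for pattern, found in results.items():
    print(f"{pattern:12} -> {found}")
```

Only * and ? accept "ggle", because they allow zero occurrences of "o"; the {2} and {2,3} forms reject "goooogle", which has four.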
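^: checks whether the string starts with the given text
re.findall("^I like", "I like google not ggle goooogle and gooooooogle")  # ['I like'], a match
re.findall("^and", "I like google not ggle goooogle and gooooooogle")  # [], no match
$: checks whether the string ends with the given text
re.findall("gogle$", "I like google not ggle goooogle and gooooooogle")  # [], because the string ends in "oogle", not "gogle"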
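A small sketch of the two anchors follows. The pattern "ogle$" is added here for contrast and is not part of the original notes; it shows an end-of-string match that does succeed.

```python
import re

text = "I like google not ggle goooogle and gooooooogle"

starts_with_i_like = re.findall(r"^I like", text)  # anchored at the start
starts_with_and = re.findall(r"^and", text)        # "and" is not at the start
ends_with_gogle = re.findall(r"gogle$", text)      # the ending is "...oogle"
ends_with_ogle = re.findall(r"ogle$", text)        # the last four characters

print(starts_with_i_like)  # ['I like']
print(starts_with_and)     # []
print(ends_with_gogle)     # []
print(ends_with_ogle)      # ['ogle']
```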
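() grouping and capturing, with \number backreferences:
test = re.search("(allen)\\1", "my name is allenallen")
test.group()  # 'allenallen'
How to read \\1:
\\: an escaped backslash in an ordinary string literal, so the regex engine receives \1
1: the text captured by group 1, which must appear again at this position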
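The sketch below shows that the escaped form "\\1" and the raw-string form r"\1" produce the same pattern, and then uses a backreference to find doubled words. The doubled-word sentence is a made-up example, not taken from the notes above.

```python
import re

text = "my name is allenallen"

escaped = re.search("(allen)\\1", text)  # ordinary string, backslash escaped
raw = re.search(r"(allen)\1", text)      # raw string, same regex

print(escaped.group())  # 'allenallen' (the whole match)
print(raw.group(1))     # 'allen' (what group 1 captured)

# With one capturing group, findall returns the captured text of each match.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(doubled)  # ['is', 'test']
```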

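1. Fetch the home page: how to use a crawler to get a page's HTML source
2. Filter out the image addresses
3. Download the images
The site used for this image-crawling test is https://www.dxsbb.com/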
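Step 2 can be tried offline before touching the network. The sketch below is a minimal illustration that assumes the image paths look like upFiles/infoImg/ followed by a 16-character file name and .jpg; the HTML fragment and file names are invented stand-ins for a downloaded page.

```python
import re

# Hypothetical HTML fragment standing in for the downloaded home page (bytes,
# because urllib returns the response body as bytes).
sample_html = (
    b'<img src="upFiles/infoImg/2022102212345678.jpg">'
    b'<img src="upFiles/infoImg/abcdefgh12345678.jpg">'
    b'<img src="upFiles/infoImg/short.jpg">'
)
base_url = "https://www.dxsbb.com/"

# A bytes pattern is required to search bytes; the dot is escaped so that it
# matches only a literal ".".
paths = re.findall(rb"upFiles/infoImg/\w{16}\.jpg", sample_html)

# Turn each relative path into an absolute URL.
image_urls = [base_url + p.decode("utf8") for p in paths]
print(image_urls)
```

The third entry is skipped because its file name is not 16 characters long.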
paqu.py:

import re
import urllib.request


class GetHtml(object):
    def __init__(self, URL, HEAD):
        self.url = URL    # base URL of the site to crawl
        self.head = HEAD  # User-Agent string sent with every request

    def get_index(self, url=None):
        # Fetch a URL (the home page by default) and return the raw bytes.
        request = urllib.request.Request(url or self.url)
        request.add_header("User-Agent", self.head)
        with urllib.request.urlopen(request) as response:
            return response.read()

    def get_list(self):
        # Find the relative image paths in the home page and make them absolute.
        imglist = re.findall(rb"upFiles/infoImg/\w{16}\.jpg", self.get_index())
        return [self.url + str(i, encoding="utf8") for i in imglist]

    def get_image(self):
        # Download every image and save it as 1.jpg, 2.jpg, ...
        for num, img_url in enumerate(self.get_list(), start=1):
            with open(str(num) + ".jpg", "wb") as f:
                f.write(self.get_index(img_url))


if __name__ == "__main__":
    html = GetHtml("https://www.dxsbb.com/", "Mozilla/5.0 (Windows NT 8.1; Win32; x32; rv:105.0) Gecko/20100101 QQBroswer/105.0")
    # print(html.get_index())
    html.get_image()
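One design note: building the image address by concatenating the base URL and the path only works when the base ends with "/". As a possible alternative, not used in the script above, urllib.parse.urljoin from the standard library handles both cases. The file name below is a made-up example.

```python
from urllib.parse import urljoin

# Hypothetical relative path, shaped like the ones the crawler extracts.
path = "upFiles/infoImg/2022102212345678.jpg"

with_slash = urljoin("https://www.dxsbb.com/", path)
without_slash = urljoin("https://www.dxsbb.com", path)

print(with_slash)
print(without_slash)
```

Both calls produce the same absolute URL, so the trailing slash on the base no longer matters.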