A simple image scraper in Python

Article link: https://blog.csdn.net/p_utao/article/details/108812468

Steps:
1. Fetch the page source
2. Extract the image links
3. Download the images

Install requests (pip install requests) and import it, then fetch the site's source code:

import requests

url = "http://www.yangzixuanyzx.cn/anli/"
response = requests.get(url)
print(response.text)

The printed text is garbled, so set the response encoding to utf-8 before reading response.text:

response.encoding = 'utf-8'
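
A likely cause of the garbling: when the Content-Type header of a text response carries no charset, requests falls back to ISO-8859-1, so UTF-8 bytes get decoded with the wrong codec. The sketch below reproduces the effect offline. The sample string is made up for illustration and is not taken from the site.

```python
# Hypothetical page text, encoded the way a UTF-8 site would send it.
raw = "案例展示".encode("utf-8")

# Decoding with the wrong codec yields mojibake rather than an error.
garbled = raw.decode("iso-8859-1")

# Decoding with the right codec recovers the original text.
fixed = raw.decode("utf-8")

print(repr(garbled))
print(fixed)
```

If the page encoding is not known in advance, `response.encoding = response.apparent_encoding` asks requests to guess it from the body instead of hard-coding utf-8.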

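Next, get the image links. Open the browser developer tools (F12) and check whether the image addresses follow a pattern.
They are all under the /uploads/allimg/ directory: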

/uploads/allimg/160614/1-1606141521200-L.png
/uploads/allimg/200708/seo1.jpg

Use the re module to match the image addresses in the page source with a regular expression:

import requests
import re

url = "http://www.yangzixuanyzx.cn/anli/"

response = requests.get(url)
response.encoding = 'utf-8'
html = response.text

# Raw string; (?:jpg|png) matches a whole extension, whereas [jpg|png] is a one-character class
himg = re.findall(r"/uploads/allimg/\w+/[\w.-]+?\.(?:jpg|png)", html)
print(himg)
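
A common pitfall when matching file extensions is that square brackets define a character class, so `[jpg|png]` matches a single character (one of j, p, g, |, n) rather than the word jpg or png. Below is a minimal sketch of a stricter pattern run against a made-up HTML fragment. Only the two image paths come from the article; the surrounding tags and the notes.txt link are assumptions for illustration.

```python
import re

# Made-up HTML fragment; only the two image paths appear in the article.
sample_html = (
    '<img src="/uploads/allimg/160614/1-1606141521200-L.png">'
    '<img src="/uploads/allimg/200708/seo1.jpg">'
    '<a href="/uploads/allimg/200708/notes.txt">notes</a>'
)

# \w+         : the dated sub-directory, e.g. 160614
# [\w.-]+?    : the file name (letters, digits, underscore, dot, hyphen)
# \.          : a literal dot before the extension
# (?:jpg|png) : non-capturing group, so findall returns the whole match
pattern = re.compile(r"/uploads/allimg/\w+/[\w.-]+?\.(?:jpg|png)")

image_paths = pattern.findall(sample_html)
print(image_paths)
```

The group is non-capturing on purpose: with a plain `(jpg|png)`, `findall` would return only the extensions instead of the full paths.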

These are only root-relative paths, though, so they still have to be joined with the site address:

import requests
import re
from urllib.parse import urljoin

pinjie = []
url = "http://www.yangzixuanyzx.cn/anli/"

response = requests.get(url)
response.encoding = 'utf-8'
html = response.text

himg = re.findall(r"/uploads/allimg/\w+/[\w.-]+?\.(?:jpg|png)", html)
for i in himg:
    # urljoin resolves the root-relative path against the site root
    pinjie.append(urljoin(url, i))

print(pinjie)
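
To check how a path resolves against the page URL without touching the network, the standard library's `urllib.parse.urljoin` can be tried on its own. This is a small sketch; `pic.jpg` is a hypothetical file name added only to contrast the two cases.

```python
from urllib.parse import urljoin

page_url = "http://www.yangzixuanyzx.cn/anli/"

# A path starting with "/" is resolved against the site root, not /anli/.
root_relative = urljoin(page_url, "/uploads/allimg/200708/seo1.jpg")
print(root_relative)  # http://www.yangzixuanyzx.cn/uploads/allimg/200708/seo1.jpg

# A path without a leading "/" (hypothetical name) stays under /anli/.
page_relative = urljoin(page_url, "pic.jpg")
print(page_relative)  # http://www.yangzixuanyzx.cn/anli/pic.jpg
```

Compared with string concatenation, this avoids URLs containing `/anli/../` and handles both kinds of path the same way.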

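With the URLs joined, writing each response body to a file does the actual download. A loop counter names the files, which are saved to the folder 1 on drive D (create the folder beforehand).
Complete code: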

import requests
import re
import os
from urllib.parse import urljoin

pinjie = []
shuzi = 0
url = "http://www.yangzixuanyzx.cn/anli/"

response = requests.get(url)
response.encoding = 'utf-8'
html = response.text

himg = re.findall(r"/uploads/allimg/\w+/[\w.-]+?\.(?:jpg|png)", html)
for i in himg:
    # Resolve each root-relative path against the site root
    pinjie.append(urljoin(url, i))

for img in pinjie:
    shuzi += 1
    # Keep the image's own extension instead of saving everything as .jpg
    ext = os.path.splitext(img)[1]
    with open("D:\\1\\" + str(shuzi) + ext, 'wb') as f:
        f.write(requests.get(img).content)
    print(os.path.basename(img) + " saved successfully")
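
The saving step can also be split into two small helpers, which makes it testable without a network connection. This is only a sketch: `target_filename` and `save_image` are hypothetical names that do not appear in the original script, and falling back to `.jpg` when a URL has no extension is an assumption.

```python
import os
from urllib.parse import urlparse


def target_filename(index, img_url, default_ext=".jpg"):
    """Build a numbered file name that keeps the URL's own extension."""
    ext = os.path.splitext(urlparse(img_url).path)[1]
    return f"{index}{ext or default_ext}"


def save_image(data, folder, filename):
    """Write raw bytes to folder/filename, creating the folder if needed."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    with open(path, "wb") as f:
        f.write(data)
    return path


name = target_filename(
    1, "http://www.yangzixuanyzx.cn/uploads/allimg/160614/1-1606141521200-L.png"
)
print(name)  # 1.png
```

Inside the download loop, the bytes passed to `save_image` would be `requests.get(img, timeout=10).content`. The timeout keeps one slow image from stalling the whole run, and `os.makedirs` removes the need to create the folder by hand.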