Steps:
1. Fetch the page source
2. Extract the image links
3. Download the images
Install requests first (pip install requests), then import it and fetch the site's source code:
import requests
url = "http://www.yangzixuanyzx.cn/anli/"
response = requests.get(url)
print(response.text)
The Chinese text in the output is garbled, so set the response encoding to UTF-8 before reading response.text:
response.encoding = 'utf-8'
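A common cause of this garbling: when a server's Content-Type is text/html without a charset, requests falls back to ISO-8859-1 for response.text, even though the page bytes are UTF-8. Whether this particular site omits the charset is an assumption. The offline sketch below reproduces the effect on a hypothetical Chinese string without any network access. response.apparent_encoding, which guesses the encoding from the body bytes, is an alternative to hard-coding 'utf-8'.

```python
# Hypothetical page text; the real page content is not reproduced here
original = "案例"

# The server sends the text as UTF-8 bytes
raw = original.encode("utf-8")

# Decoding those bytes as ISO-8859-1 (requests' fallback for text/* without a charset)
# never raises an error, because every byte maps to some character: the result is mojibake
garbled = raw.decode("iso-8859-1")
print(garbled)

# Decoding the same bytes as UTF-8 (what response.encoding = 'utf-8' causes) restores the text
fixed = raw.decode("utf-8")
print(fixed)
```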
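Next, extract the image links. Press F12 to open the browser's developer tools and check whether the image addresses follow a pattern.
They all sit under the /uploads/allimg/ directory, for example: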
/uploads/allimg/160614/1-1606141521200-L.png
/uploads/allimg/200708/seo1.jpg
Use the re module to match the image addresses in the page source with a regular expression:
import requests
import re
url = "http://www.yangzixuanyzx.cn/anli/"
response = requests.get(url)
response.encoding = 'utf-8'
html = response.text
# Match paths like /uploads/allimg/<dir>/<name>.jpg or .png
himg = re.findall(r"/uploads/allimg/\w+/[\w-]+\.(?:jpg|png)", html)
print(himg)
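Why (?:jpg|png) is safer than [jpg|png]: square brackets form a character class, which matches exactly one character from the set (j, p, g, | or n), not the whole extension, so a pattern ending in [jpg|png] can stop on the wrong character or pick up trailing text. The sketch below checks this and runs a tighter pattern against the two example paths; the HTML snippet is made up for illustration, and the pattern assumes file names contain only word characters and hyphens. The group is non-capturing so re.findall still returns the full match, and the raw string keeps \w from being read as a string escape.

```python
import re

# Hypothetical HTML snippet containing the two example paths from the page
html = '''
<img src="/uploads/allimg/160614/1-1606141521200-L.png" alt="">
<img src="/uploads/allimg/200708/seo1.jpg" alt="">
'''

# A character class matches exactly one character from the set
print(bool(re.fullmatch(r"[jpg|png]", "|")))    # True: '|' is just a member of the set
print(bool(re.fullmatch(r"[jpg|png]", "png")))  # False: a class never matches three characters

# Directory of word characters, file name of word characters or '-',
# then a literal '.' and one of the two extensions
pattern = re.compile(r"/uploads/allimg/\w+/[\w-]+\.(?:jpg|png)")
himg = pattern.findall(html)
print(himg)
# ['/uploads/allimg/160614/1-1606141521200-L.png', '/uploads/allimg/200708/seo1.jpg']
```

If the page also serves other formats, the alternation can be extended, e.g. (?:jpg|jpeg|png|gif).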
The matches are only paths relative to the site root, so they still need to be joined with the site address:
import requests
import re
from urllib.parse import urljoin
pinjie = []
url = "http://www.yangzixuanyzx.cn/anli/"
response = requests.get(url)
response.encoding = 'utf-8'
html = response.text
himg = re.findall(r"/uploads/allimg/\w+/[\w-]+\.(?:jpg|png)", html)
for i in himg:
    # Resolve each root-relative path against the page URL
    pinjie.append(urljoin(url, i))
print(pinjie)
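Compared with gluing strings as url + ".." + i, which leaves an /anli/.. detour in every address and relies on it being resolved later, urllib.parse.urljoin from the standard library resolves a root-relative path against the page URL directly. The offline sketch below uses one of the two example paths; the relative path img/a.jpg is hypothetical and only shows the contrasting case.

```python
from urllib.parse import urljoin

url = "http://www.yangzixuanyzx.cn/anli/"
path = "/uploads/allimg/200708/seo1.jpg"

# Manual concatenation keeps the "/anli/.." detour in the URL
naive = url + ".." + path
print(naive)   # http://www.yangzixuanyzx.cn/anli/../uploads/allimg/200708/seo1.jpg

# urljoin replaces the page's path with the root-relative path
joined = urljoin(url, path)
print(joined)  # http://www.yangzixuanyzx.cn/uploads/allimg/200708/seo1.jpg

# A path without a leading "/" is resolved against the page's directory instead
print(urljoin(url, "img/a.jpg"))  # http://www.yangzixuanyzx.cn/anli/img/a.jpg
```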
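With the full URLs in hand, download each image by writing the response bytes to a file. A loop counter names the files, and they are saved to folder 1 on drive D (create it beforehand).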
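Because the page serves both .png and .jpg files, a fixed .jpg suffix would mislabel some of them. A hedged alternative, assuming the extension in the URL reflects the real file type, keeps the counter but takes the extension from the URL path. make_filename is a hypothetical helper, not part of the original script.

```python
import os
from urllib.parse import urlparse

def make_filename(index, img_url, default_ext=".jpg"):
    """Build '<index><ext>', taking the extension from the URL path (query string ignored)."""
    path = urlparse(img_url).path
    ext = os.path.splitext(path)[1].lower() or default_ext
    return f"{index}{ext}"

urls = [
    "http://www.yangzixuanyzx.cn/uploads/allimg/160614/1-1606141521200-L.png",
    "http://www.yangzixuanyzx.cn/uploads/allimg/200708/seo1.jpg",
]
# enumerate(..., start=1) plays the role of the manual shuzi += 1 counter
names = [make_filename(i, u) for i, u in enumerate(urls, start=1)]
print(names)  # ['1.png', '2.jpg']
```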
Complete code:
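import requests
import re
import os
from urllib.parse import urljoin, urlparse
pinjie = []
shuzi = 0
url = "http://www.yangzixuanyzx.cn/anli/"
response = requests.get(url)
response.encoding = 'utf-8'
html = response.text
# Match paths like /uploads/allimg/<dir>/<name>.jpg or .png
himg = re.findall(r"/uploads/allimg/\w+/[\w-]+\.(?:jpg|png)", html)
for i in himg:
    # Turn each root-relative path into a full URL
    pinjie.append(urljoin(url, i))
for img in pinjie:
    shuzi += 1
    # Keep the image's real extension (.jpg or .png) instead of always using .jpg
    ext = os.path.splitext(urlparse(img).path)[1]
    with open("D:\\1\\" + str(shuzi) + ext, 'wb') as f:
        f.write(requests.get(img).content)
    print(os.path.basename(img) + " saved successfully")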
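For reuse and testing, the three steps can also be split into functions. This is a sketch, not the author's script: the function names are hypothetical, the default fetcher swaps in the standard library's urllib.request for requests so the block runs without extra packages, duplicate links are dropped with dict.fromkeys (in case the page repeats an image), and the target folder is created with os.makedirs(..., exist_ok=True) instead of by hand.

```python
import os
import re
import urllib.request
from urllib.parse import urljoin, urlparse

IMG_PATTERN = re.compile(r"/uploads/allimg/\w+/[\w-]+\.(?:jpg|png)")

def default_fetch(url):
    """Return the raw bytes at url (stdlib stand-in for requests.get(url).content)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read()

def find_image_urls(page_url, html):
    """Extract image paths, make them absolute and drop duplicates, keeping page order."""
    paths = IMG_PATTERN.findall(html)
    return list(dict.fromkeys(urljoin(page_url, p) for p in paths))

def download_images(page_url, save_dir, fetch=default_fetch):
    """Fetch page_url, save every matched image into save_dir, return the saved file paths."""
    html = fetch(page_url).decode("utf-8", errors="replace")
    os.makedirs(save_dir, exist_ok=True)
    saved = []
    for index, img_url in enumerate(find_image_urls(page_url, html), start=1):
        # Extension taken from the URL path; the pattern guarantees .jpg or .png
        ext = os.path.splitext(urlparse(img_url).path)[1]
        target = os.path.join(save_dir, f"{index}{ext}")
        with open(target, "wb") as f:
            f.write(fetch(img_url))
        saved.append(target)
    return saved
```

Calling download_images("http://www.yangzixuanyzx.cn/anli/", r"D:\1") mirrors the original run. To keep using requests, pass fetch=lambda u: requests.get(u, timeout=10).content. Injecting fetch also lets the function be tested with canned bytes instead of live requests.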