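I like to illustrate my blog posts with high-quality images, but I'm too lazy to hunt for material online, so I used a crawler to pull a batch of wallpapers to my local disk. The site I scraped is https://wallhaven.cc/, and the code is as follows: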
# -*- coding: utf-8 -*-
import os

import requests
from lxml import etree

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:64.0) Gecko/20100101 Firefox/64.0"
}
# Folder to save into; the raw string keeps the backslashes literal
filepath = r"C:\Users\金少\Desktop\壁纸\wallhaven"
os.makedirs(filepath, exist_ok=True)
url = "https://wallhaven.cc/toplist"

for i in range(1, 20):  # pages 1 through 19
    kv = {"page": i}
    try:
        r = requests.get(url, headers=headers, params=kv, timeout=20)
        r.raise_for_status()
        # Parse the listing page and collect the links to each detail page
        html = etree.HTML(r.text)
        page_links = html.xpath(".//li//a[@class='preview']/@href")
        for link in page_links:
            r = requests.get(link, headers=headers, timeout=20)
            r.raise_for_status()
            html = etree.HTML(r.text)
            img_srcs = html.xpath(".//img[@id='wallpaper']/@src")
            for img_src in img_srcs:
                filename = img_src.split("/")[-1]  # file name from the URL
                response = requests.get(img_src, headers=headers, timeout=20)
                response.raise_for_status()
                with open(os.path.join(filepath, filename), "wb") as file:
                    file.write(response.content)
                print(filename)
        print("Succeed")
    except Exception as e:
        # Skip the rest of this page on any request, parse, or write error
        print("Skipped page", i, e)
        continue
print("Triumph")
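The script takes the last segment of each image URL as the file name and combines it with the save path. Here is a minimal sketch of that step in isolation, using only the standard library. The helper names `filename_from_url` and `target_path` and the example URL are my own assumptions for illustration, not part of the original script. Compared with a plain `split('/')`, it ignores any query string and lets `pathlib` insert the path separator.

```python
from pathlib import Path
from urllib.parse import urlsplit


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring query string and fragment."""
    path = urlsplit(url).path
    name = path.rsplit("/", 1)[-1]
    if not name:
        raise ValueError(f"URL has no file name: {url!r}")
    return name


def target_path(directory: str, url: str) -> Path:
    """Join the save directory and the URL's file name with the correct separator."""
    return Path(directory) / filename_from_url(url)


if __name__ == "__main__":
    # Hypothetical image URL, used only for illustration
    example = "https://example.com/full/ab/wallhaven-abc123.jpg?download=1"
    print(filename_from_url(example))  # wallhaven-abc123.jpg
    print(target_path("wallpapers", example))
```

In the crawler, the result of `target_path(filepath, img_url)` could be passed straight to `open(..., 'wb')`, so there is no need to remember a trailing separator on the folder path.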
The scraped images are shared via Baidu Cloud at the following link: 高清壁纸 (HD Wallpapers), extraction code: 0z07
And now for the wallpaper showcase: