Today we'll scrape wallpaper images from this site:
url = "http://www.bizhi88.com/"
As usual, start by fetching the page source and locating the link for each individual image:
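import requests
from lxml import etree

url = "http://www.bizhi88.com/"
# Send a browser-like User-Agent with the request
headers = {
    "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
resp = requests.get(url=url, headers=headers)
resp.encoding = "utf-8"
html = etree.HTML(resp.text)
# Each div under this node holds one wallpaper entry
li_src = html.xpath('/html/body/div[3]/div')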
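Indexing an XPath result with [0] raises IndexError as soon as one entry has no link. Below is a minimal sketch of a more defensive loop. The SAMPLE_HTML markup, the "list" class and the extract_links helper are made up for illustration and are not the real structure of bizhi88.com.

```python
from lxml import etree

# Hypothetical markup standing in for the real listing page.
SAMPLE_HTML = """
<html><body>
  <div class="list">
    <div><a href="/pic/1.html">first</a></div>
    <div><a href="/pic/2.html">second</a></div>
    <div><span>no link here</span></div>
  </div>
</body></html>
"""


def extract_links(page_source):
    """Return the href of the first <a> in each entry, skipping entries without one."""
    html = etree.HTML(page_source)
    links = []
    for item in html.xpath('//div[@class="list"]/div'):
        # A relative XPath (starting with ./) is evaluated against this item only.
        hrefs = item.xpath('./a[1]/@href')  # xpath() returns a list, possibly empty
        if hrefs:
            links.append(hrefs[0])
    return links


print(extract_links(SAMPLE_HTML))
```

The third entry has no link, so it is skipped instead of crashing the loop.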
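Each entry links to a detail page. From that page we take the image's src, send another request to fetch the image data, and write the bytes to the storage path.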
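The write step can be sketched on its own, without any network access. This is only an illustration: build_image_path and save_image are hypothetical helpers, and the ".jpg" fallback extension is an assumption for the case where the image URL carries no extension.

```python
import os
import tempfile
from pathlib import Path
from urllib.parse import urlparse


def build_image_path(directory, title, image_url):
    """Build the target path, borrowing the file extension from the image URL."""
    # Fall back to ".jpg" (an assumption) when the URL path has no extension.
    ext = os.path.splitext(urlparse(image_url).path)[1] or ".jpg"
    # A slash in the title would otherwise be read as a directory separator.
    safe_title = title.strip().replace("/", "_")
    return Path(directory) / (safe_title + ext)


def save_image(data, directory, title, image_url):
    """Write raw image bytes to disk, creating the directory if needed."""
    target = build_image_path(directory, title, image_url)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target


# Usage with placeholder bytes and a made-up URL, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"fake image bytes", tmp, "sunset", "http://example.com/img/a.png")
    print(saved.name)
```

In the real scraper, data would be the .content of the image response and title the text taken from the detail page.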
Full code:
import requests
from lxml import etree


def get_one_page(url):
    # Send a browser-like User-Agent with every request
    headers = {
        "user-agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
    }
    resp = requests.get(url=url, headers=headers)
    resp.encoding = "utf-8"
    html = etree.HTML(resp.text)
    # One div per wallpaper on the listing page
    li_src = html.xpath('/html/body/div[3]/div')
    # Directory where the images are saved; it must already exist
    path = r'/home/chq/桌面/2021暑假/python爬虫/photo/'
    for l_src in li_src:
        # Link to the wallpaper's detail page; prepend the site root
        every_src = l_src.xpath('./a[1]/@href')[0]
        every_src = "http://www.bizhi88.com" + every_src
        resp1 = requests.get(url=every_src, headers=headers)
        resp1.encoding = "utf-8"
        html1 = etree.HTML(resp1.text)
        # Image URL and title taken from the detail page
        photo = html1.xpath('/html/body/div[3]/div[1]/div/div[1]/img/@src')[0]
        photo_name = html1.xpath('/html/body/div[3]/div[1]/div/ul/li[2]/a/text()')[0]
        # Download the raw image bytes and write them to disk
        image_data = requests.get(url=photo, headers=headers).content
        with open(path + photo_name, 'wb') as fp:
            fp.write(image_data)
        print("Download finished!", photo_name)


if __name__ == '__main__':
    url = "http://www.bizhi88.com/"
    get_one_page(url)
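One thing worth knowing when building detail-page URLs: an href scraped from a page may be root-relative, relative, or already absolute. The standard library's urljoin handles all three. The sketch below uses made-up hrefs, and absolutize is a hypothetical helper name.

```python
from urllib.parse import urljoin

BASE = "http://www.bizhi88.com/"


def absolutize(href, base=BASE):
    """Resolve an href found on the page against the site's base URL."""
    return urljoin(base, href)


# Made-up hrefs covering the three common cases.
for href in ["/pic/1.html", "pic/1.html", "http://img.example.com/a.jpg"]:
    print(href, "->", absolutize(href))
```

An href that is already absolute is returned unchanged, so the same call is safe for every link.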