[Python Crawler] Scraping and Downloading Product Images

zhouwhui

Last modified: 2023-03-30 16:54:06

First published: 2020-08-10 11:47:34

Article link: https://blog.csdn.net/qq_37251994/article/details/107908561

1. Import the libraries

import requests
from lxml import etree

2. Request the data (the headers make the request look like it comes from a browser, which gets past simple anti-crawler checks)

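# Swap in any URL whose page exposes downloadable images
page, size = 1, 1  # example values (an assumption); the original leaves them undefined
url = "https://search.jd.com/Search?keyword=笔记本电脑&wq=笔记本电脑&page=%d&s=%d&click=0" % (page, size)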
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36"}

resp = requests.get(url=url,headers=headers)
content = etree.HTML(resp.text)
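
The URL above is assembled with `%` formatting and a raw Chinese keyword. As a minimal sketch, the same URL can be built with the standard library's `urlencode`, which percent-encodes the keyword explicitly. The helper name `build_search_url` is hypothetical; the parameter names are copied from the URL above, and nothing is assumed here about how the site interprets `page` and `s`.

```python
from urllib.parse import urlencode

def build_search_url(keyword, page, size):
    """Hypothetical helper: build the search URL for one result page."""
    # Parameter names (keyword, wq, page, s, click) are copied from the URL above
    params = {"keyword": keyword, "wq": keyword, "page": page, "s": size, "click": 0}
    # urlencode percent-encodes the non-ASCII keyword as UTF-8
    return "https://search.jd.com/Search?" + urlencode(params)

url = build_search_url("笔记本电脑", page=1, size=1)
print(url)
```

The returned string can be passed to `requests.get` in place of the hand-formatted one.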

3. Get the image links

# Extract the image URLs
img_list = content.xpath('//*[@id="J_goodsList"]/ul/li/div/div[1]/a/img/@src')
# The scraped src values have no scheme, so prepend "https:" to each one
img_list = ["https:" + src for src in img_list]
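
Prepending "https:" unconditionally works only while every link lacks a scheme; a link that is already absolute would end up malformed. A hedged alternative is to resolve each link against the page URL with `urljoin` from the standard library. The helper `normalize_img_url` and the `img.example.com` sample links are made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://search.jd.com/Search"  # the page the links were scraped from

def normalize_img_url(src, base=BASE_URL):
    """Hypothetical helper: turn a scraped src value into an absolute URL."""
    # urljoin fills in whatever the reference lacks (scheme, host) from the base
    return urljoin(base, src.strip())

# Sample links (made up): one without a scheme, one already absolute
samples = ["//img.example.com/a.jpg", "https://img.example.com/b.png"]
print([normalize_img_url(s) for s in samples])
```

With this helper, the list could be rebuilt as `img_list = [normalize_img_url(s) for s in img_list]`.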

4. Request each image link (fetch the image's binary data and write it to an image file in binary mode)

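import os

os.makedirs("img", exist_ok=True)  # open() fails if the img/ folder does not exist
print("Image download started; check the img folder")
for i, img in enumerate(img_list, start=1):
    # Request the image URL and fetch its content
    r = requests.get(img)
    img_path = 'img/img' + str(i) + '.png'  # the counter i keeps file names unique
    # r.text is the response body decoded as text; r.content is the raw bytes
    # Write the image bytes to the file in binary mode
    with open(img_path, 'wb') as f:
        f.write(r.content)
print("Image download finished")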
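
The loop above saves every file with a `.png` extension, whatever format the server actually returns. As a small sketch, the extension can instead be taken from the image URL's path, falling back to `.png` when the path has none. The helper `image_filename` and the sample URL are hypothetical.

```python
import os
from urllib.parse import urlparse

def image_filename(url, index, default_ext=".png"):
    """Hypothetical helper: name a file img<index> with the URL's own extension."""
    # urlparse(...).path drops any query string before the extension is split off
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    return "img%d%s" % (index, ext or default_ext)

print(image_filename("https://img.example.com/n1/abc.jpg", 1))  # prints img1.jpg
```

Inside the loop, the path would then become `os.path.join("img", image_filename(img, i))`.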