Code:
import requests

url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
resp = requests.get(url, headers=headers)
resp_json = resp.json()
# Get the value stored under the 'data' key
data_list = resp_json['data']
# Create an empty list to hold the image URLs
lst = []
# Walk through the list and read each item's 'thumbURL' value
for item in data_list:
    # The last object holds no data, so check before reading it
    if len(item) != 0:
        lst.append(item['thumbURL'])
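The extraction step can also be wrapped in a small function so it can be tried without hitting the network. This is only a sketch: the name `extract_thumb_urls` and the `sample` payload are made up for illustration, and the extra check for a missing `thumbURL` key is an assumption about what a defensive version might want, not something the response is known to require.

```python
def extract_thumb_urls(payload):
    """Return the thumbURL of every non-empty entry under payload['data']."""
    urls = []
    for item in payload.get('data', []):
        # Skip the empty trailing object and any entry without a thumbURL
        if item and 'thumbURL' in item:
            urls.append(item['thumbURL'])
    return urls


# Hypothetical sample shaped like the parsed JSON used in the step above
sample = {
    'data': [
        {'thumbURL': 'https://example.com/a.jpg'},
        {'thumbURL': 'https://example.com/b.jpg'},
        {},  # the last object is empty
    ]
}
print(extract_thumb_urls(sample))
```

With the real response, the call would be `extract_thumb_urls(resp_json)`.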
7. Request each image's URL, fetch the data, then save it
Code:
import os
import requests

url = 'https://image.baidu.com/search/acjson?tn=resultjson_com&logid=12117865351080430388&ipn=rj&ct=201326592&is=&fp=result&queryWord=%E7%BE%8E%E5%A5%B3&cl=2&lm=-1&ie=utf-8&oe=utf-8&adpicid=&st=-1&z=&ic=0&hd=&latest=&copyright=&word=%E7%BE%8E%E5%A5%B3&s=&se=&tab=&width=&height=&face=0&istype=2&qc=&nc=1&fr=&expermode=&force=&cg=girl&pn=30&rn=30&gsm=1e&1612964334559='
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'}
resp = requests.get(url, headers=headers)
resp_json = resp.json()
data_list = resp_json['data']
lst = []
for item in data_list:
    if len(item) != 0:
        lst.append(item['thumbURL'])
# The img folder must exist before files are written into it
os.makedirs('img', exist_ok=True)
# Use a counter as the image file name
count = 0
# Loop over the list and save every image
for item in lst:
    # Send the request
    resp = requests.get(item, headers=headers)
    count += 1
    # wb: write binary data
    with open('img/' + str(count) + '.jpg', 'wb') as file:
        file.write(resp.content)
print('Image crawl finished')
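The saving loop can be sketched as a function that takes the download step as a parameter, which makes it possible to test the file handling without any network access. The names `save_images` and `fetch` are hypothetical. Skipping entries whose download returns nothing is an added assumption, and it means the file numbers can have gaps.

```python
import os


def save_images(urls, fetch, out_dir='img'):
    """Save each URL's bytes as 1.jpg, 2.jpg, ... and return the paths written."""
    # Create the output folder if it does not exist yet
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for count, image_url in enumerate(urls, start=1):
        data = fetch(image_url)
        # Assumption: an empty or None result means the download failed
        if not data:
            continue
        path = os.path.join(out_dir, f'{count}.jpg')
        # wb: write binary data
        with open(path, 'wb') as file:
            file.write(data)
        saved.append(path)
    return saved
```

With requests, `fetch` might be something like `lambda u: requests.get(u, headers=headers).content`, and the call would then be `save_images(lst, fetch)`.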
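8. Appendix
Common attributes of the response object
response.status_code, the HTTP status code, used to check whether the request succeeded
response.content, the response body as binary data (bytes)
response.text, the response body as string data, decoded using response.encoding
response.encoding, the encoding used to decode response.text; it can be set before reading response.text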
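The relationship between content, text and encoding can be illustrated with plain bytes and no network request. This is a conceptual sketch only: the variables below merely stand in for the attributes of a response object, and the sample string is made up.

```python
# Stands in for response.content: raw bytes as received
content = '图片'.encode('utf-8')
# Stands in for response.encoding
encoding = 'utf-8'
# Roughly what response.text yields: the bytes decoded with that encoding
text = content.decode(encoding, errors='replace')
# Decoding the same bytes with the wrong encoding yields garbled text
wrong = content.decode('latin-1', errors='replace')

print(type(content).__name__, type(text).__name__)
print(text == '图片', wrong == text)
```

This is why images are written from response.content in binary mode, while JSON or HTML is usually read as text.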