1. A direct Baidu image search loads all results on one continuously-scrolling page instead of paginating them. To get paginated results, change `index` to `flip` in the URL; after the change the results are displayed page by page.
2. I don't know HTML very well, but trimming the link down to http://image.baidu.com/search/flip?tn=baiduimage&word=%CD%BC%C6%AC&pn=0 didn't seem to change anything, so only the word and pn parameters need to be modified.
word is the search keyword (e.g. cat, dog), and pn is the result offset. In my searches each page held 20 images, and pn grows by 20 each time you click "next page": page 1 has pn=0, page 10 has pn=180.
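The pn arithmetic above can be sketched as a small helper. This is just an illustration; the function name build_flip_url is hypothetical, and it assumes 20 results per page as observed above:

```python
from urllib.parse import urlencode

def build_flip_url(keyword, page):
    """Build the paginated flip-search URL for a 1-based page number,
    assuming 20 images per page, i.e. pn = (page - 1) * 20."""
    params = {"tn": "baiduimage", "word": keyword, "pn": (page - 1) * 20}
    return "http://image.baidu.com/search/flip?" + urlencode(params)

print(build_flip_url("cat", 10))  # page 10 -> pn=180
```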
Here is the code:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2018/12/4 10:31
# @Author :
# @Site :
# @File : baiduPic.py
# @Software: PyCharm
import os
import re

import requests

if __name__ == '__main__':
    keyword = input("Enter the keyword for the images to download: ")
    path = "D:/baiduimage/" + keyword
    if not os.path.exists(path):
        os.makedirs(path)

    url = "https://image.baidu.com/search/flip"
    pattern_objURL = re.compile(r'"middleURL":"(.*?)"')
    # Baidu shows 20 images per page, so pn increases by 20 for each page
    i = 1
    for page in range(0, 100, 20):
        params = {"tn": "baiduimage", "word": keyword, "pn": str(page)}
        try:
            html = requests.get(url, params=params, timeout=10)
        except Exception as e:
            print(e)
            continue
        print(html.url)
        html.encoding = 'utf-8'
        # Pull this page's image URLs out of the embedded JSON
        img_list = pattern_objURL.findall(html.text)
        for img in img_list[:20]:
            print(i, ":", img)
            try:
                img_data = requests.get(img, timeout=10).content
                with open(os.path.join(path, str(i) + ".jpg"), "wb") as f:
                    f.write(img_data)
            except Exception as e:
                print(e)
            i += 1
Update:
Python has a more convenient crawler library, icrawler (installed via pip), which makes it easy to crawl Baidu, Bing, Google, and other sites.