1024_CSDN badge day_daily work log_a small Baidu image scraper

Latest recommended article published on 2021-10-24 21:02:08

繁华三千东流水


Category: web scraping. Tags: web scraping, python

Article link: https://blog.csdn.net/qq872890060/article/details/102716492


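Running the code below opens Baidu image search automatically and keeps downloading the images it finds for as long as you let it run: it does not stop until you stop it. The images are saved in a folder named after the search keyword, inside the current working directory.
Prerequisites: Python with the selenium and requests packages installed, plus a ChromeDriver (chromedriver.exe) that matches your installed Chrome.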
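Before launching the scraper, it can help to confirm that the prerequisites are actually in place. The following is a minimal sketch of such a check, separate from the scraper itself; the helper names `missing_packages` and `find_chromedriver` are hypothetical and not part of the original post, and the check assumes the driver either sits next to the script or is on PATH.

```python
import importlib.util
import os
import shutil


def missing_packages(names):
    """Return the top-level package names from `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def find_chromedriver(filename="chromedriver.exe"):
    """Look for the driver in the current folder first, then on PATH.

    Returns a path string, or None if no driver was found.
    """
    if os.path.isfile(filename):
        return os.path.abspath(filename)
    return shutil.which("chromedriver")


if __name__ == "__main__":
    # An empty list means both packages the scraper needs are installed.
    print("missing packages:", missing_packages(["selenium", "requests"]))
    print("chromedriver:", find_chromedriver())
```

If the first line lists a package, installing it with pip should be enough; if the second line prints None, the driver still has to be downloaded and placed where the script can find it.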

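from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import os
import time
import requests
import warnings
warnings.filterwarnings('ignore')

# Equivalent to typing a keyword into the Baidu image search box
name = input("Enter the name of the images to scrape: ")

# Create the download folder (named after the keyword) once, up front
os.makedirs(name, exist_ok=True)

# Create the Chrome driver (Selenium 4 style; chromedriver.exe is expected
# in the current folder)
driver = webdriver.Chrome(service=Service('chromedriver.exe'))

# Have Chrome open the Baidu image search page
driver.get('https://image.baidu.com/')
# Maximize the window
driver.maximize_window()

# Locate the search input box
input1 = driver.find_element(By.XPATH, '//input[@id="kw"]')
# Type the search keyword
input1.send_keys(name)
# Locate the search button and click it
driver.find_element(By.XPATH, '//span[@class="s_search"]').click()

# Download loop: runs until the script is interrupted
a = 0          # number of images saved so far, also used in the file name
seen = set()   # thumbnail URLs that have already been downloaded
while True:
    # Every 5 seconds, scroll to the bottom so the page loads more results
    time.sleep(5)
    driver.execute_script('window.scrollTo(0,document.body.scrollHeight)')
    # Collect every image item currently on the page
    items = driver.find_elements(By.XPATH, '//li[@class="imgitem"]')

    for li in items:
        # Read the thumbnail URL of this image
        url_img = li.get_attribute('data-thumburl')
        # Skip items without a URL and images downloaded in an earlier pass
        if not url_img or url_img in seen:
            continue
        seen.add(url_img)
        # Request the image; the timeout keeps one slow response from hanging the loop
        response = requests.get(url_img, timeout=10)
        if response.status_code != 200:
            continue
        # Build the download path: <keyword>/<keyword><counter>.jpg
        mfilejoin = os.path.join(name, name + str(a) + '.jpg')
        with open(mfilejoin, 'wb') as fw:
            # Write the image bytes to disk
            fw.write(response.content)
        print(url_img, mfilejoin, 'downloaded')
        a += 1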
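The "download only what is new" step of the loop can also be isolated into a small helper, which makes it possible to try the logic without a browser or a network connection. The sketch below is one possible way to do that, not the original author's method: it tracks thumbnail URLs in a set, and the names `save_new_images` and `fetch_with_requests` are hypothetical. The `fetch` parameter is an assumption added so that a fake downloader can be swapped in.

```python
import os
import tempfile


def fetch_with_requests(url):
    """Download one URL and return its bytes (needs the requests package)."""
    import requests  # imported lazily so the helper below works without it
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


def save_new_images(urls, seen, folder, prefix, fetch=fetch_with_requests):
    """Save every URL not yet in `seen` and return the list of files written."""
    os.makedirs(folder, exist_ok=True)
    written = []
    for url in urls:
        if not url or url in seen:
            continue  # skip missing attributes and repeats from earlier scrolls
        seen.add(url)
        # The running size of `seen` gives each file a unique number.
        path = os.path.join(folder, f"{prefix}{len(seen) - 1}.jpg")
        with open(path, "wb") as fw:
            fw.write(fetch(url))
        written.append(path)
    return written


if __name__ == "__main__":
    # Offline demonstration: a fake fetch function stands in for real HTTP.
    seen = set()
    with tempfile.TemporaryDirectory() as tmp:
        first = save_new_images(["u1", "u2", None], seen, tmp, "cat", fetch=lambda u: b"x")
        second = save_new_images(["u1", "u2", "u3"], seen, tmp, "cat", fetch=lambda u: b"x")
        print(len(first), len(second))  # prints: 2 1
```

In the scraper, the list of URLs would come from the `data-thumburl` attribute of each image item after every scroll, and the same `seen` set would be passed in on every pass.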