Python Learning: Scraping Web Images with selenium and requests

小石_coding

Last modified: 2022-12-26 10:46:10

Category: Python Learning. Tags: python, selenium, find_elements

First published: 2022-12-25 21:47:44

Article link: https://blog.csdn.net/weixin_47468969/article/details/128438378

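Scraping images from a web page basically breaks down into the following steps:

1. Request the URL with requests;

2. Disguise the request with headers;

3. Locate the target elements with find_elements from the selenium module;

4. Download the images;

5. Save the images.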

1. Import the packages first

import requests
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By  # needed for By.CSS_SELECTOR and By.XPATH below
import os

2. Send the request

def get_picture(url):
    # requests part
    header={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36'
    }  # disguise: makes the request look like a normal browser visit rather than a crawler
    repos=requests.get(url,headers=header)
    repos.encoding='utf-8'  # prevents garbled Chinese characters when reading repos.text
    repos_content=repos.content
    # selenium part
    web=Chrome()
    web.get(url)
    web.implicitly_wait(5)  # implicit wait: element lookups keep retrying for up to 5 seconds, giving the page time to load

3. Use find_elements to locate the target elements

    li=web.find_elements(By.CSS_SELECTOR,'body .Clbc_top .taotu-main>ul>li')

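Getting the selector takes only three steps: select the target image in the browser's developer tools -> Copy selector -> press Ctrl+F and paste it in. The search box shows "1 of 99", meaning the selector matches 99 elements, so this is the CSS selector we want.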

4. Create a folder to store the images:

    if not os.path.exists('./唯美图片'):
        os.mkdir('./唯美图片')
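
As an alternative to the check-then-create pattern above, `os.makedirs` with `exist_ok=True` does both in a single call. This is only a minimal sketch; the helper name `ensure_dir` is hypothetical and not part of the original script.

```python
import os

def ensure_dir(path):
    """Create the directory (and any missing parents) if it does not exist yet."""
    # exist_ok=True turns the call into a no-op when the directory is already there
    os.makedirs(path, exist_ok=True)
    return path
```

Inside `get_picture`, a call such as `ensure_dir('./唯美图片')` could stand in for the two lines above.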

Then loop over the located elements:

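    for img in li:
        # note: the image is located relative to the current li element, so a relative XPath (starting with './') is used
        src=img.find_element(By.XPATH,'./a/img').get_attribute('src')
        # watch the file name: it must not contain commas, spaces, or other symbols
        name=img.find_element(By.XPATH,'./a/img').get_attribute('alt').split('，')[0].split(' ')[0]+'.jpg'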
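
The file-name rule in the loop can be pulled out into a small helper. The sketch below keeps the same split logic and, as an added assumption, also strips the characters Windows forbids in file names and falls back to a placeholder when the alt text is empty. The name `safe_filename`, its parameters, and the sample alt text are hypothetical.

```python
import re

def safe_filename(alt_text, default="image", ext=".jpg"):
    """Turn an image's alt text into a file name that is safe on common file systems."""
    # Same idea as the loop above: keep only the part before the first full-width comma or space
    stem = (alt_text or "").split("，")[0].split(" ")[0]
    # Assumption: additionally drop characters that Windows forbids: \ / : * ? " < > |
    stem = re.sub(r'[\\/:*?"<>|]', "", stem).strip()
    # Fall back to a placeholder when nothing usable is left (e.g. empty alt text)
    return (stem or default) + ext

# Hypothetical alt text, for illustration only
print(safe_filename("唯美风景，高清壁纸"))  # 唯美风景.jpg
```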

5. Save

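        # this is where requests and selenium work together: selenium supplied src, and .content returns the raw bytes of the response (the image itself)
        data=requests.get(url=src,headers=header).content
        #print(data)
        path="唯美图片/"+name  # target path inside the download folder
        with open(path, 'wb') as f:  # save in binary mode
            f.write(data)
            print(name,'downloaded successfully!')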

    web.close()
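
One thing to watch in the save step: two images whose alt text yields the same name will overwrite each other. A possible way around this, sketched under the assumption that appending a counter is acceptable, is shown below. `unique_path` and `save_bytes` are hypothetical helpers, not part of the original script.

```python
import os

def unique_path(folder, name):
    """Return a path inside folder that does not collide with an existing file."""
    stem, ext = os.path.splitext(name)
    candidate = os.path.join(folder, name)
    counter = 1
    # Append _1, _2, ... until the name is free
    while os.path.exists(candidate):
        candidate = os.path.join(folder, f"{stem}_{counter}{ext}")
        counter += 1
    return candidate

def save_bytes(data, folder, name):
    """Write raw bytes (e.g. the .content of a response) under folder and return the path used."""
    os.makedirs(folder, exist_ok=True)
    path = unique_path(folder, name)
    with open(path, "wb") as f:
        f.write(data)
    return path
```

In the loop, `save_bytes(data, '唯美图片', name)` could take the place of the manual `open` / `write` pair.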

6. Since everything above is wrapped in a function, add the main entry point

if __name__=='__main__':
    url='https://www.umei.cc/bizhitupian/'
    get_picture(url)

7. Output
