Selenium Study Notes

fresh_nam

Categories: web scraping, python. Tags: selenium, python

Article link: https://blog.csdn.net/fresh_nam/article/details/124489140

Contents

Preface
I. Environment
II. Learning
- 1. Installation
- 2. Usage

Preface

While learning web scraping I came across selenium and found it interesting, so I am writing these notes for later reference.

I. Environment

python 3.7
selenium 3.141.0

II. Learning

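1. Installation

Install selenium with pip. An unpinned install pulls the latest release; to match the 3.141.0 used in these notes, run pip install selenium==3.141.0 instead of the plain command: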

pip install selenium

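Selenium needs a driver for the browser it controls, and the driver version has to match the installed browser version. I use Google Chrome, so I downloaded ChromeDriver. My copy of the driver is on Baidu Netdisk: https://pan.baidu.com/s/1KNtrd1lyUOCe323WtWS9Nw (extraction code: yxw8)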
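Since the code in these notes targets selenium 3.141.0, it can be handy to check which API style an installation supports before running anything. Below is a rough sketch, not part of selenium itself: the helper names `parse_version` and `supports_legacy_locators` are my own, and the cutoff assumes that the `find_element_by_*` methods were removed in Selenium 4.3.0.

```python
import re


def parse_version(version):
    """Turn a version string such as '3.141.0' into a 3-tuple of ints."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:  # pad '4.3' to (4, 3, 0)
        parts.append(0)
    return tuple(parts)


def supports_legacy_locators(version):
    """True if find_element_by_css_selector and friends should still exist.

    Assumption: they were removed in Selenium 4.3.0.
    """
    return parse_version(version) < (4, 3, 0)


def installed_selenium_version():
    """Return the installed selenium version string (requires selenium)."""
    import selenium  # imported lazily so the helpers above work without it
    return selenium.__version__
```

With selenium installed, `supports_legacy_locators(installed_selenium_version())` tells you whether the scripts in these notes can run unchanged.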

2. Usage

Create a test.py for experiments. First, use selenium to open the site https://pic.netbian.com/4kmeinv/index.html
test.py:

from selenium import webdriver

# Create a browser object; the first argument (executable_path) is the path to the driver
web = webdriver.Chrome('./chromedriver.exe')

web.get('https://pic.netbian.com/4kmeinv/index.html')

A new browser window opens and loads the site.
To get the source of the loaded page, use:

print(web.page_source)

The console prints the page's HTML.
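Once you have the page source as a string, you can pull data out of it with ordinary parsing tools. Here is a minimal sketch using only the standard library's `html.parser` to collect image addresses. The `sample_html` fragment is made up for illustration and is not the real structure of the site; in practice you would pass `web.page_source` instead.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def extract_image_sources(html):
    """Return the list of <img> src values found in an HTML string."""
    collector = ImageSrcCollector()
    collector.feed(html)
    return collector.sources


# Made-up fragment standing in for web.page_source
sample_html = '<ul><li><img src="/a.jpg" alt="a"></li><li><img src="/b.jpg"></li></ul>'
print(extract_image_sources(sample_html))  # prints ['/a.jpg', '/b.jpg']
```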
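Selenium's real strength is that it can simulate a person operating the page. The code below opens the page, locates the search box, types '刘亦菲' into it, locates the search button, clicks it, waits five seconds, and then closes the browser.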

from selenium import webdriver
from time import sleep

# Create a browser object
web = webdriver.Chrome('./chromedriver.exe')

web.get('https://pic.netbian.com/4kmeinv/index.html')

# print(web.page_source)

# Locate the search box
search_input = web.find_element_by_css_selector('.search p input')

# Type the keyword into the search box
search_input.send_keys('刘亦菲')

# Locate the search button
search_button = web.find_element_by_css_selector('.search .sub')

# Click the button
search_button.click()

# Sleep for 5 seconds
sleep(5)

# Close the browser and end the driver session
web.quit()

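There are plenty of tutorials online about how selenium locates page elements, so I will not repeat them here. I will only explain the line search_input = web.find_element_by_css_selector('.search p input').
[Screenshot: the search box outlined in green]
As its name suggests, find_element_by_css_selector looks up a page element by CSS selector. In the screenshot, the element outlined in green is the search box. It sits inside a div whose class is search, so '.search' matches that ancestor element. The search box itself is inside a p tag under '.search', so the full call is web.find_element_by_css_selector('.search p input').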
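The same lookup can also be written with the generic `find_element(By.CSS_SELECTOR, ...)` call, which exists in Selenium 3 and is the form that remains in Selenium 4. The sketch below wraps the search steps in a function; `descendant_selector` and `search` are hypothetical helper names of my own, and the `ImportError` fallback is only there so the sketch loads on a machine without selenium.

```python
try:
    from selenium.webdriver.common.by import By
    CSS = By.CSS_SELECTOR
except ImportError:
    # By.CSS_SELECTOR is the plain string "css selector"; this fallback only
    # lets the sketch load when selenium is not installed.
    CSS = "css selector"


def descendant_selector(*parts):
    """Join simple selectors with spaces.

    A space is the CSS descendant combinator: each part must be found
    somewhere inside the element matched by the previous part.
    """
    return " ".join(parts)


def search(driver, keyword):
    """Type keyword into the search box, then click the search button."""
    box = driver.find_element(CSS, descendant_selector(".search", "p", "input"))
    box.send_keys(keyword)
    button = driver.find_element(CSS, descendant_selector(".search", ".sub"))
    button.click()
```

With a real browser object this would be called as `search(web, '刘亦菲')` after `web.get(...)`.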

Here is the result:
[Screenshot: the browser after the script runs]

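In real scraping, however, we fetch data from many pages. By default every browser session that selenium starts opens a visible window, and Chrome is launched with an automation switch that reveals it is being controlled by selenium.
Both problems have well-known remedies: headless mode removes the window, and the excludeSwitches setting drops the enable-automation switch. The latter makes automation less obvious but is not guaranteed to defeat every site's detection. Add the following code: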

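from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run without a visible browser window
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# Drop the enable-automation switch to make automation less conspicuous
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Create the browser object with these options
web = webdriver.Chrome('./chromedriver.exe', options=options)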

Now selenium no longer opens a browser window when it visits the site, and the page source is still printed.
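When many pages have to be fetched, one headless browser session can be reused for all of them rather than starting a new one per page. Here is a rough sketch under that assumption. `fetch_pages` and `run_headless` are hypothetical names of my own, and the `webdriver.Chrome(path, options=...)` call follows the Selenium 3 style used in these notes.

```python
def fetch_pages(driver, urls):
    """Load each URL in the same browser session and return {url: html}."""
    pages = {}
    for url in urls:
        driver.get(url)
        pages[url] = driver.page_source
    return pages


def run_headless(urls, driver_path="./chromedriver.exe"):
    """Start headless Chrome, fetch the pages, and always shut it down."""
    # Imported here so fetch_pages can be used without selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    options.add_experimental_option("excludeSwitches", ["enable-automation"])

    web = webdriver.Chrome(driver_path, options=options)
    try:
        return fetch_pages(web, urls)
    finally:
        # Runs even if a page fails to load, so no browser is left behind
        web.quit()
```

For example, `run_headless(['https://pic.netbian.com/4kmeinv/index.html'])` would return a dictionary with that one page's source; more URLs can be added to the list.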

PS: here is the complete code.
test.py

from selenium import webdriver
from time import sleep

# Options class for headless mode and for hiding the automation switch
from selenium.webdriver.chrome.options import Options

# Run without a visible browser window
options = Options()
options.add_argument('--headless')
options.add_argument('--disable-gpu')
# Drop the enable-automation switch to make automation less conspicuous
options.add_experimental_option('excludeSwitches', ['enable-automation'])

# Create a browser object; the first argument (executable_path) is the path to the driver
web = webdriver.Chrome('./chromedriver.exe', options=options)

web.get('https://pic.netbian.com/4kmeinv/index.html')

print(web.page_source)

# Locate the search box
search_input = web.find_element_by_css_selector('.search p input')

# Type the keyword into the search box
search_input.send_keys('刘亦菲')

# Locate the search button
search_button = web.find_element_by_css_selector('.search .sub')

# Click the button
search_button.click()

# Sleep for 5 seconds
sleep(5)

# Close the browser and end the driver session
web.quit()

If you have any questions, feel free to leave a comment.
