Selenium Study Notes

Latest recommended article published on 2024-07-20 20:25:01

三三两两2

Article link: https://blog.csdn.net/luanyongli/article/details/81302848

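Installation and configuration:

Running pip install selenium is enough. The install sometimes fails with an error; retrying a few times generally gets it through.

Selenium also needs a browser driver to be installed and configured. I use Chrome; the latest mapping between Chrome and chromedriver versions is here: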

https://blog.csdn.net/yoyocat915/article/details/80580066

Download URL on a mirror inside China:

http://npm.taobao.org/mirrors/chromedriver

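A possible pitfall:

2.4 and 2.40 are not the same version. The number after the dot is read as a whole number, so the larger it is, the newer the version (2.40 is newer than 2.4).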
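
The pitfall is easy to check in code. The following is a minimal sketch, not part of the original notes; the helper names parse_version and is_newer are made up for illustration. It compares each dotted component as an integer, which is the ordering described above, and shows why comparing the strings as decimals gives the wrong answer.

```python
def parse_version(version):
    """Split a dotted version string into a tuple of integers, e.g. "2.40" -> (2, 40)."""
    return tuple(int(part) for part in version.split("."))


def is_newer(a, b):
    """Return True if version a is newer than version b."""
    return parse_version(a) > parse_version(b)


print(is_newer("2.40", "2.4"))       # True: 40 > 4
print(float("2.40") > float("2.4"))  # False: as decimals both are 2.4
```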

-------------------------------------------------------------------------------------------------------------

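Unzipping the download gives a single chromedriver.exe file.
Copy chromedriver.exe into the Chrome installation directory (for example C:\Program Files\Google\Chrome\Application).

Add the Chrome directory to the PATH environment variable (C:\Users\XXX\AppData\Local\Google\Chrome\Application).

On some machines this still fails with the error "'chromedriver' executable needs to be in PATH".

Copying chromedriver.exe into the Python root directory as well fixes it. On my machine that path is: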

C:\Users\lxxxx\AppData\Local\Programs\Python\Python36
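
Before copying files around, it can help to check whether the driver is actually visible on PATH. This is a small sketch using only the standard library; the function name find_driver is an assumption for illustration. shutil.which searches PATH the same way the shell does (on Windows it also tries the .exe extension).

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


path = find_driver()
if path is None:
    print("chromedriver is not on PATH")
else:
    print("chromedriver found at", path)
```

If this prints that the driver is not on PATH, Selenium will very likely raise the error quoted above, and a new terminal or IDE session may be needed for a changed PATH to take effect.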

Usage:

I have been following this tutorial throughout; it solves most of the problems that come up:

http://www.testclass.net/selenium_python

Some experience and problems from actual use:

1. Using a proxy IP:

ip is a string in the same format as "182.90.80.137:8123"
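
Both browser snippets in this section depend on that "host:port" format. As a hedged sketch (the helper name split_proxy is hypothetical, not from the original notes), the string can be split and validated once, so that a malformed proxy fails early instead of inside the browser setup:

```python
def split_proxy(ip):
    """Split a "host:port" string into (host, port) with the port as an int."""
    host, _, port = ip.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected 'host:port', got %r" % ip)
    return host, int(port)


ip_ip, ip_port = split_proxy("182.90.80.137:8123")
print(ip_ip)    # 182.90.80.137
print(ip_port)  # 8123
```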

Firefox:

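import random
from selenium import webdriver
from selenium.webdriver.firefox.options import Options

# ip (e.g. "182.90.80.137:8123") and HEADERS (a list of User-Agent strings) are defined elsewhere
ip_ip = ip.split(":")[0]
ip_port = int(ip.split(":")[1])
print(ip_ip)
print(ip_port)
random_header = random.choice(HEADERS)
options = Options()
# Firefox reads its User-Agent from this preference, not from DesiredCapabilities
options.set_preference('general.useragent.override', random_header)
options.set_preference('network.proxy.type', 1)  # default 0 = direct connection; 1 = manual proxy configuration
options.set_preference('network.proxy.http', ip_ip)
options.set_preference('network.proxy.http_port', ip_port)
options.set_preference('network.proxy.ssl', ip_ip)
options.set_preference('network.proxy.ssl_port', ip_port)
# Selenium 4 style; Selenium 3 passed a FirefoxProfile as webdriver.Firefox(profile)
driver = webdriver.Firefox(options=options)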

Chrome:

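from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Explicit driver path; not needed if chromedriver is already on PATH
chromedriver = 'C:/Program Files (x86)/Google/Chrome/Application/chromedriver.exe'
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--proxy-server=http://' + ip)
# Selenium 4 style; Selenium 3 used webdriver.Chrome(chromedriver, chrome_options=chrome_options)
driver = webdriver.Chrome(service=Service(chromedriver), options=chrome_options)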

2. Headless Chrome

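from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument('--headless')   # this one flag is all that is needed
# options= replaces the chrome_options= keyword, which newer Selenium releases no longer accept
driver = webdriver.Chrome(options=chrome_options)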

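3. What to do when an element cannot be located

There are several other reasons an element sometimes cannot be located. A few of them:
1. The element is inside a Frame/Iframe:
Solution: switch into the frame first (driver.switch_to.frame), then locate the element.
2. The XPath is written incorrectly:
Solution: after writing the XPath, press F12 in Chrome, go to the HTML view, and press Ctrl+F to search for it and confirm that it matches.

3. The page has not finished loading when the element is operated on:
Solution: import the time module and set a wait time.
4. The element has a dynamic id:
Solution: if the id is generated dynamically, it is best not to use it; locate the element by XPath or another strategy instead.
5. Two-step location, such as a login popup:
Solution: locate the popup first, then locate the element inside it. This is the same situation as the Frame/Iframe case.
6. Locating invisible elements:
Taking the Baidu login page as an example, some of the login elements found by the name tj_login are not visible, so loop over the matches, check is_displayed() to find the visible one, and click it to log in.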
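
Three of the cases above (page not loaded yet, Frame/Iframe, invisible elements) can be sketched as small helpers. This is an illustrative sketch and not the code the notes refer to; the function names are made up, and the XPath strings are whatever the target page needs. The wait helper uses Selenium's explicit wait (WebDriverWait) as an alternative to a fixed time.sleep. The Selenium imports sit inside the functions only so that click_first_visible can be used and tested on its own.

```python
def click_first_visible(elements):
    """Click the first element whose is_displayed() is True.

    Returns the clicked element, or None if no element is visible.
    """
    for element in elements:
        if element.is_displayed():
            element.click()
            return element
    return None


def wait_for_xpath(driver, xpath, timeout=10):
    """Poll until the element is present; raises TimeoutException after timeout seconds."""
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    return WebDriverWait(driver, timeout).until(
        EC.presence_of_element_located((By.XPATH, xpath))
    )


def find_in_frame(driver, frame_xpath, inner_xpath):
    """Switch into the frame matched by frame_xpath, then locate inner_xpath inside it.

    The driver stays inside the frame so the returned element remains usable;
    call driver.switch_to.default_content() when finished with it.
    """
    from selenium.webdriver.common.by import By

    frame = driver.find_element(By.XPATH, frame_xpath)
    driver.switch_to.frame(frame)
    return driver.find_element(By.XPATH, inner_xpath)
```

For case 6, a call along the lines of click_first_visible(driver.find_elements(By.NAME, "tj_login")) would click the visible login link, assuming the page still uses that name.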