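(1) Selenium basics
Selenium is a tool for automating web applications. It supports the mainstream browsers, including Firefox, Chrome, and IE, and has bindings for languages such as Java, Python, and Ruby.
1. Selenium structure
Selenium 1.0: consists mainly of Selenium IDE, Selenium Grid, and Selenium RC.
Selenium 2.0: adds WebDriver on top of version 1.0.
Selenium 3.0: to be updated.
2. Installing Selenium
Install the selenium library with pip (install pip first if it is missing).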
pip install selenium
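3. Installing a browser driver
- Firefox driver
- Download the Firefox driver (geckodriver).
- Create a folder to hold the driver and add that folder to the PATH environment variable.
- Google Chrome driver
1. Check the current Chrome version (Help -> About Google Chrome).
2. Go to the Chrome driver (chromedriver) download page.
3. Find the chromedriver version that matches your Chrome version.
4. The driver can be placed directly in the Python installation directory.
Note: turn off the browser's automatic updates; otherwise the browser and driver versions will drift out of sync.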
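The version check in steps 1 to 3 can be sketched as a small helper. This is only an illustration: it assumes that comparing the major version (the first number) is enough to decide whether browser and driver match, and the version strings are made up.

```python
def major_version(version):
    """Return the leading (major) component of a dotted version string."""
    return int(version.strip().split(".")[0])


def versions_match(chrome_version, driver_version):
    """True when the browser and the driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Hypothetical version strings, used only for illustration.
print(versions_match("114.0.5735.199", "114.0.5735.90"))  # True
print(versions_match("115.0.5790.102", "114.0.5735.90"))  # False
```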
4. Testing the setup
from selenium import webdriver
browser = webdriver.Chrome()
browser.get("http://www.baidu.com/")
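Note: if the code above fails to start the browser, the driver is probably not being found. Pass the driver path explicitly, as shown below (the same applies to Firefox). The executable_path argument used here is the Selenium 3 form and is deprecated in Selenium 4.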
from selenium import webdriver
# executable_path sets the path to chromedriver
path = "D:\\chromedrive\\chromedriver.exe"
browser = webdriver.Chrome(executable_path=path)
browser.get("http://www.baidu.com/")
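On Selenium 4 the driver path is passed through a Service object instead of executable_path on webdriver.Chrome. The sketch below shows that form. The function name start_chrome and the early path check are additions for illustration, not part of Selenium.

```python
import os


def start_chrome(driver_path):
    """Start Chrome with an explicit chromedriver path (Selenium 4 style)."""
    # Fail early with a clear message if the path is wrong.
    if not os.path.isfile(driver_path):
        raise FileNotFoundError("chromedriver not found: " + driver_path)

    # Imported here so the path check above runs before selenium is needed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


# Usage, with the same path as in the example above:
# browser = start_chrome("D:\\chromedrive\\chromedriver.exe")
# browser.get("http://www.baidu.com/")
```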
(2) Selenium quick start
Prerequisites
- HTML
- CSS
- JavaScript
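Locating elements with Selenium
Selenium locates page elements mainly by their attribute values or by their path within the HTML. There are eight locator methods in total.
1. Locating by the id and name attributes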
find_element_by_id()
find_element_by_name()
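Note: these cannot be used if the element has no id or name attribute. If several elements share the same id or name, only the first one is located.
2. Locating by HTML tag type and the class attribute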
find_element_by_tag_name()
find_element_by_class_name()
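Note: when several elements match, only the first one is located.
3. Locating by link text (the text of an <a> element); the partial_link variant does partial matching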
find_element_by_link_text()
find_element_by_partial_link_text()
Note: likewise, when the match is not unique, only the first matching element is located.
4. Path-based selectors
find_element_by_xpath()
find_element_by_css_selector()
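Note: this approach is precise, because a full path from the document root identifies exactly one element. A looser expression can still match several elements, in which case the first one is returned.
Tips: Google Chrome can generate XPath and CSS selector expressions for you. In the Elements tab, find the element you want, right-click it, choose "Copy", then "Copy XPath" or "Copy selector".
The syntax of both selector languages is covered in their own references; they are used a lot in web scraping.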
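The difference between a loose and a precise path can be tried without a browser. The sketch below uses the standard library's xml.etree.ElementTree, which supports a small subset of XPath, rather than Selenium itself; the page fragment is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A made-up page fragment, used only for illustration.
PAGE = """
<html>
  <body>
    <div id="nav">
      <a class="item" href="/news">News</a>
      <a class="item" href="/maps">Maps</a>
    </div>
    <div id="footer">
      <a class="item" href="/about">About</a>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# A loose expression matches every <a class="item"> in the document.
all_links = root.findall(".//a[@class='item']")
print([a.text for a in all_links])  # ['News', 'Maps', 'About']

# find() returns only the first match, much like find_element_by_xpath().
first = root.find(".//a[@class='item']")
print(first.text)  # News

# Anchoring on a unique id plus a position narrows the path to one element.
second_nav = root.find(".//div[@id='nav']/a[2]")
print(second_nav.text)  # Maps
```

A CSS selector with the same intent as the last expression would be `#nav > a:nth-of-type(2)`.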
5. Other
The eight methods above return only the first match. To get every matching element as a list, use these eight instead:
find_elements_by_id()
find_elements_by_name()
find_elements_by_tag_name()
find_elements_by_class_name()
find_elements_by_link_text()
find_elements_by_partial_link_text()
find_elements_by_xpath()
find_elements_by_css_selector()
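In Selenium 4 the find_element_by_* and find_elements_by_* helpers were deprecated and later removed in favor of find_element(by, value) and find_elements(by, value). The sketch below maps the eight names used above onto the strategy strings that the By constants stand for. The names STRATEGIES, find_first, and find_all are hypothetical helpers, not part of Selenium.

```python
# Locator strategy strings; these are the values of the constants on
# selenium.webdriver.common.by.By (By.ID, By.NAME, and so on).
STRATEGIES = {
    "id": "id",
    "name": "name",
    "tag_name": "tag name",
    "class_name": "class name",
    "link_text": "link text",
    "partial_link_text": "partial link text",
    "xpath": "xpath",
    "css_selector": "css selector",
}


def find_first(driver, kind, value):
    """Stand-in for find_element_by_<kind>(value): first match only."""
    return driver.find_element(STRATEGIES[kind], value)


def find_all(driver, kind, value):
    """Stand-in for find_elements_by_<kind>(value): list of all matches."""
    return driver.find_elements(STRATEGIES[kind], value)


# Usage with a real driver (not run here):
# links = find_all(browser, "tag_name", "a")
```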
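Operating on elements with Selenium
Selenium can simulate most user interactions, such as clicking, right-clicking, dragging, scrolling, copying and pasting, and typing text. The operations fall into three groups:
1. General operations
These include clearing text, entering text, clicking an element, submitting a form, and reading an element's value.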
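The general operations map onto WebElement methods: clear(), send_keys(), click(), submit(), and get_attribute(). A minimal sketch follows. The function name fill_and_submit and the idea of locating the field by id are assumptions for illustration.

```python
def fill_and_submit(driver, field_id, text, button_id=None):
    """Clear a text field, type into it, then submit or click a button.

    Returns the value read back from the field before submitting.
    """
    field = driver.find_element("id", field_id)
    field.clear()                         # clear existing text
    field.send_keys(text)                 # type the new text
    typed = field.get_attribute("value")  # read the element's value

    if button_id is None:
        field.submit()                    # submit the enclosing form
    else:
        driver.find_element("id", button_id).click()  # click a button
    return typed


# Usage with a real driver (not run here); "kw" is assumed to be the id
# of a search box on the page:
# fill_and_submit(browser, "kw", "selenium")
```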
2. Mouse event operations
Reading the source is well worth the effort. The following is an excerpt from it (path: \Lib\site-packages\selenium\webdriver\common\action_chains.py):
Generate user actions.
When you call methods for actions on the ActionChains object,
the actions are stored in a queue in the ActionChains object.
When you call perform(), the events are fired in the order they
are queued up
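Roughly, this says that when you call a method on an ActionChains object, the corresponding action is stored in a queue, and when you call perform(), the queued actions are executed in order.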
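The queue-then-perform behavior can be illustrated with a toy class. This is not Selenium's implementation, only a sketch of the pattern; MiniChain and its log attribute are made up for the example.

```python
class MiniChain:
    """Toy illustration of the queue-then-perform pattern."""

    def __init__(self):
        self._queue = []   # pending actions, in call order
        self.log = []      # what has actually been executed

    def click(self, target):
        self._queue.append(lambda: self.log.append("click " + target))
        return self        # returning self allows chaining

    def send_keys(self, text):
        self._queue.append(lambda: self.log.append("type " + text))
        return self

    def perform(self):
        # Run the queued actions in the order they were added.
        for action in self._queue:
            action()


chain = MiniChain().click("search box").send_keys("selenium")
print(chain.log)   # [] because nothing has run yet
chain.perform()
print(chain.log)   # ['click search box', 'type selenium']
```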
Methods defined by ActionChains:
def perform(self):
    """
    Performs all stored actions.
    """

def reset_actions(self):
    """
    Clears actions that are already stored locally and on the remote end
    """

def click(self, on_element=None):
    """
    Clicks an element.
    :Args:
     - on_element: The element to click.
       If None, clicks on current mouse position.
    """

def click_and_hold(self, on_element=None):
    """
    Holds down the left mouse button on an element.
    :Args:
     - on_element: The element to mouse down.
       If None, clicks on current mouse position.
    """

def context_click(self, on_element=None):
    """
    Performs a context-click (right click) on an element.
    :Args:
     - on_element: The element to context-click.
       If None, clicks on current mouse position.
    """

def double_click(self, on_element=None):
    """
    Double-clicks an element.
    :Args:
     - on_element: The element to double-click.
       If None, clicks on current mouse position.
    """

def drag_and_drop(self, source, target):

def drag_and_drop_by_offset(self, source, xoffset, yoffset):

def key_down(self, value, element=None):
    """
    Sends a key press only, without releasing it.
    Should only be used with modifier keys (Control, Alt and Shift).
    :Args:
     - value: The modifier key to send. Values are defined in `Keys` class.
     - element: The element to send keys.
       If None, sends a key to current focused element.
    Example, pressing ctrl+c::
        ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
    """

def key_up(self, value, element=None):
    """
    Releases a modifier key.
    :Args:
     - value: The modifier key to send. Values are defined in Keys class.
     - element: The element to send keys.
       If None, sends a key to current focused element.
    Example, pressing ctrl+c::
        ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
    """

def move_by_offset(self, xoffset, yoffset):

def move_to_element(self, to_element):

def move_to_element_with_offset(self, to_element, xoffset, yoffset):

def pause(self, seconds):
    """ Pause all inputs for the specified duration in seconds """

def release(self, on_element=None):
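Several of the methods listed above can be chained into one gesture. The sketch below queues a press, a pause, a move, and a release, then runs them with perform(). The helper name drag_slowly and the default pause length are assumptions for illustration; the chain argument is expected to be an ActionChains instance.

```python
def drag_slowly(chain, source, target, hold_seconds=0.5):
    """Press on source, wait, move to target, release, then run the queue."""
    (chain.click_and_hold(source)
          .pause(hold_seconds)
          .move_to_element(target)
          .release()
          .perform())


# Usage with a real driver (not run here):
# from selenium.webdriver import ActionChains
# drag_slowly(ActionChains(browser), source_element, target_element)
```

Passing the chain in as an argument keeps the helper independent of how the browser was started.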
3. Keyboard event operations
The Keys class must be imported:
from selenium.webdriver.common.keys import Keys
Commonly used methods:
def send_keys(self, *keys_to_send):
    """
    Sends keys to current focused element.
    :Args:
     - keys_to_send: The keys to send. Modifier keys constants can be found in the
       'Keys' class.
    """

def send_keys_to_element(self, element, *keys_to_send):

def key_down(self, value, element=None):
    """
    Sends a key press only, without releasing it.
    Should only be used with modifier keys (Control, Alt and Shift).
    :Args:
     - value: The modifier key to send. Values are defined in `Keys` class.
     - element: The element to send keys.
       If None, sends a key to current focused element.
    Example, pressing ctrl+c::
        ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
    """

def key_up(self, value, element=None):
    """
    Releases a modifier key.
    :Args:
     - value: The modifier key to send. Values are defined in Keys class.
     - element: The element to send keys.
       If None, sends a key to current focused element.
    Example, pressing ctrl+c::
        ActionChains(driver).key_down(Keys.CONTROL).send_keys('c').key_up(Keys.CONTROL).perform()
    """
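Building on the ctrl+c example in the docstrings, a key combination such as select-all followed by copy can be sketched as below. The helper name copy_all is hypothetical. The CONTROL constant repeats the value of Keys.CONTROL so the sketch does not need Selenium imported; with Selenium available, pass Keys.CONTROL instead.

```python
CONTROL = "\ue009"  # same value as Keys.CONTROL in selenium.webdriver.common.keys


def copy_all(chain, element, modifier=CONTROL):
    """Focus element, then press modifier+a and modifier+c."""
    (chain.click(element)       # give the element keyboard focus
          .key_down(modifier)   # hold the modifier key
          .send_keys("a")       # select all
          .send_keys("c")       # copy
          .key_up(modifier)     # release the modifier key
          .perform())


# Usage with a real driver (not run here):
# from selenium.webdriver import ActionChains
# from selenium.webdriver.common.keys import Keys
# copy_all(ActionChains(browser), text_field, Keys.CONTROL)
```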