[Web Scraping] selenium-python: Installation and Getting Started

栗子ma

Categories: Web Scraping, Python, Selenium. Tags: Web Scraping, Python, Selenium

[Original link] http://selenium-python.readthedocs.io/installation.html

[Original link] http://selenium-python.readthedocs.io/getting-started.html

1. Installation

1.1. Introduction

Selenium Python bindings provide a simple API for writing functional/acceptance tests using Selenium WebDriver. Through the Selenium Python API you can access all the functionality of Selenium WebDriver in an intuitive way.

Selenium Python bindings provide a convenient API to access Selenium WebDrivers such as Firefox, IE, Chrome, Remote, etc. The currently supported Python versions are 2.7, 3.5 and above.

This documentation explains the Selenium 2 WebDriver API. The Selenium 1 / Selenium RC API is not covered here. (My version is Selenium 3.12.)

1.2. Downloading Python bindings for Selenium

You can download the Python bindings for Selenium from the PyPI page for the selenium package. However, a better approach is to use pip to install the selenium package. Python 3.6 ships with pip. Using pip, you can install selenium like this:

pip install selenium

You may consider using virtualenv to create isolated Python environments. Python 3 includes the venv module (python -m venv), which is almost the same as virtualenv; the older pyvenv script is deprecated as of Python 3.6.

1.3. Drivers

Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which needs to be installed before the examples below can be run. Make sure it’s in your PATH, e.g., place it in /usr/bin or /usr/local/bin.

If you skip this step, you will get the error selenium.common.exceptions.WebDriverException: Message: ‘geckodriver’ executable needs to be in PATH.
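
Before launching a browser, it can help to check whether the driver executable is actually discoverable. The following is a minimal sketch using only the standard library; the helper name find_driver is hypothetical and not part of Selenium.

```python
import shutil


def find_driver(name="geckodriver"):
    """Return the full path of a driver executable found on PATH, or None."""
    # shutil.which searches the directories listed in the PATH environment
    # variable, much like a shell does when resolving a command name.
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("geckodriver is not on PATH; add its folder to PATH "
              "or pass its location explicitly")
    else:
        print("geckodriver found at", path)
```

If the helper returns None, either add the driver's folder to PATH or pass the driver's location to the WebDriver constructor, as the example in section 2.1 does.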

Other supported browsers will have their own drivers available. Links to some of the more popular browser drivers follow.

Chrome:	https://sites.google.com/a/chromium.org/chromedriver/downloads
Edge:	https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox:	https://github.com/mozilla/geckodriver/releases
Safari:	https://webkit.org/blog/6900/webdriver-support-in-safari-10/

2. Getting Started

2.1. Simple Usage

Create a new project named Selenium in Spyder.

If you have installed Selenium Python bindings, you can start using it from Python like this.

# -*- coding: utf-8 -*-
"""
Created on Mon Jul 23 11:37:46 2018

@author: Administrator
"""

from selenium import webdriver
# The Keys class provides the keys on the keyboard, such as RETURN, F1, ALT, etc.
from selenium.webdriver.common.keys import Keys

# Create an instance of the Firefox WebDriver.
# If geckodriver's location is already on PATH, use:
# driver = webdriver.Firefox()
# If geckodriver's location is not on PATH, pass it explicitly
# (raw string, so the backslashes are kept literally):
driver = webdriver.Firefox(executable_path=r'E:\software\python\geckodriver-v0.21.0-win64\geckodriver.exe')
# driver.get navigates to the page at the given URL.
driver.get("http://www.python.org")
# assert condition: tests the condition and raises an error if it is false.
# print(driver.title)  # Welcome to Python.org
assert "Python" in driver.title  # True
elem = driver.find_element_by_name("q")
# First clear any pre-populated text in the input field (e.g. "Search")
# so it doesn't affect our search results.
elem.clear()
# Send keys; this is similar to typing on the keyboard.
# The page does contain <li class="tier-2 element-4" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source  # True
driver.close()

The above script can be saved into a file (e.g., python_org_search.py) and then run like this:

python python_org_search.py

The Python interpreter you run it with must have the selenium module installed.

Alternatively, click the Run button in Spyder; a Firefox window will open.
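
Windows paths passed to executable_path deserve a little care: inside an ordinary Python string literal a backslash can start an escape sequence, so a folder name beginning with t or n would be silently altered. A raw string (prefix r) keeps every backslash literal. The sketch below uses a made-up folder purely for illustration.

```python
# Made-up Windows folder, for illustration only.
plain = "E:\tools\new"    # "\t" is read as a tab and "\n" as a newline
raw = r"E:\tools\new"     # raw string: every backslash stays a backslash

# repr() makes the difference visible.
print(repr(plain))
print(repr(raw))
```

Forward slashes are another option, since Windows generally accepts them in file paths as well.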

2.2. Example Explained

The selenium.webdriver module provides all the WebDriver implementations. Currently supported WebDriver implementations are Firefox, Chrome, IE and Remote. The Keys class provides the keys on the keyboard, such as RETURN, F1, ALT, etc.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

Next, the instance of Firefox WebDriver is created.

driver = webdriver.Firefox()

The driver.get method navigates to the page at the given URL. WebDriver waits until the page has fully loaded (that is, until the “onload” event has fired, which normally happens as soon as the page and its images have finished loading) before returning control to your test or script. Note that if your page uses a lot of AJAX on load, WebDriver may not know when it has completely loaded:

driver.get("http://www.python.org")
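
For AJAX-heavy pages, a common approach is to poll for a condition rather than rely on the onload event. Selenium provides its own explicit waits for this (WebDriverWait in selenium.webdriver.support.ui), which are the better choice in real tests. The helper below, with the hypothetical name wait_until, is only a library-free sketch of the underlying polling idea.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        # Pause briefly so the loop does not spin at full speed.
        time.sleep(interval)
```

With a driver object like the one in this example, a call might look like wait_until(lambda: "PyCon" in driver.page_source), where the text being waited for is an assumption about what the loaded page contains.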

The next line is an assertion confirming that the title contains the word “Python” (assert condition: tests the condition and raises an error if it is false):

assert "Python" in driver.title
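
As a standalone illustration of how assert behaves, the sketch below checks two sample titles without a browser. The helper title_mentions_python is hypothetical; the first title is the one python.org returned in the run above, and the second is an invented failing case.

```python
def title_mentions_python(title):
    """Return True if the assertion passes, False if it raises AssertionError."""
    try:
        # The text after the comma becomes the error message on failure.
        assert "Python" in title, "unexpected page title: %r" % title
    except AssertionError as exc:
        print("assertion failed:", exc)
        return False
    return True


print(title_mentions_python("Welcome to Python.org"))
print(title_mentions_python("404 Not Found"))
```

Note that assert statements are skipped when Python runs with the -O option, which is one reason test frameworks offer their own assertion methods.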

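WebDriver offers a family of methods named find_element_by_* for locating elements. For example, the text input element can be located by its name attribute, hence the find_element_by_name method. (These helpers belong to the Selenium 2/3 API described here; Selenium 4 replaces them with find_element(By.NAME, "q").) A detailed explanation of finding elements is available in the Locating Elements chapter: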

elem = driver.find_element_by_name("q")

Next, we send keys, which is similar to typing on the keyboard. Special keys can be sent using the Keys class imported from selenium.webdriver.common.keys. To be safe, first clear any pre-populated text in the input field (e.g. “Search”) so it doesn’t affect our search results:

elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)

After the page is submitted, you should get the search results, if there are any. To ensure that some results were found, make an assertion:

assert "No results found." not in driver.page_source

Finally, the browser window is closed. You can also call the quit method instead of close. quit exits the entire browser, whereas close closes one tab; if only one tab was open, most browsers will by default exit entirely:

driver.close()

2.3. Using Selenium to write tests

Selenium is mostly used for writing test cases. The selenium package itself doesn’t provide a testing tool or framework. You can write test cases using Python’s unittest module. Other options for a tool/framework are py.test and nose.

In this chapter, we use unittest as the framework of choice. Here is the modified example, which uses the unittest module. It is a test for the python.org search functionality:

import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

class PythonOrgSearch(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.Firefox(executable_path=r'E:\software\python\geckodriver-v0.21.0-win64\geckodriver.exe')

    def test_search_in_python_org(self):
        driver = self.driver
        driver.get("http://www.python.org")
        self.assertIn("Python", driver.title)
        elem = driver.find_element_by_name("q")
        elem.send_keys("pycon")
        elem.send_keys(Keys.RETURN)
        assert "No results found." not in driver.page_source

    def tearDown(self):
        self.driver.close()

if __name__ == "__main__":
    unittest.main()

You can run the above test case from a shell like this:

python test_python_org_search.py
.
----------------------------------------------------------------------
Ran 1 test in 15.566s

OK

The above result shows that the test completed successfully.

2.4. Walk through of the example

Initially, all the basic modules required are imported. The unittest module is a built-in Python module based on Java’s JUnit. It provides the framework for organizing the test cases. The selenium.webdriver module provides all the WebDriver implementations. Currently supported WebDriver implementations are Firefox, Chrome, IE and Remote. The Keys class provides the keys on the keyboard, such as RETURN, F1, ALT, etc.

import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

The test case class inherits from unittest.TestCase. Inheriting from the TestCase class is how you tell the unittest module that this is a test case:

class PythonOrgSearch(unittest.TestCase):

The setUp method is part of initialization. It is called before every test function you write in this test case class. Here it creates the instance of the Firefox WebDriver.

def setUp(self):
    self.driver = webdriver.Firefox()

This is the test case method. The name of a test case method should always start with the characters test. The first line inside this method creates a local reference to the driver object created in the setUp method.

def test_search_in_python_org(self):
    driver = self.driver
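
The naming rule can be checked without a browser. The sketch below uses a hypothetical NamingDemo class and asks the default unittest loader which methods it would run; only the method whose name starts with test is selected.

```python
import unittest


class NamingDemo(unittest.TestCase):
    def test_is_collected(self):
        self.assertTrue(True)

    def helper_is_ignored(self):
        # Never called by the test runner because the name lacks the prefix.
        raise RuntimeError("should not run")


# The default loader selects method names that start with "test".
names = unittest.TestLoader().getTestCaseNames(NamingDemo)
print(names)
```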

The driver.get method navigates to the page at the given URL. WebDriver waits until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script. Note that if your page uses a lot of AJAX on load, WebDriver may not know when it has completely loaded:

driver.get("http://www.python.org")

The next line is an assertion confirming that the title contains the word “Python”:

self.assertIn("Python", driver.title)

WebDriver offers a number of ways to find elements using one of the find_element_by_* methods. For example, the text input element can be located by its name attribute using the find_element_by_name method. A detailed explanation of finding elements is available in the Locating Elements chapter:

elem = driver.find_element_by_name("q")

Next, we send keys, which is similar to typing on the keyboard. Special keys can be sent using the Keys class imported from selenium.webdriver.common.keys:

elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)

After the page is submitted, you should get the search results, if there are any. To ensure that some results were found, make an assertion:

assert "No results found." not in driver.page_source

The tearDown method is called after every test method. This is the place to do all cleanup actions. In the current method, the browser window is closed. You can also call the quit method instead of close. quit exits the entire browser, whereas close closes a tab; if it is the only tab open, most browsers will by default exit entirely:

def tearDown(self):
    self.driver.close()
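
To see the setUp / test / tearDown order without opening a browser, here is a minimal sketch in which a hypothetical FakeDriver stands in for the real WebDriver and simply records calls. The class and function names are made up for illustration.

```python
import io
import unittest

events = []  # records the order in which the hooks and tests run


class FakeDriver:
    """Hypothetical stand-in for a WebDriver; it only logs calls."""

    def close(self):
        events.append("close")


class LifecycleDemo(unittest.TestCase):
    def setUp(self):
        events.append("setUp")
        self.driver = FakeDriver()

    def test_a(self):
        events.append("test_a")

    def test_b(self):
        events.append("test_b")

    def tearDown(self):
        self.driver.close()


def run_demo():
    """Run LifecycleDemo quietly and return the recorded call order."""
    events.clear()
    suite = unittest.TestLoader().loadTestsFromTestCase(LifecycleDemo)
    # Send the runner's report to an in-memory buffer to keep output clean.
    unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return list(events)


if __name__ == "__main__":
    print(run_demo())
```

Each test gets its own setUp and tearDown call, so with a real WebDriver every test method would start with a fresh browser instance.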

The final lines are boilerplate code to run the test suite. (Boilerplate is code that has to be included in many places with little or no alteration; the term is often used for languages considered verbose, i.e. where the programmer must write a lot of code to do minimal jobs.)

if __name__ == "__main__":
    unittest.main()