【爬虫】selenium-python 安装和入门

14 篇文章 1 订阅
4 篇文章 0 订阅

【原文链接】http://selenium-python.readthedocs.io/installation.html

【原文链接】http://selenium-python.readthedocs.io/getting-started.html

 

1. Installation

1.1. Introduction

Selenium Python bindings provides a simple API to write 功能/acceptance 测试 using Selenium WebDriver. Through Selenium Python API you can access all 功能 of Selenium WebDriver in an intuitive way.

Selenium Python bindings provide a convenient API to access Selenium WebDrivers like Firefox, Ie, Chrome, Remote etc. The current supported Python versions are 2.7, 3.5 and above.

This documentation explains Selenium 2 WebDriver API. Selenium 1 / Selenium RC API is not covered here. (我的版本是Selenium 3.12)

1.2. Downloading Python bindings for Selenium

You can download Python bindings for Selenium from the PyPI page for selenium package. However, a better approach would be to use pip to install the selenium package. Python 3.6 has pip available in the standard library. Using pip, you can install selenium like this:

pip install selenium

You may consider using virtualenv to create isolated Python environments. Python 3.6 has pyvenv which is almost same as virtualenv.

1.3. Drivers

Selenium requires a driver to interface with the chosen browser. Firefox, for example, requires geckodriver, which needs to be installed before the below examples can be run. Make sure it’s in your PATH, e. g., place it in /usr/bin or /usr/local/bin.

Failure to observe this step will give you an error selenium.common.exceptions.WebDriverException: Message: ‘geckodriver’ executable needs to be in PATH.

Other supported browsers will have their own drivers available. Links to some of the more popular browser drivers follow.

Chrome:https://sites.google.com/a/chromium.org/chromedriver/downloads
Edge:https://developer.microsoft.com/en-us/microsoft-edge/tools/webdriver/
Firefox:https://github.com/mozilla/geckodriver/releases
Safari:https://webkit.org/blog/6900/webdriver-support-in-safari-10/

2. Getting Started

2.1. Simple Usage

使用Spyder创建一个新工程Selenium

If you have installed Selenium Python bindings, you can start using it from Python like this.

# -*- coding: utf-8 -*-
"""
Created on Mon Jul 23 11:37:46 2018

@author: Administrator
"""

from selenium import webdriver
#The Keys 类提供了键盘上面的所有按键,比如:RETURN, F1, ALT etc
from selenium.webdriver.common.keys import Keys

#the instance of Firefox WebDriver is created
#如果geckodriver的位置已经加入环境变量,使用:
#driver = webdriver.Firefox()
#如果geckodriver的位置尚未加入环境变量,使用:
driver = webdriver.Firefox(executable_path='E:\software\python\geckodriver-v0.21.0-win64\geckodriver.exe') 
#The driver.get method 会按照给出的URL导航至该页面
driver.get("http://www.python.org")
#assert condition:to test that condition, and trigger an error if the condition is false
#print(driver.title) #Welcome to Python.org
assert "Python" in driver.title #True
elem = driver.find_element_by_name("q")
#首先清空输入 field 中所有 pre-populated text (e.g. “Search”) so it doesn’t affect our search results
elem.clear()
#发送按键, this is 类似于使用键盘 entering keys
elem.send_keys("pycon") #确实有<li class="tier-2 element-4" role="treeitem"><a href="http://pycon.blogspot.com/" title="PyCon Blog">PyCon News</a></li>
elem.send_keys(Keys.RETURN)
assert "No results found." not in driver.page_source #True
driver.close()

The above script can be saved into a file (eg:- python_org_search.py), then it can be run like this:

python python_org_search.py

The python which you are running should have the selenium module installed.

或者也可以点击Spyder运行按钮,会弹出火狐浏览器:

2.2. Example Explained

The selenium.webdriver module provides all the WebDriver implementations. Currently supported WebDriver implementations are Firefox, Chrome, IE and Remote. The Keys 类提供了键盘上面的所有按键,比如:RETURN, F1, ALT etc.

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

Next, the instance of Firefox WebDriver is created.

driver = webdriver.Firefox()

The driver.get method 会按照给出的URL导航至该页面. WebDriver 会一直等待,直到页面完全被加载 (也就是说, the “onload” event has fired (onload事件被触发,onload事件通常会在页面或图像加载完成后立即发生)), 然后将 control 交还给你的测试或脚本. It’s worth noting that 如果你的页面 on load (加载) 时使用了很多AJAX, then WebDriver may not know when it has completely loaded.:

driver.get("http://www.python.org")

The next line is an assertion to confirm that title has “Python” word in it (assert condition:to test that condition, and trigger an error if the condition is false):

assert "Python" in driver.title

WebDriver 提供了一系列名字为 find_element_by_* 的方法来发现元素. For example, 输入文本元素可以通过 name 属性被找到,hence 可以使用 find_element_by_name 方法. A detailed explanation of finding elements is available in the Locating Elements chapter:

elem = driver.find_element_by_name("q")

Next, we 发送按键, this is 类似于使用键盘 entering keys. 可以使用 Keys 类 imported from selenium.webdriver.common.keys 发送 Special keys. 为了安全起见,首先清空输入 field 中所有 pre-populated text (e.g. “Search”) so it doesn’t affect our search results:

elem.clear()
elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)

After submission of the page, you should get the result if there is any. To ensure that some results are found, make an assertion:

assert "No results found." not in driver.page_source

Finally, the browser window is closed. You can also call quit method instead of close. The quit will exit entire browser whereas close will close one tab, but if just one tab was open, by default most browser will exit entirely.:

driver.close()

2.3. Using Selenium to write tests

Selenium is mostly used for writing test cases. The selenium package itself doesn’t provide a testing tool/framework. You can write test cases using Python’s unittest module. The other options for a tool/framework are py.test and nose.

In this chapter, we use unittest as the framework of choice. Here is the modified example which uses unittest module. This is a test for python.org 搜索功能:

import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

class PythonOrgSearch(unittest.TestCase):

    def setUp(self):
        self.driver = webdriver.Firefox(executable_path='E:\software\python\geckodriver-v0.21.0-win64\geckodriver.exe')

    def test_search_in_python_org(self):
        driver = self.driver
        driver.get("http://www.python.org")
        self.assertIn("Python", driver.title)
        elem = driver.find_element_by_name("q")
        elem.send_keys("pycon")
        elem.send_keys(Keys.RETURN)
        assert "No results found." not in driver.page_source

    def tearDown(self):
        self.driver.close()

if __name__ == "__main__":
    unittest.main()

You can run the above test case from a shell like this:

python test_python_org_search.py
.
----------------------------------------------------------------------
Ran 1 test in 15.566s

OK

The above result shows that the test has been successfully completed.

2.4. Walk through of the example

Initially, all the basic modules required are imported. The unittest module is a built-in Python based on Java’s JUnit. This module provides the framework for organizing the test cases. The selenium.webdriver 模块提供了所有 WebDriver 的实现. Currently supported WebDriver 实现有 Firefox, Chrome, Ie and Remote. The Keys class provide keys in the keyboard like RETURN, F1, ALT etc.

import unittest
from selenium import webdriver
from selenium.webdriver.common.keys import Keys

The test case class is inherited from unittest.TestCase. Inheriting from TestCase class is the way to tell unittest module that this is a test case:

class PythonOrgSearch(unittest.TestCase):

The setUp is part of initialization, this method will get called before every test function which you are going to write in this test case class. Here you are creating the instance of Firefox WebDriver.

def setUp(self):
    self.driver = webdriver.Firefox()

This is the test case method. The 测试样例代码 should always start with characters test. The first line inside this method create a local reference to the driver object created in setUp method.

def test_search_in_python_org(self):
    driver = self.driver

The driver.get method will navigate to a page given by the URL. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script. It’s worth noting that if your page uses a lot of AJAX on load then WebDriver may not know when it has completely loaded.:

driver.get("http://www.python.org")

The next line is an assertion to confirm that title has “Python” word in it:

self.assertIn("Python", driver.title)

WebDriver offers a number of ways to find elements using one of the find_element_by_* methods. For example, the input text element can be located by its name attribute using find_element_by_name method. Detailed explanation of finding elements is available in theLocating Elements chapter:

elem = driver.find_element_by_name("q")

Next, we are sending keys, this is similar to entering keys using your keyboard. Special keys can be send using Keys class imported from selenium.webdriver.common.keys:

elem.send_keys("pycon")
elem.send_keys(Keys.RETURN)

After submission of the page, you should get the 结果 as per search if there is any. 为了确定确实 some results are found, make an assertion:

assert "No results found." not in driver.page_source

The tearDown method will get called after every test method. This is a place to do all cleanup actions. In the current method, the browser window is closed. You can also call quit method instead of close. The quit will exit the entire browser, whereas close will close a tab, but if it is the only tab opened, by default most browser will exit entirely.:

def tearDown(self):
    self.driver.close()

Final lines are some boiler plate code (sections of code that have to be included in many places with little or no alteration. It is often used when referring to languages that are considered verbose, i.e. the programmer must write a lot of code to do minimal jobs.) to run the test suite:

if __name__ == "__main__":
    unittest.main()
要学习selenium爬虫Python入门,你可以参考以下步骤和资源: 1. 首先,你需要掌握Python基础知识,包括语法、数据类型、变量、条件语句、循环和函数等。你可以参考[1]中提到的Python基础部分进行学习和练习。 2. 掌握Python的库和工具对于爬虫技术也非常重要。在学习selenium爬虫之前,你需要了解Urllib、requests等库的使用。你可以参考中提到的这些内容进行学习。 3. 学习解析技术也是爬虫中的关键一环。你可以学习XPath、JSONPath和beautiful等解析技术,以便从网页中提取所需的数据。同样,你可以参考中提到的相关部分进行学习。 4. 掌握selenium库的使用是进行Web自动化爬虫的关键。你可以通过学习selenium的API文档和示例代码来了解其基本用法。另外,你也可以参考中提到的selenium部分进行学习。 5. 最后,了解Scrapy框架是爬虫进阶的一步。Scrapy是一个强大的Python爬虫框架,可以帮助你更高效地编写和管理爬虫。你可以参考中提到的Scrapy部分进行学习。 总结起来,学习selenium爬虫Python入门可以通过以下步骤进行:掌握Python基础知识 -> 学习Urllib和requests库 -> 学习解析技术(如XPath、JSONPath和beautiful) -> 掌握selenium库的使用 -> 了解Scrapy框架。希望这些信息能对你有所帮助! 引用: : 本套视频教程适合想掌握爬虫技术的学习者,以企业主流版本Python 3.7来讲解,内容包括:Python基础、Urllib、解析(xpath、jsonpath、beautiful)、requests、selenium、Scrapy框架等。针对零基础的同学可以从头学起。<span class="em">1</span> #### 引用[.reference_title] - *1* [零基础Python爬虫入门到精通-视频教程网盘链接提取码下载 .txt](https://download.csdn.net/download/m0_66047725/81741433)[target="_blank" data-report-click={"spm":"1018.2226.3001.9630","extra":{"utm_source":"vip_chatgpt_common_search_pc_result","utm_medium":"distribute.pc_search_result.none-task-cask-2~all~insert_cask~default-1-null.142^v93^chatsearchT3_2"}}] [.reference_item style="max-width: 100%"] [ .reference_list ]
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值