Web Scraping with Selenium

WEB SCRAPING SERIES

Overview

Selenium is a portable framework for testing web applications. It is open-source software, released under the Apache License 2.0, that runs on Windows, Linux and macOS. Although its primary purpose is testing, Selenium is also used as a web scraping tool. Rather than covering all of Selenium's components, we focus on the one that is useful for web scraping: WebDriver. Selenium WebDriver lets us control a web browser through a programming interface in order to create and execute test cases.
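
Here is a minimal sketch of what "controlling a browser through a programming interface" looks like with the Python bindings. It assumes Selenium 4 and geckodriver are installed, as covered in the installation section. firefox_session and page_title are illustrative names, not Selenium APIs. Selenium is imported inside the function only so that the helper can be defined before Selenium is installed.

```python
from contextlib import contextmanager


@contextmanager
def firefox_session():
    """Start a Firefox WebDriver session and always close the browser afterwards."""
    # Imported here so this helper can be defined even without Selenium installed
    from selenium import webdriver

    driver = webdriver.Firefox()  # looks for geckodriver on the PATH
    try:
        yield driver
    finally:
        driver.quit()  # ends the session and closes every browser window


def page_title(url):
    """Open `url` in Firefox and return the text of the page's <title>."""
    with firefox_session() as driver:
        # With the default page-load strategy, get() returns once the page has loaded
        driver.get(url)
        return driver.title
```

For example, page_title("https://example.com") opens a Firefox window, reads the title and closes the browser again, even if an error occurs in between.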

In our case, we will use it to scrape data from websites. Selenium comes in handy when websites display content dynamically, i.e., use JavaScript to render content. Scrapy is a powerful web scraping framework, but it does not execute JavaScript, so on its own it cannot see content rendered this way. My goal for this tutorial is to familiarize you with Selenium and to carry out some basic web scraping with it.
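
When content is rendered by JavaScript, the element you want may not exist yet when driver.get() returns. One common remedy is an explicit wait, which polls until the element appears. The sketch below assumes Selenium 4 with geckodriver installed. scrape_rendered_text is a hypothetical helper, and the URL and CSS selector you pass must come from the site you are scraping.

```python
def scrape_rendered_text(url, css_selector, timeout=10):
    """Return the text of the first element matching `css_selector`
    once the page's JavaScript has rendered it."""
    # Imported here so this helper can be defined even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Firefox()
    try:
        driver.get(url)
        # Poll for up to `timeout` seconds until the element exists and is visible
        element = WebDriverWait(driver, timeout).until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return element.text
    finally:
        driver.quit()
```

If the element does not appear within timeout seconds, until() raises selenium.common.exceptions.TimeoutException. A plain HTTP download of the same page would usually not contain that element at all.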

Let us start by installing Selenium and a WebDriver. WebDriver has client bindings for several programming languages, including Python, Java, C# (.NET), Ruby, PHP and Perl. The examples in this tutorial use Python; tutorials for other languages are available online.

This is the third part of a four-part tutorial series on web scraping using Scrapy and Selenium. The other parts are:

Part 1: Web scraping with Scrapy: Theoretical Understanding

Part 2: Web scraping with Scrapy: Practical Understanding

Part 4: Web scraping with Selenium & Scrapy

Installing Selenium and WebDriver

Installing Selenium

Installing Selenium on Linux is easy: run the following command in a terminal and pip will download and install it.

pip install selenium
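
To check that the installation worked, one option is to ask Python for the installed package version. This sketch uses only the standard library's importlib.metadata (Python 3.8+). The helper name installed_version is our own and is not part of Selenium.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="selenium"):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"Selenium {v}" if v else "Selenium is not installed; run: pip install selenium")
```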

Installing WebDriver

Selenium officially provides WebDrivers for 5 web browsers. Here we install the WebDriver for two of the most widely used ones: Chrome and Firefox.

Installing Chromedriver for Chrome

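First, download chromedriver from the official ChromeDriver site. Its major version should match the installed Chrome browser. The commands below fetch version 83.0.4103.39, so substitute the version that matches your Chrome. The download is a zip file: extract it and move the chromedriver binary to a directory on your PATH, such as /usr/local/bin.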

wget https://chromedriver.storage.googleapis.com/83.0.4103.39/chromedriver_linux64.zip
unzip chromedriver_linux64.zip
sudo mv chromedriver /usr/local/bin/
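
Before starting a browser, it can help to confirm that the binary really is on the PATH. The sketch below assumes Selenium 4, where the driver path is passed through a Service object. find_driver and start_headless_chrome are hypothetical helper names. Headless mode is an optional choice for scraping without a visible window.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of a WebDriver binary found on PATH, or None."""
    return shutil.which(name)


def start_headless_chrome():
    """Start Chrome without a visible window, using the chromedriver on PATH."""
    # Imported here so this helper can be defined even without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    path = find_driver("chromedriver")
    if path is None:
        raise RuntimeError("chromedriver not found on PATH; install it as described above")
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run Chrome without opening a window
    return webdriver.Chrome(service=Service(executable_path=path), options=options)
```

A typical use is driver = start_headless_chrome(), then driver.get(...), and finally driver.quit(). If chromedriver and Chrome disagree on the major version, Selenium typically raises SessionNotCreatedException when the session starts.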

Installing Geckodriver for Firefox

Installing geckodriver, the Firefox WebDriver maintained by Mozilla, is even simpler on Ubuntu, where it is available as a package. Run the following command in a terminal and you are ready to work with Selenium and geckodriver.

sudo apt install firefox-geckodriver
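
With both drivers installed as above, a script can pick whichever browser is available. This is a sketch under the assumption of Selenium 4. DRIVERS, available_browsers and start_browser are illustrative names, and preferring Firefox is an arbitrary choice. The which parameter exists only so the selection logic can be exercised without any driver installed.

```python
import shutil

# Driver binary each browser needs on PATH, in order of preference
DRIVERS = {"firefox": "geckodriver", "chrome": "chromedriver"}


def available_browsers(which=shutil.which):
    """List the browsers whose WebDriver binary is found on PATH, in preference order."""
    return [browser for browser, binary in DRIVERS.items() if which(binary)]


def start_browser(headless=True):
    """Start the first browser whose driver is installed (Firefox preferred)."""
    # Imported here so this helper can be defined even without Selenium installed
    from selenium import webdriver

    browsers = available_browsers()
    if not browsers:
        raise RuntimeError("Neither geckodriver nor chromedriver is on PATH")
    if browsers[0] == "firefox":
        options = webdriver.FirefoxOptions()
        if headless:
            options.add_argument("-headless")  # Firefox's headless flag
        return webdriver.Firefox(options=options)
    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")  # Chrome's headless flag
    return webdriver.Chrome(options=options)
```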

Examples

There are two examples of increasing complexity. The first opens a simple web page, types into text boxes and presses keys; it shows how a program can control a web page through Selenium. The second is a more complex web scraping example that involves mouse scrolling, mouse button clicks and navigating to other pages. The goal is to give you the confidence to start web scraping with Selenium.
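
The actions these examples rely on map onto a handful of WebDriver calls. This sketch assumes Selenium 4 and an already started driver, for instance from webdriver.Firefox(). The helper names and the default field name "q" are hypothetical; replace them after inspecting the target page.

```python
def search(driver, url, query, field_name="q"):
    """Open `url`, type `query` into the text box named `field_name`, and press Enter.

    field_name="q" is a hypothetical default; check the real name in the page's HTML.
    """
    # Imported here so this helper can be defined even without Selenium installed
    from selenium.webdriver.common.by import By
    from selenium.webdriver.common.keys import Keys

    driver.get(url)
    box = driver.find_element(By.NAME, field_name)
    box.clear()                 # remove any pre-filled text
    box.send_keys(query)        # type the text as a user would
    box.send_keys(Keys.RETURN)  # press Enter to submit


def scroll_to_bottom(driver):
    """Scroll the window to the bottom, e.g. to trigger lazily loaded content."""
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")


def click_link(driver, link_text):
    """Click the link whose visible text is exactly `link_text`."""
    from selenium.webdriver.common.by import By

    driver.find_element(By.LINK_TEXT, link_text).click()
```

Calling search(driver, url, "selenium") fills in the box and submits it. On pages that load more items as you scroll, scroll_to_bottom(driver) can be called repeatedly. After click_link, driver.back() returns to the previous page.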
