Selenium + PhantomJS: an error and how to fix it

Problem

While learning to scrape web pages with Selenium + PhantomJS, I got the following warning as soon as I ran my script:

UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '

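In short, newer versions of Selenium have deprecated PhantomJS support and ask you to use a headless version of Chrome or Firefox instead. I expect that no later Selenium release will support PhantomJS again.

Solutions

1. Use an older version of Selenium

Run pip list to see which version of selenium is installed, run pip uninstall selenium to remove it, and then install version 2.48 explicitly with pip install selenium==2.48. After that, the original code should run without the warning.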
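
Before and after reinstalling, it can help to confirm from Python which version is actually installed. A minimal sketch using only the standard library (Python 3.8+); the helper names installed_version and major_version are made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string):
    """Return the leading number of a version string such as '2.48.0'."""
    return int(version_string.split(".")[0])


if __name__ == "__main__":
    found = installed_version("selenium")
    if found is None:
        print("selenium is not installed")
    else:
        print("selenium", found, "(major version", str(major_version(found)) + ")")
```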

2. Use a headless browser

Only Chrome is installed on my machine, so I used Selenium + Headless Chrome.

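Prerequisites:

       1. Chrome must be installed on the machine.

       2. The chromedriver executable must be available locally. As with PhantomJS, if its directory is not on the PATH environment variable, you have to pass its location through the executable_path parameter.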
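
To check whether chromedriver can be found without executable_path, you can look it up on PATH from Python. A small sketch using the standard library; find_executable is a hypothetical helper name, not part of Selenium:

```python
import shutil


def find_executable(name="chromedriver"):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_executable()
    if path:
        print("chromedriver found at", path)
    else:
        print("chromedriver is not on PATH; pass its location explicitly")
```

If this prints a path, the driver can be started without specifying where chromedriver lives.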

Example code

from selenium import webdriver
from selenium.webdriver.chrome.options import Options


def main():
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver = webdriver.Chrome(executable_path='chromedriver', chrome_options=chrome_options)
    driver.get("https://www.baidu.com")
    print(driver.page_source)
    driver.quit()  # quit() ends the session and also stops the chromedriver process


if __name__ == '__main__':
    main()

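A few notes on the code above. Since chromedriver is used in much the same way as PhantomJS, it is enough to put the downloaded chromedriver.exe in your Python installation's root directory (this works as long as that directory is on PATH). In that case executable_path can be left out. If you do not pass chrome_options=chrome_options, running the code opens a new, visible Chrome window.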
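
In newer Selenium releases (4.x), the chrome_options= and executable_path= keyword arguments were deprecated and later removed in favor of options= and a Service object. Here is a sketch of the same script in that style, assuming Selenium 4 and a chromedriver that Selenium can locate. The names HEADLESS_ARGS and create_headless_chrome are my own, and the selenium imports sit inside the function only so that the rest of the file loads without Selenium installed:

```python
HEADLESS_ARGS = ["--headless", "--disable-gpu"]


def create_headless_chrome(driver_path=None):
    """Start headless Chrome; driver_path is an optional chromedriver location."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.chrome.service import Service

    options = Options()
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    # Service() without a path lets Selenium locate chromedriver itself (for example on PATH)
    service = Service(driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)


def main():
    driver = create_headless_chrome()
    try:
        driver.get("https://www.baidu.com")
        print(driver.page_source)
    finally:
        driver.quit()  # always end the session, even if the page load fails


if __name__ == "__main__":
    main()
```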

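Chrome version requirements for Headless Chrome (see the official documentation)

According to the official documentation, Headless Chrome requires Chrome 59+ on Mac and Linux and Chrome 60+ on Windows.

Also note that each Chrome version needs a matching chromedriver version. My Chrome is 68.0, so the matching chromedriver is 2.41; the pairings for other versions are listed on the official site.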
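
Once a driver is running, driver.capabilities reports both versions, which makes a mismatch easier to spot. A sketch that works on a plain dict shaped like that capabilities mapping; describe_versions is a hypothetical helper, and the sample values are illustrative rather than taken from a real session:

```python
def describe_versions(capabilities):
    """Extract browser and chromedriver versions from a capabilities dict."""
    # Newer drivers report "browserVersion"; older ones used "version"
    browser = capabilities.get("browserVersion") or capabilities.get("version")
    # chromedriver reports its version followed by a build hash; keep the number only
    raw_driver = capabilities.get("chrome", {}).get("chromedriverVersion", "")
    driver = raw_driver.split(" ")[0] or None
    return browser, driver


if __name__ == "__main__":
    # Sample dict shaped like driver.capabilities; the values are illustrative
    sample = {
        "browserVersion": "68.0.3440.106",
        "chrome": {"chromedriverVersion": "2.41.578737 (hash)"},
    }
    print(describe_versions(sample))
```

With a live session you would call describe_versions(driver.capabilities) instead of passing the sample dict.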

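Reference: https://blog.csdn.net/u010358168/article/details/79749149

To be continued...

Example:

This example adapts the Selenium + PhantomJS case from the book 从零开始学Python网络爬虫 (Learn Python Web Scraping from Scratch) to Selenium + Headless Chrome, since newer versions of Selenium no longer support PhantomJS. It scrapes data from Taobao and stores it in a MongoDB database.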

# -*- encoding:utf8 -*-
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from lxml import etree
import pymongo
import time

client = pymongo.MongoClient("localhost", 27017)
mydb = client["mydb"]   # use the mydb database (created on first write)
taobao = mydb["taobao"]     # use the taobao collection (created on first write)

chrome_options = Options()
# Add startup arguments
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# With chrome_options passed in, no Chrome window is shown; without it, a Chrome window pops up
driver = webdriver.Chrome(chrome_options=chrome_options)
driver.maximize_window()


def get_info(url, page):
    page = page+1
    driver.get(url)
    selector = etree.HTML(driver.page_source)
    # This XPath matches div tags whose class attribute is exactly 'item J_MouserOnverReq  ' (trailing spaces included)
    infos = selector.xpath("//div[@class='item J_MouserOnverReq  ']")
    # For each search result we collect the title, price, sales count, shop name and location
    for info in infos:
        data = info.xpath("div/div/a")[0]
        # When a tag contains nested tags, string(.) returns all of the text inside it at once
        title = data.xpath("string(.)").strip()
        price = info.xpath("div/div/div/strong/text()")[0]
        sell = info.xpath('div/div/div[@class="deal-cnt"]/text()')[0]
        shop = info.xpath('div[2]/div[3]/div[1]/a/span[2]/text()')[0]
        address = info.xpath('div[2]/div[3]/div[2]/text()')[0]
        commodity = {
            'title': title,
            'price': price,
            'sell': sell,
            'shop': shop,
            'address': address
        }
        taobao.insert_one(commodity)
    if page <= 10:
        NextPage(url, page)
    else:
        driver.quit()  # end the session and stop chromedriver after the last page


def NextPage(url, page):
    driver.get(url)
    driver.find_element_by_xpath("//a[@trace='srp_bottom_pagedown']").click()
    time.sleep(4)
    driver.get(driver.current_url)
    get_info(driver.current_url, page)      # driver.current_url returns the URL of the current page


if __name__ == "__main__":
    page = 1
    driver.get("https://www.taobao.com")
    driver.implicitly_wait(10)      # implicit wait of up to 10 seconds
    driver.find_element_by_id("q").clear()  # clear the input box whose id is q
    driver.find_element_by_id("q").send_keys("小米8")    # type the search keyword (Xiaomi 8)
    driver.find_element_by_class_name("btn-search").click()     # click the search button
    get_info(driver.current_url, page)      # pass the current page URL and page number to get_info

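The code is commented throughout. The only real change is that I set the implicit wait once, at the entry point of the script, and nowhere else, because I came across a blog post saying an implicit wait only needs to be set once. I have not verified this myself yet, though it is consistent with Selenium's documentation, which says the implicit wait stays in effect for the life of the session.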
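
In the script above, get_info and NextPage call each other until page passes 10. The same control flow can also be written as a plain loop. This is only a sketch of the paging logic: crawl_pages and its two callback parameters are hypothetical names standing in for the real scraping and clicking code.

```python
def crawl_pages(scrape_page, go_to_next_page, max_pages=10):
    """Scrape max_pages result pages, moving to the next page between them."""
    for page in range(1, max_pages + 1):
        scrape_page(page)
        # No need to click "next" after the last page
        if page < max_pages:
            go_to_next_page()


if __name__ == "__main__":
    # Stand-ins for the real scraping and clicking code
    crawl_pages(
        scrape_page=lambda page: print("scraping page", page),
        go_to_next_page=lambda: print("clicking next"),
        max_pages=3,
    )
```

A loop avoids the growing call stack of the mutually recursive version and keeps the page counter in one place.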
