Python3 Selenium+ChromeDriver抓取动态网页

以前抓取动态网页是用PhantomJS + Selenium + ChromeDriver,但是新版的Selenium不支持PhantomJS了,程序跑的时候总会跳出一些warnings.

现在的操作是放弃PhantomJS,直接用headless ChromeDriver。可直接在Google主页下载个ChromeDriver,都是支持headless的。

下面的程序就是启动driver,抓取数据,关闭driver的例子。记得要关掉driver,不然会占内存。

# -*- coding: UTF-8 -*-
'''
@version: Python 3.6
@introduction:
@author: 
@date: 2018-3
'''

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# 启动driver
def init_web_driver():
    global driver
    chrome_options = Options()
    chrome_options.add_argument('--headless')
    chrome_options.add_argument('--disable-gpu')
    driver_path = 'E:\chromedriver.exe'    #这里放的就是下载的driver本地路径
    driver = webdriver.Chrome(chrome_options=chrome_options, executable_path = driver_path)

# 关掉driver
def close_web_driver():
    driver.quit()

def get_data():
    driver.get('https://www.baidu.com')
  driver.implicitly_wait(10)  # wait up to 10 seconds for the elements to become available
    # ======  网页中静态部分抓取,采用BeautifulSoup去解析   
    html = driver.page_source    # 获取网页html
    html_soup = BeautifulSoup(html.text,"lxml")
    time.sleep(0.1)
    coin_list = html_soup .find(name='table', attrs={"class": "table maintable"})
    # 页面元素的提取请查看 BeautifulSoup的用法
    # ====== 网页中动态部分抓取,采用driver自带的方法
    # 下面展示的从调用百度搜索,在搜索框中输入"headless chrome",然后获取结果。具体自行百度driver的用法
    text = driver.find_element_by_css_selector('#kw')
    search = driver.find_element_by_css_selector('#su')
    text.send_keys('headless chrome')
    # search
    search.click()
    driver.get_screenshot_as_file('search-result.png')
    results = driver.find_elements_by_xpath('//div[@class="result c-container "]')
    for result in results:
        res = result.find_element_by_css_selector('a')
        title = res.text
        link = res.get_attribute('href')
        print ('Title: %s \nLink: %s\n' % (title, link) )


if __name__ == '__main__':
    init_web_driver()
    get_data()
    close_web_driver()

发布了65 篇原创文章 · 获赞 24 · 访问量 13万+
展开阅读全文

新手--python3.5调用webdriver报错

03-22

脚本启动Fiefox报错 代码: # -*- coding:utf-8 -*- from selenium import webdriver browser = webdriver.Firefox() browser.get('http;//www.baidu.com') browser.find_element_by_id('kw').send_keys('selenium') browser.find_element_by_id('su').click() browser.quit() 报错信息: Traceback (most recent call last): File "C:\Users\hxx\AppData\Local\Programs\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 74, in start stdout=self.log_file, stderr=self.log_file) File "C:\Users\hxx\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 676, in __init__ restore_signals, start_new_session) File "C:\Users\hxx\AppData\Local\Programs\Python\Python35\lib\subprocess.py", line 955, in _execute_child startupinfo) FileNotFoundError: [WinError 2] 系统找不到指定的文件。 During handling of the above exception, another exception occurred: Traceback (most recent call last): File "E:/workspace/baidu.py", line 3, in <module> browser = webdriver.Firefox() File "C:\Users\hxx\AppData\Local\Programs\Python\Python35\lib\site-packages\selenium\webdriver\firefox\webdriver.py", line 145, in __init__ self.service.start() File "C:\Users\hxx\AppData\Local\Programs\Python\Python35\lib\site-packages\selenium\webdriver\common\service.py", line 81, in start os.path.basename(self.path), self.start_error_message) selenium.common.exceptions.WebDriverException: Message: 'geckodriver' executable needs to be in PATH. 问答

没有更多推荐了,返回首页

©️2019 CSDN 皮肤主题: 编程工作室 设计师: CSDN官方博客

分享到微信朋友圈

×

扫一扫,手机浏览