Python headless browsers: using a headless browser with Python 3

Latest recommended article published on 2024-08-05 16:21:44

weixin_39610774

Tags: python headless browser

Selenium is a tool for testing web applications. Selenium tests run directly in a browser and drive it the way a real user would.

pip3 install selenium -i https://pypi.douban.com/simple/

1.PhantomJS+Selenium

Python

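from selenium import webdriver

# Note: webdriver.PhantomJS and find_element_by_* belong to the Selenium 3.x API.
# driver = webdriver.PhantomJS()  # use this form if phantomjs is already on PATH
driver = webdriver.PhantomJS(executable_path="/apps/phantomjs")  # path to the PhantomJS executable
driver.get('http://www.baidu.com/')  # open the page
page = driver.page_source  # page source

# find_element_* (singular) returns one element; find_elements_* returns a list,
# which has no get_attribute() or text. 'tableData' is the original placeholder name.
table = driver.find_element_by_tag_name('tableData')
table_html = table.get_attribute('innerHTML')  # HTML source of the element
table_id = table.get_attribute('id')  # id attribute of the element
table_text = table.text  # text content of the element

driver.quit()  # quit the browser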
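The `find_element_by_*` helpers belong to the Selenium 3 API; in Selenium 4 the same lookup is written as `find_element(by, value)`. The sketch below collects the same three values that way. `element_summary` is a hypothetical helper name, not part of Selenium, and it accepts any driver object, so it is not tied to PhantomJS.

```python
def element_summary(driver, tag_name):
    """Return inner HTML, id and text of the first element with this tag."""
    # "tag name" is the locator strategy string behind selenium's By.TAG_NAME;
    # it is spelled out so this helper has no import-time dependency on selenium.
    element = driver.find_element("tag name", tag_name)
    return {
        "html": element.get_attribute("innerHTML"),
        "id": element.get_attribute("id"),
        "text": element.text,
    }
```

With a live session this could be called as `element_summary(driver, "table")`; passing `By.TAG_NAME` from `selenium.webdriver.common.by` instead of the literal string is equivalent.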

2.Chrome+Selenium

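To drive Chrome in headless mode you need a ChromeDriver build that matches the installed browser version, so a regular Chrome installation must already be present locally.

Download the driver from http://npm.taobao.org/mirrors/chromedriver ; the current Chrome version is shown at chrome://settings/help.

Place the driver in a directory that is on the PATH environment variable (or add a new one), for example $JAVA_HOME/bin. If it is not in such a directory, the path to the driver file must be given in the code.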
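The choice between "driver on PATH" and "explicit path in code" can be made at runtime. The following is a minimal sketch using only the standard library; `resolve_driver` and the fallback path are hypothetical names chosen for illustration.

```python
import shutil


def resolve_driver(name="chromedriver", fallback=None):
    """Return the driver found on PATH, else the explicit fallback path."""
    found = shutil.which(name)  # None when the executable is not on PATH
    return found if found is not None else fallback
```

A result of `None` means the driver is neither on PATH nor configured explicitly, which is a convenient point to stop with a clear error message before a browser is started.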

Python

from selenium import webdriver

option = webdriver.ChromeOptions()  # 1. configure options
# option.set_headless()  # older way to enable headless mode
option.add_argument("--headless")  # headless mode
option.add_argument("--no-sandbox")
option.add_argument("--disable-gpu")
option.add_argument("disable-infobars")
# browser = webdriver.Chrome(executable_path="path/to/chromedriver", options=option)  # 2. create the browser with an explicit driver path
browser = webdriver.Chrome(options=option)  # 2. create the browser (chromedriver on PATH)
browser.get("https://www.baidu.com/")  # 3. request the page
html = browser.page_source  # page source
browser.close()  # close the current window
browser.quit()  # end the session and stop the driver
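In Selenium 4 the driver path is passed through a `Service` object rather than `executable_path`. The sketch below shows that form under the assumption that Selenium 4 is installed; `headless_chrome_args` and `make_headless_chrome` are hypothetical helper names, and the switches are the ones used in this section.

```python
def headless_chrome_args(extra=()):
    """Build the list of Chrome switches used for a headless session."""
    args = ["--headless", "--no-sandbox", "--disable-gpu"]
    args.extend(extra)
    return args


def make_headless_chrome(driver_path=None, extra_args=()):
    """Create a headless Chrome driver using the Selenium 4 Service API."""
    # Imported lazily so the argument helper above works without selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args(extra_args):
        options.add_argument(arg)
    # Without a path, Service() leaves locating chromedriver to Selenium.
    service = Service(executable_path=driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)
```

A caller would typically wrap the session in `try` / `finally` and call `browser.quit()` in the `finally` branch, so the driver process is stopped even when a page load fails.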

3.Firefox+Selenium

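First install a regular Firefox browser locally. Then download the geckodriver driver from https://github.com/mozilla/geckodriver/releases and place it in a directory that is on the PATH environment variable (or add a new one).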

Python

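from selenium import webdriver

option = webdriver.FirefoxOptions()  # 1. configure options
option.add_argument("-headless")  # headless mode (replaces the older option.set_headless())
# --no-sandbox, --disable-gpu, disable-infobars and --proxy-server are Chrome
# switches rather than Firefox ones, so they are not passed here.
# A proxy given as protocol://host:port is set through preferences instead
# (host and port below are placeholders):
# option.set_preference("network.proxy.type", 1)  # 1 = manual proxy configuration
# option.set_preference("network.proxy.http", "127.0.0.1")
# option.set_preference("network.proxy.http_port", 8080)
# browser = webdriver.Firefox(executable_path="path/to/geckodriver", options=option)
browser = webdriver.Firefox(options=option)  # 2. create the browser (geckodriver on PATH)
browser.get("https://www.baidu.com/")
html = browser.page_source
browser.close()
browser.quit()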
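The snippet above refers to a proxy written as protocol://host:port. Firefox takes proxy settings as preferences rather than as a command-line switch, so one possible approach is to split the URL and set the `network.proxy.*` preferences. This is a sketch under that assumption; `firefox_proxy_prefs` and `apply_prefs` are hypothetical helper names, and only HTTP and HTTPS traffic is covered.

```python
from urllib.parse import urlsplit


def firefox_proxy_prefs(proxy_url):
    """Translate protocol://host:port into Firefox proxy preferences."""
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError("expected protocol://host:port, got %r" % proxy_url)
    return {
        "network.proxy.type": 1,  # 1 = manual proxy configuration
        "network.proxy.http": parts.hostname,
        "network.proxy.http_port": parts.port,
        "network.proxy.ssl": parts.hostname,
        "network.proxy.ssl_port": parts.port,
    }


def apply_prefs(options, prefs):
    """Copy each preference onto a FirefoxOptions-like object."""
    for name, value in prefs.items():
        options.set_preference(name, value)
    return options
```

Applied to the options object from this section, that would read `apply_prefs(option, firefox_proxy_prefs("http://127.0.0.1:8080"))`, where the address is a placeholder.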

4.Pyppeteer

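Selenium's safeguards do not allow cookies to be saved across domains. To log in with saved cookies you must first open the page, then load the cookies, then refresh, which is awkward. Some sites can also detect that Selenium is being used.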
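The open, load cookies, refresh sequence described above can be sketched as follows. The helper names and the JSON file format are assumptions made for illustration; the driver methods used (`get_cookies`, `get`, `add_cookie`, `refresh`) are standard Selenium WebDriver calls.

```python
import json


def save_cookies(driver, path):
    """Write the current session's cookies to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(driver.get_cookies(), fh)


def restore_session(driver, url, path):
    """Re-apply saved cookies; returns how many cookies were loaded."""
    with open(path, encoding="utf-8") as fh:
        cookies = json.load(fh)
    driver.get(url)  # 1. open a page on the cookies' domain first
    for cookie in cookies:
        driver.add_cookie(cookie)  # 2. load each saved cookie
    driver.refresh()  # 3. reload so the page sees the cookies
    return len(cookies)
```

The first `driver.get(url)` is needed because a cookie can only be added for the domain of the page that is currently open.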

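Puppeteer is a Node.js package released by the Chrome development team in 2017 for controlling the Chrome browser; it can run in headless mode. See https://github.com/puppeteer/puppeteer.

Pyppeteer is a Python port of Puppeteer, and its API is largely the same as the JavaScript version.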

pip3 install pyppeteer -i https://pypi.douban.com/simple/

Pyppeteer is usually used together with the asyncio asynchronous library.

Python

import asyncio
from pyppeteer import launch

async def main():
    browser = await launch()  # start the browser
    page = await browser.newPage()  # open a new tab
    await page.goto('https://www.baidu.com')  # visit the URL
    await page.screenshot({'path': 'baidu_screenshot.png'})  # screenshot, saved in the current working directory (the project directory when run from VS Code)
    await browser.close()  # close the browser

asyncio.run(main())

# If the error below appears, it is because pyppeteer downloads Chromium on first use:
# raise MaxRetryError(_pool, url, error or ResponseError(cause))
# urllib3.exceptions.MaxRetryError:
# HTTPSConnectionPool(host='storage.googleapis.com', port=443):
# Max retries exceeded with url: /chromium-browser-snapshots/Mac/588429/chrome-mac.zip (
# Caused by SSLError(
# SSLCertVerificationError(
# 1,
# '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: \
# unable to get local issuer certificate (_ssl.c:1056)')))

# 1. Locate chromium_downloader.py inside the pyppeteer package, for example:
# /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pyppeteer/chromium_downloader.py

# 2. In that file, replace
# DEFAULT_DOWNLOAD_HOST = 'https://storage.googleapis.com' with
# DEFAULT_DOWNLOAD_HOST = 'http://storage.googleapis.com'
# so that the download uses plain HTTP and the failing certificate check is skipped.
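Because Pyppeteer is built on coroutines, several pages can be loaded concurrently in one browser with `asyncio.gather`. The sketch below shows this pattern; `fetch_title` and `fetch_titles` are hypothetical helper names, and they accept any browser object that exposes Pyppeteer's `newPage()`.

```python
import asyncio


async def fetch_title(browser, url):
    """Open one tab, load the URL and return the page title."""
    page = await browser.newPage()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()  # always release the tab


async def fetch_titles(browser, urls):
    """Load all URLs concurrently; results keep the order of `urls`."""
    return await asyncio.gather(*(fetch_title(browser, u) for u in urls))
```

Inside a coroutine such as `main()`, this could be called as `titles = await fetch_titles(browser, ['https://www.baidu.com'])` between `launch()` and `browser.close()`.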

-end
