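Preface
Playwright is a browser automation tool developed by Microsoft. It can scrape web pages and run automated tests against websites you have built yourself. It is also less complicated than writing a scraper with bs4 and requests: because it drives a real browser, you usually have far less anti-scraping work to deal with, and basic front-end knowledge is enough to write scraping code efficiently and conveniently.
This article cannot cover every Playwright feature, but it does cover the most basic ones. For a more systematic treatment, see the official Playwright for Python documentation: https://playwright.dev/python/docs/intro
I also highly recommend the tutorial by 白月黑羽: Playwright web自动化 - Python版 (on bilibili)
Installation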
pip install pytest-playwright
playwright install
A simple demo
Use the following command to start recording:
playwright codegen
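The window on the right records the actions you perform in the browser on the left and turns them into code. Below is the generated template. It uses Playwright's synchronous (not asynchronous) API: sync_playwright() yields a Playwright object, which is passed into the run function. The browser line starts a new process and opens Chromium, the browser bundled with Playwright; you can switch to another browser (see the official site). With headless=False the browser window is shown during execution; otherwise it is not displayed.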
from playwright.sync_api import Playwright, sync_playwright, expect


def run(playwright: Playwright) -> None:
    # Launch the bundled Chromium with a visible window.
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    page = context.new_page()

    # ---------------------
    context.close()
    browser.close()


with sync_playwright() as playwright:
    run(playwright)
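The template hard-codes Chromium. As a minimal sketch of switching browsers, the helper below picks a browser type by name. The names launch_browser and SUPPORTED_BROWSERS are made up for illustration and are not part of Playwright; chromium, firefox, and webkit are the browser types exposed on the Playwright object.

```python
# Browser types exposed on the Playwright object.
SUPPORTED_BROWSERS = ("chromium", "firefox", "webkit")


def launch_browser(playwright, name="chromium", headless=True):
    """Launch the browser type called `name` (hypothetical helper)."""
    if name not in SUPPORTED_BROWSERS:
        raise ValueError(f"unsupported browser: {name!r}")
    browser_type = getattr(playwright, name)  # e.g. playwright.firefox
    return browser_type.launch(headless=headless)


def main():
    # Imported here so the helper above can be used without Playwright loaded.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = launch_browser(playwright, "firefox", headless=True)
        page = browser.new_page()
        page.goto("https://www.baidu.com/")
        print(page.title())
        browser.close()


# Call main() to try it; Firefox must have been fetched by `playwright install`.
```

Passing headless=True runs without a visible window, which suits scheduled scraping jobs; headless=False is more convenient while debugging.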
Type www.baidu.com into the browser on the left, and a new line appears in the code on the right, indicating navigation to that page:
page.goto("https://www.baidu.com/")
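Next, click the search box and type nihao; the lines below are added. Here locator is a locator: page.locator("#kw") targets the text input box, that is, a single element on the page. The click, fill, and press calls that follow perform a click, fill in text, and press a key (Enter here). For the operations available on elements, see Actions | Playwright Python.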
page.locator("#kw").click()
page.locator("#kw").fill("nihao")
page.locator("#kw").press("Enter")
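The recorded steps can be collected into one reusable function. This is a sketch under assumptions: the function name search_baidu is made up for illustration, and the #kw selector is simply the one recorded above, so it may stop matching if Baidu changes its page. The explicit click() is dropped because fill() focuses the element itself.

```python
def search_baidu(page, query):
    """Open Baidu, put `query` into the search box, and submit it."""
    page.goto("https://www.baidu.com/")
    search_box = page.locator("#kw")  # selector taken from the recording
    search_box.fill(query)            # fill() focuses the box and sets its value
    search_box.press("Enter")         # submit the search


def main():
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=False)
        page = browser.new_page()
        search_baidu(page, "nihao")
        browser.close()


# Call main() to run the search in a real browser.
```

Keeping the locator in a variable avoids repeating the selector string, so a selector change only needs one edit.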
This way the code is generated for you automatically. You can also use the tracing module to record the execution:
context = browser.new_context()
context.tracing.start(snapshots=True, sources=True, screenshots=True)
...
context.tracing.stop(path="trace.zip")
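Then run the following command in a terminal to view the whole run. The left side lists each Playwright operation; on the right, Action, Before, and After show each step's action, the state before it, and the state after it.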
playwright show-trace trace.zip
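If a step raises an exception between tracing.start and tracing.stop, the trace file is never written. One possible sketch wraps the calls in try/finally so that the trace is saved either way. The function name run_with_trace and the actions callback are illustrative and not part of Playwright.

```python
def run_with_trace(browser, actions, trace_path="trace.zip"):
    """Run `actions(page)` in a fresh context and always save a trace."""
    context = browser.new_context()
    context.tracing.start(screenshots=True, snapshots=True, sources=True)
    try:
        page = context.new_page()
        actions(page)
    finally:
        # Runs on success and on failure, so the trace is always written.
        context.tracing.stop(path=trace_path)
        context.close()
```

A failing run can then still be inspected with playwright show-trace trace.zip, which is often when the trace is most useful.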
And with that, a simple demo that opens Baidu and searches for nihao is complete:
from playwright.sync_api import Playwright, sync_playwright, expect
def run(playwright: Playwright) -> None:
    browser = playwright.chromium.launch(headless=False)
    context = browser.new_context()
    context.</