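1. Introduction to Playwright
Playwright is a powerful tool for automated testing and browser automation, developed by Microsoft. It supports several programming languages, including Python, JavaScript, and TypeScript, and can drive the Chromium, Firefox, and WebKit browsers. Compared with traditional tools such as Selenium, Playwright offers the following advantages: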
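- Cross-browser support
- Automatic waiting for elements to load
- A powerful selector engine
- Network request interception and modification
- Mobile device emulation
- Support for concurrent execution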
2. Installing Playwright
Install Playwright with pip, then download the browser binaries it drives:
pip install playwright
playwright install
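To confirm that the package is visible to your Python interpreter before running the examples, a small check like the one below may help. It is only a sketch: the helper name `playwright_version` is made up for this article, and it checks the Python package only, not whether the browser binaries have been downloaded.

```python
from importlib import metadata
from typing import Optional


def playwright_version() -> Optional[str]:
    """Return the installed Playwright package version, or None if it is missing."""
    try:
        return metadata.version("playwright")
    except metadata.PackageNotFoundError:
        return None


version = playwright_version()
if version is None:
    print("Playwright is not installed; run: pip install playwright")
else:
    print(f"Playwright {version} is installed")
```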
3. Basic Usage
The following simple example shows how to open a web page with Playwright and print its title:
from playwright.sync_api import sync_playwright

def run(playwright):
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.baidu.com")
    print(page.title())
    browser.close()

with sync_playwright() as playwright:
    run(playwright)
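The same flow can also be written with Playwright's async API, which makes it possible to load several pages concurrently. The following is a minimal sketch rather than a definitive implementation: the function names `fetch_title`, `fetch_titles`, and `main`, and the second URL, are illustrative assumptions.

```python
import asyncio


async def fetch_title(browser, url):
    """Open `url` in a fresh page and return its title, always closing the page."""
    page = await browser.new_page()
    try:
        await page.goto(url)
        return await page.title()
    finally:
        await page.close()


async def fetch_titles(urls):
    """Fetch the titles of all `urls` concurrently with a single browser."""
    # Imported lazily so this module loads even where Playwright is not installed.
    from playwright.async_api import async_playwright

    async with async_playwright() as playwright:
        browser = await playwright.chromium.launch()
        try:
            return await asyncio.gather(*(fetch_title(browser, url) for url in urls))
        finally:
            await browser.close()


def main():
    urls = ["https://www.baidu.com", "https://www.example.com"]
    for url, title in zip(urls, asyncio.run(fetch_titles(urls))):
        print(url, "->", title)
```

Call `main()` to run it. `asyncio.gather` returns results in the same order as the input URLs, so the titles can be paired with them using `zip`.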
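4. Locating and Interacting with Elements
Playwright offers several ways to locate elements, including CSS selectors and XPath. The following example runs a search from the Baidu home page: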
from playwright.sync_api import sync_playwright

def baidu_search(playwright):
    # The selectors below depend on Baidu's page markup and may need updating.
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    page.goto("https://www.baidu.com")
    # Locate the search box and type the query
    search_input = page.locator("#kw")
    search_input.fill("Playwright Python")
    # Click the search button
    search_button = page.locator("#su")
    search_button.click()
    # Wait for the search results to load
    page.wait_for_load_state("networkidle")
    # Print the title of each search result
    results = page.locator(".result h3")
    for result in results.all():
        print(result.inner_text())
    browser.close()

with sync_playwright() as playwright:
    baidu_search(playwright)
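Waiting for `networkidle` can be slow or unreliable on pages that keep making background requests. One alternative is to wait for the first matching element itself. The sketch below illustrates that idea with a hypothetical helper, `collect_texts`, and a small inline HTML snippet standing in for a results page, so it does not depend on any live site's markup.

```python
def collect_texts(page, selector, timeout_ms=5000):
    """Wait for the first match of `selector`, then return all non-empty texts."""
    items = page.locator(selector)
    # Wait for a concrete element instead of waiting for the network to go quiet.
    items.first.wait_for(state="visible", timeout=timeout_ms)
    return [text.strip() for text in items.all_inner_texts() if text.strip()]


def main():
    # Imported lazily so the helper above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        # Illustrative markup only; it mimics the ".result h3" structure used above.
        page.set_content(
            "<div class='result'><h3>First</h3></div>"
            "<div class='result'><h3>Second</h3></div>"
        )
        print(collect_texts(page, ".result h3"))
        browser.close()
```

Because `collect_texts` takes a `page` argument, it can be dropped into the Baidu example in place of the `networkidle` wait and the result loop.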
5. Screenshots and PDF Export
Playwright makes it easy to capture screenshots of a page and export it as a PDF:
from playwright.sync_api import sync_playwright

def capture_and_export(playwright):
    browser = playwright.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.example.com")
    # Capture a screenshot of the full page
    page.screenshot(path="example_full.png", full_page=True)
    # Capture a screenshot of a specific element
    element = page.locator("h1")
    element.screenshot(path="example_h1.png")
    # Export as PDF (supported only in Chromium)
    page.pdf(path="example.pdf")
    browser.close()

with sync_playwright() as playwright:
    capture_and_export(playwright)
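When exporting many pages, it can be convenient to wrap both calls in one helper that controls where the files go. The sketch below is one possible shape for such a helper: the name `export_page`, the output directory, and the file stem are assumptions made for illustration, while `format` and `print_background` are options of `page.pdf`.

```python
from pathlib import Path


def export_page(page, out_dir, stem):
    """Save a full-page screenshot and an A4 PDF of `page`; return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    png_path = out / f"{stem}.png"
    pdf_path = out / f"{stem}.pdf"
    page.screenshot(path=str(png_path), full_page=True)
    # print_background keeps background colors and images in the PDF.
    page.pdf(path=str(pdf_path), format="A4", print_background=True)
    return png_path, pdf_path


def main():
    # Imported lazily so the helper above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()  # PDF export needs headless Chromium
        page = browser.new_page()
        page.goto("https://www.example.com")
        print(export_page(page, "exports", "example"))
        browser.close()
```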
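6. Intercepting and Modifying Network Requests
Playwright can intercept and modify network requests, which is very useful for testing and debugging: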
from playwright.sync_api import sync_playwright

def intercept_requests(playwright):
    browser = playwright.chromium.launch()
    page = browser.new_page()

    def handle_request(route):
        # Abort any request whose URL contains ".png"; let everything else through
        if ".png" in route.request.url:
            route.abort()
        else:
            route.continue_()

    # Register the handler for every request before navigating
    page.route("**/*", handle_request)
    page.goto("https://www.example.com")
    page.screenshot(path="no_images.png")
    browser.close()

with sync_playwright() as playwright:
    intercept_requests(playwright)
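Matching on ".png" in the URL misses other image formats. A variation is to decide based on the request's resource type instead. The following sketch assumes you want to drop images, media, and fonts; the set `BLOCKED_RESOURCE_TYPES` and the handler name `block_heavy_resources` are illustrative choices, not part of Playwright.

```python
# Resource types to drop; adjust to taste (illustrative selection).
BLOCKED_RESOURCE_TYPES = frozenset({"image", "media", "font"})


def block_heavy_resources(route):
    """Abort requests for blocked resource types and continue all others."""
    if route.request.resource_type in BLOCKED_RESOURCE_TYPES:
        route.abort()
    else:
        route.continue_()


def main(url="https://www.example.com"):
    # Imported lazily so the handler above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.route("**/*", block_heavy_resources)
        page.goto(url)
        page.screenshot(path="no_heavy_resources.png")
        browser.close()
```

Keeping the handler as a module-level function makes it easy to reuse across pages and to check with a stand-in route object.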
7. Automated Form Filling
Playwright also makes it easy to fill in forms automatically:
from playwright.sync_api import sync_playwright

def fill_form(playwright):
    browser = playwright.chromium.launch(headless=False)
    page = browser.new_page()
    # Placeholder URL and selectors; replace them with those of a real form
    page.goto("https://www.example.com/form")
    # Fill in the text fields
    page.fill("#username", "testuser")
    page.fill("#password", "testpass")
    # Choose an option from the drop-down menu
    page.select_option("#country", "China")
    # Tick the checkbox
    page.check("#agree")
    # Select the radio button
    page.click("#gender-male")
    # Submit the form
    page.click("input[type=submit]")
    page.wait_for_load_state("networkidle")
    browser.close()

with sync_playwright() as playwright:
    fill_form(playwright)
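For forms with many text fields, the values can be kept in a dictionary and filled in a loop. The sketch below shows that approach under stated assumptions: the helper `fill_fields` is hypothetical, and the inline HTML is a stand-in form that reuses the selectors from the example above, so the code does not rely on a real page at the placeholder URL.

```python
def fill_fields(page, values):
    """Fill text inputs; `values` maps a CSS selector to the text to enter."""
    for selector, text in values.items():
        page.locator(selector).fill(text)


# Stand-in form for illustration; a real form will have its own markup.
FORM_HTML = """
<form>
  <input id="username"><input id="password" type="password">
  <select id="country">
    <option value="Other">Other</option>
    <option value="China">China</option>
  </select>
  <input id="agree" type="checkbox">
</form>
"""


def main():
    # Imported lazily so the helper above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch()
        page = browser.new_page()
        page.set_content(FORM_HTML)
        fill_fields(page, {"#username": "testuser", "#password": "testpass"})
        page.locator("#country").select_option("China")
        page.locator("#agree").check()
        # Read the values back to confirm the form state.
        print(page.locator("#username").input_value())
        print(page.locator("#agree").is_checked())
        browser.close()
```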
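8. Conclusion
Playwright is a powerful and flexible tool for office automation with Python. This article covers only its basic usage; for more advanced features such as concurrent execution, mobile device emulation, and authentication handling, see the official documentation. Once you are comfortable with Playwright, your automated office workflows should become noticeably more efficient.
I hope this article has been helpful. If you have any questions, feel free to discuss them in the comments.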