【Python】Notes on Automating Web Page Operations

qilei2010

Last modified 2023-03-13 08:43:06


First published 2023-02-14 23:35:44

Article link: https://blog.csdn.net/qilei2010/article/details/129035398


These are only rough notes.

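1. Preparation

Install Google Chrome.

Install the driver (ChromeDriver) that matches your installed Chrome version.

Install the selenium library, version 4.1.1 (newer versions have a bug that makes the browser quit abruptly).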
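Because these notes pin selenium 4.1.1, it can help to check which version is actually installed before running the scripts. A minimal sketch using only the standard library's `importlib.metadata` (Python 3.8+); the helper names `installed_version` and `matches_pin` are hypothetical and not part of selenium:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_pin(package, pinned):
    """Return True only if `package` is installed at exactly the `pinned` version."""
    return installed_version(package) == pinned


if __name__ == "__main__":
    # These notes recommend: pip install selenium==4.1.1
    current = installed_version("selenium")
    if current is None:
        print("selenium is not installed")
    elif not matches_pin("selenium", "4.1.1"):
        print(f"selenium {current} is installed, but these notes pin 4.1.1")
    else:
        print("selenium 4.1.1 is installed")
```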

2. Steps

2.1 Import the libraries and packages

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
```

2.2 Click an element

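```python
# Open Chrome maximized and keep it open after the script ends
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)  # do not close the browser automatically
options.add_argument('--start-maximized')  # maximize the browser window
wd = webdriver.Chrome(options=options)

# Open the web page
wd.get("https://www.baidu.com/")
# Pause for 1 second to imitate the gap between a real user's actions
time.sleep(1)
# Find the element (replace the XPath with that of the element you want to click)
button = wd.find_element(By.XPATH,
                         '/html/body/section/main/div/section/main/div/div[1]/div[3]/button')
# Click the element
button.click()
# Pause for 2 seconds to imitate the gap between a real user's actions
time.sleep(2)
```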
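The fixed `time.sleep(1)` and `time.sleep(2)` calls above pause for exactly the same time on every run. If the aim is to imitate the irregular gaps of a human user, one option (an addition, not something these notes use) is to sleep for a random duration within a range. A minimal sketch; `human_pause` and its parameters are hypothetical names:

```python
import random
import time


def human_pause(low=0.8, high=2.0, sleep=time.sleep):
    """Sleep for a random number of seconds between low and high, and return that duration.

    `sleep` is injectable so the real delay can be swapped out, e.g. in tests.
    """
    if low < 0 or high < low:
        raise ValueError("require 0 <= low <= high")
    delay = random.uniform(low, high)
    sleep(delay)
    return delay


# Usage: wherever the script calls time.sleep(1), call e.g. human_pause(0.5, 1.5) instead.
```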

2.3 Wait for a page element to appear

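The hardest part of simulating a user on a web page is that the automation is easily disturbed by unpredictable factors such as the computer's performance and the network speed. Suppose that on the 99th repetition the network suddenly slows down: the page has not finished loading when the code tries to click a button, so the code raises an exception, the program exits, and the whole run fails.

WebDriverWait solves this problem. It waits for the specified element to appear, checking every 0.5 seconds during the wait; if the element still has not appeared after 15 seconds, it raises an error (selenium's TimeoutException). Note that presence_of_element_located only checks that the element exists in the page's DOM; before clicking, EC.element_to_be_clickable is the stricter condition.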

```python
WebDriverWait(wd, 15, 0.5).until(EC.presence_of_element_located(
            (By.XPATH, 'xxxxxx')))
time.sleep(0.1)
```
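To make the behaviour of `WebDriverWait(wd, 15, 0.5).until(...)` concrete, here is a simplified pure-Python sketch of the same polling idea: call a condition repeatedly, return its first truthy result, and give up after a timeout. It is an illustration only, not selenium's source: the real `until` passes the driver to the condition, ignores listed exceptions (by default `NoSuchElementException`), and raises selenium's `TimeoutException` rather than the built-in `TimeoutError` used here. `wait_until` is a hypothetical name.

```python
import time


def wait_until(condition, timeout=15.0, poll=0.5, clock=time.monotonic, sleep=time.sleep):
    """Poll `condition()` every `poll` seconds until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds have passed.
    """
    end = clock() + timeout
    while True:
        result = condition()
        if result:
            return result  # like until(), hand back whatever the condition found
        if clock() >= end:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        sleep(poll)  # wait before checking again
```

The first check happens immediately, so a page that is already loaded costs no waiting at all; only a slow page pays the polling interval.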

I won't go into the details here; search for the Python test automation library selenium to learn more.

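2.4 Putting it together

This is test code; you can wrap it into a utility class of your own.

Every action function ends with a delay, both to give the page time to load and to imitate a real user.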

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

# Logging
import logging

logger = logging.getLogger("mylogger")
logger.setLevel(level=logging.DEBUG)
handler = logging.FileHandler("log.txt", mode='w', encoding="utf-8")
handler.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
handler.setFormatter(formatter)
logger.addHandler(handler)

# Open Chrome maximized and keep it open after the script ends
options = webdriver.ChromeOptions()
options.add_experimental_option('detach', True)  # do not close the browser automatically
options.add_argument('--start-maximized')  # maximize the browser window
driver = webdriver.Chrome(options=options)  # the helper functions below use this name

# Open a URL
def openUrl(url):
    driver.get(url)
    logger.debug("==Opened URL: {}".format(url))
    time.sleep(2)

# Click an element
def clickByXPath(xpath):
    ele = driver.find_element(By.XPATH, xpath)
    ele.click()
    logger.debug("==Clicked element")
    time.sleep(1)

# Type text into an element
def inputTxt(xpath, txt):
    ele = driver.find_element(By.XPATH, xpath)
    ele.send_keys(txt)
    logger.debug("==Typed text: {}".format(txt))
    time.sleep(1)

# Wait until the element at eleXPath exists.
# waitTime: maximum wait in seconds; recheckTime: interval between checks in seconds
def waitEleExist(waitTime, recheckTime, eleXPath):
    WebDriverWait(driver, waitTime, recheckTime).until(EC.presence_of_element_located(
        (By.XPATH, eleXPath)))
    time.sleep(0.1)

def run_webdriver():
    mainpage = "https://www.baidu.com"
    openUrl(mainpage)

    # Type the query into the search box ("北京天气" means "Beijing weather")
    inputXPath = '/html/body/div[1]/div[2]/div[5]/div[1]/div/form/span[1]/input'
    inputTxt(inputXPath, "北京天气")

    # Click the search button
    searchXPath = '/html/body/div[1]/div[2]/div[5]/div[1]/div/form/span[2]/input'
    clickByXPath(searchXPath)
    time.sleep(2)

if __name__ == '__main__':
    logger.info("")
    logger.info("================== Program started ==================")
    run_webdriver()
    logger.info("================== Program finished normally ==================")
```
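Every action function in the integrated code ends with the same two steps: write a log line, then sleep. If you go on to wrap these helpers into a utility class, one way to avoid repeating those lines is a decorator that adds the logging and the pause around any action. A minimal sketch; the decorator `action` and the demo function `open_url_demo` are hypothetical names, and the demo does not touch a browser:

```python
import functools
import logging
import time

logger = logging.getLogger("mylogger")


def action(pause_seconds, sleep=time.sleep):
    """Decorator factory: run the wrapped action, log the call, then pause."""
    def decorator(func):
        @functools.wraps(func)  # keep the original function name for the log
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            logger.debug("==%s %s", func.__name__, args)
            sleep(pause_seconds)  # give the page time to react, as in the helpers above
            return result
        return wrapper
    return decorator


# Demo only: with a real browser the body would be driver.get(url).
@action(pause_seconds=2)
def open_url_demo(url):
    return f"opened {url}"
```

With this in place, `openUrl` from the code above could shrink to its `driver.get(url)` call plus an `@action(pause_seconds=2)` line, and the pause length of each action becomes visible at its definition.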

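3. Countering "anti-debugging"

Some websites block or restrict debugging mode, which stops both the F12 developer tools and selenium from working. In that case you can try undetected_chromedriver; for the detailed steps see: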

https://blog.csdn.net/Scott0902/article/details/127024380

This is not a perfect solution either: in my own tests some sites still failed, and I found no fix.

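```python
# Requires undetected_chromedriver:
#   pip install undetected_chromedriver
# Use these lines in place of the four lines of the previous snippet that build
# ChromeOptions and call webdriver.Chrome(options=options)
import undetected_chromedriver as uc
driver = uc.Chrome()
```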
