Selenium in Practice: Passing the GeeTest Text Click-Select Captcha

孤寒者

Last modified: 2024-10-04 20:49:05

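Column: Advanced Web Scraping + Hands-On Tutorial Series. Tags: python, web scraping, GeeTest text click-select captcha

First published: 2021-01-21 19:42:54

To repost, contact the author via the details at the end of the article (reposting without the author's permission is not allowed).

Article link: https://blog.csdn.net/qq_44907926/article/details/112970289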

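This article shows how to capture images with Selenium screenshots and recognize characters with ddddocr in order to solve GeeTest's text click-select captcha. It covers four steps: capturing the images with Selenium screenshots, recognizing the target characters, recognizing character coordinates, and clicking the coordinates. For the recognition stage it notes ddddocr's low recognition rate and the use of a captcha-solving platform (such as Chaojiying) as an alternative. Finally, it explains how the click coordinates must be adjusted to match the recognized coordinates.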

Target URL (Base64-encoded):

aHR0cHM6Ly93d3cuZ2VldGVzdC5jb20vYWRhcHRpdmUtY2FwdGNoYS1kZW1v
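
The target URL is given in Base64 form. As a minimal sketch, it can be decoded with Python's standard library; the variable names here are only illustrative:

```python
import base64

# The Base64 string exactly as given in this article
encoded = "aHR0cHM6Ly93d3cuZ2VldGVzdC5jb20vYWRhcHRpdmUtY2FwdGNoYS1kZW1v"

# Decode to bytes, then interpret the bytes as UTF-8 text
target_url = base64.b64decode(encoded).decode("utf-8")
print(target_url)
```

The decoded value is the same address that the scripts in this article pass to driver.get().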

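Step 1: Getting the required images with Selenium

In the article "Selenium in Practice: Three Ways to Pass the GeeTest Slider Puzzle Captcha", the images were obtained through their URLs.
This article takes a different approach: capturing them with Selenium element screenshots.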

import time
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

service = Service(r'E:\pycharm_project\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get('https://www.geetest.com/adaptive-captcha-demo')

# Click the [text click-select captcha] option
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.XPATH,
        '//*[@id="gt-showZh-mobile"]/div/section/div/div[2]/div[1]/div[2]/div[3]/div[4]'
    )
)
tag.click()

# Click the button that starts verification
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.CLASS_NAME,
        'geetest_btn_click'
    )
)
tag.click()

time.sleep(5)

# Target image: the characters that have to be found
target_tag = driver.find_element(
    By.CLASS_NAME,
    'geetest_ques_back'
)
target_tag.screenshot('target.png')

# Background image in which the characters must be located
bg_tag = driver.find_element(
    By.CLASS_NAME,
    'geetest_bg'
)
bg_tag.screenshot('bg.png')

time.sleep(30)
driver.close()

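One thing may look odd at first:
The [click the button to verify] element above has more than one class in its class attribute, not just geetest_btn_click, yet find_element(By.CLASS_NAME, 'geetest_btn_click') still matches it.

This is in fact expected. By.CLASS_NAME matches any element whose space-separated class list contains the given name, and Selenium 4's Python bindings translate it into the CSS selector .geetest_btn_click. What By.CLASS_NAME cannot take is several class names separated by spaces; to match on more than one class, or on part of a class attribute, use a CSS selector or XPath instead.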

Matching one class among several with a CSS selector:
element = driver.find_element(By.CSS_SELECTOR, ".geetest_btn_click")
Substring matching on the class attribute with XPath:
element = driver.find_element(By.XPATH, "//*[contains(@class, 'geetest_btn_click')]")
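
The two lookups above do not test quite the same thing. The following is a rough pure-Python sketch of the difference, not Selenium's actual implementation; the helper names and the sample class strings are hypothetical:

```python
def matches_class_token(class_attr: str, name: str) -> bool:
    """Roughly what the CSS selector `.name` checks:
    `name` must be one whole whitespace-separated class token."""
    return name in class_attr.split()


def matches_class_substring(class_attr: str, name: str) -> bool:
    """What XPath contains(@class, 'name') checks:
    `name` may occur anywhere inside the attribute string."""
    return name in class_attr


multi = "geetest_btn_click geetest_hover"   # hypothetical multi-class attribute
similar = "geetest_btn_click_disabled"      # hypothetical look-alike class

print(matches_class_token(multi, "geetest_btn_click"))
print(matches_class_token(similar, "geetest_btn_click"))
print(matches_class_substring(similar, "geetest_btn_click"))
```

The substring form is more permissive, so it can also hit look-alike class names; the CSS selector is usually the safer choice when a whole class name is known.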

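Step 2: Recognition

1. Target recognition [easy: ddddocr]:

In the code above, the target image was captured as a single screenshot.
To make recognition and later processing easier, the code below instead screenshots each target character's img element separately and recognizes them one by one.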

import time
import ddddocr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

service = Service(r'E:\pycharm_project\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get('https://www.geetest.com/adaptive-captcha-demo')

# Click the [text click-select captcha] option
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.XPATH,
        '//*[@id="gt-showZh-mobile"]/div/section/div/div[2]/div[1]/div[2]/div[3]/div[4]'
    )
)
tag.click()

# Click the button that starts verification
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.CLASS_NAME,
        'geetest_btn_click'
    )
)
tag.click()

time.sleep(5)

# Recognize the target characters
target_word_list = []
ocr = ddddocr.DdddOcr(show_ad=False)  # create the recognizer once and reuse it
parent = driver.find_element(By.CLASS_NAME, 'geetest_ques_back')
tag_list = parent.find_elements(By.TAG_NAME, 'img')
for tag in tag_list:
    word = ocr.classification(tag.screenshot_as_png)
    target_word_list.append(word)

print(f'Characters to find: {target_word_list}')

time.sleep(30)
driver.close()


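2. Character coordinate recognition [hard: ddddocr or a captcha-solving platform]:

Recognize the characters in the background image and get the coordinates of each one (they must later be clicked in order).
Both methods below return coordinates whose origin is the top-left corner of the image.

(1) With ddddocr:

This works, but the default recognition rate is low. To improve it, you can set up a PyTorch environment and train the model yourself. See: https://github.com/sml2h3/dddd_trainer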

import time
from io import BytesIO

import ddddocr
from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

service = Service(r'E:\pycharm_project\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get('https://www.geetest.com/adaptive-captcha-demo')

# Click the [text click-select captcha] option
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.XPATH,
        '//*[@id="gt-showZh-mobile"]/div/section/div/div[2]/div[1]/div[2]/div[3]/div[4]'
    )
)
tag.click()

# Click the button that starts verification
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.CLASS_NAME,
        'geetest_btn_click'
    )
)
tag.click()

time.sleep(5)

# Recognize the target characters
target_word_list = []
ocr = ddddocr.DdddOcr(show_ad=False)  # create the recognizer once and reuse it
parent = driver.find_element(By.CLASS_NAME, 'geetest_ques_back')
tag_list = parent.find_elements(By.TAG_NAME, 'img')
for tag in tag_list:
    word = ocr.classification(tag.screenshot_as_png)
    target_word_list.append(word)

print(f'Characters to find: {target_word_list}')

# Background image
bg_tag = driver.find_element(
    By.CLASS_NAME,
    'geetest_bg'
)
content = bg_tag.screenshot_as_png

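# Detect the bounding boxes of all characters in the background image
det = ddddocr.DdddOcr(show_ad=False, det=True)
poses = det.detection(content)  # list of boxes (x1, y1, x2, y2): (x1, y1) is a character's top-left corner, (x2, y2) its bottom-right corner

# Crop each detected box and recognize the character inside it
ocr2 = ddddocr.DdddOcr(show_ad=False)  # create the recognizer once, outside the loop
bg_word_dict = {}
img = Image.open(BytesIO(content))
for box in poses:
    x1, y1, x2, y2 = box
    # Crop the character's image using its box
    cropped = img.crop(box)
    img_byte = BytesIO()
    cropped.save(img_byte, 'png')
    # Recognize the character
    word = ocr2.classification(img_byte.getvalue())  # low recognition rate

    # Store the center of the box as the character's coordinates
    bg_word_dict[word] = [int((x1 + x2) / 2), int((y1 + y2) / 2)]

print(bg_word_dict)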

time.sleep(30)
driver.close()

However, the recognition rate is very poor.
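
Because recognition is this unreliable, two different boxes may well be read as the same character, and in a plain dict keyed by character the later box silently overwrites the earlier one. The sketch below is one possible way to keep every candidate instead; box_center and collect_centers are hypothetical helpers, not part of ddddocr:

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), origin at the image's top-left corner


def box_center(box: Box) -> Tuple[int, int]:
    """Return the integer center point of a bounding box."""
    x1, y1, x2, y2 = box
    return (x1 + x2) // 2, (y1 + y2) // 2


def collect_centers(words: List[str], boxes: List[Box]) -> Dict[str, List[Tuple[int, int]]]:
    """Group box centers by recognized character, keeping duplicates."""
    centers: Dict[str, List[Tuple[int, int]]] = {}
    for word, box in zip(words, boxes):
        centers.setdefault(word, []).append(box_center(box))
    return centers


# Two boxes that were (hypothetically) both recognized as the same character
print(collect_centers(['粉', '粉'], [(0, 0, 10, 10), (20, 20, 40, 40)]))
```

With all candidates kept, a later step can decide which one to click rather than depending on whichever box happened to be processed last.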

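(2) With a captcha-solving platform:

Chaojiying (超级鹰) is used as the example here. Official site: https://www.chaojiying.com/
See the official documentation for details.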

import base64
import requests
from hashlib import md5

with open('bg.png', 'rb') as f:  # bg.png is the background screenshot saved in step 1
    file_bytes = f.read()

res = requests.post(
    url='http://upload.chaojiying.net/Upload/Processing.php',
    data = {
        'user': 'username',  # your Chaojiying user name
        'pass2': md5('password'.encode('utf-8')).hexdigest(),  # MD5 of your Chaojiying password
        'codetype': '9501',
        'file_base64': base64.b64encode(file_bytes)
    },
    headers = {
        'Connection': 'Keep-Alive',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
    }
)

res_dict = res.json()
print(res_dict)
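
Judging from the parsing code used later in this article, the pic_str field of the response has the form "character,x,y|character,x,y|...". The following is a minimal sketch of turning that string into a dict with integer coordinates; parse_pic_str is a hypothetical helper, and skipping malformed entries is an assumption about how one might want to handle bad input:

```python
from typing import Dict, Tuple


def parse_pic_str(pic_str: str) -> Dict[str, Tuple[int, int]]:
    """Parse 'char,x,y|char,x,y' into {char: (x, y)} with integer coordinates."""
    result: Dict[str, Tuple[int, int]] = {}
    for item in pic_str.split('|'):
        parts = item.split(',')
        if len(parts) != 3:
            continue  # skip empty or malformed entries
        word, x, y = parts
        try:
            result[word] = (int(x), int(y))
        except ValueError:
            continue  # skip entries whose coordinates are not integers
    return result


print(parse_pic_str('粉,10,14|菜,20,42|香,100,92'))
```

Converting the coordinates to int at parse time means later arithmetic on them needs no further casting.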


Step 3: Combining image capture and recognition

import time
import base64
import requests
from hashlib import md5
import ddddocr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait

service = Service(r'E:\pycharm_project\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get('https://www.geetest.com/adaptive-captcha-demo')

# Click the [text click-select captcha] option
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.XPATH,
        '//*[@id="gt-showZh-mobile"]/div/section/div/div[2]/div[1]/div[2]/div[3]/div[4]'
    )
)
tag.click()

# Click the button that starts verification
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.CLASS_NAME,
        'geetest_btn_click'
    )
)
tag.click()

time.sleep(5)

# Recognize the target characters
target_word_list = []
ocr = ddddocr.DdddOcr(show_ad=False)  # create the recognizer once and reuse it
parent = driver.find_element(By.CLASS_NAME, 'geetest_ques_back')
tag_list = parent.find_elements(By.TAG_NAME, 'img')
for tag in tag_list:
    word = ocr.classification(tag.screenshot_as_png)
    target_word_list.append(word)

print(f'Characters to find: {target_word_list}')

# Background image
bg_tag = driver.find_element(
    By.CLASS_NAME,
    'geetest_bg'
)
content = bg_tag.screenshot_as_png
bg_tag.screenshot('bg.png')

# Recognize all characters in the background image and get their coordinates
res = requests.post(
    url='http://upload.chaojiying.net/Upload/Processing.php',
    data={
        'user': 'username',  # your Chaojiying user name
        'pass2': md5('password'.encode('utf-8')).hexdigest(),  # MD5 of your Chaojiying password
        'codetype': '9501',
        'file_base64': base64.b64encode(content)
    },
    headers={
        'Connection': 'Keep-Alive',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
    }
)
res_dict = res.json()
print(res_dict)

# Coordinates of each character, e.g. {"鸭": (195, 82), ...}
bg_word_dict = {}
for item in res_dict['pic_str'].split('|'):
    word, x, y = item.split(',')
    bg_word_dict[word] = (int(x), int(y))

print(bg_word_dict)

time.sleep(30)
driver.close()

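The platform may return characters that do not actually appear in the image, but that is harmless as long as the characters we need are among the results.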
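
Extra characters are harmless, but a missing target character is not, since every target has to be clicked in order. As a rough sketch, the helper below (plan_clicks is a hypothetical name) lists the coordinates in target order and reports which targets were not found, so the caller can decide whether clicking is worth attempting:

```python
from typing import Dict, List, Tuple


def plan_clicks(
    target_words: List[str],
    bg_word_dict: Dict[str, Tuple[int, int]],
) -> Tuple[List[Tuple[str, Tuple[int, int]]], List[str]]:
    """Return (clicks in target order, target characters that were not found)."""
    ordered: List[Tuple[str, Tuple[int, int]]] = []
    missing: List[str] = []
    for word in target_words:
        if word in bg_word_dict:
            ordered.append((word, bg_word_dict[word]))
        else:
            missing.append(word)
    return ordered, missing


# '鸭' is an extra result and is ignored; '菜' is missing from the results
ordered, missing = plan_clicks(
    ['粉', '菜', '香'],
    {'香': (100, 92), '粉': (10, 14), '鸭': (1, 2)},
)
print(ordered, missing)
```

If missing is not empty, one option is to refresh the captcha and try again instead of submitting an incomplete answer.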

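Step 4: Clicking the character coordinates

Click on the background image at the recognized coordinates:

ActionChains(driver).move_to_element_with_offset(element, xoffset=x, yoffset=y).click().perform()

[Note: element in the code above is the background image element, and the xoffset and yoffset of the click are measured from the center of the image (this holds for recent Selenium 4 releases; older releases measured from the top-left corner). To bring the recognized coordinates, whose origin is the top-left corner, into the same system, subtract half the image width from each x and half the image height from each y.]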
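
The conversion described in the note can be sketched as a small function. to_center_offset is a hypothetical helper; it assumes the screenshot has the same pixel size as the element reported by Selenium, which may not hold on displays with a device pixel ratio other than 1:

```python
from typing import Tuple, Union

Number = Union[int, float, str]


def to_center_offset(x: Number, y: Number, width: float, height: float) -> Tuple[int, int]:
    """Convert top-left-origin image coordinates to center-origin offsets."""
    return int(x) - int(width / 2), int(y) - int(height / 2)


# A point at (195, 82) in a hypothetical 300 x 200 image
print(to_center_offset(195, 82, 300, 200))
```

In the script, width and height would come from bg_tag.size['width'] and bg_tag.size['height'], and the result would be passed as xoffset and yoffset.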

import time
import base64
import requests
from hashlib import md5
import ddddocr
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver import ActionChains

service = Service(r'E:\pycharm_project\chromedriver.exe')
driver = webdriver.Chrome(service=service)

driver.get('https://www.geetest.com/adaptive-captcha-demo')

# Click the [text click-select captcha] option
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.XPATH,
        '//*[@id="gt-showZh-mobile"]/div/section/div/div[2]/div[1]/div[2]/div[3]/div[4]'
    )
)
tag.click()

# Click the button that starts verification
tag = WebDriverWait(driver, 30, .5).until(
    lambda dv: dv.find_element(
        By.CLASS_NAME,
        'geetest_btn_click'
    )
)
tag.click()

time.sleep(5)

# Recognize the target characters
target_word_list = []
ocr = ddddocr.DdddOcr(show_ad=False)  # create the recognizer once and reuse it
parent = driver.find_element(By.CLASS_NAME, 'geetest_ques_back')
tag_list = parent.find_elements(By.TAG_NAME, 'img')
for tag in tag_list:
    word = ocr.classification(tag.screenshot_as_png)
    target_word_list.append(word)

print(f'Characters to find: {target_word_list}')

# Background image
bg_tag = driver.find_element(
    By.CLASS_NAME,
    'geetest_bg'
)
content = bg_tag.screenshot_as_png
bg_tag.screenshot('bg.png')

# Recognize all characters in the background image and get their coordinates
res = requests.post(
    url='http://upload.chaojiying.net/Upload/Processing.php',
    data={
        'user': 'username',  # your Chaojiying user name
        'pass2': md5('password'.encode('utf-8')).hexdigest(),  # MD5 of your Chaojiying password
        'codetype': '9501',
        'file_base64': base64.b64encode(content)
    },
    headers={
        'Connection': 'Keep-Alive',
        'User-Agent': 'Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0)',
    }
)
res_dict = res.json()
print(res_dict)

# Coordinates of each character, e.g. {"鸭": (195, 82), ...}
bg_word_dict = {}
for item in res_dict['pic_str'].split('|'):
    word, x, y = item.split(',')
    bg_word_dict[word] = (int(x), int(y))

print(bg_word_dict)
# Example values:
# target_word_list = ['粉', '菜', '香']
# bg_word_dict = {'粉': (10, 14), '菜': (20, 42), '香': (100, 92)}

# Click each target character in order
for word in target_word_list:
    time.sleep(2)
    group = bg_word_dict.get(word)
    if not group:
        continue
    x, y = group
    x = int(x) - int(bg_tag.size['width'] / 2)
    y = int(y) - int(bg_tag.size['height'] / 2)
    ActionChains(driver).move_to_element_with_offset(bg_tag, xoffset=x, yoffset=y).click().perform()

time.sleep(30)
driver.close()