Notes: Generic Solving of Text-based CAPTCHAs

Author: Yvonne_fan

Article link: https://blog.csdn.net/choubaguaihailan/article/details/71374485

Generic Solving of Text-based CAPTCHAs

Introduction
Algorithm
Optimization
Evaluation

Introduction

CAPTCHA

Many websites use CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) to block automated interaction with their sites. CAPTCHAs are sometimes called reverse Turing tests because they are intended to let a computer determine whether a remote client is a human or a machine.

Negative kerning:

Some captchas use negative space between characters to resist segmentation, ensuring that each character is occluded by its neighbors.
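To make the effect of negative kerning concrete, the sketch below lays out glyphs left to right with a configurable gap and reports which neighbors end up sharing columns. The function names (`layout`, `overlapping_pairs`) and the pixel widths are illustrative assumptions, not taken from the paper.

```python
def layout(widths, kerning):
    """Return (start, end) x-ranges for glyphs placed left to right.

    `kerning` is the gap added between consecutive glyphs; a negative
    value pulls each glyph into the previous one.
    """
    spans = []
    x = 0
    for w in widths:
        spans.append((x, x + w))
        x += w + kerning
    return spans


def overlapping_pairs(spans):
    """Indices i where glyph i and glyph i+1 share at least one column."""
    return [i for i in range(len(spans) - 1) if spans[i + 1][0] < spans[i][1]]


widths = [10, 12, 8, 10]  # hypothetical glyph widths in pixels
print(overlapping_pairs(layout(widths, 2)))   # positive gap
print(overlapping_pairs(layout(widths, -3)))  # negative kerning
```

With a positive gap no pair overlaps, so an ink-free column separates every character. With a negative gap every neighboring pair overlaps, which is exactly what removes the easy cut points.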

The segment-then-recognize approach

[Image: a captcha being processed, from segmentation through to recognition]
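As a rough illustration of the two-stage pipeline, the sketch below splits a tiny binary image at ink-free columns and then looks each piece up in a template table. The image format, the `TEMPLATES` table, and both function names are made up for this example; real solvers use far more elaborate segmentation and a trained classifier.

```python
def segment_by_empty_columns(image):
    """Split a binary image (list of equal-length strings, '#' = ink)
    into (start, end) column ranges separated by ink-free columns."""
    width = len(image[0])
    has_ink = [any(row[x] == "#" for row in image) for x in range(width)]
    segments, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                    # a new glyph begins
        elif not ink and start is not None:
            segments.append((start, x))  # the glyph ended at an empty column
            start = None
    if start is not None:
        segments.append((start, width))
    return segments


def recognize(image, segment, templates):
    """Look up the cropped glyph in a template table; '?' if unknown."""
    start, end = segment
    glyph = tuple(row[start:end] for row in image)
    return templates.get(glyph, "?")


# Toy 3-row "font", purely hypothetical.
TEMPLATES = {
    ("##", "#.", "##"): "C",
    ("#", "#", "#"): "I",
}

image = [
    "##.#",
    "#..#",
    "##.#",
]
segments = segment_by_empty_columns(image)
text = "".join(recognize(image, s, TEMPLATES) for s in segments)
print(segments, text)
```

Running this prints `[(0, 2), (3, 4)] CI`. If the two glyphs touched or overlapped, as with negative kerning, no ink-free column would exist and the whole image would come back as a single unrecognized segment.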

Improvement

The limiting factor of the segment-then-recognize approach has been the attacker's ability to find new flaws to exploit in each scheme.
The work in this paper overcomes this limitation by segmenting and recognizing the captcha simultaneously, which removes the need for manually discovered segmentation heuristics.
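One way to picture simultaneous segmentation and recognition is as a search over candidate cut points in which every slice is scored by a recognizer and the best-scoring sequence of slices wins. The sketch below is a minimal dynamic-programming version of that idea under assumed inputs: `cuts`, `score`, and the toy score table are hypothetical, and summing confidences is just one possible objective, not necessarily what the paper uses.

```python
def best_reading(cuts, score):
    """Choose the path through candidate cut points with the highest total score.

    cuts  : sorted x-positions, including both image edges.
    score : callable (left, right) -> (label, confidence) for that slice,
            or None if the slice cannot be a character.
    Returns (total_confidence, text), or None if no full path exists.
    """
    n = len(cuts)
    # best[i] = best (total, text) for the part of the image left of cuts[i]
    best = [None] * n
    best[0] = (0.0, "")
    for j in range(1, n):
        for i in range(j):
            if best[i] is None:
                continue
            result = score(cuts[i], cuts[j])
            if result is None:
                continue
            label, conf = result
            candidate = (best[i][0] + conf, best[i][1] + label)
            if best[j] is None or candidate[0] > best[j][0]:
                best[j] = candidate
    return best[-1]


# Hypothetical recognizer output for each slice of a 20-pixel-wide image.
TOY_SCORES = {
    (0, 8): ("a", 0.9),
    (8, 14): ("r", 0.2),
    (14, 20): ("i", 0.3),
    (8, 20): ("m", 0.95),
    (0, 14): ("o", 0.4),
}

total, text = best_reading([0, 8, 14, 20], lambda l, r: TOY_SCORES.get((l, r)))
print(text)
```

Here the reading "am" wins over "ari" and "oi" because the wide slice is recognized as a confident "m". Note that a plain sum tends to favor readings with more slices, so a practical scorer would likely need some normalization.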

Dataset

[Image: the datasets used]
The reCAPTCHA captchas were also used to take on part of the work of digitizing old printed books.

Algorithm

Overview of the algorithm's four components

[Image]

How the algorithm works

[Image]

Example of the algorithm successfully applied to a Yahoo captcha

[Image]

Reinforcement learning

[Image]

Occluding lines

[Image]

Optimization

Reducing the number of cuts

[Image: algorithm recognition rate (recognition), average solve time (time), and average number of segments (segments), with and without the cut-point reduction heuristic]
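These notes do not record which heuristic the paper uses, so the sketch below only illustrates why fewer cut points help: if every pair of cuts is a candidate slice, the number of slices grows quadratically with the number of cuts. `thin_cuts` is a hypothetical reduction rule (a minimum gap between cuts), assumed here purely for illustration.

```python
def count_slices(cuts):
    """Number of (left, right) slices if every pair of cuts is a candidate."""
    n = len(cuts)
    return n * (n - 1) // 2


def thin_cuts(cuts, min_gap):
    """Hypothetical reduction: drop cuts that sit closer than `min_gap`
    pixels to the previously kept cut. Both image edges (the first and
    last entries of `cuts`) are always kept."""
    kept = [cuts[0]]
    for x in cuts[1:-1]:
        if x - kept[-1] >= min_gap:
            kept.append(x)
    # An interior cut too close to the right edge is dropped in its favor.
    if cuts[-1] - kept[-1] < min_gap and len(kept) > 1:
        kept.pop()
    kept.append(cuts[-1])
    return kept


cuts = [0, 3, 5, 12, 14, 20, 22, 30]  # made-up cut positions in pixels
reduced = thin_cuts(cuts, min_gap=5)
print(reduced, count_slices(cuts), count_slices(reduced))
```

In this toy case the 8 cuts shrink to 5, and the number of candidate slices drops from 28 to 10, which is the kind of saving that shows up as a lower solve time.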

Sequential recognition

The best approach when making a local decision is to consider a window of two letters at a time.

[Image: the sequential and right-to-left recognition processes]
The closer a character is to the center of the captcha, the less accurate the algorithm is. The paper's authors also observed that the sequential recognition approach is less accurate on the right side of the captcha than on the left side.

[Image: recognition rate per letter for each learning approach on the Baidu 2011 scheme]
This motivates performing the sequential recognition from both directions and then combining the two recognition scores to improve the overall accuracy.
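Combining the two passes can be sketched as merging, for each character position, the label scores from the left-to-right pass and the right-to-left pass. The notes do not say how the paper combines the scores, so the plain average below is an assumption, and `combine_passes` and the score values are hypothetical.

```python
def combine_passes(ltr, rtl):
    """Merge per-position label scores from two passes and pick the best label.

    ltr, rtl : lists of dicts {label: confidence}, both indexed left to right.
    The average used here is an assumed combination rule.
    """
    if len(ltr) != len(rtl):
        raise ValueError("both passes must cover the same positions")
    text = []
    for left_scores, right_scores in zip(ltr, rtl):
        labels = set(left_scores) | set(right_scores)
        merged = {
            lab: (left_scores.get(lab, 0.0) + right_scores.get(lab, 0.0)) / 2
            for lab in labels
        }
        # sorted() makes tie-breaking deterministic (alphabetical).
        text.append(max(sorted(merged), key=merged.get))
    return "".join(text)


# Made-up scores: the left-to-right pass is unsure about the last letter.
ltr = [{"k": 0.9}, {"x": 0.6, "y": 0.4}, {"q": 0.3, "g": 0.5}]
rtl = [{"k": 0.7, "h": 0.2}, {"x": 0.5, "y": 0.5}, {"q": 0.9, "g": 0.2}]
print(combine_passes(ltr, rtl))
```

On its own the left-to-right pass would read the last letter as "g"; the right-to-left pass, which sees that letter first, is confident it is "q", and the combined reading becomes "kxq".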