Notes: Generic Solving of Text-based CAPTCHAs

Generic Solving of Text-based CAPTCHAs

  • Introduction
  • Algorithm
  • Optimization
  • Evaluation

Introduction

Captcha

Many websites use CAPTCHAs (Completely Automated Public Turing tests to tell Computers and Humans Apart) to block automated interaction with their sites. Captchas are sometimes called reverse Turing tests because they are intended to let a computer determine whether a remote client is a human or a machine.

Negative kerning:

The captcha uses negative spacing between characters to resist segmentation, ensuring that each character is occluded by its neighbors.
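To make the effect of negative kerning concrete, here is a minimal sketch (not taken from the paper) that lays out glyph bounding boxes along a line with a given spacing and reports how many pixel columns each adjacent pair shares. The function names `layout_glyphs` and `pairwise_overlap` and the glyph widths are hypothetical, chosen only for illustration.

```python
def layout_glyphs(widths, spacing):
    """Return (start, end) column ranges for glyphs placed left to right.

    spacing is the gap in pixels between consecutive glyphs; a negative
    value pulls each glyph over its left neighbor (negative kerning).
    """
    boxes = []
    x = 0
    for w in widths:
        boxes.append((x, x + w))
        x += w + spacing
    return boxes


def pairwise_overlap(boxes):
    """Number of columns shared by each adjacent pair of glyph boxes."""
    return [max(0, left[1] - right[0]) for left, right in zip(boxes, boxes[1:])]


widths = [20, 18, 22]  # hypothetical glyph widths in pixels
print(pairwise_overlap(layout_glyphs(widths, 0)))   # no shared columns
print(pairwise_overlap(layout_glyphs(widths, -5)))  # neighbors overlap
```

With a spacing of 0 no column belongs to two glyphs, so a vertical cut can separate them. With a negative spacing every pair shares some columns, which is what defeats a simple column-based segmentation.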

The segment-then-recognize approach

Figure: the processing of a captcha from segmentation to recognition.

Improvement

The limitation of the segment-then-recognize approach is that it depends on the attacker's ability to find new flaws.
The work in this paper overcomes this limitation by segmenting and recognizing the captcha simultaneously, which removes the need for manually discovered segmentation heuristics.
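One way to picture "segmenting and recognizing simultaneously" is to let a recognizer score every candidate segment and then pick the sequence of cuts whose segments score best overall. The sketch below uses a plain dynamic program for that. It is a generic stand-in, not the paper's algorithm, and `best_reading`, `score_segment`, `max_span`, and the toy score table are all hypothetical.

```python
def best_reading(num_cuts, score_segment, max_span=3):
    """Pick the labels of the cut sequence with the highest total score.

    num_cuts: number of candidate cut points, indexed 0..num_cuts-1,
        where 0 is the left edge and num_cuts-1 is the right edge.
    score_segment(i, j): hypothetical recognizer returning (label, score)
        for the image slice between cut i and cut j.
    max_span: largest number of cut intervals one character may cover.
    """
    # best[j] = (total score, text) for the best reading ending at cut j
    best = {0: (0.0, "")}
    for j in range(1, num_cuts):
        candidates = []
        for i in range(max(0, j - max_span), j):
            label, score = score_segment(i, j)
            total, text = best[i]
            candidates.append((total + score, text + label))
        best[j] = max(candidates)
    return best[num_cuts - 1]


# Toy recognizer: a lookup table standing in for a trained classifier.
TOY_SCORES = {
    (0, 1): ("l", 0.2),
    (1, 2): ("i", 0.1),
    (0, 2): ("a", 0.9),
    (2, 3): ("b", 0.8),
    (3, 4): ("c", 0.7),
}


def toy_score(i, j):
    return TOY_SCORES.get((i, j), ("?", 0.0))


total, text = best_reading(5, toy_score)
print(text)
```

In this toy table the slice between cuts 0 and 2 reads better as one character than as two, so the segmentation is decided by the recognition scores rather than by a separate hand-written rule.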

Dataset

Figure: samples from the datasets.
The reCAPTCHA dataset also took on part of the workload of digitizing paper books, including old texts.

Algorithm

Overview of the algorithm's four components


How the algorithm works


Example of the algorithm successfully applied to a Yahoo captcha


Reinforcement learning


Occluding lines


Optimization

Reducing the number of cuts

Algorithm recognition rate (recognition), average solve time (time), and average number of segments (segments) with and without the cut point reduction heuristic.
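The number of segments matters because every pair of cut points is a candidate segment, so the work grows roughly quadratically with the number of cuts. The notes do not reproduce the heuristic itself, so the sketch below uses a hypothetical thinning rule (drop cuts that sit too close to the previously kept one) purely to show how fewer cuts translate into fewer segments. `count_segments`, `thin_cuts`, `min_gap`, and the cut positions are assumptions.

```python
def count_segments(cuts):
    """Count candidate segments, one per pair of distinct cut points."""
    n = len(cuts)
    return n * (n - 1) // 2


def thin_cuts(cuts, min_gap):
    """Drop cuts closer than min_gap pixels to the last kept cut.

    cuts must be sorted x-coordinates. This is a hypothetical reduction
    rule for illustration, not the heuristic from the paper.
    """
    kept = []
    for x in cuts:
        if not kept or x - kept[-1] >= min_gap:
            kept.append(x)
    return kept


cuts = [0, 3, 5, 12, 14, 25, 27, 40]  # hypothetical cut positions
reduced = thin_cuts(cuts, min_gap=5)
print(count_segments(cuts), count_segments(reduced))
```

Here eight cuts give 28 candidate segments, while the five cuts that survive thinning give 10, which is the kind of reduction the table measures in recognition rate, solve time, and segment count.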

Sequential recognition

The best approach when making a local decision is to consider a window of two letters at a time.

The sequential and right-left recognition processes.
The closer a character is to the center of the captcha, the less accurate the algorithm is. We also observed that the sequential recognition approach is less accurate on the right side of the captcha than on the left side.

Recognition rate per letter for each learning approach on the Baidu 2011 scheme.
The idea is to perform the sequential recognition from both directions and then combine the two recognition scores to improve the overall accuracy.
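A minimal sketch of combining the two passes is shown below. The notes do not say how the scores are combined, so averaging the per-label scores at each position is an assumption, as are the function name `combine_directions` and the example scores.

```python
def combine_directions(left_to_right, right_to_left):
    """Merge per-position label scores from two recognition passes.

    Each argument is a list with one dict per character position,
    mapping a candidate label to its score. Scores for the same label
    are averaged (an assumed rule); a label missing from one pass
    counts as 0 there.
    """
    if len(left_to_right) != len(right_to_left):
        raise ValueError("both passes must cover the same positions")
    result = []
    for ltr, rtl in zip(left_to_right, right_to_left):
        labels = sorted(set(ltr) | set(rtl))
        merged = {c: (ltr.get(c, 0.0) + rtl.get(c, 0.0)) / 2 for c in labels}
        result.append(max(labels, key=merged.get))
    return "".join(result)


# Hypothetical scores: the left-to-right pass is weak on the last letter.
ltr = [{"a": 0.9, "o": 0.1}, {"b": 0.6, "h": 0.4}, {"c": 0.4, "e": 0.6}]
rtl = [{"a": 0.5, "o": 0.5}, {"b": 0.7, "h": 0.3}, {"c": 0.9, "e": 0.1}]
print(combine_directions(ltr, rtl))
```

In this example the left-to-right pass alone would pick "e" for the last position, but the right-to-left pass is confident about "c" there, so the combined reading recovers it.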

Recognition rate per letter for the various approaches for the Yahoo! scheme

Recognition rates for real-world schemes.

The evaluation used a test set of approximately 1000 captchas for each captcha scheme, none of which were used during training.
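As a small sketch of how a recognition rate over such a held-out test set could be computed: the code below assumes a captcha counts as solved only when the whole predicted string matches the answer, which the notes do not state explicitly. The name `recognition_rate` and the sample strings are hypothetical.

```python
def recognition_rate(predictions, answers):
    """Fraction of captchas whose full predicted text matches the answer.

    Assumes exact, whole-string matching; a single wrong character
    makes the captcha count as unsolved.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


predictions = ["abc", "xyz", "q1w"]  # hypothetical solver output
answers = ["abc", "xya", "q1w"]      # hypothetical ground truth
print(recognition_rate(predictions, answers))
```

Keeping the test captchas out of the training data, as described above, is what makes this number an estimate of performance on unseen captchas rather than a measure of memorization.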

Experiment

Evaluation



Learnability

Recognition rate as a function of the number of examples in each class.

It does not take very many examples to achieve a sufficient accuracy rate.

Human accuracy

The spacing between letters ranged from 0 pixels to -7 pixels. The captchas were 6 characters long.
Human and algorithm accuracy vs. spacing between letters in pixels.

References:
http://s3.amazonaws.com/elie-data/publications/the-end-is-nigh-generic-solving-of-text-based-captchas.pdf
