使用Python破解验证码

最新推荐文章于 2024-04-20 20:34:55 发布

「已注销」

最新推荐文章于 2024-04-20 20:34:55 发布

阅读量1.9k

点赞数

本文链接：https://blog.csdn.net/qianglee/article/details/17482671

版权

本文介绍了如何使用Python进行验证码识别，作者分享了从图像中提取文本、AI和向量空间图像识别、建立训练集到最后实现验证码识别的全过程。通过详细的代码示例，展示了如何识别和破解简单的验证码，并指出尽管存在一定的识别率，但仍然存在一些挑战，如字母O和数字0的混淆等问题。

摘要由CSDN通过智能技术生成

Keywords: python captcha

Most people don’t know this but my honours thesis was about using a computer program to read text out of web images. My theory was that if you could get a high level of successful extraction you could use it as another source of data which could be used to improve search engine results. I was even quite successful in doing it, but never really followed my experiments up. My honours advisor Dr Junbin Gao http://csusap.csu.edu.au/~jbgao/ had suggested the following writing my thesis I should write some form of article on what I had learnt. Well I finally got around to doing it. While what follows is not exactly what I was studying it is something I wish had existed when I started looking around.

Legal Stuff/Disclaimer.
I am posting this article solely for information and educational purposes. I do not condone the use or act of ethical or unethical "hacking" nor circumvention of copy protection etc... The use of this code for any unethical or illegal purposes is not condoned by myself nor any other person(s) mentioned in this article.

This work is licenced under a Creative Commons Licence.

So as I mentioned essentially what I attempted to do was take standard images on the web, and extract the text out them as a way of improving search results. Interestingly I based most of my research/ideas by looking at methods of cracking CAPTCHA's. A CAPTCHA as you may well know is one of those annoying "Type in the letters you see in the image above" things you see on many website signup pages or comment sections.

A CAPTCHA image is designed so that a human can read it without difficulty while a computer is unable to. This in practice has never really worked with pretty much every CAPTCHA that is published on the web getting cracked within a matter of months. Knowing this my theory was that since people can get a computer to read something that it shouldn’t be able to, then normal images such as website logos should be much easier to break using the same methods.

I was actually surprisingly successful in my goal with over 60% successful recognition rates for most of the images I used in my sample set. Rather high considering the variety of different images that are on the web.

What I did find however while doing my research was a lack of sample code or applications which show you how to crack CAPTCHA's. While there are some excellent tutorials and many published papers on it they are very light on algorithms or sample code. In fact I didn't find any beyond some non working PHP scripts and some Perl fragments which strung together a few non related programs and gave some reasonable results when presented with very simple CAPTCHA’s. None of them helped me very much. I found that what I needed was some detailed code with examples I could run and tweak and see how it worked. I think I am just one of those people that can read the theory, and follow along, but without something to prod and poke I never really understand it. Most of the papers and articles said they would not publish code due the potential for missuse. Personally I think it is a waste of time since in reality building a CAPTCHA breaker is quite easy once you know how.

So because of the lack of examples, and the problems I had initially getting started, I thought I would put together this article with full detailed explanations working code showing how to go about breaking a CAPTCHA.

Let’s get started.

Here is a list in order of things I am going to discuss.

Technology used

All of the sample code is written in Python 2.5 using the Python Image Library. It will probably work in Python 2.6 but 2.5 is what I had installed. To get started just install Python then install the Python Image Library.

Python	http://www.python.org/
Python Image Library	http://www.pythonware.com/products/pil/

Install them in the above order and you should be ready to run the examples.

Prefix

I am going to hardcode a lot of the values in this example. I am not trying to create a general CAPTCHA solver, but one specific to the examples given. This is just to keep the examples short and concise.

CAPTCHA’s, What are they Anyway?

A CAPTCHA is basically just an implementation of a one way function. This is a function where it is easy to take input and compute the result, but difficult to take the result and compute the input. What is different about them though is that while they are difficult for a computer to take the result and output the inputs, it should be easy for a human to do it. A CAPTCHA can be thought of in simple terms as a "Are you a human?" test. Essentially they are implemented by showing an image which has some word or letters embedded in it.

They are used for preventing automated spam on many online websites. An example can be found on the Windows Live ID signup page

最低0.47元/天解锁文章

「已注销」

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
使用Python破解验证码

Keywords: python captchaMost people don’t know this but my honours thesis was about using a computer program to read text out of web images. My theory was that if you could get a high level of succe
复制链接

扫一扫