Web Scraping in Practice: Cracking Image CAPTCHAs with Python, Plus Installing PIL


累兰羽


Categories: python, ubuntu. Tags: web scraping, web scraping with Python, image CAPTCHA cracking

Article link: https://blog.csdn.net/qq_34971175/article/details/68959155


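Let's start by installing a few packages:

1. First, install the PIL package for image support:

1.1 Download the source archive. I used Python Imaging Library 1.1.6 Source Kit (all platforms).

Download page: http://www.pythonware.com/products/pil/

1.2 Extract the archive.

1.3 Change into the extracted directory: cd Imaging-1.1.6

1.4 Run the installation: python setup.py install

You can also follow this guide to install PIL: http://blog.csdn.net/u010258605/article/details/43735159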
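
A quick way to confirm that the installation above worked is to try importing the package. The following is only a minimal sketch that uses the standard library; the helper name `is_importable` is made up for illustration and is not part of PIL.

```python
import importlib


def is_importable(module_name):
    """Return True if module_name can be imported, False otherwise."""
    try:
        importlib.import_module(module_name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # "PIL" is the package name used by `from PIL import Image`.
    print("PIL importable: %s" % is_importable("PIL"))
```

If this reports False, the install step did not put PIL on the interpreter's module path, and the scripts later in this post will fail at the import line.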

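2. Install the pytesseract library:

sudo pip install pytesseract

3. Installing these two libraries alone is not enough.

3.1 You also need to install tesseract-ocr.

pytesseract is only a wrapper that launches the tesseract executable as a subprocess, so if the executable is missing you get the error below. It cost me quite a while to track down +_+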

  File "/usr/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1259, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

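3.2 pip install tesseract-ocr (this seems to fail; at least it did not install successfully for me)

3.3 sudo apt-get install tesseract-ocr (this is the one that worked for me)

With everything installed, let's start reading images!

Start with a simple image read: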
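
To tell in advance whether the OSError shown earlier will occur, you can check that a `tesseract` executable is reachable through PATH. This is a rough sketch under the assumption that the apt package installs a binary named `tesseract`; the helper `find_executable` is a hypothetical name written for this example, not a pytesseract function.

```python
import os


def find_executable(name, search_path=None):
    """Return the full path of `name` if it is executable on the search path, else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    location = find_executable("tesseract")
    if location is None:
        print("tesseract not found on PATH; pytesseract would likely fail with OSError")
    else:
        print("tesseract found at %s" % location)
```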

import pytesseract
from PIL import Image
# The path below points to the sample image bundled with the pytesseract installation
image = Image.open(u'/usr/local/lib/python2.7/dist-packages/pytesseract/test.png')
print(image)
text = pytesseract.image_to_string(image)
# Print the text found in the image; recognition accuracy is fairly low
print(text)

Improved version: preprocess the image before recognition:

import pytesseract
from PIL import Image
# Version 2
image_path = '/home/zhan/Desktop/7025.jpeg'
im = Image.open(image_path)
# im.convert('L').show()
imgry = im.convert('L')  # convert the image to grayscale

# Binarize: build a 256-entry lookup table that maps gray levels
# below the threshold to 0 (black) and all others to 1 (white)
threshold = 140
table = []
for i in range(256):
    if i < threshold:
        table.append(0)
    else:
        table.append(1)

out = imgry.point(table, '1')
# out.show()
# Read the string from the binarized image
cap_str = pytesseract.image_to_string(out)
print(cap_str)
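
To see what the lookup table does without needing an image, the same thresholding rule can be applied to a plain list of gray levels. This is just an illustrative sketch; `build_threshold_table` and `binarize` are hypothetical helper names, and the sample values are made up.

```python
def build_threshold_table(threshold):
    """Build a 256-entry lookup table: 0 below the threshold, 1 at or above it."""
    return [0 if level < threshold else 1 for level in range(256)]


def binarize(pixels, threshold):
    """Map each gray level (0-255) to 0 or 1 through the lookup table."""
    table = build_threshold_table(threshold)
    return [table[p] for p in pixels]


if __name__ == "__main__":
    sample = [0, 100, 139, 140, 200, 255]  # made-up gray levels
    print(binarize(sample, 140))  # prints [0, 0, 0, 1, 1, 1]
```

A lower threshold turns more pixels white and a higher one turns more pixels black, so 140 is a value to tune per CAPTCHA style rather than a fixed constant.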

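This approach can read some CAPTCHAs that consist only of digits. Recognition accuracy is still low for CAPTCHAs that combine digits with pictures.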
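
Since digit-only CAPTCHAs are the case that works, one thing worth trying is telling Tesseract to consider only digits. The sketch below is not from the original post. It assumes the `config` parameter of `pytesseract.image_to_string` and the Tesseract variable `tessedit_char_whitelist`. The `--psm 7` option (treat the image as a single text line) is written in the newer flag spelling; older Tesseract releases used `-psm`, and whether the whitelist is honored may depend on the Tesseract version and engine. The function names are hypothetical.

```python
def digits_only_config(psm=7):
    """Build a Tesseract config string restricted to the digits 0-9."""
    return "--psm %d -c tessedit_char_whitelist=0123456789" % psm


def read_digits(image, psm=7):
    """Run OCR on a PIL image, asking Tesseract to consider digits only."""
    # Imported lazily so this file still loads when pytesseract is not installed.
    import pytesseract
    text = pytesseract.image_to_string(image, config=digits_only_config(psm))
    return text.strip()
```

With the binarized image from the script above, the call would be `read_digits(out)`.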