Fetching a captcha image and reassembling it via background-position

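Many captcha images today arrive with their tiles scrambled and have to be reassembled before they can be used. Taking a screenshot with webdriver is a convenient way to handle this, but webdriver uses too much memory, so this post describes a way to reassemble the image directly. Using 51job (前程无忧) as the example, the approach breaks down into four steps: fetch the original captcha image -----> collect the array of CSS offsets ----> create a blank image ----> in order, crop each tile out of the original using its CSS offset and the tile size, and paste it into the blank image.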

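The HTML source of the captcha is shown below:

As you can see, the captcha is assembled from many small tile images.

First, fetch the original captcha image. The URL outlined in red in the screenshot above points to the original captcha image.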

import requests
from PIL import Image


def get_captcha(url):
    session = requests.Session()
    session.headers = {
        'Host': 'ehire.51job.com',
        'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:61.0) Gecko/20100101 Firefox/61.0',
        'Accept': '*/*',
        'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://ehire.51job.com/',
        'Cookie': 'guid=15257617072656350082; search=jobarea%7E%60180200%7C%21ord_field%7E%600%7C%21recentSearch0%7E%601%A1%FB%A1%FA180200%2C00%A1%FB%A1%FA000000%A1%FB%A1%FA0000%A1%FB%A1%FA26%A1%FB%A1%FA9%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA%A1%FB%A1%FA2%A1%FB%A1%FA%A1%FB%A1%FA-1%A1%FB%A1%FA1525763865%A1%FB%A1%FA0%A1%FB%A1%FA%A1%FB%A1%FA%7C%21recentSearch1%7E%601%A1%FB%A1%FA180200%2C00%A1%FB%A1%FA000000%A1%FB%A1%FA0000%A1%FB%A1%FA26%A1%FB%A1%FA9%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA99%A1%FB%A1%FA%B9%A4%B3%CC%CF%EE%C4%BF%A1%FB%A1%FA2%A1%FB%A1%FA%A1%FB%A1%FA-1%A1%FB%A1%FA1525761723%A1%FB%A1%FA0%A1%FB%A1%FA%A1%FB%A1%FA%7C%21; nsearch=jobarea%3D%26%7C%26ord_field%3D%26%7C%26recentSearch0%3D%26%7C%26recentSearch1%3D%26%7C%26recentSearch2%3D%26%7C%26recentSearch3%3D%26%7C%26recentSearch4%3D%26%7C%26collapse_expansion%3D; slife=lowbrowser%3Dnot%26%7C%26; ps=us%3DWmdbOQN%252FCzxdO1szCnFWZAEyATUELFs4VmgBLwoxVWBbYVAzBmULPVw8DWhQPQQ1ATVUbVFlBSwBZAVgCm5RNFob%26%7C%26; EhireGuid=c892eff94eec45e1a6a32afce3c57ed0; LangType=Lang=&Flag=1; adv=adsnew%3D1%26%7C%26adsresume%3D1%26%7C%26adsfrom%3Dhttps%253A%252F%252Fwww.baidu.com%252Fbaidu%253Ftn%253Dmonline_3_dg%2526ie%253Dutf-8%2526wd%253D%2525E5%252589%25258D%2525E7%2525A8%25258B%2525E6%252597%2525A0%2525E5%2525BF%2525A7%26%7C%26adsnum%3D2004282; partner=www_baidu_com; 51job=cenglish%3D0%26%7C%26; ASP.NET_SessionId=kgah40twoauxke1poschvlpl',
        'Connection': 'keep-alive'
    }
    # .content holds the raw bytes of the response body (the scrambled image)
    content = session.get(url).content
    with open('captcha.jpg', 'wb') as f:
        f.write(content)
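
The function above writes whatever the server returns to captcha.jpg. As a hedged addition, the sketch below checks that the downloaded bytes begin with a known image signature before they are saved. It assumes that a rejected request (for example, one sent with an expired cookie) would return something other than image data, which the original post does not confirm. The helper name looks_like_image is hypothetical.

```python
# Magic-number prefixes of common image formats.
IMAGE_SIGNATURES = (
    b'\xff\xd8\xff',        # JPEG
    b'\x89PNG\r\n\x1a\n',   # PNG
    b'GIF87a',              # GIF (1987 revision)
    b'GIF89a',              # GIF (1989 revision)
)


def looks_like_image(data):
    """Return True if `data` starts with a JPEG, PNG, or GIF signature."""
    # bytes.startswith accepts a tuple of candidate prefixes.
    return data.startswith(IMAGE_SIGNATURES)
```

Inside get_captcha, the downloaded bytes could be passed to looks_like_image before the file is written, raising an error when the check fails.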

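The original image as fetched looks like this:

Step two is to collect the array of CSS offsets. The position of each tile is set by a CSS rule, which can be found through the tile's class:

Read the background-position values and convert them into an array as shown below. The entries must be kept in order, because the index of each entry determines where its tile is pasted: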

offset_list = [['66', '40'], ['286', '40'], ['66', '98'], ['44', '40'], ['154', '40'], ['22', '40'], ['88', '98'],
               ['198', '40'], ['198', '98'], ['264', '98'], ['308', '40'], ['176', '40'], ['0', '98'], ['132', '98'],
               ['132', '40'], ['176', '98'], ['88', '40'], ['154', '98'], ['220', '40'], ['264', '40'], ['110', '40'],
               ['242', '98'], ['286', '98'], ['0', '40'], ['242', '40'], ['44', '98'], ['220', '98'], ['22', '98'],
               ['308', '98'], ['110', '98']]
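
The offsets above were copied out by hand. A minimal sketch of automating that step follows. It assumes the declarations look like `background-position: -66px -40px` and appear in the same order as the tiles. The original post does not show the actual CSS, so the input format, the sample class names, and the helper name parse_background_positions are all assumptions. The sign is dropped because sprite offsets in CSS are usually negative, while the crop code works with positive pixel coordinates.

```python
import re

# Matches e.g. "background-position: -66px -40px" or "background-position: 0 -98px".
_POSITION_RE = re.compile(
    r'background-position\s*:\s*(-?\d+)(?:px)?\s+(-?\d+)(?:px)?'
)


def parse_background_positions(css_text):
    """Return [[x, y], ...] as strings of absolute pixel values, in order of appearance."""
    return [
        [str(abs(int(x))), str(abs(int(y)))]
        for x, y in _POSITION_RE.findall(css_text)
    ]


# Hypothetical input; the class names are made up for illustration.
sample_css = """
.tile_a { background-position: -66px -40px; }
.tile_b { background-position: -286px -40px; }
.tile_c { background-position: 0px -98px; }
"""
print(parse_background_positions(sample_css))
# [['66', '40'], ['286', '40'], ['0', '98']]
```

The result has the same shape as offset_list, so it could be used in its place once the real CSS format has been confirmed.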

Reassembling the image:

# Return the paste position of tile `index` in the blank image
def convert_index_to_offset(index):
    if index < 15:                # the full captcha consists of 30 tiles in 2 rows of 15
        return (index * 22, 0)
    else:
        i = index - 15
        return (i * 22, 58)       # each tile is 22 px wide and 58 px high

# Return the crop box of a tile in the original image
def convert_css_to_offset(off):
    # (left, upper)o ----- o
    #              |       |
    #              o ----- o(right, lower)
    return (int(off[0]), int(off[1]), int(off[0]) + 22, int(off[1]) + 58)

# Reassemble the image
def recombine_captcha():
    captcha = Image.new('RGB', (22 * 15, 58 * 2))  # create the blank image
    img = Image.open('captcha.jpg')   # open the original (scrambled) image
    for i, off in enumerate(offset_list):
        box = convert_css_to_offset(off)   # crop box derived from the CSS background-position
        region = img.crop(box)             # crop the tile
        offset = convert_index_to_offset(i)  # paste position of this tile in the blank image
        captcha.paste(region, offset)        # paste the tile at that position
    captcha.save('regoin.jpg')
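
The helpers above hard-code the layout of 2 rows and 15 columns. As an illustration only, the sketch below expresses the same paste-position arithmetic with divmod and adds a check that a list of offsets covers every tile exactly once, which catches a missing or duplicated entry before any cropping happens. The names paste_position and check_offsets and their parameters are hypothetical. The check also assumes that the tiles in the source image form a regular grid whose top-left tile sits at the smallest x and y found in the list.

```python
def paste_position(index, cols=15, tile_w=22, tile_h=58):
    """Top-left corner of tile `index` in the reassembled image (row-major order)."""
    row, col = divmod(index, cols)
    return (col * tile_w, row * tile_h)


def check_offsets(offsets, cols=15, rows=2, tile_w=22, tile_h=58):
    """Raise ValueError unless `offsets` names each grid tile exactly once."""
    points = [(int(x), int(y)) for x, y in offsets]
    if len(points) != cols * rows:
        raise ValueError('expected %d offsets, got %d' % (cols * rows, len(points)))
    # Assumption: the grid origin is the smallest x and y present in the list.
    x0 = min(x for x, _ in points)
    y0 = min(y for _, y in points)
    expected = {
        (x0 + c * tile_w, y0 + r * tile_h)
        for r in range(rows)
        for c in range(cols)
    }
    if set(points) != expected:
        raise ValueError('offsets do not form a complete %dx%d grid' % (cols, rows))
```

With the default arguments, paste_position returns the same coordinates as convert_index_to_offset for indices 0 to 29. Calling check_offsets(offset_list) before recombine_captcha() would raise if an entry had been dropped or repeated while copying the values.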

The reassembled image looks like this:
