Python Crawler Slider CAPTCHA: Python Open Course - Solving Slider CAPTCHAs in a Crawler

Latest recommended article published 2024-06-24 16:10:32

weixin_39585886




Preface

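Running into CAPTCHAs is routine when writing crawlers. Geetest, a professional CAPTCHA service provider, supplies user-behavior verification to many websites.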

The most common form is the slider CAPTCHA: the user has to drag a slider piece by hand into the matching gap in an image.

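How should a crawler developer deal with this kind of CAPTCHA? This article presents one approach: using Selenium to simulate a user dragging the slider to unlock it.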

We use the login page of Deyi (得意网) as the running example.

Analyzing the page

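On Deyi, the Geetest challenge appears in a login pop-up: after you enter a username and password and click Log in, the Geetest widget pops up.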

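Once the widget appears, inspecting its image elements shows that three images are downloaded: two .webp files, one complete background and one background with a gap, plus a .png file that is the slider piece. Both the complete image and the gapped image are scrambled, however, so the next step is to restore them to their normal appearance.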

Restoring and reassembling the images

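Inspecting the style attributes of the image elements shows that the displayed picture is a composite, so simply saving the image at each URL is not enough. The picture is assembled from many small slices, each of which shows part of a single sprite image via background-position.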
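To make the background-position mechanics concrete: a fixed-size slice element whose style sets background-position: X px Y px shows the region of the sprite whose top-left corner is at (-X, -Y). The following minimal sketch computes that region as a crop box; slice_crop_box is a hypothetical helper, and the 10 × 58 slice size is the one assumed by the reassembly code later in this article.

```python
def slice_crop_box(bg_x, bg_y, width=10, height=58):
    """Return the (left, upper, right, lower) sprite region shown by one slice.

    background-position (bg_x, bg_y) shifts the sprite by that many pixels,
    so the visible window starts at (-bg_x, -bg_y) inside the sprite.
    """
    left = -bg_x
    upper = -bg_y
    return (left, upper, left + width, upper + height)


# A slice styled "background-position: -157px -58px;" shows this sprite region:
print(slice_crop_box(-157, -58))  # (157, 58, 167, 116)
```

Because the offsets on the page are zero or negative, -bg_x equals abs(bg_x), which is why the reassembly code can simply crop at abs(location['x']).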

Fetching the raw images and their position information:

```python
import re

import requests
from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# Matches e.g.: url("https://..."); background-position: -157px -58px;
STYLE_PATTERN = re.compile(r'url\("(.*)"\); background-position: (.*)px (.*)px;')


def create_driver():
    options = Options()
    options.binary_location = '/usr/bin/google-chrome'
    options.add_argument('--headless')
    options.add_argument('--disable-gpu')
    options.add_argument('--no-sandbox')
    # Selenium 4 API: driver path goes through Service, options through options=
    driver = webdriver.Chrome(service=Service('./chromedriver'), options=options)
    driver.implicitly_wait(5)
    return driver


def download_slices(driver, class_name, basename):
    """Collect each slice's background-position and save the shared sprite as JPEG."""
    loc_list = []
    img_url = None
    for element in driver.find_elements(By.CLASS_NAME, class_name):
        style = element.get_attribute('style')
        img_url, x, y = STYLE_PATTERN.findall(style)[0]
        loc_list.append({'x': int(x), 'y': int(y)})
    # All slices of one image point at the same sprite, so one download is enough
    r = requests.get(img_url)
    with open(basename + '.webp', 'wb') as f:
        f.write(r.content)
    # Convert the WebP sprite to RGB JPEG for the pixel comparison later
    Image.open(basename + '.webp').convert('RGB').save(basename + '.jpg')
    return basename + '.jpg', loc_list


def get_image_and_location(driver):
    driver.get('http://www.deyi.com/account.php?mod=login')
    # The slices only exist once the Geetest pop-up is open, i.e. after entering
    # the username/password and clicking login; that step is not shown here.
    cut_file, cut_loc_list = download_slices(driver, 'gt_cut_bg_slice', 'cut')
    full_file, full_loc_list = download_slices(driver, 'gt_cut_fullbg_slice', 'full')
    # Do not quit the driver: the slider is dragged later in this same session
    return (cut_file, cut_loc_list, full_file, full_loc_list)
```
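The regular expression in get_image_and_location depends on the exact format Chrome uses when serializing the style attribute, so it is worth checking it offline before starting a browser. The style string below is a made-up example in the format the pattern expects; the real URL and offsets will differ.

```python
import re

# Same pattern as in get_image_and_location, written as a raw string
STYLE_PATTERN = re.compile(r'url\("(.*)"\); background-position: (.*)px (.*)px;')

# Hypothetical style value of one gt_cut_bg_slice element
style = ('background-image: url("https://example.com/sprite.webp"); '
         'background-position: -157px -58px;')

img_url, x, y = STYLE_PATTERN.findall(style)[0]
location = {'x': int(x), 'y': int(y)}
print(img_url)   # https://example.com/sprite.webp
print(location)  # {'x': -157, 'y': -58}
```

If the browser serializes the style differently (for example with another declaration between the two), findall returns an empty list and the [0] index raises IndexError, so this is the first place to look when the download step fails.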

The code that reassembles an image is as follows:

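```python
from PIL import Image


def get_merge_image(filename, location_list):
    """Reassemble a scrambled Geetest sprite into a 260x116 image.

    Each slice is 10 px wide and 58 px tall; its background-position y value
    (-58 or 0) tells which half of the sprite it is taken from.
    """
    im = Image.open(filename)
    new_im = Image.new('RGB', (260, 116))
    im_list_upper = []
    im_list_down = []
    for location in location_list:
        x = abs(location['x'])
        if location['y'] == -58:
            # Shows sprite rows 58..116 and belongs in the top row of the result
            im_list_upper.append(im.crop((x, 58, x + 10, 116)))
        if location['y'] == 0:
            # Shows sprite rows 0..58 and belongs in the bottom row of the result
            im_list_down.append(im.crop((x, 0, x + 10, 58)))
    # Paste slices left to right in the order they appear in the DOM
    x_offset = 0
    for piece in im_list_upper:
        new_im.paste(piece, (x_offset, 0))
        x_offset += piece.size[0]
    x_offset = 0
    for piece in im_list_down:
        new_im.paste(piece, (x_offset, 58))
        x_offset += piece.size[0]
    # Overwrite the scrambled file with the restored image
    new_im.save(filename)
    return new_im
```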

After this processing, the two images full.jpg and cut.jpg are restored to their normal appearance.
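The reassembly rule in get_merge_image can be tested without downloading anything. The sketch below is a scaled-down, pure-Python model for illustration only: the sprite is a list of pixel rows, slices are 2 px wide and 1 px tall instead of 10 × 58, and merge_toy is a hypothetical name. It applies the same rule: slices whose y equals minus the half height go to the top row, slices with y = 0 go to the bottom row, each in DOM order.

```python
def merge_toy(sprite, location_list, slice_w=2, half_h=1):
    """Reassemble a scrambled sprite (list of pixel rows) from background-position offsets."""
    upper, lower = [], []
    for loc in location_list:
        x = abs(loc['x'])
        if loc['y'] == -half_h:
            # Slice shows the sprite's lower half and belongs in the top row
            upper.append([row[x:x + slice_w] for row in sprite[half_h:2 * half_h]])
        elif loc['y'] == 0:
            # Slice shows the sprite's upper half and belongs in the bottom row
            lower.append([row[x:x + slice_w] for row in sprite[0:half_h]])
    # Concatenate slices left to right, then stack the top half over the bottom half
    top = [sum((piece[r] for piece in upper), []) for r in range(half_h)]
    bottom = [sum((piece[r] for piece in lower), []) for r in range(half_h)]
    return top + bottom


# Intended picture: "ABCD" over "EFGH"; the sprite stores it scrambled
sprite = [list('GHEF'), list('CDAB')]
locations = [{'x': -2, 'y': -1}, {'x': 0, 'y': -1}, {'x': -2, 'y': 0}, {'x': 0, 'y': 0}]
print([''.join(row) for row in merge_toy(sprite, locations)])  # ['ABCD', 'EFGH']
```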

Locating the gap

With the two images in hand, we compare them to find the position of the gap, so that the slider can be moved to the right spot.

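Put simply, the idea is to set a threshold, compare the two images pixel by pixel, and return the first pixel whose colors differ by more than that threshold.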

Since the slider piece is 60 pixels wide and starts at the left edge, the scan can start at x = 60.

The code is as follows:

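```python
def get_distance(image1, image2):
    """Return the x coordinate of the first column where the two images differ."""
    left = 60       # skip the area covered by the slider piece at its start position
    threshold = 70  # per-channel difference that counts as "different"
    # Load pixel access objects once instead of on every iteration
    pixels1 = image1.load()
    pixels2 = image2.load()
    for i in range(left, image1.size[0]):
        for j in range(image1.size[1]):
            rgb1 = pixels1[i, j]
            rgb2 = pixels2[i, j]
            res1 = abs(rgb1[0] - rgb2[0])
            res2 = abs(rgb1[1] - rgb2[1])
            res3 = abs(rgb1[2] - rgb2[2])
            if not (res1 < threshold and res2 < threshold and res3 < threshold):
                return i
    return left
```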
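The scanning logic of get_distance does not depend on Pillow; it only needs pixel lookup by (x, y). The sketch below runs the same column-by-column comparison on tiny synthetic images built from nested lists, which makes the threshold easy to unit-test. get_distance_grid is a hypothetical name, and the grid size, colors, and left = 3 are made up for the example.

```python
def get_distance_grid(image1, image2, left=60, threshold=70):
    """Same algorithm as get_distance, on images given as image[x][y] -> (r, g, b)."""
    width, height = len(image1), len(image1[0])
    for i in range(left, width):          # scan columns left to right
        for j in range(height):           # then rows top to bottom
            rgb1, rgb2 = image1[i][j], image2[i][j]
            # "Different" means at least one channel differs by >= threshold
            if any(abs(a - b) >= threshold for a, b in zip(rgb1, rgb2)):
                return i                  # first column containing a differing pixel
    return left                           # no difference found


# Two 8x4 gray images; the "gapped" one has a darker block starting at x = 5
gray, dark = (200, 200, 200), (60, 60, 60)
full = [[gray] * 4 for _ in range(8)]
cut = [[dark if x >= 5 and 1 <= y <= 2 else gray for y in range(4)] for x in range(8)]
print(get_distance_grid(full, cut, left=3))  # 5
```

Note that a pixel pair differing by less than the threshold in every channel (for example 200 vs. 150) is treated as identical, which is what lets JPEG noise pass while the shaded gap is detected.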

Key point: the core slider-movement algorithm

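Once the gap position is known, all that remains is to move the slider. Geetest, however, also validates the drag trajectory: if the trajectory is recognized as program-controlled, verification fails even when the final position is correct, and the widget shows the message “怪兽吃了拼图再来一次” (“The monster ate the puzzle piece, try again”).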

After some trial and error, here is a simple algorithm that proved reliable:

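```python
import random
import time

from selenium.webdriver import ActionChains
from selenium.webdriver.common.by import By

# `driver` is the still-open session; `dis` is the gap x position from get_distance()
slider = driver.find_element(By.CLASS_NAME, 'gt_slider_knob')
ActionChains(driver).click_and_hold(slider).perform()

left = dis - 6  # 6 px correction as used by the author; tune it for your page
while left > 0:
    # Move in random 20-30 px steps, never overshooting the remaining distance
    x = min(random.randint(20, 30), left)
    print(x)
    # Drift up by 1 px per step so the path is not a perfectly straight line
    ActionChains(driver).move_by_offset(xoffset=x, yoffset=-1).perform()
    left -= x
    # Pause for a random 0.15-1.15 s after each step
    time.sleep(random.random() + 0.15)

ActionChains(driver).release().perform()
```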
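Separating trajectory planning from the ActionChains calls makes the algorithm easier to inspect and test. The sketch below reproduces the loop above as a pure function that returns the list of horizontal offsets and pauses; plan_slide and its rng parameter are hypothetical additions for illustration, while the 6 px correction and the step and pause ranges are copied from the code above.

```python
import random


def plan_slide(dis, rng=None, correction=6):
    """Return (x_offset, pause_seconds) steps whose offsets add up to dis - correction."""
    rng = rng or random.Random()
    steps = []
    left = dis - correction
    while left > 0:
        x = min(rng.randint(20, 30), left)   # 20-30 px per step, never overshooting
        pause = rng.random() + 0.15          # 0.15-1.15 s pause after the step
        steps.append((x, pause))
        left -= x
    return steps


steps = plan_slide(120, rng=random.Random(0))
print(sum(x for x, _ in steps))  # 114
```

Replaying a plan is then a loop of move_by_offset(xoffset=x, yoffset=-1) and time.sleep(pause) between click_and_hold and release; passing a seeded random.Random makes a failing run reproducible.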

With that, the job is done: automatic login or registration now works.

Summary

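Handling Geetest's slider CAPTCHA still requires first understanding the specific page; each site has to be analyzed on its own terms. The hardest part is the drag trajectory: get it wrong and all the other work is wasted.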