1. Working with cookies
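Capture the cookies before and after logging in and compare them to find the ones that differ. These are what distinguish a logged-in session from an anonymous one, so extract the important ones. Then use the methods WebDriver provides to delete and add cookies, and refresh the page or get the post-login URL, so that the cookies bypass the login step.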
WebDriver provides these methods for working with cookies:
- driver.get_cookies() returns all cookies
- driver.get_cookie(name) returns the cookie whose key is name
- driver.add_cookie(cookie_dict) adds a cookie to the browser
- driver.delete_cookie(name) deletes the cookie whose key is name
- driver.delete_all_cookies() deletes all cookies
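The steps above can be sketched as two small helpers. This is a minimal sketch, not a definitive implementation: the names diff_cookies and apply_cookies are hypothetical, the cookie lists are assumed to have the shape returned by driver.get_cookies() (dicts with at least "name" and "value"), and url stands for whatever page of the target site you open first.

```python
def diff_cookies(before, after):
    """Return the cookies in `after` that are new or whose value changed.

    Both arguments are lists of dicts shaped like driver.get_cookies() output.
    (Hypothetical helper, not part of Selenium.)
    """
    old_values = {c["name"]: c.get("value") for c in before}
    return [
        c for c in after
        if c["name"] not in old_values or old_values[c["name"]] != c.get("value")
    ]


def apply_cookies(driver, cookies, url):
    """Replace the browser's cookies with `cookies`, then reload the page.

    (Hypothetical helper; `driver` is any object exposing the WebDriver
    cookie methods listed above.)
    """
    driver.get(url)               # the browser must be on the cookie's domain first
    driver.delete_all_cookies()   # drop the anonymous session
    for cookie in cookies:
        # Only name and value are passed here; other fields are optional.
        driver.add_cookie({"name": cookie["name"], "value": cookie["value"]})
    driver.refresh()              # reload so the site sees the new cookies
```

With a real session, before would come from driver.get_cookies() on the login page, after from the same call once logged in, and the result of diff_cookies(before, after) could be saved and passed to apply_cookies in a later run.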
2. Captcha recognition
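Get the corner coordinates of the image captcha element, take a screenshot of the browser window, crop the captcha out of the screenshot using those coordinates, recognize the text with the Tesseract-OCR tool, and use a regular expression to strip special characters.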
Tesseract download link: https://pan.baidu.com/s/1ToZMynsl1Ev8XH9mGOcTsw (extraction code: 5z5m)
Tesseract installation guide: https://jingyan.baidu.com/article/219f4bf788addfde442d38fe.html
#!/usr/bin/python3
# coding=utf-8
import re

import pytesseract
from PIL import Image
from selenium import webdriver
from selenium.webdriver.common.by import By


# Get the bounding box of an element in screenshot pixels.
# multiple is the ratio of screenshot pixels to page pixels
# (for example 1.25 when the display is scaled to 125%).
def get_location(xpath, multiple=1.25):
    element = driver.find_element(By.XPATH, xpath)
    location_element = element.location  # top-left corner of the element
    size_element = element.size  # width and height of the element
    left = int(location_element['x'])
    top = int(location_element['y'])
    # Scale to pixel coordinates: (left, top, right, bottom)
    location = (int(left * multiple), int(top * multiple),
                int((left + size_element['width']) * multiple),
                int((top + size_element['height']) * multiple))
    return location


# Take a screenshot and recognize the captcha
def discern_verification_code(location):
    screenshot_name = 'screenshot_windows.png'
    code_name = 'code.png'
    driver.save_screenshot(screenshot_name)  # screenshot of the browser window
    img = Image.open(screenshot_name)
    img = img.crop(location)  # cut the captcha out of the screenshot
    img.save(code_name)
    img = Image.open(code_name)
    codes = pytesseract.image_to_string(img)  # recognize the captcha text
    # Use a regular expression to drop everything except letters and digits
    code = re.sub(r'[^a-zA-Z0-9]', '', codes)
    return code


driver = webdriver.Chrome()
driver.implicitly_wait(5)
driver.get('')  # URL of the page that shows the captcha
driver.maximize_window()
code_str = discern_verification_code(get_location(''))  # XPath of the captcha image
print(code_str)
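The coordinate arithmetic in get_location and the character filtering in discern_verification_code can be checked without a browser or Tesseract. The following is a minimal sketch under stated assumptions: crop_box and clean_code are hypothetical names, the location and size dicts mimic what a WebElement reports, and the default scale of 1.25 assumes a display scaled to 125%.

```python
import re


def crop_box(location, size, scale=1.25):
    """Convert an element's page position into a screenshot crop box.

    location: dict with "x" and "y" (top-left corner, page pixels)
    size:     dict with "width" and "height" (page pixels)
    scale:    screenshot pixels per page pixel (assumed 1.25 here)
    Returns (left, top, right, bottom) as integers.
    """
    left, top = location["x"], location["y"]
    right, bottom = left + size["width"], top + size["height"]
    return tuple(int(v * scale) for v in (left, top, right, bottom))


def clean_code(text):
    """Keep only ASCII letters and digits from the OCR output."""
    return re.sub(r"[^a-zA-Z0-9]", "", text)


print(crop_box({"x": 100, "y": 40}, {"width": 80, "height": 32}))  # (125, 50, 225, 90)
print(clean_code(" a7 B-9\n"))  # a7B9
```

If the cropped image misses the captcha, the scale factor is the first thing to check: with scale=1.0 the box matches the page coordinates exactly, which is what an unscaled display would need.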