Preface
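I had long wanted to learn automated testing but never got started. So I am now studying it in scattered spare time and recording my notes (2021-04-17).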
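[Selenium Project in Practice] covers the following:
- Setting up the project environment: installing the JDK, MySQL, Tomcat and the system under test
- Requirements analysis and test case design
- Project architecture design
- Basic testing of the project
- Solving the captcha problem
- Completing the project's test cases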
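Solving the captcha problem (with the pytesseract and PIL modules)
First, a quick look at the two modules:
- pytesseract is a Python wrapper around the Tesseract OCR engine. It recognizes the text in an image and returns it as a string.
- PIL is a library dedicated to image processing. Today it is maintained as Pillow.
Together, pytesseract and PIL can handle captchas that are not too complex. The steps are as follows: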
Install pytesseract on Mac: sudo pip3 install pytesseract
Install PIL (Pillow) on Mac: sudo pip3 install pil or sudo pip3 install Pillow (only the latter works, see step 2)
1. Install the pytesseract module
- cd /usr/local/bin
- sudo pip3 install pytesseract
If pip suggests an upgrade, you can optionally upgrade it: sudo pip3 install --upgrade pip
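2. Install the PIL (Pillow) module
- cd /usr/local/bin
- sudo pip3 install pil
This was troublesome and kept failing. It turns out that PIL has been superseded by Pillow, a maintained fork. Pillow is used the same way and has essentially the same API. You still import it as PIL (from PIL import Image). So I installed Pillow directly: sudo pip3 install Pillow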
ffdeMacBook-Pro bin % sudo pip3 install Pillow
The directory '/Users/zhengxiaofang/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
The directory '/Users/zhengxiaofang/Library/Caches/pip' or its parent directory is not owned by the current user and caching wheels has been disabled. check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
Requirement already satisfied: Pillow in /Library/Python/3.7/site-packages (8.2.0)
You are using pip version 19.0.3, however version 21.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
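pip printed several warnings here, so it is worth confirming that both modules can be imported by the same interpreter that will run the tests. Running python3 -m pip install ... instead of pip3 is one way to keep the installer and the interpreter in sync. Below is a minimal sketch that uses only the standard library. check_modules is a hypothetical helper name, not part of either package. Note that Pillow is imported under the name PIL.

```python
import importlib.util


def check_modules(names):
    """Return {import_name: True/False} telling whether each module can be imported."""
    # find_spec() only locates a top-level module; it does not actually import it.
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Pillow is imported as "PIL", not "Pillow".
    for module, ok in check_modules(["PIL", "pytesseract", "selenium"]).items():
        print(f"{module}: {'OK' if ok else 'NOT INSTALLED'}")
```

If PIL shows NOT INSTALLED even though pip reported "Requirement already satisfied", then pip and the test interpreter are probably different Python installations.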
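3. Install Tesseract-OCR (very important: pytesseract only calls the tesseract command-line program, so the OCR engine itself must be installed separately)
(1) There are four ways to install it:
- brew install --with-training-tools tesseract  (installs tesseract together with the training tools)
- brew install --all-languages tesseract  (installs tesseract together with all languages)
- brew install --all-languages --with-training-tools tesseract  (installs tesseract with all the add-ons)
- brew install tesseract  (installs tesseract without the training tools; this is the option I chose)
Note: the option flags in the first three commands come from older guides. Current Homebrew core formulae no longer accept install options, so plain brew install tesseract is the form that works today. Extra language data can be installed with brew install tesseract-lang.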
(2) After installing tesseract, verify the installation with: tesseract -v
ffdeMacBook-Pro bin % tesseract -v
tesseract 4.1.1
leptonica-1.80.0
libgif 5.2.1 : libjpeg 9d : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.1.0 : libopenjp2 2.3.1
Found AVX512BW
Found AVX512F
Found AVX2
Found AVX
Found FMA
Found SSE
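The same check can be scripted. This is useful because pytesseract fails at call time, not at import time, when the engine is missing. The sketch below assumes the first line of the output has the form "tesseract 4.1.1", as in the log above. It also tolerates an optional leading "v" in case a build prints one. Both function names are hypothetical.

```python
import re
import shutil
import subprocess


def parse_tesseract_version(output):
    """Extract (major, minor, patch) from `tesseract -v` output, or None if absent."""
    # Matches a line starting with "tesseract 4.1.1" (an optional "v" before the number is allowed).
    match = re.search(r"^tesseract v?(\d+)\.(\d+)\.(\d+)", output, re.MULTILINE)
    return tuple(int(part) for part in match.groups()) if match else None


def installed_tesseract_version():
    """Return the installed Tesseract version tuple, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run([exe, "-v"], capture_output=True, text=True)
    # Check both streams in case a build prints the banner to stderr instead of stdout.
    return parse_tesseract_version(result.stdout + result.stderr)
```

pytesseract looks up the tesseract program on PATH. If it is installed somewhere else, you can set pytesseract.pytesseract.tesseract_cmd to the program's full path.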
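(3) Download language data
Download whichever language data you need. For example, chi_sim.traineddata is Simplified Chinese.
Download from: https://github.com/tesseract-ocr/tessdata
After downloading chi_sim.traineddata, put it in /usr/local/Cellar/tesseract/4.1.1/share/tessdata. The 4.1.1 part of the path must match your installed tesseract version.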
ffdembp ~ % cd Downloads
ffdembp Downloads % cp -r chi_sim.traineddata /usr/local/Cellar/tesseract/4.1.1/share/tessdata
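The copy can also be done from Python, with a sanity check on the file type. Note that cp -r is only needed for directories. A plain copy is enough for a single .traineddata file. This is a minimal sketch. install_traineddata is a hypothetical helper, and the tessdata path you pass in is the version-specific Homebrew path described above.

```python
import shutil
from pathlib import Path


def install_traineddata(src, tessdata_dir):
    """Copy a *.traineddata file into tessdata_dir and return the destination path."""
    src = Path(src)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name}")
    dest_dir = Path(tessdata_dir)
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest_dir}")
    # copy2 copies the file contents plus metadata such as modification time.
    return Path(shutil.copy2(src, dest_dir / src.name))
```

For example: install_traineddata(Path.home() / "Downloads" / "chi_sim.traineddata", "/usr/local/Cellar/tesseract/4.1.1/share/tessdata"). After that, tesseract --list-langs should include chi_sim, and pytesseract.image_to_string(image, lang="chi_sim") selects it.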
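4. Write code to test recognition.
4.1 Before writing the code, outline the approach for a captcha on a page:
- Take a screenshot of the whole page
- Get the captcha element's coordinates
- Crop the captcha out of the screenshot using those coordinates
- Recognize the cropped image with pytesseract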
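The "coordinates → crop" step comes down to turning Selenium's element.location and element.size into the 4-tuple box that PIL's Image.crop expects. Below is a minimal sketch, assuming two things. First, the page is not scrolled, so the element's page coordinates match the screenshot. Second, the screenshot may be scaled by the display's device pixel ratio (typically 2 on a Retina Mac). captcha_crop_box is a hypothetical helper, and the sample size values are made up.

```python
def captcha_crop_box(location, size, pixel_ratio=1.0):
    """Convert an element's location/size (CSS pixels) into a PIL crop box.

    Returns (left, upper, right, lower) in screenshot pixels, as Image.crop() expects.
    """
    left = location["x"] * pixel_ratio
    top = location["y"] * pixel_ratio
    right = left + size["width"] * pixel_ratio
    bottom = top + size["height"] * pixel_ratio
    # PIL works with integer pixel coordinates, so round each edge to a whole pixel.
    return tuple(int(round(v)) for v in (left, top, right, bottom))


# Location as printed in the article; the 100x40 size is a hypothetical example.
print(captcha_crop_box({"x": 547, "y": 447}, {"width": 100, "height": 40}))
# (547, 447, 647, 487)
```

With Selenium, the ratio can be read using browser.execute_script("return window.devicePixelRatio"). If the crop comes out shifted or too small on a Mac, a missing ratio is the first thing to check.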
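4.2 Test: grab the captcha on the registration page http://localhost:8080/jpress/user/register and crop it out. Example image below:
4.3 Captcha recognition. I took three random captcha screenshots and ran recognition on the following three images:
The code: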
from selenium import webdriver
from selenium.webdriver.common.by import By
import pytesseract
from PIL import Image
import os
from time import sleep, strftime, localtime, time


# Handle a simple captcha
def test_koutu():
    # First, crop the captcha image out of a page screenshot
    # Open Chrome
    browser = webdriver.Chrome()
    browser.get('http://localhost:8080/jpress/user/register')
    sleep(3)
    # browser.maximize_window()
    # Screenshot the page http://localhost:8080/jpress/user/register
    # Option 1: simplest, but the file names (raw timestamps) are messy
    # picture_name1 = str(time()) + '.png'
    # browser.save_screenshot(picture_name1)
    # Option 2: a "screenshots" folder next to main.py
    # path = os.path.abspath('screenshots')
    # print(path)
    # Option 3: a folder of my own choosing, easier to browse (personal preference)
    path = '/Users/zhengxiaofang/PycharmProjects_py3/Selenium_project/'
    path = os.path.join(path, 'screenshots')
    if not os.path.exists(path):  # create .../Selenium_project/screenshots if it does not exist
        os.mkdir(path)
    # File name: custom prefix + local time + .png ('验证码未切' = "captcha, uncropped")
    picture_name1 = '验证码未切' + strftime('%Y_%m_%d_%H_%M_%S', localtime()) + '.png'
    screenshot_path = os.path.join(path, picture_name1)  # full path of screenshot 1
    browser.get_screenshot_as_file(screenshot_path)  # save the screenshot
    print(screenshot_path)
    # Find the captcha's coordinates, crop it out with PIL's crop(), save it as picture_name2
    # yzm_img = browser.find_element(By.XPATH, '//*[@id="captchaimg"]')
    yzm_img = browser.find_element(By.ID, 'captchaimg')
    print(yzm_img.location)  # top-left corner, e.g. {'x': 547, 'y': 447}
    # On a Retina display the screenshot has devicePixelRatio (usually 2) pixels per CSS pixel
    ratio = browser.execute_script('return window.devicePixelRatio') or 1
    left = int(yzm_img.location['x'] * ratio)
    top = int(yzm_img.location['y'] * ratio)
    right = left + int(yzm_img.size['width'] * ratio)
    bottom = top + int(yzm_img.size['height'] * ratio)
    # Open screenshot 1
    im = Image.open(screenshot_path)
    # crop() takes a single 4-tuple (left, upper, right, lower), hence the extra parentheses
    img = im.crop((left, top, right, bottom))
    picture_name2 = '验证码切图' + strftime('%Y_%m_%d_%H_%M_%S', localtime()) + '.png'  # "captcha, cropped"
    print(picture_name2)
    img.save(picture_name2)  # a bare file name is saved relative to the current working directory
    browser.quit()  # quit() ends the WebDriver session; close() only closes the window


# The approach above cannot cope with this kind of complex captcha,
# so here I feed in saved captcha images directly
def test_yzm():
    path = os.path.abspath('screenshots')
    print(path)
    # image1 = Image.open(os.path.join(path, 'yzm1.png'))
    # image1 = Image.open(os.path.join(path, 'yzm2.png'))
    image1 = Image.open(os.path.join(path, 'yzm3.png'))
    text = pytesseract.image_to_string(image1)  # named "text" to avoid shadowing the built-in str
    print(text)
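The "folder + prefix + timestamp" naming in test_koutu is written out twice. One option is to factor it into a small helper. With this helper, the cropped captcha can also be saved next to the full screenshot instead of in the working directory. This is a minimal sketch. timestamped_path is a hypothetical name, and os.makedirs(..., exist_ok=True) stands in for the exists/mkdir pair.

```python
import os
from time import localtime, strftime


def timestamped_path(directory, prefix, ext=".png"):
    """Build directory/prefix + YYYY_mm_dd_HH_MM_SS + ext, creating directory if needed."""
    # Unlike os.mkdir, makedirs also creates missing parents and tolerates an existing folder.
    os.makedirs(directory, exist_ok=True)
    name = prefix + strftime("%Y_%m_%d_%H_%M_%S", localtime()) + ext
    return os.path.join(directory, name)
```

Usage in test_koutu might look like browser.get_screenshot_as_file(timestamped_path(path, '验证码未切')) and img.save(timestamped_path(path, '验证码切图')). The timestamp has one-second resolution, so two calls with the same prefix in the same second produce the same name.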
4.4 Conclusion: only yzm3.png was recognized, as the following screenshot shows:
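Before reaching for another library, two small tweaks sometimes help plain Tesseract on short captchas. The first is telling it that the image is a single line of text (--psm 7). The second is stripping everything except letters and digits from the raw output, which usually ends with a newline or form feed. The sketch below only builds the config string and cleans the result. Its helper names are hypothetical. Restricting characters with tessedit_char_whitelist is an assumption that depends on your Tesseract version, because some 4.x releases ignore the whitelist with the LSTM engine. This sketch has not been tested against the three captchas above.

```python
import re


def tesseract_config(psm=7, whitelist=None):
    """Build a pytesseract `config` string: page segmentation mode plus optional whitelist."""
    # --psm 7 tells Tesseract to treat the image as a single line of text.
    parts = [f"--psm {psm}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def clean_ocr_text(raw):
    """Keep only ASCII letters and digits from Tesseract's raw output."""
    return re.sub(r"[^A-Za-z0-9]", "", raw)
```

In test_yzm this would become text = clean_ocr_text(pytesseract.image_to_string(image1, config=tesseract_config())).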
Pretty weak, right? So the next post will use a third-party library to handle more complex captchas.