2021-09-18-CR-013 Python web scraping: using Selenium for automated testing


Amoor123


Article link: https://blog.csdn.net/sabian2/article/details/120194947


Using Selenium
Install Selenium with pip install selenium.

Configuring the browser

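First install Google Chrome and its matching driver, ChromeDriver.
Chrome itself can be found with a web search (I used Baidu) and downloaded.
The drivers are listed here; download the build for your browser version (a close one with the same major version is fine):
http://chromedriver.storage.googleapis.com/index.html
(This index only covers ChromeDriver up to version 114; drivers for newer Chrome versions are published through Google's Chrome for Testing downloads.)
Once downloaded, copy the driver into the browser's installation directory (where chrome.exe is) and into the Python installation directory (where python.exe is). Any directory on PATH also works.
If Chrome was installed with the installer, you normally don't need to set an environment variable for it; if you use an unzipped portable copy, add the directory containing chrome.exe to PATH.
[Screenshot]
If the driver is not in place, starting the browser fails with an error. My Python was installed through Anaconda, so I opened the Anaconda folder in PyCharm to find where python.exe lives.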
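A small check can tell whether the driver is somewhere Selenium will find it before you start the browser. This is a minimal sketch using only the standard library; `find_chromedriver`, its `extra_dirs` parameter and `DRIVER_NAME` are hypothetical names, not part of Selenium.

```python
import os
import shutil

# File name to look for in extra directories; on PATH, shutil.which() adds ".exe" itself on Windows
DRIVER_NAME = "chromedriver.exe" if os.name == "nt" else "chromedriver"


def find_chromedriver(extra_dirs=()):
    """Return the path of a chromedriver executable, or None if none is found.

    Checks PATH first, then each directory in extra_dirs (hypothetical extra
    locations such as the Chrome or Python install folder).
    """
    found = shutil.which("chromedriver")
    if found:
        return found
    for directory in extra_dirs:
        candidate = os.path.join(directory, DRIVER_NAME)
        if os.path.isfile(candidate):
            return candidate
    return None
```

For example, `find_chromedriver([r"C:\Program Files\Google\Chrome\Application"])` also looks in a (hypothetical) Chrome folder; a result of `None` means the driver still has to be copied somewhere.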
Run the following as a quick test; if the setup is correct, a Chrome window opens:

from selenium import webdriver
browser=webdriver.Chrome()

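Alternatively, passing executable_path='path to the driver' to Chrome() also tells Selenium where the driver is. Note that this keyword was deprecated in Selenium 4 and removed in later 4.x releases, where the path goes into a Service object instead.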
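In Selenium 4 the driver path is passed through a `Service` object rather than `executable_path=`. The sketch below assumes Selenium 4; `make_chrome` is a hypothetical helper, and its path check only exists to fail early with a readable error. Since Selenium 4.6, calling `webdriver.Chrome()` with no path at all also lets Selenium Manager locate or download a matching driver.

```python
import os


def make_chrome(driver_path=None, options=None):
    """Start Chrome, optionally with an explicit chromedriver path (Selenium 4 style)."""
    if driver_path is not None and not os.path.isfile(driver_path):
        # Fail early with a clear message instead of a driver start-up error later
        raise FileNotFoundError(f"chromedriver not found: {driver_path}")

    # Imported here so the check above works even where Selenium is not installed
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    if driver_path is None:
        # No path given: Selenium looks on PATH (and, from 4.6, can use Selenium Manager)
        return webdriver.Chrome(options=options)
    return webdriver.Chrome(service=Service(executable_path=driver_path), options=options)
```

Usage, with a hypothetical path: `browser = make_chrome(r"C:\tools\chromedriver.exe")`.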
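[Screenshot]
Some versions of Chrome update themselves to the latest release automatically. After an update, download the matching driver again, or disable automatic updates for the browser.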
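ChromeDriver generally has to match the major version of the installed Chrome, so after an automatic update it helps to compare the two numbers. This sketch assumes `chromedriver` is on PATH and that you read Chrome's version yourself from chrome://version (or Help > About Google Chrome); all function names are hypothetical.

```python
import re
import shutil
import subprocess

# Matches the first dotted version such as 93.0.4577.63 and captures the major part
_VERSION_RE = re.compile(r"(\d+)\.\d+\.\d+\.\d+")


def parse_major(version_text):
    """Major version from text like 'ChromeDriver 93.0.4577.63 (...)', or None."""
    match = _VERSION_RE.search(version_text)
    return int(match.group(1)) if match else None


def chromedriver_major(name="chromedriver"):
    """Run `chromedriver --version` (found on PATH) and return its major version."""
    exe = shutil.which(name)
    if exe is None:
        return None
    result = subprocess.run([exe, "--version"], capture_output=True, text=True, timeout=10)
    return parse_major(result.stdout)


def versions_match(chrome_version, driver_major):
    """True when Chrome's version string has the same major number as the driver."""
    return driver_major is not None and parse_major(chrome_version) == driver_major
```

For example, `versions_match("93.0.4577.82", chromedriver_major())`, where the first argument is whatever chrome://version shows on your machine.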

Controlling pages with Selenium

Opening a page

browser.get()
browser.get() opens a URL and waits for the page to load. Here I again use a site I visit often:

browser.get('https://pic.netbian.com/')

Finding elements

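find_element_by_*
[Screenshot]
Elements can be located by XPath, class name, id, CSS selector, link text, and so on. The singular find_element methods return only the first match; the plural find_elements methods return a list of all matches.
find_element() instead takes a By value as its first argument to say which locator type to use, which requires importing By (from selenium.webdriver.common.by import By). The find_element_by_* shortcuts were removed in Selenium 4.3, so the find_element(By.XPATH, ...) form is the one that works in current versions.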
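find_element() raises NoSuchElementException when nothing matches, while find_elements() simply returns an empty list. A small wrapper (the name `find_optional` is hypothetical) uses that to return either the first match or None; the locator type is passed in as a By constant.

```python
def find_optional(driver, by, value):
    """Return the first element matching (by, value), or None when nothing matches.

    by is a locator type such as By.ID, By.NAME, By.CLASS_NAME, By.TAG_NAME,
    By.LINK_TEXT, By.PARTIAL_LINK_TEXT, By.CSS_SELECTOR or By.XPATH
    (from selenium.webdriver.common.by import By).
    """
    # find_elements() returns [] instead of raising NoSuchElementException
    matches = driver.find_elements(by, value)
    return matches[0] if matches else None
```

Usage: `box = find_optional(browser, By.XPATH, '//*[@id="schform"]/p/input')`, then check `if box is not None:` before typing into it.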

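On this page I select the search box with XPath. An element's XPath can be copied straight from the browser: inspect the element, right-click it in the Elements panel and choose Copy > Copy XPath.
[Screenshot]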

from selenium.webdriver.common.by import By
browser.find_element(By.XPATH, '//*[@id="schform"]/p/input')

Typing text

xx = browser.find_element(By.XPATH, '//*[@id="schform"]/p/input')

xx.send_keys('美女')  # type the search keyword into the box

The result looks like this:
[Screenshot]
Calling

xx.clear()

clears any text already in the input box.
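Clearing and typing usually go together, and reading the `value` attribute back is a cheap way to confirm the text actually landed in the box. A minimal sketch; `fill_input` is a hypothetical helper name.

```python
def fill_input(element, text):
    """Clear an input box, type text into it, and return the box's value afterwards."""
    element.clear()                         # remove whatever is already typed
    element.send_keys(text)                 # simulate typing the text
    return element.get_attribute("value")   # read back to confirm it landed
```

To submit with the Enter key instead of clicking a button, `xx.send_keys(Keys.ENTER)` also works, after `from selenium.webdriver.common.keys import Keys`.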

Clicking

[Screenshot]

search = browser.find_element(By.XPATH, '//*[@id="schform"]/input[1]')  # the search button
search.click()

[Screenshot]
The browser goes to the results page.

Scrolling

Scrolling is done by running JavaScript in the page:

browser.execute_script('window.scrollTo(0,document.body.scrollHeight)')

[Screenshot]
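One scroll only reaches the bottom as it is at that moment. On pages that load more items as you scroll, a common approach is to repeat until `document.body.scrollHeight` stops growing. A sketch, assuming a fixed `pause` is long enough for new content to arrive; the helper name and defaults are hypothetical.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll repeatedly until document.body.scrollHeight stops growing.

    Returns the number of scroll rounds performed (at most max_rounds).
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
        time.sleep(pause)  # give lazily loaded content time to arrive
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds  # nothing new was loaded: we are at the real bottom
        last_height = new_height
    return max_rounds
```

Usage: `scroll_to_bottom(browser)` after the results page has opened.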

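Getting the thumbnail URLs

Copying the XPath of one thumbnail only matches that single image. Looking at the page structure, the li two levels up is the element repeated for every thumbnail, so dropping its index matches all of them.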

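from selenium.webdriver.common.by import By

# Collect the src of every thumbnail
# XPath of a single thumbnail from the inspector: //*[@id="main"]/div[2]/ul/li[1]/a/img
# Dropping the [1] on li matches all of them
img = browser.find_elements(By.XPATH, '//*[@id="main"]/div[2]/ul/li/a/img')
srclist = []
for i in img:
    print(i)  # print first to check the elements look right
    src = i.get_attribute('src')
    srclist.append(src)
print(srclist)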

The src values come back as full (absolute) URLs:
[Screenshot]

Taking a screenshot

This needs the Pillow library installed (pip install pillow); it is imported as PIL.

from io import BytesIO
from PIL import Image

screen = browser.get_screenshot_as_png()  # PNG bytes of the visible part of the page
screen = Image.open(BytesIO(screen))
screen.save('123.png')

The screenshot covers only the visible part of the page, so its size depends on the browser window size.
[Screenshot]

[Screenshot]
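Since the screenshot only covers the visible area, one workaround is to enlarge the window to the page's full height before capturing. This sketch uses `get_window_size`, `set_window_size` and `save_screenshot` (which writes straight to a file, no Pillow needed). It works best in headless mode; in a normal window the size may be capped by the screen, and because `set_window_size` sets the outer window size the capture can miss a few pixels at the bottom. The helper name is hypothetical.

```python
def full_page_screenshot(driver, path, width=None):
    """Resize the window to the page's full height, save a screenshot, then restore.

    Returns whatever save_screenshot() returns (True on success).
    """
    original = driver.get_window_size()  # {'width': ..., 'height': ...}
    height = driver.execute_script("return document.body.scrollHeight")
    driver.set_window_size(width or original["width"], height)
    try:
        return driver.save_screenshot(path)
    finally:
        # put the window back so later steps see the usual layout
        driver.set_window_size(original["width"], original["height"])
```

Usage: `full_page_screenshot(browser, 'full.png')`.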

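Cropping the thumbnails

A thumbnail can only be cropped completely if it lies inside the area captured by the screenshot; covering the whole page means combining this with scrolling.
element.location gives the element's top-left corner and element.size its width and height. Image.crop() takes a (left, upper, right, lower) tuple, so those values give the box to cut out.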


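import os
from io import BytesIO
from PIL import Image

screen = browser.get_screenshot_as_png()
screen = Image.open(BytesIO(screen))

os.makedirs('vm7', exist_ok=True)  # the output folder must exist before saving
for index, i in enumerate(img):
    location = i.location  # top-left corner {'x': ..., 'y': ...}
    size = i.size          # {'width': ..., 'height': ...}
    top, bottom = location['y'], location['y'] + size['height']
    left, right = location['x'], location['x'] + size['width']
    pic = screen.crop((left, top, right, bottom))  # (left, upper, right, lower)
    pic.save('vm7/' + str(index) + '.png')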

The folder of cropped images:
[Screenshot]
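The cropping code above assumes the page has not been scrolled and that one CSS pixel equals one screenshot pixel. `element.location` is measured from the top of the page while the screenshot shows only the current viewport, and on a display scaled to 150% or 200% the PNG is larger than the CSS layout. A sketch of a crop-box calculation that accounts for both; the function and its `scroll` and `scale` parameters are hypothetical.

```python
def crop_box(location, size, scroll=(0, 0), scale=1.0):
    """Turn a WebElement's location/size into a box for PIL's Image.crop().

    location: element.location, e.g. {'x': 10, 'y': 20}, assumed in page coordinates
    size:     element.size, e.g. {'width': 100, 'height': 50}, in CSS pixels
    scroll:   (scrollX, scrollY) of the page when the screenshot was taken
    scale:    window.devicePixelRatio (screenshot pixels per CSS pixel)
    Returns (left, upper, right, lower) as ints.
    """
    left = (location["x"] - scroll[0]) * scale
    upper = (location["y"] - scroll[1]) * scale
    right = left + size["width"] * scale
    lower = upper + size["height"] * scale
    return tuple(int(round(v)) for v in (left, upper, right, lower))
```

The scroll offset and scale can be read with `browser.execute_script('return [window.scrollX, window.scrollY, window.devicePixelRatio]')`. Selenium can also do the cropping itself: `element.screenshot('vm7/0.png')` saves an image of just that element.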

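Opening new windows

Here three thumbnails are each opened in a new window:

img = browser.find_elements(By.XPATH, '//*[@id="main"]/div[2]/ul/li/a/img')
for i in range(3):
    src = img[i].get_attribute('src')
    # pass the URL as an argument instead of pasting it into the JS string
    browser.execute_script("window.open(arguments[0])", src)

[Screenshot]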

Maximizing the window

Some browsers open at the size they had when last closed; call this wherever you want the window maximized:

browser.maximize_window()

Working with cookies

print(browser.get_cookies())

[Screenshot]

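You can also:
get a single cookie by name: browser.get_cookie(name)
delete a single cookie: browser.delete_cookie(name)
add a cookie from a dict: browser.add_cookie(cookie_dict)
delete all cookies: browser.delete_all_cookies()
[Screenshot]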
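`add_cookie` only accepts cookies for the domain of the page currently open, so load the site first. This sketch exercises the single-cookie methods and also saves cookies to a JSON file and loads them back, a common way to reuse a login; the helper names and the file format are assumptions.

```python
import json


def cookie_demo(driver):
    """Add, read back and delete one cookie on the currently open site."""
    driver.add_cookie({"name": "demo", "value": "123"})  # site must already be open
    cookie = driver.get_cookie("demo")  # a dict, or None if no such cookie
    driver.delete_cookie("demo")
    return cookie


def save_cookies(driver, path):
    """Write all cookies visible on the current page to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(driver.get_cookies(), f, ensure_ascii=False, indent=2)


def load_cookies(driver, path, replace=True):
    """Add cookies saved by save_cookies(); open the same site first."""
    with open(path, encoding="utf-8") as f:
        cookies = json.load(f)
    if replace:
        driver.delete_all_cookies()  # start from an empty cookie jar
    for cookie in cookies:
        driver.add_cookie(cookie)
```

Usage: `save_cookies(browser, 'cookies.json')` after logging in; in a later session, `browser.get(...)` the same site, call `load_cookies(browser, 'cookies.json')`, then reload the page.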

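Switching windows

Three new windows were opened above. The browser displays the last one, but Selenium's own focus stays on the original window until you switch. This switches to the first window: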

browser.switch_to.window(browser.window_handles[0])

The browser returns to the first window:
[Screenshot]
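`browser.window_handles` lists every open window, and `switch_to.window()` moves Selenium's focus to one of them. The hypothetical helpers below visit every window to read its title and close all windows except one.

```python
def visit_all_windows(driver):
    """Switch through every open window, record its title, then go back to the first.

    Returns a dict {window handle: page title}.
    """
    handles = list(driver.window_handles)
    titles = {}
    for handle in handles:
        driver.switch_to.window(handle)  # move Selenium's focus to this window
        titles[handle] = driver.title
    driver.switch_to.window(handles[0])
    return titles


def close_other_windows(driver, keep=None):
    """Close every window except `keep` (default: the first handle) and switch to it."""
    keep = keep or driver.window_handles[0]
    for handle in list(driver.window_handles):
        if handle != keep:
            driver.switch_to.window(handle)
            driver.close()  # close() only closes the window that has focus
    driver.switch_to.window(keep)
```

Usage: `print(visit_all_windows(browser))` after opening the three images, then `close_other_windows(browser)` to tidy up.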

Browser options

First build an options object, then pass it in when creating the browser.

No-images mode

options = webdriver.ChromeOptions()
# 2 = block images (1 = allow)
prefs = {'profile.managed_default_content_settings.images': 2}
options.add_experimental_option('prefs', prefs)

browser = webdriver.Chrome(options=options)
browser.get('https://pic.netbian.com/')
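The pattern of building options first and passing them in can be kept in one place. In this sketch the settings are plain data (`chrome_option_spec`, a hypothetical helper), which makes them easy to print and test, and `to_chrome_options` turns them into a real `ChromeOptions` object; Selenium is imported inside that function only so the data part runs without it.

```python
def chrome_option_spec(no_images=False, extra_arguments=(), hide_banner=False):
    """Describe Chrome options as plain data that is easy to print and test."""
    spec = {"arguments": list(extra_arguments), "prefs": {}, "exclude_switches": []}
    if no_images:
        # Chrome content-setting values: 1 = allow, 2 = block
        spec["prefs"]["profile.managed_default_content_settings.images"] = 2
    if hide_banner:
        spec["exclude_switches"].append("enable-automation")
    return spec


def to_chrome_options(spec):
    """Build a real ChromeOptions object from a spec."""
    from selenium import webdriver  # imported here so the spec part runs without Selenium

    options = webdriver.ChromeOptions()
    for argument in spec["arguments"]:
        options.add_argument(argument)
    if spec["prefs"]:
        options.add_experimental_option("prefs", spec["prefs"])
    if spec["exclude_switches"]:
        options.add_experimental_option("excludeSwitches", spec["exclude_switches"])
    return options
```

Usage: `browser = webdriver.Chrome(options=to_chrome_options(chrome_option_spec(no_images=True)))`.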

Headless mode

Headless mode runs the browser in the background without showing a window. In my test it was somewhat slower than a normal browser.

from selenium import webdriver
from selenium.webdriver.common.by import By

options = webdriver.ChromeOptions()
options.add_argument('--headless')

browser = webdriver.Chrome(options=options)
browser.get('https://pic.netbian.com/')

xx = browser.find_element(By.XPATH, '//*[@id="schform"]/p/input')
xx.send_keys('美女')
search = browser.find_element(By.XPATH, '//*[@id="schform"]/input[1]')
search.click()
img = browser.find_elements(By.XPATH, '//*[@id="main"]/div[2]/ul/li/a/img')
for i in range(3):
    src = img[i].get_attribute('src')
    print(src)  # no window is shown, so print the results instead

[Screenshot]
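Headless Chrome renders into a default window much smaller than a typical desktop screen (800x600 in the classic headless mode), which can change the page layout and what screenshots contain. A sketch that adds an explicit `--window-size`; `--headless=new` selects the newer headless mode available from Chrome 109, while plain `--headless` also works on older versions. The helper names are hypothetical.

```python
def headless_arguments(width=1920, height=1080, new_mode=True):
    """Chrome switches for running without a visible window at a fixed size."""
    # "--headless=new" needs Chrome 109+; plain "--headless" also works on older builds
    mode = "--headless=new" if new_mode else "--headless"
    return [mode, f"--window-size={width},{height}"]


def add_arguments(options, arguments):
    """Add each switch to a ChromeOptions-like object and return it."""
    for argument in arguments:
        options.add_argument(argument)
    return options
```

Usage: `options = add_arguments(webdriver.ChromeOptions(), headless_arguments())`, then `webdriver.Chrome(options=options)`.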

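Hiding the automation notice

With this option, the "Chrome is being controlled by automated test software" notice at the top of the window no longer appears: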

options.add_experimental_option('excludeSwitches', ['enable-automation'])

[Screenshot]

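Avoiding automation detection

Add a command-line argument and have the browser redefine navigator.webdriver as undefined; this gets past pages whose check for automation relies on that property.
These settings follow the article
Selenium被禁止的解决方法 (How to fix Selenium being blocked)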

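options.add_argument('--disable-blink-features=AutomationControlled')  # must be set before creating the browser
browser = webdriver.Chrome(options=options)
# Run this script in every new page before the page's own scripts
browser.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
        "source": """
                    Object.defineProperty(navigator, 'webdriver', {
                      get: () => undefined
                    })
                  """
})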
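Order matters here: the `--disable-blink-features` argument has to be added before `webdriver.Chrome(options=options)` is called, while the CDP command needs a running browser and only affects documents loaded afterwards. This sketch (helper names hypothetical) registers the script and then checks what the page sees; `navigator.webdriver` evaluating to `undefined` comes back to Python as `None`. It only hides this one signal, and sites can detect automation in other ways.

```python
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined
})
"""


def apply_webdriver_patch(driver):
    """Register HIDE_WEBDRIVER_JS to run before page scripts in every new document.

    Call after webdriver.Chrome(...) and before driver.get(...): pages that are
    already open are not affected.
    """
    driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": HIDE_WEBDRIVER_JS}
    )


def webdriver_flag(driver):
    """What the current page sees as navigator.webdriver (JS undefined -> None)."""
    return driver.execute_script("return navigator.webdriver")
```

Usage: `apply_webdriver_patch(browser)`, then `browser.get('https://pic.netbian.com/')`, then `print(webdriver_flag(browser))` should print `None`.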