Web Scraping Study Notes (Part 4)

Floating Star

Tags: python, programming languages

Article link: https://blog.csdn.net/qq_69206769/article/details/137056268

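Question: how does the selenium module relate to web scraping?

- It makes it easy to get data that a site loads dynamically

- It makes simulated login easy to implement

What is the selenium module?

- A module built on browser automation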

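selenium usage flow:

- Install the package: pip install selenium

- Download a driver program for your browser: http://chromedriver.storage.googleapis.com/index.html

    - For newer Chrome versions, the matching ChromeDriver builds are listed on the "Chrome for Testing availability" page

- Instantiate a browser object

- Write the browser automation code

    - Send a request: get(url)

    - Locate tags: the find_element family of methods

    - Interact with a tag: send_keys('xxx')

    - Run JavaScript: execute_script('jsCode')

    - Go forward and back: forward(), back()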
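The steps above can be strung together in one small function. This is only a sketch: the function name `search_then_navigate` and its parameters are made up for illustration, and it expects a driver that has already been created (for example with `webdriver.Chrome(...)`). It passes the locator strategy as the plain string `"id"`, which is the value behind `By.ID`.

```python
def search_then_navigate(driver, url, box_id, query):
    """Walk through the basic flow: request, locate, interact, run JS, back/forward.

    `driver` is an already-created selenium WebDriver (hypothetical helper,
    not part of selenium itself).
    """
    driver.get(url)                                # send the request
    box = driver.find_element("id", box_id)        # locate a tag by its id
    box.send_keys(query)                           # type into the tag
    # run a JavaScript snippet: scroll to the bottom of the page
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight)")
    driver.back()                                  # one step back in history
    driver.forward()                               # and forward again
    return driver.title                            # title of the page we ended on
```

With a real browser, the call might look like `search_then_navigate(driver, 'https://www.baidu.com', 'some-input-id', 'selenium')`, where the element id is a placeholder you would read from the page source.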

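- Handling iframes with selenium

    - If the tag you want to locate sits inside an iframe tag, first switch into that iframe with switch_to.frame('id'); only then can the tag be located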
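One way to keep the iframe switch from leaking into later lookups is to wrap it in a context manager. The helper below is a sketch under that assumption; the name `inside_frame` is invented here. It relies only on `switch_to.frame(...)` and `switch_to.default_content()`, which return the lookup scope to the top-level page.

```python
from contextlib import contextmanager


@contextmanager
def inside_frame(driver, frame_ref):
    """Temporarily scope element lookups to one iframe (hypothetical helper)."""
    # frame_ref can be the iframe's id or name, its index, or the iframe element
    driver.switch_to.frame(frame_ref)
    try:
        yield driver
    finally:
        # always return to the top-level document, even if a lookup failed
        driver.switch_to.default_content()
```

Inside a `with inside_frame(driver, 'iframeResult'):` block, `driver.find_element('id', 'draggable')` searches within the iframe; after the block, lookups apply to the outer page again.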

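- Action chains (dragging): from selenium.webdriver import ActionChains

    - Instantiate an action chain object: action = ActionChains(driver)

    - click_and_hold(div): press and hold the mouse button on the given tag

    - move_by_offset(x, y): move the mouse x pixels horizontally and y pixels vertically from where it is now

    - perform(): run the queued actions immediately

    - release(): release the held mouse button; like the other actions it is only queued, so follow it with perform()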

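from time import sleep

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
# Import the action chain class
from selenium.webdriver import ActionChains

driver = webdriver.Chrome(service=Service('./chromedriver.exe'))

driver.get('https://www.runoob.com/try/try.php?filename=jqueryui-api-droppable')

# The target tag lives inside an iframe, so switch the lookup scope
# to that iframe before locating the tag
driver.switch_to.frame('iframeResult')
div = driver.find_element(By.ID, 'draggable')

# Action chain
action = ActionChains(driver)
# Press and hold the mouse button on the tag
action.click_and_hold(div)

for i in range(5):
    # perform() runs the queued actions immediately
    # move_by_offset(x, y): x is horizontal, y is vertical
    action.move_by_offset(50, 0).perform()
    sleep(0.3)
# release() is only queued, so perform() is needed to actually let go
action.release().perform()

print(div)

# Keep the browser open until Enter is pressed
input()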
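The drag above can also be queued as one chain and executed with a single perform(). The sketch below assumes that approach; `drag_horizontally` and its parameters are hypothetical names, and `chain` is expected to be an `ActionChains(driver)` object. It uses `pause(seconds)` from ActionChains in place of `sleep`.

```python
def drag_horizontally(chain, element, total_px, steps=5, pause_s=0.3):
    """Queue press -> several small moves -> release, then run it all once.

    Hypothetical helper: `chain` is an ActionChains instance.
    Returns the size of each regular step in pixels.
    """
    step = total_px // steps
    chain.click_and_hold(element)          # press and hold on the element
    for _ in range(steps):
        chain.move_by_offset(step, 0)      # x is horizontal, y is vertical
        chain.pause(pause_s)               # short wait between moves
    # any leftover pixels go into one last move so the total is exact
    remainder = total_px - step * steps
    if remainder:
        chain.move_by_offset(remainder, 0)
    chain.release()                        # queue the mouse-button release
    chain.perform()                        # execute everything queued so far
    return step
```

For the page used above, `drag_horizontally(ActionChains(driver), div, 250)` would move the box 250 pixels to the right in five steps.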

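- Headless browsing and evading detection with selenium

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Options provides both the headless mode and the detection-evasion setting
from selenium.webdriver.chrome.options import Options


# Create an options object that makes Chrome start without a visible window
chrome_options = Options()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--disable-gpu')
# Evade detection: exclude Chrome's enable-automation switch
chrome_options.add_experimental_option("excludeSwitches", ["enable-automation"])


# Start Chrome with the headless and detection-evasion options applied
driver = webdriver.Chrome(options=chrome_options, service=Service('./chromedriver.exe'))
# PhantomJS is another browser with no visible interface (a headless browser)

driver.get('https://www.baidu.com')

print(driver.page_source)

# Keep the process alive until Enter is pressed
input()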
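While debugging it can help to switch the headless part off and keep the detection-evasion part on. The sketch below wraps the same settings in a function with a toggle; `configure_chrome` is a hypothetical name, and `options` is expected to be a selenium `Options()` object. The extra `--disable-blink-features=AutomationControlled` argument is not in the original notes; it is a Chrome flag often paired with the setting above and is included here as an assumption.

```python
def configure_chrome(options, headless=True):
    """Apply the headless and detection-evasion settings to a Chrome Options object.

    Hypothetical helper; returns the same object so calls can be chained.
    """
    if headless:
        options.add_argument("--headless")      # no visible browser window
        options.add_argument("--disable-gpu")
    # same evasion setting as in the script above
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    # assumption, not from the original notes: a commonly used companion flag
    options.add_argument("--disable-blink-features=AutomationControlled")
    return options
```

A visible browser for debugging would then be `webdriver.Chrome(options=configure_chrome(Options(), headless=False), service=Service('./chromedriver.exe'))`.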

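12306 simulated login

- Using Chaojiying (超级鹰): download the Python demo from the Chaojiying captcha-recognition API site

- Coding flow for the bilibili simulated login

    - Open the login page with selenium

    - Take a screenshot of the page selenium currently has open

    - Crop the captcha region out of that screenshot

        - Benefit: the cropped captcha is exactly the one tied to this login attempt

    - Use Chaojiying to recognize the click coordinates in the captcha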
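The cropping step needs the captcha's top-left and bottom-right corners inside the screenshot. A minimal sketch of that arithmetic follows, assuming the element's `location` (a dict with `x` and `y`) and `size` (a dict with `width` and `height`) as selenium reports them. The `scale` parameter is a hypothetical knob for displays where screenshot pixels and page coordinates differ, for example with system display scaling.

```python
def captcha_crop_box(location, size, scale=1.0):
    """Return (left, top, right, bottom) of an element inside a page screenshot.

    Hypothetical helper: `location` and `size` are the dicts from a selenium
    element; `scale` is screenshot pixels per page pixel (an assumption).
    """
    left = location["x"] * scale
    top = location["y"] * scale
    right = left + size["width"] * scale
    bottom = top + size["height"] * scale
    # PIL's Image.crop expects integer pixel coordinates
    return tuple(int(round(v)) for v in (left, top, right, bottom))
```

The result can be passed straight to PIL, as in `Image.open('yanzhengma.png').crop(captcha_crop_box(code_img_ele.location, code_img_ele.size))`.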

from time import sleep
from PIL import Image
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver import ActionChains
import chaojiying  # the Python demo module downloaded from Chaojiying

options = webdriver.ChromeOptions()
options.add_argument('start-maximized')
driver = webdriver.Chrome(options=options, service=Service('./chromedriver.exe'))
driver.get('https://t.bilibili.com/')
sleep(2)
# Open the login dialog
btn = driver.find_element(By.CLASS_NAME,'bili-dyn-login-register__login-btn')
btn.click()
sleep(2)
# Fill in the account and password (replace the placeholders with your own)
text = driver.find_element(By.CSS_SELECTOR,'body > div.bili-mini-mask > div > div.bili-mini-content > div.bili-mini-login-wrapper > div.bili-mini-password-wrapper > div.bili-mini-account > input[type=text]')
text.send_keys('YOUR_ACCOUNT')
sleep(2)
pwd = driver.find_element(By.CSS_SELECTOR,'body > div.bili-mini-mask > div > div.bili-mini-content > div.bili-mini-login-wrapper > div.bili-mini-password-wrapper > div.bili-mini-password > div.left > input[type=password]')
pwd.send_keys('YOUR_PASSWORD')
sleep(2)
# Click the login button; the captcha appears afterwards
btn_2 = driver.find_element(By.CSS_SELECTOR,'body > div.bili-mini-mask > div > div.bili-mini-content > div.bili-mini-login-wrapper > div.bili-mini-login-register-wrapper > div.universal-btn.login-btn')
btn_2.click()
sleep(2)
# Screenshot the whole page with the captcha showing
driver.save_screenshot('./yanzhengma.png')
sleep(2)

# Work out the top-left and bottom-right corners of the captcha image
code_img_ele = driver.find_element(By.XPATH,'/html/body/div[5]/div[2]/div[6]/div/div')
print(code_img_ele)
location = code_img_ele.location  # x, y of the captcha's top-left corner
size = code_img_ele.size  # width and height of the captcha tag
print(location)
print(size)
# Top-left and bottom-right corners, hard-coded for one particular window size;
# adjust them for your own screen (for example from location and size above)
left, top, right, bottom = 1100, 385, 1440, 810
box = (left, top, right, bottom)
print(box)
# The captcha region is now fixed: crop it out of the screenshot
screenshot = Image.open('yanzhengma.png')
code_img_name = 'code.png'
frame = screenshot.crop(box)
frame.save(code_img_name)

# Submit the cropped image to Chaojiying for recognition
# User Center >> Software ID: generate one and use it in place of 96001
client = chaojiying.Chaojiying_Client('YOUR_CHAOJIYING_USER', 'YOUR_CHAOJIYING_PASSWORD', 'YOUR_SOFT_ID')
with open(code_img_name, 'rb') as f:
    im = f.read()

# 9004 is the captcha type code; pic_str looks like 'x1,y1|x2,y2'
result = client.PostPic(im, 9004)['pic_str']
print(result)
lo_list = result.split('|')
# Go through the list and click each x, y position with an action chain
for lo in lo_list:
    x = lo.split(',')[0]
    y = lo.split(',')[-1]
    print(x, y)
    # Coordinates are relative to the cropped image, so add its top-left corner
    ActionChains(driver).move_by_offset(left + int(x), top + int(y)).click().perform()
    sleep(1)
    # move_by_offset is relative, so move the mouse back to where it started
    ActionChains(driver).move_by_offset(-left - int(x), -top - int(y)).perform()
    sleep(1)

# Confirm the captcha selection
driver.find_element(By.CSS_SELECTOR,'body > div.geetest_panel.geetest_wind > div.geetest_panel_box.geetest_panelshowclick > div.geetest_panel_next > div > div > div.geetest_panel > a > div').click()

# Keep the browser open until Enter is pressed
input()
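The coordinate string returned in `pic_str` is split by hand in the loop above. A small parser makes that step easier to check on its own. This is a sketch that assumes the `'x1,y1|x2,y2'` shape handled by the script; the name `parse_click_points` is made up for illustration.

```python
def parse_click_points(pic_str):
    """Turn 'x1,y1|x2,y2' into [(x1, y1), (x2, y2)] with integer coordinates.

    Hypothetical helper; the input format is assumed from the script above.
    """
    points = []
    for pair in pic_str.split("|"):
        if not pair.strip():
            continue                      # skip empty pieces, e.g. an empty result
        x, y = pair.split(",")
        points.append((int(x), int(y)))
    return points
```

The click loop could then iterate with `for x, y in parse_click_points(result):` and add the crop box's top-left corner to each point as before.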