5 -- The selenium module

朝游碧海暮苍梧

Last modified: 2022-02-19 20:46:00


First published: 2022-02-19 20:30:40

Article link: https://blog.csdn.net/qq_42530422/article/details/123023371

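Environment setup:
Download the browser driver (chromedriver), choosing the version that matches the installed Chrome:
https://registry.npmmirror.com/binary.html?path=chromedriver/

Unzip the downloaded file and put the extracted driver in the appropriate directory so that Selenium can find it.
[Figure: example location for the extracted driver]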
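
Before starting Chrome it can help to confirm that the driver is actually discoverable. The following is a minimal sketch that assumes the driver was placed in a directory on PATH and that the executable is named chromedriver; find_driver is a hypothetical helper written for this post, not part of Selenium.

```python
import shutil


def find_driver(name="chromedriver"):
    """Return the full path of the driver executable found on PATH, or None."""
    return shutil.which(name)


if __name__ == "__main__":
    path = find_driver()
    if path is None:
        print("chromedriver not found on PATH; check where the file was unpacked")
    else:
        print("chromedriver found at", path)
```

If the helper returns None, the driver is probably in a directory that is not on PATH, which is a common reason for Chrome() failing at startup.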

from selenium.webdriver import Chrome
web = Chrome()  # create a driver object for the Chrome browser
web.get("http://www.baidu.com")

1. Searching for jobs on a recruitment site

Target site: https://www.lagou.com/

Open the site

web = Chrome()
web.get("http://lagou.com")

Click the x in the pop-up to reach the main page

# Find an element and click it
# el = web.find_element_by_xpath('//*[@id="changeCityBox"]/p[1]/a')  # old-style call; the current form is below
el = web.find_element(By.XPATH, '//*[@id="changeCityBox"]/p[1]/a')
el.click()

Type python into the search box and press Enter to run the search

time.sleep(2)
# Find the input box and type python, then press Enter (or click the search button)
web.find_element(By.XPATH, '//*[@id="search_input"]').send_keys("python", Keys.ENTER)

Code:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import time
web = Chrome()
web.get("http://lagou.com")

# Find an element and click it
# el = web.find_element_by_xpath('//*[@id="changeCityBox"]/p[1]/a')  # old-style call; the current form is below
el = web.find_element(By.XPATH, '//*[@id="changeCityBox"]/p[1]/a')
el.click()

time.sleep(2)
# Find the input box and type python, then press Enter (or click the search button)
web.find_element(By.XPATH, '//*[@id="search_input"]').send_keys("python", Keys.ENTER)
time.sleep(2)  # give the result list time to load before reading it

# Locate where the data lives and extract it
# Find every job card in the result list
li_list = web.find_elements(By.XPATH, '//*[@id="jobList"]/div[1]/div')
for li in li_list:
    job_name = li.find_element(By.TAG_NAME, "a").text
    job_price = li.find_element(By.XPATH, './div/div/div[2]/span').text
    print(job_price, job_name)
    time.sleep(1)
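
The fixed time.sleep pauses in the script above either waste time or turn out too short when the page loads slowly. One alternative is to poll until the elements actually appear. The sketch below is a hand-rolled helper (wait_until is a hypothetical name, and the timeout and poll defaults are arbitrary); Selenium also provides WebDriverWait in selenium.webdriver.support.ui for the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll)


# Possible use with the script above (web and By as defined there):
# li_list = wait_until(
#     lambda: web.find_elements(By.XPATH, '//*[@id="jobList"]/div[1]/div'))
```

Because find_elements returns an empty list when nothing matches, the lambda stays falsy until at least one job card is present, so the loop ends as soon as the results are rendered.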

2. Switching windows on the recruitment site

from selenium.webdriver import Chrome
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
import time
web = Chrome()
web.get("http://lagou.com")
# Close the pop-up
web.find_element(By.XPATH, '//*[@id="cboxClose"]').click()
time.sleep(1)

# Type python into the search bar and press Enter
web.find_element(By.XPATH, '//*[@id="search_input"]').send_keys("python", Keys.ENTER)
time.sleep(1)

# Click the first job; this opens a new window
web.find_element(By.XPATH, '//*[@id="jobList"]/div[1]/div[1]/div[1]/div[1]/div[1]/a').click()

# Switch into the new window so its content can be extracted
web.switch_to.window(web.window_handles[-1])

# Extract content from the new window
job_detail = web.find_element(By.XPATH, '//*[@id="job_detail"]/dd[2]/div/p').text
print(job_detail)

# Close the child window
web.close()
# Point selenium back at the original window
web.switch_to.window(web.window_handles[0])
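
Picking window_handles[-1] relies on the newest window being listed last, which is not guaranteed. A slightly more defensive variant records the handles before the click and then looks for the one that was not there before. This is a sketch only: switch_to_new_window is a hypothetical helper, and it uses nothing beyond driver.window_handles and driver.switch_to.window from the code above.

```python
def switch_to_new_window(driver, known_handles):
    """Switch to the first window handle that is not in known_handles.

    Returns the new handle, or raises RuntimeError if no new window exists.
    """
    for handle in driver.window_handles:
        if handle not in known_handles:
            driver.switch_to.window(handle)
            return handle
    raise RuntimeError("no new window was opened")


# Possible use with the script above:
# before = web.window_handles        # handles that exist before the click
# ... click the link that opens the job page ...
# switch_to_new_window(web, before)
```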

3. Running the browser in the background

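# Prepare the option settings (Options comes from selenium.webdriver.chrome.options)
opt = Options()
opt.add_argument("--headless")  # headless mode: no browser window
opt.add_argument("--disable-gpu")  # disable GPU rendering
web = Chrome(options=opt)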
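
Chrome switches begin with two ASCII hyphens. When code is copied from a web page or word processor, those hyphens are sometimes replaced by a single en dash, and the argument is then no longer parsed as the intended switch. A small guard can repair such arguments before they reach add_argument. This is a sketch; normalize_flag is a hypothetical helper, not a Selenium function.

```python
def normalize_flag(arg):
    """Replace a leading en dash or em dash with two ASCII hyphens."""
    for dash in ("\u2013", "\u2014"):
        if arg.startswith(dash):
            return "--" + arg[len(dash):]
    return arg


# "\u2013headless" is the flag as it looks after a typographic substitution
flags = [normalize_flag(a) for a in ["\u2013headless", "--disable-gpu"]]
print(flags)  # ['--headless', '--disable-gpu']
```

Each cleaned string can then be passed to opt.add_argument as in the snippet above.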

Code:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
# Helper class for <select> drop-downs
from selenium.webdriver.support.select import Select
import time
# Options class used to run the browser in the background
from selenium.webdriver.chrome.options import Options

# Prepare the option settings
opt = Options()
opt.add_argument("--headless")  # headless mode: no browser window
opt.add_argument("--disable-gpu")  # disable GPU rendering


web = Chrome(options=opt)
web.get("https://www.endata.com.cn/BoxOffice/BO/Year/index.html")

# Locate the drop-down list
sel_el = web.find_element(By.XPATH, '//*[@id="OptionDate"]')
# Wrap the element as a drop-down menu
sel = Select(sel_el)

# Have the browser step through every option
for i in range(len(sel.options)):
    sel.select_by_index(i)   # switch option by index
    time.sleep(2)
    table = web.find_element(By.XPATH, '//*[@id="TableList"]/table')
    print(table.text)
    print("===================================================")

# How to get the page's Elements source (the HTML after data loading and JS execution);
# this has to be read before the window is closed
print(web.page_source)

print("Finished")
web.close()

4. Handling CAPTCHAs

Target site: http://www.chaojiying.com/user/login/

Chaojiying is used to recognize the CAPTCHA.

Recognizing the CAPTCHA

img = web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/div/img').screenshot_as_png
chaojiying = Chaojiying_Client('your_username', 'your_password', '929057')  # the last argument is the software ID

Code:

from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from chaojiying import Chaojiying_Client
import time
web = Chrome()
web.get("http://www.chaojiying.com/user/login/")

# Handle the CAPTCHA: screenshot the image element and send it for recognition
img = web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/div/img').screenshot_as_png
chaojiying = Chaojiying_Client('your_username', 'your_password', '929057')
dic = chaojiying.PostPic(img, 1902)  # 1902 is the code type for this kind of CAPTCHA
verify_code = dic['pic_str']

# Fill in the username, password and CAPTCHA on the page
web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/p[1]/input').send_keys("your_username")
web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/p[2]/input').send_keys("your_password")
web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/p[3]/input').send_keys(verify_code)
time.sleep(5)
# Click the login button
web.find_element(By.XPATH, '/html/body/div[3]/div/div[3]/div[1]/form/p[4]/input').click()

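In dic = chaojiying.PostPic(img, 1902), the code type 1902 is taken from the code-type table in Chaojiying's developer documentation: the CAPTCHA recognized here is a four-character mix of digits and letters, and the documentation lists 1902 for that kind. For other kinds of CAPTCHA, use the value given in the developer manual.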
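
Reading dic['pic_str'] directly fails with an unhelpful error if recognition did not succeed. The sketch below wraps the call and checks the reply first. It assumes that the reply dictionary carries an err_no field that is 0 on success and an err_str message, as in the vendor's sample client; those field names should be checked against the developer manual. read_captcha is a hypothetical helper.

```python
def read_captcha(client, image_bytes, codetype=1902):
    """Submit the image and return the recognized text.

    Raises RuntimeError when the service reports an error or returns no text.
    Assumption: the reply has err_no (0 on success), err_str and pic_str.
    """
    result = client.PostPic(image_bytes, codetype)
    if result.get("err_no", 0) != 0:
        raise RuntimeError(f"recognition failed: {result.get('err_str')}")
    text = result.get("pic_str", "")
    if not text:
        raise RuntimeError("recognition returned an empty result")
    return text


# Possible use with the script above:
# verify_code = read_captcha(chaojiying, img)
```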
