Scraping 51jobs with Python: selenium (2)

就问一下

Category: Python进阶学习 (Advanced Python Learning)    Tags: Python, selenium

Article link: https://blog.csdn.net/qq_41472529/article/details/87266620

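No 1, Open the target site

Before you start, you need some basic front-end knowledge and familiarity with basic selenium operations.

First, use selenium to open the site: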

from selenium import webdriver


class FindJobs(object):

    def __init__(self):
        self.driver = webdriver.Chrome()    # start a Chrome session
        self.url = 'https://mkt.51job.com/tg/sem/pz_2018.html?from=baidupz'
        self.driver.get(self.url)           # open the 51job page


if __name__ == "__main__":
    find_jobs = FindJobs()

Result: a Chrome window opens and loads the page.

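No 2, Locate elements

2.1 Locate the search input box

Open the 51job (前程无忧) site, press F12, then right-click inside the input box and choose Inspect.

Move the mouse over the part highlighted in blue and right-click it.

Choose Copy XPath, then paste it into your Python editor. The result is:

//*[@id="kwdselectid"]

The same step in code: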

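from selenium.webdriver.common.by import By


class FindJobs(object):

    def __init__(self):
        ....

    def find_position(self):
        # type the job title into the keyword box
        self.driver.find_element(By.XPATH, "//*[@id='kwdselectid']").send_keys("软件测试工程师")  # job title: software test engineer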

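Worth noting:

selenium offers many ways to locate an element. The XPath approach used above is the simplest: copy the XPath with a couple of mouse clicks, paste it into the code, and it generally works (for an element inside an embedded frame, switch to that frame first, then locate by XPath).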
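As a rough illustration of those other locator strategies, the sketch below writes the same element id as three equivalent locators. The helper name `locators_for_id` is hypothetical and not part of selenium; the strategy strings are the values held by selenium's `By.ID`, `By.CSS_SELECTOR`, and `By.XPATH` constants, so each tuple can be unpacked into `driver.find_element(*locator)`.

```python
def locators_for_id(element_id):
    """Return equivalent (strategy, value) locators for one element id.

    Hypothetical helper for illustration only. The strategy strings match
    selenium's By.ID, By.CSS_SELECTOR and By.XPATH constants.
    Assumes the id contains no quote characters.
    """
    return {
        "id": ("id", element_id),
        "css": ("css selector", "[id='%s']" % element_id),
        "xpath": ("xpath", "//*[@id='%s']" % element_id),
    }


if __name__ == "__main__":
    for name, locator in locators_for_id("kwdselectid").items():
        print(name, locator)
        # With a live driver: driver.find_element(*locator)
```

Locating by id is usually the shortest option when the element has one; XPath remains the fallback for elements without a stable id.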

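2.2 Select the job location

Goal: use selenium to set the job location to Hangzhou.

As above, first get the XPath of the location field:

//*[@id="work_position_input"]

Then copy the XPath of each element that has to be clicked next, one after another.

Implementation: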

    def find_position(self):
        # By is imported from selenium.webdriver.common.by
        self.driver.find_element(By.XPATH, "//*[@id='kwdselectid']").send_keys("软件测试工程师")  # job title
        # choose the city
        self.driver.find_element(By.XPATH, "//*[@id='work_position_input']").click()

        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_center_left_each_220200']").click()
        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_center_right_list_category_220200_080200']").click()
        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_bottom_save']").click()
        # run the search
        self.driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div/div/button").click()    # search button

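In principle this should be enough, but unfortunately it does not work when run through selenium.

Looking closer, comparing the tags when the city picker first opens with the tags after clicking the "HI" area shows that the click changes the class attribute. With selenium in Python, the same change can be made like this: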

        # Reproduce the click in JavaScript: clear the class on entry 000000
        # and set the class of entry 220200 to 'on'.
        js = "var x = document.getElementById('work_position_click_center_left_each_000000');" \
             "x.style.color='red';" \
             "x.className='';" \
             "var y = document.getElementById('work_position_click_center_left_each_220200');" \
             "y.className='on'"
        self.driver.execute_script(js)
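The same class switch can be written as a small reusable helper. This is only a sketch: `build_select_js` is a hypothetical name, and it assumes, as above, that the page marks the selected entry with the class `on`. It uses `json.dumps` to quote the ids so that they become valid JavaScript string literals.

```python
import json


def build_select_js(previous_id, target_id, selected_class="on"):
    """Build JavaScript that moves the 'selected' class from one element to another.

    Hypothetical helper; the null checks keep the script from raising
    if one of the ids is not present on the page.
    """
    return (
        "var prev = document.getElementById(%s);"
        "if (prev) { prev.className = ''; }"
        "var target = document.getElementById(%s);"
        "if (target) { target.className = %s; }"
    ) % (json.dumps(previous_id), json.dumps(target_id), json.dumps(selected_class))


if __name__ == "__main__":
    js = build_select_js(
        "work_position_click_center_left_each_000000",
        "work_position_click_center_left_each_220200",
    )
    print(js)
    # With a live driver: driver.execute_script(js)
```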

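After clicking search, the results page opens.

How do we get all the links on this page? selenium has a method that locates a group of elements:

webpages = self.driver.find_elements(By.XPATH, "//*[@id='resultList']/div/p/span/a")

which is not the same as

webpages = self.driver.find_element(By.XPATH, "//*[@id='resultList']/div/p/span/a")

The former locates every matching element and returns them in a list; the latter locates only one element (the first match).

Each link is stored in the element's href attribute. The code to collect all the links: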

    def get_webpage(self):
        webpages = self.driver.find_elements(By.XPATH, "//*[@id='resultList']/div/p/span/a")
        for webpage in webpages:
            href = webpage.get_attribute("href")    # the link of one search result
            print(href)
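If the links are to be stored rather than printed, a helper along these lines may be useful. It is a sketch, not part of the original script: `collect_hrefs` and `FakeLink` are hypothetical names, and `FakeLink` only stands in for a selenium element so the helper can be tried without a browser. The helper skips elements with no href and drops duplicates while keeping the page order.

```python
def collect_hrefs(elements):
    """Return the unique, non-empty href values of the given elements, in order."""
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")   # None when the attribute is missing
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


class FakeLink(object):
    """Stand-in for a WebElement, used only to try the helper offline."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


if __name__ == "__main__":
    links = [FakeLink("https://example.com/a"), FakeLink(None), FakeLink("https://example.com/a")]
    print(collect_hrefs(links))
```

With a live driver, the list returned by `find_elements` can be passed in directly.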

No 3, Complete code

from selenium import webdriver
from selenium.webdriver.common.by import By
import time


class FindJobs(object):

    def __init__(self):
        self.driver = webdriver.Chrome()
        self.driver.maximize_window()       # maximize the window
        self.driver.implicitly_wait(2)      # implicit wait, in seconds
        self.url = 'https://mkt.51job.com/tg/sem/pz_2018.html?from=baidupz'
        self.driver.get(self.url)
        self.webpages_list = []
        # self.cookies = self.driver.get_cookies()    # delete cookies
        # print(f"main: cookies = {self.cookies}")
        # self.driver.delete_all_cookies()

    def find_position(self):
        self.driver.find_element(By.XPATH, "//*[@id='kwdselectid']").send_keys("软件测试工程师")  # job title
        # choose the city
        self.driver.find_element(By.XPATH, "//*[@id='work_position_input']").click()
        js = "var x = document.getElementById('work_position_click_center_left_each_000000');" \
             "x.style.color='red';" \
             "x.className='';" \
             "var y = document.getElementById('work_position_click_center_left_each_220200');" \
             "y.className='on'"
        self.driver.execute_script(js)
        time.sleep(0.5)       # the JS needs time to take effect; without a pause the element is sometimes not found
        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_center_left_each_220200']").click()
        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_center_right_list_category_220200_080200']").click()
        self.driver.find_element(By.XPATH, "//*[@id='work_position_click_bottom_save']").click()
        # run the search
        self.driver.find_element(By.XPATH, "/html/body/div[1]/div[2]/div/div/div/button").click()    # search button

    def find_position_2(self):
        self.driver.find_element(By.XPATH, "/html/body/div[2]/div[1]/div[16]/span").click()
        self.driver.find_element(By.XPATH, "//*[@id='filter_providesalary']/ul/li[7]").click()   # monthly salary range
        self.driver.find_element(By.XPATH, "//*[@id='filter_workyear']/ul/li[3]/a").click()      # years of experience
        self.driver.find_element(By.XPATH, "//*[@id='filter_degreefrom']/ul/li[5]/a").click()    # education requirement

    def get_webpage(self):
        webpages = self.driver.find_elements(By.XPATH, "//*[@id='resultList']/div/p/span/a")
        for webpage in webpages:
            href = webpage.get_attribute("href")
            self.webpages_list.append(href)
        print(self.webpages_list)   # only the links on the first results page are collected
        self.driver.close()


if __name__ == "__main__":
    find_jobs = FindJobs()
    find_jobs.find_position()
    find_jobs.find_position_2()
    find_jobs.get_webpage()
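The script above stops after the first results page. One possible way to extend it, sketched under assumptions, is a loop that takes two callbacks: one that returns the links on the current page and one that moves to the next page. `collect_all_pages` and both callbacks are hypothetical names; the locator of the site's "next page" control is not given in this article, so it is deliberately left out.

```python
def collect_all_pages(get_links, go_to_next_page, max_pages=5):
    """Collect links page by page.

    get_links():        returns the list of links on the current page
    go_to_next_page():  moves to the next page; returns False when there is none
    max_pages:          upper bound on pages visited, to avoid an endless loop
    """
    all_links = []
    for _ in range(max_pages):
        all_links.extend(get_links())
        if not go_to_next_page():
            break
    return all_links


if __name__ == "__main__":
    # Offline demonstration with three fake result pages.
    pages = [["a", "b"], ["c"], ["d"]]
    position = {"index": 0}

    def get_links():
        return pages[position["index"]]

    def go_to_next_page():
        if position["index"] + 1 < len(pages):
            position["index"] += 1
            return True
        return False

    print(collect_all_pages(get_links, go_to_next_page))
```

In a real run, `get_links` would wrap the `find_elements` call from `get_webpage`, and `go_to_next_page` would click the next-page control and report whether it existed.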

Selenium Tips

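1, Sometimes the locator is correct but the element still cannot be found. In that case, check whether the element sits inside a frame; selenium webdriver provides the switch_to.frame method for this. The content may also be in a separate window rather than a frame, and there is a matching method for windows, switch_to.window: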

driver.switch_to.window("windowName")
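For the frame case, a minimal sketch could look like the following. `find_in_frames` is a hypothetical helper, not a selenium API; it only uses `switch_to.default_content()`, `switch_to.frame()`, and `find_elements()`, and it assumes the element is either in the top document or in a top-level iframe (nested frames are not searched). The string "tag name" is the value of selenium's `By.TAG_NAME`.

```python
def find_in_frames(driver, by, value):
    """Return the first match in the top document or in a top-level iframe.

    Returns None when nothing matches. When the match is inside an iframe,
    the driver is left switched into that iframe so the element stays usable.
    """
    driver.switch_to.default_content()
    matches = driver.find_elements(by, value)
    if matches:
        return matches[0]
    for frame in driver.find_elements("tag name", "iframe"):
        driver.switch_to.frame(frame)
        matches = driver.find_elements(by, value)
        if matches:
            return matches[0]
        driver.switch_to.default_content()   # leave this frame before trying the next one
    return None
```

A call such as `find_in_frames(driver, "xpath", "//*[@id='kwdselectid']")` would then work whether or not the input box is wrapped in an iframe.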

2, Scrolling with selenium

Example: open Baidu Tieba, then scroll to the "地区" (Region) section on the left.
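A common way to scroll is to run a short piece of JavaScript through `execute_script`. The sketch below is an assumption about how the example could be done, not the author's script: `scroll_to` and `scroll_to_bottom` are hypothetical helper names, and locating the "地区" element itself (for instance by XPath) is left to the reader.

```python
SCROLL_INTO_VIEW_JS = "arguments[0].scrollIntoView();"
SCROLL_TO_BOTTOM_JS = "window.scrollTo(0, document.body.scrollHeight);"


def scroll_to(driver, element):
    """Scroll the page until the given element is visible."""
    # execute_script passes extra arguments to the script as arguments[0], arguments[1], ...
    driver.execute_script(SCROLL_INTO_VIEW_JS, element)


def scroll_to_bottom(driver):
    """Scroll to the bottom of the page."""
    driver.execute_script(SCROLL_TO_BOTTOM_JS)
```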

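3, Switching between windows

On some sites a click opens a new page while the original page stays open, which takes up resources during selenium automation.

The script therefore has to switch to the window it wants to work in, and close the windows it no longer uses so that their resources are released.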
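A minimal sketch of that clean-up, assuming the script knows the handle of the window it wants to keep (for example from `driver.current_window_handle`). `close_other_windows` is a hypothetical helper built only on `driver.window_handles`, `driver.switch_to.window()`, and `driver.close()`.

```python
def close_other_windows(driver, keep_handle):
    """Close every window except keep_handle, then switch back to it."""
    # Copy the handle list first, because closing windows changes it.
    for handle in list(driver.window_handles):
        if handle != keep_handle:
            driver.switch_to.window(handle)
            driver.close()            # closes only the window currently in focus
    driver.switch_to.window(keep_handle)
```

To work in a newly opened page instead, the script could pass the last entry of `driver.window_handles` as `keep_handle`; whether the newest window is always last is an assumption that should be checked for the browser in use.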

4, Headless browsing
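Headless mode runs the browser without a visible window. The sketch below shows one way to start Chrome this way; `make_headless_chrome` and the chosen window size are illustrative assumptions. selenium is imported inside the function so that the module can be loaded on a machine without selenium or chromedriver installed.

```python
# Command-line switches passed to Chrome; the window size is an arbitrary example.
HEADLESS_ARGS = ["--headless", "--window-size=1920,1080"]


def make_headless_chrome(extra_args=()):
    """Start Chrome without a visible window (needs selenium and chromedriver)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    for arg in list(HEADLESS_ARGS) + list(extra_args):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

In the `FindJobs` class, the `webdriver.Chrome()` call in `__init__` could be swapped for `make_headless_chrome()`.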

5, Taking screenshots
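selenium can save the current page as a PNG with `driver.save_screenshot(path)`, which returns True on success. The sketch below wraps it with a timestamped file name; `screenshot_name`, `take_screenshot`, and the `51job` prefix are hypothetical choices for illustration.

```python
import time


def screenshot_name(prefix="51job", now=None):
    """Build a file name such as 51job_20190214_090507.png."""
    now = time.localtime() if now is None else now
    return "%s_%s.png" % (prefix, time.strftime("%Y%m%d_%H%M%S", now))


def take_screenshot(driver, prefix="51job"):
    """Save a screenshot of the current page; return its path, or None on failure."""
    path = screenshot_name(prefix)
    saved = driver.save_screenshot(path)
    return path if saved else None
```

Taking a screenshot right before a step that tends to fail is a simple way to see what the page looked like at that moment, which is especially helpful in headless mode.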
