Analyzing Zhaopin's API Endpoints to Scrape Job Data

dbirder

Article link: https://blog.csdn.net/ZGQ3586/article/details/90461188

I. Introduction

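Most websites today separate the front end from the back end. The data you see on a page is mostly not part of the initial HTML; back-end code reads it from a database server, and it is then loaded into the page dynamically.

With the arrival of Ajax, front-end code could request that back-end data directly, which is where the idea of an API endpoint comes from. Put simply, you fetch data from a particular endpoint of a site by sending specific parameters to a specific address.

API endpoints usually return JSON. Even when they do not, the data is regular and well ordered, so it is easy to parse.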
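To show how little work a JSON response needs, here is a minimal sketch using Python's standard json module. The payload is an invented sample, not real Zhaopin data; only its shape (a data object holding a results list, with jobName and salary fields) follows the responses handled by the full script later in this article.

```python
import json

# Hypothetical payload for illustration only; the values are invented.
sample_text = (
    '{"data": {"results": ['
    '{"jobName": "Python Developer", "salary": "10K-15K"}'
    ']}}'
)

# JSON text becomes ordinary nested dicts and lists.
payload = json.loads(sample_text)
first_job = payload["data"]["results"][0]
print(first_job["jobName"], first_job["salary"])
```

Once the text is decoded, reading a field is just dictionary and list indexing.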

This is only my own simplified understanding.

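II. Analysis

Open the official Zhaopin website and press F12 to open the developer tools. Among the requests made while the page loads its data, three API endpoints are easy to spot.

First API endpoint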

https://fe-api.zhaopin.com/c/i/city-page/user-city?ipCity=合肥

Parameter	Purpose
ipCity	Name of the city to query; the response then contains the code for that city (code)
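A minimal sketch of how the city lookup could be wrapped, kept offline so it runs without touching the site. The helper names build_city_url and extract_city_code are hypothetical, and the response shape (data.code) is taken from how the full script later in this article reads it. The sample code 664 is the cityId that appears in the article's search URL; whether the real response returns it as a number or a string is an assumption here.

```python
from urllib.parse import urlencode

CITY_ENDPOINT = "https://fe-api.zhaopin.com/c/i/city-page/user-city"


def build_city_url(city_name):
    """Return the lookup URL with the city name percent-encoded."""
    return CITY_ENDPOINT + "?" + urlencode({"ipCity": city_name})


def extract_city_code(payload):
    """Read the city code from a decoded response (assumed shape: data.code)."""
    return payload["data"]["code"]


# Hypothetical decoded response, for illustration only.
sample_payload = {"data": {"code": 664}}

print(build_city_url("合肥"))
print(extract_city_code(sample_payload))
```

Encoding the name with urlencode keeps the URL valid even if the city name contains characters such as & or spaces.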

Second API endpoint

https://dict.zhaopin.cn/dict/dictOpenService/getDict?dictNames=region_relation,education,recruitment,education_specialty,industry_relation,careet_status,job_type_parent,job_type_relation

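dictNames value	Returned data (codes)
region_relation	Region information
education	Education level
recruitment	Recruitment information (whether the degree is from unified national enrollment)
education_specialty	Occupation category
industry_relation	Industry
careet_status	Availability status
job_type_parent	Job category
job_type_relation	Job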
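Since dictNames is just a comma-separated list, you can request only the dictionaries you need. The sketch below builds such a URL; build_dict_url is a hypothetical helper, and it assumes the endpoint accepts any subset of the names in the table above, which the article does not confirm.

```python
from urllib.parse import urlencode

DICT_ENDPOINT = "https://dict.zhaopin.cn/dict/dictOpenService/getDict"


def build_dict_url(dict_names):
    """Join the requested dictionary names into the dictNames parameter."""
    # safe="," keeps the commas literal, matching the URL shown above.
    query = urlencode({"dictNames": ",".join(dict_names)}, safe=",")
    return DICT_ENDPOINT + "?" + query


url = build_dict_url(["region_relation", "education"])
print(url)
```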

Third API endpoint

https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId=664&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3

The parameter values for this endpoint are the codes obtained from the two endpoints above.

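Parameter	Purpose
pageSize	Number of results to return
cityId	City
workExperience	Work experience
education	Education level
companyType	Company type
employmentType	Employment type
jobWelfareTag	Job benefits
kw	Search keyword
kt	Value can vary; its purpose is unclear for now, but the parameter cannot be omitted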
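The parameters in the table can be assembled from a dictionary instead of a hand-written query string. This is a minimal sketch: build_search_url is a hypothetical helper, and every default value is simply copied from the example URL above rather than taken from any documentation.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://fe-api.zhaopin.com/c/i/sou"


def build_search_url(city_id, keyword, page_size=200, education=5):
    """Build the search URL; fixed values are copied from the example URL."""
    params = {
        "pageSize": page_size,
        "cityId": city_id,
        "workExperience": -1,
        "education": education,
        "companyType": -1,
        "employmentType": -1,
        "jobWelfareTag": -1,
        "kw": keyword,
        "kt": 3,  # purpose unknown, but the request needs it
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


search_url = build_search_url(664, "python")
print(search_url)
```

Keeping the parameters in a dictionary makes it easy to swap in a different city code or keyword without editing a long string.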

III. Scraping the Data

With the API endpoints identified, what remains is fetching the data and storing it locally.
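For the storage half, one simple option is to collect the records in a list and let json.dump write the whole array at once. This is an alternative sketch, not the approach used by the full script in this article, which writes the array brackets and commas by hand. The helper names and the sample record are invented for illustration.

```python
import json
import os
import tempfile


def save_items(items, path):
    """Write a list of dicts as one JSON array, keeping Chinese text readable."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(items, fp, ensure_ascii=False, indent=2)


def load_items(path):
    """Read the JSON array back into a list of dicts."""
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


# Invented sample record; a temporary directory keeps the demo self-contained.
sample = [{"jobName": "Python开发工程师", "salary": "10K-15K"}]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "result.json")
    save_items(sample, path)
    restored = load_items(path)

print(restored == sample)
```

Writing the array in one call means the file is always valid JSON, at the cost of holding all records in memory first.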

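Scraping goal

Query and store the data for a city entered by the user. This run only searches for Python job postings.

Each job posting contains many fields. To keep things simple I extract only a few of them; the rest can be extracted the same way.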
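Because every field is extracted the same way, the selection can be described as data. The sketch below maps each output key to a path inside one job record; the field names (jobName, salary, workingExp.name, eduLevel.name) are the ones the full script in this article reads, while FIELDS, extract_fields and the sample record are hypothetical.

```python
# Output key -> path of keys inside one job record.
FIELDS = {
    "job": ("jobName",),
    "salary": ("salary",),
    "experience": ("workingExp", "name"),
    "education": ("eduLevel", "name"),
}


def extract_fields(record, fields=FIELDS):
    """Copy the selected fields out of a record; missing fields become None."""
    item = {}
    for out_key, path in fields.items():
        value = record
        for key in path:
            # Walk one level deeper, giving up quietly if the level is absent.
            value = value.get(key) if isinstance(value, dict) else None
        item[out_key] = value
    return item


# Invented record for illustration only.
sample_record = {
    "jobName": "Python Developer",
    "salary": "10K-15K",
    "workingExp": {"name": "1-3 years"},
    "eduLevel": {"name": "Bachelor"},
}
print(extract_fields(sample_record))
```

Adding another field is then a one-line change to FIELDS.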

Full code:

"""
This scraper takes only a simple precaution against anti-scraping
measures: it sends browser-like request headers.
"""
import json
import os
import sys

import requests


class Spider:
    def __init__(self):
        # Accept-Encoding is left to requests: advertising "br" yields an
        # undecodable body unless a brotli package is installed.
        self.header = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.110 Safari/537.36",
            "Origin": "https://sou.zhaopin.com",
            "Host": "fe-api.zhaopin.com",
        }
        print("Job search started...")
        # Open the output file and start the JSON array
        self.file = "result.json"
        pathfile = os.path.join(os.getcwd(), self.file)
        self.fp = open(pathfile, "w", encoding="utf-8")
        self.fp.write("[\n")

    def get_response(self, url, params=None):
        # requests URL-encodes params, which covers Chinese city names
        response = requests.get(url=url, params=params, headers=self.header, timeout=10)
        response.raise_for_status()
        return response

    def get_citycode(self, city):
        # Look up the city code used by the search endpoint
        url = "https://fe-api.zhaopin.com/c/i/city-page/user-city"
        result = self.get_response(url, params={"ipCity": city}).json()
        return result['data']['code']

    def parse_data(self, url):
        # Keep only a few fields from each job posting
        result = self.get_response(url).json()['data']['results']
        items = []
        for i in result:
            item = {}
            item['job'] = i['jobName']
            item['salary'] = i['salary']
            item['status'] = i['timeState']
            item['experience'] = i['workingExp']['name']
            item['education'] = i['eduLevel']['name']
            items.append(item)
        return items

    def save_data(self, items):
        # One JSON object per line, comma-separated except after the last
        for num, i in enumerate(items, start=1):
            self.fp.write(json.dumps(i, ensure_ascii=False))
            self.fp.write("\n" if num == len(items) else ",\n")
            print("%s--%s" % (num, i))

    def end(self):
        # Close the JSON array and the file
        self.fp.write("]")
        self.fp.close()
        print("Job search finished...")
        print("Data written to {}".format(self.file))

    def main(self):
        try:
            cityname = input("Enter the name of the city to search (prefecture-level city): ")
            city = self.get_citycode(cityname)
            url = "https://fe-api.zhaopin.com/c/i/sou?pageSize=200&cityId={}&workExperience=-1&education=5&companyType=-1&employmentType=-1&jobWelfareTag=-1&kw=python&kt=3".format(
                city)
            items = self.parse_data(url)
            self.save_data(items)
            self.end()
        except Exception as e:
            # Usually an unrecognized city name, but request errors land here too
            print("Lookup failed; the city name may be invalid (exiting)")
            print(e)
            self.fp.close()
            sys.exit(1)


if __name__ == '__main__':
    spider = Spider()
    spider.main()

Execution result: