Scraping job listings from a "Zhipin" recruitment site with Python's requests library

Page analysis

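First, check what the browser actually receives when it requests the page URL. The response does not contain the job data we want, so the job listings must be loaded asynchronously.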
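A quick way to confirm this is to search the raw HTML for a job title that you can see in the browser. The helper below is a minimal sketch; the function name and the sample HTML are hypothetical. In practice `raw_html` would be the `.text` of the response for the page URL.

```python
def looks_async_loaded(html: str, visible_texts: list[str]) -> bool:
    """Return True if none of the texts seen in the rendered page occur in the raw HTML.

    If job titles shown in the browser are missing from the HTML the server
    returned, the list is most likely filled in later by JavaScript (Ajax).
    """
    return not any(text in html for text in visible_texts)


# Hypothetical raw HTML: a page shell with an empty container for the job list
raw_html = "<html><body><div class='job-list'></div></body></html>"
print(looks_async_loaded(raw_html, ["Python开发工程师"]))  # True
```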

Next, capture the network traffic and inspect the Ajax requests: the joblist packet contains the asynchronously loaded data.

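Analyzing this URL's query string shows that pagination is controlled by two parameters: page (the page number) and pageSize (the number of jobs per page).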

import requests

resp = requests.get("https://www.zhipin.com/wapi/zpgeek/search/joblist.json?"
                    "scene=1&"
                    "query=&"
                    "city=101120200"
                    "&experience=&"
                    "payType=&"
                    "partTime=&"
                    "degree=&"
                    "industry=&"
                    "scale=&"
                    "stage=&"
                    "position=100101&"
                    "jobType=&"
                    "salary=&"
                    "multiBusinessDistrict=&"
                    "multiSubway=&"
                    "page=1&"  # page number
                    "pageSize=30")  # jobs per page
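Instead of hand-concatenating the query string, the same URL can be built from a dict, which makes page and pageSize easy to vary. This is a sketch using the standard library's `urllib.parse.urlencode`; the helper names `joblist_params` and `build_joblist_url` are hypothetical, and the parameter list simply mirrors the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.zhipin.com/wapi/zpgeek/search/joblist.json"


def joblist_params(city: str, position: str, page: int = 1, page_size: int = 30) -> dict:
    """Query parameters in the same order as the hand-written URL above."""
    return {
        "scene": 1, "query": "", "city": city,
        "experience": "", "payType": "", "partTime": "", "degree": "",
        "industry": "", "scale": "", "stage": "", "position": position,
        "jobType": "", "salary": "", "multiBusinessDistrict": "", "multiSubway": "",
        "page": page,           # which page to fetch (1-based)
        "pageSize": page_size,  # how many jobs per page
    }


def build_joblist_url(city: str, position: str, page: int = 1, page_size: int = 30) -> str:
    return f"{BASE_URL}?{urlencode(joblist_params(city, position, page, page_size))}"


print(build_joblist_url("101120200", "100101", page=2))
```

With requests you can also skip building the string and pass the dict directly, e.g. `requests.get(BASE_URL, params=joblist_params("101120200", "100101"), headers=headers)`.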

Requesting this URL returns the data in JSON format.
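Once parsed, the job list sits under `zpData` → `jobList`, as the full code later in this post shows. The sketch below extracts the job names from a response body. The sample body is hypothetical and heavily trimmed: only `zpData`, `jobList` and `jobName` come from this post, while `code`, `message` and `salaryDesc` are assumed field names used for illustration.

```python
import json

# Hypothetical, trimmed response body (field names other than
# zpData / jobList / jobName are assumptions)
sample_body = '''
{"code": 0, "message": "Success",
 "zpData": {"jobList": [
   {"jobName": "Python开发工程师", "salaryDesc": "10-15K"},
   {"jobName": "Java开发工程师", "salaryDesc": "12-18K"}
 ]}}
'''


def extract_job_names(body: str) -> list[str]:
    """Parse the JSON body and return the jobName of every job on the page."""
    payload = json.loads(body)
    zp_data = payload.get("zpData") or {}
    job_list = zp_data.get("jobList")
    if job_list is None:
        # No job list: the request was most likely rejected (e.g. a bad cookie)
        raise ValueError(f"no jobList in response: {payload.get('message')!r}")
    return [job["jobName"] for job in job_list]


print(extract_job_names(sample_body))
```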

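So we can request this URL directly to get the data. When sending the request with requests, however, BOSS Zhipin requires both a cookie and request headers (such as User-Agent); if either is missing, the request fails.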
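The full code below pastes the browser's whole cookie string into the `Cookie` header. As an alternative (not the author's method), requests also accepts a dict through its `cookies=` argument. This minimal sketch, with the hypothetical helper name `cookie_header_to_dict`, converts the copied header into such a dict.

```python
def cookie_header_to_dict(cookie_header: str) -> dict[str, str]:
    """Split a raw "Cookie" header copied from the browser into name/value pairs.

    Only the first "=" separates name from value, because values such as
    __l=r=https%3A... themselves contain "=".
    """
    cookies = {}
    for part in cookie_header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep:  # skip empty fragments, e.g. after a trailing ";"
            cookies[name] = value
    return cookies


raw = "lastCity=101120200; __g=-; __l=r=https%3A%2F%2Fwww.baidu.com&s=3"
print(cookie_header_to_dict(raw))
```

The result could then be sent as `requests.get(url, headers={"User-Agent": ...}, cookies=cookie_header_to_dict(raw))`.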

With the cookie and headers set, send the request and save the data as JSON.
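Note that writing `str(job_list)` to a file produces a Python repr (single quotes, `None`, `True`), which `json.load` cannot read back. A minimal sketch of saving and reloading the list as real JSON with the standard `json` module; the helper names `save_jobs` and `load_jobs` are hypothetical.

```python
import json


def save_jobs(job_list: list[dict], path: str = "./joblist.json") -> None:
    """Write the job list as valid JSON in UTF-8."""
    with open(path, mode="w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
        json.dump(job_list, f, ensure_ascii=False, indent=2)


def load_jobs(path: str = "./joblist.json") -> list[dict]:
    """Read a file written by save_jobs back into Python objects."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Round-tripping through `save_jobs` and `load_jobs` gives back an equal list, which is an easy way to check the file is valid JSON.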

Full code

import requests
headers = {
    "Cookie": "wd_guid=91468f12-d30d-43e4-94bb-68b6cfb2fa12; historyState=state; _9755xjdesxxd_=32; YD00951578218230%3AWM_TID=TQELr2c3JspFAURUABPADb0ror7BeG03; _bl_uid=82lmpfLtfb5bUnmnCsk660p59Rgk; gdxidpyhxdE=X8t%5COO%2FfuPeAdxS955iq1%5CMMeIgdyn3cMb3Xkaj6gKVp%5CVkYBGLINhXHHBxU8nzQVWneo%2FeO6Cz1PdrpyRBTk1TBWd5w9%5CxwevU2YO89%2FwUuVddNTWbYQ55CfXnvqJl%2B5489qIUWAHAKYBWo%2FebRkdB5yHrin8X%5CH7%2F75iAXYZv%2BrGhI%3A1682255309352; YD00951578218230%3AWM_NI=SgEeqGRRDH4N6fPuzPtYfU6hhstXDKgzg2M5eQ%2Fk8hhYEBBuMmxOfK7STDhstbTAg%2BvhLiOm8XJCemw081Q%2B5TNgOqc5VMAduMNc6Shscb3D1eHkwGCzypInDToB9hC7cGc%3D; YD00951578218230%3AWM_NIKE=9ca17ae2e6ffcda170e2e6eed2d7438c8bfdd6d63398b88aa6d54a969a9a87c841b1eefba5e4528eed9da6f72af0fea7c3b92a91b98b86f44d928e8ca9b746f6baa096f95cb887aab6b564acadbcb9c741a5a6a9d3d03fa78fe18cbb64ac878ba3c17f8fb3b8b0e541f58d9ca5eb79b8be9ca2e621b6b8848ace67fc8886a4e867b5f0f8d2d421f78c8288ce7ba1eb9aaec470b7938284b559acadad83b250f3bb888cf64d81a6b791cc66888ca1a3bc6481e9ada8e237e2a3; lastCity=101120200; __zp_seo_uuid__=b2ce6f4f-ce6e-491d-8c02-03bdd9673df8; __g=-; Hm_lvt_194df3105ad7148dcf2b98a91b5e727a=1680527899,1682254387,1682323965; __fid=0ad9a31a6fb5aa9f556489963c669e4f; boss_login_mode=sms; Hm_lpvt_194df3105ad7148dcf2b98a91b5e727a=1682326918; __c=1682323965; __l=r=https%3A%2F%2Fwww.baidu.com%2Flink%3Furl%3DpwrxT1cOUiB3s74B5FumK_pIgyku_XWHlvqCYnefIu2CN2sHte7fdIJ0lxRoqB8e%26wd%3D%26eqid%3Dad44ce280000deea00000003644639f8&l=%2Fwww.zhipin.com%2Fweb%2Fgeek%2Fjob%3Fquery%3D%26city%3D101120200%26position%3D100101&s=3&g=&friend_source=0&s=3&friend_source=0; __a=83933772.1661747470.1682254387.1682323965.196.8.31.196; __zp_stoken__=c59eeWA01RExyPydVFBo1UWJyNgJmQBVhG1cvOEIjVgwNU2sNcGIBRlhCWEsYcyNWLVdKRAhrahZfRz0VR3onJD8FEGZkZE4SXkYWWnMCbmMXMA5pAQ8AHGB9XEByFRwdb254P3VDZ3RBBjk%3D; __zp_sseed__=zmCUJJrI3nwlF5sxNwC3WnSKfRSL1IsxDgKu3/6wFUc=; __zp_sname__=1037a913; __zp_sts__=1682327428284"
    ,
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36"
}
resp = requests.get("https://www.zhipin.com/wapi/zpgeek/search/joblist.json?"
                    "scene=1&"
                    "query=&"
                    "city=101120200"
                    "&experience=&"
                    "payType=&"
                    "partTime=&"
                    "degree=&"
                    "industry=&"
                    "scale=&"
                    "stage=&"
                    "position=100101&"
                    "jobType=&"
                    "salary=&"
                    "multiBusinessDistrict=&"
                    "multiSubway=&"
                    "page=1&"  # page number
                    "pageSize=30"  # jobs per page
                    , headers=headers)
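data = resp.json()
print(data)

import json  # used to save the list as valid JSON

job_list = data["zpData"]["jobList"]
with open("./joblist.json", mode="w", encoding="utf-8") as f:
    # str(job_list) would write a Python repr, not valid JSON
    json.dump(job_list, f, ensure_ascii=False, indent=2)
print(job_list[-1]["jobName"])  # name of the last job on this page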
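The code above fetches only page 1. To collect several pages, one option is to loop over `page` until a page comes back with fewer than `pageSize` jobs. This is a sketch under assumptions: the stop condition (a short page means there are no more results) is assumed rather than documented, and `fetch_all_jobs` and the caller-supplied `fetch_page` function are hypothetical names. Passing the fetch function in keeps the loop testable without network access.

```python
import time
from typing import Callable

# fetch_page(page, page_size) -> parsed JSON dict. Supplied by the caller,
# e.g. a small wrapper around requests.get(...).json() with your headers.
FetchPage = Callable[[int, int], dict]


def fetch_all_jobs(fetch_page: FetchPage, page_size: int = 30,
                   max_pages: int = 10, delay: float = 1.0) -> list[dict]:
    """Request page 1, 2, 3, ... until a page comes back short or max_pages is hit."""
    jobs: list[dict] = []
    for page in range(1, max_pages + 1):
        payload = fetch_page(page, page_size)
        batch = payload["zpData"]["jobList"]
        jobs.extend(batch)
        if len(batch) < page_size:  # assumed: a short page is the last page
            break
        time.sleep(delay)  # pause between requests to keep the request rate low
    return jobs
```

A real `fetch_page` could call `requests.get` with the same URL, headers and cookie as above, substituting its `page` and `page_size` arguments into the query string.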

Requests sometimes fail; when that happens, replace the cookie with a fresh one from the browser and send the request again.
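The "swap the cookie and retry" advice can be automated by trying a list of cookies in turn. A minimal sketch with hypothetical names (`fetch_with_cookies`, `has_job_list`, and a caller-supplied `fetch`); it assumes, as an approximation, that a response counts as successful only when it actually contains `zpData.jobList`.

```python
from typing import Callable, Iterable, Optional

# fetch(cookie) -> parsed JSON dict. A caller-supplied wrapper around
# requests.get(..., headers={"Cookie": cookie, "User-Agent": ...}).json()
Fetch = Callable[[str], dict]


def has_job_list(payload: dict) -> bool:
    """Treat a response as successful only if it contains zpData.jobList."""
    zp_data = payload.get("zpData")
    return isinstance(zp_data, dict) and isinstance(zp_data.get("jobList"), list)


def fetch_with_cookies(fetch: Fetch, cookies: Iterable[str]) -> Optional[dict]:
    """Try each cookie in turn and return the first successful response."""
    for cookie in cookies:
        payload = fetch(cookie)
        if has_job_list(payload):
            return payload
    return None  # every cookie failed; copy a fresh one from the browser
```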

 
