Quickly Scraping Tencent Job Postings


梁萌


Article link: https://blog.csdn.net/liangmengbk/article/details/107136707


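Target site: https://careers.tencent.com/search.html?pcid=40001

Target data: the job title, responsibilities, and requirements of every posting on the first 10 pages of results.

Preparation: first work out how the target data is delivered, that is, whether it is embedded in the page HTML or returned directly by an API.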

Analyzing the site shows that the data we need comes from an API rather than from the page HTML, as shown in the figure below:
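One quick way to confirm where the data lives is to inspect the body a candidate URL returns. The sketch below is a minimal check and not part of the original script: the helper name `is_json_payload` is hypothetical, the two sample bodies are made up for illustration, and it assumes that a body which parses as a JSON object or array is API output while anything else is page markup.

```python
import json


def is_json_payload(body: str) -> bool:
    """Return True if body parses as a JSON object or array."""
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(parsed, (dict, list))


# Hypothetical sample bodies standing in for real responses.
api_body = '{"Data": {"Posts": []}}'
html_body = "<!DOCTYPE html><html><body><div id='app'></div></body></html>"

print(is_json_payload(api_body))   # True
print(is_json_payload(html_body))  # False
```

In practice the same check can be applied to `response.text` for any request that shows up in the browser's network panel.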

Without further ado, here is the code:

import requests


# Send a browser-like User-Agent header with every request
headers = {
    "user-agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.108 Safari/537.36"
}


def main(url):
    # Request one page of the posting list
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()

    # Parse the JSON response body into a dict
    json_dic = response.json()

    # "Posts" holds the list of postings on this page
    result = json_dic["Data"]["Posts"]

    for x in result:
        post_id = x["PostId"]
        # Build the detail API URL for this posting
        post_url = "https://careers.tencent.com/tencentcareer/api/post/ByPostId?postId=" + str(post_id)
        res_detail = requests.get(post_url, headers=headers, timeout=10)
        res_detail.raise_for_status()

        # Parse the detail response into a dict
        detail = res_detail.json()["Data"]

        # Extract the fields we need
        recruit_post_name = detail["RecruitPostName"]  # job title
        responsibility = detail["Responsibility"]  # responsibilities
        requirement = detail["Requirement"]  # requirements

        # f-strings avoid a TypeError if a field comes back as null
        print(f"Job title: {recruit_post_name}\n")
        print(f"Responsibilities: {responsibility}\n")
        print(f"Requirements: {requirement}\n")
        print("------------------------------------------------------------------")


if __name__ == '__main__':
    # Base URL of the list API; the page number is appended below
    url = "https://careers.tencent.com/tencentcareer/api/post/Query?pageIndex="
    # Pages 1 through 10, 10 postings per page
    for i in range(1, 11):
        main(url + str(i) + "&pageSize=10")
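The script above prints each posting as it is fetched. If the results need to be kept, one option is to collect them into plain dicts first. The following is a sketch under stated assumptions, not the author's method: `extract_fields` and `to_json_lines` are hypothetical helpers, the sample dict only mimics the three keys the script reads, and missing or null fields are replaced with empty strings, which the original script does not do.

```python
import json

# The three keys the script reads from the detail response.
FIELDS = ("RecruitPostName", "Responsibility", "Requirement")


def extract_fields(detail: dict) -> dict:
    """Pick the wanted fields out of a parsed detail response."""
    data = detail.get("Data") or {}
    # Assumption: treat a missing or null field as an empty string.
    return {name: (data.get(name) or "") for name in FIELDS}


def to_json_lines(records) -> str:
    """Serialize records as one JSON object per line."""
    # ensure_ascii=False keeps Chinese text readable in the output.
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Made-up sample standing in for a real detail response.
sample = {
    "Data": {
        "RecruitPostName": "Backend Engineer",
        "Responsibility": "Build services",
        "Requirement": None,
    }
}

record = extract_fields(sample)
print(to_json_lines([record]))
```

Inside `main`, the parsed detail response could be passed to `extract_fields` and the resulting dicts appended to a list, which is then written to a file once all pages are done.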

Execution result: