A short log of scraping Tencent (tx) job postings

老白菜c

Article link: https://blog.csdn.net/qq_49130527/article/details/121038645

# -*- coding: utf-8 -*-
# @ModuleName: SpiderTX
# @Function: scrape Tencent careers job postings into an .xls workbook
# @Author: C
# @Time: 2021/10/28 19:46
import json
import urllib.request

import xlwt  # third-party package for writing legacy .xls files (pip install xlwt)
class Spidertx(object):
    # JSON API behind the careers page (found via F12 -> Network -> XHR);
    # the page number is inserted between base_url and use_url
    base_url = "https://careers.tencent.com/tencentcareer/api/post/Query?timestamp=1635421641765&countryId=&cityId=&bgIds=&productId=&categoryId=&parentCategoryId=40001&attrId=&keyword=&pageIndex="
    use_url = "&pageSize=10&language=zh-cn&area=cn"

    def __init__(self):
        self.begin_page = int(input("Enter the start page: "))
        self.end_page = int(input("Enter the end page: "))


def load_tx(spidertx):
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.114 Safari/537.36"
    }
    workbook = xlwt.Workbook(encoding="utf-8")
    sheet = workbook.add_sheet("Cleaned data", cell_overwrite_ok=True)
    style = xlwt.XFStyle()
    font = xlwt.Font()
    font.name = 'Calibri'  # font face
    # font.colour_index = 4  # font colour
    font.height = 400  # size in twips (1/20 pt), so 400 = 20 pt
    style.font = font
    sheet.write(0, 0, "City", style)
    sheet.write(0, 1, "Job title", style)
    sheet.write(0, 2, "Job description", style)

    row = 1  # next free row; row 0 holds the header
    # end_page + 1 so the end page entered by the user is fetched too
    for page_index in range(spidertx.begin_page, spidertx.end_page + 1):
        url = spidertx.base_url + str(page_index) + spidertx.use_url
        req = urllib.request.Request(url, headers=headers)
        with urllib.request.urlopen(req, timeout=10) as resp:
            html = resp.read().decode("utf-8")
        json0 = json.loads(html)  # JSON objects become dicts, arrays become lists
        # guard against a missing or null Data/Posts field
        posts = (json0.get("Data") or {}).get("Posts") or []
        for post in posts:
            # write each record as soon as it is read
            sheet.write(row, 0, post.get("LocationName"), style)
            sheet.write(row, 1, post.get("RecruitPostName"), style)
            sheet.write(row, 2, post.get("Responsibility"), style)
            row += 1
        # Alternative with the third-party jsonpath package:
        # jsonpath.jsonpath(json0, "$.Data.Posts[*].LocationName")
    workbook.save("tencent_recruitment.xls")


if __name__ == '__main__':

    spidertx = Spidertx()

    load_tx(spidertx)
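The script builds each request URL by splicing the page number between base_url and use_url. Below is a sketch of the same URL assembled with the standard library's urllib.parse.urlencode, which makes the parameters easier to read and change. The helper names build_query_url and page_urls are hypothetical; the parameter names and values are copied from base_url/use_url, including the fixed timestamp, and whether the server actually requires that timestamp is not verified here.

```python
from urllib.parse import urlencode

# Endpoint and parameters copied from base_url/use_url above.
QUERY_ENDPOINT = "https://careers.tencent.com/tencentcareer/api/post/Query"


def build_query_url(page_index, page_size=10, parent_category_id="40001"):
    """Hypothetical helper: return the Query URL for one page of results."""
    params = {
        "timestamp": "1635421641765",  # fixed value from the original URL (assumption: kept as-is)
        "countryId": "",
        "cityId": "",
        "bgIds": "",
        "productId": "",
        "categoryId": "",
        "parentCategoryId": parent_category_id,
        "attrId": "",
        "keyword": "",
        "pageIndex": page_index,
        "pageSize": page_size,
        "language": "zh-cn",
        "area": "cn",
    }
    # urlencode percent-escapes each value and joins the pairs with "&";
    # empty strings come out as "name=", matching the original URL
    return QUERY_ENDPOINT + "?" + urlencode(params)


def page_urls(begin_page, end_page):
    """Hypothetical helper: one URL per page, end page included."""
    return [build_query_url(p) for p in range(begin_page, end_page + 1)]


print(page_urls(1, 2)[0])
```

Because dicts keep insertion order, build_query_url(page) yields the same string as base_url + str(page) + use_url, so it could replace that concatenation in load_tx.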

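In short:

Each record can be written into the spreadsheet as soon as it is read!
For dynamically loaded data, press F12, open the Network tab and check the XHR requests to find the API endpoint.
The JSON response can be handled as a Python dict.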
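The first and third points can be tried offline: json.loads turns the XHR response into nested dicts and lists, and each row can be written out the moment it is read. A minimal sketch follows. The sample payload is hypothetical (only the field names Data, Posts, LocationName, RecruitPostName and Responsibility come from the script above; the values are made up), the helper names are illustrative, and the standard csv module stands in for xlwt so the example needs no third-party package.

```python
import csv
import io
import json

# Hypothetical payload: field names from the script above, values invented.
SAMPLE = """
{"Data": {"Posts": [
  {"LocationName": "Shenzhen", "RecruitPostName": "Backend Engineer", "Responsibility": "Build services"},
  {"LocationName": "Beijing", "RecruitPostName": "Data Engineer", "Responsibility": "Maintain pipelines"}
]}}
"""


def extract_posts(payload):
    """Parse the JSON text; objects become dicts, arrays become lists."""
    data = json.loads(payload)
    # "or" fallbacks tolerate a missing or null "Data"/"Posts"
    return (data.get("Data") or {}).get("Posts") or []


def write_rows(posts, out):
    """Write each post as soon as it is read; return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["City", "Job title", "Job description"])
    count = 0
    for post in posts:
        writer.writerow([post.get("LocationName"),
                         post.get("RecruitPostName"),
                         post.get("Responsibility")])
        count += 1
    return count


buf = io.StringIO()
n = write_rows(extract_posts(SAMPLE), buf)
print(n)  # 2
print(buf.getvalue().splitlines()[1])  # Shenzhen,Backend Engineer,Build services
```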
