Study Notes (42): Master Python in 21 Days (Video Course Only) - Steps for Developing a Crawler with Scrapy (Part 1)


Hello-Rock


Article link: https://blog.csdn.net/happyk213/article/details/105396974


Start learning now: https://edu.csdn.net/course/play/24797/282223?utm_source=blogtoedu

Generate a spider:

scrapy genspider job_position 'zhipin.com'

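job_position: the name of the spider to create. Scrapy derives the spider's file name and class name from it.

zhipin.com: the domain the spider is allowed to crawl.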
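The relationship between the spider name and the generated class name can be sketched in plain Python. This is only an approximation of the naming convention, not Scrapy's own code: the helper `spider_class_name` is hypothetical and exists purely for illustration.

```python
def spider_class_name(spider_name: str) -> str:
    """Approximate the class name generated for a given spider name.

    Hypothetical helper: capitalize each underscore-separated part
    and append "Spider".
    """
    parts = spider_name.split("_")
    return "".join(part.capitalize() for part in parts) + "Spider"


if __name__ == "__main__":
    for name in ("job_position", "test_scrapy"):
        print(name, "->", spider_class_name(name))
```

Under this convention, a spider named test_scrapy gets the class TestScrapySpider, which is the class name that appears in the spider code in these notes.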

# -*- coding: utf-8 -*-
import scrapy
from ZhipinSpider.items import ZhipinspiderItem


# view-source:https://www.zhipin.com/job_detail/?query=&city=101120200&industry=&position=110101

class TestScrapySpider(scrapy.Spider):
    # Name of the spider
    name = 'test_scrapy'
    # Restrict the spider to these domains
    allowed_domains = ['zhipin.com']
    # Page(s) where the crawl starts
    start_urls = ['https://www.zhipin.com/job_detail/?query=&city=101120200&industry=&position=110101']

    # response is the target response fetched by Scrapy's downloader
    def parse(self, response):
        # Each job-primary element holds one job posting
        for job_primary in response.xpath('//div[@class="job-primary"]'):
            item = ZhipinspiderItem()
            # The DIV that holds the job details
            info_primary = job_primary.xpath('./div[@class="info-primary"]')
            work_primary = info_primary.xpath('./div[@class="primary-wrapper"]/div[@class="primary-box"]')
            # Job title (extract_first() returns None instead of raising IndexError when nothing matches)
            item['title'] = work_primary.xpath(
                './div[@class="job-title"]/span[@class="job-name"]/a/text()').extract_first()
            # Salary
            item['salary'] = work_primary.xpath(
                './div[@class="job-limit clearfix"]/span[@class="red"]/text()').extract_first()
            # Link to the job posting
            item['url'] = work_primary.xpath(
                './div[@class="job-title"]/span[@class="job-name"]/a/@href').extract_first()
            # Work location
            item['work_addr'] = work_primary.xpath(
                './div[@class="job-title"]/span[@class="job-area-wrapper"]/span[@class="job-area"]/text()').extract_first()

            # Hiring company
            company_primary = job_primary.xpath('./div[@class="info-company"]/div[@class="company-text"]')
            item['company'] = company_primary.xpath('./h3/a/text()').extract_first()
            # extract() returns a list of text nodes; extract_first() would return a single string
            company_info = company_primary.xpath('./p/text()').extract()
            if company_info:
                item['industry'] = company_info[0]
            # Index 2 requires at least three entries
            if len(company_info) > 2:
                item['company_size'] = company_info[2]
            # Industry: prefer the link text when it is present
            industry = company_primary.xpath('./p/a/text()').extract_first()
            if industry:
                item['industry'] = industry
            # Company size
            # item['company_size'] = scrapy.Field()
            # Recruiter
            # item['recruiter'] = scrapy.Field()
            # Hand the populated item back to Scrapy
            yield item
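The parse method above works by first selecting each job-primary block and then running relative XPath queries (starting with ./) inside it. The same idea can be tried offline with a minimal sketch that uses the standard library's xml.etree.ElementTree in place of Scrapy's selectors. The markup below is a made-up, heavily simplified fragment and not the real zhipin.com page, and `parse_jobs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page structure differs.
SAMPLE = """
<html><body>
  <div class="job-primary">
    <div class="info-primary">
      <span class="job-name"><a href="/job_detail/1.html">Python Developer</a></span>
      <span class="red">15-25K</span>
    </div>
    <div class="info-company"><h3><a>Example Co.</a></h3></div>
  </div>
  <div class="job-primary">
    <div class="info-primary">
      <span class="job-name"><a href="/job_detail/2.html">Data Engineer</a></span>
    </div>
    <div class="info-company"><h3><a>Sample Ltd.</a></h3></div>
  </div>
</body></html>
"""


def parse_jobs(markup):
    """Collect one dict per job block using nested, relative XPath queries."""
    root = ET.fromstring(markup.strip())
    jobs = []
    # Outer query: one element per job posting
    for job in root.findall('.//div[@class="job-primary"]'):
        # Inner queries are relative to the current job block
        info = job.find('./div[@class="info-primary"]')
        link = info.find('./span[@class="job-name"]/a')
        salary = info.find('./span[@class="red"]')
        company = job.find('./div[@class="info-company"]/h3/a')
        jobs.append({
            "title": link.text,
            "url": link.get("href"),
            # A missing node yields None rather than an exception
            "salary": salary.text if salary is not None else None,
            "company": company.text,
        })
    return jobs


jobs = parse_jobs(SAMPLE)
for job in jobs:
    print(job)
```

The second sample posting has no salary node, so its salary comes back as None. In Scrapy, extract_first() behaves similarly for an empty match, whereas extract()[0] raises an IndexError.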
