Python: Scraping 58.com Job Listings with the Scrapy Framework

1. Project directory structure:

The code is as follows.

items.py:

# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# https://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class Job58CityItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    job_name = scrapy.Field()
    money = scrapy.Field()
    job_wel = scrapy.Field()
    company = scrapy.Field()
    position_type = scrapy.Field()
    xueli = scrapy.Field()
    jingyan = scrapy.Field()
    address = scrapy.Field()

The spider (e.g. spiders/jobs.py):

# -*- coding: utf-8 -*-
import scrapy
from ..items import Job58CityItem


class JobsSpider(scrapy.Spider):
    name = 'jobs'
    allowed_domains = ['58.com']
    # configure the starting-page URL
    offset = 1
    url = "https://cd.58.com/job/pn{0}/"
    start_urls = [url.format(str(offset))]

    # parse the HTML of each listing page
    def parse(self, response):
        for each in response.xpath("//ul[@id='list_con']/li"):
            item = Job58CityItem()
            item['job_name'] = each.xpath(".//span[@class='name']/text()").extract()[0]
            money_list = each.xpath(".//p[@class='job_salary']/text()").extract()
            money = "未知"
            if len(money_list) > 0:
                money = money_list[0]
            item['money'] = money
            span = each.xpath(".//div[@class='job_wel clearfix']/span")
            item['job_wel'] = []
            for i in span:
                item['job_wel'].append(i.xpath("./text()").extract()[0])
            item['company'] = each.xpath(".//div[@class='comp_name']/a/text()").extract()[0]
            item['position_type'] = each.xpath(".//span[@class='cate']/text()").extract()[0]
            item['xueli'] = each.xpath(".//span[@class='xueli']/text()").extract()[0]
            item['jingyan'] = each.xpath(".//span[@class='jingyan']/text()").extract()[0]
            # note the leading dot: search within this listing, not the whole page
            item['address'] = each.xpath(".//span[@class='address']/text()").extract()[0]
            yield item
        # request the next page (up to page 100), re-entering parse() as the callback;
        # the request must be inside the if, otherwise page 100 is re-requested forever
        if self.offset < 100:
            self.offset += 1
            yield scrapy.Request("https://cd.58.com/job/pn{0}/".format(str(self.offset)), callback=self.parse)
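Several fields, such as the salary, may be absent from a listing, which is why `money` falls back to a default instead of indexing `extract()[0]` directly. That pattern can be factored into a small helper — a sketch, not part of the original spider:

```python
def first_or_default(values, default="未知"):
    """Return the first extracted value, or `default` when the XPath matched nothing."""
    return values[0] if values else default

# mirrors the salary handling in parse():
first_or_default(["5000-8000元/月"])  # → "5000-8000元/月"
first_or_default([])                  # → "未知"
```

The same helper would also guard the other `extract()[0]` lookups (company, position type, etc.), which currently raise `IndexError` on a listing missing that field.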
The run script:

from scrapy import cmdline

if __name__ == '__main__':
    cmdline.execute("scrapy crawl jobs".split())
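The spider above only yields items; to persist them, an item pipeline could be added. A minimal sketch — the class name, the output file `jobs.jl`, and the `ITEM_PIPELINES` module path are assumptions, not part of the original project:

```python
import json


class JsonLinesPipeline:
    """Append each scraped item to jobs.jl, one JSON object per line.

    Enable it in settings.py, e.g.
        ITEM_PIPELINES = {"myproject.pipelines.JsonLinesPipeline": 300}
    (the module path is an assumption).
    """

    def open_spider(self, spider):
        self.file = open("jobs.jl", "w", encoding="utf-8")

    def process_item(self, item, spider):
        # dict(item) works for both scrapy.Item instances and plain dicts
        self.file.write(json.dumps(dict(item), ensure_ascii=False) + "\n")
        return item

    def close_spider(self, spider):
        self.file.close()
```

Alternatively, Scrapy's built-in feed export does the same without custom code: `scrapy crawl jobs -o jobs.jl`.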

The scraped data:

Source code: https://github.com/yangsphp/Scrapy-master

