Crawling NetEase Social Recruitment Postings with CrawlSpider

Latest recommended article published on 2024-07-08 19:11:24

梦途的测开笔记

Category: Python

Article link: https://blog.csdn.net/Mahumd/article/details/98476165

1. Create the project

scrapy startproject <project_name>

2. Create the CrawlSpider spider

scrapy genspider -t crawl <spider_name> <allowed_domain>.com
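
The two commands above can also be driven from a small Python script. The following is only a minimal sketch, not part of the original post: the project name `wangyi` and the helper names `scaffold_commands` and `run_scaffold` are hypothetical, while the spider name and domain are taken from the spider code in step 3. It assumes `scrapy` is installed and on the PATH. By default it performs a dry run that only prints the commands.

```python
import shlex
import subprocess
from pathlib import Path


def scaffold_commands(project: str, spider: str, domain: str):
    """Return (argv, working_directory) pairs for steps 1 and 2."""
    return [
        # Step 1: create the project skeleton in the current directory.
        (["scrapy", "startproject", project], Path(".")),
        # Step 2: generate a CrawlSpider template. Running it inside the
        # project directory places the spider in the project's spiders package.
        (["scrapy", "genspider", "-t", "crawl", spider, domain], Path(project)),
    ]


def run_scaffold(project: str, spider: str, domain: str, dry_run: bool = True):
    """Print each command, and execute it unless dry_run is True."""
    for argv, cwd in scaffold_commands(project, spider, domain):
        print(f"(cwd={cwd}) {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, cwd=cwd, check=True)


# "wangyi" is a hypothetical project name; the dry run only prints the commands.
run_scaffold("wangyi", "wangyishezhao", "163.com")
```

Passing `dry_run=False` would actually execute both commands. The second command is run from inside the project directory because `genspider`, when invoked outside a project, writes the spider file to the current directory instead.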

3. The spider code is as follows

# -*- coding: utf-8 -*-

from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class WangyishezhaoSpider(CrawlSpider):
    name = 'wangyishezhao'
    allowed_domains = ['163.com']
    start_urls = ['https://hr.163.com/position/list.do?postType=01']

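    rules = (
        # Follow the pagination links so that every listing page is visited.
        Rule(LinkExtractor(restrict_xpaths='//div[@class="m-page"]/a'), follow=True),
        # Pass each job-detail link (first table column) to parse_item; do not follow further.
        Rule(LinkExtractor(restrict_xpaths='//tbody/tr/td[1]/a'), callback='parse_item', follow=False),
    )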

    def parse_item(self, response):
        data_dict