Hands-On with the Python Crawler Framework Scrapy: Scraping Job Postings from BOSS Zhipin

huaqiangu1123

Article link: https://blog.csdn.net/huaqiangu1123/article/details/78780652

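This article walks through a hands-on project that uses the Python crawler framework Scrapy to scrape job postings from BOSS Zhipin. It covers the development environment, installing Scrapy, creating the project, defining the Item, analyzing the page, writing the spider, setting UTF-8 encoding, controlling the crawl rate, customizing the crawl conditions, and saving the data to MongoDB, and it closes with the project's shortcomings and its open-source repository.

(Summary generated automatically by CSDN.)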

0. Development Environment

MacBook Pro (13-inch, 2016, Two Thunderbolt 3 ports)
CPU : 2 GHz Intel Core i5
RAM : 8 GB 1867 MHz LPDDR3
Python version: v3.6.3 [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
MongoDB version: v3.4.7
MongoDB GUI client: MongoBooster v4.1.3

1. Preparation

Installing Scrapy

pip3 install scrapy

If all goes well, pip installs Scrapy together with a long list of dependency packages, as it did for me.

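For installation instructions, see the installation guide in the Chinese translation of the Scrapy documentation: http://scrapy-chs.readthedocs.io/zh_CN/latest/intro/install.html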

Official GitHub repository: https://github.com/scrapy/scrapy
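When several Python versions live on one machine (the use of pip3 above hints at this), it helps to confirm that Scrapy is importable from the interpreter you will actually run. The following is a minimal standard-library sketch; the helper name is_installed is hypothetical and not part of Scrapy.

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found by this Python interpreter."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("scrapy installed:", is_installed("scrapy"))
```

Running `scrapy version` in a terminal is another quick check; it prints the installed Scrapy version.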

2. Creating a New Project

scrapy startproject www_zhipin_com

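If all goes well, Scrapy creates a www_zhipin_com directory containing scrapy.cfg and a Python package of the same name, which holds items.py, pipelines.py, settings.py and a spiders/ folder.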
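To check that the skeleton is in place, here is a small sketch. It assumes the command was run in the current directory and that Scrapy produced its default layout. The helper missing_files and the EXPECTED list are hypothetical illustrations, and the list is deliberately only a subset of what Scrapy generates.

```python
from pathlib import Path

# Files that `scrapy startproject <name>` is expected to create. This is a
# subset: newer Scrapy versions also generate middlewares.py, which is not
# checked here.
EXPECTED = [
    "scrapy.cfg",
    "{pkg}/__init__.py",
    "{pkg}/items.py",
    "{pkg}/pipelines.py",
    "{pkg}/settings.py",
    "{pkg}/spiders/__init__.py",
]


def missing_files(project_dir, package="www_zhipin_com"):
    """Return the expected project files that do not exist under project_dir."""
    root = Path(project_dir)
    missing = []
    for template in EXPECTED:
        rel = template.format(pkg=package)
        if not (root / rel).is_file():
            missing.append(rel)
    return missing


if __name__ == "__main__":
    # Run from the directory where `scrapy startproject` was executed.
    problems = missing_files("www_zhipin_com")
    print("OK" if not problems else "Missing: " + ", ".join(problems))
```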

3. Defining the Item to Scrape

Define a class in items.py:

import scrapy


class WwwZhipinComItem(scrapy.Item):
    # Fields for one job posting scraped from BOSS Zhipin
    pid = scrapy.Field()
    positionName = scrapy.Field()
    positionLables = scrapy.Field()
    workYear = scrapy.Field()
    salary = scrapy.Field()
    city = scrapy.Field()
    education = scrapy.Field()
    companyShortName = scrapy.Field()
    industryField = scrap