0. Development Environment
- MacBook Pro (13-inch, 2016, Two Thunderbolt 3 ports)
- CPU: 2 GHz Intel Core i5
- RAM: 8 GB 1867 MHz LPDDR3
- Python version: v3.6.3 [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
- MongoDB version: v3.4.7
- MongoDB GUI tool: MongoBooster v4.1.3
1. Preparation
Install Scrapy
pip3 install scrapy
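If all goes well, pip installs Scrapy together with a long list of dependency packages, as it did for me.
Installation guide in the translated (Chinese) documentation: http://scrapy-chs.readthedocs.io/zh_CN/latest/intro/install.html
Official GitHub repository: https://github.com/scrapy/scrapy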
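To confirm that the install step worked before moving on, one option is to ask Python whether the package can be found. The following is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical and not part of Scrapy.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be located by the import system."""
    # find_spec returns None when no top-level module of that name is found.
    return importlib.util.find_spec(package_name) is not None


# Report whether the package installed by `pip3 install scrapy` is visible
# to the interpreter that runs this script.
print("scrapy installed:", is_installed("scrapy"))
```

If this prints `False` right after a successful `pip3 install scrapy`, a likely cause is that `pip3` and the `python3` running the script belong to different Python installations.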
2. Create a New Project
scrapy startproject www_zhipin_com
If all goes well, Scrapy creates the project directory, as it did for me.
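A quick way to check the result of `scrapy startproject` is to look for the files its default template normally generates. The sketch below assumes the usual layout (a `scrapy.cfg` next to a package directory named after the project); the exact file list can vary between Scrapy versions, and the helper names `expected_files` and `missing_project_files` are hypothetical.

```python
from pathlib import Path
from typing import List


def expected_files(project_name: str) -> List[Path]:
    """Files assumed to be created by the default project template."""
    package = Path(project_name)
    return [
        Path("scrapy.cfg"),
        package / "__init__.py",
        package / "items.py",
        package / "pipelines.py",
        package / "settings.py",
        package / "spiders" / "__init__.py",
    ]


def missing_project_files(root: str, project_name: str) -> List[str]:
    """Return the expected files (as POSIX-style paths) not found under root."""
    root_path = Path(root)
    return [
        rel.as_posix()
        for rel in expected_files(project_name)
        if not (root_path / rel).is_file()
    ]
```

For the project created above, calling `missing_project_files("www_zhipin_com", "www_zhipin_com")` from the directory where the command was run should return an empty list if the layout matches this assumption.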
3. Define the Item to Scrape
Define a class in the items.py file:
import scrapy


class WwwZhipinComItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    pid = scrapy.Field()
    positionName = scrapy.Field()
    positionLables = scrapy.Field()
    workYear = scrapy.Field()
    salary = scrapy.Field()
    city = scrapy.Field()
    education = scrapy.Field()
    companyShortName = scrapy.Field()
    industryField = scrap