Using Python 2.7 and Scrapy, we crawl the Wandoujia (豌豆荚) app market for fields such as app name, size, and download count, and store the results in a MongoDB database. The steps are as follows:
1. Create a Scrapy project and write the spider
Create a new crawler project with the scrapy command:
scrapy startproject ChannelCrawler
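For orientation, `scrapy startproject` generates the standard Scrapy scaffold shown below (the exact file list may vary slightly between Scrapy versions; the comments are my annotations, not part of the generated files):

```
ChannelCrawler/
    scrapy.cfg            # deploy configuration
    ChannelCrawler/       # project Python module
        __init__.py
        items.py          # item definitions (edited in the next step)
        pipelines.py      # item pipelines (e.g. MongoDB storage)
        settings.py       # project settings
        spiders/          # spider code lives here
            __init__.py
```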
After the project is generated, define the Item describing the scraped data structure in Items.py. Given the four fields to crawl, Items.py reads:
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy
class AppInfo(scrapy.Item):
    name = scrapy.Field()
    size = scrapy.Field()
    downloadTimes = scrapy.Field()
    description = scrapy.Field()
Then write the spider in __init__.py under the spiders folder. The code below first crawls all app categories and then the detailed data of each app:
# This package will contain the spiders of your Scrapy project
#
# Please refer to the documentation for information on how to
# create and manage your spiders.