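I recently started using Scrapy, an open-source web crawling framework. After reading some of its documentation and other people's technical blog posts, I'm following their examples to crawl my own blog.
First, create the project: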
scrapy startproject myblog
Writing items.py:
I plan to scrape each blog post's title, its URL, and the number of times it has been read.
# -*- coding: utf-8 -*-

# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html

import scrapy


class MyBlogItem(scrapy.Item):
    article_name = scrapy.Field()       # post title
    article_url = scrapy.Field()        # link to the post
    article_readcount = scrapy.Field()  # how many times the post has been read
Writing pipelines.py:
# -*- coding: utf-8 -*-

# Define your item pipelines here