Goal: scrape weather data from the Tianqi weather site
Target URL: http://beijing.tianqi.com/
Following the previous article, http://blog.csdn.net/co_zy/article/details/77189416,
we create a project and a spider:
> scrapy startproject weather
> cd weather
> scrapy genspider BeijingSpider tianqi.com
In this project, only four files need to be modified or filled in: items.py, settings.py, pipelines.py, and BeijingSpider.py.
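(1) Open the target URL and inspect the page elements
The page shows the city and date, the day of the week, a weather icon, the temperature, the weather condition, and the wind direction. With that, the contents of items.py follow directly: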
# -*- coding: utf-8 -*-
# Define here the models for your scraped items
#
# See documentation in:
# http://doc.scrapy.org/en/latest/topics/items.html
import scrapy


class WeatherItem(scrapy.Item):
    # One field for each piece of information shown on the page
    cityDate = scrapy.Field()     # city and date
    week = scrapy.Field()         # day of the week
    img = scrapy.Field()          # weather icon
    temperature = scrapy.Field()  # temperature
    weather = scrapy.Field()      # weather condition
    wind = scrapy.Field()         # wind direction