Python Weather Crawler Example with Scrapy: 2017.08.04 Python Web Crawling, Scrapy in Practice (Part 2), Weather Forecast...

Latest recommended article published on 2021-11-08 16:30:44

闪电肉

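Article tags: Python crawler, weather example, Scrapy

Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/weixin_42365688/article/details/113672625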

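This article describes how to use Python's Scrapy framework to crawl data from a weather forecast website. It covers creating the project, defining Items, writing the Spider, setting up selectors, handling the response, defining a Pipeline to save the data, and finally running the crawler.

Summary generated by CSDN using AI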

1. Project setup: the target site is http://quanzhou.tianqi.com/

2. Create the Scrapy project and spider:

scrapy startproject weather

scrapy genspider HQUSpider quanzhou.tianqi.com

The project file structure is shown in the figure:

3. Modify items.py:

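4. Modify the Spider file HQUSpider.py:

(1) First run the command scrapy shell http://quanzhou.tianqi.com/ to test and obtain the selectors:

(2) Try out the selectors: open the Chrome browser and view the page source:

(3) Run commands to inspect the response results: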

(4) Write the HQUSpider.py file:

# -*- coding: utf-8 -*-
import scrapy

from weather.items import WeatherItem


class HquspiderSpider(scrapy.Spider):
    name = 'HQUSpider'
    allowed_domains = ['tianqi.com']
    citys = ['quanzhou', 'datong']
    # Build one start URL per city subdomain.
    start_urls = []
    for city in citys:
        start_urls.append('http://' + city + '.tianqi.com/')

    def parse(self, response):
        # Each div with class "tqshow1" holds one day's forecast.
        subSelector = response.xpath('//div[@class="tqshow1"]')
        items = []
        for sub in subSelector:
            item = WeatherItem()
            # The <h3> text is split across several nodes, so join the pieces.
            item['cityDate'] = ''.join(sub.xpath('./h3//text()').extract())
            item['week'] = sub.xpath('./p//text()').extract()[0]
            item['img'] = sub.xpath('./ul/li[1]/img/@src').extract()[0]
            # The temperature text is also split across several nodes.
            item['temperature'] = ''.join(sub.xpath('./ul/li[2]//text()').extract())
            item['weather'] = sub.xpath('./ul/li[3]//text()').extract()[0]
            item['wind'] = sub.xpath('./ul/li[4]//text()').extract()[0]
            items.append(item)
        return items

(5) Modify pipelines.py to process the Spider's results:

# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html

import os.path
import time
from urllib.request import urlopen


class WeatherPipeline(object):
    def process_item(self, item, spider):
        # Append each record to a text file named after today's date.
        today = time.strftime('%Y%m%d', time.localtime())
        fileName = today + '.txt'
        imgName = os.path.basename(item['img'])
        with open(fileName, 'a', encoding='utf-8') as fp:
            fp.write(item['cityDate'] + '\t')
            fp.write(item['week'] + '\t')
            fp.write(imgName + '\t')
            # Download the weather icon only if it is not already on disk.
            # A separate handle (imgFp) keeps the text file handle fp open.
            if not os.path.exists(imgName):
                with open(imgName, 'wb') as imgFp:
                    response = urlopen(item['img'])
                    imgFp.write(response.read())
            fp.write(item['temperature'] + '\t')
            fp.write(item['weather'] + '\t')
            fp.write(item['wind'] + '\n\n')
        # Pause briefly between items.
        time.sleep(1)
        return item
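To make the output format concrete, the record layout that the pipeline writes can be sketched with the standard library alone. The helper names `daily_file_name` and `format_record` and the sample values are hypothetical and used only for illustration; they are not part of the original project.

```python
import os.path
import time


def daily_file_name(now=None):
    """Return the YYYYMMDD.txt file name for the given (or current) local time."""
    return time.strftime('%Y%m%d', now or time.localtime()) + '.txt'


def format_record(item):
    """Build one tab-separated record, terminated by a blank line."""
    fields = [
        item['cityDate'],
        item['week'],
        os.path.basename(item['img']),  # only the image file name is stored
        item['temperature'],
        item['weather'],
        item['wind'],
    ]
    return '\t'.join(fields) + '\n\n'


# Hypothetical sample values, not scraped data.
sample = {
    'cityDate': 'Quanzhou08-04',
    'week': 'Friday',
    'img': 'http://example.com/img/b1.png',
    'temperature': '26~33C',
    'weather': 'Cloudy',
    'wind': 'Light breeze',
}

print(daily_file_name(time.strptime('2017-08-04', '%Y-%m-%d')))
print(repr(format_record(sample)))
```

Each item becomes one line of six tab-separated fields followed by an empty line, and all items scraped on the same day are appended to the same file.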

(6) Modify settings.py to specify which pipeline processes the scraped data:
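The settings change itself is not shown in the text. A minimal sketch, assuming the default layout created by `scrapy startproject weather` so that the pipeline class lives at `weather.pipelines.WeatherPipeline`; the priority value 300 is an assumption (pipelines with lower values run first, and values are conventionally kept in the 0 to 1000 range).

```python
# settings.py (fragment): enable the pipeline defined in pipelines.py
ITEM_PIPELINES = {
    'weather.pipelines.WeatherPipeline': 300,  # priority value is an assumption
}
```

Without this entry Scrapy never calls `process_item`, so the spider would run but no file would be written.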

(7) Run the command: scrapy crawl HQUSpider

At this point, a complete Scrapy crawler is finished.

Original article: http://www.cnblogs.com/hqutcy/p/7284302.html
