Crawler: Scraping the Weather for a Given Place on a Given Date

Miss_0

Tags: crawler

Article link: https://blog.csdn.net/sinat_32017675/article/details/59127834

1. Work out the URL
The historical weather query on tianqi.com uses links such as http://lishi.tianqi.com/beijing/201412.html

The URL can be built as 'http://' + address1 + '.tianqi.com/' + date1 + '.html'
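
The pattern above can be wrapped in a small helper. This is a minimal sketch in Python 3 (the post's own listing targets Python 2); the function name `build_url` and the input checks are assumptions added for illustration and are not part of the original script.

```python
import re


def build_url(address, date):
    """Build the query URL from a pinyin place name and a YYYYMMDD date.

    Hypothetical helper: the name and the validation are illustrative only.
    """
    # The place name becomes a subdomain, so accept only lowercase letters
    # and digits (digits allow names such as "luoyang1").
    if not re.fullmatch(r"[a-z0-9]+", address):
        raise ValueError("address must be lowercase pinyin, e.g. 'luoyang'")
    # The date is expected as eight digits, e.g. 20170228.
    if not re.fullmatch(r"\d{8}", date):
        raise ValueError("date must look like '20170228'")
    return "http://" + address + ".tianqi.com/" + date + ".html"


print(build_url("luoyang", "20170228"))
# prints: http://luoyang.tianqi.com/20170228.html
```

Validating the two parts before concatenating them keeps stray spaces or quotes out of the host name, which would otherwise surface only as a confusing network error.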

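2. According to the page source, the encoding is not UTF-8, so json cannot be used on the response directly; the script decodes the page as GB2312 and re-encodes it as UTF-8.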
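
A minimal sketch of that decoding step in Python 3, where a response body arrives as bytes. The helper name `decode_page` and the `errors="replace"` choice are assumptions for illustration, and the sample bytes merely stand in for a real response.

```python
def decode_page(raw, encoding="gb2312"):
    """Decode raw response bytes into text.

    errors="replace" is an assumption: a byte that cannot be decoded is
    replaced with U+FFFD instead of raising UnicodeDecodeError.
    """
    return raw.decode(encoding, errors="replace")


# Stand-in for the bytes returned by a real request.
sample = "北京天气".encode("gb2312")
print(decode_page(sample))  # prints: 北京天气

# The same bytes are not valid UTF-8, which is why the encoding matters.
try:
    sample.decode("utf-8")
except UnicodeDecodeError:
    print("not valid UTF-8")  # prints: not valid UTF-8
```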

The full code is as follows:

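# -*- coding:utf-8 -*-
# Written for Python 2 (urllib2, print statement).
import urllib2
import os

# Fetch the page and store its beginning in a temporary txt file

def get_web_page(address1, date1):
    url = 'http://' + address1 + '.tianqi.com/' + date1 + '.html'
    page = urllib2.urlopen(url).read()
    # Keep the first 7330 bytes and convert them from GB2312 to UTF-8;
    # 'ignore' drops a character that the slice may have cut in half
    page_content = page[:7330].decode('GB2312', 'ignore').encode('utf-8')
    with open('temp.txt', 'w') as f:
        f.write(page_content)

# Parse the txt file, return the result, and delete the temporary file

def deal_page():
    with open('temp.txt', 'r') as f:
        # Skip the first six lines; the seventh is the one that gets parsed
        for i in range(6):
            f.readline()
        result = f.readline()
    os.remove('temp.txt')
    # Start right after the full-width comma (3 bytes in UTF-8)
    site = result.index('，') + 3
    result = result[site:]
    # Stop at the next double quote
    site = result[:-1].index('"')
    return result[:site]

if __name__ == '__main__':
    # raw_input returns the text as typed; Python 2's input() would eval it
    address = raw_input("Enter the place to query, in pinyin (e.g. luoyang): ")
    date = raw_input("Enter the date to query (e.g. 20170228; the earliest available date is 2012-04-01): ")
    get_web_page(address, date)
    print 'Weather for the place and date you entered. If it is wrong, append 1 to the place name and query again.'
    print deal_page()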
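
The slicing in deal_page can be tried without the network or a temporary file. The sketch below is in Python 3 and works on a decoded string, so the full-width comma counts as one character rather than the three UTF-8 bytes the listing skips. The function name `extract_summary` and the sample line are hypothetical; the real markup of the page may differ.

```python
def extract_summary(line):
    """Return the text between the first full-width comma and the next quote.

    Hypothetical helper that mirrors the slicing in deal_page on a str.
    """
    # Position just past the full-width comma (one character in a str).
    start = line.index("，") + 1
    # First double quote after that position marks the end of the text.
    end = line.index('"', start)
    return line[start:end]


# Made-up stand-in for the seventh line of the page.
sample = '<meta name="description" content="Luoyang weather，sunny">\n'
print(extract_summary(sample))
# prints: sunny
```

Both `index` calls raise ValueError when the comma or the quote is missing, which is a useful signal that the page layout no longer matches what the script expects.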
