Scraping Historical Weather Data with Python

蝶落花

Tags: python, web scraping, data mining

Article link: https://blog.csdn.net/qq_45563208/article/details/111664858

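This script uses Python to scrape historical daily high temperatures from a weather site.
My skills are still limited, so I extract the values from the fetched page text with plain list slicing.
PS: To avoid sending many requests to the server in a short time and causing it trouble, I pause for 5 seconds after scraping each month of weather data. This is also a matter of respect for the site that provides the data: we should not repay that kindness with abuse, and end up with our IP banned.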
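The pause described above can be isolated in a small helper. This is only a minimal sketch: `fetch_politely`, `fetch`, `delay`, and `sleep` are illustrative names that are not part of the script, and the fetcher is passed in so the pacing can be tried without touching the network.

```python
import time


def fetch_politely(urls, fetch, delay=5.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing `delay` seconds between requests.

    `fetch` and `sleep` are injected (hypothetical parameters) so the
    pacing logic can be exercised without real downloads or real waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # rest before every request except the first
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Stand-in fetcher: returns the URL length instead of downloading anything.
    sizes = fetch_politely(
        ["http://lishi.tianqi.com/wuzhi/201101.html",
         "http://lishi.tianqi.com/wuzhi/201102.html"],
        fetch=len,
        delay=0.1,
    )
    print(sizes)
```

In real use, `fetch` could be something like `lambda u: requests.get(u, headers=headers, timeout=10).text`, with `delay` left at 5 seconds.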

from bs4 import BeautifulSoup
import requests
import time

# Browser-like User-Agent sent with every request.
headers={'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.66'}

def pa(di,yearq,yearo):
    """Append the date and daily high for city `di`, years yearq..yearo, to '<di>.txt'."""
    with open('{}.txt'.format(di),'a',encoding='UTF-8') as fp:
        for x in range(yearq,yearo+1):
            for y in range(1,13):
                # Month pages are named YYYYMM.html, so zero-pad the month.
                url='http://lishi.tianqi.com/{}/{}{:02d}.html'.format(di,x,y)
                # The timeout keeps a stalled connection from hanging the run.
                req=requests.get(url=url,headers=headers,timeout=10)
                bf=BeautifulSoup(req.text,'html.parser')
                weather=bf.find_all('ul',attrs={'class':'thrui'})
                for i in weather:
                    lines=i.get_text().split('\n')
                    # The slices treat each day as 7 lines of text:
                    # offset 2 is the date, offset 3 the daily high.
                    r=lines[2::7]
                    q=lines[3::7]
                    for date_text,high_text in zip(r,q):
                        no=date_text.split(' ')    # keep the date, drop what follows it
                        no2=high_text.split('℃')   # keep the number, drop the unit
                        fp.write(no[0]+'\t'+no2[0]+'°C'+'\n')
                print("Fetched historical weather for {}, {}-{:02d}".format(di,x,y))
                time.sleep(5)  # rest 5 seconds after each month
di=input("Enter the city to query, in pinyin (e.g. wuzhi for Wuzhi County)\n")
while True:
    year1=int(input("Enter the start year as a number (e.g. 2011). Note: must be >= 2011\n"))
    year2=int(input("Enter the end year as a number (e.g. 2019)\n"))
    if year1<2011:
        print("Please enter a valid start year")
    elif year2<year1:
        print("The end year cannot be earlier than the start year")
    elif year2>=2020:
        print("2020 isn't over yet!")
    else:
        break
pa(di,year1,year2)
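
The slicing step is easier to follow on a small sample. The sketch below applies the same `[2::7]` and `[3::7]` slices to made-up text. `parse_highs` and `SAMPLE` are hypothetical names, and the sample layout is only inferred from the offsets used in the script. It is not a capture of the real page, whose structure may differ or change.

```python
def parse_highs(text):
    """Return (date, high) pairs from text laid out in 7-line groups."""
    lines = text.split('\n')
    dates = lines[2::7]   # first date sits at index 2, then every 7th line
    highs = lines[3::7]   # the daily high follows on the next line
    pairs = []
    for date_text, high_text in zip(dates, highs):
        date = date_text.split(' ')[0]   # drop the weekday after the date
        high = high_text.split('℃')[0]   # drop the unit, keep the number
        pairs.append((date, high))
    return pairs


# Made-up sample: two blank lines, then per day a date line, high, low,
# weather, and wind, with two blank lines between days (an assumption).
SAMPLE = (
    "\n"
    "\n2011-01-01 Sat \n5℃\n-3℃\nSunny\nNorth wind\n\n"
    "\n2011-01-02 Sun \n7℃\n-1℃\nCloudy\nNorth wind\n\n"
)

if __name__ == "__main__":
    for date, high in parse_highs(SAMPLE):
        print(date + '\t' + high + '°C')
```

Because the slices depend on exact line positions, any change in the page's whitespace or fields shifts the offsets, so spot-check a few rows of the output file against the site.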

For study and research only; do not use it in any way that damages the site owner's property.
