Weather Scraper Website (flask+sqlite3+selenium+echarts)

CodeYello

Category: Python  Tags: python, crawler, front end, database, echarts

Article link: https://blog.csdn.net/m0_52937388/article/details/118151756

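Project overview:

Project name: Dynamic weather-scraper website
1. Project core:
A dynamic scraper searches the weather website on the fly for the keyword the user enters, scrapes the matching weather data, processes it, and finally draws statistical charts to present it.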
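The "on the fly" part starts by turning the user's keyword into a page URL. The backend does this with a bare `citys[city]` lookup, which raises `KeyError` for any city missing from the dictionary. Below is a minimal sketch of a more forgiving lookup, assuming the same two sample station codes and the two URL patterns the backend scrapes (`/weather/` for the 7-day page, `/weather1d/` for today's page). The names `CITY_CODES` and `forecast_url` are hypothetical.

```python
# Hypothetical helper: map a search keyword to a weather.com.cn page URL.
CITY_CODES = {
    "宁夏": "101170101",  # sample entries taken from the article's citys dict
    "四川": "101270101",
}

BASE = "http://www.weather.com.cn"


def forecast_url(keyword: str, days: int = 7) -> str:
    """Return the 7-day page (days=7) or today's page (days=1) for a city."""
    name = keyword.strip()  # tolerate stray spaces from the search box
    code = CITY_CODES.get(name)
    if code is None:
        raise ValueError(f"no station code for {name!r}; add it to CITY_CODES")
    if days == 7:
        return f"{BASE}/weather/{code}.shtml"
    if days == 1:
        return f"{BASE}/weather1d/{code}.shtml"
    raise ValueError("days must be 1 or 7")


print(forecast_url(" 四川 "))
# http://www.weather.com.cn/weather/101270101.shtml
```

A Flask view can catch the `ValueError` and show a "city not supported" message instead of a server error page.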
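2. Requirements
1) Weather for major cities over seven days: conditions (e.g. sunny, cloudy), wind level, relative humidity, and air quality.
2) Statistics on today's weather in every provincial capital, rendered as charts on the web page.
3) Weather memory: the analyzed data must be stored so that past weather can be viewed later.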
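Requirement 2 needs numbers, but the scraper returns text such as `30℃` or an air-quality reading with a label. Below is a minimal sketch of turning that text into summary statistics. It assumes each string contains its value as the first integer (the sample rows are made up, not scraped). It keeps a minus sign, which `re.sub("\D", "", text)` would drop, and it skips empty readings, such as a city with no air-quality entry. The helper names `first_int` and `summarize` are hypothetical.

```python
import re
from collections import Counter
from statistics import mean
from typing import Optional

_INT = re.compile(r"-?\d+")


def first_int(text: str) -> Optional[int]:
    """Return the first (possibly negative) integer in text, or None."""
    match = _INT.search(text or "")
    return int(match.group()) if match else None


def summarize(rows):
    """rows: (city, conditions, temperature text, air-quality text) tuples."""
    temps = [t for t in (first_int(r[2]) for r in rows) if t is not None]
    aqis = [a for a in (first_int(r[3]) for r in rows) if a is not None]
    return {
        "conditions": Counter(r[1] for r in rows),  # e.g. how many cities are sunny
        "mean_temp": mean(temps) if temps else None,
        "max_aqi": max(aqis) if aqis else None,
    }


sample = [  # made-up rows for illustration only
    ("北京", "晴", "30℃", "43 优"),
    ("上海", "多云", "-2℃", "85 良"),
    ("延边", "晴", "18℃", ""),  # no air-quality entry
]
print(summarize(sample))
```

The `Counter` provides the per-condition counts that a pie or bar chart needs. The mean and maximum are only examples of figures a statistics page could show.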

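Technologies:

1. Flask connects the Python code to the front end.
2. Some content on the weather site is rendered by JavaScript and cannot be scraped from the raw HTML, so Selenium drives a real browser to obtain it.
3. SQLite (Python's sqlite3 module) stores the weather data to provide weather memory.
4. ECharts draws the map of China.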
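An ECharts map series takes a list of `{name, value}` objects, and each name has to match a region name in the registered China map. The `/bigchina` route passes the template a bare list of temperatures, so the template must already know the province order. Below is a minimal sketch of pairing names and values on the Python side. It assumes string values like the scraped ones. `echarts_map_data` is a hypothetical helper. Entries that are not finite numbers are dropped, so the region simply shows no data.

```python
import json
import math
from typing import List


def echarts_map_data(names: List[str], values: List[str]) -> List[dict]:
    """Pair region names with scraped values as ECharts map series data."""
    if len(names) != len(values):
        raise ValueError("names and values must line up one-to-one")
    data = []
    for name, raw in zip(names, values):
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # nothing usable was scraped for this region
        if not math.isfinite(value):
            continue  # inf/nan cannot be written as valid JSON
        # "30" becomes 30 rather than 30.0
        data.append({"name": name, "value": int(value) if value.is_integer() else value})
    return data


provinces = ["北京", "上海", "台湾"]
temps = ["30", "-2", ""]  # hypothetical scraped strings; "" means nothing was found
print(json.dumps(echarts_map_data(provinces, temps), ensure_ascii=False))
# [{"name": "北京", "value": 30}, {"name": "上海", "value": -2}]
```

The view can pass the returned list to `render_template`, and the template can write it into the chart option with Jinja's `tojson` filter, which also escapes HTML-sensitive characters.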

Results:

Home / search page
Full video demo:

Note: my default browser settings conflict with PyCharm, so a warning pop-up appears in the video.

Code:

The code is presented feature by feature, following the video:

Backend Python code:

#coding=utf-8
import datetime
import re
import sqlite3
import urllib.request

from bs4 import BeautifulSoup
from flask import Flask, redirect, render_template, request, url_for
from selenium import webdriver
from selenium.webdriver.firefox.service import Service

# City name -> weather.com.cn station code. Add the cities you need; only two
# are listed here to save space. Every province used by /bigchina and every
# capital used by the /quanguo routes must be present, or the lookup raises KeyError.
citys = {
    "宁夏": "101170101",
    "四川": "101270101",
}
"""
Project: dynamic weather-scraper website
1. Core: scrape the weather site on demand for the user's keyword, process
   the data, and render statistical charts from it.
2. Requirements
   1) 7-day weather for major cities: conditions (e.g. sunny, cloudy),
      wind level, relative humidity, air quality.
   2) Statistics on today's weather in every provincial capital, rendered
      as charts on the web page.
   3) Weather memory: store the results so earlier weather can be viewed.
"""
# Known defect: some cities (e.g. Yanbian) have no air-quality entry on
# weather.com.cn, so scraping fails for them.

DB_PATH = "tianna.db"
DRIVER_PATH = r"D:\Selenium webdriver\geckodriver.exe"

app = Flask(__name__)


def open_firefox():
    # Selenium 4 takes the geckodriver path through a Service object;
    # the old executable_path= argument was removed in Selenium 4.10
    return webdriver.Firefox(service=Service(DRIVER_PATH))


# Search page
@app.route('/')
def register():
    return render_template("mapchina.html")


# Big map: today's temperature in every province
@app.route('/bigchina')
def bigchina():
    provinces = ['北京', '天津', '上海', '重庆', '河北', '河南', '云南', '辽宁',
                 '黑龙江', '湖南', '安徽', '山东', '新疆', '江苏', '浙江', '江西',
                 '湖北', '广西', '甘肃', '山西', '内蒙古', '陕西', '吉林', '福建', '贵州',
                 '广东', '青海', '西藏', '四川', '宁夏', '海南', '台湾', '香港', '澳门']
    wd_d = []
    for one in provinces:
        url = "http://www.weather.com.cn/weather/" + citys[one] + ".shtml"
        with urllib.request.urlopen(url) as html:
            obj = BeautifulSoup(html.read(), 'html.parser')
        # First ".tem i" value on the 7-day page, with ℃ stripped for the map labels
        wd_d.append(obj.select('.tem i')[0].get_text().strip('℃'))
    return render_template("bigchina.html", wd_d=wd_d)


# 7-day forecast for one city
@app.route('/qitian', methods=['POST', 'GET'])
def qitian():
    # The city name comes from the search form; a plain GET goes back to the search page
    if request.method != 'POST':
        return redirect(url_for('register'))
    city = str(request.form.get('location'))

    # One list per forecast field
    date, wter, wd_d, wd_g, wind = [], [], [], [], []

    # Scraping: the 7-day page is fetched with plain urllib
    url = "http://www.weather.com.cn/weather/" + citys[city] + ".shtml"
    with urllib.request.urlopen(url) as html:
        obj = BeautifulSoup(html.read(), 'html.parser')
    for mes in obj.find_all("li", {"class": re.compile(r'sky skyid lv\d')}):
        date.append(mes.h1.get_text())    # date
        wter.append(mes.p.get_text())     # conditions
        wd_g.append(mes.span.get_text())  # high temperature
    # zip stops early instead of raising IndexError if the page lists fewer than 7 days
    for low, win in zip(obj.select('.tem i')[:7], obj.select('.win i')[:7]):
        wd_d.append(low.get_text())       # low temperature
        wind.append(win.get_text())       # wind level

    return render_template("qitian.html", date=date, wter=wter, wd_g=wd_g, wd_d=wd_d, wind=wind)


# Today's details for one city; also records the result for weather memory
@app.route('/home', methods=['POST', 'GET'])
def home():
    if request.method != 'POST':
        return redirect(url_for('register'))
    city = str(request.form.get('location'))

    # Scraping: this page fills in its values with JavaScript, so render it in Firefox
    url = 'http://www.weather.com.cn/weather1d/' + citys[city] + '.shtml'
    driver = open_firefox()
    try:
        driver.get(url)
        page = BeautifulSoup(driver.page_source, "html5lib")
    finally:
        driver.quit()  # close Firefox even when loading fails

    mes = page.find("div", {"class": 'sk'})
    sd = mes.find("div", {"class": 'zs h'}).em.get_text()   # relative humidity
    mes3 = mes.find("div", {"class": 'zs w'})
    fx = mes3.span.get_text()                                # wind direction
    fl = mes3.em.get_text()                                  # wind force
    kq = mes.find("div", {"class": 'zs pol'}).a.get_text()   # air quality
    tq = page.find("p", {"class": 'wea'}).get_text()         # conditions

    # Weather memory: save at most one row per (date, city). Checking the
    # database itself survives restarts and compares date and city as a pair.
    rq = str(datetime.date.today())
    conn = sqlite3.connect(DB_PATH)
    try:
        seen = conn.execute(
            "select 1 from Tian where rq = ? and city = ?", (rq, city)
        ).fetchone()
        if seen is None:
            # "?" placeholders let sqlite3 quote the values safely
            conn.execute(
                "insert into Tian (rq, city, sd, fx, fl, tq, kq) values (?, ?, ?, ?, ?, ?, ?)",
                (rq, city, sd, fx, fl, tq, kq),
            )
            conn.commit()
    finally:
        conn.close()

    return render_template("home.html", sd=sd, fx=fx, fl=fl, tq=tq, kq=kq)


# Create the table; "if not exists" makes it safe to run on every start
def init_db(dbpath):
    sql = '''
    create table if not exists Tian
    (
    id integer primary key autoincrement,
    rq text,
    city text,
    sd text,
    fx text,
    fl text,
    tq text,
    kq text
    )
    '''
    conn = sqlite3.connect(dbpath)
    conn.execute(sql)
    conn.commit()
    conn.close()


# Weather memory: list every saved search
@app.route('/jiyi')
def jiyi():
    con = sqlite3.connect(DB_PATH)
    try:
        datalist2 = con.execute("select * from Tian").fetchall()
    finally:
        con.close()
    return render_template("jiyi.html", datalist2=datalist2)


# /quanguo to /quanguo5 show air quality for the provincial capitals. The
# capitals are split across five routes because Firefox is slow and a failed
# page load forces the whole scrape to start over.
# All provincial capitals:
# 长春 北京 上海 天津 重庆 哈尔滨 沈阳 呼和浩特 石家庄 乌鲁木齐 兰州
# 西宁 西安 银川 郑州 济南 太原 合肥 长沙 武汉 南京 成都 贵阳 昆明 南宁
# 拉萨 杭州 南昌 广州 福州 台北 海口
def scrape_aqi(cities):
    values = []
    driver = open_firefox()  # one browser per batch instead of one per city
    try:
        for name in cities:
            driver.get('http://www.weather.com.cn/weather1d/' + citys[name] + '.shtml')
            page = BeautifulSoup(driver.page_source, "html5lib")
            mes = page.find("div", {"class": 'sk'})
            kq = mes.find("div", {"class": 'zs pol'}).a.get_text()
            values.append(re.sub(r"\D", "", kq))  # keep only the digits of the index
    finally:
        driver.quit()
    return values


@app.route('/quanguo')
def quanguo():
    shujuh = scrape_aqi(['长春', '北京', '上海', '天津', '重庆', '哈尔滨'])
    return render_template("quanguo.html", shujuh=shujuh)


@app.route('/quanguo2')
def quanguo2():
    shujuhh = scrape_aqi(['沈阳', '呼和浩特', '石家庄', '乌鲁木齐', '兰州'])
    return render_template("quanguo2.html", shujuhh=shujuhh)


@app.route('/quanguo3')
def quanguo3():
    shujuhhh = scrape_aqi(['西宁', '西安', '银川', '郑州', '济南', '太原', '合肥'])
    return render_template("quanguo3.html", shujuhhh=shujuhhh)


@app.route('/quanguo4')
def quanguo4():
    shujuhhhh = scrape_aqi(['长沙', '武汉', '南京', '成都', '贵阳', '昆明', '南宁'])
    return render_template("quanguo4.html", shujuhhhh=shujuhhhh)


@app.route('/quanguo5')
def quanguo5():
    shujuhhhhh = scrape_aqi(['拉萨', '杭州', '南昌', '广州', '福州', '台北', '海口'])
    return render_template("quanguo5.html", shujuhhhhh=shujuhhhhh)


# Start
if __name__ == '__main__':
    init_db(DB_PATH)
    app.run(debug=True)
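The weather-memory rule, at most one saved row per city per day, can also be enforced by SQLite rather than by Python bookkeeping. Below is a minimal sketch, assuming a freshly created table with the same columns as `Tian` plus a `unique (rq, city)` constraint that is not in the original schema. With that constraint, `insert or ignore` skips a repeated search, and the rule survives restarts. The helper name `remember` and the sample readings are hypothetical. The demo uses an in-memory database.

```python
import datetime
import sqlite3

SCHEMA = """
create table if not exists Tian (
    id integer primary key autoincrement,
    rq text, city text, sd text, fx text, fl text, tq text, kq text,
    unique (rq, city)  -- assumption: one row per date and city
)
"""


def remember(conn, city, sd, fx, fl, tq, kq, day=None):
    """Store one reading for city; return True if a new row was written."""
    rq = str(day or datetime.date.today())
    cur = conn.execute(
        "insert or ignore into Tian (rq, city, sd, fx, fl, tq, kq) "
        "values (?, ?, ?, ?, ?, ?, ?)",
        (rq, city, sd, fx, fl, tq, kq),
    )
    conn.commit()
    return cur.rowcount == 1  # 0 when the unique constraint skipped the row


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
day = datetime.date(2021, 6, 23)
print(remember(conn, "四川", "65%", "东北风", "2级", "多云", "43 优", day))  # True
print(remember(conn, "四川", "70%", "东风", "3级", "阴", "50 优", day))    # False
```

An existing `tianna.db` created without the constraint keeps its old schema, because `create table if not exists` leaves a table that already exists untouched.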

Front-end HTML for the weather-memory page:

<!DOCTYPE html>
<html lang="en">

<head>
  <meta charset="utf-8">
  <meta content="width=device-width, initial-scale=1.0" name="viewport">

  <title>天气记忆</title>
  <meta content="" name="description">
  <meta content="" name="keywords">

  <!-- Favicons -->
  <link href="static/assets/img/favicon.png" rel="icon">
  <link href="static/assets/img/apple-touch-icon.png" rel="apple-touch-icon">

  <!-- Google Fonts -->
  <link href="https://fonts.googleapis.com/css?family=Open+Sans:300,300i,400,400i,600,600i,700,700i|Raleway:300,300i,400,400i,600,600i,700,700i,900" rel="stylesheet">

  <!-- Vendor CSS Files -->
  <link href="static/assets/vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
  <link href="static/assets/vendor/icofont/icofont.min.css" rel="stylesheet">
  <link href="static/assets/vendor/boxicons/css/boxicons.min.css" rel="stylesheet">
  <link href="static/assets/vendor/animate.css/animate.min.css" rel="stylesheet">
  <link href="static/assets/vendor/venobox/venobox.css" rel="stylesheet">
  <link href="static/assets/vendor/aos/aos.css" rel="stylesheet">

  <!-- Template Main CSS File -->
  <link href="static/assets/css/style.css" rel="stylesheet">

</head>

<body>



  <!-- ======= Header ======= -->
  <header id="header">
    <div class="container">

      <div class="logo float-left">
        <h1 class="text-light"><a href="register.html"><span>WEATHER</
