Computing selected weather statistics from ten years of Australian weather data with Scala + Spark

Preface

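What is Spark?

Apache Spark is an open-source engine for distributed, large-scale data processing. The job later in this post is written against its Scala RDD API.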

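Why compute statistics on weather data?

Statistics on weather indicators serve several purposes and help people understand and forecast changes in the weather. Common indicators include temperature, humidity, wind speed, and precipitation.
First, analysing these indicators reveals a region's weather trends and seasonal patterns, such as the average temperature in each season or how rainfall is distributed. This helps people plan travel and agricultural production more accurately.
Second, the statistics help estimate the probability and intensity of extreme weather, such as the risk of heavy rain or tornadoes. This helps people prepare countermeasures and prevent disasters.
Finally, statistical analysis of weather indicators gives meteorologists and researchers the data they need to study the patterns and causes of weather change, and provides a more accurate basis for future forecasting and for responding to climate change.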

Test JSON dataset

{"date":"2008\/10\/1","location":"Albury","minTemp":"13.4","maxTemp":"22.9","rainfall":"0.6","windGustDir":"W","windGustSpeed":"44","windDir9am":"W","windDir3pm":"WNW","windSpeed9am":"20","windSpeed3pm":"24","humidity9am":"71","humidity3pm":"22","pressure9am":"1007.7","pressure3pm":"1007.1","cloud3pm":"NA","temp9am":"16.9","temp3pm":"21.8","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/10\/1","location":"Albury","minTemp":"-3.4","maxTemp":"NA","rainfall":"0.666","windGustDir":"W","windGustSpeed":"44","windDir9am":"W","windDir3pm":"WNW","windSpeed9am":"20","windSpeed3pm":"24","humidity9am":"71","humidity3pm":"22","pressure9am":"1007.7","pressure3pm":"1007.1","cloud3pm":"NA","temp9am":"16.9","temp3pm":"21.8","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/2","location":"Albury","minTemp":"NA","maxTemp":"25.1","rainfall":"0","windGustDir":"WNW","windGustSpeed":"44","windDir9am":"NNW","windDir3pm":"WSW","windSpeed9am":"4","windSpeed3pm":"22","humidity9am":"44","humidity3pm":"25","pressure9am":"1010.6","pressure3pm":"1007.8","cloud3pm":"NA","temp9am":"17.2","temp3pm":"24.3","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/3","location":"Albury","minTemp":"12.9","maxTemp":"25.7","rainfall":"NA","windGustDir":"WSW","windGustSpeed":"46","windDir9am":"W","windDir3pm":"WSW","windSpeed9am":"19","windSpeed3pm":"26","humidity9am":"38","humidity3pm":"30","pressure9am":"1007.6","pressure3pm":"1008.7","cloud3pm":"2","temp9am":"21","temp3pm":"23.2","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/3","location":"Albury","minTemp":"12.9","maxTemp":"25.7","rainfall":"0","windGustDir":"WSW","windGustSpeed":"46","windDir9am":"W","windDir3pm":"WSW","windSpeed9am":"19","windSpeed3pm":"26","humidity9am":"38","humidity3pm":"30","pressure9am":"1007.6","pressure3pm":"1008.7","cloud3pm":"2","temp9am":"21","temp3pm":"23.2","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/15","location":"aa","minTemp":"9.2","maxTemp":"28","rainfall":"0","windGustDir":"NE","windGustSpeed":"24","windDir9am":"SE","windDir3pm":"E","windSpeed9am":"11","windSpeed3pm":"9","humidity9am":"45","humidity3pm":"16","pressure9am":"1017.6","pressure3pm":"1012.8","cloud3pm":"NA","temp9am":"18.1","temp3pm":"26.5","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/11\/5","location":"aa","minTemp":"17.5","maxTemp":"32.3","rainfall":"1","windGustDir":"W","windGustSpeed":"41","windDir9am":"ENE","windDir3pm":"NW","windSpeed9am":"7","windSpeed3pm":"20","humidity9am":"82","humidity3pm":"33","pressure9am":"1010.8","pressure3pm":"1006","cloud3pm":"8","temp9am":"17.8","temp3pm":"29.7","rainToday":"No","rainTomorrow":"No"}
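
Every value in the records above is a string, and missing measurements appear as "NA" (for example the maxTemp of the second record). Any numeric statistic therefore has to skip those values before converting. The following is a minimal sketch of one way to do that; the object NaParsing and the helper parseMeasurement are hypothetical names introduced for illustration and are not part of the original project.

```scala
import scala.util.Try

object NaParsing {
  // Hypothetical helper (not from the original post): converts a raw field such as
  // "13.4" into Some(13.4) and returns None for null, empty, "NA" or unparseable text.
  def parseMeasurement(raw: String): Option[Double] =
    Option(raw)
      .map(_.trim)
      .filter(s => s.nonEmpty && s != "NA")
      .flatMap(s => Try(s.toDouble).toOption)

  def main(args: Array[String]): Unit = {
    // minTemp values of the first four sample records
    val minTemps = List("13.4", "-3.4", "NA", "12.9").flatMap(parseMeasurement)
    println(minTemps) // the "NA" entry is dropped, the other three values remain
  }
}
```

Returning an Option keeps the decision about missing data (skip, default, or report) with the caller instead of hiding it inside the parser.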

Field descriptions

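date // date of the observation
location // name of the weather station
minTemp // minimum temperature (degrees Celsius)
maxTemp // maximum temperature (degrees Celsius)
rainfall // rainfall recorded for the day (mm)
windGustDir // direction of the strongest wind gust in the 24 hours to midnight
windGustSpeed // speed of the strongest wind gust in the 24 hours to midnight (km/h)
windDir9am // wind direction at 9 am
windDir3pm // wind direction at 3 pm
windSpeed9am // wind speed averaged over the 10 minutes before 9 am
windSpeed3pm // wind speed averaged over the 10 minutes before 3 pm
humidity9am // humidity at 9 am (percent)
humidity3pm // humidity at 3 pm (percent)
pressure9am // atmospheric pressure at 9 am, reduced to mean sea level (hPa)
pressure3pm // atmospheric pressure at 3 pm, reduced to mean sea level (hPa)
cloud3pm // fraction of the sky obscured by cloud at 3 pm, measured in oktas (eighths of the sky)
temp9am // temperature at 9 am (degrees Celsius)
temp3pm // temperature at 3 pm (degrees Celsius)
rainToday // "Yes" if precipitation in the 24 hours to 9 am exceeds 1 mm, otherwise "No"
rainTomorrow // "Yes" if it rained the following day, otherwise "No"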
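
The job below imports com.cjy.weather.Weathers, but the post never shows that class. The following is only a guess at what it could look like: a plain class with one String field per JSON key listed above. Gson populates fields by name through reflection, so matching names should be enough, but the package, field types and Serializable marker here are all assumptions.

```scala
// Hypothetical sketch of the Weathers class; the real one is not shown in the post.
// All fields are Strings because the sample data encodes missing values as "NA".
// Serializable is added in case a later step caches or shuffles whole records.
class Weathers extends Serializable {
  var date: String = _
  var location: String = _
  var minTemp: String = _
  var maxTemp: String = _
  var rainfall: String = _
  var windGustDir: String = _
  var windGustSpeed: String = _
  var windDir9am: String = _
  var windDir3pm: String = _
  var windSpeed9am: String = _
  var windSpeed3pm: String = _
  var humidity9am: String = _
  var humidity3pm: String = _
  var pressure9am: String = _
  var pressure3pm: String = _
  var cloud3pm: String = _
  var temp9am: String = _
  var temp3pm: String = _
  var rainToday: String = _
  var rainTomorrow: String = _
}
```

In the real project this class would sit in the com.cjy.weather package so that the import in the job resolves.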

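Find the five weather stations with the most recorded strongest-gust directions, and the five most frequent gust directions at each of them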

package com.cjy.top5LocationWindGustDir

import com.cjy.weather.Weathers
import com.google.gson.Gson
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

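/*
    Find the five weather stations with the most recorded strongest-gust directions,
    and the five most frequent gust directions at each of them.
 */
object TopLocationWindGustDir {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(TopLocationWindGustDir.getClass.getName).setMaster("local[*]")

    val sparkContext = new SparkContext(sparkConf)

    // The input holds one JSON record per line
    val rdd: RDD[String] = sparkContext.textFile("absolute path to the file")

    // Count each (station, gust direction) pair, skipping records with a missing direction
    val pairRDD = rdd
      .mapPartitions { lines =>
        val gson = new Gson() // one parser per partition instead of one per record
        lines.map(line => gson.fromJson(line, classOf[Weathers]))
      }
      .filter(w => w.windGustDir != null && w.windGustDir != "NA")
      .map(weather => ((weather.location, weather.windGustDir), 1))
      .reduceByKey(_ + _)

    // Re-key by station: (location, (windGustDir, count))
    val locationWithWindDirRDD = pairRDD.map { case ((location, dir), count) => (location, (dir, count)) }

    val groupRDD = locationWithWindDirRDD.groupByKey()

    // Per station: total record count (over all directions) and the five most frequent directions
    val top5RDD = groupRDD.mapValues { wdCounts =>
      val sorted = wdCounts.toList.sortBy { case (_, count) => -count }
      (sorted.map(_._2).sum, sorted.take(5))
    }

    // Rank stations by total record count and keep the first five
    val sortedTop5 = top5RDD.sortBy { case (_, (total, _)) => -total }.take(5)

    for ((location, (_, windGustDirList)) <- sortedTop5) {
      println("Top-five station by gust-direction records: " + location)
      for ((windGustDir, count) <- windGustDirList) {
        println("Direction: " + windGustDir + " Count: " + count)
      }
    }
    sparkContext.stop()
  }
}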
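
To check the aggregation logic without starting Spark, the same steps can be run on ordinary Scala collections. This is a sketch rather than part of the original job: the object TopGustDirLocal and the function topStations are hypothetical names, the input is the (location, windGustDir) pairs copied by hand from the seven sample records, and stations are ranked by their total number of records.

```scala
object TopGustDirLocal {
  // (location, windGustDir) pairs copied from the seven sample records
  val sample: List[(String, String)] = List(
    ("Albury", "W"), ("Albury", "W"), ("Albury", "WNW"),
    ("Albury", "WSW"), ("Albury", "WSW"),
    ("aa", "NE"), ("aa", "W")
  )

  // Returns, for the n busiest stations, (location, total records, up to n most
  // frequent directions with their counts), ordered by total records descending.
  def topStations(records: List[(String, String)], n: Int = 5): List[(String, Int, List[(String, Int)])] =
    records
      .filter { case (_, dir) => dir != "NA" }
      .groupBy { case (location, _) => location }
      .toList
      .map { case (location, rows) =>
        val dirCounts = rows
          .groupBy { case (_, dir) => dir }
          .toList
          .map { case (dir, rs) => (dir, rs.size) }
          .sortBy { case (_, count) => -count }
        (location, rows.size, dirCounts.take(n))
      }
      .sortBy { case (_, total, _) => -total }
      .take(n)

  def main(args: Array[String]): Unit =
    topStations(sample).foreach { case (location, total, dirs) =>
      println(s"$location: $total records, top directions: $dirs")
    }
}
```

On the sample, Albury has five records and aa has two, so Albury is listed first. The order of directions with equal counts is not fixed.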
