Preface
What is Spark?
Apache Spark is an open-source unified analytics engine for large-scale data processing. It offers in-memory computation and high-level APIs in Scala, Java, Python, and R, which makes it well suited to batch statistics over datasets like the one below.
Why compute statistics over weather data?
Statistics over weather indicators have many uses and help people understand and anticipate weather changes. Common indicators include temperature, humidity, wind speed, and rainfall.
First, statistical analysis of these indicators reveals a region's weather trends and seasonal variation, such as the average temperature in each season or the distribution of rainfall. This helps people plan travel and agricultural production more accurately.
Second, such statistics help quantify the probability and intensity of extreme weather, such as rainstorms and tornadoes, so that people can prepare countermeasures and prevent disasters.
Finally, statistical analysis of weather indicators provides data support for meteorologists and researchers studying the patterns and causes of weather change, giving future forecasting and climate work a more accurate data basis.
Test JSON dataset
```json
{"date":"2008\/10\/1","location":"Albury","minTemp":"13.4","maxTemp":"22.9","rainfall":"0.6","windGustDir":"W","windGustSpeed":"44","windDir9am":"W","windDir3pm":"WNW","windSpeed9am":"20","windSpeed3pm":"24","humidity9am":"71","humidity3pm":"22","pressure9am":"1007.7","pressure3pm":"1007.1","cloud3pm":"NA","temp9am":"16.9","temp3pm":"21.8","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/10\/1","location":"Albury","minTemp":"-3.4","maxTemp":"NA","rainfall":"0.666","windGustDir":"W","windGustSpeed":"44","windDir9am":"W","windDir3pm":"WNW","windSpeed9am":"20","windSpeed3pm":"24","humidity9am":"71","humidity3pm":"22","pressure9am":"1007.7","pressure3pm":"1007.1","cloud3pm":"NA","temp9am":"16.9","temp3pm":"21.8","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/2","location":"Albury","minTemp":"NA","maxTemp":"25.1","rainfall":"0","windGustDir":"WNW","windGustSpeed":"44","windDir9am":"NNW","windDir3pm":"WSW","windSpeed9am":"4","windSpeed3pm":"22","humidity9am":"44","humidity3pm":"25","pressure9am":"1010.6","pressure3pm":"1007.8","cloud3pm":"NA","temp9am":"17.2","temp3pm":"24.3","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/3","location":"Albury","minTemp":"12.9","maxTemp":"25.7","rainfall":"NA","windGustDir":"WSW","windGustSpeed":"46","windDir9am":"W","windDir3pm":"WSW","windSpeed9am":"19","windSpeed3pm":"26","humidity9am":"38","humidity3pm":"30","pressure9am":"1007.6","pressure3pm":"1008.7","cloud3pm":"2","temp9am":"21","temp3pm":"23.2","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/3","location":"Albury","minTemp":"12.9","maxTemp":"25.7","rainfall":"0","windGustDir":"WSW","windGustSpeed":"46","windDir9am":"W","windDir3pm":"WSW","windSpeed9am":"19","windSpeed3pm":"26","humidity9am":"38","humidity3pm":"30","pressure9am":"1007.6","pressure3pm":"1008.7","cloud3pm":"2","temp9am":"21","temp3pm":"23.2","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/12\/15","location":"aa","minTemp":"9.2","maxTemp":"28","rainfall":"0","windGustDir":"NE","windGustSpeed":"24","windDir9am":"SE","windDir3pm":"E","windSpeed9am":"11","windSpeed3pm":"9","humidity9am":"45","humidity3pm":"16","pressure9am":"1017.6","pressure3pm":"1012.8","cloud3pm":"NA","temp9am":"18.1","temp3pm":"26.5","rainToday":"No","rainTomorrow":"No"}
{"date":"2008\/11\/5","location":"aa","minTemp":"17.5","maxTemp":"32.3","rainfall":"1","windGustDir":"W","windGustSpeed":"41","windDir9am":"ENE","windDir3pm":"NW","windSpeed9am":"7","windSpeed3pm":"20","humidity9am":"82","humidity3pm":"33","pressure9am":"1010.8","pressure3pm":"1006","cloud3pm":"8","temp9am":"17.8","temp3pm":"29.7","rainToday":"No","rainTomorrow":"No"}
```
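Note the `"NA"` values in the sample records (`maxTemp`, `minTemp`, `rainfall`, `cloud3pm`): every value in this dataset is a JSON string, so numeric fields must be screened for `"NA"` before conversion, since calling `.toDouble` on `"NA"` throws a `NumberFormatException`. A minimal sketch of that screening (the `parseTemp` helper is illustrative, not part of the original code):

```scala
object NaDemo {
  // Convert a field to Double only when it holds a usable number;
  // "NA" marks a missing observation and must be skipped.
  def parseTemp(raw: String): Option[Double] =
    if (raw == "NA") None        // missing observation, drop the record
    else Some(raw.toDouble)      // e.g. "22.9" -> 22.9
}
```

The Spark job later in this post takes the same precaution with a `filter(!_.maxTemp.equals("NA"))` step before calling `.toDouble`.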
Field descriptions

Field | Description |
---|---|
date | Date of the observation |
location | Weather station name |
minTemp | Minimum temperature (°C) |
maxTemp | Maximum temperature (°C) |
rainfall | Rainfall recorded for the day (mm) |
windGustDir | Direction of the strongest gust in the 24 hours to midnight |
windGustSpeed | Speed (km/h) of the strongest gust in the 24 hours to midnight |
windDir9am | Wind direction at 9am |
windDir3pm | Wind direction at 3pm |
windSpeed9am | Average wind speed over the 10 minutes before 9am |
windSpeed3pm | Average wind speed over the 10 minutes before 3pm |
humidity9am | Humidity (percent) at 9am |
humidity3pm | Humidity (percent) at 3pm |
pressure9am | Atmospheric pressure (hPa) reduced to mean sea level at 9am |
pressure3pm | Atmospheric pressure (hPa) reduced to mean sea level at 3pm |
cloud3pm | Fraction of sky obscured by cloud at 3pm, in oktas (eighths of the sky) |
temp9am | Temperature (°C) at 9am |
temp3pm | Temperature (°C) at 3pm |
rainToday | YES if the precipitation (mm) in the 24 hours to 9am exceeded 1 mm, otherwise NO |
rainTomorrow | Whether it rained the following day: YES if so, otherwise NO. A way of measuring "risk". |
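The Spark job below deserializes each JSON line into a `Weathers` object with Gson. The original `com.cjy.weather.Weathers` class is not shown in the post; a plausible reconstruction, assuming plain `String` fields named after the JSON keys (Gson populates fields by reflection, and this dataset quotes every value, including numbers), might look like this:

```scala
// Hypothetical sketch of com.cjy.weather.Weathers — the real class is not
// shown in the post. All fields are Strings because the dataset stores every
// value as a quoted string ("NA", "13.4", ...). Serializable so Spark can
// ship instances between stages.
class Weathers extends Serializable {
  var date: String = _
  var location: String = _
  var minTemp: String = _
  var maxTemp: String = _
  var rainfall: String = _
  var windGustDir: String = _
  var windGustSpeed: String = _
  var windDir9am: String = _
  var windDir3pm: String = _
  var windSpeed9am: String = _
  var windSpeed3pm: String = _
  var humidity9am: String = _
  var humidity3pm: String = _
  var pressure9am: String = _
  var pressure3pm: String = _
  var cloud3pm: String = _
  var temp9am: String = _
  var temp3pm: String = _
  var rainToday: String = _
  var rainTomorrow: String = _
}
```

Gson matches JSON keys to field names, so as long as the field names mirror the keys above, `new Gson().fromJson(line, classOf[Weathers])` fills them in without any annotations.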
For each weather station, compute the maximum and minimum temperature recorded in each season of each year, sort in descending order by maximum temperature, and inspect the top fifteen records.
```scala
package com.cjy.maxMinTemp

import com.cjy.weather.Weathers
import com.google.gson.Gson
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

/*
 * For each weather station, compute the maximum and minimum temperature
 * recorded in each season of each year, sort in descending order by
 * maximum temperature, and print the top fifteen rows.
 */
object MaxMinTemp {

  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName(MaxMinTemp.getClass.getName).setMaster("local[*]")
    val sparkContext = new SparkContext(sparkConf)

    val rdd: RDD[String] = sparkContext.textFile("file path")  // path to the JSON-lines dataset

    // Parse each line once and cache the result, so the max and min passes
    // below reuse the parsed records. One Gson instance per partition
    // avoids constructing a parser for every record.
    val weathers: RDD[Weathers] = rdd.mapPartitions { lines =>
      val gson = new Gson()
      lines.map(line => gson.fromJson(line, classOf[Weathers]))
    }.cache()

    // (station_year_season, maxTemp) — drop records whose maxTemp is "NA"
    val max = weathers
      .filter(!_.maxTemp.equals("NA"))
      .map(it => (it.location + "_" + conversion(it.date), it.maxTemp.toDouble))
      .reduceByKey(Math.max(_, _))

    // (station_year_season, minTemp) — drop records whose minTemp is "NA"
    val min = weathers
      .filter(!_.minTemp.equals("NA"))
      .map(it => (it.location + "_" + conversion(it.date), it.minTemp.toDouble))
      .reduceByKey(Math.min(_, _))

    // Join on the composite key, sort descending by max then min, show top 15.
    val maxMin = max.join(min)
    maxMin.sortBy(it => (-it._2._1, -it._2._2))
      .take(15)
      .foreach(it => println(it._1 + " Max:" + it._2._1 + " Min:" + it._2._2))

    sparkContext.stop()
  }

  // Map a "yyyy/M/d" date string to a "yyyy_season" key.
  def conversion(date: String): String = {
    val dateSplit = date.split("/")
    val month = dateSplit(1).toInt
    val season =
      if (month >= 1 && month <= 3) "springtime"
      else if (month >= 4 && month <= 6) "summertime"
      else if (month >= 7 && month <= 9) "autumn"
      else "wintertime"
    dateSplit(0) + "_" + season
  }
}
```
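The `year_season` key produced by `conversion` drives the whole aggregation, so it is worth sanity-checking the bucketing in isolation. Below is a standalone copy of the same month ranges (months 1-3 spring, 4-6 summer, 7-9 autumn, 10-12 winter, as the original code defines them), reimplemented here so it can run without a Spark runtime:

```scala
// Standalone copy of the season bucketing used by MaxMinTemp.conversion,
// extracted so it can be exercised without Spark.
object SeasonCheck {
  def conversion(date: String): String = {
    val parts = date.split("/")          // "yyyy/M/d"
    val month = parts(1).toInt
    val season =
      if (month <= 3) "springtime"       // months 1-3
      else if (month <= 6) "summertime"  // months 4-6
      else if (month <= 9) "autumn"      // months 7-9
      else "wintertime"                  // months 10-12
    parts(0) + "_" + season              // e.g. "2008_wintertime"
  }
}
```

For example, `SeasonCheck.conversion("2008/10/1")` yields `"2008_wintertime"`, so the first Albury record in the sample dataset is aggregated under the key `Albury_2008_wintertime`. Note that these month-to-season ranges are the ones the original code uses; they do not match the Southern Hemisphere seasons of the Australian stations in the data, which is worth keeping in mind when interpreting the output.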