Analyzing Weather Data with Hadoop: Complete Walkthrough (with Full Code)

2401_84181403

Published 2024-05-15 07:13:29


Article link: https://blog.csdn.net/2401_84181403/article/details/138887466


import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class TemperatureReducer extends Reducer<Text, LongWritable,
        Text, Temperature> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values,
            Context context) throws IOException, InterruptedException {
        long maxTemperature = Long.MIN_VALUE;
        long minTemperature = Long.MAX_VALUE;
        double sum = 0.0;
        int count = 0;
        // Track the extremes and the running sum of all readings for this key (one day).
        for (LongWritable value : values) {
            long temp = value.get();
            maxTemperature = Math.max(temp, maxTemperature);
            minTemperature = Math.min(temp, minTemperature);
            sum += temp;
            count++;
        }
        // Only emit a record when at least one reading was seen,
        // so the average is never computed as 0.0 / 0.
        if (count > 0) {
            Temperature temperature = new Temperature(maxTemperature,
                    minTemperature, sum / count);
            context.write(key, temperature);
        }
    }
}

The reducer computes the maximum, minimum, and average temperature for each day and stores them in a Temperature object.
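To sanity-check the aggregation without a cluster, the same max/min/average logic can be sketched in plain Java. This is only an illustration, not the author's code: the class name `DailyStatsSketch`, the method `summarize`, and the sample readings are all made up for the example, and the readings are assumed to be scaled by 10 as in the rest of the article.

```java
import java.util.List;
import java.util.LongSummaryStatistics;

// Local, Hadoop-free sketch of the per-day aggregation performed in reduce().
// Class and method names are hypothetical.
final class DailyStatsSketch {

    // Returns {max, min, avg} for one day's readings.
    static double[] summarize(List<Long> readings) {
        if (readings.isEmpty()) {
            throw new IllegalArgumentException("no readings for this day");
        }
        LongSummaryStatistics stats = readings.stream()
                .mapToLong(Long::longValue)
                .summaryStatistics();
        return new double[] {stats.getMax(), stats.getMin(), stats.getAverage()};
    }

    public static void main(String[] args) {
        // Hypothetical readings for a single day, not taken from the real data set.
        double[] result = summarize(List.of(-20L, 15L, 50L));
        System.out.println("max=" + result[0] + " min=" + result[1] + " avg=" + result[2]);
    }
}
```

In the real job the grouping by day is done by the MapReduce shuffle, so the reducer only ever sees the readings that share one key.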

2.2.3 JobMain

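import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class JobMain extends Configured implements Tool {
    @Override
    public int run(String[] strings) throws Exception {
        // Create the job object
        Job job = Job.getInstance(super.getConf(), "mapreduce_temperature");

        // Required when the job is packaged as a jar and run on the cluster
        job.setJarByClass(JobMain.class);

        // Step 1: set the input format class that reads the files (produces K1 and V1)
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job,
                new Path("hdfs://node01:8020/usr/hadoop/in"));

        // Step 2: set the Mapper class
        job.setMapperClass(TemperatureMapper.class);
        // Set the map output types (K2 and V2)
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);

        // Steps 3 to 6 (partitioning, sorting, combining, grouping) use the defaults

        // Step 7: set the Reducer class
        job.setReducerClass(TemperatureReducer.class);
        // Set the reduce output types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Temperature.class);

        // Step 8: set the output format class
        job.setOutputFormatClass(TextOutputFormat.class);
        // Set the output path
        TextOutputFormat.setOutputPath(job,
                new Path("hdfs://node01:8020/usr/hadoop/temperature"));

        boolean success = job.waitForCompletion(true);

        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        Configuration configuration = new Configuration();
        // Launch the job and propagate its status as the process exit code
        int exitCode = ToolRunner.run(configuration, new JobMain(), args);
        System.exit(exitCode);
    }

}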

2.3 Running the job

2.3.1 Package and upload

The usual routine, so I'll skip the details.

2.3.2 Run

hadoop jar temperature_test-1.0-SNAPSHOT.jar cn.sky.hadoop.JobMain
Execution result (screenshot not included).
Take a quick look at the data here:

Hmm, looks fine.

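3 Importing the data into Hive

For the detailed Hive setup, see: Big Data Learning Series: Hadoop 3.0 Painstaking Study (Part 5)

One caveat: loading data into Hive directly from HDFS (LOAD DATA INPATH without LOCAL) moves the files into the table's directory instead of copying them, so the data disappears from its original HDFS path.

So I downloaded the data, renamed it temperature_data, and uploaded it to node03.
With the data in place, create the Hive table: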

 create external table temperature (t_date string, t_max double, 
 	t_min double, t_avg double) row format delimited fields terminated by '\t';

Load the data into Hive:

load data local inpath '/export/services/temperature_data' overwrite 
	into table temperature;
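The table definition implies that every line of the data file holds four tab-separated fields in the order date, max, min, average. As a rough illustration of that layout, here is a small Java sketch that splits one such line. The class name `TemperatureLineSketch` and the sample line (including its date format) are assumptions made for the example; the real file's contents are not shown in this article.

```java
// Sketch: splitting one tab-delimited line into the four columns
// declared for the Hive table (t_date, t_max, t_min, t_avg).
// The class name is hypothetical.
final class TemperatureLineSketch {

    final String date;
    final double max;
    final double min;
    final double avg;

    TemperatureLineSketch(String date, double max, double min, double avg) {
        this.date = date;
        this.max = max;
        this.min = min;
        this.avg = avg;
    }

    static TemperatureLineSketch parse(String line) {
        // The -1 limit keeps trailing empty fields so malformed lines are detected.
        String[] fields = line.split("\t", -1);
        if (fields.length != 4) {
            throw new IllegalArgumentException(
                    "expected 4 tab-separated fields, got " + fields.length);
        }
        return new TemperatureLineSketch(
                fields[0],
                Double.parseDouble(fields[1]),
                Double.parseDouble(fields[2]),
                Double.parseDouble(fields[3]));
    }

    public static void main(String[] args) {
        // Hypothetical sample line; the date format is a guess.
        TemperatureLineSketch row = parse("20190101\t50\t-20\t15.0");
        System.out.println(row.date + " max=" + row.max
                + " min=" + row.min + " avg=" + row.avg);
    }
}
```

If a line does not match the declared delimiter or column count, Hive does not fail the load; the affected columns simply come back as NULL at query time, which is worth remembering when a `select` returns unexpected NULLs.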

Query the first 5 rows as a quick check:

select * from temperature limit 5;


4 Analyzing the data in Hive

Keeping it simple, let's just run a few aggregate queries.

Query the average temperature for all of 2019
select avg(t_avg) from temperature;

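Wow, that was slow: the query took 25 seconds. The result is about 34.6, which corresponds to an average of roughly 3.46 because the stored values are scaled up by a factor of 10.

Count the days in 2019 whose average temperature was above the yearly average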
select count(1) from temperature where t_avg > 34.6;
The answer is 196 days, which leaves 365 - 196 = 169 days at or below the yearly average.

OK, that's enough.
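The two queries above boil down to computing a mean and then counting the rows that exceed it. A minimal Java sketch of the same arithmetic follows, computing the threshold from the data rather than hard-coding it. The class name `AboveAverageSketch` and the four sample values are invented for illustration and are not rows from the real table.

```java
import java.util.List;

// Sketch of the two Hive queries as plain arithmetic:
// avg(t_avg), then count of rows with t_avg above that mean.
// Class name and sample data are hypothetical.
final class AboveAverageSketch {

    static double mean(List<Double> scaledAverages) {
        return scaledAverages.stream()
                .mapToDouble(Double::doubleValue)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no rows"));
    }

    static long countAbove(List<Double> scaledAverages, double threshold) {
        return scaledAverages.stream().filter(v -> v > threshold).count();
    }

    public static void main(String[] args) {
        // Hypothetical daily averages, stored scaled by 10 like the t_avg column.
        List<Double> tAvg = List.of(10.0, 20.0, 60.0, 50.0);
        double threshold = mean(tAvg);
        System.out.println("scaled mean = " + threshold);
        System.out.println("actual mean = " + threshold / 10.0);
        System.out.println("days above  = " + countAbove(tAvg, threshold));
    }
}
```

Because the comparison is strict, a day whose value equals the mean exactly is counted on the "not above" side.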

5 Exporting the data to MySQL with Sqoop

For the detailed Sqoop setup, see: Big Data Learning Series: Hadoop 3.0 Painstaking Study (Part 7)

5.1 Create the MySQL table

CREATE TABLE `temperature` (
  `Tem_Date` varchar(10) NOT NULL,
  `Tem_Max` double DEFAULT NULL,
  `Tem_Min` double DEFAULT NULL,
  `Tem_Avg` double DEFAULT NULL,
  PRIMARY KEY (`Tem_Date`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8

5.2 Run the export

bin/sqoop export --connect jdbc:mysql://192.168.0.102:3306/userdb \
	--username root --password 123456 --table temperature \
	--export-dir /usr/hadoop/temperature --input-fields-terminated-by "\t"

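After about half a minute, the data can be queried in MySQL (screenshot not included).
Good, the data looks right.

6 Displaying the data

The front end uses ECharts and JSP; the back end uses Spring, Spring MVC, and MyBatis.

There is a lot of code, so only the main parts are shown.

6.1 Front-end code

[Screenshot of the front-end code not included]
The key part uses Ajax to request the data from the back end and then feeds it into the ECharts chart.

6.2 Back-end code

Controller layer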

@Autowired
private TemperatureService tempService;

@RequestMapping("/getTemperature.action")
@ResponseBody
public TemperatureReturnPojo getTemperature() {
    TemperatureReturnPojo temperaturePojo = tempService.getAllTemperature();
    System.out.println(temperaturePojo);
    return temperaturePojo;
}

Service layer

public interface TemperatureService {
    TemperatureReturnPojo getAllTemperature();
}

Service implementation class

import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.List;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Service
public class TemperatureServiceImpl implements TemperatureService {
    @Autowired
    private TemperatureMapper temperatureMapper;

    @Override
    public TemperatureReturnPojo getAllTemperature() {
        TemperatureReturnPojo temperatureReturnPojo
                = new TemperatureReturnPojo();

        ArrayList<String> dates = new ArrayList<>();
        ArrayList<String> maxs = new ArrayList<>();
        ArrayList<String> mins = new ArrayList<>();
        ArrayList<String> avgs = new ArrayList<>();
        // "0.00" keeps the leading zero; "#.00" would render 0.5 as ".50"
        DecimalFormat df = new DecimalFormat("0.00");

        List<TemperaturePojo> allTemperature
                = temperatureMapper.getAllTemperature();
        for (TemperaturePojo pojo : allTemperature) {
            dates.add(pojo.getTem_Date());
            // Stored values are scaled by 10, so divide to get the real temperature
            maxs.add(df.format(pojo.getTem_Max() / 10.0));
            mins.add(df.format(pojo.getTem_Min() / 10.0));
            avgs.add(df.format(pojo.getTem_Avg() / 10.0));
        }
        temperatureReturnPojo.setTem_Dates(dates);
        temperatureReturnPojo.setTem_Maxs(maxs);
        temperatureReturnPojo.setTem_Mins(mins);
        temperatureReturnPojo.setTem_Avgs(avgs);

        return temperatureReturnPojo;
    }
}
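The service divides each stored value by 10 and formats it to two decimal places before handing it to the chart. The sketch below isolates that conversion and shows how the DecimalFormat pattern affects values between -1 and 1. The class name `TemperatureFormatSketch` and the sample inputs are hypothetical, and the format symbols are pinned to `Locale.ROOT` here (an addition for the example) so the output does not depend on the machine's default locale.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Sketch of the "divide by 10, format to two decimals" step.
// Class name and sample values are hypothetical.
final class TemperatureFormatSketch {

    static String format(String pattern, double scaledValue) {
        // Locale.ROOT fixes the decimal separator to '.' regardless of system settings.
        DecimalFormat df = new DecimalFormat(
                pattern, DecimalFormatSymbols.getInstance(Locale.ROOT));
        return df.format(scaledValue / 10.0);
    }

    public static void main(String[] args) {
        System.out.println(format("#.00", 5.0));   // ".50"  (no leading zero)
        System.out.println(format("0.00", 5.0));   // "0.50"
        System.out.println(format("0.00", -20.0)); // "-2.00"
        System.out.println(format("0.00", 346.0)); // "34.60"
    }
}
```

Either pattern yields strings that parse as numbers, so the choice mostly affects how tooltips and labels read in the chart.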

Entity classes

public class TemperaturePojo {
    private String Tem_Date;
    private Double Tem_Max;
    private Double Tem_Min;
    private Double Tem_Avg;

    // Getters, setters, and toString() omitted
}

public class TemperatureReturnPojo {
    private List<String> Tem_Dates;
    private List<String> Tem_Maxs;
    private List<String> Tem_Mins;
    private List<String> Tem_Avgs;

    // Getters, setters, and toString() omitted
}

Mapper

public interface TemperatureMapper {
    List<TemperaturePojo> getAllTemperature();
}

<mapper namespace="cn.itcast.weblog.mapper.TemperatureMapper" >
    <select id="getAllTemperature" 
 resultType="cn.itcast.weblog.pojo.TemperaturePojo">
        select * from temperature;
    </select>

</mapper>

Run result (screenshot not included).

The whole pipeline is done. Hooray!