1. Requirements
In recent years, as the smog problem has persisted, more and more people have begun to pay attention to city-level environmental data, including air-quality data, weather data, and so on.
If a city's weather and air-quality readings are recorded once per hour, each city produces 24 environmental records per day. If all of China's 2,500-plus cities were recorded this way, more than 60,000 records would be generated every day, about 21.9 million per year, which already qualifies as environmental big data.
From these raw monitoring records we can aggregate along the time dimension to derive, for each city, daily and monthly average temperatures, counts of good, moderate, and polluted air-quality days, and so on, providing solid data support for studying the dispersion conditions of air pollutants.
This project takes Beijing's hourly weather and air-quality data from January to June 2016 (fields with no reading are filled with "N/A") and uses MapReduce to compute the monthly average temperature and the number of days over the six months in each air-quality category: good (优), moderate (良), lightly polluted (轻度污染), moderately polluted (中度污染), heavily polluted (重度污染), and severely polluted (严重污染). A sample of the Beijing air-quality data is shown in the file below.
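As a reference for the parsing done in the programs below, here is a minimal sketch of how one hourly record is taken apart, assuming the column layout the mappers rely on: the date-hour string in column 0, the temperature in column 5, and the AQI in column 6. The sample values and the elided middle columns (written as "-") are made up for illustration:

```java
public class RecordParseDemo {
    public static void main(String[] args) {
        // Hypothetical hourly record in the assumed comma-separated layout
        String line = "2016010108,-,-,-,-,-3,158";
        String[] items = line.split(",");
        String month = items[0].substring(0, 6);      // year-month: "201601"
        int temperature = Integer.parseInt(items[5]); // -3
        int aqi = Integer.parseInt(items[6]);         // 158
        System.out.println(month + " " + temperature + " " + aqi);
    }
}
```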
2. Assignment Requirements
Each student is required to write MapReduce programs that analyze the weather along two dimensions, temperature and air quality: derive the environmental statistics from Beijing's hourly weather and air-quality data files for January to June 2016, run the programs on the master node of an already-built pseudo-distributed or fully distributed cluster, and report the program output. The concrete tasks are as follows:
(1) Compute the city's monthly average temperature, outputting the year-month and the average temperature for that month.
Source code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TmpStat {
    public static class TmpStatMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private Text dateKey = new Text();
        private IntWritable tmpValue = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into comma-separated fields
            String[] items = value.toString().split(",");
            String date = items[0];
            String tmp = items[5];
            // Skip the header row and records with a missing temperature
            if (date.equals("DATE") || tmp.equals("N/A"))
                return;
            // Key on year-month: the first six characters of the date field
            dateKey.set(date.substring(0, 6));
            tmpValue.set(Integer.parseInt(tmp));
            context.write(dateKey, tmpValue);
        }
    }

    public static class TmpSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            for (IntWritable val : values) {
                sum += val.get();
                count++;
            }
            // Integer average of all hourly readings in the month (truncated)
            int tmpAvg = sum / count;
            result.set(tmpAvg);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop:9000");
        conf.set("hadoop.job.user", "root");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        System.setProperty("HADOOP_USER_NAME", "root");
        Job job = Job.getInstance(conf, "tmp stat");
        job.setJarByClass(TmpStat.class);
        job.setMapperClass(TmpStat.TmpStatMapper.class);
        // No combiner: averaging is not associative, so running the averaging
        // reducer as a combiner would yield an average of partial averages
        job.setReducerClass(TmpStat.TmpSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run result:
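A caveat on combiners for this job: an averaging reducer must not be registered as a combiner, because an average of partial averages generally differs from the average over all readings. The following sketch, with made-up temperature readings from two map-side splits, shows the discrepancy:

```java
public class CombinerPitfallDemo {
    public static void main(String[] args) {
        // Made-up hourly temperature readings from two map-side splits
        int[] splitA = {10, 20, 30}; // local average: 20
        int[] splitB = {40, 50};     // local average: 45
        int sum = 0, count = 0;
        for (int t : splitA) { sum += t; count++; }
        for (int t : splitB) { sum += t; count++; }
        int trueAvg = sum / count;     // 150 / 5 = 30, the correct answer
        int avgOfAvgs = (20 + 45) / 2; // what an averaging combiner feeds the reducer: 32
        System.out.println(trueAvg + " vs " + avgOfAvgs);
    }
}
```

Summing combiners are safe because addition is associative; that is why program (3) below can reuse its reducer as a combiner while programs (1) and (2) cannot.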
(2) Compute the city's daily air quality, outputting the date and the daily average AQI.
Source code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class dateaql {
    public static class TmpStatMapper
            extends Mapper<Object, Text, Text, IntWritable> {
        private Text dateKey = new Text();
        private IntWritable aqiValue = new IntWritable();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            // Split the line into comma-separated fields
            String[] items = value.toString().split(",");
            String date = items[0];
            String aqi = items[6];
            // Skip the header row and records with a missing AQI
            if (date.equals("DATE") || aqi.equals("N/A"))
                return;
            // Key on the full date so averages are computed per day
            dateKey.set(date);
            aqiValue.set(Integer.parseInt(aqi));
            context.write(dateKey, aqiValue);
        }
    }

    public static class TmpSumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            int count = 0;
            for (IntWritable val : values) {
                sum += val.get();
                count++;
            }
            // Integer average of the day's hourly AQI readings (truncated)
            int aqiAvg = sum / count;
            result.set(aqiAvg);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop:9000");
        conf.set("hadoop.job.user", "root");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        System.setProperty("HADOOP_USER_NAME", "root");
        Job job = Job.getInstance(conf, "aqi stat");
        job.setJarByClass(dateaql.class);
        job.setMapperClass(dateaql.TmpStatMapper.class);
        // No combiner: running the averaging reducer as a combiner would
        // produce an average of partial averages rather than the true average
        job.setReducerClass(dateaql.TmpSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run result:
(3) Count the number of days in each air-quality category, outputting the category and the total number of days.
Source code:
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class AqiStat {
    public static final String GOOD = "优";
    public static final String MODERATE = "良";
    public static final String LIGHTLY_POLLUTED = "轻度污染";
    public static final String MODERATELY_POLLUTED = "中度污染";
    public static final String HEAVILY_POLLUTED = "重度污染";
    public static final String SEVERELY_POLLUTED = "严重污染";

    public static class StatMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text cond = new Text();

        // map: classify each day by its AQI value (read from the tab-separated
        // output of job (2)) and emit a count of 1 for that category
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] items = value.toString().split("\t");
            int aqi = Integer.parseInt(items[1]);
            if (aqi <= 50) {
                cond.set(GOOD);                // good
            } else if (aqi <= 100) {
                cond.set(MODERATE);            // moderate
            } else if (aqi <= 150) {
                cond.set(LIGHTLY_POLLUTED);    // lightly polluted
            } else if (aqi <= 200) {
                cond.set(MODERATELY_POLLUTED); // moderately polluted
            } else if (aqi <= 300) {
                cond.set(HEAVILY_POLLUTED);    // heavily polluted
            } else {
                cond.set(SEVERELY_POLLUTED);   // severely polluted
            }
            context.write(cond, one);
        }
    }

    public static class StatReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        // reduce: sum the day counts for each air-quality category
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://hadoop:9000");
        conf.set("hadoop.job.user", "root");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        System.setProperty("HADOOP_USER_NAME", "root");
        Job job = Job.getInstance(conf, "AqiStat");
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, args[0]);
        job.setJarByClass(AqiStat.class);
        job.setMapperClass(StatMapper.class);
        // Summing is associative, so the reducer can safely double as a combiner
        job.setCombinerClass(StatReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setPartitionerClass(HashPartitioner.class);
        job.setReducerClass(StatReducer.class);
        job.setNumReduceTasks(2);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
Run result:
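For reference, the classification step of program (3) can be exercised on its own. The sketch below applies the same AQI thresholds as StatMapper to one hypothetical line in the tab-separated format that program (2) writes out (the date and AQI value are made up):

```java
public class AqiClassifyDemo {
    // Same AQI thresholds as StatMapper
    static String label(int aqi) {
        if (aqi <= 50) return "优";
        if (aqi <= 100) return "良";
        if (aqi <= 150) return "轻度污染";
        if (aqi <= 200) return "中度污染";
        if (aqi <= 300) return "重度污染";
        return "严重污染";
    }

    public static void main(String[] args) {
        // Hypothetical "date \t daily-average-AQI" line from job (2)
        String line = "20160101\t158";
        String[] items = line.split("\t");
        System.out.println(items[0] + " " + label(Integer.parseInt(items[1])));
        // prints "20160101 中度污染"
    }
}
```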