[Teacher Li's Cloud Computing] Assignment 2: Following the guide on accessing Hadoop with MapReduce from Eclipse, find the maximum of n numbers


Git小发明


Article link: https://blog.csdn.net/m0_56169789/article/details/138738202


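// Point the client at the HDFS NameNode running on localhost:9000
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://localhost:9000");
FileSystem fs = FileSystem.get(conf);
Path inputPath = new Path("/input");
Path outputPath = new Path("/output");
// Hadoop refuses to write into an existing output directory, so remove it (recursively) first
if (fs.exists(outputPath)) {
    fs.delete(outputPath, true);
}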

3.2 Implementing the job configuration

When configuring the MapReduce job, we need to specify its input and output, the Mapper class, the Reducer class, and related settings.

Job job = Job.getInstance(conf, "max value");
job.setJarByClass(MaxValue.class);
job.setMapperClass(MaxValueMapper.class);
job.setCombinerClass(MaxValueReducer.class);
job.setReducerClass(MaxValueReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, inputPath);
FileOutputFormat.setOutputPath(job, outputPath);

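To configure the job, we first obtain a job instance with Job.getInstance, passing the configuration and a job name; here the name is "max value". setJarByClass tells Hadoop which JAR to ship to the cluster by locating the JAR that contains the MaxValue class. setMapperClass, setCombinerClass and setReducerClass specify the Mapper, Combiner and Reducer classes. setOutputKeyClass and setOutputValueClass declare the output key and value types; because the map output types are not set separately, Hadoop uses these types for the map output as well, which works here since both are IntWritable. Finally, FileInputFormat.addInputPath and FileOutputFormat.setOutputPath specify the job's input and output paths.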
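Reusing MaxValueReducer as the combiner is safe because max is associative and commutative: the maximum of per-map-task maxima equals the maximum of all values (this would not hold for, say, an average). The plain-Java sketch below, which does not use Hadoop, illustrates that property; the class name CombinerMaxDemo and its methods are hypothetical and only simulate what the framework does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerMaxDemo {

    // Same logic MaxValueReducer applies to the values of one key.
    static int maxOf(List<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Each inner list plays the role of one map task's output; the "combiner" reduces it
    // locally, so the "reducer" only sees one partial maximum per map task.
    static int maxWithCombiner(List<List<Integer>> mapOutputs) {
        List<Integer> partialMaxima = new ArrayList<>();
        for (List<Integer> output : mapOutputs) {
            partialMaxima.add(maxOf(output));
        }
        return maxOf(partialMaxima);
    }

    public static void main(String[] args) {
        List<List<Integer>> mapOutputs = Arrays.asList(
                Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
                Arrays.asList(11, 12, 13, 14, 15));
        List<Integer> all = new ArrayList<>();
        mapOutputs.forEach(all::addAll);
        // Both paths give the same answer because max is associative and commutative.
        System.out.println(maxWithCombiner(mapOutputs) + " " + maxOf(all)); // 15 15
    }
}
```

The combiner mainly cuts the amount of data shuffled to the reducer; the final answer is the same with or without it.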

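import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxValue {
    public static void main(String[] args) throws Exception {
        // Use the HDFS instance running on localhost:9000 as the default file system
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://localhost:9000");
        FileSystem fs = FileSystem.get(conf);
        Path inputPath = new Path("/input");
        Path outputPath = new Path("/output");
        // The job fails if the output directory already exists, so delete it first
        if (fs.exists(outputPath)) {
            fs.delete(outputPath, true);
        }

        Job job = Job.getInstance(conf, "max value");
        job.setJarByClass(MaxValue.class);
        job.setMapperClass(MaxValueMapper.class);
        // max is associative and commutative, so the reducer can also serve as the combiner
        job.setCombinerClass(MaxValueReducer.class);
        job.setReducerClass(MaxValueReducer.class);
        job.setOutputKeyClass(IntWritable.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, inputPath);
        FileOutputFormat.setOutputPath(job, outputPath);

        // Submit the job and block until it finishes, printing progress to the console
        boolean success = job.waitForCompletion(true);
        if (success) {
            System.out.println("Job completed successfully.");
        } else {
            System.err.println("Job failed.");
        }
        System.exit(success ? 0 : 1);
    }
}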

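In main, we first create a Configuration object and set the default file system to the HDFS instance at hdfs://localhost:9000. We then obtain a FileSystem object, define the input and output paths, and delete the output directory if it already exists. After configuring the job, we call waitForCompletion(true), which submits the job and blocks until it finishes (the true argument makes it print progress). Finally, the program reports the outcome: if the job succeeds, it prints "Job completed successfully.". The computed maximum itself is written to the output directory, not to the console.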
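Hadoop's FileOutputFormat rejects a job whose output directory already exists, which is why main() removes /output with fs.delete(outputPath, true) before submitting. As a local analog only (java.nio on the local disk, not the Hadoop FileSystem API), a recursive "delete if exists" can be sketched like this; the class OutputDirCleaner and its method are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class OutputDirCleaner {

    /** Deletes dir and everything under it if it exists; returns true if something was deleted. */
    static boolean deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parents, so directories are empty when deleted.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path out = Files.createTempDirectory("output");
        Files.createDirectories(out.resolve("sub"));
        Files.write(out.resolve("sub").resolve("part-r-00000"), "1\t15\n".getBytes());
        System.out.println(deleteIfExists(out)); // true
        System.out.println(Files.exists(out));   // false
        System.out.println(deleteIfExists(out)); // false
    }
}
```

On a real cluster, keep using FileSystem.delete(path, true) as in main(); this sketch only shows the recursive-delete idea.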

3.3 Implementing the Map phase

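import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxValueMapper extends Mapper<LongWritable, Text, IntWritable, IntWritable> {
    // Constant key: every number lands in the same reduce group
    private final IntWritable one = new IntWritable(1);
    // Reused value object to avoid allocating one per record
    private IntWritable number = new IntWritable();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        String[] numbers = value.toString().split(",");
        for (String s : numbers) {
            String trimmed = s.trim(); // input such as "1, 2, 3" leaves spaces around each number
            if (trimmed.isEmpty()) {
                continue; // skip blank lines and trailing commas
            }
            number.set(Integer.parseInt(trimmed));
            context.write(one, number);
        }
    }
}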

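The Mapper class declares two member fields: an IntWritable named one, used as the key, and an IntWritable named number, used as the value. In map, we split the incoming line of text on commas and iterate over the resulting array. Each element is converted to an integer and stored in number, and the pair (one, number) is written out with Context.write. In other words, the map function emits every input number as a value under the same constant key 1, so all numbers end up in a single reduce group.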
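The sample input later in this post puts a space after each comma ("1, 2, 3, 4, 5"), and Integer.parseInt(" 2") throws NumberFormatException, so each token needs to be trimmed before parsing. The plain-Java sketch below (no Hadoop; MapPhaseDemo is a hypothetical name) simulates the map step for one line and returns the (1, number) pairs it would emit.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapPhaseDemo {

    // Simulates MaxValueMapper.map(): every number on the line becomes the pair (1, number).
    static List<Map.Entry<Integer, Integer>> map(String line) {
        List<Map.Entry<Integer, Integer>> out = new ArrayList<>();
        for (String token : line.split(",")) {
            String trimmed = token.trim(); // "1, 2" splits into "1" and " 2"
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines and trailing commas
            }
            out.add(new SimpleEntry<>(1, Integer.parseInt(trimmed)));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(map("1, 2, 3, 4, 5")); // [1=1, 1=2, 1=3, 1=4, 1=5]
    }
}
```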

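3.4 Implementing the Reduce phase

The Reducer class declares a member field, result, that holds the output value. The reduce method is called once per key together with all of that key's values. It starts from max = Integer.MIN_VALUE so that negative inputs are handled correctly, compares each value with the current maximum, and keeps the larger one. Finally, it stores the maximum in result and writes the pair (key, result) with Context.write.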

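import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxValueReducer extends Reducer<IntWritable, IntWritable, IntWritable, IntWritable> {

    // Reused output value object
    private IntWritable result = new IntWritable();

    @Override
    public void reduce(IntWritable key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // Start from the smallest int so negative inputs are handled correctly
        int max = Integer.MIN_VALUE;
        for (IntWritable val : values) {
            max = Math.max(max, val.get());
        }
        result.set(max);
        context.write(key, result);
    }
}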
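Between map and reduce, the framework groups all values by key (the shuffle), so with the constant key 1 the reducer receives every number in one call. The plain-Java sketch below simulates both steps without Hadoop; ReducePhaseDemo, shuffle and reduce are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ReducePhaseDemo {

    // Shuffle: gather every value emitted under the same key (keys come out sorted).
    static Map<Integer, List<Integer>> shuffle(List<int[]> pairs) {
        Map<Integer, List<Integer>> groups = new TreeMap<>();
        for (int[] kv : pairs) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    // Reduce: the same loop as MaxValueReducer.reduce(), starting from Integer.MIN_VALUE
    // so that an all-negative input still produces the correct maximum.
    static int reduce(Iterable<Integer> values) {
        int max = Integer.MIN_VALUE;
        for (int v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        // Map output in the shape produced by MaxValueMapper: every pair has key 1.
        List<int[]> pairs = Arrays.asList(
                new int[] {1, 4}, new int[] {1, 15}, new int[] {1, -3}, new int[] {1, 9});
        for (Map.Entry<Integer, List<Integer>> group : shuffle(pairs).entrySet()) {
            System.out.println(group.getKey() + "\t" + reduce(group.getValue())); // 1	15
        }
    }
}
```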

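The complete program follows. Note that it takes a slightly different approach from the snippets above: it uses LongWritable instead of IntWritable, each map call emits the largest number on its line keyed by that line's byte offset (the key TextInputFormat supplies), and the reducer keeps a running maximum across all keys and writes a single (offset, maximum) pair in cleanup(). It also reads the input and output paths from the command-line arguments instead of hard-coding them.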

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class MaxValue {

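    public static class MaxValueMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

        // With TextInputFormat the input key is the byte offset of the line, not a line number
        private LongWritable lineOffset = new LongWritable();
        private LongWritable maxNumber = new LongWritable();

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

            String line = value.toString();
            String[] numbers = line.split(",");

            // Find the largest number on this line
            long max = Long.MIN_VALUE;
            boolean found = false;
            for (String number : numbers) {
                String trimmed = number.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank lines and trailing commas
                }
                long currentNumber = Long.parseLong(trimmed);
                if (currentNumber > max) {
                    max = currentNumber;
                }
                found = true;
            }
            if (!found) {
                return; // nothing to emit for a line without numbers
            }

            lineOffset.set(key.get());
            maxNumber.set(max);

            // Emit (byte offset of this line, largest number on this line)
            context.write(lineOffset, maxNumber);
        }
    }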

    // Keeps a running maximum across all keys and emits one result in cleanup().
    // This yields a single global answer only when the job runs one reduce task (the default).
    public static class MaxValueReducer extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {

        private LongWritable maxLineOffset = new LongWritable();
        private LongWritable maxValue = new LongWritable(Long.MIN_VALUE);
        private boolean found = false;

        @Override
        public void reduce(LongWritable key, Iterable<LongWritable> values, Context context)
                throws IOException, InterruptedException {

            long localMax = Long.MIN_VALUE;
            for (LongWritable value : values) {
                localMax = Math.max(localMax, value.get());
            }

            // Remember the offset of the line holding the largest value seen so far
            if (!found || localMax > maxValue.get()) {
                maxValue.set(localMax);
                maxLineOffset.set(key.get());
                found = true;
            }
        }

        @Override
        protected void cleanup(Context context) throws IOException, InterruptedException {
            // Write the overall result once, after every key has been processed
            if (found) {
                context.write(maxLineOffset, maxValue);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Max Value");

        job.setJarByClass(MaxValue.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(MaxValueMapper.class);
        job.setReducerClass(MaxValueReducer.class);
        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(LongWritable.class);

        TextInputFormat.addInputPath(job, new Path(args[0]));
        TextOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
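The reducer in the complete program does not write anything in reduce(); it accumulates state in fields across calls and emits once in cleanup(). That gives one global answer only because the job runs a single reduce task by default (mapreduce.job.reduces defaults to 1); with more reducers, each would emit its own local maximum. The plain-Java sketch below simulates that pattern without Hadoop; RunningMaxReducerDemo is a hypothetical name, and the offsets in main() are the line offsets of the sample file assuming Unix line endings.

```java
import java.util.Arrays;

public class RunningMaxReducerDemo {

    // State that survives across reduce() calls, like the fields in MaxValueReducer.
    private long maxOffset = 0;
    private long maxValue = Long.MIN_VALUE;
    private boolean sawInput = false;

    // Called once per key (a line offset) with that key's values (the line maxima).
    void reduce(long offset, Iterable<Long> values) {
        for (long v : values) {
            if (!sawInput || v > maxValue) {
                maxValue = v;
                maxOffset = offset;
                sawInput = true;
            }
        }
    }

    // Called once at the end: emit the single (offset, max) pair, as cleanup() does.
    String cleanup() {
        return sawInput ? maxOffset + "\t" + maxValue : "(no input)";
    }

    public static void main(String[] args) {
        RunningMaxReducerDemo reducer = new RunningMaxReducerDemo();
        // Keys arrive sorted; each key carries that line's maximum from the mapper.
        reducer.reduce(0L, Arrays.asList(5L));
        reducer.reduce(14L, Arrays.asList(10L));
        reducer.reduce(29L, Arrays.asList(15L));
        System.out.println(reducer.cleanup()); // 29	15
    }
}
```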

Create an input file in which each line contains one or more comma-separated integers, for example:

1, 2, 3, 4, 5
6, 7, 8, 9, 10
11, 12, 13, 14, 15

The file contains three lines of five integers each. The program reads the file and reports the largest value, 15.
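To see what the complete program should produce for this file, the plain-Java sketch below (no Hadoop; SampleFileDemo is a hypothetical name) walks the sample text the way TextInputFormat presents it: each line keyed by its starting byte offset. It assumes UTF-8 text with Unix (\n) line endings; with \r\n endings the offsets shift.

```java
import java.nio.charset.StandardCharsets;

public class SampleFileDemo {

    /** Returns {offset of the line holding the maximum, maximum}. Assumes '\n' line endings. */
    static long[] run(String fileContents) {
        long offset = 0;
        long bestOffset = 0;
        long bestValue = Long.MIN_VALUE;
        for (String line : fileContents.split("\n")) {
            // Map step: largest number on this line
            long lineMax = Long.MIN_VALUE;
            for (String token : line.split(",")) {
                if (!token.trim().isEmpty()) {
                    lineMax = Math.max(lineMax, Long.parseLong(token.trim()));
                }
            }
            // Reduce step: keep the line with the overall largest value
            if (lineMax > bestValue) {
                bestValue = lineMax;
                bestOffset = offset;
            }
            // The next line starts after this line's bytes plus the 1-byte '\n'
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return new long[] {bestOffset, bestValue};
    }

    public static void main(String[] args) {
        String sample = "1, 2, 3, 4, 5\n6, 7, 8, 9, 10\n11, 12, 13, 14, 15\n";
        long[] result = run(sample);
        System.out.println(result[0] + "\t" + result[1]); // 29	15
    }
}
```

The first line is 13 bytes plus a newline, so the second line starts at offset 14; the second is 14 bytes plus a newline, so the third line, which holds 15, starts at offset 29.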

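Next, create a MapReduce project in Eclipse and copy the code above into the corresponding source files. You can keep a copy of the test data in the project, for example in a folder named input under src/main/resources. Note, however, that when the job runs on the cluster the input path refers to HDFS, not to the Eclipse project, so the file must also be uploaded to HDFS (for example with hdfs dfs -put) before the job is started.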

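Now we can run the program. First, package the project as a JAR file: in Eclipse, right-click the project, choose Export, select JAR file, and follow the wizard. Copy the resulting JAR to a machine with access to the Hadoop cluster and run it with: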

hadoop jar maxvalue.jar input output

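Here maxvalue.jar is the generated JAR file, input is the input directory, and output is the output directory; relative paths are resolved against the user's home directory in HDFS. If the JAR's manifest does not name a main class, add the class name after the JAR name, for example hadoop jar maxvalue.jar MaxValue input output. When the job finishes, Hadoop writes the result to a file named part-r-00000 in the output directory. View it with: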

hadoop fs -cat output/part-r-00000

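The command prints a single key-value pair separated by a tab. The value is the maximum, 15. The key is the byte offset of the line that contains it (not a line number); for the sample file with Unix line endings that line starts at offset 29, so the output is 29 and 15 separated by a tab.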
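TextOutputFormat writes each record as the key, a separator (a tab by default, configurable through mapreduce.output.textoutputformat.separator), and the value. If you copy part-r-00000 to your machine and want to consume the result programmatically, a minimal parser could look like the sketch below; PartFileParser is a hypothetical name, and it assumes the default tab separator and numeric key and value.

```java
public class PartFileParser {

    /** Parses one line of TextOutputFormat output ("key<TAB>value") into {key, value}. */
    static long[] parse(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("expected key<TAB>value but got: " + line);
        }
        long key = Long.parseLong(line.substring(0, tab).trim());
        long value = Long.parseLong(line.substring(tab + 1).trim());
        return new long[] {key, value};
    }

    public static void main(String[] args) {
        long[] kv = parse("29\t15");
        System.out.println("key=" + kv[0] + ", max=" + kv[1]); // key=29, max=15
    }
}
```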

