Hadoop Study Notes (3): API Changes

johnson_it

Original link: https://blog.csdn.net/johnson_it/article/details/6760277

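When you run a MapReduce program in a configured environment, you will notice that many classes are marked as deprecated. This is because Hadoop 0.20 made substantial changes to its API, and many classes were rewritten for the new API. The main changes are listed below. At the time of writing, the API version that the industry knows best and considers stable is still 0.18.3.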

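1. First, classes that used to live in the org.apache.hadoop.mapred package have moved to the org.apache.hadoop.mapreduce package, and some have moved into org.apache.hadoop.mapreduce.lib and its subpackages (for example org.apache.hadoop.mapreduce.lib.input). When you use the new API, you should therefore no longer import classes from the old package.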

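2. A more important change is that the OutputCollector and Reporter objects that used to be passed to the mapper and reducer have been replaced by a single Context object. In the new API, map output is emitted with context.write(). The point of this change is to unify how user code interacts with the MapReduce framework and to make the Mapper and Reducer APIs more stable: when new functionality is added, it goes into the Context class, so the mapper does not need to change. As a result, code written earlier keeps working when new features are introduced.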
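To make the Context idea concrete, here is a minimal plain-Java sketch that runs without Hadoop. ToyContext, ToyMapper, TokenizerMapper and ContextDemo are hypothetical names invented for illustration; ToyContext only imitates the role of the real Context (a write() method for output plus a status method standing in for Reporter) and is not Hadoop's class.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Hadoop's Context: one object that carries every
// way user code talks to the framework. This is NOT the real Hadoop class.
class ToyContext<K, V> {
    final List<Map.Entry<K, V>> output = new ArrayList<>();
    String status = "";

    // Emit one key/value pair (the job OutputCollector.collect() used to do).
    void write(K key, V value) {
        output.add(new SimpleEntry<>(key, value));
    }

    // Report progress (the job Reporter used to do). A future feature could be
    // added here as a new method without changing any mapper's signature.
    void setStatus(String msg) {
        status = msg;
    }
}

// Hypothetical base class mirroring the shape of the new-API map() method.
abstract class ToyMapper<K1, V1, K2, V2> {
    abstract void map(K1 key, V1 value, ToyContext<K2, V2> context)
            throws InterruptedException;
}

// Word-count style mapper: emits (word, 1) for every token in the line.
class TokenizerMapper extends ToyMapper<Long, String, String, Integer> {
    @Override
    void map(Long offset, String line, ToyContext<String, Integer> context) {
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                context.write(word, 1);
            }
        }
        context.setStatus("processed offset " + offset);
    }
}

class ContextDemo {
    static List<Map.Entry<String, Integer>> run(String line) {
        ToyContext<String, Integer> ctx = new ToyContext<>();
        new TokenizerMapper().map(0L, line, ctx);
        return ctx.output;
    }

    public static void main(String[] args) {
        System.out.println(run("to be or not to be"));
    }
}
```

Because the mapper only ever receives one context parameter, a new capability can be added to ToyContext as another method without touching the map() signature, which is the stability argument made above.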

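3. The map and reduce methods have changed accordingly. The old Mapper and Reducer interfaces are deprecated in favor of the new Mapper and Reducer base classes, which you extend instead of implementing an interface. The methods may now also throw InterruptedException. In addition, reduce() now receives its values as an Iterable instead of an Iterator, so you can traverse them more conveniently with Java's for-each loop.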
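The practical difference between receiving an Iterator and receiving an Iterable can be shown with plain Java collections. In this sketch, ReduceStyles, sumOld and sumNew are hypothetical names; the methods only mimic the parameter types of the old and new reduce() signatures and are not Hadoop's Reducer classes.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Compares the two reduce() parameter styles on ordinary collections.
class ReduceStyles {
    // Old style: values arrive as an Iterator, so the loop must call
    // hasNext()/next() explicitly.
    static int sumOld(Iterator<Integer> values) {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        return sum;
    }

    // New style: values arrive as an Iterable, so the enhanced for-each
    // loop works directly (unboxing each Integer to int).
    static int sumNew(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> counts = Arrays.asList(1, 1, 1);
        System.out.println(sumOld(counts.iterator())); // 3
        System.out.println(sumNew(counts));            // 3
    }
}
```

One caveat when applying this to real Hadoop code: the framework typically reuses the same Writable object for each value it hands to reduce(), so copy a value if you need to keep it after the current loop iteration.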

4. Comparison of the old and new Mapper and Reducer:

```java
// Old API (org.apache.hadoop.mapred): implement interfaces, extend MapReduceBase
public static class MapClass extends MapReduceBase
    implements Mapper<K1, V1, K2, V2> {
    public void map(K1 key, V1 value,
                    OutputCollector<K2, V2> output,
                    Reporter reporter) throws IOException { }
}
public static class Reduce extends MapReduceBase
    implements Reducer<K2, V2, K3, V3> {
    public void reduce(K2 key, Iterator<V2> values,
                       OutputCollector<K3, V3> output,
                       Reporter reporter) throws IOException { }
}
```

The new API has simplified them somewhat:

```java
// New API (org.apache.hadoop.mapreduce): extend the Mapper/Reducer classes
public static class MapClass extends Mapper<K1, V1, K2, V2> {
    public void map(K1 key, V1 value, Context context)
                    throws IOException, InterruptedException { }
}
public static class Reduce extends Reducer<K2, V2, K3, V3> {
    public void reduce(K2 key, Iterable<V2> values, Context context)
                       throws IOException, InterruptedException { }
}
```
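One consequence of switching from interfaces to base classes is visible in the comparison: the old mapper had to extend MapReduceBase just to get no-op configure() and close() methods, while the new base class can supply default behavior itself. In Hadoop's new-API Mapper, run() calls setup() once, then map() for each input record, then cleanup(), and the default map() passes each pair through unchanged. The plain-Java sketch below mimics that template-method shape without a Context; ToyRecordMapper, UpperCaseMapper and RunDemo are hypothetical names, and this is not Hadoop code.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical base class imitating the structure of the new-API Mapper:
// run() drives setup(), map() once per record, then cleanup().
class ToyRecordMapper<K1, V1, K2, V2> {
    protected final List<Map.Entry<K2, V2>> out = new ArrayList<>();

    protected void setup() { }    // default: do nothing
    protected void cleanup() { }  // default: do nothing

    // Default map() is the identity: pass the pair through unchanged.
    @SuppressWarnings("unchecked")
    protected void map(K1 key, V1 value) {
        out.add(new SimpleEntry<>((K2) key, (V2) value));
    }

    // The "framework" loop; subclasses normally leave this alone.
    public List<Map.Entry<K2, V2>> run(List<Map.Entry<K1, V1>> input) {
        setup();
        try {
            for (Map.Entry<K1, V1> record : input) {
                map(record.getKey(), record.getValue());
            }
        } finally {
            cleanup();
        }
        return out;
    }
}

// Overrides only map(); setup(), cleanup() and run() are inherited.
class UpperCaseMapper extends ToyRecordMapper<Integer, String, Integer, String> {
    @Override
    protected void map(Integer key, String value) {
        out.add(new SimpleEntry<>(key, value.toUpperCase()));
    }
}

class RunDemo {
    static List<Map.Entry<Integer, String>> demo() {
        List<Map.Entry<Integer, String>> input = new ArrayList<>();
        input.add(new SimpleEntry<>(1, "hadoop"));
        input.add(new SimpleEntry<>(2, "mapreduce"));
        return new UpperCaseMapper().run(input);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [1=HADOOP, 2=MAPREDUCE]
    }
}
```

The subclass states only what differs from the default, which is the same reason the new-style MapClass above no longer needs a MapReduceBase parent.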

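In addition, the MapReduce driver program has changed considerably. The old JobConf and JobClient classes have been replaced, and their functionality is now split between two classes, Configuration and Job. Configuration, which was already the parent class of JobConf, is now mainly responsible for job configuration, while Job is responsible for defining the job and controlling its execution. Methods such as setOutputKeyClass() have moved into the Job class.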

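Previously, a job was set up through JobConf:

```java
// conf is an existing Configuration object
JobConf job = new JobConf(conf, MyJob.class);
job.setJobName("MyJob");
```

Now it's done through Job:

```java
Job job = new Job(conf, "MyJob");
job.setJarByClass(MyJob.class);
```

Previously JobClient submitted a job for execution:

```java
JobClient.runJob(job);
```

Now it's also done through Job:

```java
// waitForCompletion(true) prints progress and returns true if the job succeeded
System.exit(job.waitForCompletion(true) ? 0 : 1);
```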