Reading the Source Code for Job Submission in the Driver Class

Latest recommended article published on 2022-07-17 21:17:40

desYang

Article link: https://blog.csdn.net/ygyblue2/article/details/81110824

Submitting the job from the Driver:

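Using a simple WordCount as the example, we step through the submit() call made from the Driver in the debugger to see how it executes. (The example itself is not the point; the goal is to learn how submit is designed by reading the source.)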

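1. Preparation: for the WordCount part, create a simple txt file locally or under the hadoop directory on a virtual machine (I created mine on the local D: drive). The content can be anything, for example: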

hadoop hadoop
spark
hadoop atguigu
spark
hello WordCount
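
As a sanity check on what the job should produce for this sample, the same counting can be sketched in plain Java with no Hadoop dependency. This is only an illustration of the map step (split on spaces, emit 1 per word) and the reduce step (sum per word); the class WordCountSketch and its count method are made up for this post. A TreeMap is used so the words come out sorted, since MapReduce also hands keys to the reducer in sorted order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; it is not part of the Hadoop job below.
class WordCountSketch {

    // "Map": split each line on spaces. "Reduce": add 1 per occurrence of a word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                if (word.isEmpty()) {
                    continue; // skip empty tokens produced by blank lines
                }
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "hadoop hadoop", "spark", "hadoop atguigu", "spark", "hello WordCount");
        // Print one "word<TAB>count" pair per line
        count(sample).forEach((word, n) -> System.out.println(word + "\t" + n));
    }
}
```

For the five sample lines this gives hadoop 3, spark 2, and 1 each for atguigu, hello and WordCount.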

Next, copy the following code into IDEA or another IDE.

① Dependencies in pom.xml:

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>RELEASE</version>
        </dependency>
        <dependency>
            <groupId>org.apache.logging.log4j</groupId>
            <artifactId>log4j-core</artifactId>
            <version>2.8.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.7.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.2</version>
        </dependency>
    </dependencies>

② WordCountMapper:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/**
 * Created by ygyblue2
 */
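public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    // Reused output key and value; every word is emitted with a count of 1
    Text k = new Text();
    IntWritable v = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // 1. Convert the value (one line of input) to a String
        String line = value.toString();
        // 2. Split the line into words
        String[] words = line.split(" ");
        // 3. Write out each word as (word, 1)
        for (String word : words) {
            k.set(word);
            context.write(k, v);
        }
    }
}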

③ WordCountReducer

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;


import java.io.IOException;

/**
 * Created by ygyblue2
 */
public class WordCountReducer extends Reducer<Text,IntWritable,Text,IntWritable>{

    IntWritable v = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // 1. Sum the counts for this key
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        // 2. Store the sum in v, then write out (word, total)
        v.set(sum);
        context.write(key, v);
    }
}

④ WordDriver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

/**
 * Created by ygyblue2
 */
public class WordDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        // 1. Get the job instance
        Configuration configuration = new Configuration();
        Job job = Job.getInstance(configuration);
        // 2. Locate the jar through the driver class
        job.setJarByClass(WordDriver.class);
        // 3. Set the custom mapper and reducer
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        // 4. Set the key/value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // 5. Set the key/value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // 6. Path of the input file to process
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        // 7. Path where the results will be stored
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // 8. Submit (the part we want to debug); waitForCompletion also calls submit() internally
        boolean result = job.waitForCompletion(true);
        //System.exit(result ? 0 : 1);
    }
}

2. Set a breakpoint on the last line of WordDriver, at job.waitForCompletion, then run in debug mode and step in.

---Step Into submit(), then keep stepping downward

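Once inside submit(), the first important call is this.connect(); the connect method establishes the connection to the cluster by constructing a Cluster object.

setUseNewAPI makes the job use the new Hadoop API.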
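
The order of calls seen so far in the debugger can be summarized with a small plain-Java model. This is a sketch based on my reading of the Hadoop 2.x Job class, not the Hadoop source itself: the class JobSubmitSketch and its trace list are hypothetical, the method bodies are placeholders, and the steps ensureState and submitJobInternal are named from the real source but not covered in the text above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of the call order; NOT the Hadoop implementation.
class JobSubmitSketch {
    enum JobState { DEFINE, RUNNING }

    private JobState state = JobState.DEFINE;
    final List<String> trace = new ArrayList<>(); // records the order of the steps

    boolean waitForCompletion(boolean verbose) {
        // A job that is still being defined gets submitted first
        if (state == JobState.DEFINE) {
            submit();
        }
        trace.add(verbose ? "monitorAndPrintJob" : "pollUntilComplete");
        return true; // placeholder: the real method reports whether the job succeeded
    }

    void submit() {
        ensureState(JobState.DEFINE);   // a job can only be submitted once
        trace.add("setUseNewAPI");      // choose the new API
        connect();                      // build the Cluster object
        trace.add("submitJobInternal"); // hand the job to the submitter
        state = JobState.RUNNING;
    }

    private void ensureState(JobState expected) {
        if (state != expected) {
            throw new IllegalStateException("Job in state " + state + " instead of " + expected);
        }
        trace.add("ensureState");
    }

    private void connect() {
        trace.add("connect -> new Cluster");
    }

    public static void main(String[] args) {
        JobSubmitSketch job = new JobSubmitSketch();
        job.waitForCompletion(true);
        job.trace.forEach(System.out::println);
    }
}
```

The state check is the reason submit() runs only on the first call: after submission the state is RUNNING, so waitForCompletion skips straight to monitoring.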

---Step Into connect()

---Ctrl+click into the Cluster constructor, read it, and set a breakpoint there

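When a Cluster is constructed it calls initialize(). Inside, the create() method of a ClientProtocolProvider reads the cluster configuration and, depending on which framework MapReduce is set to run on (the mapreduce.framework.name setting), returns either a LocalJobRunner or a YARNRunner protocol object (polymorphism). Stepping further shows the result eventually being assigned as client = clientProtocol, so the client, whether local or YARN, is in essence one of these runner objects. The purpose of initialize() is to create the client from the loaded configuration.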
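
The selection logic described above can be sketched in plain Java. The types below are hypothetical stand-ins that only borrow the Hadoop names; a Map plays the role of Configuration, and the real Cluster discovers its providers differently (through a service loader, as far as I can tell). What the sketch does show is the pattern: each provider returns null unless the configured framework is its own, and the first non-null result becomes the client.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Hadoop types; only the selection pattern is modeled.
class ClusterInitSketch {
    static final String FRAMEWORK_NAME = "mapreduce.framework.name";

    interface ClientProtocol { String name(); }

    static class LocalJobRunner implements ClientProtocol {
        public String name() { return "LocalJobRunner"; }
    }

    static class YARNRunner implements ClientProtocol {
        public String name() { return "YARNRunner"; }
    }

    interface ClientProtocolProvider {
        // Returns null when this provider does not handle the configured framework
        ClientProtocol create(Map<String, String> conf);
    }

    static class LocalProvider implements ClientProtocolProvider {
        public ClientProtocol create(Map<String, String> conf) {
            // "local" is assumed as the default when nothing is configured
            String framework = conf.getOrDefault(FRAMEWORK_NAME, "local");
            return "local".equals(framework) ? new LocalJobRunner() : null;
        }
    }

    static class YarnProvider implements ClientProtocolProvider {
        public ClientProtocol create(Map<String, String> conf) {
            return "yarn".equals(conf.get(FRAMEWORK_NAME)) ? new YARNRunner() : null;
        }
    }

    static final List<ClientProtocolProvider> PROVIDERS =
            Arrays.asList(new LocalProvider(), new YarnProvider());

    // Ask each provider in turn; the first non-null result plays the role of "client"
    static ClientProtocol initialize(Map<String, String> conf) {
        for (ClientProtocolProvider provider : PROVIDERS) {
            ClientProtocol clientProtocol = provider.create(conf);
            if (clientProtocol != null) {
                return clientProtocol;
            }
        }
        throw new IllegalStateException("No provider accepted " + conf.get(FRAMEWORK_NAME));
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        System.out.println(initialize(conf).name()); // nothing configured
        conf.put(FRAMEWORK_NAME, "yarn");
        System.out.println(initialize(conf).name()); // framework set to yarn
    }
}
```

Because callers only see the ClientProtocol interface, the rest of the submission code does not need to know whether the job runs locally or on YARN.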