An Analysis of Hadoop's Tool and ToolRunner

First, consider the Configurable interface:

public interface Configurable {
  void setConf(Configuration conf);
  Configuration getConf();
}

The Configurable interface defines just two methods: setConf and getConf.
The Configured class implements Configurable:

public class Configured implements Configurable {
  private Configuration conf;

  public Configured() {
    this(null);
  }

  public Configured(Configuration conf) {
    setConf(conf);
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
  }

  public Configuration getConf() {
    return conf;
  }
}

The Tool interface extends Configurable (an interface extending an interface) and declares a single run() method:

public interface Tool extends Configurable {
  int run(String[] args) throws Exception;
}

The inheritance relationship is therefore: Configured implements Configurable, and Tool extends Configurable; a job class typically extends Configured and implements Tool.

Next, consider the relevant portion of the ToolRunner class:

public class ToolRunner {
  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }

    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }
}

As the static run() method of ToolRunner shows, it takes the job's conf and the command-line args and feeds them to a GenericOptionsParser, which consumes Hadoop's generic command-line options and applies them to the configuration. The remaining job-specific arguments (toolArgs = parser.getRemainingArgs()) are then handed to the tool, and the tool's own run() method is invoked with them.
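That control flow can be sketched with a small self-contained program. This is a simplified illustration with no Hadoop dependency: the Configuration stand-in and the parser that only understands "-D key=value" are assumptions made for the sketch, not the real Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolRunnerSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String k, String v) { props.put(k, v); }
        String get(String k) { return props.get(k); }
    }

    // Mirrors the Tool contract: receives the conf, then runs with its own args
    interface Tool {
        void setConf(Configuration conf);
        int run(String[] args) throws Exception;
    }

    // Simplified GenericOptionsParser: consumes "-D key=value" pairs into the
    // configuration and leaves everything else as remaining arguments.
    static String[] parseGeneric(Configuration conf, String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.set(kv[0], kv[1]);
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    // Mirrors ToolRunner.run(): parse generic options, hand the conf back to
    // the tool, then invoke the tool with only its own arguments.
    static int run(Configuration conf, Tool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }
        String[] toolArgs = parseGeneric(conf, args);
        tool.setConf(conf);
        return tool.run(toolArgs);
    }

    public static void main(String[] args) throws Exception {
        Tool tool = new Tool() {
            private Configuration conf;
            public void setConf(Configuration c) { this.conf = c; }
            public int run(String[] a) {
                System.out.println("prop=" + conf.get("my.prop"));
                System.out.println("args=" + String.join(",", a));
                return 0;
            }
        };
        // "-D my.prop=42" is consumed as a generic option; "in out" reach the tool
        run(null, tool, new String[]{"-D", "my.prop=42", "in", "out"});
    }
}
```

Running it shows the split: the -D option lands in the configuration before setConf() is called, while only "in" and "out" reach the tool's run().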

The generic command-line options are ones that can be added to any job, for example:

-conf <configuration file>     specify a configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines

A typical program implementing Tool:

/**
 * MyApp reads its arguments from the command line; a user invokes it as:
 *
 *   $ bin/hadoop jar MyApp.jar -archives test.tgz arg1 arg2
 *
 * where -archives is a generic Hadoop option, and arg1, arg2 are the
 * job's own arguments.
 */
public class MyApp extends Configured implements Tool {

    // implement Tool's run()
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options: by the time run() is called,
        // the generic options have already been stripped by ToolRunner
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        job.setInputPath(in);
        job.setOutputPath(out);
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // args are processed by ToolRunner
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}


  • 大大头

    ToolRunner's run() method works mainly by delegating to the Tool's own run() implementation.

  • happy_pingli

    Very thorough write-up. To add on the conf side, the following properties are localized in the job configuration for each task's execution:

    Name                     Type     Description
    mapred.job.id            String   The job id
    mapred.jar               String   job.jar location in job directory
    job.local.dir            String   The job-specific shared scratch space
    mapred.tip.id            String   The task id
    mapred.task.id           String   The task attempt id
    mapred.task.is.map       boolean  Is this a map task
    mapred.task.partition    int      The id of the task within the job
    map.input.file           String   The filename that the map is reading from
    map.input.start          long     The offset of the start of the map input split
    map.input.length         long     The number of bytes in the map input split
    mapred.work.output.dir   String   The task's temporary output directory
