An Analysis of Hadoop's Tool and ToolRunner

First, consider the Configurable interface:

public interface Configurable {
  void setConf(Configuration conf);
  Configuration getConf();
}

The Configurable interface defines just two methods: setConf and getConf.
The Configured class implements Configurable:

public class Configured implements Configurable {
  private Configuration conf;

  public Configured() {
    this(null);
  }

  public Configured(Configuration conf) {
    setConf(conf);
  }

  public void setConf(Configuration conf) {
    this.conf = conf;
  }

  public Configuration getConf() {
    return conf;
  }
}

The Tool interface extends Configurable (an interface extending an interface) and declares a single run() method:

public interface Tool extends Configurable {
  int run(String[] args) throws Exception;
}

The inheritance relationship is therefore: Configured implements Configurable, and Tool extends Configurable; a job class typically extends Configured and implements Tool.

Next, consider the relevant portion of the ToolRunner class:

public class ToolRunner {
  public static int run(Configuration conf, Tool tool, String[] args)
      throws Exception {
    if (conf == null) {
      conf = new Configuration();
    }

    GenericOptionsParser parser = new GenericOptionsParser(conf, args);
    // set the configuration back, so that Tool can configure itself
    tool.setConf(conf);
    // get the args w/o generic hadoop args
    String[] toolArgs = parser.getRemainingArgs();
    return tool.run(toolArgs);
  }
}

As the static run() method of ToolRunner shows, it takes the job's conf and the command-line args and feeds them to a GenericOptionsParser, which consumes Hadoop's generic command-line options and applies them to the configuration. The remaining job-specific arguments (toolArgs = parser.getRemainingArgs()) are then handed to the tool, and the tool's own run() method is invoked with them.
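That control flow can be sketched with a small self-contained program. This is a simplified illustration with no Hadoop dependency: the Configuration stand-in and the parser that only understands "-D key=value" are assumptions made for the sketch, not the real Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToolRunnerSketch {
    // Stand-in for org.apache.hadoop.conf.Configuration
    static class Configuration {
        private final Map<String, String> props = new HashMap<>();
        void set(String k, String v) { props.put(k, v); }
        String get(String k) { return props.get(k); }
    }

    // Mirrors the Tool contract: receives the conf, then runs with its own args
    interface Tool {
        void setConf(Configuration conf);
        int run(String[] args) throws Exception;
    }

    // Simplified GenericOptionsParser: consumes "-D key=value" pairs into the
    // configuration and leaves everything else as remaining arguments.
    static String[] parseGeneric(Configuration conf, String[] args) {
        List<String> rest = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("-D") && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.set(kv[0], kv[1]);
            } else {
                rest.add(args[i]);
            }
        }
        return rest.toArray(new String[0]);
    }

    // Mirrors ToolRunner.run(): parse generic options, hand the conf back to
    // the tool, then invoke the tool with only its own arguments.
    static int run(Configuration conf, Tool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }
        String[] toolArgs = parseGeneric(conf, args);
        tool.setConf(conf);
        return tool.run(toolArgs);
    }

    public static void main(String[] args) throws Exception {
        Tool tool = new Tool() {
            private Configuration conf;
            public void setConf(Configuration c) { this.conf = c; }
            public int run(String[] a) {
                System.out.println("prop=" + conf.get("my.prop"));
                System.out.println("args=" + String.join(",", a));
                return 0;
            }
        };
        // "-D my.prop=42" is consumed as a generic option; "in out" reach the tool
        run(null, tool, new String[]{"-D", "my.prop=42", "in", "out"});
    }
}
```

Running it shows the split: the -D option lands in the configuration before setConf() is called, while only "in" and "out" reach the tool's run().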

The generic command-line options are ones that can be added to any job, for example:

-conf <configuration file>     specify a configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|jobtracker:port>    specify a job tracker
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines

A typical program implementing Tool:

/**
 * MyApp reads its arguments from the command line; a user invokes it as:
 *
 *   $ bin/hadoop jar MyApp.jar -archives test.tgz arg1 arg2
 *
 * where -archives is a generic Hadoop option, and arg1, arg2 are the
 * job's own arguments.
 */
public class MyApp extends Configured implements Tool {

    // implement Tool's run()
    public int run(String[] args) throws Exception {
        Configuration conf = getConf();

        // Create a JobConf using the processed conf
        JobConf job = new JobConf(conf, MyApp.class);

        // Process custom command-line options: by the time run() is called,
        // the generic options have already been stripped by ToolRunner
        Path in = new Path(args[0]);
        Path out = new Path(args[1]);

        // Specify various job-specific parameters
        job.setJobName("my-app");
        job.setInputPath(in);
        job.setOutputPath(out);
        job.setMapperClass(MyApp.MyMapper.class);
        job.setReducerClass(MyApp.MyReducer.class);

        JobClient.runJob(job);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // args are processed by ToolRunner
        int res = ToolRunner.run(new Configuration(), new MyApp(), args);
        System.exit(res);
    }
}


  • 大大头

    ToolRunner's run() method works mainly by delegating to the Tool's own run() implementation.

  • happy_pingli

    Very thorough write-up. To add on the conf side, the following properties are localized in the job configuration for each task's execution:

    Name                     Type     Description
    mapred.job.id            String   The job id
    mapred.jar               String   job.jar location in job directory
    job.local.dir            String   The job-specific shared scratch space
    mapred.tip.id            String   The task id
    mapred.task.id           String   The task attempt id
    mapred.task.is.map       boolean  Is this a map task
    mapred.task.partition    int      The id of the task within the job
    map.input.file           String   The filename that the map is reading from
    map.input.start          long     The offset of the start of the map input split
    map.input.length         long     The number of bytes in the map input split
    mapred.work.output.dir   String   The task's temporary output directory
