Defining the framework interface
The interface is implemented by concrete classes:
    public interface Tool extends Configurable {
        int run(String[] args) throws Exception;
    }
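ToolRunner
A single, unified entry point for invoking a Tool.
It parses the generic Hadoop options into the configuration, then calls the interface method with the remaining arguments: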
    public static int run(Configuration conf, Tool tool, String[] args)
            throws Exception {
        if (conf == null) {
            conf = new Configuration();
        }
        GenericOptionsParser parser = new GenericOptionsParser(conf, args);
        // set the configuration back, so that Tool can configure itself
        tool.setConf(conf);
        // get the args w/o generic hadoop args
        String[] toolArgs = parser.getRemainingArgs();
        return tool.run(toolArgs);
    }
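To make the three steps above concrete (parse the generic options, hand the configuration back, run with what is left), here is a minimal self-contained sketch. It does not use Hadoop: `MiniTool`, `EchoTool`, and the `Map`-based configuration are hypothetical stand-ins, and only the `-D key=value` form of generic option is handled, which is a simplification of what `GenericOptionsParser` does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for illustration only; these are NOT Hadoop's classes.
public class ToolRunnerSketch {

    /** Hypothetical stand-in for Tool: a configurable object with a run method. */
    interface MiniTool {
        void setConf(Map<String, String> conf);
        Map<String, String> getConf();
        int run(String[] args);
    }

    /** Mirrors the shape of ToolRunner.run: parse, setConf, run. */
    static int run(Map<String, String> conf, MiniTool tool, String[] args) {
        if (conf == null) {
            conf = new HashMap<>();
        }
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            // Assumption: only "-D key=value" is treated as a generic option.
            if ("-D".equals(args[i]) && i + 1 < args.length && args[i + 1].contains("=")) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv[1]);
            } else {
                remaining.add(args[i]);
            }
        }
        // Hand the configuration back so the tool can configure itself.
        tool.setConf(conf);
        // Only the tool-specific arguments are passed on.
        return tool.run(remaining.toArray(new String[0]));
    }

    /** Example tool that simply records the arguments it was given. */
    static class EchoTool implements MiniTool {
        private Map<String, String> conf;
        String[] seenArgs;

        public void setConf(Map<String, String> conf) { this.conf = conf; }
        public Map<String, String> getConf() { return conf; }
        public int run(String[] args) {
            seenArgs = args;
            return 0;
        }
    }

    public static void main(String[] args) {
        EchoTool tool = new EchoTool();
        int exit = run(null, tool,
                new String[] {"-D", "mapred.reduce.tasks=4", "--input", "in"});
        System.out.println(exit + " " + tool.getConf() + " " + String.join(",", tool.seenArgs));
    }
}
```

The point of the split is that the tool never sees the generic options: they have already landed in its configuration by the time `run` is called.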
A concrete invocation example in Mahout:
    public static void main(String[] args) throws Exception {
        ToolRunner.run(new Configuration(), new MinHashDriver(), args);
    }
Override the interface method: extract the arguments, then call the core method:
    @Override
    public int run(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        addInputOption();
        addOutputOption();
        //...........
        runJob(input,
               output,
               minClusterSize,
               minVectorSize,
               hashType,
               numHashFunctions,
               keyGroups,
               numReduceTasks,
               debugOutput);
        return 0;
    }
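The elided middle of `run` is where the arguments get extracted. As a rough, self-contained illustration of that "extract, then delegate" shape, the sketch below parses `--name value` pairs by hand. Everything in it is hypothetical: the option spellings, the default of 10 hash functions, and the `parseOptions` helper are assumptions made for the example, and the real driver uses Mahout's own helpers (such as the `addInputOption()` and `addOutputOption()` calls seen above) rather than this code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the "extract arguments, call core method" pattern.
public class DriverRunSketch {

    /** Collects "--name value" pairs into a map; a lone "--flag" maps to "true". */
    static Map<String, String> parseOptions(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (!args[i].startsWith("--")) {
                throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
            String name = args[i].substring(2);
            boolean hasValue = i + 1 < args.length && !args[i + 1].startsWith("--");
            opts.put(name, hasValue ? args[++i] : "true");
        }
        return opts;
    }

    /** Stand-in for the core method; returns a summary instead of launching a job. */
    static String runJob(String input, String output, int numHashFunctions, boolean debugOutput) {
        return input + " -> " + output
                + " (hashes=" + numHashFunctions + ", debug=" + debugOutput + ")";
    }

    /** Same shape as the overridden run(): extract the arguments, then delegate. */
    static int run(String[] args) {
        Map<String, String> opts = parseOptions(args);
        if (!opts.containsKey("input") || !opts.containsKey("output")) {
            System.err.println("Usage: --input <path> --output <path>"
                    + " [--numHashFunctions n] [--debugOutput]");
            return -1;
        }
        // Assumed default of 10; not taken from Mahout.
        int numHashFunctions = Integer.parseInt(opts.getOrDefault("numHashFunctions", "10"));
        boolean debugOutput = Boolean.parseBoolean(opts.getOrDefault("debugOutput", "false"));
        System.out.println(runJob(opts.get("input"), opts.get("output"),
                numHashFunctions, debugOutput));
        return 0;
    }

    public static void main(String[] args) {
        run(new String[] {"--input", "vectors", "--output", "clusters", "--debugOutput"});
    }
}
```

Returning a non-zero code when required options are missing lets the caller distinguish a usage error from a successful run.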
The core method: configure the job and start the MapReduce task:
    private void runJob(Path input,
                        Path output,
                        int minClusterSize,
                        int minVectorSize,
                        String hashType,
                        int numHashFunctions,
                        int keyGroups,
                        int numReduceTasks,
                        boolean debugOutput) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = getConf();
        // set configuration parameters ........................
        Job job = new Job(conf, "MinHash Clustering");
        job.setJarByClass(MinHashDriver.class);
        // set Job parameters .........................
        job.waitForCompletion(true);
    }
[img]http://dl.iteye.com/upload/attachment/0062/3801/ead12e2b-f3c4-3ee8-892e-17a30ea2dfaa.jpg[/img]