Run your own program on the cluster with dynamically passed parameters:
hadoop jar wc.jar com.study.mapreduce.wordcount.WordCountDriver -Dmapreduce.job.queuename=root.test /input /output
The args array passed to the program has three elements: -Dmapreduce.job.queuename=root.test, /input, and /output. But the program reads its input and output paths from the first and second elements of the array, so the -D option would be mistaken for the input path. To have such generic options parsed dynamically, implement Yarn's Tool interface.
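The following plain-Java sketch (no Hadoop required) illustrates the problem: the loop below only mimics, for illustration, what Hadoop's GenericOptionsParser does for real inside ToolRunner, namely consuming -D options into the Configuration and handing only the remaining arguments to the program.

```java
import java.util.ArrayList;
import java.util.List;

public class ArgsDemo {
    public static void main(String[] cliArgs) {
        // The raw arguments the driver receives on the command line:
        String[] args = {"-Dmapreduce.job.queuename=root.test", "/input", "/output"};
        // A driver that blindly reads args[0]/args[1] as input/output paths
        // would treat the -D option as the input path. ToolRunner instead
        // strips -D options and passes the rest to run():
        List<String> remaining = new ArrayList<>();
        for (String a : args) {
            if (!a.startsWith("-D")) {
                remaining.add(a);
            }
        }
        System.out.println(remaining); // [/input, /output]
    }
}
```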
Steps:
(1) Create a new Maven project YarnDemo and edit the pom:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.study.hadoop</groupId>
    <artifactId>yarn_tool_test</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.1.3</version>
        </dependency>
    </dependencies>
</project>
(2) Create the package com.study.yarn.
(3) Create the class WordCount and implement the Tool interface:
package com.study.yarn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

import java.io.IOException;

public class WordCount implements Tool {
    // Contains the Mapper and Reducer; the usual driver logic goes into run()
    private Configuration conf;

    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    // Mapper
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text outK = new Text();
        private IntWritable outV = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(" "); // split on spaces, not the empty string
            for (String word : words) {
                outK.set(word);
                context.write(outK, outV);
            }
        }
    }

    // Reducer
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable outV = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            outV.set(sum);
            context.write(key, outV);
        }
    }
}
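The Mapper/Reducer pair above reduces, for a single line of input, to a familiar word count: split on spaces, then sum a 1 per occurrence of each word. A stand-alone sketch of that logic in plain Java (no Hadoop needed):

```java
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    public static void main(String[] args) {
        String line = "hello world hello";
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : line.split(" ")) {    // the Mapper emits (word, 1)
            counts.merge(word, 1, Integer::sum); // the Reducer sums the 1s per key
        }
        System.out.println(counts); // {hello=2, world=1}
    }
}
```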
(4) Create WordCountDriver:
package com.study.yarn;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import java.util.Arrays;

public class WordCountDriver {
    // Driver for the Tool
    private static Tool tool;

    public static void main(String[] args) throws Exception {
        // 1. Create the configuration
        Configuration conf = new Configuration();
        // 2. Pick the Tool implementation by name
        switch (args[0]) {
            case "wordcount":
                tool = new WordCount();
                break;
            default:
                throw new RuntimeException("No such tool: " + args[0]);
        }
        // 3. Run the program through ToolRunner.
        // Arrays.copyOfRange copies a slice of the old array into a new one;
        // here it drops the tool name (args[0]), so ToolRunner can parse any
        // generic options such as -D and pass the remaining arguments
        // (the input and output paths) on to run().
        int run = ToolRunner.run(conf, tool, Arrays.copyOfRange(args, 1, args.length));
        System.exit(run);
    }
}
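A quick plain-Java check of what that Arrays.copyOfRange slice hands to ToolRunner: everything after the tool name, including any -D options, which ToolRunner then parses before the input and output paths reach run().

```java
import java.util.Arrays;

public class CopyTailDemo {
    public static void main(String[] argv) {
        String[] args = {"wordcount", "-Dmapreduce.job.queuename=root.test",
                         "/input", "/output1"};
        // Drop args[0] (the tool name); keep the rest for ToolRunner
        String[] forToolRunner = Arrays.copyOfRange(args, 1, args.length);
        System.out.println(Arrays.toString(forToolRunner));
        // [-Dmapreduce.job.queuename=root.test, /input, /output1]
    }
}
```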
(5) Package the project into a jar and copy it to the cluster.
(6) From the directory holding the jar, submit it to the cluster. There are three arguments this time: the first selects which Tool to build, and the second and third are the input and output directories. The job runs normally.
yarn jar YarnDemo.jar com.study.yarn.WordCountDriver wordcount /input /output
(7) Add the -D option after wordcount, for four arguments in total:
yarn jar YarnDemo.jar com.study.yarn.WordCountDriver wordcount -Dmapreduce.job.queuename=root.test /input /output1