Preparation
Install the following:
1. Hadoop 2.7.7
2. JDK 1.8.0
3. IntelliJ IDEA 2019.1
Installing the first two was covered in the previous post.
Here is a brief note on installing and running IDEA:
http://www.jetbrains.com/idea/download/#section=windows
Download the archive, extract it, and run idea.sh in its bin directory.
If that is tedious to repeat, create a shortcut for it.
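Create the project
Set the SDK to the JDK installation path.
Choose your own project and package names; just make sure they do not clash with existing ones.
Create two new classes: TokenMapper and TokenReducer.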
Import the dependency jars
File->Project Structure->Modules->Dependencies
The jars live under share/hadoop in the Hadoop installation directory,
e.g. on my machine: /home/kona/app/hadoop2.7.7/share/hadoop/
Import the jars under common, mapreduce and yarn.
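To check which files that step pulls in, or to build a classpath for compiling outside IDEA, the jars can be listed with java.nio.file. This is a sketch under assumptions: it takes the jars to sit directly in common, mapreduce and yarn and in their lib subdirectories (lib holds third-party dependencies, and a missing one is a typical cause of NoClassDefFoundError). HadoopJars is a hypothetical helper, and the default path is the example path given above.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original project.
public class HadoopJars {
    // Lists *.jar directly inside <base>/<sub> and <base>/<sub>/lib
    // for sub in common, mapreduce, yarn (sorted within each directory).
    static List<Path> jars(Path base) throws IOException {
        List<Path> out = new ArrayList<>();
        for (String sub : new String[] {"common", "mapreduce", "yarn"}) {
            Path[] dirs = {base.resolve(sub), base.resolve(sub).resolve("lib")};
            for (Path dir : dirs) {
                if (!Files.isDirectory(dir)) {
                    continue; // skip directories missing from this installation
                }
                try (Stream<Path> s = Files.list(dir)) {
                    s.filter(p -> p.getFileName().toString().endsWith(".jar"))
                            .sorted()
                            .forEach(out::add);
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Path base = Paths.get(args.length > 0 ? args[0] : "/home/kona/app/hadoop2.7.7/share/hadoop");
        // Joined with ':' on Linux (';' on Windows), usable as a -cp value
        String cp = jars(base).stream().map(Path::toString).collect(Collectors.joining(File.pathSeparator));
        System.out.println(cp);
    }
}
```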
Java code
TokenMapper:
package com.kona;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Input: key = byte offset of the line (unused), value = one line of text.
// Output: (word, 1) for every whitespace-separated token.
public class TokenMapper extends Mapper<Object, Text, Text, IntWritable> {
    // Reused across calls instead of allocating new objects per record
    private final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer st = new StringTokenizer(value.toString());
        while (st.hasMoreTokens()) {
            word.set(st.nextToken());
            context.write(word, one);
        }
    }
}
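TokenMapper splits only on whitespace (StringTokenizer's default delimiters are space, tab, newline, carriage return and form feed), so punctuation stays attached to the word and "director" and "director." are counted separately. This stand-alone sketch applies the same rule to a line from the sample input below; TokenizeDemo is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class, not part of the original project.
public class TokenizeDemo {
    // Same splitting rule as TokenMapper: default StringTokenizer delimiters only.
    static List<String> tokens(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // Punctuation is not a delimiter, so "rained.A" and "later," stay single tokens.
        System.out.println(tokens("The next day it rained.A later, the Indian"));
        // prints [The, next, day, it, rained.A, later,, the, Indian]
    }
}
```

Stripping punctuation would need a different split (e.g. on non-letter characters); that would change the counts and is not what TokenMapper does.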
TokenReducer:
package com.kona;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Receives each word once, together with all of its 1s, and emits (word, total).
public class TokenReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
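Outside Hadoop, the path a line takes through TokenMapper, the shuffle and TokenReducer can be imitated in plain Java. This is only a sketch: LocalWordCount is a hypothetical class, and a TreeMap stands in for Hadoop's shuffle-and-sort, which groups all values of a key and hands keys to the reducer in sorted order (for ASCII words, String order matches Text's byte order).

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical in-memory imitation of the job, not part of the original project.
public class LocalWordCount {
    // Map step (mirrors TokenMapper): emit (token, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line);
        while (st.hasMoreTokens()) {
            out.add(new SimpleEntry<>(st.nextToken(), 1));
        }
        return out;
    }

    // Reduce step (mirrors TokenReducer): add up all values for one key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static TreeMap<String, Integer> run(List<String> lines) {
        // Shuffle/sort stand-in: group every emitted value under its key, keys kept sorted.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        TreeMap<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Same "word<TAB>count" layout that the job writes to part-r-00000.
        run(Arrays.asList("the fox saw the cock", "The cock ran"))
                .forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

Running main prints The 1, cock 2, fox 1, ran 1, saw 1 and the 2, one pair per line. "The" comes first because uppercase letters sort before lowercase ones.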
Rename the generated Main class to WordCount:
right-click Main and choose Refactor->Rename.
The finished WordCount:
package com.kona;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (e.g. -D key=value); what remains should be <in> <out>
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length != 2) {
            System.err.println("Usage: wordcount <in> <out>");
            System.exit(2);
        }
        // Job.getInstance replaces the deprecated new Job(conf, name) constructor
        Job job = Job.getInstance(conf, "wordcount");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(TokenReducer.class);
        // Final output types (word, count); map output defaults to the same types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        // Wait for the job; exit code 0 on success, 1 on failure
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
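Run it directly first
Create a directory for the input files, e.g. /home/kona/input,
then create a file in it containing some words and sentences, for example: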
Weather Predict A film was on location deep in the. One day an old Indian went up to the director and, "Tomorrow rain." The next day it rained.A later, the Indian went up to the director and said, "Tomorrow." The next day there was a hailstorm. "This Indian is incredible," said the director. He told his secretary to hire the Indian to predict the weather. However, after several successful predictions, the old Indian didn't show up for two weeks. Finally the director sent for him. "I have to shoot a big scene tomorrow," said the director, "and I'm depending on you. What will the weather be like?"
One morning a fox saw a cock.He thought,"This is my breakfast.'' He came up to the cock and said,"I know you can sing very well.Can you sing for me?''The cock was glad.He closes his eyes and began to sing.The fox saw that and caught him in his mouth and carried him away. The people in the field saw the fox.They cried,"Look,look!The fox is carrying the cock away.'' The cock said to the fox,"Mr Fox,do you understand?The people say you are carrying their cock away.Tell them it is yours.Not theirs.'' The fox opened his mouth and said,"The cock is mine,not yours.''Just then the cock ran away from the fox and fled into the tree.
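Next, set the program arguments (Run->Edit Configurations->Program arguments).
The two arguments become args[0] (input directory) and args[1] (output directory) of WordCount's main method.
Do not create the output directory yourself: the job creates it, and it fails if the directory already exists.
Then run the program; if it finishes without errors, it worked.
(A common error is NoClassDefFoundError, which means some dependency jar has not been imported.)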
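Because the job will not start while the output directory exists, each local re-run needs the previous output removed first. A minimal sketch with java.nio.file, assuming a local (non-HDFS) output path; CleanOutputDir is a hypothetical helper, and since it deletes recursively it should only ever be pointed at the output directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of the original project.
public class CleanOutputDir {
    // Recursively deletes dir if it exists; does nothing otherwise.
    static void deleteIfExists(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order visits children before their parent directories.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the local output directory passed to WordCount as its second argument
        deleteIfExists(Paths.get(args[0]));
    }
}
```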
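The output directory (the second argument) now contains two files:
part-r-00000 holds the result, and _SUCCESS is an empty marker written when the job succeeds.
[Screenshot: part of the result]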
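part-r-00000 is plain text written by the default TextOutputFormat: one line per word, the word and its count separated by a tab, in sorted key order. To list the most frequent words instead, a small reader like this sketch can be run on a local copy of the file; TopWords and the cut-off of 10 are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original project.
public class TopWords {
    // Parses "word<TAB>count" lines and returns the n entries with the highest counts.
    static List<Map.Entry<String, Integer>> top(List<String> lines, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>();
        for (String line : lines) {
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                continue; // skip lines without a tab
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            entries.add(new SimpleEntry<>(word, count));
        }
        // Highest count first; List.sort is stable, so ties keep file (alphabetical) order.
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to part-r-00000
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : top(lines, 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```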
Build the jar
File->Project Structure->Artifacts
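Change the name in the first box to WordCount (any name works), double-click the second box, then click the third box and choose a directory in the dialog that opens.
I chose the project root; after clicking OK, a new .mf (manifest) file appears there.
Edit the .mf file as follows (keep a newline after the last line, since the final line of a manifest must be terminated to be parsed):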
Manifest-Version: 1.0
Main-Class: com.kona.WordCount
Then choose Build->Build Artifacts from the menu bar at the top,
click Build and wait for it to finish.
The packaged jar ends up in out/artifacts/WordCount.
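`hadoop jar` reads the Main-Class entry from the jar's manifest (as `java -jar` does), which is why the input and output paths can follow the jar path directly. The standard java.util.jar API can confirm the built jar really carries that entry; ManifestCheck is a hypothetical helper, and the jar path is passed as the first argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of the original project.
public class ManifestCheck {
    // Parses a manifest stream and returns its Main-Class value, or null if absent.
    static String mainClass(InputStream in) throws IOException {
        Manifest mf = new Manifest(in);
        return mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    public static void main(String[] args) throws IOException {
        // args[0]: the built jar, e.g. out/artifacts/WordCount/WordCount.jar
        try (JarFile jar = new JarFile(args[0])) {
            Manifest mf = jar.getManifest();
            String main = mf == null ? null : mf.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
            System.out.println(main == null ? "no Main-Class entry" : "Main-Class: " + main);
        }
    }
}
```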
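Run the jar on Hadoop
Start Hadoop.
Make sure all five Hadoop daemons are running (NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager; jps lists them).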
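jps, which ships with the JDK, prints one "<pid> <name>" line per running JVM. This sketch runs it and reports which of the five daemons are missing; the daemon list assumes a single-node Hadoop 2.x setup with YARN, and DaemonCheck is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original project.
public class DaemonCheck {
    // Assumed daemon set for a pseudo-distributed Hadoop 2.x cluster with YARN.
    static final List<String> EXPECTED = Arrays.asList(
            "NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager");

    // Collects the names from "<pid> <name>" lines and returns expected daemons not found.
    static List<String> missing(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length >= 2) {
                running.add(parts[1]);
            }
        }
        List<String> out = new ArrayList<>();
        for (String d : EXPECTED) {
            if (!running.contains(d)) {
                out.add(d);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Process p = new ProcessBuilder("jps").redirectErrorStream(true).start();
        String text;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            text = r.lines().collect(Collectors.joining("\n"));
        }
        p.waitFor();
        List<String> m = missing(text);
        System.out.println(m.isEmpty() ? "all five daemons running" : "missing: " + m);
    }
}
```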
Create a directory on HDFS for the input, e.g. /in, and upload the prepared test document into it.
Run the jar
Command: hadoop jar <jar path> <input> <output>
For example:
hadoop jar /home/kona/IdeaProjects/Test/out/artifacts/WordCount/WordCount.jar /in /out
Here /in and /out are HDFS paths, and /out must not exist yet.
Verify the result, e.g. with: hdfs dfs -cat /out/part-r-00000
[Screenshot: part of the result]
The run is complete.