- Create a new Maven project
- +Create New Project… -> Maven -> Next
- Fill in the GroupId and ArtifactId, then click Next -> Finish
- Write the wordcount project
1、 Create the project directory structure: right-click java -> New -> package and enter the package path (com.hadoop.wdcount in this example) to create the package. In the same way, create three classes under the new package: WordcountMain, WordcountMapper, and WordcountReducer.
2、 Write the pom.xml configuration (pull in the Hadoop jars the project needs)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>sa.hadoop</groupId>
    <artifactId>wordcount</artifactId>
    <version>1.0-SNAPSHOT</version>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.7</version>
            <!-- we are using Hadoop 2.7.7 -->
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.7</version>
        </dependency>
    </dependencies>
</project>
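As an aside (an assumption, not part of the original setup): for Hadoop 2.x the four dependencies above can usually be replaced by the single hadoop-client artifact, which transitively pulls in the common, HDFS, and MapReduce client jars:

```xml
<!-- Alternative dependency block: hadoop-client bundles the client-side jars
     used above. Keep the version in sync with the cluster (2.7.7 here). -->
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.7.7</version>
</dependency>
```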
3、 Write the project code
Implement the logic in the three classes created above.
(1) WordcountMapper.java
package com.hadoop.wdcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Split each input line on spaces and emit (word, 1) for every token
        String line = value.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
(2) WordcountReducer.java
package com.hadoop.wdcount;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this word
        int counts = 0;
        for (IntWritable value : values) {
            counts += value.get();
        }
        context.write(key, new IntWritable(counts));
    }
}
(3) WordcountMain.java
package com.hadoop.wdcount;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordcountMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Get the job object
        Job job = Job.getInstance(conf, "wordcount");
        // Tell Hadoop which jar contains this class
        job.setJarByClass(WordcountMain.class);
        // Wire up the map and reduce classes
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        // Key and value types of the mapper output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Key and value types of the final output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output paths
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Submit the job and wait for it to finish
        boolean flag = job.waitForCompletion(true);
        if (!flag) {
            System.out.println("wordcount failed!");
        }
    }
}
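Before packaging, the map/reduce logic above can be sanity-checked without a cluster. The sketch below (a plain-Java simulation, not Hadoop API) reproduces the same split-and-sum logic on the sample sentence used later in this guide; a TreeMap mirrors the sorted keys a single reducer would produce:

```java
import java.util.Map;
import java.util.TreeMap;

public class WordcountLocal {
    public static void main(String[] args) {
        String line = "I believe that I will succeed!";
        // "map" step: split on spaces, emit (word, 1); "reduce" step: sum per word.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : line.split(" ")) {
            counts.merge(word, 1, Integer::sum);
        }
        // TreeMap iterates in natural String order, so "I" comes before lowercase words.
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
        // prints:
        // I	2
        // believe	1
        // succeed!	1
        // that	1
        // will	1
    }
}
```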
- Package the project into a jar
- Right-click the project name -> Open Module Settings
- Artifacts -> + -> JAR -> From modules with dependencies…
- Fill in the Main Class (click … and select WordcountMain), choose "extract to the target JAR", and click OK.
- Check "Include in project build". The Output directory is the final output location, and the Output Layout below lists the jars that will be included. Click OK.
- Click the menu Build -> Build Artifacts…
- Choose Build; the result can be found in the Output directory from step 4 above, or in the out directory of the project structure.
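The same jar can also be produced without the IDE. A sketch, assuming the standard maven-jar-plugin is added under a build/plugins section of this project's pom.xml (the mainClass matches this project); after that, mvn package writes the jar to target/. Hadoop classes are supplied by the hadoop jar runner at execution time, so extracting dependencies is not required:

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <configuration>
                <archive>
                    <manifest>
                        <mainClass>com.hadoop.wdcount.WordcountMain</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
    </plugins>
</build>
```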
- Run and verify (this example uses Hadoop 2.7.6 on Windows; WSL has not been tested yet)
- First, in the directory where the jar was created (C:\Users\USTC\Documents\maxyi\Java\wordcount\out\artifacts\wordcount_jar), create a file input1.txt, add the content "I believe that I will succeed!", and save it. This txt file will be uploaded to Hadoop shortly.
- Start Hadoop (all nodes):
cd hadoop-2.7.6/sbin
start-all.cmd
- Once Hadoop is running, go to the jar directory created earlier and upload the txt file to Hadoop:
cd /
cd C:\Users\USTC\Documents\maxyi\Java\wordcount\out\artifacts\wordcount_jar
hadoop fs -put ./input1.txt /input1
You can check whether the upload succeeded with:
hadoop fs -ls /
4、 Delete META-INF/LICENSE inside wordcount.jar; otherwise Hadoop cannot create the license directory at run time and the job fails with an error.
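One way to remove that entry is the Info-ZIP zip utility's delete flag, assuming zip is available on the PATH (the demo below operates on a throwaway archive; for the real jar the command is zip -d wordcount.jar META-INF/LICENSE):

```shell
# Build a throwaway archive containing a META-INF/LICENSE entry plus one other file
mkdir -p META-INF
echo "license text" > META-INF/LICENSE
echo "payload" > keep.txt
zip -q demo.jar META-INF/LICENSE keep.txt
# Delete the LICENSE entry in place (same flag as: zip -d wordcount.jar META-INF/LICENSE)
zip -q -d demo.jar META-INF/LICENSE
# The listing no longer shows META-INF/LICENSE
unzip -l demo.jar
```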
5、 Run wordcount
hadoop jar wordcount.jar com.hadoop.wdcount.WordcountMain /input1 /output2
The jar command is followed by four arguments:
The first, wordcount.jar, is the packaged jar.
The second, com.hadoop.wdcount.WordcountMain, is the main class written in the Java project; it must include the package path.
The third, /input1, is the input just uploaded.
The fourth, /output2, is the wordcount output directory (it must be new; a previously created one cannot be reused).
6、 Download the output file and check whether the result is correct
hadoop fs -get /output2
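The downloaded counts are in output2/part-r-00000 as tab-separated word/count pairs. As a sanity check for this particular input sentence (a local reconstruction, not actual cluster output), the expected contents can be reproduced with standard shell tools; LC_ALL=C sorts bytewise, matching Hadoop's Text key ordering, so "I" comes before the lowercase words:

```shell
# Reproduce the expected wordcount result for "I believe that I will succeed!"
echo "I believe that I will succeed!" \
  | tr ' ' '\n' \
  | LC_ALL=C sort \
  | uniq -c \
  | awk '{print $2 "\t" $1}'
# prints:
# I	2
# believe	1
# succeed!	1
# that	1
# will	1
```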