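Development environment: Windows, IntelliJ IDEA 2018.1.4 x64, Maven, JDK 1.8
Runtime environment: CentOS 7.3, Hadoop 2.7.3, JDK 1.8
Basic approach: create a Maven project named wordcount in IDEA on Windows and write the code, package the project as a jar, then upload the jar to the Hadoop machine and run the job.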
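I. Create a new Maven project
1. From the menu, choose File -> New -> Project... -> Maven. (It is best if the JDK used for development matches the JDK in the runtime environment.)
2. Click Next.
3. Fill in the GroupId and ArtifactId, then click Next.
4. Click Finish.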
II. Write the wordcount project
1. Create the project directory structure
2. Write pom.xml (declare the jar dependencies the project uses)
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>cn.lg</groupId>
    <artifactId>wordcount</artifactId>
    <version>1.0-SNAPSHOT</version>

    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>2.7.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>2.7.3</version>
        </dependency>
    </dependencies>
</project>
3. Write the project code
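Before going through the individual classes, it may help to see what the job is meant to compute as a whole. The following is a minimal plain-Java sketch, not part of the wordcount project, that imitates the map and reduce phases in memory without any Hadoop dependency. The class name `LocalWordCountSketch` and its methods are hypothetical. It assumes the mapper emits a (word, 1) pair per token and that the reducer sums the counts per word. Skipping empty tokens is an addition of this sketch.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; it is not one of the project's classes.
class LocalWordCountSketch {

    // Map phase: split one line on single spaces and emit (word, 1) per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            // Assumption of this sketch: empty tokens from repeated spaces are skipped.
            if (!word.isEmpty()) {
                pairs.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle and reduce phases: group the pairs by word and sum their counts.
    // A TreeMap keeps the words sorted, similar to sorted reducer input keys.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map phase over every line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> allPairs = new ArrayList<>();
        for (String line : lines) {
            allPairs.addAll(map(line));
        }
        return reduce(allPairs);
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(countWords(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

Running a small input through a sketch like this gives a reference result to compare against the output the real job writes on the cluster.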
(1) WordcountMapper.java
package cn.lg.project;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;

import java.io.IOException;

// Input key/value types: LongWritable, Text; output key/value types: Text, IntWritable.
public class WordcountMapper extends org.apache.hadoop.mapreduce.Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the current input line to a Java String.
        String line = value.toString();
        // Split the line into words on single spaces.
        String[] words = line.split(" ");
for (Strin