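1. First, prepare the data to be cleaned.
2. Import the data into the project: under the project root, create the directories input (raw data) and output (cleaned data), as shown in the figure below:
3. Import the required jars: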
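The directory layout from step 2 can also be prepared in code. The following is a minimal sketch using only the Java standard library, not the original project's code: the class name `DirPrep`, the method `prepare`, and the job subdirectory name `cleaned` are hypothetical. It assumes the job will write to a subdirectory of output, because Hadoop's `FileOutputFormat` rejects an output path that already exists, so a leftover result directory from a previous run has to be removed first.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DirPrep {

    /**
     * Creates input/ and output/ under projectRoot, then removes a stale
     * output/<jobName> directory so a later job run can recreate it.
     * Returns the path the job is assumed to write to.
     */
    static Path prepare(Path projectRoot, String jobName) throws IOException {
        Files.createDirectories(projectRoot.resolve("input"));
        Path output = projectRoot.resolve("output");
        Files.createDirectories(output);

        Path jobOutput = output.resolve(jobName);
        if (Files.exists(jobOutput)) {
            List<Path> toDelete;
            try (Stream<Path> walk = Files.walk(jobOutput)) {
                // Reverse order puts children before their parent directories.
                toDelete = walk.sorted(Comparator.reverseOrder())
                               .collect(Collectors.toList());
            }
            for (Path p : toDelete) {
                Files.delete(p);
            }
        }
        return jobOutput;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the project root as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("Job output path: " + prepare(root, "cleaned"));
    }
}
```

The returned path does not exist after the call, which is the state a MapReduce job expects for its output location.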
hadoop-2.8.5\share\hadoop\common\*.jar
hadoop-2.8.5\share\hadoop\common\lib\*.jar
hadoop-2.8.5\share\hadoop\hdfs\*.jar
hadoop-2.8.5\share\hadoop\hdfs\lib\*.jar
hadoop-2.8.5\share\hadoop\mapreduce\*.jar
hadoop-2.8.5\share\hadoop\mapreduce\lib\*.jar
hadoop-2.8.5\share\hadoop\yarn\*.jar
hadoop-2.8.5\share\hadoop\yarn\lib\*.jar
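To check that all eight directories were picked up, the jars can be listed programmatically. This is a rough sketch under stated assumptions: the class `HadoopJarLister` and the method `listJars` are hypothetical helpers, and the Hadoop installation directory is passed in by the caller. It uses only the Java standard library and silently skips directories that do not exist.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class HadoopJarLister {

    // The four modules under share/hadoop; each is scanned with its lib/ subdirectory.
    static final String[] MODULES = {"common", "hdfs", "mapreduce", "yarn"};

    /** Returns every *.jar directly inside the eight directories, sorted by path. */
    static List<Path> listJars(Path hadoopHome) throws IOException {
        List<Path> jars = new ArrayList<>();
        Path share = hadoopHome.resolve("share").resolve("hadoop");
        for (String module : MODULES) {
            Path moduleDir = share.resolve(module);
            for (Path dir : new Path[] {moduleDir, moduleDir.resolve("lib")}) {
                if (!Files.isDirectory(dir)) {
                    continue; // Missing directory: nothing to add.
                }
                try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                    for (Path jar : stream) {
                        jars.add(jar);
                    }
                }
            }
        }
        Collections.sort(jars);
        return jars;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: HadoopJarLister <path to hadoop-2.8.5>");
            return;
        }
        for (Path jar : listJars(Paths.get(args[0]))) {
            System.out.println(jar);
        }
    }
}
```

The printed list can be compared against the libraries attached to the project in the IDE.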
4. The code is as follows:
Cleaning class:
package com.stu.mr06;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
/**
* @