Preface
This article is a set of study notes on MapReduce, recording what was learned.
Experiment environment:
1. Linux Ubuntu 16.04
2. Hadoop 3.0.0
3. Eclipse 4.5.1
1. Starting Hadoop
- Change into the Hadoop startup directory:
cd /apps/hadoop/sbin
- Start Hadoop:
./start-all.sh
- Run `jps` to confirm the Hadoop daemon processes are up after startup.
2. Environment Setup
- Open Eclipse -> Window -> Preferences;
- Select Hadoop Map/Reduce, set the Hadoop installation root directory to
/apps/hadoop
, click Apply, then OK;
- Click Window -> Show View -> Other -> MapReduce Tools -> Map/Reduce Locations; the corresponding tab page appears;
- Click icon 1 shown in the figure for step 3, enter myhadoop as the Location name, set Port under DFS Master to 8020, and click Finish; the page on the right of the step-3 figure appears;
- Click icon 2 shown in the figure for step 3 and select the items shown there, which brings up the content on the left of the step-3 figure.
This completes the environment configuration.
3. Filtering and Saving Experiment
- Create a new project named test with a package named srs;
- Create a new class SRS (SRS.java), then enter and save the following code:
package srs;

import java.io.IOException;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SRS {
    // Mapper: keep each input line independently with probability `percentage`
    // (simple random sampling)
    public static class SRSMapper extends Mapper<Object, Text, NullWritable, Text> {
        private final Random rands = new Random();
        private final double percentage = 0.3;

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            if (rands.nextDouble() < percentage) {
                context.write(NullWritable.get(), value);
            }
        }
    }

    // Reducer: all sampled lines share the single NullWritable key;
    // iterate over them and write each line out
    public static class SRSReduce extends Reducer<NullWritable, Text, Text, NullWritable> {
        @Override
        public void reduce(NullWritable key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            for (Text value : values) {
                context.write(new Text(value), NullWritable.get());
            }
        }
    }

    public static void main(String[] args)
            throws IOException, InterruptedException, ClassNotFoundException {
        String dir_in = "hdfs://localhost:8020/srs/input";
        String dir_out = "hdfs://localhost:8020/srs/output";
        Path in = new Path(dir_in);
        Path out = new Path(dir_out);
        Configuration conf = new Configuration();
        // Delete the output directory if it already exists, or the job will fail
        out.getFileSystem(conf).delete(out, true);
        Job job = Job.getInstance(conf);
        job.setJarByClass(SRS.class);
        job.setMapperClass(SRSMapper.class);
        job.setReducerClass(SRSReduce.class);
        // Map output types differ from the final output types, so set both
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
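The core of the job is the Bernoulli sampling in the mapper: each record is kept independently with probability 0.3, so the sampled fraction concentrates near 30% as the input grows. The same logic can be sketched in plain Java outside Hadoop; the class and method names below are illustrative, not part of the original job:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class SamplingDemo {
    // Keep each line independently with the given probability, as SRSMapper does
    static List<String> sample(List<String> lines, double percentage, long seed) {
        Random rands = new Random(seed); // fixed seed only to make the demo reproducible
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (rands.nextDouble() < percentage) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            lines.add("line-" + i);
        }
        int kept = sample(lines, 0.3, 42L).size();
        // For 100,000 lines the kept count lands close to 30,000
        System.out.println(kept);
    }
}
```

Because each line is sampled independently, no coordination between mappers is needed, which is what makes this approach fit MapReduce so naturally.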
- Copy the Hadoop configuration files into the src folder:
cp /apps/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,log4j.properties} /home/dolphin/workspace/test/src
- Create the HDFS directories for the input files:
hadoop fs -mkdir /srs
hadoop fs -mkdir /srs/input
- Put the data file into the Hadoop directory:
hadoop fs -put /home/dolphin/Desktop/demo.txt /srs/input
demo.txt contains:
aaa
sss
ddd
ccc
ggg
- Run SRS.java; the sampled (filtered) lines are written to the output folder.
Summary
This experiment uses Hadoop MapReduce to filter (randomly sample) input records and save the result.