Preface
This post is a set of study notes on MapReduce, recording what I learned.
Experiment environment:
1. Linux Ubuntu 16.04
2. Hadoop 3.0.0
3. Eclipse 4.5.1
1. Start Hadoop
- Go to the directory containing the Hadoop startup scripts:
cd /apps/hadoop/sbin
- Start Hadoop:
./start-all.sh
- Run jps; after a successful start it lists daemons such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (the original post shows this in a screenshot that is not reproduced here).
2. Environment Setup
- Open Eclipse -> Window -> Preferences;
- Select Hadoop Map/Reduce, set the Hadoop installation root directory to /apps/hadoop, click Apply, then OK;
- Click Window -> Show View -> Other -> MapReduce Tools -> Map/Reduce Locations; a Map/Reduce Locations tab appears;
- In that tab, click the icon for creating a new Hadoop location, enter myhadoop as the location name, set the Port under DFS Master to 8020, and click Finish;
- A final icon-and-selection step is shown only in the original screenshots, which are not reproduced here.
This completes the environment configuration.
3. Word Count
- Create a new project named test and, inside it, a package named word;
- Create a new class Wordcount (Wordcount.java), then write and save the following code:
package word;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Wordcount {

    // Mapper: for every space-separated word on an input line, emit (word, 1).
    public static class WordcountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                context.write(new Text(word), new IntWritable(1));
            }
        }
    }

    // Reducer: sum the 1s collected for each word and emit (word, total).
    public static class WordcountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int count = 0;
            for (IntWritable value : values) {
                count += value.get();
            }
            context.write(key, new IntWritable(count));
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        String dirIn = "hdfs://localhost:8020/word/input";
        String dirOut = "hdfs://localhost:8020/word/output";
        Path in = new Path(dirIn);
        Path out = new Path(dirOut);

        Configuration conf = new Configuration();
        // Remove any leftover output directory; the job refuses to start if it exists.
        out.getFileSystem(conf).delete(out, true);

        Job job = Job.getInstance(conf);
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(WordcountMapper.class);
        job.setReducerClass(WordcountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(1);
        FileInputFormat.addInputPath(job, in);
        FileOutputFormat.setOutputPath(job, out);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
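Before submitting the job, the mapper/reducer logic above can be sanity-checked with plain Java, no Hadoop dependencies required. This is a minimal sketch; the class name LocalWordcountCheck and its count helper are made up here for illustration, not part of the original post:

```java
import java.util.Map;
import java.util.TreeMap;

public class LocalWordcountCheck {

    // Same logic as WordcountMapper + WordcountReducer, without the framework:
    // split each line on spaces, then sum a count of 1 per occurrence.
    // A TreeMap keeps keys sorted, mimicking the sorted keys a reducer sees.
    static Map<String, Integer> count(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] sample = {"Hello Hadoop", "Welcome to Hadoop", "I love Hadoop"};
        // Print one tab-separated "word<TAB>count" line per word, like the job's output.
        count(sample).forEach((w, c) -> System.out.println(w + "\t" + c));
    }
}
```

On the three-line sample used later in this post, this counts Hadoop three times and every other word once, with keys in lexicographic order as with Hadoop's Text keys.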
- Copy the Hadoop configuration files into the project's src folder:
cp /apps/hadoop/etc/hadoop/{core-site.xml,hdfs-site.xml,log4j.properties} /home/dolphin/workspace/test/src
- Create the input directory on HDFS:
hadoop fs -mkdir /word
hadoop fs -mkdir /word/input
- Upload the data file to HDFS:
hadoop fs -put /home/dolphin/Desktop/text.txt /word/input
The contents of text.txt are:
Hello Hadoop
Welcome to Hadoop
I love Hadoop
- Run Wordcount.java; the word counts are written to the /word/output directory on HDFS (the original post shows the result in a screenshot that is not reproduced here).
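Since the job uses a single reduce task, the counts land in one part-r-00000 file. For this three-line input, the expected contents can be previewed locally with standard Unix tools; this is a rough sketch of the same tokenize-and-count logic, where LC_ALL=C approximates the byte-order key sorting of Hadoop's Text keys:

```shell
# Split on spaces into one word per line, then sort and count duplicates.
printf 'Hello Hadoop\nWelcome to Hadoop\nI love Hadoop\n' \
  | tr ' ' '\n' | LC_ALL=C sort | uniq -c
```

This counts Hadoop three times and each of the other five words once.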
Summary
This experiment used Hadoop MapReduce to count word occurrences.