Big Data MapReduce (Java Implementation)

1. MapReduce Workflow Analysis

The Map phase breaks a large task into many small tasks; the Reduce phase aggregates the partial results into a final summary.

Note:

1. All input and output take the form of key-value pairs; there are four pairs in total, k1/v1 through k4/v4.

2. The types of k2/v2 and k3/v3 are consistent: k3 has the same type as k2, and v3 is a collection whose elements are the v2 values emitted for that key.

In detail: k1 and v1 are a line's byte offset and the line's content; k2 and v2 are a word produced by tokenization and its count (1); k3 and v3 are a word and the collection of all v2 counts for that word, at which point the Reducer stage has begun; k4 and v4 are the word and the final total computed from the collection in v3, as the trace below shows.
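For example, given the single input line "I love Beijing I love China" (a made-up sample), the four pairs would be:

k1/v1: (0, "I love Beijing I love China")
k2/v2: ("I", 1), ("love", 1), ("Beijing", 1), ("I", 1), ("love", 1), ("China", 1)
k3/v3 (after shuffle and sort): ("Beijing", [1]), ("China", [1]), ("I", [1, 1]), ("love", [1, 1])
k4/v4: ("Beijing", 1), ("China", 1), ("I", 2), ("love", 2)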

2. Java Source Implementation

The implementation consists of three classes: WordCountMain, WordCountMapper, and WordCountReducer.

WordCountMain code:

package demo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountMain {
	public static void main(String[] args) throws Exception {
		// A job = map + reduce
		Configuration conf = new Configuration();
		// Create the Job
		Job job = Job.getInstance(conf);
		// Specify the entry point of the job
		job.setJarByClass(WordCountMain.class);
		// Specify the job's mapper and its output key/value types
		job.setMapperClass(WordCountMapper.class);
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);
		// Specify the reducer and its output key/value types
		job.setReducerClass(WordCountReducer.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(LongWritable.class);

		// Specify the job's input and output paths
		FileInputFormat.setInputPaths(job, new Path(args[0]));
		FileOutputFormat.setOutputPath(job, new Path(args[1]));
		// Submit the job and wait for it to finish
		job.waitForCompletion(true);
	}
}
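One optional tweak, not in the original post: register the reducer as a map-side combiner so partial word counts are summed before the shuffle, cutting network traffic. This is valid here because addition is associative and commutative. The line below would go before waitForCompletion():

		// Optional: run the reducer as a combiner on the map side
		job.setCombinerClass(WordCountReducer.class);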

WordCountMapper code:

package demo;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {

	@Override
	protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
		// key: the line's byte offset; value: one line of input, e.g. "I love Beijing";
		// context: the Mapper context used to emit output pairs
		String data = value.toString(); // get the line as a String
		// Tokenize the line on spaces
		String[] words = data.split(" ");
		// Emit (word, 1) for each word
		for (String w : words) {
			context.write(new Text(w), new LongWritable(1));
		}
	}
}
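As an optional refinement (my addition, with a hypothetical class name), the mapper can reuse its output writables instead of allocating a new Text and LongWritable for every word; context.write() serializes the pair immediately, so mutating the same instances across calls is safe:

package demo;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Variant of WordCountMapper that reuses writable objects across map() calls.
public class WordCountMapperReuse extends Mapper<LongWritable, Text, Text, LongWritable> {
	private final Text word = new Text();
	private final LongWritable one = new LongWritable(1);

	@Override
	protected void map(LongWritable key, Text value, Context context)
			throws IOException, InterruptedException {
		for (String w : value.toString().split(" ")) {
			word.set(w); // reuse the same Text instance
			context.write(word, one);
		}
	}
}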

WordCountReducer code:

package demo;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {

	@Override
	protected void reduce(Text k3, Iterable<LongWritable> v3, Context context) throws IOException, InterruptedException {
		// v3 is a collection; each element is one v2 count emitted by the mapper
		long total = 0;
		for (LongWritable l : v3) {
			total = total + l.get();
		}

		// Emit the word and its total count
		context.write(k3, new LongWritable(total));
	}
}
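One caveat worth adding (my note, not from the original post): the MapReduce framework typically reuses a single LongWritable instance while iterating over v3, so holding on to the element references across iterations would give wrong results. Extracting the primitive value with get() on each pass, as the code above does, is the safe pattern.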

Package these three classes into a jar, upload the input data to HDFS, and submit the jar to the Hadoop cluster to run this simple MapReduce job.
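A minimal invocation sketch, assuming the jar is named wordcount.jar and the input has already been uploaded to /input/data.txt on HDFS (the jar name and both paths are hypothetical):

hadoop jar wordcount.jar demo.WordCountMain /input/data.txt /output/wc

Note that the output directory (/output/wc here) must not exist before the run; FileOutputFormat fails fast if the path already exists.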
