七、MapReduce第七讲合表(Join操作)
通俗地讲,就是把两个文件的内容按相同的键(key)合到一块。话不多说,我直接上案例
一、准备两个数据文件:
data.txt:
201001 1003 abc
201002 1005 def
201003 1006 ghi
201004 1003 jkl
201005 1004 mno
201006 1005 pqr
info.txt:
1003 kaka
1004 da
1005 jue
1006 zhao
得出的数据文件(两个文件按 1003~1006 这一列的 id 连接后的结果):
part-r-00000:
201001	1003	abc	kaka
201004	1003	jkl	kaka
201005	1004	mno	da
201002	1005	def	jue
201006	1005	pqr	jue
201003	1006	ghi	zhao
代码如下:
package Join;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class join {
public static void main(String[] args) throws Exception {
Configuration conf=new Configuration();
Job job = Job.getInstance(conf);
job.setJarByClass(join.class);
job.setMapperClass(MMapper.class);
job.setReducerClass(MReduce.class);
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileInputFormat.addInputPath(job, new Path(args[1]));
FileOutputFormat.setOutputPath(job, new Path(args[2]));
job.waitForCompletion(true);
}
public static class MMapper extends Mapper<LongWritable, Text, Text, Text>{
protected void map(LongWritable key, Text value, Context context)
throws IOException, InterruptedException {
//获取路劲信息
FileSplit inputSplit = (FileSplit) context.getInputSplit();
String path = inputSplit.getPath().toString();
//通过if进行判断输入的路劲是否包含data.txt
if (path.contains("data.txt")) {
//转换数据类型并用制表符进行切割
String[] line = value.toString().split("\t");
//取出数据第一位 1003
String joinkey = line[1];
//然后把需要跟第二个文件合并的内容给提取出来,并用“data标记”
String val = "data"+line[0]+"\t"+line[1]+"\t"+line[2];
//开始写入
context.write(new Text(joinkey), new Text(val));
}
if (path.contains("info.txt")) {
String[] line = value.toString().split("\t");
//取出数据第0位 1003
String joinkey = line[0];
String val = "info"+line[1];
context.write(new Text(joinkey), new Text(val));
}
}
}
public static class MReduce extends Reducer<Text, Text, Text, Text>{
@Override
protected void reduce(Text key, Iterable<Text> value, Context context)
throws IOException, InterruptedException {
//创建两个向量
Vector<String> vectora = new Vector<String>();
Vector<String> vectorb= new Vector<String>();
//判断两个输入数据的value,分别加入上面两个集合
for (Text v : value) {
String line = v.toString();
//判断开头标记,放入集合
if (line.startsWith("data")) {
vectora.add(line.substring("data".length()));
}
if (line.startsWith("info")) {
vectorb.add(line.substring("info".length()));
}
}
//将两个合集进行拼接,笛卡尔积
for (String a : vectora) {
for (String b : vectorb) {
context.write(new Text(a), new Text(b));
}
}
}
}
}
本次教程就到此结束,有什么不懂的可以多多在下方评论,博主有时间的话会在下方回答。多多支持博主。
下期见!!!