Big Data Technology: Principles and Applications, Chapter 7: MapReduce Programming Exercise (Natural Join)

Coonger

Article link: https://blog.csdn.net/yuntunlu/article/details/105848193

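Mining information from a given table

Given a child-parent table, mine the parent-child relationships it contains and produce a table of grandchild-grandparent relationships.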

    child       parent
    Steven      Lucy
    Steven      Jack
    Jone        Lucy
    Jone        Jack
    Lucy        Mary
    Lucy        Frank
    Jack        Alice
    Jack        Jesse
    David       Alice
    David       Jesse
    Philip      David
    Philip      Alma
    Mark        David
    Mark        Alma
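Before running anything on Hadoop, it can help to know what the correct answer looks like. The sketch below is a minimal in-memory version in plain Java (no Hadoop). It assumes the whole table fits in memory. The class name `SelfJoinSketch` and the method `grandPairs` are hypothetical, chosen only for illustration. It indexes each child's parents in a hash map, then, for every row, looks up the parents of that row's parent.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelfJoinSketch {
    // The sample child-parent table from the exercise
    static final String[][] CHILD_PARENT = {
        {"Steven", "Lucy"}, {"Steven", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
        {"Lucy", "Mary"}, {"Lucy", "Frank"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
        {"David", "Alice"}, {"David", "Jesse"}, {"Philip", "David"}, {"Philip", "Alma"},
        {"Mark", "David"}, {"Mark", "Alma"}
    };

    // Same idea as: SELECT t1.child, t2.parent FROM t t1 JOIN t t2 ON t1.parent = t2.child
    static List<String[]> grandPairs(String[][] rows) {
        // Index the second copy of the table by child: child -> list of parents
        Map<String, List<String>> parentsOf = new LinkedHashMap<>();
        for (String[] r : rows) {
            parentsOf.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        List<String[]> result = new ArrayList<>();
        for (String[] r : rows) {
            // r[1] is r[0]'s parent, so r[1]'s own parents are r[0]'s grandparents
            for (String gp : parentsOf.getOrDefault(r[1], Collections.emptyList())) {
                result.add(new String[] {r[0], gp});
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("grandchild\tgrandparent");
        for (String[] pair : grandPairs(CHILD_PARENT)) {
            System.out.println(pair[0] + "\t" + pair[1]);
        }
    }
}
```

For the sample table this yields 12 grandchild-grandparent pairs. For example, Steven gets Mary and Frank (through Lucy) and Alice and Jesse (through Jack). That gives a reference to compare against the MapReduce job's output.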

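Here is a confession about the pitfalls I ran into on this problem:
In SQL, this is nothing more than a self-join: the table is joined with itself on one copy's parent column equal to the other copy's child column.
Writing the Map and Reduce functions was mostly straightforward, but the final output contained only the header line, grandchild and grandparent. While testing, I tried writing out some intermediate results and saw nothing wrong. I also believed the preprocessing was adequate: trimming whitespace at both ends and splitting on one or more whitespace characters in between. The result was still disappointing. In the end I had no choice but to retype the whole input file by hand!!! (That I resorted to retyping shows how desperate I was: I had to treat this as small data dressed up as big data.) Happily, output finally appeared...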

I still can't say exactly what caused this. Pointers from any experienced readers passing by are welcome.
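One possible explanation is offered here only as an unverified guess. Text copied from a web page or PDF often contains invisible characters such as the no-break space U+00A0, the ideographic space U+3000, or a byte-order mark U+FEFF.

- Java's `String.trim()` removes only characters up to U+0020.
- By default, the regex `\s` matches only ASCII whitespace.

Such characters can therefore survive both preprocessing steps. If they stick to some names (for example as leading padding), keys like `Lucy` and `\u00A0Lucy` no longer match in the reducer and the join comes out empty. Retyping the file would remove them, which fits what happened above.

Also note that inside a character class `|` is a literal pipe, so `[\s|\t]+` means "whitespace or `|`", not an alternation.

The sketch below normalizes Unicode whitespace with the `(?U)` flag (`Pattern.UNICODE_CHARACTER_CLASS`) before splitting. The class `RobustSplit` and its method `tokens` are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class RobustSplit {
    // (?U) turns on UNICODE_CHARACTER_CLASS, so \s also matches U+00A0 and U+3000
    private static final Pattern WS = Pattern.compile("(?U)\\s+");

    static String[] tokens(String line) {
        // U+FEFF (byte-order mark) is not whitespace, so it is removed explicitly
        String cleaned = line.replace("\uFEFF", "");
        // Collapse every run of (Unicode) whitespace into one ASCII space, then trim the ends
        cleaned = WS.matcher(cleaned).replaceAll(" ").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    public static void main(String[] args) {
        // A line padded with no-break spaces and an ideographic space, as a web copy might be
        String line = "\u00A0Steven\u00A0\u00A0 Lucy\u3000";
        String[] naive = line.trim().split("[\\s|\\t]+"); // the original preprocessing
        String[] robust = tokens(line);
        System.out.println("naive first token clean? " + naive[0].equals("Steven"));
        System.out.println("robust tokens: " + Arrays.toString(robust));
    }
}
```

To try it in `MiningMapper`, the trim/split pair could be replaced with `String[] tmp = RobustSplit.tokens(value.toString());`. Whether this explains the original problem depends on what the input file actually contained.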

package mapreduce_exmple;

import java.io.IOException;
import org.apache.hadoop.conf.Configuration; // Hadoop configuration
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;  // the custom map class extends Mapper
import org.apache.hadoop.mapreduce.Reducer; // the custom reduce class extends Reducer
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import java.util.LinkedList; // used in the reducer to collect values before the Cartesian product

public class Relationship_Mining {
	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// "fs.defaultFS" is the current name of the deprecated "fs.default.name" key
		conf.set("fs.defaultFS", "hdfs://localhost:9000");
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length < 2) {
			System.err.println("Usage: Relationship_Mining <in> [<in>...] <out>");
			System.exit(2);
		}

		Job job = Job.getInstance(conf, "Relationship_Mining"); // configuration and job name
		job.setJarByClass(Relationship_Mining.class);           // class used to locate the jar
		job.setMapperClass(MiningMapper.class);                  // our map implementation
		job.setReducerClass(MiningReducer.class);                // our reduce implementation
		job.setOutputKeyClass(Text.class);                       // output key type
		job.setOutputValueClass(Text.class);                     // output value type

		for (int i = 0; i < otherArgs.length - 1; i++) { // every argument except the last is an input path
			FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
		}
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1])); // last argument: output directory
		System.exit(job.waitForCompletion(true) ? 0 : 1); // exit code 0 if the job succeeded
	}
	
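	public static class MiningMapper extends Mapper<Object, Text, Text, Text> {
		private final Text tKey = new Text();
		private final Text tValue = new Text();

		@Override
		public void map(Object key, Text value, Context context)
				throws IOException, InterruptedException {
			String txt = value.toString().trim(); // strip leading and trailing whitespace
			String[] tmp = txt.split("\\s+");     // split on one or more whitespace characters (\s already covers tabs)
			// Skip blank or malformed lines and the header row
			if (tmp.length < 2 || tmp[0].equals("child")) {
				return;
			}
			// (parent, "1#child"): at key = parent, this child is a grandchild candidate
			tKey.set(tmp[1]);
			tValue.set("1#" + tmp[0]);
			context.write(tKey, tValue);
			// (child, "2#parent"): at key = child, this parent is a grandparent candidate
			tKey.set(tmp[0]);
			tValue.set("2#" + tmp[1]);
			context.write(tKey, tValue);
		}
	}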

	public static class MiningReducer extends Reducer<Text, Text, Text, Text> {
		// Write the header once per reduce task, before any key is processed
		// (replaces the static boolean flag; with a single reducer the output file starts with it)
		@Override
		protected void setup(Context context) throws IOException, InterruptedException {
			context.write(new Text("grandchild"), new Text("grandparent"));
		}

		@Override
		public void reduce(Text key, Iterable<Text> values, Context context)
				throws IOException, InterruptedException {
			LinkedList<String> grandchild = new LinkedList<String>();  // children of key
			LinkedList<String> grandparent = new LinkedList<String>(); // parents of key
			for (Text tval : values) {
				String val = tval.toString();
				if (val.startsWith("1#")) {
					grandchild.add(val.substring(2));
				} else if (val.startsWith("2#")) {
					grandparent.add(val.substring(2));
				}
			}
			// Cartesian product: every child of key paired with every parent of key
			for (String c : grandchild) {
				for (String p : grandparent) {
					context.write(new Text(c), new Text(p));
				}
			}
		}
	}
}
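When the job prints only the header, one way to narrow it down is to replay the same map, shuffle and reduce steps locally without Hadoop. The sketch below reuses the "1#"/"2#" tagging from `MiningMapper` and the Cartesian product from `MiningReducer`. A sorted `TreeMap` stands in for the shuffle that groups values by key for a single reducer.

This is an approximation under stated assumptions:

- Hadoop does not guarantee the order of values within a key, while this sketch keeps input order.
- The class name `LocalShuffleSketch` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LocalShuffleSketch {
    // Sample input with the header row, one "child parent" pair per line
    static final List<String> SAMPLE = Arrays.asList(
        "child parent",
        "Steven Lucy", "Steven Jack", "Jone Lucy", "Jone Jack",
        "Lucy Mary", "Lucy Frank", "Jack Alice", "Jack Jesse",
        "David Alice", "David Jesse", "Philip David", "Philip Alma",
        "Mark David", "Mark Alma");

    // Same tagging as MiningMapper: emit (parent, "1#child") and (child, "2#parent")
    static void map(String line, Map<String, List<String>> shuffle) {
        String[] t = line.trim().split("\\s+");
        if (t.length < 2 || t[0].equals("child")) {
            return; // blank line or header row
        }
        shuffle.computeIfAbsent(t[1], k -> new ArrayList<>()).add("1#" + t[0]);
        shuffle.computeIfAbsent(t[0], k -> new ArrayList<>()).add("2#" + t[1]);
    }

    // Same logic as MiningReducer: every child of the key paired with every parent of the key
    static void reduce(List<String> values, List<String> out) {
        List<String> children = new ArrayList<>();
        List<String> parents = new ArrayList<>();
        for (String v : values) {
            if (v.startsWith("1#")) {
                children.add(v.substring(2));
            } else if (v.startsWith("2#")) {
                parents.add(v.substring(2));
            }
        }
        for (String c : children) {
            for (String p : parents) {
                out.add(c + "\t" + p);
            }
        }
    }

    static List<String> run(List<String> lines) {
        // TreeMap groups values by key and iterates keys in sorted order, roughly like the shuffle
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffle);
        }
        List<String> out = new ArrayList<>();
        out.add("grandchild\tgrandparent"); // header line, written before any pair
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            reduce(e.getValue(), out);
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : run(SAMPLE)) {
            System.out.println(line);
        }
    }
}
```

Only the keys that receive both a "1#" and a "2#" value (David, Jack and Lucy in the sample) produce pairs. If the real input yields no such key, printing the `shuffle` map inside `run` would show which names failed to line up.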
