MapReduce in Practice: Single-Table Join

我就是喜欢我话多

Article link: https://blog.csdn.net/qq_28945021/article/details/51327519

1. Requirements

Given a text file of child-parent pairs, compute and output all grandchild-grandparent pairs.
Example:
Source file:

child   parent
Tom Lucy
Tom Jack
Jone    Lucy
Jone    Jack
Lucy    Mary
Lucy    Ben
Jack    Alice
Jack    Jesse
Terry   Alice
Terry   Jesse
Philip  Terry
Philip  Alma
Mark    Terry
Mark    Alma

Output file:

grandchild  grandparent
Tom Alice
Tom Jesse
Jone    Alice
Jone    Jesse
Tom Ben
Tom Mary
Jone    Ben
Jone    Mary
Philip  Alice
Philip  Jesse
Mark    Alice
Mark    Jesse

2. Design

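The source file gives each person's children and parents, so we can build the chain grandchild -> child -> parent -> grandparent. The child and the parent in the middle are in fact the same person; drop that person and what remains is exactly the grandchild-grandparent pair we want.

The shuffle phase of MapReduce groups all values that share a key, so the design is as follows. For each child-parent pair read from the file, first emit it with the parent as the key. After the shuffle, each key then carries the set of that person's children; call this table 1. Then emit the same pair again with the child as the key, so that in reduce each key also carries the set of that person's parents; call this table 2. For every key, the Cartesian product of table 1 and table 2 gives the grandchild-grandparent pairs.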
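The idea can be tried out without a cluster. The following is a minimal plain-Java sketch, not the Hadoop job itself: two in-memory maps stand in for the shuffle, and the class name `GrandparentJoinSketch` and the method `join` are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class GrandparentJoinSketch {

    /** Joins child-parent pairs with themselves; returns "grandchild\tgrandparent" lines. */
    public static List<String> join(String[][] childParentPairs) {
        // Table 1: person -> that person's children (the pair keyed by its parent).
        Map<String, List<String>> childrenOf = new TreeMap<>();
        // Table 2: person -> that person's parents (the pair keyed by its child).
        Map<String, List<String>> parentsOf = new TreeMap<>();

        for (String[] pair : childParentPairs) {
            String child = pair[0];
            String parent = pair[1];
            childrenOf.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
            parentsOf.computeIfAbsent(child, k -> new ArrayList<>()).add(parent);
        }

        List<String> result = new ArrayList<>();
        // One iteration per key, mirroring one reduce() call per key.
        for (Map.Entry<String, List<String>> entry : childrenOf.entrySet()) {
            List<String> grandparents = parentsOf.get(entry.getKey());
            if (grandparents == null) {
                continue; // this person has children but no known parents
            }
            // Cartesian product of table 1 and table 2 for this key.
            for (String grandchild : entry.getValue()) {
                for (String grandparent : grandparents) {
                    result.add(grandchild + "\t" + grandparent);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"Tom", "Lucy"}, {"Tom", "Jack"}, {"Lucy", "Mary"}, {"Lucy", "Ben"}, {"Jack", "Alice"}
        };
        for (String line : join(pairs)) {
            System.out.println(line);
        }
    }
}
```

In the real job the two maps do not exist as such: both kinds of record travel through the same shuffle, which is why each value has to be tagged with the table it belongs to.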

3. Code

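Map phase:

// Imports needed at the top of the enclosing source file:
import java.io.IOException;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public static class PrentMapper extends Mapper<Text, Text, Text, Text>{
        // Emit each record twice: keyed by parent for table 1 and keyed by child for table 2.
        // The "1table"/"2table" prefix tells the reducer which table a value belongs to.
        @Override
        protected void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            // Skip the header line "child parent".
            if(key.toString().equals("child")||value.toString().equals("parent")){
                return;
            }
            String record=key.toString()+";"+value.toString();
            // Parent as key: after the shuffle, all children of this person are grouped together.
            context.write(value, new Text("1table;"+record));
            // Child as key: after the shuffle, all parents of this person are grouped together.
            context.write(key, new Text("2table;"+record));
        }
    }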
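To make the tagging concrete, here is a small plain-Java sketch of what the map step produces for a single input line. It only imitates the mapper's string handling; the class name `TaggedRecordSketch` and the method `emit` are hypothetical and not part of the job.

```java
import java.util.ArrayList;
import java.util.List;

public class TaggedRecordSketch {

    /** Returns the {key, value} pairs produced for one input line; empty for the header. */
    public static List<String[]> emit(String child, String parent) {
        List<String[]> out = new ArrayList<>();
        if (child.equals("child") || parent.equals("parent")) {
            return out; // header line, nothing to emit
        }
        String record = child + ";" + parent;
        out.add(new String[] {parent, "1table;" + record}); // grouped under the parent
        out.add(new String[] {child, "2table;" + record});  // grouped under the child
        return out;
    }

    public static void main(String[] args) {
        // Prints "Lucy -> 1table;Tom;Lucy" and then "Tom -> 2table;Tom;Lucy".
        for (String[] kv : emit("Tom", "Lucy")) {
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```

Because the fields are joined with ";", a name that itself contains ";" would be split incorrectly on the reduce side; the sample data has no such names.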

Reduce phase:

// Imports needed at the top of the enclosing source file:
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public static class PrentReduce extends Reducer<Text, Text, Text, Text>{

        // Write the header once per reduce task, before any key is processed.
        // This replaces the static counter, which depended on JVM-wide state.
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            context.write(new Text("grandchild"), new Text("grandparent"));
        }

        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Lists instead of fixed-size arrays of 10, so larger families do not overflow.
            List<String> grandchildren=new ArrayList<String>();
            List<String> grandparents=new ArrayList<String>();
            for (Text s : values) {
                String[] childparent=s.toString().split(";");
                String relationtype=childparent[0]; // table tag
                String childname=childparent[1];
                String parentname=childparent[2];
                if(relationtype.equals("1table")){  // table 1: the children of the key
                    grandchildren.add(childname);
                }else if(relationtype.equals("2table")){  // table 2: the parents of the key
                    grandparents.add(parentname);
                }
            }
            // Cartesian product; the loops do nothing if either list is empty.
            for(String grandchild : grandchildren){
                for(String grandparent : grandparents){
                    context.write(new Text(grandchild), new Text(grandparent));
                }
            }
        }

    }

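Test driver:

// Imports needed at the top of the enclosing source file:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public static void main(String[] args) throws Exception {
            Configuration conf=new Configuration();
            Job job=Job.getInstance(conf, "single table join");
            // Point at a class from this job's own jar (the original passed Text.class, which lives in Hadoop's jar).
            job.setJarByClass(PrentMapper.class);
            // Each input line is a key/value pair, so use Hadoop's built-in KeyValueTextInputFormat,
            // which by default splits a line at the first tab.
            job.setInputFormatClass(KeyValueTextInputFormat.class);
            job.setMapperClass(PrentMapper.class);
            job.setReducerClass(PrentReduce.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(Text.class);
            FileInputFormat.addInputPath(job, new Path("C:/Users/wrm/Desktop/大数据/测试文件/childprent.txt"));
            FileOutputFormat.setOutputPath(job, new Path("hdfs://172.16.153.98:9000/STJ/"));
            // Exit non-zero if the job fails; exceptions propagate instead of being swallowed.
            System.exit(job.waitForCompletion(true)?0:1);
    }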
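The driver depends on how KeyValueTextInputFormat turns a line into the mapper's key and value. As far as I understand its default behaviour, the line is cut at the first tab, and a line without a tab becomes the key with an empty value. The sketch below imitates that rule in plain Java; it is not Hadoop's code, and `KeyValueSplitSketch` and `split` are names invented for the illustration.

```java
public class KeyValueSplitSketch {

    /** Splits a line at the first separator; with no separator the whole line is the key. */
    public static String[] split(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, pos), line.substring(pos + 1)};
    }

    public static void main(String[] args) {
        String[] tabbed = split("Tom\tLucy", '\t');
        System.out.println("key=[" + tabbed[0] + "] value=[" + tabbed[1] + "]");

        // A space-separated line is not split, so the mapper would see an empty value.
        String[] spaced = split("Tom Lucy", '\t');
        System.out.println("key=[" + spaced[0] + "] value=[" + spaced[1] + "]");
    }
}
```

If the join produces no output, it is worth checking that the columns of the input file are really separated by tabs rather than spaces.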

This was a small hands-on exercise in single-table joins with MapReduce; next I will try out more programs to keep learning.
