MapReduce Programming on the Hadoop Platform
Goal: given a child-parent table, mine the parent-child relationships it contains and produce a table of grandchild-grandparent relationships.
The input file table.txt has the following content:
child parent
Steven Lucy
Steven Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Frank
Jack Alice
Jack Jesse
David Alice
David Jesse
Philip David
Philip Alma
Mark David
Mark Alma
Required output:
grandchild grandparent
Mark Jesse
Mark Alice
Philip Jesse
Philip Alice
Jone Jesse
Jone Alice
Steven Jesse
Steven Alice
Steven Frank
Steven Mary
Jone Frank
Jone Mary
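This is a self-join of the table with itself: (A, C) belongs to the output whenever some B exists such that both (A, B) and (B, C) are rows of the input. As a minimal plain-Java sketch of that idea (the class name JoinSketch is illustrative and separate from the MapReduce program developed below):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;

public class JoinSketch {
    public static void main(String[] args) {
        // child -> list of parents, built from the sample rows above
        HashMap<String, List<String>> parents = new HashMap<>();
        String[][] rows = {
                {"Steven", "Lucy"}, {"Steven", "Jack"}, {"Jone", "Lucy"}, {"Jone", "Jack"},
                {"Lucy", "Mary"}, {"Lucy", "Frank"}, {"Jack", "Alice"}, {"Jack", "Jesse"},
                {"David", "Alice"}, {"David", "Jesse"}, {"Philip", "David"}, {"Philip", "Alma"},
                {"Mark", "David"}, {"Mark", "Alma"}};
        for (String[] r : rows) {
            parents.computeIfAbsent(r[0], k -> new ArrayList<>()).add(r[1]);
        }
        // A grandparent is a parent of a parent.
        for (String child : parents.keySet()) {
            for (String parent : parents.get(child)) {
                for (String grand : parents.getOrDefault(parent, Collections.emptyList())) {
                    System.out.println(child + "\t" + grand);
                }
            }
        }
    }
}

The MapReduce version below computes the same cross product, but distributes the lookup by routing both copies of each row to the person they share.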
1. Preparing the experiment environment
- A working Hadoop cluster
- A configured Windows development environment
2. Experiment steps
- Create a Maven project named MapReduceDemo
- Add the dependencies to pom.xml:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>3.1.3</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
    </dependency>
    <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-log4j12</artifactId>
        <version>1.7.30</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.7.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
            </configuration>
        </plugin>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>single</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>
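The maven-assembly-plugin bound to the package phase above additionally produces a jar-with-dependencies artifact. If you prefer the command line to the IDE's Maven panel, the same build can be run with (assuming Maven is installed and on the PATH):

mvn clean package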
- In the project's src/main/resources directory, create a file named "log4j.properties" and fill it with the following content:
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/spring.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
- Write the program. The approach is a single-table self-join: the mapper emits every (child, parent) row twice, once keyed by the parent and tagged "1", and once keyed by the child and tagged "2". Within each reducer group, the key's children (tag-1 records) then meet the key's parents (tag-2 records), and their cross product is exactly the set of grandchild-grandparent pairs joined through that key.
package com.xumx.mapreduce.work;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class STJoin {

    // Used to write the header line exactly once; this relies on the job
    // running with a single reducer (the default).
    public static int time = 0;

    public static class Map extends Mapper<Object, Text, Text, Text> {
        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] childAndParent = line.split(" ");
            // Collect the non-empty tokens so repeated spaces are tolerated.
            List<String> list = new ArrayList<>(2);
            for (String childOrParent : childAndParent) {
                if (!"".equals(childOrParent)) {
                    list.add(childOrParent);
                }
            }
            // Skip the header row and any malformed lines.
            if (list.size() < 2 || "child".equals(list.get(0))) {
                return;
            }
            String childName = list.get(0);
            String parentName = list.get(1);
            // Tag "1", keyed by the parent: in that reducer group,
            // childName is a grandchild candidate.
            context.write(new Text(parentName), new Text("1" + "+"
                    + childName + "+" + parentName));
            // Tag "2", keyed by the child: in that reducer group,
            // parentName is a grandparent candidate.
            context.write(new Text(childName), new Text("2" + "+"
                    + childName + "+" + parentName));
        }
    }

    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Write the header once, before the first group is processed.
            if (time == 0) {
                context.write(new Text("grandchild"), new Text("grandparent"));
                time++;
            }
            List<String> grandChild = new ArrayList<>();
            List<String> grandParent = new ArrayList<>();
            for (Text text : values) {
                String[] relation = text.toString().split("\\+");
                String relationType = relation[0];
                String childName = relation[1];
                String parentName = relation[2];
                if ("1".equals(relationType)) {
                    // key is this record's parent, so childName is a
                    // grandchild of key's parents.
                    grandChild.add(childName);
                } else {
                    // key is this record's child, so parentName is a
                    // grandparent of key's children.
                    grandParent.add(parentName);
                }
            }
            // The cross product of the two lists yields the
            // grandchild-grandparent pairs joined through this key.
            for (String gc : grandChild) {
                for (String gp : grandParent) {
                    context.write(new Text(gc), new Text(gp));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "Single table join");
        job.setJarByClass(STJoin.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
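To make the tagging concrete, consider the rows "Lucy Mary" and "Steven Lucy". A small illustrative sketch (the class name EmitSketch is an assumption, not part of the job) that prints the intermediate (key, value) records the mapper emits for them:

public class EmitSketch {
    public static void main(String[] args) {
        String[] lines = {"Lucy Mary", "Steven Lucy"};
        for (String line : lines) {
            String[] cp = line.split(" ");
            // Tag 1, keyed by the parent: cp[0] is a grandchild candidate there.
            System.out.println(cp[1] + "\t" + "1+" + cp[0] + "+" + cp[1]);
            // Tag 2, keyed by the child: cp[1] is a grandparent candidate there.
            System.out.println(cp[0] + "\t" + "2+" + cp[0] + "+" + cp[1]);
        }
    }
}

Under the key Lucy the reducer then receives both 1+Steven+Lucy and 2+Lucy+Mary, so the cross product emits the pair (Steven, Mary), which indeed appears in the expected output.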
- Package the program by running package from the Maven panel.
- When the build finishes, open the target directory and find the packaged jar.
- Copy that jar to /opt/module/hadoop-3.1.3 on the Hadoop cluster.
3. Verifying the program's results
- Start the Hadoop cluster
- In the /opt/module/hadoop-3.1.3 directory, create table.txt:
[xumx@hadoop102 hadoop-3.1.3]$ vim table.txt
Enter the following content:
child parent
Steven Lucy
Steven Jack
Jone Lucy
Jone Jack
Lucy Mary
Lucy Frank
Jack Alice
Jack Jesse
David Alice
David Jesse
Philip David
Philip Alma
Mark David
Mark Alma
- Create the HDFS directory /input5:
[xumx@hadoop102 hadoop-3.1.3]$ hadoop fs -mkdir /input5
- Upload table.txt to the /input5 directory:
[xumx@hadoop102 hadoop-3.1.3]$ hadoop fs -put table.txt /input5
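Optionally, confirm that the file is in place:
[xumx@hadoop102 hadoop-3.1.3]$ hadoop fs -ls /input5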
- Run the program:
[xumx@hadoop102 hadoop-3.1.3]$ hadoop jar MapReduceDemo-1.0-SNAPSHOT.jar com.xumx.mapreduce.work.STJoin /input5 /output5
- In the NameNode web UI, a new output5 directory appears; open it.
- The file part-r-00000 holds the job's output.
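The result can also be printed from the shell:
[xumx@hadoop102 hadoop-3.1.3]$ hadoop fs -cat /output5/part-r-00000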
As shown, the output matches the expected grandchild-grandparent table, so the program runs correctly.