HNU 2022 Summer Short-Term Course: Big Data Parallel Processing with MapReduce

0 Overview

This assumes a Hadoop cluster is already installed, with a configuration matching the following link:

https://blog.csdn.net/S0609/article/details/125566918

This post uses Hadoop to complete a few simple MapReduce tasks.

1 Connecting to Hadoop

Use the IDEA plugin Big Data Tools to connect to the HDFS of the Hadoop cluster. HDFS can be thought of as the cluster's disk space, and this plugin is simply a visualization tool that lets you browse Hadoop's files.

1 Installation

image-20220630163541224

Install it directly from the plugin marketplace.

2 Configuring Environment Variables

Connecting to Hadoop from the local machine also requires a few environment variables, but only a basic setup is needed; you do not have to install a complete local Hadoop. The download is small: just a bin directory.

Download link:

https://pan.baidu.com/s/1FY6h9DANQ2u_syyXyyWHgQ

Extraction code: ybum

image-20220630164309299

Add the environment variables (adjust the paths to your own installation):

HADOOP_HOME

image-20220630164429071

%HADOOP_HOME%\bin

image-20220630164542541

Then copy hadoop.dll into the Windows system folder on the C drive:

C:\Windows\System32

image-20220630164718351

Finally, double-click:

image-20220630164750629

A window that flashes open and immediately closes is normal.

3 Connecting to HDFS

image-20220630164948500

image-20220630165040290

After making the selection, two fields appear below.

The first field is the address. Which address? Open http://192.168.206.200:9870 in your browser (substitute your own IP address) and use the address shown in the red box in the screenshot below.

image-20220630165238428

The username field below is your username on the Linux machine.

image-20220630165324617

image-20220630165442450

4 Connecting to Hadoop

image-20220630165549971

Only one field needs to be changed.

image-20220630165633815

5 Testing

Since this is just a visualization tool, seeing the file tree means the connection succeeded.

image-20220630165712408

2 Preparing the Input Files

There are four assignments in total. My workflow is to put the input files into HDFS first, write the code locally, run it in cluster mode, and write the output back to HDFS, where it can be viewed through Big Data Tools.

There are several ways to get files onto the cluster.

1 Upload the files to the virtual machine with Xftp, then put them into HDFS with Hadoop shell commands. For the commands, see:

https://blog.csdn.net/weixin_39760689/article/details/111487683

Essentially:

hadoop fs -put ~/a.txt /input1

This approach is a bit clumsy, but it was the only one I knew at the time. If you are following this tutorial, use it as well.

2 Use the Java HDFS API. This is actually the subject of the second assignment (I think it would fit better as the first), so I won't go into detail here; just use the first method for now. A rough sketch of what the API call looks like is given below.
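For reference, the following is a minimal sketch of uploading a local file to HDFS through the Java API. It is not part of the assignment code, and the address hdfs://master:9000, the user name hadoop, the local path D:\data\a.txt, and the target directory /input1 are placeholders to adapt to your own cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.net.URI;

public class UploadToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // connect to the NameNode as the HDFS user (assumed to be "hadoop")
        FileSystem fs = FileSystem.get(URI.create("hdfs://master:9000"), conf, "hadoop");
        // create the target directory if it does not exist, then upload the file
        fs.mkdirs(new Path("/input1"));
        fs.copyFromLocalFile(new Path("D:\\data\\a.txt"), new Path("/input1"));
        fs.close();
    }
}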

3 Creating a Maven Project

You can install Maven yourself, or just use the Maven bundled with IDEA; both work the same, except that the bundled one downloads packages to the C drive. If you want to install it yourself, see:

https://blog.csdn.net/fl6881688/article/details/121353872

1 Create the Project

image-20220630172329151

image-20220630172358638

2 Add the Dependencies

image-20220630172510004

Insert the following into pom.xml:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.map</groupId>
    <artifactId>MapReduceDemo</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>3.3.3</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>

        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.30</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>3.3.3</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>3.3.3</version>
        </dependency>

    </dependencies>

</project>

image-20220630172628983

Wait for the download to finish. While it is downloading, create five files in the directory shown below: four of them are simply the Hadoop configuration files from the virtual machine, copied as-is, plus a log4j.properties.

image-20220630172722903

Take the four configuration files from the Hadoop installation directory on the virtual machine, under

/etc/hadoop

and copy-paste them into that directory.

image-20220630173013971

In log4j.properties, enter:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ABSOLUTE} | %-5.5p | %-16.16t | %-32.32c{1} | %-32.32C %4L | %m%n

With that, the environment is basically set up.
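Before writing any MapReduce code, it can be worth checking that the local environment really reaches the cluster. The snippet below is a small sanity-check sketch of my own (not part of the assignments): it relies on the Hadoop configuration files just placed on the classpath to locate the cluster, and simply lists the HDFS root. The user name hadoop is an assumption; use whatever user your cluster expects.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSmokeTest {
    public static void main(String[] args) throws Exception {
        // connect as the cluster user (assumed to be "hadoop")
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        // new Configuration() picks up core-site.xml / hdfs-site.xml from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // list the HDFS root directory to confirm connectivity
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
        fs.close();
    }
}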

4 Task 1: Data Deduplication

1 Task Description

Remove duplicate entries from the data files. Each line of a data file is one entry.

2 Input

1)file1: 
2012-3-1 a
2012-3-2 b
2012-3-3 c 
2012-3-4 d 
2012-3-5 a 
2012-3-6 b
2012-3-7 c
2012-3-3 c 

2)file2: 
2012-3-1 b
2012-3-2 a
2012-3-3 b
2012-3-4 d 
2012-3-5 a 
2012-3-6 c
2012-3-7 d
2012-3-3 c 

3 Output

2012-3-1 a
2012-3-1 b
2012-3-2 a
2012-3-2 b
2012-3-3 b
2012-3-3 c 
2012-3-4 d 
2012-3-5 a 
2012-3-6 b
2012-3-6 c
2012-3-7 c
2012-3-7 d

4 Project Structure

image-20220630172026967

5 Code

There isn't much code; just remember to adjust the paths. Each task has three classes: a Mapper, a Reducer, and a Driver. The Driver is essentially the main function: it ties the map and reduce phases together into a job and submits it to the cluster.

As long as you change the paths in the Driver, everything else should work as-is.

DataDeMapper
package com.hadoop.task1;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/*
KEYIN:longwritable
VALUEIN:text
KEYOUT:text
VALUEOUT:intwritable
 */
public class DataDeMapper extends Mapper<LongWritable, Text,Text, IntWritable> {
    private Text OutKey = new Text();
    private IntWritable OutV = new IntWritable(1);
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // get one line of input
        String line = value.toString();
        // write the whole line out as the key; the value is just a placeholder 1
        OutKey.set(line);
        context.write(OutKey, OutV);
    }
}

DataDeReducer
package com.hadoop.task1;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

//KEYIN: Text
//VALUEIN: IntWritable
//KEYOUT: Text
//VALUEOUT: NullWritable
public class DataDeReducer extends Reducer<Text, IntWritable, Text, NullWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // all duplicate lines arrive grouped under the same key,
        // so writing the key once is enough to deduplicate
        context.write(key, NullWritable.get());
    }
}

DataDeDriver
package com.hadoop.task1;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class DataDeDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1 Get the job
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration();
        conf.set("mapreduce.app-submission.cross-platform", "true");
        conf.set("fs.default.name", "hdfs://master:9000");
        Job job = Job.getInstance(conf);

        // 2 Set the jar path
        job.setJar("D:\\Java\\hadoop\\project\\MapReduceDemo\\target\\MapReduceDemo-1.0-SNAPSHOT.jar");
        job.setJarByClass(DataDeDriver.class);

        // 3 Associate the mapper and reducer
        job.setMapperClass(DataDeMapper.class);
        job.setReducerClass(DataDeReducer.class);

        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/input1"));
        FileOutputFormat.setOutputPath(job, new Path("/output1"));

        // 7 Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

6 Testing

Once the code is written, build the jar (for example with Maven's package goal). Across all tasks, only the paths inside the Driver need to be changed; the other classes stay basically the same.

image-20220630230634595

Just run the Driver, and you can see:

image-20220630173726953

The completed job is also visible on the cluster's web UI:

(Opening the URL below requires that the Windows hosts file maps master to the virtual machine's IP address; alternatively, replace master in the URL with the VM's IP address.)

http://master:8088/

image-20220630173807991

5 Task 2: Data Sorting

1 Task Description

Sort the data in the input files. Each line of an input file is a single number, i.e., one record. Each line of the output should contain two whitespace-separated numbers: the first is the record's rank within the whole data set, and the second is the original number.

2 Input

1)file1: 
 
2
32
654
32
15
756
65223

 2)file2: 
5956
22
650
92

3)file3: 
26
54
6

image-20220630174138923

3 Output

1    2
2    6
3    15
4    22
5    26
6    32
7    32
8    54
9    92
10    650
11    654
12    756
13    5956
14    65223

4 Structure

image-20220630212109329

5 Code

SortBean
package com.hadoop.task2;

import org.apache.hadoop.io.WritableComparable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class SortBean implements WritableComparable<SortBean> {
    private int num;
    private int value;

    public void setNum(int num) {
        this.num = num;
    }

    public int getNum() {
        return num;
    }

    public SortBean() {

    }

    public int getValue() {
        return value;
    }

    public void setValue(int value) {
        this.value = value;
    }

    @Override
    public int compareTo(SortBean o) {

        // ascending order by numeric value
        return Integer.compare(this.value, o.value);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(value);
        out.writeInt(num);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.value = in.readInt();
        this.num = in.readInt();
    }

    @Override
    public String toString() {
        return String.valueOf(num)+"    "+String.valueOf(value);
    }
}

DataSortMapper
package com.hadoop.task2;

import org.apache.commons.lang3.ObjectUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/*
KEYIN: LongWritable
VALUEIN: Text
KEYOUT: SortBean
VALUEOUT: Text
 */
public class DataSortMapper extends Mapper<LongWritable, Text,SortBean, Text> {
    private SortBean sb = new SortBean();
    Text text = new Text("1");
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // get one line and parse the number
        String line = value.toString();
        sb.setValue(Integer.parseInt(line));
        sb.setNum(0); // the rank is assigned later, in the reducer
        context.write(sb, text);

    }
}
DataSortReducer
package com.hadoop.task2;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

//KEYIN: SortBean
//VALUEIN: Text
//KEYOUT: NullWritable
//VALUEOUT: SortBean
public class DataSortReducer extends Reducer<SortBean, Text, NullWritable, SortBean> {
    // running rank; keys reach the reducer already sorted by value
    private int count = 1;
    @Override
    protected void reduce(SortBean key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        // duplicate numbers arrive as multiple values under the same key,
        // so each occurrence gets its own rank
        for (Text value : values) {
            key.setNum(count);
            count = count + 1;
            context.write(NullWritable.get(), key);
        }
    }
}

DataSortDriver
package com.hadoop.task2;


import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class DataSortDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1 Get the job
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration();
        conf.set("mapreduce.app-submission.cross-platform", "true");
        conf.set("fs.default.name", "hdfs://master:9000");
        Job job = Job.getInstance(conf);

        // 2 Set the jar path
        job.setJar("D:\\Java\\hadoop\\project\\MapReduceDemo\\target\\MapReduceDemo-1.0-SNAPSHOT.jar");
        job.setJarByClass(DataSortDriver.class);

        // 3 Associate the mapper and reducer
        job.setMapperClass(DataSortMapper.class);
        job.setReducerClass(DataSortReducer.class);

        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(SortBean.class);
        job.setMapOutputValueClass(Text.class);

        // 5 Set the final output key/value types
        job.setOutputKeyClass(NullWritable.class);
        job.setOutputValueClass(SortBean.class);
//        job.setNumReduceTasks(0);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/input2"));
        FileOutputFormat.setOutputPath(job, new Path("/out2"));
//        FileInputFormat.setInputPaths(job, new Path("D:\\Java\\hadoop\\project\\MapReduceDemo\\src\\main\\java\\com\\hadoop\\task2\\input"));
//        FileOutputFormat.setOutputPath(job, new Path("D:\\Java\\hadoop\\project\\MapReduceDemo\\src\\main\\java\\com\\hadoop\\task2\\output"));

        // 7 Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

6 Testing

Once the code is written, build the jar. Again, only the paths inside the Driver need to be changed; the other classes stay basically the same.

image-20220630230634595

Just run the Driver, and you can see:

image-20220630230853788

6 Task 3: Computing Averages

1 Task Description

Compute each student's average score from the input files. Each line of an input file contains a student's name and a score; if there are multiple subjects, each subject is a separate file. Each line of the output should contain two separated fields: the first is the student's name and the second is the student's average score.

2 Input

   1)math: 
 
张三    88
李四    99
王五    66
赵六    77

    2)china: 
 
张三    78
李四    89
王五    96
赵六    67

  3)english: 
 
张三    80
李四    82
王五    84
赵六    86

image-20220630231047081

3 Output

张三    82
李四    90
王五    82
赵六    76 

4 Structure

image-20220630231131101

5 Code

DataAvgDriver
package com.hadoop.task3;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class DataAvgDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1 Get the job
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration();
        // set the cluster address
        conf.set("fs.default.name", "hdfs://master:9000");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf);

        // 2 Set the jar path
        job.setJar("D:\\Java\\hadoop\\project\\MapReduceDemo\\target\\MapReduceDemo-1.0-SNAPSHOT.jar");
        job.setJarByClass(DataAvgDriver.class);

        // 3 Associate the mapper and reducer
        job.setMapperClass(DataAvgMapper.class);
        job.setReducerClass(DataAvgReducer.class);

        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/input3"));
        FileOutputFormat.setOutputPath(job, new Path("/output3"));

        // 7 Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

DataAvgMapper
package com.hadoop.task3;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/*
KEYIN:longwritable
VALUEIN:text
KEYOUT:text
VALUEOUT:intwritable
 */
public class DataAvgMapper extends Mapper<LongWritable, Text,Text, IntWritable> {
    private Text OutKey = new Text();
    private IntWritable OutV = new IntWritable();
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // get one line
        String line = value.toString();
        // split into name and score (the input uses four spaces as the separator)
        String[] words = line.split("    ");
        // write out name -> score
        OutKey.set(words[0]);
        OutV.set(Integer.parseInt(words[1]));
        context.write(OutKey, OutV);
    }

}

DataAvgReducer
package com.hadoop.task3;

import com.google.common.collect.Iterators;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

//KEYIN:Text
//VALUEIN:intwritable
//KEYOUT:text
//VALUEOUT:intwritable
public class DataAvgReducer extends Reducer<Text, IntWritable,Text,IntWritable>{
    private IntWritable OutV = new IntWritable();
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        int sum = 0;
        int count = 0;
        // accumulate every score for this student, e.g. 张三 -> (88, 78, 80)
        for (IntWritable value : values) {
            sum += value.get();
            count++;
        }
        // integer average (truncated, which matches the expected output)
        OutV.set(sum / count);
        context.write(key,OutV);
    }
}

6 Testing

Once the code is written, build the jar. Again, only the paths inside the Driver need to be changed; the other classes stay basically the same.

image-20220630230634595

Just run the Driver, and you can see:

image-20220630231407610

7 Task 4: Single-Table Join

1 Task Description

The example provides a child-parent table; the task is to output the corresponding grandchild-grandparent table.

2 Input

file: 
 
child        parent 
Tom        Lucy
Tom        Jack
Jone        Lucy
Jone        Jack
Lucy        Mary
Lucy        Ben
Jack        Alice
Jack        Jesse
Terry        Alice
Terry        Jesse
Philip        Terry
Philip        Alma
Mark        Terry
Mark        Alma
 

image-20220630231541029

3 Output

grandchild        grandparent 
Tom              Alice
Tom              Jesse
Jone              Alice
Jone              Jesse
Tom              Mary
Tom              Ben
Jone              Mary
Jone              Ben
Philip              Alice
Philip              Jesse
Mark              Alice
Mark              Jesse 

4 Structure

image-20220630231633696

5 Code

RelationBean
package com.hadoop.task4;

import org.apache.hadoop.io.Writable;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

public class RelationBean implements Writable {
    private String relationship;
    private String name;

    public RelationBean() {
    }

    public String getRelationship() {
        return relationship;
    }

    public void setRelationship(String relationship) {
        this.relationship = relationship;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(relationship);
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.relationship= in.readUTF();
        this.name = in.readUTF();

    }

    @Override
    public String toString() {
        return relationship+" "+name ;
    }
}

RelationDriver
package com.hadoop.task4;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class RelationDriver {
    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
        // 1 Get the job
        System.setProperty("HADOOP_USER_NAME", "hadoop");
        Configuration conf = new Configuration();
        // set the cluster address
        conf.set("fs.default.name", "hdfs://master:9000");
        conf.set("mapreduce.app-submission.cross-platform", "true");
        Job job = Job.getInstance(conf);

        // 2 Set the jar path
        job.setJar("D:\\Java\\hadoop\\project\\MapReduceDemo\\target\\MapReduceDemo-1.0-SNAPSHOT.jar");
        job.setJarByClass(RelationDriver.class);

        // 3 Associate the mapper and reducer
        job.setMapperClass(RelationMapper.class);
        job.setReducerClass(RelationReducer.class);

//        job.setNumReduceTasks(0);
        // 4 Set the map output key/value types
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(RelationBean.class);

        // 5 Set the final output key/value types
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // 6 Set the input and output paths
        FileInputFormat.setInputPaths(job, new Path("/input4"));
        FileOutputFormat.setOutputPath(job, new Path("/output4"));

        // 7 Submit the job
        boolean result = job.waitForCompletion(true);
        System.exit(result ? 0 : 1);
    }
}

RelationMapper
package com.hadoop.task4;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

/*
KEYIN: LongWritable
VALUEIN: Text
KEYOUT: Text
VALUEOUT: RelationBean
 */
public class RelationMapper extends Mapper<LongWritable, Text,Text, RelationBean> {
    private Text OutKey = new Text();
    private RelationBean OutV = new RelationBean();
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // get one line
        String line = value.toString();
        // split into child and parent (the input uses eight spaces as the separator)
        String[] words = line.split("        ");

        // emit child -> parent, tagged "1": the value is one of this key's parents
        OutKey.set(words[0]);
        OutV.setName(words[1]);
        OutV.setRelationship("1");
        context.write(OutKey, OutV);

        // emit parent -> child, tagged "2": the value is one of this key's children
        OutKey.set(words[1]);
        OutV.setName(words[0]);
        OutV.setRelationship("2");
        context.write(OutKey, OutV);

    }

}

RelationReducer
package com.hadoop.task4;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

//KEYIN: Text
//VALUEIN: RelationBean
//KEYOUT: Text
//VALUEOUT: Text
public class RelationReducer extends Reducer<Text,RelationBean,Text,Text>{
    private Text OutKey = new Text();
    private Text OutV = new Text();
    @Override
    protected void reduce(Text key, Iterable<RelationBean> values, Context context) throws IOException, InterruptedException {

        // join: collect, for the person "key", their parents (tagged "1") and children (tagged "2"),
        // then emit every (child, parent) pair as a grandchild-grandparent pair

        List<String> grandchild = new ArrayList<String>();
        List<String> grandparent = new ArrayList<String>();
        for (RelationBean value : values) {
            if(value.getRelationship().equals("1")){
                grandparent.add(value.getName());
            }else{
                grandchild.add(value.getName());
            }

        }
        for (String child : grandchild) {
            OutKey.set(child);
            for (String parent : grandparent) {
                OutV.set(parent);
                context.write(OutKey,OutV);
            }
        }

    }
}

6 Testing

Once the code is written, build the jar. Again, only the paths inside the Driver need to be changed; the other classes stay basically the same.

image-20220630230634595

Just run the Driver, and you can see:

image-20220630231805760
