MapReduce Case Study

Latest recommended article published 2023-08-17 18:42:10

zrp木青


Category: HCIA-BD

Article link: https://blog.csdn.net/qq_40693443/article/details/118366383

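1.2 Friend Recommendation

1.2.1 Requirement
Which indirect friend should be recommended to hadoop?
Two people who are not direct friends are better candidates for a recommendation the more friends they have in common.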

1.2.2 Dataset
QQ:
tom hello hadoop cat
hadoop tom hive world
world hadoop hello hive
cat tom hive
mr hive hello
hive cat hadoop world hello mr
hello tom world hive mr

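1.2.2.1 Characteristics
Each line is one record; by default the map task reads one line at a time.
The first name in each record is the user.
Every name after the first is one of that user's friends.
Two friends of the same user may or may not be direct friends of each other, but they are certainly indirect friends, because they share that user as a common friend.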
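
The record layout described above can be sketched as a small parser in plain Java. The class name FriendRecord and its parse method are hypothetical helpers for illustration, not part of the original program, and the sketch assumes names are separated by whitespace.

```java
import java.util.Arrays;
import java.util.List;

class FriendRecord {
    final String user;          // first name on the line
    final List<String> friends; // every name after the first

    FriendRecord(String user, List<String> friends) {
        this.user = user;
        this.friends = friends;
    }

    // Parses one whitespace-separated record such as "tom hello hadoop cat".
    static FriendRecord parse(String line) {
        String[] names = line.trim().split("\\s+");
        return new FriendRecord(names[0],
                Arrays.asList(names).subList(1, names.length));
    }

    public static void main(String[] args) {
        FriendRecord r = parse("tom hello hadoop cat");
        System.out.println(r.user + " -> " + r.friends);
    }
}
```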

1.2.2.2 Open question
How should candidates be ranked to pick the top N friends to recommend?

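1.2.3 Case analysis
A user and a recommended candidate must share one or more common friends. So we scan every friend list for pairwise relationships, discard the pairs that are already direct friends, and count how often each remaining pair occurs.
tom hello hadoop cat
tom:hello 0    (0 marks a direct friendship)
tom:hadoop 0
tom:cat 0

hello:hadoop 1    (1 marks an indirect friendship: the pair shares a common friend)
hello:cat 1
hadoop:cat 1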

A:B 1
A:B 1

A:B 1
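Whether the relationship is direct or indirect, pay attention to the order of the names when building the key. A and B must always be joined as A:B, whether they arrive as (A, B) or (B, A), so that both records land in the same group.
The keys above are therefore adjusted as follows (a dedicated method can handle this later):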
hello:tom 0
hadoop:tom 0
cat:tom 0
hadoop:hello 1
cat:hello 1
cat:hadoop 1
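The Mapper task can do all of this.
After every input file has been processed, group identical pairs together and count their common friends. Note: if the two users are already direct friends, discard the pair, since there is nothing to recommend.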
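
The pairing rule can be sketched in plain Java, with no Hadoop dependency. The class name PairEmitter and the methods pairKey and emit are illustrative names chosen here; the sketch assumes names are separated by whitespace and returns the pairs as simple "key flag" strings.

```java
import java.util.ArrayList;
import java.util.List;

class PairEmitter {
    // Canonical key: the lexicographically smaller name always comes first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + ":" + b : b + ":" + a;
    }

    // Returns "key flag" strings for one record; flag 0 = direct, 1 = indirect.
    static List<String> emit(String line) {
        String[] names = line.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 1; i < names.length; i++) {
            // The user (names[0]) is a direct friend of every listed name.
            out.add(pairKey(names[0], names[i]) + " 0");
            // Any two listed names share the user as a common friend.
            for (int j = i + 1; j < names.length; j++) {
                out.add(pairKey(names[i], names[j]) + " 1");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        emit("tom hello hadoop cat").forEach(System.out::println);
    }
}
```

For the record "tom hello hadoop cat" this yields three direct pairs and three indirect pairs, all with their names in sorted order.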

hadoop:hive 1
hadoop:hive 1
hadoop:hive 1
hadoop:hive 0
hadoop:hive 1
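When a group like the one above contains a 0, the reduce task discards it (context.write() is not called). Groups without a 0 are summed and written out, for example: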
cat:hadoop 2
cat:hello 2
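
The discard-or-sum rule for one group can be sketched as follows. CommonFriendCounter and count are hypothetical names used only for this illustration; the list of integers stands in for the values that arrive for a single key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

class CommonFriendCounter {
    // Returns the common-friend count for one pair, or empty when any
    // flag is 0, meaning the two users are already direct friends.
    static OptionalInt count(List<Integer> flags) {
        int sum = 0;
        for (int flag : flags) {
            if (flag == 0) {
                return OptionalInt.empty();
            }
            sum += flag;
        }
        return OptionalInt.of(sum);
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList(1, 1, 1, 0, 1))); // direct friends: discarded
        System.out.println(count(Arrays.asList(1, 1)));          // two common friends
    }
}
```
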
The first MR job produces the following intermediate data:
cat:hadoop 2
cat:hello 2
cat:mr 1
cat:world 1
hadoop:hello 3
hadoop:mr 1
hive:tom 3
mr:tom 1
mr:world 2
tom:world 2
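Continuing the analysis:
cat:hadoop 2 means hadoop can be recommended to cat, because they have 2 common friends; likewise cat can be recommended to hadoop, with the same 2 common friends.
So each record can be split into two outputs, one centered on each user:
key value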
cat hadoop,2
hadoop cat,2
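
A minimal sketch of this split in plain Java is shown here. PairSplitter and split are illustrative names, and the sketch assumes the pair key uses ":" as its separator, as in the keys above.

```java
class PairSplitter {
    // Turns the pair "a:b" with count n into two directional records:
    // {a, "b,n"} and {b, "a,n"}.
    static String[][] split(String pairKey, int count) {
        String[] names = pairKey.split(":");
        return new String[][] {
            {names[0], names[1] + "," + count},
            {names[1], names[0] + "," + count}
        };
    }

    public static void main(String[] args) {
        for (String[] rec : split("cat:hadoop", 2)) {
            System.out.println(rec[0] + " " + rec[1]);
        }
    }
}
```
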
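After all the intermediate data has been processed this way and grouped by key, we get data like the following. Taking one key as an example: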
key value
hadoop cat,2
hadoop hello,3
hadoop mr,1
Map<String,Integer> map = …;
map.put("cat",2);
map.put("hello",3);
map.put("mr",1);
List<Map.Entry<String,Integer>> list = …

hadoop--------------
hello,3
cat,2
mr,1

hadoop hello,cat

Then sort the candidates by count and take the top 2.

So we need two MR programs, with the output of MR1 serving as the input of MR2.
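
The sort-and-take-top-2 step can be sketched with a comparator instead of a hand-written insertion loop. TopRecommendations and topN are hypothetical names, and breaking ties by name is an assumption added here so the result is deterministic; the original text does not say how ties are ordered, so tied candidates may appear in a different order in the article's output.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopRecommendations {
    // Sorts candidates by common-friend count (descending), breaks ties
    // by name (an assumption of this sketch), and keeps the first n names.
    static List<String> topN(Map<String, Integer> candidates, int n) {
        List<Map.Entry<String, Integer>> list = new ArrayList<>(candidates.entrySet());
        list.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, list.size()); i++) {
            top.add(list.get(i).getKey());
        }
        return top;
    }

    public static void main(String[] args) {
        Map<String, Integer> hadoopCandidates = new HashMap<>();
        hadoopCandidates.put("cat", 2);
        hadoopCandidates.put("hello", 3);
        hadoopCandidates.put("mr", 1);
        System.out.println("hadoop " + String.join(",", topN(hadoopCandidates, 2)));
    }
}
```
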
1.2.4 Implementation
1.2.4.1 FOFMain
package com.bjsxt.fof;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FOFMain {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(true);
        // Run locally
        conf.set("mapreduce.framework.name", "local");
        Job job = Job.getInstance(conf);
        job.setJarByClass(FOFMain.class);

        job.setMapperClass(FMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        job.setReducerClass(FReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        Path input = new Path("/data/fof/input");
        FileInputFormat.addInputPath(job, input);

        // Delete the output directory if it already exists
        Path output = new Path("/data/fof/output");
        if (output.getFileSystem(conf).exists(output)) {
            output.getFileSystem(conf).delete(output, true);
        }
        FileOutputFormat.setOutputPath(job, output);

        job.waitForCompletion(true);
    }
}

1.2.4.2 FMapper class
package com.bjsxt.fof;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class FMapper extends Mapper<LongWritable,Text,Text,IntWritable> {
    private Text mkey = new Text();
    private IntWritable mval = new IntWritable();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Input line, e.g.: tom hello hadoop cat
        String[] names = value.toString().split(" ");
        for (int i = 1; i < names.length; i++) {
            // Direct relationships first: tom -> hello, hadoop, cat
            // getFof joins the names as A:B whether they arrive as (A, B) or (B, A)
            mkey.set(getFof(names[0], names[i]));
            mval.set(0); // 0 = direct, 1 = indirect
            context.write(mkey, mval);
            // Indirect relationships: hello -> hadoop, cat; hadoop -> cat
            for (int j = i + 1; j < names.length; j++) {
                mkey.set(getFof(names[i], names[j]));
                mval.set(1); // 0 = direct, 1 = indirect
                context.write(mkey, mval);
            }
        }
    }

    // Keeps the order stable: (A, B) and (B, A) are both joined as A:B
    private String getFof(String s1, String s2) {
        if (s1.compareTo(s2) < 0) {
            return s1 + ":" + s2;
        }
        return s2 + ":" + s1;
    }
}

1.2.4.3 FReduce class
package com.bjsxt.fof;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class FReduce extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable rval = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Example group:
        // hello:hadoop 1
        // hello:hadoop 1
        // hello:hadoop 0
        // hello:hadoop 1
        boolean flag = true; // stays true while no direct relationship is seen
        int sum = 0;
        for (IntWritable val : values) {
            if (val.get() == 0) {
                // Already direct friends: skip this pair entirely
                flag = false;
                break;
            }
            sum += val.get();
        }
        if (flag) {
            rval.set(sum);
            context.write(key, rval);
        }
    }
}

Preparation:
[root@node1 ~]# vim qq.txt
[root@node1 ~]# hdfs dfs -mkdir -p /data/fof/input
[root@node1 ~]# hdfs dfs -put qq.txt /data/fof/input

Intermediate result after the run:
cat:hadoop 2
cat:hello 2
cat:mr 1
cat:world 1
hadoop:hello 3
hadoop:mr 1
hive:tom 3
mr:tom 1
mr:world 2
tom:world 2
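
These counts can be cross-checked without a cluster using an in-memory simulation of the first job over the QQ dataset. This is only a sketch: FirstJobSimulation, pairKey and run are names introduced here, and a TreeMap plus a set of direct pairs stand in for the shuffle and reduce phases.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class FirstJobSimulation {
    // The seven records of the QQ dataset from section 1.2.2.
    static final String[] DATA = {
        "tom hello hadoop cat",
        "hadoop tom hive world",
        "world hadoop hello hive",
        "cat tom hive",
        "mr hive hello",
        "hive cat hadoop world hello mr",
        "hello tom world hive mr"
    };

    // Canonical key: the lexicographically smaller name comes first.
    static String pairKey(String a, String b) {
        return a.compareTo(b) < 0 ? a + ":" + b : b + ":" + a;
    }

    // Returns pair -> common-friend count, omitting pairs that are direct friends.
    static Map<String, Integer> run(String[] lines) {
        Map<String, Integer> counts = new TreeMap<>();
        Set<String> direct = new HashSet<>();
        for (String line : lines) {
            String[] names = line.trim().split("\\s+");
            for (int i = 1; i < names.length; i++) {
                direct.add(pairKey(names[0], names[i]));
                for (int j = i + 1; j < names.length; j++) {
                    counts.merge(pairKey(names[i], names[j]), 1, Integer::sum);
                }
            }
        }
        // Drop every pair that appeared at least once as a direct friendship.
        counts.keySet().removeAll(direct);
        return counts;
    }

    public static void main(String[] args) {
        run(DATA).forEach((pair, n) -> System.out.println(pair + "\t" + n));
    }
}
```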

1.2.4.4 FOFMain2
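package com.bjsxt.fof2;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class FOFMain2 {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(true);
        // Run locally
        conf.set("mapreduce.framework.name", "local");
        Job job = Job.getInstance(conf);
        job.setJarByClass(FOFMain2.class);

        job.setMapperClass(F2Mapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);

        job.setReducerClass(F2Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        // Choose the InputFormat implementation: the key is the part of the line
        // before the first \t; if there is no \t, the whole line is the key and
        // the value is empty
        job.setInputFormatClass(KeyValueTextInputFormat.class);

        // The output of the first job is the input of this one
        Path input = new Path("/data/fof/output");
        KeyValueTextInputFormat.addInputPath(job, input);

        // Delete the output directory if it already exists
        Path output = new Path("/data/fof2/output");
        if (output.getFileSystem(conf).exists(output)) {
            output.getFileSystem(conf).delete(output, true);
        }
        FileOutputFormat.setOutputPath(job, output);

        job.waitForCompletion(true);
    }
}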
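
The key/value rule stated in the comment above (split at the first tab; no tab means the whole line is the key) can be illustrated in plain Java. This is not Hadoop's implementation, only a sketch of the described behavior; KeyValueLine and split are hypothetical names.

```java
class KeyValueLine {
    // Splits a line at the first tab. Without a tab, the whole line
    // becomes the key and the value is the empty string.
    static String[] split(String line) {
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return new String[] {line, ""};
        }
        return new String[] {line.substring(0, tab), line.substring(tab + 1)};
    }

    public static void main(String[] args) {
        String[] kv = split("cat:hadoop\t2");
        System.out.println("key=" + kv[0] + " value=" + kv[1]);
    }
}
```

With a line from the first job's output such as "cat:hadoop", a tab, then "2", the mapper of the second job would therefore see cat:hadoop as its key and 2 as its value.
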
1.2.4.5 F2Mapper
package com.bjsxt.fof2;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class F2Mapper extends Mapper<Text,Text,Text,Text> {
    @Override
    protected void map(Text key, Text value, Context context) throws IOException, InterruptedException {
        // in: key -> cat:hadoop, value -> 2
        String[] names = key.toString().split(":");
        String numString = value.toString();

        // out: key -> cat, value -> hadoop,2
        context.write(new Text(names[0]), new Text(names[1] + "," + numString));
        // out: key -> hadoop, value -> cat,2
        context.write(new Text(names[1]), new Text(names[0] + "," + numString));
    }
}
1.2.4.6 F2Reduce
package com.bjsxt.fof2;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class F2Reduce extends Reducer<Text,Text,Text,Text> {
    Map<String,Integer> contents = null;

    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        /* Example group:
         * hadoop cat,2
         * hadoop hello,3
         * hadoop mr,1
         */
        contents = new HashMap<>();

        for (Text val : values) {
            String[] arrs = val.toString().split(",");
            contents.put(arrs[0], Integer.parseInt(arrs[1]));
        }
        // contents (map) now holds:            cat 2, hello 3, mr 1
        // list will hold, in descending order: hello 3, cat 2, mr 1
        List<Map.Entry<String,Integer>> list = new ArrayList<>();
        // Walk the map and insert each entry at its sorted position
        for (Map.Entry<String,Integer> entry : contents.entrySet()) {
            int valNum = entry.getValue();
            boolean flag = false;
            // Insert before the first list entry with a smaller count
            for (int i = 0; i < list.size(); i++) {
                if (valNum > list.get(i).getValue()) {
                    list.add(i, entry);
                    flag = true;
                    break;
                }
            }
            // Not larger than any existing entry: append at the end
            if (!flag) {
                list.add(entry);
            }
        }
        // Recommended friends: top 2
        String result = "";
        for (int i = 0; i < Math.min(2, list.size()); i++) {
            result += list.get(i).getKey() + ",";
        }
        // Drop the trailing ","
        result = result.substring(0, result.length() - 1);
        context.write(key, new Text(result));
    }
}

cat hello,hadoop
hadoop hello,cat
hello hadoop,cat
hive tom
mr world,tom
tom hive,world
world tom,mr
