A Hadoop Case Study: Second-Degree Connections and Friend Recommendation

Reference:

https://my.oschina.net/u/176897/blog/99761

1. Example Description

The users of a social networking site and the mutual-follow relationships between them can be abstracted as a graph. Take the following graph as an example:

Figure 1

Vertices A, B, C through I are users of the site, and an edge between two vertices means the two users follow each other. Given the graph formed by these mutual-follow relationships, how do we recommend friends to each user?

 

Using the graph above as an example, let us look at how to recommend friends based on the mutual-follow graph. First, we have to assume that if two users follow each other, they know each other, or at least should, in real life. Suppose we want to recommend friends to user I. I's friends are H, G and C. H's other friend is A, G's other friend is F, and C's other friends are B and F. So users I, H, G, C, A, B and F are very likely to belong to the same social circle, and we should recommend A, B and F to user I. Going one step further, F is a friend of two of I's friends (C and G), while A and B are each a friend of only one of I's friends, so F deserves to be recommended to I ahead of A and B.

 

You may have noticed that in the analysis above we used user I's second-degree contacts as candidate recommendations, and we had each of I's friends "vote" for those second-degree contacts to pick the best recommendation. A raw list of second-degree contacts only shows a user's relationship chain on the site; it is the voting over second-degree contacts that produces the friends we actually want to recommend.
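The following is a minimal in-memory sketch of this voting idea, written in plain Java without Hadoop, just to make the counting concrete. It assumes the edge list of deg2friend.txt given in section 2 (which also contains the edge C,D, so D shows up as a candidate for I as well); the class and variable names are illustrative only.

import java.util.*;

public class Deg2Sketch {
    public static void main(String[] args) {
        // The mutual-follow pairs from deg2friend.txt in section 2.
        String[][] edges = {
            {"A","B"},{"B","C"},{"C","D"},{"D","E"},{"E","F"},{"F","D"},{"F","C"},
            {"F","G"},{"G","I"},{"G","H"},{"H","I"},{"I","C"},{"H","A"}
        };
        // Mutual follows form an undirected graph: store each edge in both directions.
        Map<String, Set<String>> friends = new HashMap<>();
        for (String[] e : edges) {
            friends.computeIfAbsent(e[0], k -> new HashSet<>()).add(e[1]);
            friends.computeIfAbsent(e[1], k -> new HashSet<>()).add(e[0]);
        }
        String user = "I";
        // Each of I's friends "votes" for their own friends; skip I and I's direct friends.
        Map<String, Integer> votes = new TreeMap<>();
        for (String friend : friends.get(user)) {
            for (String candidate : friends.get(friend)) {
                if (candidate.equals(user) || friends.get(user).contains(candidate)) {
                    continue;
                }
                votes.merge(candidate, 1, Integer::sum);
            }
        }
        // With this edge list the result is {A=1, B=1, D=1, F=2}: F gets two votes
        // (via C and G), so F is the strongest recommendation for I.
        System.out.println("Recommendations for " + user + ": " + votes);
    }
}

The MapReduce program below computes the same counts, but for every user at once and without loading the whole graph into memory.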

 

2. Design Approach

The input is deg2friend.txt, which stores the mutual-follow information. Each line contains two user IDs separated by a comma, meaning the two users follow each other, i.e. they know each other.

A,B

B,C

C,D

D,E

E,F

F,D

F,C

F,G

G,I

G,H

H,I

I,C

H,A

Computing second-degree friends takes two rounds of MapReduce. In the Map of the first round, for an input line "H,I" we emit two records: key=H, value="H,I" and key=I, value="H,I". The former lets I discover his second-degree friends through H, and the latter lets H discover his second-degree friends through I.
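For example, the three lines of deg2friend.txt that contain I produce the following (key, value) records; as in the Map1 code of section 3, the pair inside the value is tab-separated with the lexically smaller ID first:

G,I  ->  (G, "G\tI")  and  (I, "G\tI")
H,I  ->  (H, "H\tI")  and  (I, "H\tI")
I,C  ->  (I, "C\tI")  and  (C, "C\tI")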

 

Based on that Map, the Reduce of the first round receives input such as key=I, value={"H,I", "C,I", "G,I"}. In other words, the values for a key are all of the users who mutually follow the user that the key represents. Since H, C and G all mutually follow I, any two of H, C and G may be second-degree friends of each other, as long as they do not already follow each other. In the graph above, H and C are second-degree friends and G and C are second-degree friends, but G and H are not, because they follow each other directly. The first round's Reduce therefore marks every pair that mutually follows each other as first-degree friends ("deg1friend") and every pair that might be second-degree friends as "deg2friend", and emits both kinds of pairs.
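Concretely, for key=I the reducer collects hisFriends = {H, C, G} and emits the following records (tab-separated in the actual output):

H  I    deg1friend
C  I    deg1friend
G  I    deg1friend
C  G    deg2friend
C  H    deg2friend
G  H    deg2friend

The candidate pair G-H is a false positive at this stage; the second round removes it, because the reduces for keys G and H also emit "G  H  deg1friend".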

 

The second round of MapReduce then uses the first round's output, i.e. whether each pair is tagged deg1friend and/or deg2friend, to decide whether the pair really is a second-degree friendship. If a pair carries a deg1friend tag, it cannot be a second-degree friendship; if it carries deg2friend tags but no deg1friend tag, it is. In addition, the number of deg2friend tags a pair receives is exactly its support count, i.e. the number of mutual friends through which the two users could get to know each other.
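For example, after the first round the key "C\tG" arrives at the second round's Reduce with the values {deg2friend, deg2friend} (one tag contributed via I, one via F) and no deg1friend tag, so it is written out with support count 2. The key "G\tH" arrives with {deg1friend, deg1friend, deg2friend} and is discarded:

C  G : deg2friend, deg2friend                ->  2  C  G
G  H : deg1friend, deg1friend, deg2friend    ->  (dropped)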

 

 

3. Program Code

package Hadoop_Deg2friend;

 

import java.io.IOException;

import java.util.Vector;

 

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

 

public class Deg2friend {

 

   // Round-1 Map: for every mutual-follow pair "X,Y", emit the lexically
   // ordered pair "X\tY" under both user IDs.

   public static class Map1 extends Mapper<Object, Text, Text, Text>

   {

     private Text map1_key = new Text();

     private Text map1_value = new Text();

    

     @Override

     protected void map(Object key, Text value, Context context)

          throws IOException, InterruptedException {

        String[] eachterm = value.toString().split(",");

        if (eachterm.length != 2) {

          return;

        }

       

        if (eachterm[0].compareTo(eachterm[1]) < 0) {

          map1_value.set(eachterm[0] + "\t" + eachterm[1]);

        }

        else if (eachterm[0].compareTo(eachterm[1]) > 0) {
          map1_value.set(eachterm[1] + "\t" + eachterm[0]);
        }
        else {
          return;   // ignore self-loops such as "A,A"
        }

       

        map1_key.set(eachterm[0]);

        context.write(map1_key, map1_value);

       

        map1_key.set(eachterm[1]);

        context.write(map1_key, map1_value);

     }

   }

   // Round-1 Reduce: the values for a user are all pairs containing that user.
   // Re-emit each pair as "deg1friend", and emit every pair of the user's
   // friends as a candidate "deg2friend".

   public static class Reduce1 extends Reducer<Text, Text, Text, Text>

   {

     @Override

     protected void reduce(Text key, Iterable<Text> values, Context context)

          throws IOException, InterruptedException {

        Vector<String> hisFriends = new Vector<String>();

       

        for(Text val : values)

        {

          String[] eachterm = val.toString().split("\t");

          if (eachterm[0].equals(key.toString())) {

             hisFriends.add(eachterm[1]);

             context.write(val, new Text("deg1friend"));

          }

          if (eachterm[1].equals(key.toString())) {

             hisFriends.add(eachterm[0]);

             context.write(val, new Text("deg1friend"));

          }

        }

       

        // Every unordered pair of this user's friends is a candidate second-degree
        // pair; the compareTo check below emits each pair once, in lexical order.
        for(int i = 0; i < hisFriends.size(); i++)

        {

          for(int j = 0; j < hisFriends.size(); j++)

          {

             if (hisFriends.elementAt(i).compareTo(hisFriends.elementAt(j)) < 0) {

               Text reduce_key = new Text(hisFriends.elementAt(i)+"\t"+hisFriends.elementAt(j));

               context.write(reduce_key, new Text("deg2friend"));

             }

          }

        }

     }

   }

  

   // Round-2 Map: re-key each line of round-1 output by the user pair,
   // with the deg1friend/deg2friend tag as the value.

   public static class Map2 extends Mapper<Object, Text, Text, Text>

   {

     @Override

     protected void map(Object key, Text value, Context context)

          throws IOException, InterruptedException {

       

        String[] line = value.toString().split("\t");

        if (line.length == 3) {

          Text map2_key = new Text(line[0]+"\t"+line[1]);

          Text map2_value = new Text(line[2]);

          context.write(map2_key, map2_value);

        }

            

     }

   }

  

   // Round-2 Reduce: keep a pair only if it has deg2friend tags and no
   // deg1friend tag; the number of deg2friend tags is the support count.

   public static class Reduce2 extends Reducer<Text, Text, Text, Text>

   {

     @Override

     protected void reduce(Text key, Iterable<Text> values, Context context)

          throws IOException, InterruptedException {

        boolean isdeg1 = false;

        boolean isdeg2 = false;

        int count = 0;

       

        for(Text val : values)

        {

          if (val.toString().compareTo("deg1friend") == 0) {

             isdeg1 = true;

          }

          if (val.toString().compareTo("deg2friend") == 0) {

             isdeg2 = true;

             count++;

          }

        }

       

        if ((!isdeg1) && isdeg2) {

          context.write(new Text(String.valueOf(count)),key);

        }

     }

   }

   // Driver: run the two jobs in sequence, with <temp> as the output of
   // job1 and the input of job2.

   public static void main(String[] args) throws Exception {

     Configuration conf = new Configuration();

       String[] otherArgs = new GenericOptionsParser(conf,args).getRemainingArgs();

       if (otherArgs.length != 3) {

        System.err.println("Usage: Deg2friend <in> <temp> <out>");

        System.exit(2);

     }

       Job job1 = new Job(conf, "Deg2friend");

       job1.setJarByClass(Deg2friend.class);

       job1.setMapperClass(Map1.class);

       job1.setReducerClass(Reduce1.class);

       job1.setOutputKeyClass(Text.class);

       job1.setOutputValueClass(Text.class);

      

       FileInputFormat.addInputPath(job1, new Path(otherArgs[0]));

       FileOutputFormat.setOutputPath(job1, new Path(otherArgs[1]));

      

       if (job1.waitForCompletion(true)) {

        Job job2 = new Job(conf, "Deg2friend");

        job2.setJarByClass(Deg2friend.class);

        job2.setMapperClass(Map2.class);

        job2.setReducerClass(Reduce2.class);

        job2.setOutputKeyClass(Text.class);

        job2.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job2, new Path(otherArgs[1]));

        FileOutputFormat.setOutputPath(job2, new Path(otherArgs[2]));

       

        System.exit(job2.waitForCompletion(true)? 0 : 1);

       

     }

       System.exit(1);   // job1 failed; do not attempt to run it a second time

   }

}
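To produce the Deg2friend.jar used in the run below, the class first has to be compiled against the Hadoop libraries and packaged. A typical sequence (directory names are illustrative) is:

mkdir -p classes
javac -classpath $(hadoop classpath) -d classes Hadoop_Deg2friend/Deg2friend.java
jar -cvf Deg2friend.jar -C classes .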

 

 

4. Running the Program

root@node1:/usr/local/hadoop/hadoop-2.5.2/myJar# hadoop jar Deg2friend.jar Hadoop_Deg2friend.Deg2friend /usr/local/hadooptempdata/input/deg2 /usr/local/hadooptempdata/temp/deg2 /usr/local/hadooptempdata/output/deg2

16/12/30 23:35:36 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.233.129:8032
16/12/30 23:35:40 INFO input.FileInputFormat: Total input paths to process : 1
16/12/30 23:35:41 INFO mapreduce.JobSubmitter: number of splits:1
16/12/30 23:35:43 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483111826986_0001
16/12/30 23:35:45 INFO impl.YarnClientImpl: Submitted application application_1483111826986_0001
16/12/30 23:35:45 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1483111826986_0001/
16/12/30 23:35:45 INFO mapreduce.Job: Running job: job_1483111826986_0001
16/12/30 23:36:32 INFO mapreduce.Job: Job job_1483111826986_0001 running in uber mode : false
16/12/30 23:36:32 INFO mapreduce.Job:  map 0% reduce 0%
16/12/30 23:37:36 INFO mapreduce.Job:  map 100% reduce 0%
16/12/30 23:38:21 INFO mapreduce.Job:  map 100% reduce 100%
16/12/30 23:38:24 INFO mapreduce.Job: Job job_1483111826986_0001 completed successfully
16/12/30 23:38:28 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=214
                FILE: Number of bytes written=197899
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=178
                HDFS: Number of bytes written=795
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=60503
                Total time spent by all reduces in occupied slots (ms)=38314
                Total time spent by all map tasks (ms)=60503
                Total time spent by all reduce tasks (ms)=38314
                Total vcore-seconds taken by all map tasks=60503
                Total vcore-seconds taken by all reduce tasks=38314
                Total megabyte-seconds taken by all map tasks=61955072
                Total megabyte-seconds taken by all reduce tasks=39233536
        Map-Reduce Framework
                Map input records=13
                Map output records=26
                Map output bytes=156
                Map output materialized bytes=214
                Input split bytes=126
                Combine input records=0
                Combine output records=0
                Reduce input groups=9
                Reduce shuffle bytes=214
                Reduce input records=26
                Reduce output records=53
                Spilled Records=52
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=406
                CPU time spent (ms)=2790
                Physical memory (bytes) snapshot=290168832
                Virtual memory (bytes) snapshot=3772538880
                Total committed heap usage (bytes)=139837440
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=52
        File Output Format Counters
                Bytes Written=795

16/12/30 23:38:29 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.233.129:8032
16/12/30 23:38:41 INFO input.FileInputFormat: Total input paths to process : 1
16/12/30 23:38:42 INFO mapreduce.JobSubmitter: number of splits:1
16/12/30 23:38:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1483111826986_0002
16/12/30 23:38:43 INFO impl.YarnClientImpl: Submitted application application_1483111826986_0002
16/12/30 23:38:43 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1483111826986_0002/
16/12/30 23:38:43 INFO mapreduce.Job: Running job: job_1483111826986_0002
16/12/30 23:39:26 INFO mapreduce.Job: Job job_1483111826986_0002 running in uber mode : false
16/12/30 23:39:26 INFO mapreduce.Job:  map 0% reduce 0%
16/12/30 23:40:30 INFO mapreduce.Job:  map 100% reduce 0%
16/12/30 23:40:59 INFO mapreduce.Job:  map 100% reduce 100%
16/12/30 23:41:00 INFO mapreduce.Job: Job job_1483111826986_0002 completed successfully
16/12/30 23:41:01 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=907
                FILE: Number of bytes written=199287
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=924
                HDFS: Number of bytes written=90
                HDFS: Number of read operations=6
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Data-local map tasks=1
                Total time spent by all maps in occupied slots (ms)=47074
                Total time spent by all reduces in occupied slots (ms)=36364
                Total time spent by all map tasks (ms)=47074
                Total time spent by all reduce tasks (ms)=36364
                Total vcore-seconds taken by all map tasks=47074
                Total vcore-seconds taken by all reduce tasks=36364
                Total megabyte-seconds taken by all map tasks=48203776
                Total megabyte-seconds taken by all reduce tasks=37236736
        Map-Reduce Framework
                Map input records=53
                Map output records=53
                Map output bytes=795
                Map output materialized bytes=907
                Input split bytes=129
                Combine input records=0
                Combine output records=0
                Reduce input groups=28
                Reduce shuffle bytes=907
                Reduce input records=53
                Reduce output records=15
                Spilled Records=106
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=268
                CPU time spent (ms)=2570
                Physical memory (bytes) snapshot=296046592
                Virtual memory (bytes) snapshot=3772530688
                Total committed heap usage (bytes)=140697600
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=795
        File Output Format Counters
                Bytes Written=90

5. Output

root@node1:/usr/local/hadoop/hadoop-2.5.2/myJar# hdfs dfs -cat /usr/local/hadooptempdata/output/deg2/*

1       A       C
1       A       G
1       A       I
1       B       D
1       B       F
1       B       H
1       B       I
2       C       E
2       C       G
1       C       H
1       D       G
1       D       I
1       E       G
1       F       H
2       F       I

Each line lists a recommended pair of second-degree friends preceded by its support count, i.e. the number of mutual friends through which the two users could get acquainted. For example, F and I share the two mutual friends C and G, so F is the strongest recommendation for user I, which matches the analysis in section 1.
