Using Hadoop Secondary Sort for User Behavior Analysis

小飞_侠

Article link: https://blog.csdn.net/a6210575/article/details/24104253

1. Use case

Before analyzing user behavior, we need to group each user's actions into sessions, or record when each user visited each page.

Raw log:

user \t access time \t page

1111 20140416 05:55 page1
1111 20140416 06:01 page2
1111 20140416 06:06 page3
1111 20140416 09:06 page4
2222 20140416 09:10 page1
2222 20140416 09:15 page3

The results we want are:

(1) Correlating user actions into sessions

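Here I set the session timeout to 20 minutes: a visit that comes within 20 minutes of the user's previous visit belongs to the same session. Therefore:

User 1111 visited page1, page2 and page3 in session 1, and page4 in session 2.

User 2222 visited page1 and page3 in session 1.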
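The 20-minute rule above can be sketched in plain Java (no Hadoop needed). This sketch assumes one user's visits are already sorted by time, which is what the secondary sort described below provides, and reads "within 20 minutes" as a gap of at most 20 minutes between consecutive visits. The names `SessionSplitter` and `split` are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

public class SessionSplitter {
    // Matches the access-time format in the raw log, e.g. "20140416 05:55"
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd HH:mm");
    // Assumption: a gap of more than 20 minutes starts a new session
    static final long GAP_MINUTES = 20;

    /** times and pages describe one user's visits, already sorted by time. */
    public static List<List<String>> split(List<String> times, List<String> pages) {
        List<List<String>> sessions = new ArrayList<>();
        LocalDateTime prev = null;
        for (int i = 0; i < times.size(); i++) {
            LocalDateTime t = LocalDateTime.parse(times.get(i), FMT);
            if (prev == null || Duration.between(prev, t).toMinutes() > GAP_MINUTES) {
                sessions.add(new ArrayList<>()); // open a new session
            }
            sessions.get(sessions.size() - 1).add(pages.get(i));
            prev = t;
        }
        return sessions;
    }

    public static void main(String[] args) {
        System.out.println(split(
            List.of("20140416 05:55", "20140416 06:01", "20140416 06:06", "20140416 09:06"),
            List.of("page1", "page2", "page3", "page4")));
        // [[page1, page2, page3], [page4]]
    }
}
```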

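(2) Per-user visit-time statistics

User 1111 entered at 05:55 via page1 and left via page4, visiting 4 pages with a dwell time of 3 hours 11 minutes.

User 2222 entered at 09:10 via page1 and left via page3, visiting 2 pages with a dwell time of 5 minutes.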
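Both summaries above can be derived from one user's time-ordered visits: the entry page is the first visit, the exit page is the last, and the dwell time is the gap between the first and last access times (05:55 to 09:06 gives 3 hours 11 minutes). A minimal sketch, with hypothetical names `VisitStats` and `summarize` and a made-up output format; note that this definition does not count time spent on the last page, since the log has no exit timestamp.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;

public class VisitStats {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd HH:mm");

    /** Summarizes one user's visits; times and pages must already be sorted by time. */
    public static String summarize(String user, List<String> times, List<String> pages) {
        LocalDateTime first = LocalDateTime.parse(times.get(0), FMT);
        LocalDateTime last = LocalDateTime.parse(times.get(times.size() - 1), FMT);
        // Dwell time = last access time minus first access time
        long minutes = Duration.between(first, last).toMinutes();
        return String.format("%s start=%s entry=%s exit=%s pages=%d stay=%dh%dm",
            user, first.toLocalTime(), pages.get(0), pages.get(pages.size() - 1),
            pages.size(), minutes / 60, minutes % 60);
    }

    public static void main(String[] args) {
        System.out.println(summarize("1111",
            List.of("20140416 05:55", "20140416 06:01", "20140416 06:06", "20140416 09:06"),
            List.of("page1", "page2", "page3", "page4")));
        // 1111 start=05:55 entry=page1 exit=page4 pages=4 stay=3h11m
        System.out.println(summarize("2222",
            List.of("20140416 09:10", "20140416 09:15"),
            List.of("page1", "page3")));
        // 2222 start=09:10 entry=page1 exit=page3 pages=2 stay=0h5m
    }
}
```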

2. Method

In the map phase, each log line is parsed: the key is the user id plus the access time, and the value is the page the user visited.

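// Imports for the job's source file (the classes below are nested static classes of the job class)
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Partitioner;
import org.apache.hadoop.mapreduce.Reducer;

public static class Map extends Mapper<LongWritable, Text, Text, Text>
{
     @Override
     public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
     {
          // Raw line: user id \t access time \t page
          String line = value.toString();
          String[] lineArray = line.split("\t", -1);

          // Skip malformed lines that do not have all three fields
          if (lineArray.length < 3)
               return;

          String id = lineArray[0];
          String time = lineArray[1];
          String page = lineArray[2];

          // Composite key "id\ttime" lets the sort phase order by user, then by time
          context.write(new Text(id + "\t" + time), new Text(page));
     }
}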

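In the partition phase, keys are distributed into buckets (one per reducer). Since we want to correlate the actions of each user, only the first_key (the user id) is used here, so all keys with the same first_key land in the same bucket.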

public static class SecondPartitioner extends Partitioner<Text, Text>
{
     @Override
     public int getPartition(Text key, Text value, int numReduceTasks)
     {
          // Partition on the user id (first_key) only, so every record of a user reaches the same reducer
          String skey = key.toString().split("\t")[0];

          // Mask the sign bit so the partition index is never negative
          return (skey.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
     }
}
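A quick way to check the partitioning rule without a cluster is to apply the same formula to plain strings. The sketch below (hypothetical class `PartitionDemo`) uses `String.hashCode()`, just as `getPartition` does after splitting the key, and shows that keys with the same user id but different times always map to the same reducer. The reducer count of 3 is an arbitrary assumption.

```java
public class PartitionDemo {
    // Same formula as SecondPartitioner.getPartition, applied to a plain String key
    public static int partitionByUser(String compositeKey, int numReducers) {
        String userId = compositeKey.split("\t")[0];
        return (userId.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        String[] keys = {
            "1111\t20140416 05:55", "1111\t20140416 09:06",
            "2222\t20140416 09:10", "2222\t20140416 09:15"
        };
        for (String k : keys) {
            // Only the user id feeds the hash, so both 1111 keys share a reducer, as do both 2222 keys
            System.out.println(k.replace('\t', ' ') + " -> reducer " + partitionByUser(k, 3));
        }
    }
}
```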

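In the shuffle and sort phase, the records sent to the same reducer are sorted by the sort comparator (KeyComparator) and then split into groups by the grouping comparator; each group is handed to a single reduce() call with its records already in sorted order.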

A GroupingComparator can be written in two ways:

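/*
 * (1) Extend WritableComparator (extends):
 *     it needs a constructor, and must override
 *     public int compare(WritableComparable a, WritableComparable b)
 * (2) Implement the RawComparator interface (implements)
 */
public static class FirstGroupingComparator extends WritableComparator
{
     public FirstGroupingComparator()
     {
          // true: create key instances so compare() receives deserialized Text objects
          super(Text.class, true);
     }

     @SuppressWarnings("rawtypes")
     @Override
     public int compare(WritableComparable o1, WritableComparable o2)
     {
          Text ot1 = (Text) o1;
          Text ot2 = (Text) o2;

          /*
           * Alternative grouping: put all keys with the same id into one group
           * (used for the per-user visit-time statistics)
           */
//          String l = ot1.toString().split("\t")[0];
//          String r = ot2.toString().split("\t")[0];
//          return l.compareTo(r);

          // User behavior correlation: group on the full key "id\ttime",
          // so each distinct (user, time) key gets its own reduce() call
          return ot1.compareTo(ot2);
     }
}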
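To see what the sort and grouping comparators do together, the following plain-Java sketch simulates, inside a single reducer, sorting by (user id, time) followed by grouping on the user id only (the commented-out alternative above). It uses no Hadoop classes; `SortAndGroupDemo` and `sortAndGroup` are hypothetical names. In a real job the three pieces are registered on the `org.apache.hadoop.mapreduce.Job` with `setPartitionerClass`, `setSortComparatorClass` and `setGroupingComparatorClass`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class SortAndGroupDemo {
    // Sort comparator: user id first, then access time (mirrors KeyComparator)
    static final Comparator<String> SORT = (a, b) -> {
        String[] x = a.split("\t"), y = b.split("\t");
        int c = x[0].compareTo(y[0]);
        return c != 0 ? c : x[1].compareTo(y[1]);
    };
    // Grouping comparator: user id only
    static final Comparator<String> GROUP = (a, b) -> a.split("\t")[0].compareTo(b.split("\t")[0]);

    /** Sorts (key, page) records, then cuts runs of keys that GROUP considers equal into groups. */
    public static List<List<String>> sortAndGroup(List<Map.Entry<String, String>> records) {
        List<Map.Entry<String, String>> sorted = new ArrayList<>(records);
        sorted.sort((a, b) -> SORT.compare(a.getKey(), b.getKey()));
        List<List<String>> groups = new ArrayList<>();
        String groupKey = null;
        for (Map.Entry<String, String> r : sorted) {
            if (groupKey == null || GROUP.compare(groupKey, r.getKey()) != 0) {
                groups.add(new ArrayList<>()); // a new reduce() call would start here
                groupKey = r.getKey();
            }
            groups.get(groups.size() - 1).add(r.getValue());
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, String>> records = List.of(
            Map.entry("1111\t20140416 09:06", "page4"),
            Map.entry("2222\t20140416 09:15", "page3"),
            Map.entry("1111\t20140416 05:55", "page1"),
            Map.entry("2222\t20140416 09:10", "page1"),
            Map.entry("1111\t20140416 06:06", "page3"),
            Map.entry("1111\t20140416 06:01", "page2"));
        System.out.println(sortAndGroup(records));
        // [[page1, page2, page3, page4], [page1, page3]]
    }
}
```

Each inner list is what one reduce() call would iterate over: all pages of one user, in access-time order.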

KeyComparator (the sort comparator):

/**
 * Key comparator: first_key ascending or descending, then last_key ascending or descending.
 *
 * (1) Extend WritableComparator (extends):
 *     it needs a constructor, and must override
 *     public int compare(WritableComparable a, WritableComparable b)
 * (2) Implement the RawComparator interface (implements)
 */
public static class KeyComparator extends WritableComparator
{
     public KeyComparator()
     {
          super(Text.class, true);
     }

     @SuppressWarnings("rawtypes")
     @Override
     public int compare(WritableComparable a, WritableComparable b)
     {
          // Split each composite key "id\ttime" once
          String[] at = ((Text) a).toString().split("\t");
          String[] bt = ((Text) b).toString().split("\t");

          // Primary order: user id (first_key), ascending
          int cmp = at[0].compareTo(bt[0]);
          if (cmp != 0)
               return cmp;

          // Secondary order: access time (last_key), ascending; fixed-width
          // "yyyyMMdd HH:mm" strings compare chronologically. Swap operands to sort descending.
          return at[1].compareTo(bt[1]);
     }
}
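The comment above notes that each field can be sorted ascending or descending. A sketch of one variant, user id ascending and access time descending (latest visit first), written as a plain `java.util.Comparator` rather than a `WritableComparator`; swapping the operands of the second `compareTo` is all it takes to flip that field. `KeyOrderDemo` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class KeyOrderDemo {
    // User id ascending, access time descending (operands swapped for the time field)
    static final Comparator<String> ID_ASC_TIME_DESC = (a, b) -> {
        String[] x = a.split("\t"), y = b.split("\t");
        int c = x[0].compareTo(y[0]);
        return c != 0 ? c : y[1].compareTo(x[1]);
    };

    /** Returns a sorted copy of the composite keys. */
    public static List<String> sorted(List<String> keys) {
        List<String> copy = new ArrayList<>(keys);
        copy.sort(ID_ASC_TIME_DESC);
        return copy;
    }

    public static void main(String[] args) {
        List<String> keys = List.of(
            "2222\t20140416 09:10", "1111\t20140416 05:55",
            "1111\t20140416 09:06", "2222\t20140416 09:15");
        sorted(keys).forEach(k -> System.out.println(k.replace('\t', ' ')));
        // 1111 20140416 09:06
        // 1111 20140416 05:55
        // 2222 20140416 09:15
        // 2222 20140416 09:10
    }
}
```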

In the reduce phase, all that is left is to write your own statistics logic!

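public static class Reduce extends Reducer<Text, Text, Text, Text>
{
     @Override
     public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException
     {
          /*
           * Write your own statistics logic here.
           * Within a group the values arrive in the order set by KeyComparator.
           * Hadoop reuses the key and value objects while iterating: copy a value
           * (e.g. value.toString()) before storing it, and note that with id-only
           * grouping the key object holds the composite key of the current value.
           */
     }
}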
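With the full-key grouping that `FirstGroupingComparator` returns by default, every `reduce()` call receives a single `(user, time)` key, so session detection has to keep state in fields of the reducer between calls and emit the last open session from `cleanup()`. The plain-Java sketch below models that pattern under the same 20-minute-gap assumption as before: `onRecord` stands in for `reduce()`, `finish()` for `cleanup()`, and the output list for `context.write`. It relies on a reduce task seeing its keys in sorted order, with each user's keys routed to one task by the partitioner. All names and the output format are hypothetical.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a reducer that keeps session state across reduce() calls. */
public class SessionState {
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyyMMdd HH:mm");
    static final long GAP_MINUTES = 20;                 // assumed session timeout

    private final List<String> out = new ArrayList<>(); // stands in for context.write
    private List<String> pages = new ArrayList<>();     // pages of the open session
    private String user;                                // user of the open session
    private LocalDateTime last;                         // time of the previous visit
    private int sessionNo;

    /** Called once per sorted key "id\ttime", like reduce() under full-key grouping. */
    public void onRecord(String key, String page) {
        String[] f = key.split("\t");
        LocalDateTime t = LocalDateTime.parse(f[1], FMT);
        boolean newUser = !f[0].equals(user);
        // newUser is checked first, so last is never null when Duration.between runs
        if (newUser || Duration.between(last, t).toMinutes() > GAP_MINUTES) {
            flush();
            sessionNo = newUser ? 1 : sessionNo + 1;
            user = f[0];
        }
        pages.add(page);
        last = t;
    }

    /** Emits the open session, if any. */
    private void flush() {
        if (!pages.isEmpty()) {
            out.add(user + "\tsession" + sessionNo + "\t" + String.join(",", pages));
            pages = new ArrayList<>();
        }
    }

    /** Like cleanup(): flush the last session and return everything emitted. */
    public List<String> finish() {
        flush();
        return out;
    }

    public static void main(String[] args) {
        SessionState s = new SessionState();
        s.onRecord("1111\t20140416 05:55", "page1");
        s.onRecord("1111\t20140416 06:01", "page2");
        s.onRecord("1111\t20140416 06:06", "page3");
        s.onRecord("1111\t20140416 09:06", "page4");
        s.onRecord("2222\t20140416 09:10", "page1");
        s.onRecord("2222\t20140416 09:15", "page3");
        s.finish().forEach(System.out::println);
    }
}
```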

That concludes this introduction to user behavior analysis with Hadoop secondary sort.

When reposting, please credit the source:

http://blog.csdn.net/a6210575/article/details/24104253