MapReduce: CWBTIAB (A Simple Recommendation System)

路人张的鱼生

Category: MapReduce. Tags: MapReduce

Article link: https://blog.csdn.net/zhangdy12307/article/details/99417325

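Background

Most e-commerce sites use the "Customers Who Bought This Item Also Bought" (CWBTIAB) feature to recommend books or other products.

Principle

For a given item, find every customer in the transaction set who bought it, collect all the items those customers bought, and count how often each of those items was purchased. The counts indicate which other items a customer who bought the given item is likely to buy.

This principle is enough to build a simple recommender, for example a "You may also like" list.

Design

CWBTIAB is implemented with two MapReduce passes:

Phase 1: generate the list of all items bought by the same user
Phase 2: solve the co-occurrence problem for the items in each list with the Stripes design pattern, and find the 5 most frequent co-occurring items

The Stripes design pattern

The core idea of this pattern is to group key-value pairs into an associative array. Consider the following key-value pairs emitted by a mapper: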

Key	Value
(k, $k_1$ )	3
(k, $k_2$ )	2
(k, $k_3$ )	4
(k, $k_4$ )	6
(z, $z_1$ )	7
(z, $z_2$ )	8
(z, $z_3$ )	5

Instead of emitting many key-value pairs, the Stripes approach emits a single key-value pair per stripe, as follows:

Key	Value
k	{( $k_1$ ,3),( $k_2$ ,2),( $k_3$ ,4),( $k_4$ ,6)}
z	{( $z_1$ ,7),( $z_2$ ,8),( $z_3$ ,5)}
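
As a minimal sketch of that grouping, the plain-Java snippet below (no Hadoop involved) folds the pairs from the first table into the stripes of the second. The class name `PairsToStripes` and the array-based input are assumptions made only for this illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the original post
class PairsToStripes {
    // Folds ((key, neighbor), count) pairs into one associative array (stripe) per key
    static Map<String, Map<String, Integer>> toStripes(String[][] pairs, int[] counts) {
        Map<String, Map<String, Integer>> stripes = new LinkedHashMap<>();
        for (int i = 0; i < pairs.length; i++) {
            stripes.computeIfAbsent(pairs[i][0], k -> new LinkedHashMap<>())
                   .merge(pairs[i][1], counts[i], Integer::sum);
        }
        return stripes;
    }

    public static void main(String[] args) {
        String[][] pairs = {
            {"k", "k1"}, {"k", "k2"}, {"k", "k3"}, {"k", "k4"},
            {"z", "z1"}, {"z", "z2"}, {"z", "z3"}
        };
        int[] counts = {3, 2, 4, 6, 7, 8, 5};
        // prints {k={k1=3, k2=2, k3=4, k4=6}, z={z1=7, z2=8, z3=5}}
        System.out.println(toStripes(pairs, counts));
    }
}
```

Seven pairs become two stripes, which is where the saving in sorting and shuffling comes from.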

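The Stripes approach builds one associative array per natural key, which reduces the number of key-value pairs each mapper emits. The value a mapper emits becomes a more complex object, but fewer key-value pairs have to be sorted and shuffled.
The reducer in the Stripes approach performs an element-wise sum over the associative arrays. For example, given: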
K -> { ( a , 1 ) , ( b , 2 ) , ( c , 4 ) , ( d , 3 ) }
K -> { ( a , 2 ) , ( c , 2 ) }
K -> { ( a , 3 ) , ( b , 5 ) , ( d , 5 ) }
the reducer produces the following output:
K -> { ( a , 1+2+3) , ( b , 2+5 ) , ( c , 4+2 ) , ( d , 3+5 ) }
that is:
K -> { ( a , 6) , ( b , 7 ) , ( c , 6 ) , ( d , 8 ) }
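
The element-wise sum above can be sketched in plain Java as follows. This is only an illustration with ordinary `java.util` maps rather than Hadoop's `MapWritable`; the class `StripeSum` and its helper `stripe` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original post
class StripeSum {
    // Adds up the counts of all stripes, element by element
    static Map<String, Integer> sum(List<Map<String, Integer>> stripes) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> stripe : stripes) {
            for (Map.Entry<String, Integer> e : stripe.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    // Convenience builder: stripe("a", 1, "b", 2) gives {a=1, b=2}
    static Map<String, Integer> stripe(Object... kv) {
        Map<String, Integer> m = new TreeMap<>();
        for (int i = 0; i < kv.length; i += 2) {
            m.put((String) kv[i], (Integer) kv[i + 1]);
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> stripes = Arrays.asList(
            stripe("a", 1, "b", 2, "c", 4, "d", 3),
            stripe("a", 2, "c", 2),
            stripe("a", 3, "b", 5, "d", 5));
        // prints {a=6, b=7, c=6, d=8}
        System.out.println(sum(stripes));
    }
}
```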
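The advantages of the Stripes approach are:

Compared with the traditional approach, mappers emit fewer key-value pairs, so less sorting and shuffling is needed
Stripes can make full use of combiners (which perform local optimization on each node)

The disadvantages of the Stripes approach are:

It is harder to implement (each mapper emits an associative array as its value, so a serializer and deserializer must be written for that array)
The underlying objects are heavier-weight
Stripes has a fundamental limit on the size of the event space (each mapper must have enough RAM to hold the associative array)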

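Sample input

Assume the input is a large transaction set (customer ID, transaction time, price, item ID).

The following program generates the sample input: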

package com.deng.CWBTIAB;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

public class create {
    public static void main(String[] args) throws IOException {
        String path="input/CWBTIAB.txt";
        File file=new File(path);
        // Create the parent directory if it does not exist yet
        if(!file.getParentFile().exists()){
            file.getParentFile().mkdirs();
        }
        // The file is opened in append mode: running the program again adds another 10000 records
        try(BufferedWriter bw=new BufferedWriter(new FileWriter(file,true))){
            // Generate a customer ID, deal ID, time, price and item ID for each record
            for(int i=0;i<10000;i++){
                int id=(int)(Math.random()*400+1000);
                int time=(int)(Math.random()*20+2000);
                bw.write("UserId="+id+" dealId="+id+time+" dealTime="+time+" price="+(int)(Math.random()*1000)+" goodId="+(int)(Math.random()*20)+"\n");
            }
        }
    }
}

The generated file looks like this:
[Screenshot: run result]

MapReduce phase 1

Mapper: approach

The mapper collects purchases by user ID: for every transaction record it emits a <userId,goodId> key-value pair.

Mapper: code

package com.deng.CWBTIAB;

import java.io.IOException;
import java.util.Map;

import com.deng.MRDPUtil;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Mapper;

public class findUserGoodsMapper extends Mapper<LongWritable,Text,Text,Text>{
    public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException{
        String line=value.toString();
        //MRDPUtil is a utility class, shown further below
        Map<String,String> parsed= MRDPUtil.transInformation(line);
        String userId=parsed.get("UserId");
        String good=parsed.get("goodId");
        context.write(new Text(userId),new Text(good));
    }
}

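The MRDPUtil class is as follows

It splits each token at the equals sign to produce <attribute, value> pairs, so that a value can be looked up directly by its attribute name.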

package com.deng;

import java.util.HashMap;
import java.util.Map;

public class MRDPUtil {
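    // Parses a line of space-separated "attribute=value" tokens into a map
    public static Map<String,String> transInformation(String s){
        Map<String,String> mp=new HashMap<String, String>();
        String[] tokens=s.split(" ");
        for(int i=0;i<tokens.length;i++){
            int location=getlocation(tokens[i]);
            // Skip tokens without '=' (for example empty tokens); substring(location+1) would throw otherwise
            if(location==tokens[i].length()) continue;
            String key=tokens[i].substring(0,location);
            String val=tokens[i].substring(location+1);
            mp.put(key,val);
        }
        return mp;
    }
    // Returns the index of the first '=' in s, or s.length() if there is none
    public static int getlocation(String s){
        int location;
        for( location=0;location<s.length();location++){
            if(s.charAt(location)=='='){
                break;
            }
        }
        return location;
    }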
}
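
To show what this parsing yields for one transaction record, here is a small standalone sketch. It reimplements the same idea with `String.indexOf` so that it runs without the rest of the project; the class name `ParseRecordSketch` and the sample record are illustrative assumptions, the record merely follows the format written by the generator above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone sketch, not the author's MRDPUtil
class ParseRecordSketch {
    // Splits "attribute=value" tokens separated by spaces into a map
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new HashMap<>();
        for (String token : line.split(" ")) {
            int eq = token.indexOf('=');
            if (eq < 0) {
                continue; // token without '=', nothing to store
            }
            fields.put(token.substring(0, eq), token.substring(eq + 1));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up record in the generator's format
        String record = "UserId=1001 dealId=10012005 dealTime=2005 price=530 goodId=7";
        Map<String, String> fields = parse(record);
        // prints: 1001 bought 7
        System.out.println(fields.get("UserId") + " bought " + fields.get("goodId"));
    }
}
```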

Reducer: task

Group all the items bought by each user.

Reducer: code

package com.deng.CWBTIAB;

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class findUserGoodsReducer extends Reducer<Text,Text,Text,Text>{
    public void reduce(Text key, Iterable<Text> values,Context context) throws IOException,InterruptedException{
        StringBuilder goods=new StringBuilder();
        for(Text value:values){
            goods.append(value);
            goods.append("-");
        }
        //Join all item IDs into one string, using '-' as the separator
        context.write(new Text("UserId="+key),new Text(" goods="+goods.toString()));
    }
}
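
The lines that phase 1 hands to phase 2 can be previewed without a cluster. The sketch below imitates the map and reduce steps in memory and formats each line the way Hadoop's default text output does (key, a tab, then the value). `PhaseOneSketch` and its input array are hypothetical; on a real cluster the order of users and of items within a list may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory imitation of phase 1, not the Hadoop job itself
class PhaseOneSketch {
    // Each purchase is {userId, goodId}; the result has one output line per user
    static List<String> run(String[][] purchases) {
        Map<String, StringBuilder> byUser = new LinkedHashMap<>();
        for (String[] p : purchases) {
            // Same joining rule as the reducer: item ID followed by '-'
            byUser.computeIfAbsent(p[0], k -> new StringBuilder()).append(p[1]).append('-');
        }
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, StringBuilder> e : byUser.entrySet()) {
            // key + TAB + value, where the value starts with a space as in the reducer
            lines.add("UserId=" + e.getKey() + "\t goods=" + e.getValue());
        }
        return lines;
    }

    public static void main(String[] args) {
        String[][] purchases = {{"1001", "7"}, {"1002", "3"}, {"1001", "3"}};
        for (String line : run(purchases)) {
            System.out.println(line);
        }
    }
}
```

The leading space in " goods=" matters: it is the only space in the line, so splitting on spaces in phase 2 separates the user token from the goods token.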

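MapReduce phase 2

Phase 2 uses the Stripes approach to solve the item co-occurrence problem.

Mapper: task

For each item in a user's list, build a MapWritable that holds the occurrence counts of the items bought by the same user, and emit a <goodId,map> key-value pair.

Mapper: code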

package com.deng.CWBTIAB;

import com.deng.MRDPUtil;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class stripesMapper extends Mapper<LongWritable,Text,Text,MapWritable>{

    public void map(LongWritable key,Text value,Context context) throws IOException,InterruptedException{
        String line=value.toString();
        Map<String,String> parsed= MRDPUtil.transInformation(line);
        String[] goods=parsed.get("goods").split("-");
        // Items of this user for which a stripe has already been emitted
        Set<String> seen=new HashSet<String>();
        for(String good:goods){
            if(!seen.add(good)) continue;
            // Build the stripe for this item: how often each other item appears in the user's list
            MapWritable mp=new MapWritable();
            for(String item:goods){
                // Skip the item itself so that it is not counted as co-occurring with itself
                if(item.equals(good)) continue;
                Text tag=new Text(item);
                IntWritable old=(IntWritable) mp.get(tag);
                // Store a fresh IntWritable per entry; reusing one shared instance
                // would make every entry of the map hold the same count
                mp.put(tag,new IntWritable(old==null?1:old.get()+1));
            }
            context.write(new Text("good="+good),mp);
        }
    }
}
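
To make the stripes of a single user concrete, the sketch below computes them with plain Java maps for one made-up purchase list. It assumes that an item is left out of its own stripe; `UserStripesSketch` is a hypothetical name and the code only mirrors the idea of the mapper, without any Hadoop types.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the phase 2 map step
class UserStripesSketch {
    // For each distinct item, counts how often every other item occurs in the same list
    static Map<String, Map<String, Integer>> stripes(String[] goods) {
        Map<String, Map<String, Integer>> out = new TreeMap<>();
        for (String good : goods) {
            if (out.containsKey(good)) {
                continue; // one stripe per distinct item
            }
            Map<String, Integer> stripe = new TreeMap<>();
            for (String item : goods) {
                if (item.equals(good)) {
                    continue; // assumption: an item does not co-occur with itself
                }
                stripe.merge(item, 1, Integer::sum);
            }
            out.put(good, stripe);
        }
        return out;
    }

    public static void main(String[] args) {
        // A user whose phase 1 list is "7-3-7-5-"
        String[] goods = "7-3-7-5-".split("-");
        // prints {3={5=1, 7=2}, 5={3=1, 7=2}, 7={3=1, 5=1}}
        System.out.println(stripes(goods));
    }
}
```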

Reducer: task

For every item in the transaction set, find the five items most often bought together with it.

Reducer: code

package com.deng.CWBTIAB;

import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Reducer;

public class stripesReducer extends Reducer<Text, MapWritable,Text,Text>{
    public void reduce(Text key,Iterable<MapWritable> values,Context context) throws IOException,InterruptedException{
        // Element-wise sum of all stripes received for this item
        Map<String,Integer> counts=new HashMap<String, Integer>();
        for(MapWritable value:values){
            for(Map.Entry<Writable,Writable> entry:value.entrySet()){
                String element_key=entry.getKey().toString();
                int cnt=((IntWritable) entry.getValue()).get();
                Integer old=counts.get(element_key);
                counts.put(element_key,old==null?cnt:old+cnt);
            }
        }
        // Sort by count, highest first, and keep the five most frequent items.
        // A TreeMap keyed by item ID orders entries by ID rather than by count, and
        // evicting entries while still summing would discard partial counts.
        List<Map.Entry<String,Integer>> sorted=new ArrayList<Map.Entry<String,Integer>>(counts.entrySet());
        sorted.sort((a,b)->b.getValue().compareTo(a.getValue()));
        StringBuilder goods=new StringBuilder();
        for(int i=0;i<sorted.size()&&i<5;i++){
            goods.append(sorted.get(i).getKey());
            goods.append(' ');
        }
        context.write(key,new Text("Goods is : "+goods));
    }
}
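
Picking the most frequent co-occurring items is a selection by count, not by item ID. A minimal standalone sketch of that selection is shown below; `TopNSketch` is a hypothetical helper, and breaking ties by item ID is an assumption added here only to make the result deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original post
class TopNSketch {
    // Returns the keys of the n largest counts, highest count first
    static List<String> topN(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            // Assumption: equal counts are ordered by item ID
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < entries.size() && i < n; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        // The summed stripe from the Stripes section: a=6, b=7, c=6, d=8
        Map<String, Integer> counts = new HashMap<>();
        counts.put("a", 6);
        counts.put("b", 7);
        counts.put("c", 6);
        counts.put("d", 8);
        // prints [d, b]
        System.out.println(topN(counts, 2));
    }
}
```

Sorting the full stripe is simple and sufficient when the number of distinct items is small, as with the 20 item IDs in the sample input.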

The driver is as follows

The two jobs are run one after the other as a job chain.

package com.deng.CWBTIAB;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CWBTIABDriver {
    public static void main(String[] args) throws Exception{
        Configuration conf=new Configuration();
        String[] otherArgs=new String[]{"input/CWBTIAB.txt","output","output2"};
        // Delete output directories left by a previous run. FileSystem.delete is used here
        // in place of the FileUtil.deleteDirs helper, whose source is not shown in this post.
        FileSystem fs=FileSystem.get(conf);
        fs.delete(new Path(otherArgs[1]),true);
        fs.delete(new Path(otherArgs[2]),true);

        // Job 1: build the list of items bought by each user
        Job userGoodsJob=Job.getInstance(conf,"findUserGoods");
        userGoodsJob.setJarByClass(CWBTIABDriver.class);
        userGoodsJob.setMapperClass(findUserGoodsMapper.class);
        userGoodsJob.setReducerClass(findUserGoodsReducer.class);
        userGoodsJob.setOutputKeyClass(Text.class);
        userGoodsJob.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(userGoodsJob,new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(userGoodsJob,new Path(otherArgs[1]));
        if(!userGoodsJob.waitForCompletion(true)){
            // Do not start job 2 if job 1 failed
            System.exit(1);
        }

        // Job 2: build and sum the stripes, then pick the top five items
        Job stripesJob=Job.getInstance(conf,"stripes");
        stripesJob.setJarByClass(CWBTIABDriver.class);
        stripesJob.setMapperClass(stripesMapper.class);
        stripesJob.setReducerClass(stripesReducer.class);
        // The mapper emits <Text,MapWritable>, while the reducer emits <Text,Text>
        stripesJob.setMapOutputKeyClass(Text.class);
        stripesJob.setMapOutputValueClass(MapWritable.class);
        stripesJob.setOutputKeyClass(Text.class);
        stripesJob.setOutputValueClass(Text.class);
        // Read the whole output directory of job 1 instead of a single part file
        FileInputFormat.addInputPath(stripesJob,new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(stripesJob,new Path(otherArgs[2]));
        System.exit(stripesJob.waitForCompletion(true)?0:1);
    }
}

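The result of the run is as follows:
[Screenshot: run result]
This result is probably a consequence of the randomly generated input: every item ID is produced with equal probability, which is why the output looks the way it does.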
