Implementing a Recommender System with MapReduce (UserCF: User-Based Collaborative Filtering)

About this post:

The content is shared for learning and discussion; questions and comments are welcome.

About the author:
Developer: Yang Hong (ellende)
blog: http://blog.csdn.net/ellende
email: yangh.personal@qq.com

Please credit the source when reposting. Parts of this post reference other blogs; if anything infringes your rights, please contact me.



How user-based collaborative filtering (UserCF) works


User-based collaborative filtering measures how similar users are to one another from the ratings they give to items, then makes recommendations based on that similarity. In short: recommend to a user the items that other users with similar interests liked.


1. Raw input data (each line is: user ID, item ID, rating)

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0


2. Rating matrix (users x items)

     101 102 103 104 105 106 107
[1,] 5.0 3.0 2.5 0.0 0.0   0   0
[2,] 2.0 2.5 5.0 2.0 0.0   0   0
[3,] 2.5 0.0 0.0 4.0 4.5   0   5
[4,] 5.0 0.0 3.0 4.5 0.0   4   0
[5,] 4.0 3.0 2.0 4.0 3.5   4   0


3. Euclidean similarity matrix

          [,1]      [,2]      [,3]      [,4]      [,5]
[1,] 0.0000000 0.6076560 0.2857143 1.0000000 1.0000000
[2,] 0.6076560 0.0000000 0.6532633 0.5568464 0.7761999
[3,] 0.2857143 0.6532633 0.0000000 0.5634581 1.0000000
[4,] 1.0000000 0.5568464 0.5634581 0.0000000 1.0000000
[5,] 1.0000000 0.7761999 1.0000000 1.0000000 0.0000000


How it is computed:

similarity = n / (1 + sqrt(sum((Xi - Yi)^2)))

That is, take the difference of the two rating vectors element by element, square the differences, sum them, and take the square root; add 1 to that; n is the number of item pairs that actually entered the sum (only items rated by both users count).

Example: similarity between user 1 and user 2:

(5.0-2.0)^2 + (3.0-2.5)^2 + (2.5-5.0)^2 = 15.5  // only 3 terms, because a difference is taken only when both users have a non-zero rating for the item

3/(1+sqrt(15.5)) = 0.607656


Example: similarity between user 1 and user 4:

(5.0-5.0)^2 + (2.5-3.0)^2 = 0.25

2/(1+sqrt(0.25)) = 1.333333; since this exceeds 1, the matrix above caps it at 1.0000000 (the program drops this cap and keeps the raw value)
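
To make the formula concrete, here is a minimal standalone Java sketch (the class name EuclideanSimilarity and the hand-built rating maps are only for illustration; they are not part of the project code) that computes the similarity of two users from their ratings, counting only items rated by both. For users 1 and 2 it reproduces the 0.6076560 above, and like the MapReduce job it does not cap values at 1.

import java.util.HashMap;
import java.util.Map;

public class EuclideanSimilarity {

    // similarity = n / (1 + sqrt(sum((Xi - Yi)^2))), summed over co-rated items only
    static double similarity(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        int n = 0;// number of items rated by both users
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {// only items rated by both users enter the sum
                double diff = e.getValue() - other;
                sum += diff * diff;
                n++;
            }
        }
        if (n == 0) return 0.0;
        return n / (1 + Math.sqrt(sum));// uncapped, like the step 2 job below
    }

    public static void main(String[] args) {
        Map<Integer, Double> u1 = new HashMap<Integer, Double>();
        u1.put(101, 5.0); u1.put(102, 3.0); u1.put(103, 2.5);

        Map<Integer, Double> u2 = new HashMap<Integer, Double>();
        u2.put(101, 2.0); u2.put(102, 2.5); u2.put(103, 5.0); u2.put(104, 2.0);

        System.out.printf("%.7f%n", similarity(u1, u2));// prints 0.6076560
    }
}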


4. Nearest-neighbor matrix

From the Euclidean similarity matrix, the 2 most similar users are selected for each user:

     top1 top2
[1,]    4    5
[2,]    5    3
[3,]    5    2
[4,]    1    5
[5,]    1    3


For example, user 1's similarities in descending order: 4[1.0], 5[1.0], 2[0.607], 3[0.285], 1[0.0]
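
The MapReduce job for this step (UserCF_Step3 below) sorts a similarity-to-user map by its keys. As a plain illustration of the selection itself, here is a small standalone sketch (the class name TopNeighbors is mine) that sorts one row of the similarity matrix and keeps the top 2 entries. Note that with the uncapped values actually produced by step 2, user 5 ranks ahead of user 4 for user 1, which is what the step 3 output later shows.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Map.Entry;

public class TopNeighbors {

    // keep the k most similar users from one row of the similarity matrix
    static List<Entry<Integer, Double>> topK(List<Entry<Integer, Double>> row, int k) {
        List<Entry<Integer, Double>> sorted = new ArrayList<Entry<Integer, Double>>(row);
        Collections.sort(sorted, new Comparator<Entry<Integer, Double>>() {
            public int compare(Entry<Integer, Double> a, Entry<Integer, Double> b) {
                return b.getValue().compareTo(a.getValue());// descending by similarity
            }
        });
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    public static void main(String[] args) {
        // user 1's similarity row, using the uncapped values produced by step 2
        List<Entry<Integer, Double>> row = new ArrayList<Entry<Integer, Double>>();
        row.add(new SimpleEntry<Integer, Double>(2, 0.6076560));
        row.add(new SimpleEntry<Integer, Double>(3, 0.2857143));
        row.add(new SimpleEntry<Integer, Double>(4, 1.3333333));
        row.add(new SimpleEntry<Integer, Double>(5, 1.4164079));

        for (Entry<Integer, Double> e : topK(row, 2)) {
            System.out.println(e.getKey() + "," + e.getValue());// 5,1.4164079 then 4,1.3333333
        }
    }
}

Sorting (user, similarity) pairs rather than keying a map by the similarity value also keeps neighbors that happen to share the same similarity, which a map keyed by similarity would silently drop.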


5. Recommendation matrix, using user 1 as an example

User 1's two most similar users are users 4 and 5; their rating rows are:

   101  102  103  104  105  106  107
1  5.0  3.0  2.5  0.0  0.0  0.0  0.0
4  5.0  0.0  3.0  4.5  0.0    4    0
5  4.0  3.0  2.0  4.0  3.5    4    0

The items user 1 has already bought (101, 102, 103) are removed, and only the items user 1 has not bought are kept as recommendation candidates. The recommendation matrix is:

   101  102  103  104  105  106  107
4    0    0    0  4.5  0.0    4    0
5    0    0    0  4.0  3.5    4    0


6. Recommendation result for user 1

Scores of the items user 1 has not bought:

104[(4.5+4)/2=4.25], 106[(4+4)/2=4], 105[(0+3.5)/2=1.75], 107[(0+0)/2=0]

The top 2 items are then recommended:

    item     score
[1]  "104"    "4.25"
[2]  "106"    "4"
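
As a cross-check of the arithmetic, here is a small standalone Java sketch (the class name RecommendForUser1 and the ratings() helper are only for illustration) that recomputes user 1's scores by averaging the ratings of users 4 and 5 on the items user 1 has not rated, then prints the top 2; it outputs 104 with 4.25 and 106 with 4.0, matching the table above.

import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;

public class RecommendForUser1 {

    public static void main(String[] args) {
        int[] items = {101, 102, 103, 104, 105, 106, 107};

        // rating rows of user 1 and of the two nearest neighbors, users 4 and 5 (0 = not rated)
        Map<Integer, Double> u1 = ratings(items, new double[]{5.0, 3.0, 2.5, 0, 0, 0, 0});
        Map<Integer, Double> u4 = ratings(items, new double[]{5.0, 0, 3.0, 4.5, 0, 4.0, 0});
        Map<Integer, Double> u5 = ratings(items, new double[]{4.0, 3.0, 2.0, 4.0, 3.5, 4.0, 0});

        List<Entry<Integer, Double>> scores = new ArrayList<Entry<Integer, Double>>();
        for (int item : items) {
            if (u1.get(item) > 0) continue;// skip items user 1 has already rated
            double score = (u4.get(item) + u5.get(item)) / 2;// plain average of the two neighbors
            scores.add(new SimpleEntry<Integer, Double>(item, score));
        }

        Collections.sort(scores, new Comparator<Entry<Integer, Double>>() {
            public int compare(Entry<Integer, Double> a, Entry<Integer, Double> b) {
                return b.getValue().compareTo(a.getValue());// highest score first
            }
        });

        for (int i = 0; i < 2 && i < scores.size(); i++) {
            System.out.println(scores.get(i).getKey() + " " + scores.get(i).getValue());// 104 4.25, then 106 4.0
        }
    }

    static Map<Integer, Double> ratings(int[] items, double[] values) {
        Map<Integer, Double> m = new HashMap<Integer, Double>();
        for (int i = 0; i < items.length; i++) m.put(items[i], values[i]);
        return m;
    }
}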


7. Implementation

The algorithm is implemented as a parallel MapReduce pipeline on Hadoop. Parallel implementations of UserCF are rare online, so this one was written as an exercise. It is split into 5 steps (the record format passed between the jobs is summarized right after this list):

Step 1: Reorganize the raw input so that the Euclidean similarity matrix can be computed from it.

Step 2: From the output of step 1, compute the Euclidean similarity matrix.

Step 3: From the output of step 2, find the 2 most similar users for each user.

Step 4: From the output of step 3 together with the raw data, build the recommendation matrix: for each user, score every item he has not bought as the average of the ratings given by his 2 most similar users.

Step 5: From the output of step 4, pick each user's top 3 recommended items by that average score.
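
For orientation when reading the code, these are the record formats the jobs pass to each other (they can be checked against the sample outputs in section 8; <TAB> is the key/value separator written by TextOutputFormat):

Step 1 output: userA,userB<TAB>ratingA,ratingB   (one line per item rated by both users)
Step 2 output: userA,userB<TAB>similarity
Step 3 output: userId<TAB>neighbor1,sim1,neighbor2,sim2
Step 4 output: userId<TAB>itemId,predictedScore   (one line per candidate item)
Step 5 output: userId<TAB>item1[score1],item2[score2],item3[score3], or "none" when nothing can be recommended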


Main source files:

1) HdfsDAO.java: a utility for HDFS operations that wraps the common HDFS commands with the Hadoop API; see the article "Hadoop编程调用HDFS" for details.

2) UserCFHadoop.java: the main entry point; it configures the paths and runs the steps in order.

3) UserCF_Step1.java: implementation of step 1

4) UserCF_Step2.java: implementation of step 2

5) UserCF_Step3.java: implementation of step 3

6) UserCF_Step4.java: implementation of step 4

7) UserCF_Step5.java: implementation of step 5


Runtime environment:

1) CentOS 6.5

2) Hadoop 2.7.2

3) Java SDK 1.7.0_79


The main code is listed below:

1)HdfsDAO.java

package recommend.code1.hdfs;


import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.mapred.JobConf;

public class HdfsDAO {

    private static final String HDFS = "hdfs://localhost:9000/";

    public HdfsDAO(Configuration conf) {
        this(HDFS, conf);
    }

    public HdfsDAO(String hdfs, Configuration conf) {
        this.hdfsPath = hdfs;
        this.conf = conf;
    }

    private String hdfsPath;
    private Configuration conf;

    public static void main(String[] args) throws IOException {
        JobConf conf = config();
        HdfsDAO hdfs = new HdfsDAO(conf);
        hdfs.mkdirs("/tmp/new");
        hdfs.copyFile("/home/yj/HadoopFile/userFile/small.csv", "/tmp/new");
        hdfs.ls("/tmp/new");
    }        
    
    public static JobConf config(){
        JobConf conf = new JobConf(HdfsDAO.class);
        conf.setJobName("HdfsDAO");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");
        return conf;
    }
    
    public void mkdirs(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
            System.out.println("Create: " + folder);
        }
        fs.close();
    }

    public void rmr(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.deleteOnExit(path);
        System.out.println("Delete: " + folder);
        fs.close();
    }

    public void ls(String folder) throws IOException {
        Path path = new Path(folder);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FileStatus[] list = fs.listStatus(path);
        System.out.println("ls: " + folder);
        System.out.println("==========================================================");
        for (FileStatus f : list) {
            System.out.printf("name: %s, folder: %s, size: %d\n", f.getPath(), f.isDir(), f.getLen());
        }
        System.out.println("==========================================================");
        fs.close();
    }

    public void createFile(String file, String content) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        byte[] buff = content.getBytes();
        FSDataOutputStream os = null;
        try {
            os = fs.create(new Path(file));
            os.write(buff, 0, buff.length);
            System.out.println("Create: " + file);
        } finally {
            if (os != null)
                os.close();
        }
        fs.close();
    }

    public void copyFile(String local, String remote) throws IOException {
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyFromLocalFile(new Path(local), new Path(remote));
        System.out.println("copy from: " + local + " to " + remote);
        fs.close();
    }

    public void download(String remote, String local) throws IOException {
        Path path = new Path(remote);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        fs.copyToLocalFile(path, new Path(local));
        System.out.println("download: from" + remote + " to " + local);
        fs.close();
    }
    
    public void cat(String remoteFile) throws IOException {
        Path path = new Path(remoteFile);
        FileSystem fs = FileSystem.get(URI.create(hdfsPath), conf);
        FSDataInputStream fsdis = null;
        System.out.println("cat: " + remoteFile);
        try {  
            fsdis =fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);  
          } finally {  
            IOUtils.closeStream(fsdis);
            fs.close();
          }
    }

    public void location() throws IOException {
        // String folder = hdfsPath + "create/";
        // String file = "t2.txt";
        // FileSystem fs = FileSystem.get(URI.create(hdfsPath), new
        // Configuration());
        // FileStatus f = fs.getFileStatus(new Path(folder + file));
        // BlockLocation[] list = fs.getFileBlockLocations(f, 0, f.getLen());
        //
        // System.out.println("File Location: " + folder + file);
        // for (BlockLocation bl : list) {
        // String[] hosts = bl.getHosts();
        // for (String host : hosts) {
        // System.out.println("host:" + host);
        // }
        // }
        // fs.close();
    }

}

2)UserCFHadoop.java

package recommend.code1.recommend;

import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

import org.apache.hadoop.conf.Configuration;

public class UserCFHadoop {

    public static final String HDFS = "hdfs://localhost:9000";
    public static final Pattern DELIMITER = Pattern.compile("[\t,]");
    
	/**
	 * @param args
	 */
	public static void main(String[] args) {
		// TODO Auto-generated method stub
        Map<String, String> path = new HashMap<String, String>();
        path.put("data", "/home/yj/HadoopFile/userFile/item.csv");// local data file

        path.put("input_file", HDFS + "/user/yj/input/userCF/");// HDFS working directory
        path.put("input_step1",  path.get("input_file") + "/data");
        path.put("output_step1", path.get("input_file") + "/step1");
        path.put("input_step2",  path.get("output_step1"));
        path.put("output_step2", path.get("input_file") + "/step2");
        path.put("input_step3",  path.get("output_step2"));
        path.put("output_step3", path.get("input_file") + "/step3");
        path.put("input1_step4", path.get("output_step3"));
        path.put("input2_step4", path.get("input_step1"));
        path.put("output_step4", path.get("input_file") + "/step4");
        path.put("input_step5",  path.get("output_step4"));
        path.put("output_step5", path.get("input_file") + "/step5");
        
        try 
        {
        	UserCF_Step1.run(path);
        	UserCF_Step2.run(path);
        	UserCF_Step3.run(path);
        	UserCF_Step4.run(path);
        	UserCF_Step5.run(path);
        } 
        catch (Exception e) 
        {
          e.printStackTrace();
        }
        
        System.exit(0);
	}

    public static Configuration config() {// remote configuration for the Hadoop cluster
    	Configuration conf = new Configuration();
        return conf;
    }
}


3)UserCF_Step1.java

package recommend.code1.recommend;

//import hadoop.myMapreduce.martrix.MainRun;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import recommend.code1.hdfs.HdfsDAO;


public class UserCF_Step1 {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            String[] tokens = UserCFHadoop.DELIMITER.split(values.toString());
            
            if (tokens.length >= 3)
            {
	            Text k = new Text(tokens[1]);//itemid
	            Text v = new Text(tokens[0] + "," + tokens[2]);//userid + score
	            context.write(k, v);            	
            }
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        	Map<String, String> map = new HashMap<String, String>();// userId -> rating for this item
        	
            for (Text line : values) {
                String val = line.toString();
                String[] vlist = UserCFHadoop.DELIMITER.split(val);
                
                if (vlist.length >= 2)
                {
                	map.put(vlist[0], vlist[1]);
                }
            }

            // emit every ordered pair of users that rated this item: key "userA,userB", value "ratingA,ratingB"
            Iterator<String> iterA = map.keySet().iterator();
            while (iterA.hasNext())
            {
            	String k1 = iterA.next();
            	String v1 = map.get(k1);
            	Iterator<String> iterB = map.keySet().iterator();
            	while (iterB.hasNext())
            	{
            		String k2 = iterB.next();
            		String v2 = map.get(k2);
            		context.write(new Text(k1 + "," + k2), new Text(v1 + "," + v2));
            	}
            }
        }
    }

    public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
    	Configuration conf = UserCFHadoop.config();

        String input  = path.get("input_step1");
        String output = path.get("output_step1");

        HdfsDAO hdfs = new HdfsDAO(UserCFHadoop.HDFS, conf);
        hdfs.rmr(path.get("input_file"));
        hdfs.rmr(input);
        hdfs.mkdirs(input);
        hdfs.copyFile(path.get("data"), input);

        Job job = Job.getInstance(conf, "UserCF_Step1 job");
        job.setJarByClass(UserCF_Step1.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input));// input dataset (raw data)
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.out.println("input : " + input);
        System.out.println("output: " + output);
        
        if (!job.waitForCompletion(true))
        {
        	System.out.println("main run stop!");
			return;	
        }
        
        System.out.println("main run successfully!");
    }
}


4)UserCF_Step2.java

package recommend.code1.recommend;

//import hadoop.myMapreduce.martrix.MainRun;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.lang.Math;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import recommend.code1.hdfs.HdfsDAO;


public class UserCF_Step2 {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            String[] tokens = UserCFHadoop.DELIMITER.split(values.toString());
            
            if (tokens.length >= 4)
            {
	            Text k = new Text(tokens[0] + "," + tokens[1]);// user pair "userA,userB"
	            Text v = new Text(tokens[2] + "," + tokens[3]);// their ratings "ratingA,ratingB"
	            context.write(k, v);            	
            }
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        	double sum = 0.0;
        	double similarity = 0.0;
        	int num = 0;
        	
            for (Text line : values) {
                String val = line.toString();
                String[] vlist = UserCFHadoop.DELIMITER.split(val);
                
                if (vlist.length >= 2)
                {
                	sum += Math.pow((Double.parseDouble(vlist[0]) - Double.parseDouble(vlist[1])), 2);
                	num += 1;
                }
            }
            
            if (sum > 0.00000001)// identical vectors (e.g., a user paired with himself) keep similarity 0
            {
            	similarity = (double)num / (1 + Math.sqrt(sum));
            }
            
//            if (similarity > 1.0)
//            {
//            	similarity = 1.0;
//            }
            
            context.write(key, new Text(String.format("%.7f", similarity)));
        }
    }

    public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
    	Configuration conf = UserCFHadoop.config();

        String input  = path.get("input_step2");
        String output = path.get("output_step2");

        Job job = Job.getInstance(conf, "UserCF_Step2 job");
        job.setJarByClass(UserCF_Step2.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input));// input dataset (step 1 output)
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.out.println("input : " + input);
        System.out.println("output: " + output);
        
        if (!job.waitForCompletion(true))
        {
        	System.out.println("main run stop!");
			return;	
        }
        
        System.out.println("main run successfully!");
    }
}


5)UserCF_Step3.java

package recommend.code1.recommend;

//import hadoop.myMapreduce.martrix.MainRun;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.lang.Math;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import recommend.code1.hdfs.HdfsDAO;


public class UserCF_Step3 {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            String[] tokens = UserCFHadoop.DELIMITER.split(values.toString());
            
            if (tokens.length >= 3)
            {
	            Text k = new Text(tokens[0]);// userId
	            Text v = new Text(tokens[1] + "," + tokens[2]);// "otherUserId,similarity"
	            context.write(k, v);            	
            }
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    	private final int NEIGHBORHOOD_NUM = 2;
    	
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        	Map<Double, String> map = new HashMap<Double, String>();// similarity -> userId (users with an identical similarity value overwrite each other here)
        	
            for (Text line : values) {
                String val = line.toString();
                String[] vlist = UserCFHadoop.DELIMITER.split(val);
                
                if (vlist.length >= 2)
                {
                	map.put(Double.parseDouble(vlist[1]), vlist[0]);
                }
            }
            
            List<Double> list = new ArrayList<Double>();
            Iterator<Double> iter = map.keySet().iterator();
            while (iter.hasNext()) {
            	Double similarity = iter.next();
                list.add(similarity);
            }
            
            // sort the similarity values with a comparator
            Collections.sort(list,new Comparator<Double>() {
                // descending order
                public int compare(Double o1, Double o2) {
                    return o2.compareTo(o1);
                }
            });
            
//            for (int i = 0; i < NEIGHBORHOOD_NUM && i < list.size(); i++)
//            {
//        		context.write(key, new Text(map.get(list.get(i)) + "," + String.format("%.7f", list.get(i))));
//            }
            
            String v = "";
            for (int i = 0; i < NEIGHBORHOOD_NUM && i < list.size(); i++)
            {
            	v += "," + map.get(list.get(i)) + "," + String.format("%.7f", list.get(i));
            }
            context.write(key, new Text(v.substring(1)));
        }
    }

    public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
    	Configuration conf = UserCFHadoop.config();

        String input  = path.get("input_step3");
        String output = path.get("output_step3");

        Job job = Job.getInstance(conf, "UserCF_Step3 job");
        job.setJarByClass(UserCF_Step3.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input));// input dataset (step 2 output)
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.out.println("input : " + input);
        System.out.println("output: " + output);
        
        if (!job.waitForCompletion(true))
        {
        	System.out.println("main run stop!");
			return;	
        }
        
        System.out.println("main run successfully!");
    }
}


6)UserCF_Step4.java

package recommend.code1.recommend;

//import hadoop.myMapreduce.martrix.MainRun;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.lang.Math;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import recommend.code1.hdfs.HdfsDAO;


public class UserCF_Step4 {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {
    	private String flag;// which input this split comes from: "step3" or "data"
    	private int itemNum = 7;// number of items (101..107), hard-coded for this sample dataset
    	
        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            FileSplit split = (FileSplit) context.getInputSplit();
            flag = split.getPath().getParent().getName();// determine which input dataset this split belongs to
            
            System.out.println(flag);
        }
        
        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            String[] tokens = UserCFHadoop.DELIMITER.split(values.toString());
            int itemIndex = 100;
            
            if (flag.equals("step3")) {
            	for (int i = 1; i <= itemNum; i++)
            	{
            		Text k = new Text(Integer.toString(itemIndex + i));//itemid
            		Text v = new Text("A:" + tokens[0] + "," + tokens[1] + "," + tokens[3]);// userId + its two nearest-neighbor userIds
            		context.write(k, v);
//            		System.out.println(k.toString() + "  " + v.toString());
            	}
            } else if (flag.equals("data")) {
                Text k = new Text(tokens[1]);//itemid
	            Text v = new Text("B:" + tokens[0] + "," + tokens[2]);//userid + score
	            context.write(k, v);  
//	            System.out.println(k.toString() + "  " + v.toString());
            }
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    	
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        	Map<String, String> mapA = new HashMap<String, String>();
            Map<String, String> mapB = new HashMap<String, String>();

            for (Text line : values) {
                String val = line.toString();

                if (val.startsWith("A:")) {
                    String[] kv = UserCFHadoop.DELIMITER.split(val.substring(2));
                    mapA.put(kv[0], kv[1] + "," + kv[2]);
                } else if (val.startsWith("B:")) {
                    String[] kv = UserCFHadoop.DELIMITER.split(val.substring(2));
                    mapB.put(kv[0], kv[1]);
                }
            }
            
            Iterator<String> iterA = mapA.keySet().iterator();
            while (iterA.hasNext())
            {
            	String userId = iterA.next();
            	if (!mapB.containsKey(userId))// recommend only items this user has not already bought
            	{
            		String simiStr = mapA.get(userId);
            		String[] simi = UserCFHadoop.DELIMITER.split(simiStr);
            		if (simi.length >= 2)
            		{
            			double simiVal1 = mapB.containsKey(simi[0]) ? Double.parseDouble(mapB.get(simi[0])) : 0;
            			double simiVal2 = mapB.containsKey(simi[1]) ? Double.parseDouble(mapB.get(simi[1])) : 0;
            			double score = (simiVal1 + simiVal2) / 2;
            			
            			context.write(new Text(userId), new Text(key.toString() + "," + String.format("%.2f", score)));
            		}
            	}
            }
        }
    }

    public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
    	Configuration conf = UserCFHadoop.config();

        String input1 = path.get("input1_step4");
        String input2 = path.get("input2_step4");
        String output = path.get("output_step4");

        Job job = Job.getInstance(conf, "UserCF_Step4 job");
        job.setJarByClass(UserCF_Step4.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input1), new Path(input2));// load the two input datasets: step 3 output and the raw data
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.out.println("input1: " + input1);
        System.out.println("input2: " + input2);
        System.out.println("output: " + output);
        
        if (!job.waitForCompletion(true))
        {
        	System.out.println("main run stop!");
			return;	
        }
        
        System.out.println("main run successfully!");
    }
}


7)UserCF_Step5.java

package recommend.code1.recommend;

//import hadoop.myMapreduce.martrix.MainRun;

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.lang.Math;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.Reducer.Context;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

import recommend.code1.hdfs.HdfsDAO;


public class UserCF_Step5 {
    public static class MyMapper extends Mapper<LongWritable, Text, Text, Text> {

        @Override
        public void map(LongWritable key, Text values, Context context) throws IOException, InterruptedException {
            String[] tokens = UserCFHadoop.DELIMITER.split(values.toString());
            
            if (tokens.length >= 3)
            {
	            Text k = new Text(tokens[0]);// userId
	            Text v = new Text(tokens[1] + "," + tokens[2]);// "itemId,score"
	            context.write(k, v);            	
            }
        }
    }

    public static class MyReducer extends Reducer<Text, Text, Text, Text> {
    	private final int RECOMMENDER_NUM = 3;
    	
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        	Map<Double, String> map = new HashMap<Double, String>();// score -> itemId (items with an identical score overwrite each other here)
        	
            for (Text line : values) {
                String val = line.toString();
                String[] vlist = UserCFHadoop.DELIMITER.split(val);
                
                if (vlist.length >= 2)
                {
                	map.put(Double.parseDouble(vlist[1]), vlist[0]);
                }
            }
            
            List<Double> list = new ArrayList<Double>();
            Iterator<Double> iter = map.keySet().iterator();
            while (iter.hasNext()) {
            	Double similarity = iter.next();
                list.add(similarity);
            }
            
            // sort the scores with a comparator
            Collections.sort(list,new Comparator<Double>() {
                // descending order
                public int compare(Double o1, Double o2) {
                    return o2.compareTo(o1);
                }
            });
            
            String v = "";
            for (int i = 0; i < RECOMMENDER_NUM && i < list.size(); i++)
            {
            	if (list.get(i).compareTo(new Double(0.001)) > 0)
            	{
            		v += "," + map.get(list.get(i)) + "[" + String.format("%.2f", list.get(i)) + "]";
            	}
            }
            
            if (!v.isEmpty())
            {
            	context.write(key, new Text(v.substring(1)));
            }
            else
            {
            	context.write(key, new Text("none"));
            }
        }
    }

    public static void run(Map<String, String> path) throws IOException, InterruptedException, ClassNotFoundException {
    	Configuration conf = UserCFHadoop.config();

        String input  = path.get("input_step5");
        String output = path.get("output_step5");

        Job job = Job.getInstance(conf, "UserCF_Step5 job");
        job.setJarByClass(UserCF_Step5.class);

        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(input));// input dataset (step 4 output)
        FileOutputFormat.setOutputPath(job, new Path(output));

        System.out.println("input : " + input);
        System.out.println("output: " + output);
        
        if (!job.waitForCompletion(true))
        {
        	System.out.println("main run stop!");
			return;	
        }
        
        System.out.println("main run successfully!");
    }
}

8. Run results:

1) Raw data item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0


2) Step 1 output

3,3	2.5,2.5
3,2	2.5,2.0
3,1	2.5,5.0
3,5	2.5,4.0
3,4	2.5,5.0
2,3	2.0,2.5
2,2	2.0,2.0
2,1	2.0,5.0
2,5	2.0,4.0
2,4	2.0,5.0
1,3	5.0,2.5
1,2	5.0,2.0
1,1	5.0,5.0
1,5	5.0,4.0
1,4	5.0,5.0
5,3	4.0,2.5
5,2	4.0,2.0
5,1	4.0,5.0
5,5	4.0,4.0
5,4	4.0,5.0
4,3	5.0,2.5
4,2	5.0,2.0
4,1	5.0,5.0
4,5	5.0,4.0
4,4	5.0,5.0
2,2	2.5,2.5
2,1	2.5,3.0
2,5	2.5,3.0
1,2	3.0,2.5
1,1	3.0,3.0
1,5	3.0,3.0
5,2	3.0,2.5
5,1	3.0,3.0
5,5	3.0,3.0
2,2	5.0,5.0
2,1	5.0,2.5
2,5	5.0,2.0
2,4	5.0,3.0
1,2	2.5,5.0
1,1	2.5,2.5
1,5	2.5,2.0
1,4	2.5,3.0
5,2	2.0,5.0
5,1	2.0,2.5
5,5	2.0,2.0
5,4	2.0,3.0
4,2	3.0,5.0
4,1	3.0,2.5
4,5	3.0,2.0
4,4	3.0,3.0
3,3	4.0,4.0
3,2	4.0,2.0
3,5	4.0,4.0
3,4	4.0,4.5
2,3	2.0,4.0
2,2	2.0,2.0
2,5	2.0,4.0
2,4	2.0,4.5
5,3	4.0,4.0
5,2	4.0,2.0
5,5	4.0,4.0
5,4	4.0,4.5
4,3	4.5,4.0
4,2	4.5,2.0
4,5	4.5,4.0
4,4	4.5,4.5
3,3	4.5,4.5
3,5	4.5,3.5
5,3	3.5,4.5
5,5	3.5,3.5
5,5	4.0,4.0
5,4	4.0,4.0
4,5	4.0,4.0
4,4	4.0,4.0
3,3	5.0,5.0


3) Step 2 output

1,1	0.0000000
1,2	0.6076560
1,3	0.2857143
1,4	1.3333333
1,5	1.4164079
2,1	0.6076560
2,2	0.0000000
2,3	0.6532633
2,4	0.5568464
2,5	0.7761999
3,1	0.2857143
3,2	0.6532633
3,3	0.0000000
3,4	0.5634581
3,5	1.0703675
4,1	1.3333333
4,2	0.5568464
4,3	0.5634581
4,4	0.0000000
4,5	1.6000000
5,1	1.4164079
5,2	0.7761999
5,3	1.0703675
5,4	1.6000000
5,5	0.0000000


4) Step 3 output

1	5,1.4164079,4,1.3333333
2	5,0.7761999,3,0.6532633
3	5,1.0703675,2,0.6532633
4	5,1.6000000,1,1.3333333
5	4,1.6000000,1,1.4164079


5) Step 4 output

3	102,2.75
4	102,3.00
3	103,3.50
1	104,4.25
2	105,4.00
1	105,1.75
4	105,1.75
3	106,2.00
2	106,2.00
1	106,4.00
2	107,2.50
1	107,0.00
5	107,0.00
4	107,0.00


6) Step 5 output

1	104[4.25],106[4.00],105[1.75]
2	105[4.00],107[2.50],106[2.00]
3	103[3.50],102[2.75],106[2.00]
4	102[3.00],105[1.75]
5	none





