Problem description: LintCode 506
Given, for each user, a list of the movies the user has seen, recommend other movies that the user may like. Here we specify a simple recommendation algorithm that you need to implement in MapReduce.
For a given user, take the movies A, B, C that the user has seen, find the other people who have seen A, B, or C, and look at the collections of movies those people have seen. Among these movies, the five with the highest frequency are recommended to the user. The recommended movies must be sorted by degree of correlation.
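Approach:
This solution does not make good use of the MapReduce idea. The logic has three steps.
Step 1: using each movie as the key, count how many times users have watched it. In this problem a watched movie counts once, unwatched movies are not marked, and no user watches the same movie more than once.
Step 2: compute the co-occurrence matrix, whose rows and columns are both indexed by user. If two users have both watched the same movie, they co-occur and the entry is 1; if they have no movie in common, the entry is 0. For example, in the first row of the matrix, the column indices marked 1 are the users who co-occur with the first-row user.
Step 3: excluding the current user, add up the watch counts of the movies seen by all users who co-occur with the current user, sort the movies by count, and recommend the 5 movies with the highest counts.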
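The three steps can be sketched in plain Python as follows. This is a minimal single-process illustration, not a MapReduce job and not necessarily the signature LintCode expects: the function name `recommend` and the parameters `graph` and `top_k` are hypothetical. It assumes `graph[u]` is the list of movie ids seen by user `u`, stores co-occurrence as neighbor sets instead of a dense 0/1 matrix, skips movies the user has already seen (following the "other movies" wording of the problem), and breaks ties by smaller movie id, which is an assumption.

```python
from collections import Counter, defaultdict


def recommend(graph, top_k=5):
    """Return, for each user, up to top_k recommended movie ids.

    graph[u] is assumed to be the list of movie ids user u has seen.
    """
    # Step 1: key by movie and collect the users who watched it.
    # len(viewers[m]) is the watch count of movie m.
    viewers = defaultdict(set)
    for user, movies in enumerate(graph):
        for movie in movies:
            viewers[movie].add(user)

    # Step 2: co-occurrence between users. neighbors[u] plays the role of
    # the columns marked 1 in row u of the co-occurrence matrix.
    neighbors = [set() for _ in graph]
    for users in viewers.values():
        for u in users:
            neighbors[u] |= users
    for u in range(len(graph)):
        neighbors[u].discard(u)  # exclude the current user

    # Step 3: accumulate counts over co-occurring users and rank.
    result = []
    for user, movies in enumerate(graph):
        seen = set(movies)
        counts = Counter()
        for other in neighbors[user]:
            for movie in set(graph[other]):
                if movie not in seen:  # assumption: recommend unseen movies only
                    counts[movie] += 1
        # Highest count first; ties broken by smaller movie id (assumption).
        ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        result.append([movie for movie, _ in ranked[:top_k]])
    return result


if __name__ == "__main__":
    print(recommend([[1, 2, 3], [1, 4], [2, 5, 4], [6]]))
```

In a real MapReduce version, step 1 would map each (user, movie) pair to a movie key and the later steps would be further map and reduce passes; the sketch keeps everything in memory only to make the data flow easy to follow.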