Programming Exercise 8: Anomaly Detection and Recommender Systems, Part 2

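        Hello everyone, I'm Mac Jiang. Today I'd like to share my implementation of the second part, Recommender Systems, of Coursera-Stanford University-Machine Learning-Programming Exercise 8: Anomaly Detection and Recommender Systems. The first part, Anomaly Detection, is covered at: http://blog.csdn.net/a1015553840/article/details/50913824. Although my code passed the grader's tests, it is not necessarily the best solution. If you spot a mistake or have a better approach, please leave a comment. Thanks! I hope this post helps with your studies!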

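        In this part, Andrew Ng mainly covers the collaborative filtering algorithm. First, the basic notation:
j: index of the j-th user, used to count users
i: index of the i-th movie, used to count movies
r(i,j): whether user j has rated movie i; r(i,j)=1 means user j has rated movie i, and r(i,j)=0 means they have not.
y(i,j): the rating user j gave to movie i (meaningful only when r(i,j)=1)
n_m: total number of movies
n_u: total number of users
theta(j): parameter vector of user j
x(i): feature vector of movie i
Predicted rating of movie i by user j: theta(j)' * x(i)
        Collaborative filtering takes the ratings users have already given and solves for the optimal movie feature matrix X and user parameter matrix Theta, using gradient descent or an advanced optimizer such as L-BFGS or conjugate gradient. It then predicts the entries where r(i,j)=0. These predictions reveal each user's preferences and can be used to recommend movies. Note that the ratings should be mean-normalized before running the learning algorithm!
        This exercise is a concrete implementation of the theory above.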
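For reference, the objective that the optimizer minimizes can be written out as below. This is the standard regularized collaborative filtering cost from the course, in the notation of this post, with n_m movies, n_u users, and n features per movie (n corresponds to num_features in the code); the symbol lambda is the regularization strength.

```latex
J\bigl(x^{(1)},\dots,x^{(n_m)},\theta^{(1)},\dots,\theta^{(n_u)}\bigr)
= \frac{1}{2}\sum_{(i,j):\,r(i,j)=1}\Bigl(\bigl(\theta^{(j)}\bigr)^{T}x^{(i)}-y^{(i,j)}\Bigr)^{2}
+ \frac{\lambda}{2}\sum_{j=1}^{n_u}\sum_{k=1}^{n}\bigl(\theta^{(j)}_{k}\bigr)^{2}
+ \frac{\lambda}{2}\sum_{i=1}^{n_m}\sum_{k=1}^{n}\bigl(x^{(i)}_{k}\bigr)^{2}
```

The first sum runs only over the (movie, user) pairs that actually have a rating, which is why the vectorized code in cofiCostFunc.m multiplies the error matrix element-wise by R.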

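1. Data and files
Data: ex8_movies.mat --- the users' movie ratings; it contains two matrices, the rating matrix Y and the indicator matrix R, which marks the positions that have a rating.
      ex8_movieParams.mat --- the matrices X and Theta, plus the number of users num_users, the number of movies num_movies, and the number of movie features num_features
      movie_ids.txt --- stores the movie titles
Files: ex8_cofi.m --- script that steps through the exercise
      loadMovieList.m --- loads the movie titles from movie_ids.txt into a cell array of titles
      computeNumericalGradient.m --- computes the gradient numerically (by finite differences)
      checkCostFunction.m --- checks that the gradient we compute in cofiCostFunc.m is correct
      fmincg.m --- advanced optimization routine that iteratively searches for a local or global minimum of the cost J and returns the corresponding parameters Theta and X
      normalizeRatings.m --- mean-normalizes the matrix Y before training
      cofiCostFunc.m --- computes the cost J together with the gradient X_grad with respect to X and the gradient Theta_grad with respect to Theta; this is the file you need to complete!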
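To make the mean normalization step concrete, here is a minimal sketch on toy data. It assumes the mean of each movie is taken over its rated entries only and that unrated entries stay at zero; the data and the variable num_rated are made up for illustration, and this is not the course's normalizeRatings.m itself.

```octave
% Toy ratings: 3 movies x 4 users; 0 means "not rated" (illustrative data only)
Y = [5 4 0 0;
     0 0 3 0;
     1 0 0 2];
R = (Y ~= 0);                      % R(i,j) = 1 where movie i was rated by user j

% Per-movie mean over the rated entries only
num_rated = sum(R, 2);             % how many ratings each movie received
Ymean = sum(Y .* R, 2) ./ max(num_rated, 1);   % max(...) guards movies with no ratings

% Subtract the mean from rated entries; unrated entries remain 0
Ynorm = (Y - repmat(Ymean, 1, size(Y, 2))) .* R;

disp(Ymean');
disp(Ynorm);
```

After training on Ynorm, a prediction for movie i is obtained by adding Ymean(i) back, which is what Part 8 of ex8_cofi.m does with p(:,1) + Ymean.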

2. Control flow of ex8_cofi.m
%% =============== Part 1: Loading movie ratings dataset ================
%  You will start by loading the movie ratings dataset to understand the
%  structure of the data.
%  
fprintf('Loading movie ratings dataset.\n\n');
%  Load data
load ('ex8_movies.mat');
%  Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by 
%  943 users
%
%  R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
%  rating to movie i
%  From the matrix, we can compute statistics like average rating.
fprintf('Average rating for movie 1 (Toy Story): %f / 5\n\n', ...
        mean(Y(1, R(1, :))));

%  We can "visualize" the ratings matrix by plotting it with imagesc
imagesc(Y);
ylabel('Movies');
xlabel('Users');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;
%% ============ Part 2: Collaborative Filtering Cost Function ===========
%  You will now implement the cost function for collaborative filtering.
%  To help you debug your cost function, we have included a set of
%  pre-trained weights. Specifically, you should complete the code in 
%  cofiCostFunc.m to return J.

%  Load pre-trained weights (X, Theta, num_users, num_movies, num_features)
load ('ex8_movieParams.mat');

%  Reduce the data set size so that this runs faster
num_users = 4; num_movies = 5; num_features = 3;
X = X(1:num_movies, 1:num_features);
Theta = Theta(1:num_users, 1:num_features);
Y = Y(1:num_movies, 1:num_users);
R = R(1:num_movies, 1:num_users);
%  Evaluate cost function
J = cofiCostFunc([X(:) ; Theta(:)], Y, R, num_users, num_movies, ...
               num_features, 0);           
fprintf(['Cost at loaded parameters: %f '...
        '\n(this value should be about 22.22)\n'], J);
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============== Part 3: Collaborative Filtering Gradient ==============
%  Once your cost function matches up with ours, you should now implement 
%  the collaborative filtering gradient function. Specifically, you should 
%  complete the code in cofiCostFunc.m to return the grad argument.
%  
fprintf('\nChecking Gradients (without regularization) ... \n');
%  Check gradients by running checkCostFunction
checkCostFunction;
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ========= Part 4: Collaborative Filtering Cost Regularization ========
%  Now, you should implement regularization for the cost function for 
%  collaborative filtering. You can implement it by adding the cost of
%  regularization to the original cost computation.
%  
%  Evaluate cost function
J = cofiCostFunc([X(:) ; Theta(:)], Y, R, num_users, num_movies, ...
               num_features, 1.5);          
fprintf(['Cost at loaded parameters (lambda = 1.5): %f '...
         '\n(this value should be about 31.34)\n'], J);
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ======= Part 5: Collaborative Filtering Gradient Regularization ======
%  Once your cost matches up with ours, you should proceed to implement 
%  regularization for the gradient. 
%
%  
fprintf('\nChecking Gradients (with regularization) ... \n');
%  Check gradients by running checkCostFunction
checkCostFunction(1.5);
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ============== Part 6: Entering ratings for a new user ===============
%  Before we will train the collaborative filtering model, we will first
%  add ratings that correspond to a new user that we just observed. This
%  part of the code will also allow you to put in your own ratings for the
%  movies in our dataset!
%
movieList = loadMovieList();
%  Initialize my ratings
my_ratings = zeros(1682, 1);
% Check the file movie_ids.txt for id of each movie in our dataset
% For example, Toy Story (1995) has ID 1, so to rate it "4", you can set
my_ratings(1) = 4;
% Or suppose you did not enjoy Silence of the Lambs (1991), you can set
my_ratings(98) = 2;
% We have selected a few movies we liked / did not like and the ratings we
% gave are as follows:
my_ratings(7) = 3;
my_ratings(12)= 5;
my_ratings(54) = 4;
my_ratings(64)= 5;
my_ratings(66)= 3;
my_ratings(69) = 5;
my_ratings(183) = 4;
my_ratings(226) = 5;
my_ratings(355)= 5;
fprintf('\n\nNew user ratings:\n');
for i = 1:length(my_ratings)
    if my_ratings(i) > 0 
        fprintf('Rated %d for %s\n', my_ratings(i), ...
                 movieList{i});
    end
end

fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ================== Part 7: Learning Movie Ratings ====================
%  Now, you will train the collaborative filtering model on a movie rating 
%  dataset of 1682 movies and 943 users
%

fprintf('\nTraining collaborative filtering...\n');
%  Load data
load('ex8_movies.mat');
%  Y is a 1682x943 matrix, containing ratings (1-5) of 1682 movies by 
%  943 users
%
%  R is a 1682x943 matrix, where R(i,j) = 1 if and only if user j gave a
%  rating to movie i

%  Add our own ratings to the data matrix
Y = [my_ratings Y];
R = [(my_ratings ~= 0) R];
%  Normalize Ratings
[Ynorm, Ymean] = normalizeRatings(Y, R);
%  Useful Values
num_users = size(Y, 2);
num_movies = size(Y, 1);
num_features = 10;
% Set Initial Parameters (Theta, X)
X = randn(num_movies, num_features);
Theta = randn(num_users, num_features);
initial_parameters = [X(:); Theta(:)];
% Set options for fmincg
options = optimset('GradObj', 'on', 'MaxIter', 100);
% Set Regularization
lambda = 10;
% Train on the mean-normalized ratings Ynorm, since Part 8 adds Ymean back
theta = fmincg (@(t)(cofiCostFunc(t, Ynorm, R, num_users, num_movies, ...
                                num_features, lambda)), ...
                initial_parameters, options);

% Unfold the returned theta back into X and Theta
X = reshape(theta(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(theta(num_movies*num_features+1:end), ...
               num_users, num_features);
fprintf('Recommender system learning completed.\n');
fprintf('\nProgram paused. Press enter to continue.\n');
pause;

%% ================== Part 8: Recommendation for you ====================
%  After training the model, you can now make recommendations by computing
%  the predictions matrix.
%
p = X * Theta';
my_predictions = p(:,1) + Ymean;

movieList = loadMovieList();

[r, ix] = sort(my_predictions, 'descend');
fprintf('\nTop recommendations for you:\n');
for i=1:10
    j = ix(i);
    fprintf('Predicting rating %.1f for movie %s\n', my_predictions(j), ...
            movieList{j});
end

fprintf('\n\nOriginal ratings provided:\n');
for i = 1:length(my_ratings)
    if my_ratings(i) > 0 
        fprintf('Rated %d for %s\n', my_ratings(i), ...
                 movieList{i});
    end
end
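        Part 1: Loading movie ratings dataset --- loads the users' movie ratings and visualizes them
        Part 2: Collaborative Filtering Cost Function --- takes a small subset of the movies and ratings and uses cofiCostFunc to compute the cost J along with the gradients X_grad and Theta_grad
        Part 3: Collaborative Filtering Gradient --- computes the gradient numerically on a small data set with [f(x+epsilon)-f(x-epsilon)]/(2*epsilon) and uses it to confirm that the gradient we obtained with the matrix method is correct
        Part 4: Collaborative Filtering Cost Regularization --- regularizes the algorithm by adding the regularization term to J in cofiCostFunc.m
        Part 5: Collaborative Filtering Gradient Regularization --- regularizes the gradient computation in cofiCostFunc.m by adding the regularization terms to X_grad and Theta_grad
        Part 6: Entering ratings for a new user --- creates a new user and adds some ratings for them
        Part 7: Learning Movie Ratings --- trains on the data with the completed collaborative filtering algorithm
        Part 8: Recommendation for you --- uses the parameters learned in the previous step to predict which movies the new user is likely to enjoy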
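The gradient check in Parts 3 and 5 can be sketched as follows. This is a self-contained illustration on made-up toy data, not the course's checkCostFunction.m: the helper names unpackX, unpackT, and costFn are hypothetical, and the cost and gradient expressions are the vectorized ones discussed in this post.

```octave
% Toy problem (illustrative sizes and data only)
num_movies = 4; num_users = 3; num_features = 2;
X     = reshape(sin(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(cos(1:num_users*num_features),  num_users,  num_features);
Y = [5 0 3; 0 4 0; 1 0 0; 0 2 5];
R = (Y ~= 0);
lambda = 1.5;

% Cost as a function of the unrolled parameter vector p = [X(:); Theta(:)]
unpackX = @(p) reshape(p(1:num_movies*num_features), num_movies, num_features);
unpackT = @(p) reshape(p(num_movies*num_features+1:end), num_users, num_features);
costFn  = @(p) sum(sum(((unpackX(p) * unpackT(p)' - Y) .* R).^2)) / 2 ...
               + lambda / 2 * sum(p.^2);

% Analytic gradient (vectorized)
E = (X * Theta' - Y) .* R;                 % errors on rated entries only
X_grad     = E  * Theta + lambda * X;
Theta_grad = E' * X     + lambda * Theta;
grad = [X_grad(:); Theta_grad(:)];

% Numerical gradient by central differences
p = [X(:); Theta(:)];
epsilon = 1e-4;
numgrad = zeros(size(p));
for k = 1:numel(p)
    e = zeros(size(p));
    e(k) = epsilon;
    numgrad(k) = (costFn(p + e) - costFn(p - e)) / (2 * epsilon);
end

% The relative difference should be very small if the gradient is right
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Relative difference: %g\n', rel_diff);
```

If the analytic gradient had a mistake, for example a missing Theta_grad term, the relative difference would be far from zero.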

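3. Implementing cofiCostFunc.m
Unregularized cost: J = sum(sum(((X * Theta' - Y) .* R).^2)) / 2;
Unregularized gradient with respect to X: X_grad = ((X * Theta' - Y) .* R) * Theta;
Unregularized gradient with respect to Theta: Theta_grad = ((Theta * X' - Y') .* R') * X;
Regularized cost: J = J + lambda / 2 * (sum(sum(Theta .^ 2)) + sum(sum(X .^ 2)));
Regularized gradient with respect to X: X_grad = X_grad + lambda * X;
Regularized gradient with respect to Theta: Theta_grad = Theta_grad + lambda * Theta;
The derivation is fairly involved and cannot be typeset here, so I will not go through it. Checking the matrix dimensions is a good source of intuition, though! The implementation is as follows: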
function [J, grad] = cofiCostFunc(params, Y, R, num_users, num_movies, ...
                                  num_features, lambda)
%COFICOSTFUNC Collaborative filtering cost function
%   [J, grad] = COFICOSTFUNC(params, Y, R, num_users, num_movies, ...
%   num_features, lambda) returns the cost and gradient for the
%   collaborative filtering problem.
%

% Unfold the X and Theta matrices from params
X = reshape(params(1:num_movies*num_features), num_movies, num_features);
Theta = reshape(params(num_movies*num_features+1:end), ...
                num_users, num_features);            
% You need to return the following values correctly
J = 0;
X_grad = zeros(size(X));
Theta_grad = zeros(size(Theta));

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost function and gradient for collaborative
%               filtering. Concretely, you should first implement the cost
%               function (without regularization) and make sure it
%               matches our costs. After that, you should implement the 
%               gradient and use the checkCostFunction routine to check
%               that the gradient is correct. Finally, you should implement
%               regularization.
%
% Notes: X - num_movies  x num_features matrix of movie features
%        Theta - num_users  x num_features matrix of user features
%        Y - num_movies x num_users matrix of user ratings of movies
%        R - num_movies x num_users matrix, where R(i, j) = 1 if the 
%            i-th movie was rated by the j-th user
%
% You should set the following variables correctly:
%
%        X_grad - num_movies x num_features matrix, containing the 
%                 partial derivatives w.r.t. to each element of X
%        Theta_grad - num_users x num_features matrix, containing the 
%                     partial derivatives w.r.t. to each element of Theta
%
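J = sum(sum(((X * Theta' - Y) .* R).^2)) / 2;   % the optimizer calls this function many times, so use a vectorized form rather than for loops
X_grad = ((X * Theta' - Y) .* R) * Theta;       % vectorized gradient with respect to X
Theta_grad = ((Theta * X' - Y') .* R') * X;     % vectorized gradient with respect to Theta
J = J + lambda / 2 * (sum(sum(Theta .^ 2)) + sum(sum(X .^ 2)));   % add the regularization term to the cost
X_grad = X_grad + lambda * X;                   % regularize the gradient with respect to X
Theta_grad = Theta_grad + lambda * Theta;       % regularize the gradient with respect to Theta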
% =============================================================
grad = [X_grad(:); Theta_grad(:)];
end
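For readers who want to relate the vectorized lines to per-element formulas, the standard collaborative filtering gradients from the course are shown below in the notation of this post, with k indexing the features. They are stated as results rather than as a full derivation.

```latex
\frac{\partial J}{\partial x^{(i)}_{k}}
= \sum_{j:\,r(i,j)=1}\Bigl(\bigl(\theta^{(j)}\bigr)^{T}x^{(i)}-y^{(i,j)}\Bigr)\,\theta^{(j)}_{k}
+ \lambda\,x^{(i)}_{k}

\frac{\partial J}{\partial \theta^{(j)}_{k}}
= \sum_{i:\,r(i,j)=1}\Bigl(\bigl(\theta^{(j)}\bigr)^{T}x^{(i)}-y^{(i,j)}\Bigr)\,x^{(i)}_{k}
+ \lambda\,\theta^{(j)}_{k}
```

The dimensions give a quick sanity check: (X * Theta' - Y) .* R is num_movies x num_users, so multiplying it by Theta (num_users x num_features) yields a num_movies x num_features matrix like X, and multiplying its transpose by X yields a num_users x num_features matrix like Theta.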



