1. Movie ratings dataset
load ('ex8_movies.mat');
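The dataset contains two matrices, Y and R, each of size 1682 × 943 (movies i × users j).
Y(i,j) is the rating that user j gave to movie i. R contains only 0s and 1s: R(i,j) = 1 means user j has rated movie i, and 0 means no rating.
To compute the average of the existing ratings for movie 1, the code is: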
mean(Y(1, R(1, :)));
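To see what this indexing does, here is a minimal sketch on a made-up 2 × 3 example. The matrices `Y_toy` and `R_toy` are illustrative and not part of the dataset. Logical indexing needs a logical mask, so the sketch converts explicitly in case R is held as a numeric 0/1 array.

```octave
% Toy data (hypothetical): 2 movies x 3 users; 0 in Y means "no rating".
Y_toy = [5 0 3;
         0 4 0];
R_toy = [1 0 1;
         0 1 0];

% Convert the 0/1 indicator row to logical so it can be used as a mask.
mask = logical(R_toy(1, :));

% Average over the ratings that exist for movie 1: (5 + 3) / 2.
avg_movie1 = mean(Y_toy(1, mask));
fprintf('Average rating for movie 1: %.2f\n', avg_movie1);
```

Averaging `Y_toy(1, :)` directly would also count the unrated entry as a 0 and pull the mean down, which is why the mask matters.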
Visualization of the rating matrix Y (figure not included in this text).
2. Collaborative filtering
2.1. Collaborative filtering cost function
The cost function is computed as:
J = sum(sum(((X*Theta' - Y).*R).^2))/2;
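For reference, the expression above corresponds to the following objective. The notation is an assumption borrowed from the usual course convention, since the text does not define it: $x^{(i)}$ is row i of X (the features of movie i), $\theta^{(j)}$ is row j of Theta (the parameters of user j), and $y^{(i,j)}$ is Y(i,j). Multiplying by R restricts the sum to entries that were actually rated.

```latex
J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
    \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
```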
Running it gives:
Cost at loaded parameters: 22.224604
(this value should be about 22.22)
2.2. Collaborative filtering gradient
X_grad = (R.*(X*Theta' - Y))*Theta;
Theta_grad = (R.*(X*Theta' - Y))'*X;
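Written per component, the two matrix products correspond to the partial derivatives below. The notation is assumed (the text does not define it): $x^{(i)}$ is row i of X, $\theta^{(j)}$ is row j of Theta, and k indexes the features.

```latex
\frac{\partial J}{\partial x_k^{(i)}}
  = \sum_{j:\, r(i,j)=1}
    \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)}
\qquad
\frac{\partial J}{\partial \theta_k^{(j)}}
  = \sum_{i:\, r(i,j)=1}
    \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) x_k^{(i)}
```

The masked error matrix R.*(X*Theta' - Y) holds the bracketed term for every rated pair and 0 elsewhere, so multiplying it by Theta (or its transpose by X) carries out these sums for all i, j and k at once.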
Running the program gives:
Checking Gradients (without regularization) ...
5.5335 5.5335
3.6186 3.6186
5.4422 5.4422
-1.7312 -1.7312
4.1196 4.1196
-1.4833 -1.4833
-6.0734 -6.0734
2.3490 2.3490
7.6341 7.6341
1.8651 1.8651
4.1192 4.1192
-1.5834 -1.5834
1.2828 1.2828
-6.1573 -6.1573
1.6628 1.6628
1.1686 1.1686
5.5630 5.5630
0.3050 0.3050
4.6442 4.6442
-1.6691 -1.6691
-2.1505 -2.1505
-3.6832 -3.6832
3.4067 3.4067
-4.0743 -4.0743
0.5567 0.5567
-2.1056 -2.1056
0.9168 0.9168
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your cost function implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 1.84952e-12
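The comparison printed above can be reproduced in spirit with a central-difference check. The following is a minimal sketch on a small made-up problem, not the course's own checking routine: the sizes, the values of `X`, `Theta` and `Y`, and the step size are all assumptions chosen for illustration.

```octave
% Toy problem (hypothetical): 4 movies, 3 users, 2 features.
X     = reshape(sin(1:8), 4, 2);       % movie features
Theta = reshape(cos(1:6), 3, 2);       % user parameters
Y     = [5 0 3; 0 4 1; 2 5 0; 4 0 0];  % ratings (0 = not rated)
R     = (Y ~= 0);                      % rated-entry indicator

% Unregularized cost as a function of the unrolled parameters [X(:); Theta(:)].
cost = @(p) sum(sum((R .* (reshape(p(1:8), 4, 2) * ...
                           reshape(p(9:14), 3, 2)' - Y)) .^ 2)) / 2;

% Analytical gradient, unrolled in the same order as the parameters.
E    = R .* (X * Theta' - Y);
grad = [reshape(E * Theta, [], 1); reshape(E' * X, [], 1)];

% Central-difference approximation, one parameter at a time.
p       = [X(:); Theta(:)];
numgrad = zeros(size(p));
step    = 1e-4;
for k = 1:numel(p)
    d = zeros(size(p));
    d(k) = step;
    numgrad(k) = (cost(p + d) - cost(p - d)) / (2 * step);
end

% Same kind of summary as the "Relative Difference" line in the output.
rel_diff = norm(numgrad - grad) / norm(numgrad + grad);
fprintf('Relative difference: %g\n', rel_diff);
```

If the analytical gradient had a bug, for example a missing mask R, the relative difference would be many orders of magnitude larger.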
2.3. Regularized cost function
In code, the regularized cost is:
J = sum(sum((R.*(X*Theta' - Y)).^2))/2 + sum(sum(Theta.^2))*lambda/2+...
sum(sum(X.^2))*lambda/2;
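The code above adds an L2 penalty on every entry of Theta and X to the unregularized cost. In formula form, under an assumed notation that the text does not define ($x^{(i)}$ is row i of X, $\theta^{(j)}$ is row j of Theta, $n_m$ movies, $n_u$ users, $n$ features):

```latex
J = \frac{1}{2} \sum_{(i,j):\, r(i,j)=1}
      \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right)^{2}
  + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( \theta_k^{(j)} \right)^{2}
  + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^{2}
```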
2.4. Regularized gradient
In code, together with the regularized cost:
J = sum(sum((R.*(X*Theta' - Y)).^2))/2 + sum(sum(Theta.^2))*lambda/2+...
sum(sum(X.^2))*lambda/2;
X_grad = R.*(X*Theta' - Y)*Theta + X*lambda;
Theta_grad = (R.*(X*Theta' - Y))'*X + Theta*lambda;
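The extra terms X*lambda and Theta*lambda are the derivatives of the two L2 penalties in the cost. Per component, under an assumed notation that the text does not define ($x^{(i)}$ is row i of X, $\theta^{(j)}$ is row j of Theta, k indexes the features):

```latex
\frac{\partial J}{\partial x_k^{(i)}}
  = \sum_{j:\, r(i,j)=1}
    \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) \theta_k^{(j)}
  + \lambda\, x_k^{(i)}
\qquad
\frac{\partial J}{\partial \theta_k^{(j)}}
  = \sum_{i:\, r(i,j)=1}
    \left( (\theta^{(j)})^{T} x^{(i)} - y^{(i,j)} \right) x_k^{(i)}
  + \lambda\, \theta_k^{(j)}
```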
Running the program gives:
Cost at loaded parameters (lambda = 1.5): 31.344056
(this value should be about 31.34)
Checking Gradients (with regularization) ...
2.2223 2.2223
0.7968 0.7968
-3.2924 -3.2924
-0.7029 -0.7029
-4.2016 -4.2016
3.5969 3.5969
0.8859 0.8859
1.0523 1.0523
-7.8499 -7.8499
0.3904 0.3904
-0.1347 -0.1347
-2.3656 -2.3656
2.1066 2.1066
1.6703 1.6703
0.8519 0.8519
-1.0380 -1.0380
2.6537 2.6537
0.8114 0.8114
-0.8604 -0.8604
-0.5884 -0.5884
-0.7108 -0.7108
-4.0652 -4.0652
0.2494 0.2494
-4.3484 -4.3484
-3.6167 -3.6167
-4.1277 -4.1277
-3.2439 -3.2439
The above two columns you get should be very similar.
(Left-Your Numerical Gradient, Right-Analytical Gradient)
If your cost function implementation is correct, then
the relative difference will be small (less than 1e-9).
Relative Difference: 1.78901e-12
3. Movie recommender system
New user ratings:
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)
Program paused. Press enter to continue.
Training collaborative filtering...
Iteration 1 | Cost: 3.108511e+05
Iteration 2 | Cost: 1.475959e+05
Iteration 3 | Cost: 1.000321e+05
Iteration 4 | Cost: 7.707565e+04
Iteration 5 | Cost: 6.153638e+04
Iteration 6 | Cost: 5.719300e+04
Iteration 7 | Cost: 5.239113e+04
Iteration 8 | Cost: 4.771435e+04
Iteration 9 | Cost: 4.559863e+04
Iteration 10 | Cost: 4.385394e+04
Iteration 11 | Cost: 4.263562e+04
Iteration 12 | Cost: 4.184598e+04
Iteration 13 | Cost: 4.116751e+04
Iteration 14 | Cost: 4.073297e+04
Iteration 15 | Cost: 4.032577e+04
Iteration 16 | Cost: 4.009203e+04
Iteration 17 | Cost: 3.986428e+04
Iteration 18 | Cost: 3.971337e+04
Iteration 19 | Cost: 3.958890e+04
Iteration 20 | Cost: 3.949630e+04
Iteration 21 | Cost: 3.940187e+04
Iteration 22 | Cost: 3.934142e+04
Iteration 23 | Cost: 3.930822e+04
Iteration 24 | Cost: 3.926063e+04
Iteration 25 | Cost: 3.922334e+04
Iteration 26 | Cost: 3.920956e+04
Iteration 27 | Cost: 3.917145e+04
Iteration 28 | Cost: 3.914804e+04
Iteration 29 | Cost: 3.913479e+04
Iteration 30 | Cost: 3.910882e+04
Iteration 31 | Cost: 3.908992e+04
Iteration 32 | Cost: 3.908209e+04
Iteration 33 | Cost: 3.907380e+04
Iteration 34 | Cost: 3.906903e+04
Iteration 35 | Cost: 3.906437e+04
Iteration 36 | Cost: 3.905754e+04
Iteration 37 | Cost: 3.905112e+04
Iteration 38 | Cost: 3.904531e+04
Iteration 39 | Cost: 3.904023e+04
Iteration 40 | Cost: 3.903390e+04
Iteration 41 | Cost: 3.902800e+04
Iteration 42 | Cost: 3.902367e+04
Iteration 43 | Cost: 3.902195e+04
Iteration 44 | Cost: 3.902007e+04
Iteration 45 | Cost: 3.901780e+04
Iteration 46 | Cost: 3.901699e+04
Iteration 47 | Cost: 3.901489e+04
Iteration 48 | Cost: 3.901190e+04
Iteration 49 | Cost: 3.900929e+04
Iteration 50 | Cost: 3.900742e+04
Iteration 51 | Cost: 3.900630e+04
Iteration 52 | Cost: 3.900485e+04
Iteration 53 | Cost: 3.900348e+04
Iteration 54 | Cost: 3.900283e+04
Iteration 55 | Cost: 3.900208e+04
Iteration 56 | Cost: 3.900118e+04
Iteration 57 | Cost: 3.899982e+04
Iteration 58 | Cost: 3.899860e+04
Iteration 59 | Cost: 3.899710e+04
Iteration 60 | Cost: 3.899381e+04
Iteration 61 | Cost: 3.899242e+04
Iteration 62 | Cost: 3.899094e+04
Iteration 63 | Cost: 3.898986e+04
Iteration 64 | Cost: 3.898908e+04
Iteration 65 | Cost: 3.898811e+04
Iteration 66 | Cost: 3.898754e+04
Iteration 67 | Cost: 3.898736e+04
Iteration 68 | Cost: 3.898712e+04
Iteration 69 | Cost: 3.898687e+04
Iteration 70 | Cost: 3.898673e+04
Iteration 71 | Cost: 3.898634e+04
Iteration 72 | Cost: 3.898524e+04
Iteration 73 | Cost: 3.898369e+04
Iteration 74 | Cost: 3.898322e+04
Iteration 75 | Cost: 3.898257e+04
Iteration 76 | Cost: 3.898194e+04
Iteration 77 | Cost: 3.898141e+04
Iteration 78 | Cost: 3.898077e+04
Iteration 79 | Cost: 3.898025e+04
Iteration 80 | Cost: 3.897962e+04
Iteration 81 | Cost: 3.897909e+04
Iteration 82 | Cost: 3.897861e+04
Iteration 83 | Cost: 3.897735e+04
Iteration 84 | Cost: 3.897609e+04
Iteration 85 | Cost: 3.897534e+04
Iteration 86 | Cost: 3.897488e+04
Iteration 87 | Cost: 3.897468e+04
Iteration 88 | Cost: 3.897414e+04
Iteration 89 | Cost: 3.897389e+04
Iteration 90 | Cost: 3.897371e+04
Iteration 91 | Cost: 3.897355e+04
Iteration 92 | Cost: 3.897320e+04
Iteration 93 | Cost: 3.897304e+04
Iteration 94 | Cost: 3.897290e+04
Iteration 95 | Cost: 3.897276e+04
Iteration 96 | Cost: 3.897254e+04
Iteration 97 | Cost: 3.897240e+04
Iteration 98 | Cost: 3.897232e+04
Iteration 99 | Cost: 3.897222e+04
Iteration 100 | Cost: 3.897217e+04
Recommender system learning completed.
Program paused. Press enter to continue.
Top recommendations for you:
Predicting rating 5.0 for movie Saint of Fort Washington, The (1993)
Predicting rating 5.0 for movie Great Day in Harlem, A (1994)
Predicting rating 5.0 for movie Someone Else's America (1995)
Predicting rating 5.0 for movie Entertaining Angels: The Dorothy Day Story (1996)
Predicting rating 5.0 for movie Santa with Muscles (1996)
Predicting rating 5.0 for movie Aiqing wansui (1994)
Predicting rating 5.0 for movie Prefontaine (1997)
Predicting rating 5.0 for movie They Made Me a Criminal (1939)
Predicting rating 5.0 for movie Marlene Dietrich: Shadow and Light (1996)
Predicting rating 5.0 for movie Star Kid (1997)
Original ratings provided:
Rated 4 for Toy Story (1995)
Rated 3 for Twelve Monkeys (1995)
Rated 5 for Usual Suspects, The (1995)
Rated 4 for Outbreak (1995)
Rated 5 for Shawshank Redemption, The (1994)
Rated 3 for While You Were Sleeping (1995)
Rated 5 for Forrest Gump (1994)
Rated 2 for Silence of the Lambs, The (1991)
Rated 4 for Alien (1979)
Rated 5 for Die Hard 2 (1990)
Rated 5 for Sphere (1998)
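A list like "Top recommendations" above can be derived from the learned matrices by predicting every rating and sorting one user's column. The following is a minimal sketch with made-up values. The names `Ymean` and `titles` are hypothetical, and the sketch assumes the ratings were mean-normalized per movie before training, so the mean is added back to the prediction. The text does not confirm that step.

```octave
% Toy learned parameters (hypothetical values, for illustration only).
X      = [1.0 0.2; 0.1 0.9; 0.6 0.6];   % 3 movies x 2 features
Theta  = [0.5 -0.3; 1.0 0.4];           % 2 users  x 2 features
Ymean  = [3.5; 4.0; 2.0];               % assumed per-movie mean rating
titles = {'Movie A', 'Movie B', 'Movie C'};

% Predicted (mean-centered) ratings for every (movie, user) pair.
P = X * Theta';

% Predictions for user 1, with the per-movie mean added back.
my_pred = P(:, 1) + Ymean;

% Sort from highest to lowest and print the top 2.
[sorted_pred, order] = sort(my_pred, 'descend');
for k = 1:2
    fprintf('Predicting rating %.1f for movie %s\n', ...
            sorted_pred(k), titles{order(k)});
end
```

In the full program the new user's ratings are added as an extra column of Y and R before training, so that user's column of the prediction matrix is the one that gets sorted.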