Week 6 mainly covers machine learning system design and evaluation.
To assess how well a learning algorithm performs, we evaluate its hypothesis. We first learn θ on the training set by minimizing $J_{train}(\theta)$, then compute the test set error $J_{test}(\theta)$. So that the test set error remains a good estimate of the generalization error, we add a validation (cross validation) set and test candidate hypotheses on the validation set instead. Without a validation set, a common split is 70% training set and 30% test set; with a validation set, a common split is 60% training set, 20% cross validation set, and 20% test set. Note that the data should be in random order before splitting; if it is not, shuffle it first.
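A minimal sketch of such a 60/20/20 split (X and y are assumed to be already loaded; the variable names are illustrative):

% Hypothetical 60/20/20 split; shuffle first so the split is random
m = size(X, 1);
idx = randperm(m);                                 % random permutation of example indices
nTrain = round(0.6 * m);
nVal = round(0.2 * m);
Xtrain = X(idx(1 : nTrain), :);                    ytrain = y(idx(1 : nTrain));
Xval = X(idx(nTrain + 1 : nTrain + nVal), :);      yval = y(idx(nTrain + 1 : nTrain + nVal));
Xtest = X(idx(nTrain + nVal + 1 : end), :);        ytest = y(idx(nTrain + nVal + 1 : end));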
Once we have test results, we run diagnostics to see whether the learning algorithm is working and to decide how to improve its performance. Diagnostics can take a lot of time to implement, but doing so is usually a good use of time. In general we need to diagnose the algorithm's bias and variance: high bias means underfitting, and high variance means overfitting. Both mean the algorithm performs worse than expected, so $J_{cv}(\theta)$ or $J_{test}(\theta)$ will be large. With high bias, however, $J_{train}(\theta)$ is close to $J_{cv}(\theta)$ or $J_{test}(\theta)$, and both are above the desired error; with high variance, $J_{train}(\theta)$ is far below $J_{cv}(\theta)$ or $J_{test}(\theta)$, and the desired error lies between the two. Learning curves help us recognize which of these situations we are in, and they show that adding more training examples can help with a high variance problem but not with a high bias problem.
To fix a high variance problem, one can generally try: 1) getting more training examples; 2) trying a smaller set of features; 3) increasing λ. To fix a high bias problem, one can generally try: 1) getting additional features; 2) adding polynomial features; 3) decreasing λ.
When designing a machine learning system, the recommended approach is: 1) start with a simple algorithm that can be implemented quickly, and test it on the validation set; 2) plot learning curves to decide whether more data, more features, or other changes are likely to help the model; 3) do error analysis: manually examine the examples in the validation set that the algorithm misclassified, and see whether any systematic type of error stands out.
Error analysis needs a single numerical evaluation metric. When handling skewed data, precision alone is not enough to judge whether the model is accurate, or whether a change actually improves it, so we also introduce recall. The F score (F$_1$ score) combines precision and recall: $F_1 = 2\frac{PR}{P+R}$.
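A minimal sketch of computing these metrics (pred and yval are hypothetical 0/1 vectors of predictions and labels, with 1 as the rare positive class):

% Hypothetical prediction vector pred and label vector yval, both 0/1
tp = sum((pred == 1) & (yval == 1));   % true positives
fp = sum((pred == 1) & (yval == 0));   % false positives
fn = sum((pred == 0) & (yval == 1));   % false negatives
P = tp / (tp + fp);                    % precision
R = tp / (tp + fn);                    % recall
F1 = 2 * P * R / (P + R);              % F score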
A high-performance learning algorithm needs both enough parameters to predict y accurately and a very large training set.
1. Evaluating a Learning Algorithm
1.1 Evaluating a Hypothesis
1.2 Model Selection and Train/Validation/Test Sets
Given many models with different polynomial degrees, we can use a systematic approach to identify the 'best' one: train a hypothesis for each degree of polynomial and compare the resulting validation errors.
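A self-contained sketch of this selection using MATLAB's built-in polyfit/polyval on 1-D data (x, y, xval, yval are assumed to be column vectors already loaded; the degree range is arbitrary):

max_d = 10;
err_val = zeros(max_d, 1);
for d = 1 : max_d
    c = polyfit(x, y, d);                                   % fit a degree-d polynomial on the training set
    err_val(d) = mean((polyval(c, xval) - yval) .^ 2) / 2;  % squared error on the validation set
end
[~, best_d] = min(err_val);                                 % pick the degree with the lowest validation error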
2. Bias vs. Variance
2.1 Diagnosing Bias vs. Variance
2.2 Regularization and Bias/Variance
2.3 Learning Curves
2.4 Deciding What to Do Next
3. Building a Spam Classifier
4. Handling Skewed Data
5. Using Large Data Sets
6. Exercise 5: Exploring Bias and Variance in Regularized Linear Regression - Matlab
6.1 Regularized Linear Regression
For regularized linear regression, we fill in the computation of J and the gradient in linearRegCostFunction.m, following the cost function and gradient formulas. By the time X is passed into the function, the $x_0$ column of ones has already been added. $\theta_0$ does not take part in the regularization term.
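For reference, the cost and gradient being implemented are the standard ones from the course:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \quad (j \geq 1)$$

with the $\frac{\lambda}{m}\theta_j$ term dropped for $j = 0$.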
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
% [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
% cost of using theta as the parameter for linear regression to fit the
% data points in X and y. Returns the cost in J and the gradient in grad
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost and gradient of regularized linear
% regression for a particular choice of theta.
%
% You should set J to the cost and grad to the gradient.
%
reg = (lambda / 2 / m) * (theta(2 : end))' * theta(2 : end);  % regularization term, excluding theta(1)
J = sum((X * theta - y) .^ 2) / (2 * m) + reg;                % regularized squared-error cost
grad_temp = X' * (X * theta - y) / m;                         % unregularized gradient
grad = [grad_temp(1) ; grad_temp(2 : end) + (lambda / m) * theta(2 : end)];  % theta(1) is not regularized
% =========================================================================
grad = grad(:);
end
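As a quick sanity check (a made-up toy dataset, not part of the exercise files), the function can be called directly:

X = [ones(3, 1), (1 : 3)'];              % m = 3 examples, x_0 column included
y = [2; 4; 6];
theta = [1; 1];
[J, grad] = linearRegCostFunction(X, y, theta, 1);
fprintf('J = %f\n', J);                  % prints J = 1.000000 for this toy data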
With the cost function and gradient in place, we learn the parameters with the fmincg() function in trainLinearReg.m.
function [theta] = trainLinearReg(X, y, lambda)
%TRAINLINEARREG Trains linear regression given a dataset (X, y) and a
%regularization parameter lambda
% [theta] = TRAINLINEARREG (X, y, lambda) trains linear regression using
% the dataset (X, y) and regularization parameter lambda. Returns the
% trained parameters theta.
%
% Initialize Theta
initial_theta = zeros(size(X, 2), 1);
% Create "short hand" for the cost function to be minimized
costFunction = @(t) linearRegCostFunction(X, y, t, lambda);
% Now, costFunction is a function that takes in only one argument
options = optimset('MaxIter', 200, 'GradObj', 'on');
% Minimize using fmincg
theta = fmincg(costFunction, initial_theta, options);
end
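In the exercise script this is called on the loaded data roughly as follows (ex5data1.mat and its variables come with the assignment; λ = 1 here is just an example value):

load('ex5data1.mat');                    % provides X, y, Xval, yval, Xtest, ytest
lambda = 1;
theta = trainLinearReg([ones(size(X, 1), 1), X], y, lambda);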
6.2 Bias-Variance
6.2.1 Learning Curves
We debug the learning algorithm by plotting learning curves. A learning curve has training set size on the x-axis and error on the y-axis, so for a training set of size i we take the first i examples of the training set, use trainLinearReg() to find the θ that minimizes the cost, and then use that θ to compute the error on both the training subset and the validation set. Note that the error does not include the regularization term, so linearRegCostFunction() is called with lambda set to 0. In learningCurve.m, complete the computation of both errors.
function [error_train, error_val] = ...
learningCurve(X, y, Xval, yval, lambda)
%LEARNINGCURVE Generates the train and cross validation set errors needed
%to plot a learning curve
% [error_train, error_val] = ...
% LEARNINGCURVE(X, y, Xval, yval, lambda) returns the train and
% cross validation set errors for a learning curve. In particular,
% it returns two vectors of the same length - error_train and
% error_val. Then, error_train(i) contains the training error for
% i examples (and similarly for error_val(i)).
%
% In this function, you will compute the train and test errors for
% dataset sizes from 1 up to m. In practice, when working with larger
% datasets, you might want to do this in larger intervals.
%
% Number of training examples
m = size(X, 1);
% You need to return these values correctly
error_train = zeros(m, 1);
error_val = zeros(m, 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
% error_train and the cross validation errors in error_val.
% i.e., error_train(i) and
% error_val(i) should give you the errors
% obtained after training on i examples.
%
% Note: You should evaluate the training error on the first i training
% examples (i.e., X(1:i, :) and y(1:i)).
%
% For the cross-validation error, you should instead evaluate on
% the _entire_ cross validation set (Xval and yval).
%
% Note: If you are using your cost function (linearRegCostFunction)
% to compute the training and cross validation error, you should
% call the function with the lambda argument set to 0.
% Do note that you will still need to use lambda when running
% the training to obtain the theta parameters.
%
% Hint: You can loop over the examples with the following:
%
% for i = 1:m
% % Compute train/cross validation errors using training examples
% % X(1:i, :) and y(1:i), storing the result in
% % error_train(i) and error_val(i)
% ....
%
% end
%
% ---------------------- Sample Solution ----------------------
for i = 1 : m
    % Train on the first i examples, using the regularization parameter lambda
    theta = trainLinearReg(X(1 : i, :), y(1 : i), lambda);
    % Evaluate both errors without regularization (lambda = 0)
    error_train(i) = linearRegCostFunction(X(1 : i, :), y(1 : i), theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end
% -------------------------------------------------------------
% =========================================================================
end
The resulting learning curves are shown in the figure. As the training set size grows, both the training error and the validation error stay high, which indicates that this model has a high bias problem.
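A sketch of how such a plot is produced (this mirrors the exercise script; m, X, Xval, yval as above, and lambda = 0 for the unregularized linear model):

[error_train, error_val] = learningCurve([ones(m, 1), X], y, ...
                                         [ones(size(Xval, 1), 1), Xval], yval, 0);
plot(1 : m, error_train, 1 : m, error_val);
xlabel('Number of training examples');
ylabel('Error');
legend('Train', 'Cross Validation');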
6.3 Polynomial Regression
6.3.1 Learning Polynomial Regression
We increase the number of features by adding polynomial terms. In polyFeatures.m, implement the polynomial feature expansion. Because the polynomial features span very different value ranges, feature normalization is needed here.
function [X_poly] = polyFeatures(X, p)
%POLYFEATURES Maps X (1D vector) into the p-th power
% [X_poly] = POLYFEATURES(X, p) takes a data matrix X (size m x 1) and
% maps each example into its polynomial features where
% X_poly(i, :) = [X(i) X(i).^2 X(i).^3 ... X(i).^p];
%
% You need to return the following variables correctly.
X_poly = zeros(numel(X), p);
% ====================== YOUR CODE HERE ======================
% Instructions: Given a vector X, return a matrix X_poly where the p-th
% column of X contains the values of X to the p-th power.
%
%
for i = 1 : p
    X_poly(:, i) = X .^ i;   % the i-th column is X raised to the i-th power
end
% =========================================================================
end
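The usual call sequence in the exercise script combines the expansion with normalization (featureNormalize.m is provided by the assignment; p = 8 is the degree used there):

p = 8;
X_poly = polyFeatures(X, p);                      % expand to degree-p polynomial features
[X_poly, mu, sigma] = featureNormalize(X_poly);   % zero mean and unit variance per column
X_poly = [ones(size(X_poly, 1), 1), X_poly];      % add the x_0 = 1 column back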
The resulting learning curve is shown in the figure. The training error is very small and hugs the x-axis; the validation error is also low, but a clear gap remains between it and the training error, which indicates that this model has a high variance problem.
6.3.2 Selecting Lambda Using a Cross Validation Set
The value of λ significantly affects the result of regularized polynomial regression. To observe the effect of different λ values, we store a series of λ values in a vector lambda_vec, train a separate model for each λ, and compute each model's validation error and training error. Complete the corresponding code in validationCurve.m.
function [lambda_vec, error_train, error_val] = ...
validationCurve(X, y, Xval, yval)
%VALIDATIONCURVE Generate the train and validation errors needed to
%plot a validation curve that we can use to select lambda
% [lambda_vec, error_train, error_val] = ...
% VALIDATIONCURVE(X, y, Xval, yval) returns the train
% and validation errors (in error_train, error_val)
% for different values of lambda. You are given the training set (X,
% y) and validation set (Xval, yval).
%
% Selected values of lambda (you should not change this)
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
% You need to return these variables correctly.
error_train = zeros(length(lambda_vec), 1);
error_val = zeros(length(lambda_vec), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
% error_train and the validation errors in error_val. The
% vector lambda_vec contains the different lambda parameters
% to use for each calculation of the errors, i.e,
% error_train(i), and error_val(i) should give
% you the errors obtained after training with
% lambda = lambda_vec(i)
%
% Note: You can loop over lambda_vec with the following:
%
% for i = 1:length(lambda_vec)
% lambda = lambda_vec(i);
% % Compute train / val errors when training linear
% % regression with regularization parameter lambda
% % You should store the result in error_train(i)
% % and error_val(i)
% ....
%
% end
%
%
for i = 1 : length(lambda_vec)
    % Train with lambda_vec(i), but evaluate both errors with lambda = 0
    theta = trainLinearReg(X, y, lambda_vec(i));
    error_train(i) = linearRegCostFunction(X, y, theta, 0);
    error_val(i) = linearRegCostFunction(Xval, yval, theta, 0);
end
% =========================================================================
end
The resulting plot is shown in the figure. From it, the best value of λ is around 3. Because the dataset was shuffled randomly, the validation error can sometimes be lower than the training error.
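A sketch of plotting the curve and picking the best λ automatically (X_poly and X_poly_val are hypothetical names for the normalized polynomial feature matrices of the training and validation sets from the previous section):

[lambda_vec, error_train, error_val] = validationCurve(X_poly, y, X_poly_val, yval);
plot(lambda_vec, error_train, lambda_vec, error_val);
xlabel('lambda');
ylabel('Error');
legend('Train', 'Cross Validation');
[~, idx] = min(error_val);
best_lambda = lambda_vec(idx);   % expected to land near 3 on this dataset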
Assignment code reference: https://www.cnblogs.com/hapjin/p/6114466.html
The complete Ex5 code has been uploaded to GitHub.