This week is not complicated, but it is still important, and a few details deserve attention.
1. How to evaluate whether an algorithm is doing well: the problem of high bias versus high variance;
2. How to evaluate an algorithm when one class is far rarer than the other (e.g. cancer diagnosis), i.e. on skewed classes: precision and recall;
3. How to choose a good threshold: F1 = 2 * (P*R / (P+R)) (see the sketch right after this list);
4. How to settle on a good algorithm: brainstorm, try things quickly and fail fast (a very familiar feeling), then use enough features (to reduce bias) plus enough data (to reduce variance).
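To make items 2 and 3 concrete, here is a minimal Octave sketch (my own, not part of the exercise code) that computes precision, recall and F1 from two made-up 0/1 label vectors; the names predictions and labels are hypothetical.
predictions = [1; 0; 1; 1; 0; 1];               % hypothetical predicted labels
labels      = [1; 0; 0; 1; 0; 0];               % hypothetical true labels
tp = sum((predictions == 1) & (labels == 1));   % true positives
fp = sum((predictions == 1) & (labels == 0));   % false positives
fn = sum((predictions == 0) & (labels == 1));   % false negatives
P  = tp / (tp + fp);                            % precision
R  = tp / (tp + fn);                            % recall
F1 = 2 * (P * R) / (P + R);                     % single score used to compare thresholds
fprintf('P = %.2f, R = %.2f, F1 = %.2f\n', P, R, F1);
With these toy vectors precision is 0.50 and recall is 1.00, so F1 comes out around 0.67: the score stays low unless both precision and recall are reasonably high, which is why it is a more useful single number than either one alone.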
Interesting, right? Now for the code~~
function [J, grad] = linearRegCostFunction(X, y, theta, lambda)
%LINEARREGCOSTFUNCTION Compute cost and gradient for regularized linear
%regression with multiple variables
% [J, grad] = LINEARREGCOSTFUNCTION(X, y, theta, lambda) computes the
% cost of using theta as the parameter for linear regression to fit the
% data points in X and y. Returns the cost in J and the gradient in grad
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly
J = 0;
grad = zeros(size(theta));
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost and gradient of regularized linear
% regression for a particular choice of theta.
%
% You should set J to the cost and grad to the gradient.
%
theta1 = [0; theta(2:end)];                 % do not regularize the bias term theta(1)
J = 1 / (2 * m) * sum((X * theta - y).^2) + lambda / (2 * m) * sum(theta1.^2);
regularized = lambda / m * theta1;          % regularization term of the gradient
grad = 1 / m * (X * theta - y)' * X;        % unregularized gradient (as a row vector)
grad = grad + regularized';
% =========================================================================
grad = grad(:);
end
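A quick sanity check for the function above, on a tiny made-up dataset (these numbers are mine, not the data that ships with the exercise):
X_check     = [ones(3, 1), (1:3)'];        % tiny design matrix, intercept column included
y_check     = [2; 4; 6];
theta_check = [0; 2];                      % fits the data exactly, so only regularization contributes
[J_check, grad_check] = linearRegCostFunction(X_check, y_check, theta_check, 1);
fprintf('J = %.4f\n', J_check);            % expect 1/(2*3) * 2^2 = 0.6667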
function [X_poly] = polyFeatures(X, p)
%POLYFEATURES Maps X (1D vector) into the p-th power
% [X_poly] = POLYFEATURES(X, p) takes a data matrix X (size m x 1) and
% maps each example into its polynomial features where
% X_poly(i, :) = [X(i) X(i).^2 X(i).^3 ... X(i).^p];
%
% You need to return the following variables correctly.
X_poly = zeros(numel(X), p);
% ====================== YOUR CODE HERE ======================
% Instructions: Given a vector X, return a matrix X_poly where the p-th
% column of X_poly contains the values of X to the p-th power.
%
%
m = numel(X);
X1 = X(:);                        % make sure the input is a column vector
for i = 1:p
    X_poly(:, i) = X1.^i;         % i-th column: every example raised to the i-th power
end
% =========================================================================
end
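And a quick check of polyFeatures on a tiny vector (again made-up numbers, not exercise data):
X_demo = [1; 2; 3];
X_poly_demo = polyFeatures(X_demo, 3);
disp(X_poly_demo);
% expected output:
%    1    1    1
%    2    4    8
%    3    9   27
In the exercise these polynomial features are then normalized before training, since the higher powers differ wildly in scale.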
function [lambda_vec, error_train, error_val] = ...
validationCurve(X, y, Xval, yval)
%VALIDATIONCURVE Generate the train and validation errors needed to
%plot a validation curve that we can use to select lambda
% [lambda_vec, error_train, error_val] = ...
% VALIDATIONCURVE(X, y, Xval, yval) returns the train
% and validation errors (in error_train, error_val)
% for different values of lambda. You are given the training set (X,
% y) and validation set (Xval, yval).
%
% Selected values of lambda (you should not change this)
lambda_vec = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10]';
% You need to return these variables correctly.
error_train = zeros(length(lambda_vec), 1);
error_val = zeros(length(lambda_vec), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Fill in this function to return training errors in
% error_train and the validation errors in error_val. The
% vector lambda_vec contains the different lambda parameters
% to use for each calculation of the errors, i.e,
% error_train(i), and error_val(i) should give
% you the errors obtained after training with
% lambda = lambda_vec(i)
%
% Note: You can loop over lambda_vec with the following:
%
% for i = 1:length(lambda_vec)
% lambda = lambda_vec(i);
% % Compute train / val errors when training linear
% % regression with regularization parameter lambda
% % You should store the result in error_train(i)
% % and error_val(i)
% ....
%
% end
%
%
for i = 1:length(lambda_vec)
    lambda = lambda_vec(i);
    theta = trainLinearReg(X, y, lambda);
    % Note: the errors themselves are computed with lambda = 0 (no regularization term).
    J1 = linearRegCostFunction(X, y, theta, 0);
    J2 = linearRegCostFunction(Xval, yval, theta, 0);
    error_train(i) = J1;
    error_val(i) = J2;
end
% =========================================================================
end
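Once validationCurve returns, a short sketch like this can plot the two curves and pick the lambda with the lowest validation error (the plotting lines and the variable names X_poly / X_poly_val are my own assumptions, not the exercise script):
[lambda_vec, error_train, error_val] = validationCurve(X_poly, y, X_poly_val, yval);
plot(lambda_vec, error_train, lambda_vec, error_val);
legend('Train', 'Cross Validation');
xlabel('lambda');
ylabel('Error');
[~, idx] = min(error_val);                       % index of the lowest validation error
fprintf('lambda with the lowest validation error: %g\n', lambda_vec(idx));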