Main topics:
Multivariate linear regression, feature scaling / mean normalization, the normal equation, and commonly used MATLAB/Octave statements.
Multivariate linear regression:
Essentially the same as the single-variable case; under matrix operations it is identical. The main new point is the notation: a subscript indexes the feature and a superscript indexes the training example, so $x_j^{(i)}$ is the value of feature $j$ in the $i$-th example (see the Week 1 notes of Andrew Ng's Machine Learning course).
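For concreteness, the hypothesis in the multivariate case, with the usual convention $x_0 = 1$ (my own restatement of the course notation, not a quote from the slides):

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x, \qquad x_0 \equiv 1$$

Stacking the $m$ training examples as the rows of a design matrix $X$ (whose first column is all ones), the predictions for the whole training set are just $X\theta$.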
Feature scaling / mean normalization:
Different features can have very different ranges of values; left unadjusted, gradient descent oscillates back and forth and converges slowly. Mean normalization rescales each feature:

$$x_j := \frac{x_j - \mu_j}{s_j}$$

where the numerator is the difference between the feature value and that feature's mean $\mu_j$, and the denominator $s_j$ can be either the range of $x_j$ (max minus min) or its standard deviation.
Converting higher-order terms to linear ones:
For example, the polynomial hypothesis $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ can be turned into ordinary multivariate linear regression by defining new features $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$. Because of the powers, the three features then have wildly different ranges, which makes feature scaling especially important. A short sketch of this trick follows below.
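A minimal NumPy sketch of the polynomial-feature trick (my own illustration; the variable names and sample values are made up, not from the assignment):

import numpy as np

x = np.array([1.0, 2.0, 5.0, 10.0, 50.0])   # hypothetical raw feature
X_poly = np.column_stack((x, x**2, x**3))   # x1 = x, x2 = x^2, x3 = x^3

# the three columns now span wildly different ranges, so normalize each one
mu = X_poly.mean(axis=0)
sigma = X_poly.std(axis=0, ddof=1)          # sample std, like MATLAB's std
X_norm = (X_poly - mu) / sigma
print(X_norm.mean(axis=0))                  # approximately 0 for every column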
The normal equation:
Advantage: solves for the optimal theta in one step, with no need to choose a learning rate;
Disadvantage: it becomes slow when the number of features n is large (computing the inverse costs roughly O(n^3)), and one must watch out for the non-invertible cases: the number of training examples m is smaller than the number of features n, or some features are linearly dependent.
The computation itself:

$$\theta = (X^T X)^{-1} X^T y$$

The derivation is not hard; in essence we solve for the $\theta$ that minimizes the cost function, which mainly requires matrix differentiation. Writing the cost function in matrix form:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

then differentiating each term with respect to $\theta$ and setting the gradient to zero finally gives $\theta = (X^T X)^{-1} X^T y$.
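Filling in the intermediate steps of that derivative (my own expansion, using the standard identities $\nabla_\theta\, \theta^T A \theta = 2A\theta$ for symmetric $A$ and $\nabla_\theta\, b^T \theta = b$):

$$J(\theta) = \frac{1}{2m}\left(\theta^T X^T X \theta - 2 y^T X \theta + y^T y\right)$$

$$\nabla_\theta J(\theta) = \frac{1}{m}\left(X^T X \theta - X^T y\right) = 0 \;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y$$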
MATLAB/Octave's matrix operations are very powerful and well suited to studying machine-learning algorithms, and Python has many similar libraries (NumPy, for example).
Programming assignment (MATLAB and Python versions):
The goal is mainly to get familiar with writing these in MATLAB and to consolidate the algorithms; everything is implemented with matrix operations.
computeCost.m —— computing the cost function
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% vectorized cost: J = (1/2m) * sum((X*theta - y).^2)
J = (1/(2*m))*sum((X*theta-y).^2);

% =========================================================================

end
computeCostMulti.m —— implemented exactly the same way as computeCost.m; the vectorized form already works for any number of features.
featureNormalize.m —— feature scaling
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
mu = mean(X, 1);      % semicolons added so intermediate values are not printed
sigma = std(X, 0, 1);
for i = 1:size(X, 1)
    X_norm(i, :) = (X(i, :) - mu) ./ sigma;
end

% ============================================================

end
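For reference, the same normalization in NumPy (a small sketch of my own, not one of the assignment files; feature_normalize is a hypothetical name):

import numpy as np

def feature_normalize(X):
    # return (X_norm, mu, sigma); each column of X is one feature
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # ddof=1 matches MATLAB's default std
    return (X - mu) / sigma, mu, sigma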
gradientDescent.m —— gradient descent
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    % vectorized update: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha/m) * (X' * (X*theta - y));
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end

end
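Following the debugging hint in the template, the standard sanity check is to plot the recorded costs against the iteration number: the curve should decrease on every iteration, and if it grows, the learning rate alpha is too large. A quick sketch of that check (my own addition; the j_history numbers below are made up for illustration):

import matplotlib.pyplot as plt

# hypothetical per-iteration costs, e.g. collected from gradientDescent's J_history
j_history = [32.07, 10.5, 6.8, 5.2, 4.9, 4.65, 4.55, 4.50, 4.48, 4.48]
plt.plot(range(1, len(j_history) + 1), j_history)
plt.xlabel('iteration')
plt.ylabel('cost J')
plt.show()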
gradientDescentMulti.m —— identical to gradientDescent.m.
normalEqn.m —— the normal equation
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = zeros(size(X, 2), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
% pinv handles the non-invertible cases (m < n, linearly dependent features)
theta = pinv(X'*X)*X'*y;
% -------------------------------------------------------------
% ============================================================

end
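The same closed-form solution in NumPy, handy for cross-checking the gradient-descent result (a minimal sketch of my own; the random data is generated just to exercise the formula):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack((np.ones(50), rng.random((50, 2))))   # design matrix with bias column
true_theta = np.array([[1.0], [2.0], [3.0]])
y = X @ true_theta + 0.01 * rng.standard_normal((50, 1))

# pinv plays the same role as MATLAB's pinv: robust even when X'X is singular
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # should come out close to [1, 2, 3]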
Python version (including plotting):
import numpy as np
import matplotlib.pyplot as plt
# (the old "from mpl_toolkits.mplot3d import Axes3D" import is no longer needed:
# recent matplotlib registers the 3d projection automatically)
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('C:/Users/apple/Desktop/ex1data1.txt')
    for line in fr.readlines():
        lineArr = line.strip().split(',')    # each line is "population,profit"
        dataMat.append([float(lineArr[0])])
        labelMat.append([float(lineArr[1])])
    fr.close()
    return dataMat, labelMat

def computeCost(dataMat, theta, labelMat):
    m = dataMat.shape[0]                     # compute m locally instead of relying on a global
    hxMinusY = np.dot(dataMat, theta) - labelMat
    costJ = 1 / (2 * m) * np.dot(hxMinusY.T, hxMinusY)
    return costJ[0, 0]                       # return a plain scalar
if __name__ == "__main__":
    dataMat, labelMat = loadDataSet()
    dataMat = np.asarray(dataMat)
    labelMat = np.asarray(labelMat)
    #print(dataMat.shape)
    plt.scatter(dataMat, labelMat, marker='x', color='red', s=10, label='myPlot')
    m = dataMat.shape[0]                     # number of training examples
    addOneArr = np.ones(m)
    dataMat = np.column_stack((addOneArr, dataMat))   # prepend the x0 = 1 column
    theta = np.zeros((2, 1))
    iters = 1500
    alpha = 0.01
    for i in range(iters):
        hxMinusY = np.dot(dataMat, theta) - labelMat
        #print(computeCost(dataMat, theta, labelMat))  # uncomment to watch J decrease
        theta = theta - alpha * (1 / m) * np.dot(dataMat.T, hxMinusY)
    print(theta)
    costJ = computeCost(dataMat, theta, labelMat)
    print(costJ)
    plt.plot(dataMat[:, 1], np.dot(dataMat, theta), '-')   # fitted line over the scatter
    plt.show()
    predict1 = np.dot([[1., 3.5]], theta)[0, 0]
    predict2 = np.dot([[1., 7.]], theta)[0, 0]
    print("for population = 35,000 , we predict a profit of %f" % (predict1 * 10000))
    print("for population = 70,000 , we predict a profit of %f" % (predict2 * 10000))
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')    # Axes3D(fig) no longer auto-registers in recent matplotlib
    theta0_vals = np.arange(-10, 10, 0.2)    # 100 values
    theta1_vals = np.arange(-1, 4, 0.05)     # 100 values
    J_vals = np.zeros((theta0_vals.size, theta1_vals.size))
    for i in range(theta0_vals.size):
        for j in range(theta1_vals.size):
            t = np.array([[theta0_vals[i]], [theta1_vals[j]]])   # np.row_stack was removed in NumPy 2.0
            J_vals[i, j] = computeCost(dataMat, t, labelMat)
    J_vals = J_vals.T                        # transpose for meshgrid orientation, as in the MATLAB exercise
    theta0_vals, theta1_vals = np.meshgrid(theta0_vals, theta1_vals)
    ax.plot_surface(theta0_vals, theta1_vals, J_vals, cmap=plt.cm.inferno)
    ax.set_xlabel('theta0')
    ax.set_ylabel('theta1')
    ax.set_zlabel('costJ')
    plt.show()
    plt.figure()
    plt.contour(theta0_vals, theta1_vals, J_vals, np.logspace(-2, 3, 20))
    plt.scatter(theta[0, 0], theta[1, 0], marker='x', color='red', s=39, label='myPlot')
    plt.show()
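The multivariate part of the assignment (ex1data2.txt: house size, number of bedrooms, price) combines the pieces above in the same way; a minimal sketch, assuming a local copy of the file with the same comma-separated format:

import numpy as np

data = np.loadtxt('ex1data2.txt', delimiter=',')   # assumed to sit next to the script
X, y = data[:, :2], data[:, 2:3]

# normalize the features, then prepend the x0 = 1 column
mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
X = np.column_stack((np.ones(len(X)), (X - mu) / sigma))

# gradient descent
theta = np.zeros((3, 1))
alpha, iters = 0.01, 400
for _ in range(iters):
    theta -= alpha / len(X) * X.T @ (X @ theta - y)
print(theta)

# the normal equation on the same (normalized) data should agree
print(np.linalg.pinv(X.T @ X) @ X.T @ y)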