Andrew Ng's Machine Learning, Week 2 (programming exercise, with MATLAB and Python implementations)

Main topics:

Linear regression with multiple variables, feature scaling / mean normalization, the normal equation, and commonly used MATLAB/Octave statements.

Linear regression with multiple variables:

It is essentially the same as the single-variable case, and in matrix form the two are identical. What is new is mainly the notation: a subscript indexes the feature and a superscript indexes the training example, so x_{j}^{(i)} is the value of feature j in the i-th example. (See the Week 1 notes for the single-variable version.)

Feature scaling / mean normalization:

Different features typically have very different ranges of values. Left unadjusted, this makes gradient descent oscillate back and forth and converge slowly, so mean normalization is used to bring the features onto a comparable scale.

That is, x_{j} := \frac{x_{j} - \mu_{j}}{\sigma_{j}}, where the numerator is the feature value minus the mean of that feature, and the denominator can be either the feature's range (max minus min) or its standard deviation. A small NumPy sketch is given below.
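A minimal NumPy sketch of this normalization (the feature_normalize name and the sample matrix are just for illustration, not part of the assignment code):

import numpy as np

def feature_normalize(X):
    # Scale every column of X to zero mean and unit standard deviation
    mu = X.mean(axis=0)            # per-feature means
    sigma = X.std(axis=0)          # per-feature standard deviations
    X_norm = (X - mu) / sigma      # broadcasting normalizes each column
    return X_norm, mu, sigma

X = np.array([[2104.0, 3.0],
              [1600.0, 3.0],
              [2400.0, 4.0]])
X_norm, mu, sigma = feature_normalize(X)
print(X_norm)   # every column now has mean 0 and unit standard deviation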

Converting higher-order terms into linear ones:

For example, x_{1} + x_{1}^{2} + x_{1}^{3} can be rewritten as x_{1} + x_{2} + x_{3} by treating each power of x_{1} as a new feature. Because of the powers, the ranges of the three features then differ by orders of magnitude, which makes feature scaling especially important; see the short illustration below.
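A quick illustration of how the ranges blow up (the sample values are purely hypothetical):

import numpy as np

x1 = np.array([1.0, 10.0, 100.0, 1000.0])
# Treat the powers of x1 as three separate features x1, x2, x3
X_poly = np.column_stack((x1, x1**2, x1**3))
print(X_poly.max(axis=0))   # maxima: 1e3, 1e6, 1e9 -- wildly different scales
# After this expansion, apply the same mean normalization as above before running gradient descent.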

The normal equation:

Advantage: it solves for the optimal theta in one shot, with no learning rate to choose and no iteration.

Drawback: X^{T}X can be non-invertible, which happens when the number of training examples m is smaller than the number of features n, or when some features are linearly dependent.

The closed-form solution: \Theta = \left( X^{T}X \right)^{-1}X^{T}y

The derivation is not hard: in essence one solves \frac{\partial}{\partial \Theta_{j}} J\left( \Theta \right) = 0 for every j, so that the cost function is minimized; the main tool is matrix differentiation.

Writing the cost function in matrix form:

J = \frac{1}{2m}\left( X\Theta - y \right)^{T}\left( X\Theta - y \right) = \frac{1}{2m}\left( \Theta^{T}X^{T}X\Theta - \Theta^{T}X^{T}y - y^{T}X\Theta + y^{T}y \right)

Differentiating each term with respect to \Theta and setting the result to zero then gives \Theta = \left( X^{T}X \right)^{-1}X^{T}y; the intermediate step is spelled out below.
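Filling in that differentiation step (using the standard matrix-calculus identities \frac{\partial}{\partial \Theta}\Theta^{T}A\Theta = 2A\Theta for symmetric A, and \frac{\partial}{\partial \Theta}b^{T}\Theta = b):

\frac{\partial J}{\partial \Theta} = \frac{1}{2m}\left( 2X^{T}X\Theta - X^{T}y - X^{T}y \right) = \frac{1}{m}\left( X^{T}X\Theta - X^{T}y \right) = 0 \;\Rightarrow\; X^{T}X\Theta = X^{T}y \;\Rightarrow\; \Theta = \left( X^{T}X \right)^{-1}X^{T}y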

MATLAB/Octave's matrix operations are powerful and very convenient for studying machine learning algorithms, and Python offers similar libraries (NumPy is used below).

 

Programming assignment (MATLAB and Python versions):

The point is mainly to get familiar with writing these routines in MATLAB and to consolidate the algorithms; everything is implemented with matrix (vectorized) operations.

computeCost.m —— computing the cost function


function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y
% Initialize some useful values
m = length(y); % number of training examples
% You need to return the following variables correctly 
J = 0;
% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.
J = (1/(2*m))*sum((X*theta-y).^2);
% =========================================================================
end

computeCostMulti.m —— its implementation is identical to computeCost.m.

 

featureNormalize.m —— feature scaling


function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X 
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.
% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));
% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the 
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma. 
%
%               Note that X is a matrix where each column is a 
%               feature and each row is an example. You need 
%               to perform the normalization separately for 
%               each feature. 
%
% Hint: You might find the 'mean' and 'std' functions useful.
%    
mu = mean(X, 1);      % per-feature means (row vector)
sigma = std(X, 0, 1); % per-feature standard deviations (row vector)
for i = 1:size(X, 1)
    X_norm(i, :) = (X(i, :) - mu) ./ sigma;   % normalize each training example
end
% ============================================================
end

gradientDescent.m —— gradient descent


function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by 
%   taking num_iters gradient steps with learning rate alpha
% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);
for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta. 
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    theta = theta-(alpha/m)*(X'*(X*theta-y));
    % ============================================================
    % Save the cost J in every iteration    
    J_history(iter) = computeCost(X, y, theta);
end
end

 

gradientDescentMulti.m —— identical to gradientDescent.m (the vectorized update already handles any number of features)

 

normalEqn.m —— the normal equation


function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression 
%   NORMALEQN(X,y) computes the closed-form solution to linear 
%   regression using the normal equations.
theta = zeros(size(X, 2), 1);
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
theta = pinv(X'*X)*X'*y;   % pinv (pseudo-inverse) still gives a sensible answer when X'*X is singular
% -------------------------------------------------------------
% ============================================================
end
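The Python script below only implements gradient descent; for completeness, here is a short NumPy equivalent of normalEqn.m (a sketch, assuming X already contains the bias column and y is a column vector):

import numpy as np

def normal_eqn(X, y):
    # theta = (X^T X)^(-1) X^T y, using the pseudo-inverse like pinv in the MATLAB version
    return np.linalg.pinv(X.T @ X) @ X.T @ y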

 

Python version (including the plots):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D



def loadDataSet():
    # Read ex1data1.txt: each line is "population,profit" (comma separated)
    dataMat = []
    labelMat = []
    fr = open('C:/Users/apple/Desktop/ex1data1.txt')
    for line in fr.readlines():
        lineArr = line.strip().split(',')
        dataMat.append([float(lineArr[0])])
        labelMat.append([float(lineArr[1])])
    fr.close()
    return dataMat, labelMat

def computeCost(dataMat, theta, labelMat):
    # Vectorized cost: J = 1/(2m) * (X*theta - y)^T (X*theta - y)
    m = labelMat.shape[0]              # number of training examples (no reliance on a global m)
    hxMinusY = np.dot(dataMat, theta) - labelMat
    costJ = 1 / (2 * m) * np.dot(hxMinusY.T, hxMinusY)
    return costJ[0, 0]                 # return a plain scalar



if __name__ == "__main__":
    (dataMat , labelMat) = loadDataSet()
    dataMat = np.asarray(dataMat)
    labelMat = np.asarray(labelMat)
    #print(dataMat.shape)
    plt.scatter(dataMat , labelMat , marker='x' , color='red' , s=10 , label='myPlot')
    plt.show()

    m = dataMat.shape[0]                               # number of training examples
    addOneArr = np.ones(m)
    dataMat = np.column_stack((addOneArr, dataMat))    # prepend the bias column x0 = 1

    theta = np.zeros((2,1))
    iters = 1500
    alpha = 0.01

    for i in range(iters):
        #print(theta)
        hxMinusY = np.dot(dataMat, theta) - labelMat                     # h(x) - y for every example
        #costJ = computeCost(dataMat, theta, labelMat)                   # uncomment to monitor convergence
        theta = theta - alpha * (1/m) * np.dot(dataMat.T, hxMinusY)      # vectorized gradient step

    print(theta)
    costJ = computeCost(dataMat,theta,labelMat)
    print(costJ)

    # Draw the fitted regression line (in a new figure; the scatter window was closed by the earlier plt.show())
    plt.plot(dataMat[:,1], np.dot(dataMat, theta), '-')
    # plt.show()   # uncomment to display this figure immediately
    predict1 = np.dot([[1., 3.5]], theta).item()   # population of 35,000 (features are in units of 10,000)
    predict2 = np.dot([[1., 7.]], theta).item()
    print("for population = 35,000  ,  we predict a profit of %f" % (predict1 * 10000))
    print("for population = 70,000  ,  we predict a profit of %f" % (predict2 * 10000))

    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')   # Axes3D(fig) no longer attaches axes automatically in recent matplotlib

    theta0_vals= np.arange(-10,10,0.2)
    theta0_vals = theta0_vals.reshape((100, 1))
    theta1_vals = np.arange(-1,4,0.05)
    theta1_vals = theta1_vals.reshape((100, 1))

    J_vals = np.zeros((theta0_vals.size,theta1_vals.size))

    for i in range(theta0_vals.size):
        for j in range(theta1_vals.size):
            t = np.vstack((theta0_vals[i], theta1_vals[j]))   # candidate theta as a (2,1) column vector
            J_vals[i,j] = computeCost(dataMat,t,labelMat)

    J_vals = J_vals.T
    theta0_vals, theta1_vals = np.meshgrid(theta0_vals, theta1_vals)

    ax.plot_surface(theta0_vals,theta1_vals,J_vals,cmap=plt.cm.inferno)
    ax.set_xlabel('theta0')
    ax.set_ylabel('theta1')
    ax.set_zlabel('costJ')

    plt.show()

    plt.figure()
    plt.contour(theta0_vals, theta1_vals, J_vals, np.logspace(-2, 3, 20))
    plt.scatter(theta[0,0], theta[1,0], marker='x', color='red', s=39, label='myPlot')
    plt.show()
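The script above only covers the single-feature dataset (ex1data1.txt). As a rough sketch of the multi-variable pipeline — assuming a comma-separated ex1data2.txt with house size and number of bedrooms as features and the price in the last column, as in the course assignment; the learning rate, iteration count, and file path are illustrative — normalization plus gradient descent would look like this:

import numpy as np

# Load ex1data2.txt: size, number of bedrooms, price (comma separated)
data = np.loadtxt('ex1data2.txt', delimiter=',')
X, y = data[:, :-1], data[:, -1:]
m = X.shape[0]

# Normalize the features using the training mean and standard deviation
mu, sigma = X.mean(axis=0), X.std(axis=0)
X = (X - mu) / sigma
X = np.column_stack((np.ones(m), X))    # add the bias column x0 = 1

theta = np.zeros((X.shape[1], 1))
alpha, iters = 0.01, 400
for _ in range(iters):
    theta = theta - alpha / m * X.T @ (X @ theta - y)   # vectorized gradient step

# A new example must be scaled with the same mu and sigma before predicting
x_new = (np.array([1650.0, 3.0]) - mu) / sigma
price = np.dot(np.r_[1.0, x_new], theta)
print(theta.ravel(), price)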





 

 
