Main topics:
Multivariate linear regression, feature scaling / mean normalization, the normal equation, and commonly used MATLAB/Octave statements.
Multivariate linear regression:
Essentially the same as the single-variable case; under matrix operations it is identical. The main new point is the notation: a subscript indexes the feature and a superscript indexes the training example, so $x_j^{(i)}$ is the value of feature $j$ in the $i$-th example (see the Week 1 notes of Andrew Ng's Machine Learning course).
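For concreteness, the hypothesis in the multivariate case, with the usual convention $x_0 = 1$ (my own restatement of the course notation, not a quote from the slides):

$$h_\theta(x) = \theta_0 x_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta^T x, \qquad x_0 \equiv 1$$

Stacking the $m$ training examples as the rows of a design matrix $X$ (whose first column is all ones), the predictions for the whole training set are just $X\theta$.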
Feature scaling / mean normalization:
Different features can have very different ranges of values; left unadjusted, gradient descent oscillates back and forth and converges slowly. Mean normalization rescales each feature:

$$x_j := \frac{x_j - \mu_j}{s_j}$$

where the numerator is the difference between the feature value and that feature's mean $\mu_j$, and the denominator $s_j$ can be either the range of $x_j$ (max minus min) or its standard deviation.
Converting higher-order terms to linear ones:
For example, the polynomial hypothesis $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$ can be turned into ordinary multivariate linear regression by defining new features $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$. Because of the powers, the three features then have wildly different ranges, which makes feature scaling especially important. A short sketch of this trick follows below.
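A minimal NumPy sketch of the polynomial-feature trick (my own illustration; the variable names and sample values are made up, not from the assignment):

import numpy as np

x = np.array([1.0, 2.0, 5.0, 10.0, 50.0])   # hypothetical raw feature
X_poly = np.column_stack((x, x**2, x**3))   # x1 = x, x2 = x^2, x3 = x^3

# the three columns now span wildly different ranges, so normalize each one
mu = X_poly.mean(axis=0)
sigma = X_poly.std(axis=0, ddof=1)          # sample std, like MATLAB's std
X_norm = (X_poly - mu) / sigma
print(X_norm.mean(axis=0))                  # approximately 0 for every column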
The normal equation:
Advantage: solves for the optimal theta in one step, with no need to choose a learning rate;
Disadvantage: it becomes slow when the number of features n is large (computing the inverse costs roughly O(n^3)), and one must watch out for the non-invertible cases: the number of training examples m is smaller than the number of features n, or some features are linearly dependent.
The computation itself:

$$\theta = (X^T X)^{-1} X^T y$$

The derivation is not hard; in essence we solve for the $\theta$ that minimizes the cost function, which mainly requires matrix differentiation. Writing the cost function in matrix form:

$$J(\theta) = \frac{1}{2m} (X\theta - y)^T (X\theta - y)$$

then differentiating each term with respect to $\theta$ and setting the gradient to zero finally gives $\theta = (X^T X)^{-1} X^T y$.
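Filling in the intermediate steps of that derivative (my own expansion, using the standard identities $\nabla_\theta\, \theta^T A \theta = 2A\theta$ for symmetric $A$ and $\nabla_\theta\, b^T \theta = b$):

$$J(\theta) = \frac{1}{2m}\left(\theta^T X^T X \theta - 2 y^T X \theta + y^T y\right)$$

$$\nabla_\theta J(\theta) = \frac{1}{m}\left(X^T X \theta - X^T y\right) = 0 \;\Longrightarrow\; \theta = (X^T X)^{-1} X^T y$$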
MATLAB/Octave's matrix operations are very powerful and well suited to studying machine-learning algorithms, and Python has many similar libraries (NumPy, for example).
Programming assignment (MATLAB and Python versions):
The goal is mainly to get familiar with writing these in MATLAB and to consolidate the algorithms; everything is implemented with matrix operations.
computeCost.m —— computing the cost function
function J = computeCost(X, y, theta)
%COMPUTECOST Compute cost for linear regression
%   J = COMPUTECOST(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly
J = 0;

% ====================== YOUR CODE HERE ======================
% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.

% vectorized cost: J = (1/2m) * sum((X*theta - y).^2)
J = (1/(2*m))*sum((X*theta-y).^2);

% =========================================================================

end
computeCostMulti.m —— implemented exactly the same way as computeCost.m; the vectorized form already works for any number of features.
featureNormalize.m —— feature scaling
function [X_norm, mu, sigma] = featureNormalize(X)
%FEATURENORMALIZE Normalizes the features in X
%   FEATURENORMALIZE(X) returns a normalized version of X where
%   the mean value of each feature is 0 and the standard deviation
%   is 1. This is often a good preprocessing step to do when
%   working with learning algorithms.

% You need to set these values correctly
X_norm = X;
mu = zeros(1, size(X, 2));
sigma = zeros(1, size(X, 2));

% ====================== YOUR CODE HERE ======================
% Instructions: First, for each feature dimension, compute the mean
%               of the feature and subtract it from the dataset,
%               storing the mean value in mu. Next, compute the
%               standard deviation of each feature and divide
%               each feature by its standard deviation, storing
%               the standard deviation in sigma.
%
%               Note that X is a matrix where each column is a
%               feature and each row is an example. You need
%               to perform the normalization separately for
%               each feature.
%
% Hint: You might find the 'mean' and 'std' functions useful.
%
mu = mean(X, 1);      % semicolons added so intermediate values are not printed
sigma = std(X, 0, 1);
for i = 1:size(X, 1)
    X_norm(i, :) = (X(i, :) - mu) ./ sigma;
end

% ============================================================

end
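For reference, the same normalization in NumPy (a small sketch of my own, not one of the assignment files; feature_normalize is a hypothetical name):

import numpy as np

def feature_normalize(X):
    # return (X_norm, mu, sigma); each column of X is one feature
    mu = X.mean(axis=0)
    sigma = X.std(axis=0, ddof=1)   # ddof=1 matches MATLAB's default std
    return (X - mu) / sigma, mu, sigma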
gradientDescent.m —— gradient descent
function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
%GRADIENTDESCENT Performs gradient descent to learn theta
%   theta = GRADIENTDESCENT(X, y, theta, alpha, num_iters) updates theta by
%   taking num_iters gradient steps with learning rate alpha

% Initialize some useful values
m = length(y); % number of training examples
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
    % ====================== YOUR CODE HERE ======================
    % Instructions: Perform a single gradient step on the parameter vector
    %               theta.
    %
    % Hint: While debugging, it can be useful to print out the values
    %       of the cost function (computeCost) and gradient here.
    %
    % vectorized update: theta := theta - (alpha/m) * X' * (X*theta - y)
    theta = theta - (alpha/m) * (X' * (X*theta - y));
    % ============================================================

    % Save the cost J in every iteration
    J_history(iter) = computeCost(X, y, theta);
end

end
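Following the debugging hint in the template, the standard sanity check is to plot the recorded costs against the iteration number: the curve should decrease on every iteration, and if it grows, the learning rate alpha is too large. A quick sketch of that check (my own addition; the j_history numbers below are made up for illustration):

import matplotlib.pyplot as plt

# hypothetical per-iteration costs, e.g. collected from gradientDescent's J_history
j_history = [32.07, 10.5, 6.8, 5.2, 4.9, 4.65, 4.55, 4.50, 4.48, 4.48]
plt.plot(range(1, len(j_history) + 1), j_history)
plt.xlabel('iteration')
plt.ylabel('cost J')
plt.show()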
gradientDescentMulti.m —— identical to gradientDescent.m.
normalEqn.m —— the normal equation
function [theta] = normalEqn(X, y)
%NORMALEQN Computes the closed-form solution to linear regression
%   NORMALEQN(X,y) computes the closed-form solution to linear
%   regression using the normal equations.

theta = zeros(size(X, 2), 1);

% ====================== YOUR CODE HERE ======================
% Instructions: Complete the code to compute the closed form solution
%               to linear regression and put the result in theta.
%
% ---------------------- Sample Solution ----------------------
% pinv handles the non-invertible cases (m < n, linearly dependent features)
theta = pinv(X'*X)*X'*y;
% -------------------------------------------------------------
% ============================================================

end
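The same closed-form solution in NumPy, handy for cross-checking the gradient-descent result (a minimal sketch of my own; the random data is generated just to exercise the formula):

import numpy as np

rng = np.random.default_rng(0)
X = np.column_stack((np.ones(50), rng.random((50, 2))))   # design matrix with bias column
true_theta = np.array([[1.0], [2.0], [3.0]])
y = X @ true_theta + 0.01 * rng.standard_normal((50, 1))

# pinv plays the same role as MATLAB's pinv: robust even when X'X is singular
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)   # should come out close to [1, 2, 3]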
Python version (including plotting):
import numpy as np
import matplotlib.pyplot as plt
# (the old "from mpl_toolkits.mplot3d import Axes3D" import is no longer needed:
# recent matplotlib registers the 3d projection automatically)
def loadDataSet():
    dataMat = []
    labelMat = []
    fr = open('C:/Users/apple/Desktop/ex1data1.txt')
    for line in fr.readlines():
        lineArr = line.strip().split(',')    # each line is "population,profit"
        dataMat.append([float(lineArr[0])])
        labelMat.append([float(lineArr[1])])
    fr.close()
    return dataMat, labelMat

def computeCost(dataMat, theta, labelMat):
    m = dataMat.shape[0]                     # compute m locally instead of relying on a global
    hxMinusY = np.dot(dataMat, theta) - labelMat
    costJ = 1 / (2 * m) * np.dot(hxMinusY.T, hxMinusY)
    return costJ[0, 0]                       # return a plain scalar
if __name__ == "__main__":
    dataMat, labelMat = loadDataSet()
    dataMat = np.asarray(dataMat)
    labelMat = np.asarray(labelMat)
    #print(dataMat.shape)
    plt.scatter(dataMat, labelMat, marker='x', color='red', s=10, label='myPlot')
    m = dataMat.shape[0]                     # number of training examples
    addOneArr = np.ones(m)
    dataMat = np.column_stack((addOneArr, dataMat))   # prepend the x0 = 1 column
    theta = np.zeros((2, 1))
    iters = 1500
    alpha = 0.01
    for i in range(iters):
        hxMinusY = np.dot(dataMat, theta) - labelMat
        #print(computeCost(dataMat, theta, labelMat))  # uncomment to watch J decrease
        theta = theta - alpha * (1 / m) * np.dot(dataMat.T, hxMinusY)
    print(theta)
    costJ = computeCost(dataMat, theta, labelMat)
    print(costJ)
    plt.plot(dataMat[:, 1], np.dot(dataMat, theta), '-')   # fitted line over the scatter
    plt.show()
    predict1 = np.dot([[1., 3.5]], theta)[0, 0]
    predict2 = np.dot([[1., 7.]], theta)[0, 0]
    print("for population = 35,000 , we predict a profit of %f" % (predict1 * 10000))
    print("for population = 70,000 , we predict a profit of %f" % (predict2 * 10000))
    fig = plt.figure()
    ax = fig.add_subplot(projection='3d')    # Axes3D(fig) no longer auto-registers in recent matplotlib
    theta0_vals = np.arange(-10, 10, 0.2)    # 100 values
    theta1_vals = np.arange(-1, 4, 0.05)     # 100 values
    J_vals = np.zeros((theta0_vals.size, theta1_vals.size))
    for i in range(theta0_vals.size):
        for j in range(theta1_vals.size):
            t = np.array([[theta0_vals[i]], [theta1_vals[j]]])   # np.row_stack was removed in NumPy 2.0
            J_vals[i, j] = computeCost(dataMat, t, labelMat)
    J_vals = J_vals.T                        # transpose for meshgrid orientation, as in the MATLAB exercise
    theta0_vals, theta1_vals = np.meshgrid(theta0_vals, theta1_vals)
    ax.plot_surface(theta0_vals, theta1_vals, J_vals, cmap=plt.cm.inferno)
    ax.set_xlabel('theta0')
    ax.set_ylabel('theta1')
    ax.set_zlabel('costJ')
    plt.show()
    plt.figure()
    plt.contour(theta0_vals, theta1_vals, J_vals, np.logspace(-2, 3, 20))
    plt.scatter(theta[0, 0], theta[1, 0], marker='x', color='red', s=39, label='myPlot')
    plt.show()
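The multivariate part of the assignment (ex1data2.txt: house size, number of bedrooms, price) combines the pieces above in the same way; a minimal sketch, assuming a local copy of the file with the same comma-separated format:

import numpy as np

data = np.loadtxt('ex1data2.txt', delimiter=',')   # assumed to sit next to the script
X, y = data[:, :2], data[:, 2:3]

# normalize the features, then prepend the x0 = 1 column
mu, sigma = X.mean(axis=0), X.std(axis=0, ddof=1)
X = np.column_stack((np.ones(len(X)), (X - mu) / sigma))

# gradient descent
theta = np.zeros((3, 1))
alpha, iters = 0.01, 400
for _ in range(iters):
    theta -= alpha / len(X) * X.T @ (X @ theta - y)
print(theta)

# the normal equation on the same (normalized) data should agree
print(np.linalg.pinv(X.T @ X) @ X.T @ y)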