Machine Learning 2022, Andrew Ng's Course: Study Notes (4)

This article discusses building a linear regression model with multiple features and emphasizes the advantage of vectorized computation, for example using numpy's np.dot to compute vector dot products efficiently. By comparing the runtime of a loop-based implementation with the vectorized one, it shows the significant speed advantage of vectorization. It also covers the implementation of gradient descent, including computing the cost function and the gradients, as well as the iterative process of batch gradient descent, and walks through training a simple multi-feature linear regression model.

Linear regression is the simplest supervised learning model. Having covered the single-feature regression model, we now look at the slightly more complex multi-feature model. In the earlier example only the size of the house affected its price; now the size of the house, the number of bedrooms, the number of floors, and the age of the home all affect the price. One feature has become several.

Linear Regression with Multiple Features

The model here uses the vector dot product.
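With the four features above, the prediction for one example is

$$f_{\vec{w},b}(\vec{x}) = w_1x_1 + w_2x_2 + w_3x_3 + w_4x_4 + b = \vec{w}\cdot\vec{x} + b$$

where $\vec{w}=(w_1,\dots,w_4)$ holds the model parameters and $\vec{x}$ is the feature vector of the example.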

Using vectors in the algorithm improves efficiency; vectorization is introduced below.

Three ways of computing the model were listed above (a small sketch follows this list):

The first: write every term out by hand; tedious to write and slow to run.

The second: use a loop; easier to write, but still slow to run.

The third: use numpy's np.dot to compute the dot product; a single line, computed in parallel, and fast.
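As a quick illustration, here is a minimal sketch of the three approaches for n = 4 (the parameter and feature values below are made up purely for illustration):

import numpy as np

# made-up parameter and feature values, n = 4
w = np.array([1.0, 2.5, -3.3, 0.7])
b = 4.0
x = np.array([10, 20, 30, 40])

# approach 1: write every term out by hand -- tedious, and does not scale with n
f1 = w[0]*x[0] + w[1]*x[1] + w[2]*x[2] + w[3]*x[3] + b

# approach 2: a Python loop -- easier to write, but still runs element by element
f2 = b
for j in range(w.shape[0]):
    f2 = f2 + w[j] * x[j]

# approach 3: vectorized -- one line, handled by numpy's optimized parallel routines
f3 = np.dot(w, x) + b

print(f1, f2, f3)   # all three print the same value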

In addition, when updating the parameters in gradient descent with a learning rate of 0.1, the vectorized computation (w = w - 0.1 * d) is likewise much faster than updating one component at a time.
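A minimal sketch of that update, with made-up values for the parameters and the gradient vector d:

import numpy as np

w = np.array([0.5, 1.3, 3.4])     # current parameters (illustrative values)
d = np.array([0.3, 0.2, 0.4])     # gradient of the cost w.r.t. w (illustrative values)

# loop version: update one component at a time
w_loop = w.copy()
for j in range(w.shape[0]):
    w_loop[j] = w_loop[j] - 0.1 * d[j]

# vectorized version: all components updated at once
w_vec = w - 0.1 * d

print(w_loop, w_vec)   # identical results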

How can we quantify how much faster the vectorized version is than the loop? See the code below:

import numpy as np
import time

def my_dot(a, b): 
    """
    Compute the dot product of two vectors
 
    Args:
      a (ndarray (n,)):  input vector 
      b (ndarray (n,)):  input vector with same dimension as a
    
    Returns:
      x (scalar): dot product of a and b
    """
    x = 0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x

np.random.seed(1)
a = np.random.rand(10000000)  # very large arrays
b = np.random.rand(10000000)

tic = time.time()  # capture start time
c = np.dot(a, b)
toc = time.time()  # capture end time

print(f"np.dot(a, b) =  {c:.4f}")
print(f"Vectorized version duration: {1000*(toc-tic):.4f} ms ")

tic = time.time()  # capture start time
c = my_dot(a,b)
toc = time.time()  # capture end time

print(f"my_dot(a, b) =  {c:.4f}")
print(f"loop version duration: {1000*(toc-tic):.4f} ms ")

del(a);del(b)  #remove these big arrays from memory

np.dot took 8.6598 ms, while the loop took 2756.6850 ms, a speedup of roughly 300x.

Gradient Descent

The code for this part follows:

1. Tools used

import copy, math
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2)  # reduced display precision on numpy arrays

2. Features and the training set

Here $x_1$ is the size of the house, $x_2$ is the number of bedrooms, $x_3$ is the number of floors, and $x_4$ is the age of the home. There are 4 features, so n = 4, and the training set contains three examples.

X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
# data is stored in numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type:{type(X_train)})")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)})")
print(y_train)

3. Initialize the model parameters w (a vector) and b (a scalar)

Here, n = 4.

b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")

4. Computing f(x) = w·x + b, with the dot product done in a loop

def predict_single_loop(x, w, b): 
    """
    single predict using linear regression
    
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters    
      b (scalar):  model parameter     
      
    Returns:
      p (scalar):  prediction
    """
    n = x.shape[0]
    p = 0
    for i in range(n):
        p_i = x[i] * w[i]  
        p = p + p_i         
    p = p + b                
    return p
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

Substituting the first training example [2104 5 1 45] into the model gives a prediction of 459.9999976194083.
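Writing the dot product out with the values of w_init and b_init given above (rounded to four decimals):

$$f_{\vec{w},b}(\vec{x}^{(0)}) = 0.3913\times 2104 + 18.7538\times 5 - 53.3603\times 1 - 26.4213\times 45 + 785.1811 \approx 460.0$$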

5. A faster way to compute f(x) = w·x + b: np.dot

def predict(x, w, b): 
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters   
      b (scalar):             model parameter 
      
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b     # this one line does it all
    return p 
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")

# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")

The result is the same as above; the computation is just faster.

6. The cost function: computing the cost
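With multiple features the cost is the same squared-error cost as before; only the prediction inside it is now a dot product:

$$J(\vec{w},b) = \frac{1}{2m}\sum_{i=0}^{m-1}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)^2, \qquad f_{\vec{w},b}(\vec{x}^{(i)}) = \vec{w}\cdot\vec{x}^{(i)} + b$$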

def compute_cost(X, y, w, b): 
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]   # m is the number of examples in the training set
    cost = 0.0
    for i in range(m):                                
        f_wb_i = np.dot(X[i], w) + b           #(n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2       #scalar
    cost = cost / (2 * m)                      #scalar    
    return cost
# Compute and display cost using our pre-chosen optimal parameters. 
cost = compute_cost(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')

7. Gradient descent with multiple features

(1) Computing the partial derivatives
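The partial derivatives being computed are

$$\frac{\partial J(\vec{w},b)}{\partial w_j} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)x_j^{(i)}, \qquad \frac{\partial J(\vec{w},b)}{\partial b} = \frac{1}{m}\sum_{i=0}^{m-1}\left(f_{\vec{w},b}(\vec{x}^{(i)}) - y^{(i)}\right)$$

for $j = 0,\dots,n-1$; the code below accumulates these sums with two nested loops.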

def compute_gradient(X, y, w, b): 
    """
    Computes the gradient for linear regression 
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters  
      b (scalar)       : model parameter
      
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w. 
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b. 
    """
    m,n = X.shape            #(number of examples, number of features)
    dj_dw = np.zeros((n,))   #1-D array of n zeros
    dj_db = 0.

    for i in range(m):                         # loop over the m examples
        err = (np.dot(X[i], w) + b) - y[i]   
        for j in range(n):                     # loop over the n features
            dj_dw[j] = dj_dw[j] + err * X[i, j]    
        dj_db = dj_db + err   
        
    dj_dw = dj_dw / m                                
    dj_db = dj_db / m                                
        
    return dj_db, dj_dw
#Compute and display gradient 
tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')

The function above computes the partial derivatives of the cost with respect to w1, w2, w3, w4 and b.

In addition, I wrote my own alternative loop ordering to compute the same partial derivatives:

def compute_gradient_alt(X, y, w, b):   # renamed so it does not shadow compute_gradient above
    m,n = X.shape            # m is the number of examples, n is the number of features
    dj_dw = np.zeros((n,))   # 1-D array of n zeros
    dj_db = 0.

    for i in range(n):       # loop over the n features
        for j in range(m):   # loop over the m examples
            err = (np.dot(X[j], w) + b) - y[j]
            dj_dw[i] = dj_dw[i] + err * X[j, i]
            dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / n / m    # err is accumulated n times per example, hence the extra /n
    return dj_db, dj_dw

The results are exactly the same as before, but this loop ordering is clearly not as good as the one above: the error term is recomputed n times for every example.
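Since this whole section is about vectorization, here is a fully vectorized sketch of the gradient as well (not from the lab; it replaces both Python loops with matrix operations):

import numpy as np

def compute_gradient_matrix(X, y, w, b):
    # X: (m,n) data, y: (m,) targets, w: (n,) parameters, b: scalar
    m = X.shape[0]
    err = X @ w + b - y        # (m,) prediction errors for all examples at once
    dj_dw = X.T @ err / m      # (n,) gradient with respect to w
    dj_db = np.sum(err) / m    # scalar gradient with respect to b
    return dj_db, dj_dw

Calling compute_gradient_matrix(X_train, y_train, w_init, b_init) should return the same values as the loop versions, and it could be passed to gradient_descent below in place of compute_gradient.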

(2) Implementing gradient descent
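Each iteration of batch gradient descent updates all parameters simultaneously using the gradients above:

$$w_j := w_j - \alpha\,\frac{\partial J(\vec{w},b)}{\partial w_j}\quad (j = 0,\dots,n-1), \qquad b := b - \alpha\,\frac{\partial J(\vec{w},b)}{\partial b}$$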

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters): 
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking 
    num_iters gradient steps with learning rate alpha
    
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters  
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
      
    Returns:
      w (ndarray (n,)) : Updated values of parameters 
      b (scalar)       : Updated value of parameter 
      """
    
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  #avoid modifying global w within function
    b = b_in
    
    for i in range(num_iters):

        # Calculate the gradient and update the parameters
        dj_db,dj_dw = gradient_function(X, y, w, b)

        # Update Parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
      
        # Save cost J at each iteration
        if i<100000:      # prevent resource exhaustion 
            J_history.append( cost_function(X, y, w, b))

        # Print cost at intervals: 10 times over the run (or every iteration if num_iters < 10)
        if i% math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
        
    return w, b, J_history #return final w,b and J history for graphing

In the function above, w = w - alpha * dj_dw is a vectorized operation (computed in parallel, hence fast).

# initialize parameters
initial_w = np.zeros_like(w_init)   # array of zeros with the same shape as w_init
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7

# run gradient descent 
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                                    compute_cost, compute_gradient, 
                                                    alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
    print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")

We can see the cost decreasing; the final result is w = [0.2, 0, -0.01, -0.07] and b ≈ 0.

Finally, the model's predictions and the target values for the three training examples are printed.

8. Visualizing J (plotting the cost J against the number of iterations)

# plot cost versus iteration  
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)                                            # full history; x values default to 0, 1, 2, ...
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])  # tail of the history, from iteration 100 onward
ax1.set_title("Cost vs. iteration");  ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost')             ;  ax2.set_ylabel('Cost') 
ax1.set_xlabel('iteration step')   ;  ax2.set_xlabel('iteration step') 
plt.show()

These results are not inspiring! Cost is still declining and our predictions are not very accurate. The next lab will explore how to improve on this.

In other words, gradient descent was set to run for 1000 iterations, but after those 1000 iterations the cost is still decreasing: it has not yet reached the minimum, i.e., it has not converged.
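One simple way to check this numerically (a small sketch, not part of the lab) is to look at how much the cost in J_hist still changes between consecutive iterations:

# sketch: declare convergence only when the per-iteration drop in cost
# falls below a small tolerance (the threshold here is chosen arbitrarily)
tol = 1e-6
for k in range(1, len(J_hist)):
    if abs(J_hist[k-1] - J_hist[k]) < tol:
        print(f"cost change fell below {tol} at iteration {k}")
        break
else:
    print("cost is still decreasing at the last iteration -- not converged yet")

Given that the cost is still falling at iteration 1000, this check would report that training has not converged.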
