Coursera Machine Learning Specialization (Andrew Ng), C1W1 Notes
Overall steps for a supervised regression model (linear regression is supervised learning, since the training set includes target values):
Step 1: Express the model as a formula (Model representation)
Step 2: Compute the cost function (Cost function)
Step 3: Run gradient descent (Gradient descent for linear regression)
Step 1: Express the model as a formula (Model representation)
1. Import the required tools
import numpy as np
import matplotlib.pyplot as plt
2. Describe the problem
Take housing prices as an example: a 1000 sqft house sells for 300 thousand dollars and a 2000 sqft house sells for 500 thousand dollars, so the training set is:
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")
Output: x_train = [1. 2.] y_train = [300. 500.]
3. Describe the data's shape
Use m to denote the number of training examples. NumPy arrays have a .shape attribute, so:
print(f"x_train.shape: {x_train.shape}")
m = x_train.shape[0]
print(f"Number of training examples is: {m}")
Output: x_train.shape: (2,) Number of training examples is: 2
4. Access a single training example
Index into the arrays:
i = 0 # Change this to 1 to see (x^1, y^1)
x_i = x_train[i]
y_i = y_train[i]
print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
Output: (x^(0), y^(0)) = (1.0, 300.0)
5. Plot the data
Use matplotlib's scatter(), which accepts parameters such as marker (point style) and c (color):
# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.show()
6. Describe the model
Take the simplest case, univariate linear regression:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
We can pick values for w and b, for example:
w = 100
b = 100
print(f"w: {w}")
print(f"b: {b}")
Our training set has two examples, so f yields two predictions. With more data, we can use a for loop to compute the prediction for every example:
def compute_model_output(x, w, b):
    """
    Computes the prediction of a linear model
    Args:
      x (ndarray (m,)): Data, m examples
      w,b (scalar)    : model parameters
    Returns
      f_wb (ndarray (m,)): model predictions
    """
    m = x.shape[0]
    f_wb = np.zeros(m)
    for i in range(m):
        f_wb[i] = w * x[i] + b
    return f_wb
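For example, we can compare the line these parameters produce against the training points (a minimal sketch; the plotting calls mirror the scatter plot above):
tmp_f_wb = compute_model_output(x_train, w, b)
# Plot the model's predictions as a line
plt.plot(x_train, tmp_f_wb, c='b', label='Our Prediction')
# Plot the actual data points
plt.scatter(x_train, y_train, marker='x', c='r', label='Actual Values')
plt.title("Housing Prices")
plt.ylabel('Price (in 1000s of dollars)')
plt.xlabel('Size (1000 sqft)')
plt.legend()
plt.show()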
7. Make a prediction
Once w, b, and an input x are fixed, simply compute w * x_i + b.
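For instance, with w = 200 and b = 100 (illustrative values that happen to fit this two-point dataset exactly), the predicted price of a 1200 sqft house is:
w = 200
b = 100
x_i = 1.2  # 1200 sqft
cost_1200sqft = w * x_i + b  # 200 * 1.2 + 100 = 340
print(f"${cost_1200sqft:.0f} thousand dollars")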
Step 2: Compute the cost function (Cost function)
1. Import the required tools - same as above
2. Describe the problem - same as above
3. Compute the cost (Computing Cost)
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \tag{1}$$
With w and b fixed, a for loop can measure how far the model's output for each training example is from the corresponding y. In essence this compares ŷ with y, plus a few transformations that make the algebra convenient (squaring the difference, dividing by 2m). The function:
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      total_cost (float): The cost of using w,b as the parameters for linear regression
      to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0]

    cost_sum = 0
    for i in range(m):
        f_wb = w * x[i] + b
        cost = (f_wb - y[i]) ** 2
        cost_sum = cost_sum + cost
    total_cost = (1 / (2 * m)) * cost_sum

    return total_cost
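A quick usage check with the w = 100, b = 100 chosen earlier (the expected value is hand-computed for this two-point dataset):
print(compute_cost(x_train, y_train, 100, 100))
# ((200 - 300)**2 + (300 - 500)**2) / (2 * 2) = 12500.0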
Step 3: Run gradient descent (Gradient descent for linear regression)
1. Import the required tools
import math, copy
import numpy as np
import matplotlib.pyplot as plt
2. Describe the problem - same as above
3. Compute the cost function - same as above
4. Compute the gradient
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\; w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{3} \; \newline
b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \newline \rbrace
\end{align*}$$
Here the two parameters $w$ and $b$ are updated simultaneously, and the gradients are defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$
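As a quick sanity check, evaluating equations (4) and (5) by hand on the two-point dataset above with $w = 0$, $b = 0$:
$$\frac{\partial J}{\partial w} = \frac{1}{2}\left[(0 - 300)\cdot 1 + (0 - 500)\cdot 2\right] = -650, \qquad \frac{\partial J}{\partial b} = \frac{1}{2}\left[(0 - 300) + (0 - 500)\right] = -400$$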
5. Run gradient descent
First, compute the gradient:
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db
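Calling it at w = 0, b = 0 should reproduce the hand computation above:
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, 0, 0)
print(tmp_dj_dw, tmp_dj_db)  # -650.0 -400.0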
Next, run the descent loop:
def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float)     : Learning rate
      num_iters (int)   : number of iterations to run gradient descent
      cost_function     : function to call to produce cost
      gradient_function : function to call to produce gradient
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b]
    """
    w = copy.deepcopy(w_in)  # avoid modifying global w_in
    b = b_in
    # Lists to store cost J and parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []

    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)

        # Update parameters using equation (3) above
        b = b - alpha * dj_db
        w = w - alpha * dj_dw

        # Save cost J and parameters at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print cost at 10 evenly spaced intervals (every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")

    return w, b, J_history, p_history  # return w, b and the history for graphing
To use it, pass in the appropriate arguments:
# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init,
                                                    tmp_alpha, iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")
This prints the cost for the current (w, b) pair at regular intervals and reports the final parameter values.
Finally, plug w_final and b_final back into the original model f(x) = w * x + b to make predictions.
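For example (on this dataset, gradient descent should land near w ≈ 200, b ≈ 100):
print(f"1000 sqft house prediction: {w_final * 1.0 + b_final:0.1f} thousand dollars")
print(f"1200 sqft house prediction: {w_final * 1.2 + b_final:0.1f} thousand dollars")
print(f"2000 sqft house prediction: {w_final * 2.0 + b_final:0.1f} thousand dollars")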