1. Univariate linear regression
Linear regression is one of the simplest machine learning algorithms, so let's start with an example.
This is the relationship between the house price and the size of the house:
We're going to find a straight line that fits the trend of the discrete points as much as possible:
These data points could be fit by a straight line.
Supervised learning covers both regression (predicting a continuous value, as here) and classification (predicting a discrete category).
Input -> model -> output (prediction), which is then compared with the actual value.
From basic mathematics we know the equation of a straight line, so how should we represent the prediction function f(price)?
(1) Model
Univariate linear regression model
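The hypothesis is the standard straight-line model (consistent with the prediction f_wb = w * x[i] + b in the code later in this section):

f_{w,b}(x) = w x + b

where the parameters w (the weight, or slope) and b (the bias, or intercept) are to be learned, x is the input feature (house size), and \hat{y} = f_{w,b}(x) is the predicted price.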
(2) Cost function
We have the model, but what values should the parameters w and b take?
For the training data we know the true values, and the model produces predicted values.
The target is:
(ŷ, the y with a hat, is the predicted value; the plain y is the true value)
This expression is the mean squared error of the predictions.
We add a 2 to the denominator to simplify the subsequent derivative calculations.
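Written out explicitly (reconstructed from the description above and from the compute_cost code later in this section, which divides by 2m):

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2

where m is the number of training examples.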
Now the goal is to minimize the cost function J(w,b):
(3) Gradient descent
Purely analytical (closed-form) methods are rigorous, but they are difficult to generalize to more complex models.
So here we use the gradient descent method.
Let us start with a special case:
When b = 0, the model simplifies to a function of w alone:
When b is not 0, the cost function is a function of two variables:
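Concretely (restating the two cases using the formulas above):
- When b = 0: f_w(x) = w x and J(w) = \frac{1}{2m} \sum_{i=1}^{m} (w x^{(i)} - y^{(i)})^2, a parabola in the single variable w.
- When b \neq 0: J(w, b) depends on both parameters, and its graph is a bowl-shaped surface over the (w, b) plane.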
- Automate the process of optimizing w and b using gradient descent.
The procedure of the gradient descent method:
It may converge to a local minimum rather than the global one.
The algorithm updates the parameter values w and b as follows:
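In the standard form (matching the gradient_descent code below), repeat until convergence:

w := w - \alpha \frac{\partial J(w, b)}{\partial w}
b := b - \alpha \frac{\partial J(w, b)}{\partial b}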
Here, α is the learning rate.
The left-hand version differs from the right-hand one, and the right-hand one is wrong: it uses the already-updated w when computing the update for b. The correct implementation computes both partial derivatives from the current (w, b) and only then updates the parameters (a simultaneous update).
Where the slope is positive, w moves to the left; where the slope is negative, w moves to the right (in this example). In both cases w moves toward a smaller value of the cost function.
The choice of the learning rate α
If the learning rate is too small, each update step is tiny, so the cost still decreases but convergence is very slow.
If the learning rate is too large, each step may overshoot the minimum; the cost can oscillate or even increase, and the minimum may never be reached.
Trouble with local minima
Gradient descent can get stuck in a local minimum and fail to reach the true (global) minimum.
If the learning rate is decreased as the slope flattens near the minimum,
convergence can be accelerated while the accuracy of the final result improves.
The gradient descent method requires two partial derivatives.
Their derivation process is as follows:
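Reconstructing the standard derivation (these are exactly the expressions implemented in compute_gradient below):

\frac{\partial J(w, b)}{\partial w} = \frac{\partial}{\partial w} \frac{1}{2m} \sum_{i=1}^{m} \left( w x^{(i)} + b - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}

\frac{\partial J(w, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

The factor of 2 produced by differentiating the square cancels the 2 in the denominator of J, which is why that 2 was added earlier.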
For linear regression this cost function is quadratic in the parameters (convex), so there is no local minimum other than the global one.
Gradient descent process
(4) Summary
In summary:
Implement Gradient Descent
You will implement the gradient descent algorithm for one feature. You will need three functions:
- compute_gradient, implementing equations (4) and (5) above
- compute_cost, implementing equation (2) above (code from the previous lab)
- gradient_descent, utilizing compute_gradient and compute_cost
The following Python code implements these functions:
compute_gradient
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db
compute_cost
# Function to calculate the cost
def compute_cost(x, y, w, b):
    m = x.shape[0]
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b
        cost = cost + (f_wb - y[i])**2
    total_cost = 1 / (2 * m) * cost
    return total_cost
gradient_descent
import math
import copy

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float)     : Learning rate
      num_iters (int)   : number of iterations to run gradient descent
      cost_function     : function to call to produce cost
      gradient_function : function to call to produce gradient
    Returns:
      w (scalar)      : Updated value of parameter after running gradient descent
      b (scalar)      : Updated value of parameter after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b]
    """
    # Lists to store cost J and the parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w_in
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # Update parameters using equation (3) above
        b = b - alpha * dj_db
        w = w - alpha * dj_dw
        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print the cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")
    return w, b, J_history, p_history  # return w, b and the histories for graphing
(5) Example
Let's experiment with these algorithms:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# -------------- 1. Load the data --------------
data = np.loadtxt(open(r"D:\软件\pycharm\机器学习\线性回归\Linear Regression - Sheet1.csv", "rb"),
                  delimiter=",", skiprows=1)
x = data[:, 0]
y = data[:, 1]

# -------------- 2. Define the cost function --------------
def compute_cost(w, b, data):
    total_cost = 0
    M = len(data)
    # Compute the squared error point by point, then take the average
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        total_cost += (y - w * x - b) ** 2
    return total_cost / M

# -------------- 3. Define the model's hyperparameters --------------
alpha = 0.00003
initial_w = 0.6
initial_b = 0
num_iter = 100

# -------------- 4. Define the core gradient descent functions --------------
def grad_desc(data, initial_w, initial_b, alpha, num_iter):
    w = initial_w
    b = initial_b
    # Keep a list of all the cost values to visualize the descent
    cost_list = []
    for i in range(num_iter):
        cost_list.append(compute_cost(w, b, data))
        w, b = step_grad_desc(w, b, alpha, data)
    return [w, b, cost_list]

def step_grad_desc(current_w, current_b, alpha, data):
    sum_grad_w = 0
    sum_grad_b = 0
    M = len(data)
    # For every data point, plug into the gradient formulas and accumulate
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        sum_grad_w += (current_w * x + current_b - y) * x
        sum_grad_b += current_w * x + current_b - y
    # Compute the current gradient
    grad_w = 2 / M * sum_grad_w
    grad_b = 2 / M * sum_grad_b
    # Gradient descent step: update the current w and b
    updated_w = current_w - alpha * grad_w
    updated_b = current_b - alpha * grad_b
    return updated_w, updated_b

# ------------ 5. Test: run gradient descent to compute the optimal w and b ------------
w, b, cost_list = grad_desc(data, initial_w, initial_b, alpha, num_iter)
print("w is: ", w)
print("b is: ", b)
cost = compute_cost(w, b, data)
print("cost is: ", cost)
# plt.plot(cost_list)
# plt.show()

# ------------ 6. Plot the fitted line ------------
plt.scatter(x, y)
# Compute the predicted y for every x
pred_y = w * x + b
plt.plot(x, pred_y, c='r')
plt.show()
(6) Iterative process observation
These are the cost values and parameter histories produced when the code from Andrew Ng's (吴恩达) deeplearning.ai course is executed.
The gradient_descent function is called to observe the update history of w and b:
The update history of the values returned by compute_cost:
Let's plot them:
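A minimal sketch of how the functions from the "Implement Gradient Descent" part above might be called and their histories plotted; it assumes the compute_cost and compute_gradient with the (x, y, w, b) signature (not the versions from example (5)), and the training data and hyperparameter values are made-up examples:

import numpy as np
import matplotlib.pyplot as plt

# Assumed toy data: size (1000 sqft) vs. price (1000s of dollars)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Run gradient descent from w = 0, b = 0 using the functions defined above
w_final, b_final, J_hist, p_hist = gradient_descent(
    x_train, y_train, w_in=0, b_in=0, alpha=1.0e-2, num_iters=10000,
    cost_function=compute_cost, gradient_function=compute_gradient)

# Plot the cost history and the parameter trajectories
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(J_hist)
ax1.set_xlabel("iteration")
ax1.set_ylabel("cost J(w,b)")
ax2.plot([p[0] for p in p_hist], label="w")
ax2.plot([p[1] for p in p_hist], label="b")
ax2.set_xlabel("iteration")
ax2.legend()
plt.show()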
As the iterations progress, the values of the parameters w and b gradually stabilize.
2. Multiple linear regression
We introduce it using the same example.
Example: the price of houses
Multi-dimensional features
Description of the notation
The feature vector is a list of the multi-dimensional features, for example:
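For concreteness, the notation used below (the specific numbers are only a hypothetical illustration):
- m is the number of training examples and n is the number of features; X is the m × n matrix whose i-th row is the i-th example.
- x^{(i)} denotes the feature vector of the i-th training example, and x_j^{(i)} denotes its j-th feature.
- For instance, with four features (size, number of bedrooms, number of floors, age), a first example could be x^{(1)} = (2104, 5, 1, 45).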
(1) Model
Multiple linear regression model
We can write this out using the numpy library:
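A minimal sketch of the vectorized model in numpy, f_{w,b}(x) = w · x + b; the parameter and feature values are made-up examples:

import numpy as np

# One house described by four features: size (sqft), bedrooms, floors, age
x = np.array([2104, 5, 1, 45])

# Example parameter values (assumed, purely for illustration)
w = np.array([0.39, 18.75, -53.36, -26.42])
b = 785.18

# Vectorized prediction: f_wb(x) = w . x + b (a dot product plus the bias)
f_wb = np.dot(w, x) + b
print(f"prediction: {f_wb:0.2f}")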
(2) Cost function
It is similar to the univariate case, and the gradient descent method is again used to minimize it.
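Written out (reconstructed to match the compute_cost code below):

J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2, \qquad f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b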
(3) Gradient descent
Dot product
At this point, we mirror the univariate linear regression code to build the corresponding functions.
Compute cost
def compute_cost(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b   # (n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2   # scalar
    cost = cost / (2 * m)   # scalar
    return cost
Gradient descent
Compute the gradient
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape   # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
Gradient Descent With Multiple Variables
import math
import copy

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
    """
    # A list to store cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w_in
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient
        dj_db, dj_dw = gradient_function(X, y, w, b)
        # Update parameters using w, b, alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))
        # Print the cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
    return w, b, J_history
The iteration process:
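A minimal sketch of running these multi-variable functions end to end; the training set X_train, y_train and the hyperparameter values are assumed examples:

import numpy as np

# Assumed toy training set: 3 houses, 4 features each
# (size in sqft, bedrooms, floors, age); target = price in 1000s of dollars
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [ 852, 2, 1, 35]])
y_train = np.array([460., 232., 178.])

# Initial parameters and hyperparameters
w_init = np.zeros(X_train.shape[1])
b_init = 0.
alpha = 5.0e-7        # small learning rate because the features are not scaled
iterations = 1000

w_final, b_final, J_hist = gradient_descent(X_train, y_train, w_init, b_init,
                                            compute_cost, compute_gradient,
                                            alpha, iterations)
print(f"w found by gradient descent: {w_final}")
print(f"b found by gradient descent: {b_final:0.2f}")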
(4) Regularization
Now we apply regularization to the linear model.
This gives the updated gradient descent iteration algorithm.
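As a reference, the standard L2 (ridge) regularization typically used for linear regression is shown here, with λ ≥ 0 the regularization parameter; this specific form is the common convention and is stated as an assumption:

J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2

The gradient descent updates then become (the bias b is not regularized):

w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right]

b := b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)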