1. Univariate linear regression
Linear regression is one of the simplest machine learning algorithms, so let's start with an example.
This is the relationship between the house price and the size of the house:
We're going to find a straight line that fits the trend of the discrete points as much as possible:
These data points could be fit by a straight line.
Supervised learning covers both regression (predicting a continuous value, as here) and classification (predicting a discrete category).
Input -> model -> output (prediction), which is then compared with the actual value.
From basic mathematics we know the equation of a straight line, so how should we represent the prediction function f(price)?
(1) Model
Univariate linear regression model
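The hypothesis is the standard straight-line model (consistent with the prediction f_wb = w * x[i] + b in the code later in this section):

f_{w,b}(x) = w x + b

where the parameters w (the weight, or slope) and b (the bias, or intercept) are to be learned, x is the input feature (house size), and \hat{y} = f_{w,b}(x) is the predicted price.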
(2) Cost function
We have the model, but what values should the parameters w and b take?
For the training data we know the true values, and the model produces predicted values.
The target is:
(ŷ, the y with a hat, is the predicted value; the plain y is the true value)
This expression is the mean squared error of the predictions.
We add a 2 to the denominator to simplify the subsequent derivative calculations.
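Written out explicitly (reconstructed from the description above and from the compute_cost code later in this section, which divides by 2m):

J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2 = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2

where m is the number of training examples.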
Now the goal is to minimize the cost function J(w,b):
(3) Gradient descent
Purely analytical (closed-form) methods are rigorous, but they are difficult to generalize to more complex models.
So here we use the gradient descent method.
Let us start with a special case:
When b = 0, the model simplifies to a function of w alone:
When b is not 0, the cost function is a function of two variables:
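Concretely (restating the two cases using the formulas above):
- When b = 0: f_w(x) = w x and J(w) = \frac{1}{2m} \sum_{i=1}^{m} (w x^{(i)} - y^{(i)})^2, a parabola in the single variable w.
- When b \neq 0: J(w, b) depends on both parameters, and its graph is a bowl-shaped surface over the (w, b) plane.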
- Automate the process of optimizing w and b using gradient descent.
The procedure of the gradient descent method:
It may converge to a local minimum rather than the global one.
The algorithm updates the parameter values w and b as follows:
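In the standard form (matching the gradient_descent code below), repeat until convergence:

w := w - \alpha \frac{\partial J(w, b)}{\partial w}
b := b - \alpha \frac{\partial J(w, b)}{\partial b}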
Here, α is the learning rate.
The left-hand version differs from the right-hand one, and the right-hand one is wrong: it uses the already-updated w when computing the update for b. The correct implementation computes both partial derivatives from the current (w, b) and only then updates the parameters (a simultaneous update).
Where the slope is positive, w moves to the left; where the slope is negative, w moves to the right (in this example). In both cases w moves toward a smaller value of the cost function.
The choice of the learning rate α
If the learning rate is too small, each update step is tiny, so the cost still decreases but convergence is very slow.
If the learning rate is too large, each step may overshoot the minimum; the cost can oscillate or even increase, and the minimum may never be reached.
Trouble with local minima
Gradient descent can get stuck in a local minimum and fail to reach the true (global) minimum.
If the learning rate is decreased as the slope flattens near the minimum,
convergence can be accelerated while the accuracy of the final result improves.
The gradient descent method requires two partial derivatives.
Their derivation process is as follows:
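Reconstructing the standard derivation (these are exactly the expressions implemented in compute_gradient below):

\frac{\partial J(w, b)}{\partial w} = \frac{\partial}{\partial w} \frac{1}{2m} \sum_{i=1}^{m} \left( w x^{(i)} + b - y^{(i)} \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right) x^{(i)}

\frac{\partial J(w, b)}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)

The factor of 2 produced by differentiating the square cancels the 2 in the denominator of J, which is why that 2 was added earlier.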
For linear regression this cost function is quadratic in the parameters (convex), so there is no local minimum other than the global one.
Gradient descent process
(4) Summary
In summary:
Implement Gradient Descent
You will implement the gradient descent algorithm for one feature. You will need three functions:
- compute_gradient, implementing equations (4) and (5) above
- compute_cost, implementing equation (2) above (code from the previous lab)
- gradient_descent, utilizing compute_gradient and compute_cost
The following Python code implements these functions:
compute_gradient
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0
    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_dw, dj_db
compute_cost
# Function to calculate the cost
def compute_cost(x, y, w, b):
    m = x.shape[0]
    cost = 0
    for i in range(m):
        f_wb = w * x[i] + b
        cost = cost + (f_wb - y[i])**2
    total_cost = 1 / (2 * m) * cost
    return total_cost
gradient_descent
import math
import copy

def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float)     : Learning rate
      num_iters (int)   : number of iterations to run gradient descent
      cost_function     : function to call to produce cost
      gradient_function : function to call to produce gradient
    Returns:
      w (scalar)      : Updated value of parameter after running gradient descent
      b (scalar)      : Updated value of parameter after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b]
    """
    # Lists to store cost J and the parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w_in
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)
        # Update parameters using equation (3) above
        b = b - alpha * dj_db
        w = w - alpha * dj_dw
        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print the cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e} ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")
    return w, b, J_history, p_history  # return w, b and the histories for graphing
(5) Example
Let's experiment with these algorithms:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# -------------- 1. Load the data --------------
data = np.loadtxt(open(r"D:\软件\pycharm\机器学习\线性回归\Linear Regression - Sheet1.csv", "rb"),
                  delimiter=",", skiprows=1)
x = data[:, 0]
y = data[:, 1]

# -------------- 2. Define the cost function --------------
def compute_cost(w, b, data):
    total_cost = 0
    M = len(data)
    # Compute the squared error point by point, then take the average
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        total_cost += (y - w * x - b) ** 2
    return total_cost / M

# -------------- 3. Define the model's hyperparameters --------------
alpha = 0.00003
initial_w = 0.6
initial_b = 0
num_iter = 100

# -------------- 4. Define the core gradient descent functions --------------
def grad_desc(data, initial_w, initial_b, alpha, num_iter):
    w = initial_w
    b = initial_b
    # Keep a list of all the cost values to visualize the descent
    cost_list = []
    for i in range(num_iter):
        cost_list.append(compute_cost(w, b, data))
        w, b = step_grad_desc(w, b, alpha, data)
    return [w, b, cost_list]

def step_grad_desc(current_w, current_b, alpha, data):
    sum_grad_w = 0
    sum_grad_b = 0
    M = len(data)
    # For every data point, plug into the gradient formulas and accumulate
    for i in range(M):
        x = data[i, 0]
        y = data[i, 1]
        sum_grad_w += (current_w * x + current_b - y) * x
        sum_grad_b += current_w * x + current_b - y
    # Compute the current gradient
    grad_w = 2 / M * sum_grad_w
    grad_b = 2 / M * sum_grad_b
    # Gradient descent step: update the current w and b
    updated_w = current_w - alpha * grad_w
    updated_b = current_b - alpha * grad_b
    return updated_w, updated_b

# ------------ 5. Test: run gradient descent to compute the optimal w and b ------------
w, b, cost_list = grad_desc(data, initial_w, initial_b, alpha, num_iter)
print("w is: ", w)
print("b is: ", b)
cost = compute_cost(w, b, data)
print("cost is: ", cost)
# plt.plot(cost_list)
# plt.show()

# ------------ 6. Plot the fitted line ------------
plt.scatter(x, y)
# Compute the predicted y for every x
pred_y = w * x + b
plt.plot(x, pred_y, c='r')
plt.show()
(6) Iterative process observation
These are the cost values and parameter histories produced when the code from Andrew Ng's (吴恩达) deeplearning.ai course is executed.
The gradient_descent function is called to observe the update history of w and b:
The update history of the values returned by compute_cost:
Let's plot them:
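A minimal sketch of how the functions from the "Implement Gradient Descent" part above might be called and their histories plotted; it assumes the compute_cost and compute_gradient with the (x, y, w, b) signature (not the versions from example (5)), and the training data and hyperparameter values are made-up examples:

import numpy as np
import matplotlib.pyplot as plt

# Assumed toy data: size (1000 sqft) vs. price (1000s of dollars)
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])

# Run gradient descent from w = 0, b = 0 using the functions defined above
w_final, b_final, J_hist, p_hist = gradient_descent(
    x_train, y_train, w_in=0, b_in=0, alpha=1.0e-2, num_iters=10000,
    cost_function=compute_cost, gradient_function=compute_gradient)

# Plot the cost history and the parameter trajectories
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(J_hist)
ax1.set_xlabel("iteration")
ax1.set_ylabel("cost J(w,b)")
ax2.plot([p[0] for p in p_hist], label="w")
ax2.plot([p[1] for p in p_hist], label="b")
ax2.set_xlabel("iteration")
ax2.legend()
plt.show()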
As the iterations progress, the values of the parameters w and b gradually stabilize.
2. Multiple linear regression
We introduce it using the same example.
Example: the price of houses
Multi-dimensional features
Description of the notation
The feature vector is a list of the multi-dimensional features, for example:
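For concreteness, the notation used below (the specific numbers are only a hypothetical illustration):
- m is the number of training examples and n is the number of features; X is the m × n matrix whose i-th row is the i-th example.
- x^{(i)} denotes the feature vector of the i-th training example, and x_j^{(i)} denotes its j-th feature.
- For instance, with four features (size, number of bedrooms, number of floors, age), a first example could be x^{(1)} = (2104, 5, 1, 45).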
(1) Model
Multiple linear regression model
We can write this out using the numpy library:
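A minimal sketch of the vectorized model in numpy, f_{w,b}(x) = w · x + b; the parameter and feature values are made-up examples:

import numpy as np

# One house described by four features: size (sqft), bedrooms, floors, age
x = np.array([2104, 5, 1, 45])

# Example parameter values (assumed, purely for illustration)
w = np.array([0.39, 18.75, -53.36, -26.42])
b = 785.18

# Vectorized prediction: f_wb(x) = w . x + b (a dot product plus the bias)
f_wb = np.dot(w, x) + b
print(f"prediction: {f_wb:0.2f}")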
(2) Cost function
It is similar to the univariate case, and the gradient descent method is again used to minimize it.
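Written out (reconstructed to match the compute_cost code below):

J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2, \qquad f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b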
(3) Gradient descent
Dot product
At this point, we mirror the univariate linear regression code to build the corresponding functions.
Compute cost
def compute_cost(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b   # (n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2   # scalar
    cost = cost / (2 * m)   # scalar
    return cost
Gradient descent
Compute the gradient
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
      dj_db (scalar)      : The gradient of the cost w.r.t. the parameter b.
    """
    m, n = X.shape   # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
Gradient Descent With Multiple Variables
import math
import copy

def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
    """
    # A list to store cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying the caller's w_in
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient
        dj_db, dj_dw = gradient_function(X, y, w, b)
        # Update parameters using w, b, alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:      # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))
        # Print the cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f}   ")
    return w, b, J_history
The iteration process:
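A minimal sketch of running these multi-variable functions end to end; the training set X_train, y_train and the hyperparameter values are assumed examples:

import numpy as np

# Assumed toy training set: 3 houses, 4 features each
# (size in sqft, bedrooms, floors, age); target = price in 1000s of dollars
X_train = np.array([[2104, 5, 1, 45],
                    [1416, 3, 2, 40],
                    [ 852, 2, 1, 35]])
y_train = np.array([460., 232., 178.])

# Initial parameters and hyperparameters
w_init = np.zeros(X_train.shape[1])
b_init = 0.
alpha = 5.0e-7        # small learning rate because the features are not scaled
iterations = 1000

w_final, b_final, J_hist = gradient_descent(X_train, y_train, w_init, b_init,
                                            compute_cost, compute_gradient,
                                            alpha, iterations)
print(f"w found by gradient descent: {w_final}")
print(f"b found by gradient descent: {b_final:0.2f}")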
(4) Regularization
Now we apply regularization to the linear model.
This gives the updated gradient descent iteration algorithm.
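As a reference, the standard L2 (ridge) regularization typically used for linear regression is shown here, with λ ≥ 0 the regularization parameter; this specific form is the common convention and is stated as an assumption:

J(\mathbf{w}, b) = \frac{1}{2m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)^2 + \frac{\lambda}{2m} \sum_{j=1}^{n} w_j^2

The gradient descent updates then become (the bias b is not regularized):

w_j := w_j - \alpha \left[ \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right) x_j^{(i)} + \frac{\lambda}{m} w_j \right]

b := b - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)} \right)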