Coursera Machine Learning Specialization (Andrew Ng), C1W1 Notes
Overall steps for a supervised regression model (linear regression is supervised learning, since the training set includes target values):
Step 1: Express the model as a formula (Model representation)
Step 2: Compute the cost function (Cost function)
Step 3: Run gradient descent (Gradient descent for linear regression)
Step 1: Express the model as a formula (Model representation)
1. Import the required tools
import numpy as np
import matplotlib.pyplot as plt
2. Describe the problem
Take housing prices as an example: a 1000 sqft house sells for 300 thousand dollars and a 2000 sqft house sells for 500 thousand dollars, so the training set is:
x_train = np.array([1.0, 2.0])
y_train = np.array([300.0, 500.0])
print(f"x_train = {x_train}")
print(f"y_train = {y_train}")
Output: x_train = [1. 2.] y_train = [300. 500.]
3. Describe the data's shape
Use m to denote the number of training examples. NumPy arrays have a .shape attribute, so:
print(f"x_train.shape: {x_train.shape}")
m = x_train.shape[0]
print(f"Number of training examples is: {m}")
Output: x_train.shape: (2,) Number of training examples is: 2
4. Access a single training example
Index into the arrays:
i = 0 # Change this to 1 to see (x^1, y^1)
x_i = x_train[i]
y_i = y_train[i]
print(f"(x^({i}), y^({i})) = ({x_i}, {y_i})")
Output: (x^(0), y^(0)) = (1.0, 300.0)
5. Plot the data
Use matplotlib's scatter(), which accepts parameters such as marker (point style) and c (color):
# Plot the data points
plt.scatter(x_train, y_train, marker='x', c='r')
# Set the title
plt.title("Housing Prices")
# Set the y-axis label
plt.ylabel('Price (in 1000s of dollars)')
# Set the x-axis label
plt.xlabel('Size (1000 sqft)')
plt.show()
6. Describe the model
Take the simplest case, univariate linear regression:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b$$
We can pick values for w and b, for example:
w = 100
b = 100
print(f"w: {w}")
print(f"b: {b}")
Our training set has two examples, so f yields two predictions. With more data, we can use a for loop to compute the prediction for every example:
def compute_model_output(x, w, b):
    """
    Computes the prediction of a linear model
    Args:
      x (ndarray (m,)): Data, m examples
      w,b (scalar)    : model parameters
    Returns
      f_wb (ndarray (m,)): model predictions
    """
    m = x.shape[0]
    f_wb = np.zeros(m)
    for i in range(m):
        f_wb[i] = w * x[i] + b
    return f_wb
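For example, we can compare the line these parameters produce against the training points (a minimal sketch; the plotting calls mirror the scatter plot above):
tmp_f_wb = compute_model_output(x_train, w, b)
# Plot the model's predictions as a line
plt.plot(x_train, tmp_f_wb, c='b', label='Our Prediction')
# Plot the actual data points
plt.scatter(x_train, y_train, marker='x', c='r', label='Actual Values')
plt.title("Housing Prices")
plt.ylabel('Price (in 1000s of dollars)')
plt.xlabel('Size (1000 sqft)')
plt.legend()
plt.show()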
7. Make a prediction
Once w, b, and an input x are fixed, simply compute w * x_i + b.
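For instance, with w = 200 and b = 100 (illustrative values that happen to fit this two-point dataset exactly), the predicted price of a 1200 sqft house is:
w = 200
b = 100
x_i = 1.2  # 1200 sqft
cost_1200sqft = w * x_i + b  # 200 * 1.2 + 100 = 340
print(f"${cost_1200sqft:.0f} thousand dollars")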
Step 2: Compute the cost function (Cost function)
1. Import the required tools - same as above
2. Describe the problem - same as above
3. Compute the cost (Computing Cost)
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 \tag{1}$$
With w and b fixed, a for loop can measure how far the model's output for each training example is from the corresponding y. In essence this compares ŷ with y, plus a few transformations that make the algebra convenient (squaring the difference, dividing by 2m). The function:
def compute_cost(x, y, w, b):
    """
    Computes the cost function for linear regression.
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      total_cost (float): The cost of using w,b as the parameters for linear regression
      to fit the data points in x and y
    """
    # number of training examples
    m = x.shape[0]

    cost_sum = 0
    for i in range(m):
        f_wb = w * x[i] + b
        cost = (f_wb - y[i]) ** 2
        cost_sum = cost_sum + cost
    total_cost = (1 / (2 * m)) * cost_sum

    return total_cost
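A quick usage check with the w = 100, b = 100 chosen earlier (the expected value is hand-computed for this two-point dataset):
print(compute_cost(x_train, y_train, 100, 100))
# ((200 - 300)**2 + (300 - 500)**2) / (2 * 2) = 12500.0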
Step 3: Run gradient descent (Gradient descent for linear regression)
1. Import the required tools
import math, copy
import numpy as np
import matplotlib.pyplot as plt
2. Describe the problem - same as above
3. Compute the cost function - same as above
4. Compute the gradient
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline
\; w &= w - \alpha \frac{\partial J(w,b)}{\partial w} \tag{3} \; \newline
b &= b - \alpha \frac{\partial J(w,b)}{\partial b} \newline \rbrace
\end{align*}$$
Here the two parameters $w$ and $b$ are updated simultaneously, and the gradients are defined as:
$$
\begin{align}
\frac{\partial J(w,b)}{\partial w} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})x^{(i)} \tag{4}\\
\frac{\partial J(w,b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)}) \tag{5}\\
\end{align}
$$
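As a quick sanity check, evaluating equations (4) and (5) by hand on the two-point dataset above with $w = 0$, $b = 0$:
$$\frac{\partial J}{\partial w} = \frac{1}{2}\left[(0 - 300)\cdot 1 + (0 - 500)\cdot 2\right] = -650, \qquad \frac{\partial J}{\partial b} = \frac{1}{2}\left[(0 - 300) + (0 - 500)\right] = -400$$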
5. Run gradient descent
First, compute the gradient:
def compute_gradient(x, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      x (ndarray (m,)): Data, m examples
      y (ndarray (m,)): target values
      w,b (scalar)    : model parameters
    Returns
      dj_dw (scalar): The gradient of the cost w.r.t. the parameter w
      dj_db (scalar): The gradient of the cost w.r.t. the parameter b
    """
    # Number of training examples
    m = x.shape[0]
    dj_dw = 0
    dj_db = 0

    for i in range(m):
        f_wb = w * x[i] + b
        dj_dw_i = (f_wb - y[i]) * x[i]
        dj_db_i = f_wb - y[i]
        dj_db += dj_db_i
        dj_dw += dj_dw_i
    dj_dw = dj_dw / m
    dj_db = dj_db / m

    return dj_dw, dj_db
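Calling it at w = 0, b = 0 should reproduce the hand computation above:
tmp_dj_dw, tmp_dj_db = compute_gradient(x_train, y_train, 0, 0)
print(tmp_dj_dw, tmp_dj_db)  # -650.0 -400.0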
Next, run the descent loop:
def gradient_descent(x, y, w_in, b_in, alpha, num_iters, cost_function, gradient_function):
    """
    Performs gradient descent to fit w,b. Updates w,b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      x (ndarray (m,))  : Data, m examples
      y (ndarray (m,))  : target values
      w_in,b_in (scalar): initial values of model parameters
      alpha (float)     : Learning rate
      num_iters (int)   : number of iterations to run gradient descent
      cost_function     : function to call to produce cost
      gradient_function : function to call to produce gradient
    Returns:
      w (scalar): Updated value of parameter after running gradient descent
      b (scalar): Updated value of parameter after running gradient descent
      J_history (list): History of cost values
      p_history (list): History of parameters [w,b]
    """
    w = copy.deepcopy(w_in)  # avoid modifying global w_in
    b = b_in
    # Lists to store cost J and parameters at each iteration, primarily for graphing later
    J_history = []
    p_history = []

    for i in range(num_iters):
        # Calculate the gradient using gradient_function
        dj_dw, dj_db = gradient_function(x, y, w, b)

        # Update parameters using equation (3) above
        b = b - alpha * dj_db
        w = w - alpha * dj_dw

        # Save cost J and parameters at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(x, y, w, b))
            p_history.append([w, b])
        # Print cost at 10 evenly spaced intervals (every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4}: Cost {J_history[-1]:0.2e} ",
                  f"dj_dw: {dj_dw: 0.3e}, dj_db: {dj_db: 0.3e}  ",
                  f"w: {w: 0.3e}, b: {b: 0.5e}")

    return w, b, J_history, p_history  # return w, b and the history for graphing
To use it, pass in the appropriate arguments:
# initialize parameters
w_init = 0
b_init = 0
# some gradient descent settings
iterations = 10000
tmp_alpha = 1.0e-2
# run gradient descent
w_final, b_final, J_hist, p_hist = gradient_descent(x_train, y_train, w_init, b_init,
                                                    tmp_alpha, iterations, compute_cost, compute_gradient)
print(f"(w,b) found by gradient descent: ({w_final:8.4f},{b_final:8.4f})")
This prints the cost for the current (w, b) pair at regular intervals and reports the final parameter values.
Finally, plug w_final and b_final back into the original model f(x) = w * x + b to make predictions.
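For example (on this dataset, gradient descent should land near w ≈ 200, b ≈ 100):
print(f"1000 sqft house prediction: {w_final * 1.0 + b_final:0.1f} thousand dollars")
print(f"1200 sqft house prediction: {w_final * 1.2 + b_final:0.1f} thousand dollars")
print(f"2000 sqft house prediction: {w_final * 2.0 + b_final:0.1f} thousand dollars")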