Goals
扩展回归模型例程以支持多维特征
- 扩展数据结构以支持多维特征
- 重写prediction、cost和gradient例程以支持多维特征
- 利用NumPy的np.dot对其实现进行矢量化以提高速度和简洁性
Tools
In this lab, we will make use of:
- NumPy, a popular library for scientific computing
- Matplotlib, a popular library for plotting data
import copy, math
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('./deeplearning.mplstyle')
np.set_printoptions(precision=2) # reduced display precision on numpy arrays
Problem Statement
我们将使用房价预测的激励性例子,训练集包含三个示例,具有四个特征(大小、卧室、楼层和年龄),如下表所示
请注意,与之前的lab不同,这里规模以平方英尺为单位而不是1000平方英尺
我们将使用这些值建立一个线性回归模型,这样就可以预测其他房子的价格
请运行以下代码来创建X_train
和y_train
变量
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
Matrix X Contain Our Examples
与上表类似,示例存储在NumPy的矩阵X_train中,矩阵的每一行代表一个示例
当有
m
m
m 个示例(在lab中m是3)和
n
n
n 个特征(在lab中n是4)时,
X
\mathbf{X}
X 是一个具有维度(
m
m
m,
n
n
n)的矩阵(m rows, n columns)
X
=
(
x
0
(
0
)
x
1
(
0
)
⋯
x
n
−
1
(
0
)
x
0
(
1
)
x
1
(
1
)
⋯
x
n
−
1
(
1
)
⋯
x
0
(
m
−
1
)
x
1
(
m
−
1
)
⋯
x
n
−
1
(
m
−
1
)
)
\mathbf{X} = \begin{pmatrix} x^{(0)}_0 & x^{(0)}_1 & \cdots & x^{(0)}_{n-1} \\ x^{(1)}_0 & x^{(1)}_1 & \cdots & x^{(1)}_{n-1} \\ \cdots \\ x^{(m-1)}_0 & x^{(m-1)}_1 & \cdots & x^{(m-1)}_{n-1} \end{pmatrix}
X=
x0(0)x0(1)⋯x0(m−1)x1(0)x1(1)x1(m−1)⋯⋯⋯xn−1(0)xn−1(1)xn−1(m−1)
注意:
- x ( i ) \mathbf{x}^{(i)} x(i) is vector containing example i. x ( i ) \mathbf{x}^{(i)} x(i) = ( x 0 ( i ) , x 1 ( i ) , ⋯ , x n − 1 ( i ) ) = (x^{(i)}_0, x^{(i)}_1, \cdots,x^{(i)}_{n-1}) =(x0(i),x1(i),⋯,xn−1(i))
-
x
j
(
i
)
x^{(i)}_j
xj(i) 是示例 i 中的元素 j ,括号中的上标表示示例编号,下标表示元素
Display the input data.
# data is stored in numpy array/matrix
print(f"X Shape: {X_train.shape}, X Type:{type(X_train)})")
print(X_train)
print(f"y Shape: {y_train.shape}, y Type:{type(y_train)})")
print(y_train)
输出如下
X Shape: (3, 4), X Type:<class 'numpy.ndarray'>)
[[2104 5 1 45]
[1416 3 2 40]
[ 852 2 1 35]]
y Shape: (3,), y Type:<class 'numpy.ndarray'>)
[460 232 178]
Parameter Vector w, b
w
\mathbf{w}
w 是一个具有
n
n
n 个元素的向量
每个元素包含一个与一个特征相关的参数,在我们的数据集中,n是4
从理论上讲,我们将其写成一个列向量
w
=
(
w
0
w
1
⋯
w
n
−
1
)
\mathbf{w} = \begin{pmatrix} w_0 \\ w_1 \\ \cdots\\ w_{n-1} \end{pmatrix}
w=
w0w1⋯wn−1
b
b
b 是标量参数
为了示范, w \mathbf{w} w 和 b b b 将加载一些接近最优的初始选择值, w \mathbf{w} w is a 1-D NumPy vector.
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")
输出如下
w_init shape: (4,), b_init type: <class 'float'>
Model Prediction with Multiple Variables
linear model给出了具有多个特征的模型预测
f
w
,
b
(
x
)
=
w
0
x
0
+
w
1
x
1
+
.
.
.
+
w
n
−
1
x
n
−
1
+
b
(1)
f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}
fw,b(x)=w0x0+w1x1+...+wn−1xn−1+b(1)
或者用矢量表示法
f
w
,
b
(
x
)
=
w
⋅
x
+
b
(2)
f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2}
fw,b(x)=w⋅x+b(2)
where
⋅
\cdot
⋅ is a vector dot product
为了演示点积,我们将使用(1)和(2)实现预测
Single Prediction Element by Element
我们之前的预测是将一个特征值乘以一个参数,并加上一个偏差参数,对我们之前的预测实现的多维特征的直接扩展是用循环对每一个元素实现 (1) ,对其参数进行加法并在最后添加偏差参数
def predict_single_loop(x, w, b):
"""
single predict using linear regression
Args:
x (ndarray): Shape (n,) example with multiple features
w (ndarray): Shape (n,) model parameters
b (scalar): model parameter
Returns:
p (scalar): prediction
"""
n = x.shape[0]
p = 0
for i in range(n):
p_i = x[i] * w[i]
p = p + p_i
p = p + b
return p
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")
# make a prediction
f_wb = predict_single_loop(x_vec, w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")
输出如下
x_vec shape (4,), x_vec value: [2104 5 1 45]
f_wb shape (), prediction: 459.9999976194083
Note the shape of x_vec
. It is a 1-D NumPy vector with 4 elements, (4,). The result, f_wb
is a scalar.
Single Prediction, Vector
注意,可以使用上述 (2) 中的点积来实现上述等式 (1) ,我们可以使用矢量运算来加快预测速度
NumPy中的np.dot()
可以用于执行矢量点积
def predict(x, w, b):
"""
single predict using linear regression
Args:
x (ndarray): Shape (n,) example with multiple features
w (ndarray): Shape (n,) model parameters
b (scalar): model parameter
Returns:
p (scalar): prediction
"""
p = np.dot(x, w) + b
return p
# get a row from our training data
x_vec = X_train[0,:]
print(f"x_vec shape {x_vec.shape}, x_vec value: {x_vec}")
# make a prediction
f_wb = predict(x_vec,w_init, b_init)
print(f"f_wb shape {f_wb.shape}, prediction: {f_wb}")
输出如下
x_vec shape (4,), x_vec value: [2104 5 1 45]
f_wb shape (), prediction: 459.9999976194082
结果和使用循环是一样的,今后np.dot
将用于这些操作
现在,预测只是一句话,大多数例程将直接实现它而不是调用单独的预测例程
Compute Cost with Multiple Variables
多维特征的cost function
J
(
w
,
b
)
J(\mathbf{w},b)
J(w,b) 如下
J
(
w
,
b
)
=
1
2
m
∑
i
=
0
m
−
1
(
f
w
,
b
(
x
(
i
)
)
−
y
(
i
)
)
2
(3)
J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}
J(w,b)=2m1i=0∑m−1(fw,b(x(i))−y(i))2(3)
where:
f
w
,
b
(
x
(
i
)
)
=
w
⋅
x
(
i
)
+
b
(4)
f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4}
fw,b(x(i))=w⋅x(i)+b(4)
与之前的lab相比,
w
\mathbf{w}
w 和
x
(
i
)
\mathbf{x}^{(i)}
x(i) 是支持多个特征的矢量而不是标量
下面是等式 (3) 和 (4) 的实现,请注意,this uses a standard pattern for this course where a for loop over all m
examples is used.
def compute_cost(X, y, w, b):
"""
compute cost
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter
Returns:
cost (scalar): cost
"""
m = X.shape[0]
cost = 0.0
for i in range(m):
f_wb_i = np.dot(X[i], w) + b #(n,)(n,) = scalar (see np.dot)
cost = cost + (f_wb_i - y[i])**2 #scalar
cost = cost / (2 * m) #scalar
return cost
# Compute and display cost using our pre-chosen optimal parameters.
cost = compute_cost(X_train, y_train, w_init, b_init)
print(f'Cost at optimal w : {cost}')
输出如下
Cost at optimal w : 1.5578904330213735e-12
预期结果:最佳成本 w : 1.5578904045996674e-12
Gradient Descent with Multiple Variables
多维特征的gradient descent
repeat
until convergence:
{
w
j
=
w
j
−
α
∂
J
(
w
,
b
)
∂
w
j
for j = 0..n-1
b
=
b
−
α
∂
J
(
w
,
b
)
∂
b
}
\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\; & w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5} \; & \text{for j = 0..n-1}\newline &b\ \ = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \newline \rbrace \end{align*}
repeat} until convergence:{wj=wj−α∂wj∂J(w,b)b =b−α∂b∂J(w,b)for j = 0..n-1(5)
where, n is the number of features, parameters w j w_j wj, b b b, are updated simultaneously and where
∂ J ( w , b ) ∂ w j = 1 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) x j ( i ) ∂ J ( w , b ) ∂ b = 1 m ∑ i = 0 m − 1 ( f w , b ( x ( i ) ) − y ( i ) ) \begin{align} \frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6} \\ \frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7} \end{align} ∂wj∂J(w,b)∂b∂J(w,b)=m1i=0∑m−1(fw,b(x(i))−y(i))xj(i)=m1i=0∑m−1(fw,b(x(i))−y(i))(6)(7)
-
m 四训练集中训练示例的数量
-
f w , b ( x ( i ) ) f_{\mathbf{w},b}(\mathbf{x}^{(i)}) fw,b(x(i)) is the model’s prediction, while y ( i ) y^{(i)} y(i) 是目标值
Compute Gradient with Multiple Variables
下面是用于计算等式 (6) 和 (7) 的实现方式,有很多方式可以实现这一点
在这个版本中,有一个所有m个例子的外循环
- ∂ J ( w , b ) ∂ b \frac{\partial J(\mathbf{w},b)}{\partial b} ∂b∂J(w,b) for the example can be computed directly and accumulated
- in a second loop over all n features:
- ∂ J ( w , b ) ∂ w j \frac{\partial J(\mathbf{w},b)}{\partial w_j} ∂wj∂J(w,b) is computed for each w j w_j wj.
def compute_gradient(X, y, w, b):
"""
Computes the gradient for linear regression
Args:
X (ndarray (m,n)): Data, m examples with n features
y (ndarray (m,)) : target values
w (ndarray (n,)) : model parameters
b (scalar) : model parameter
Returns:
dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
dj_db (scalar): The gradient of the cost w.r.t. the parameter b.
"""
m,n = X.shape #(number of examples, number of features)
dj_dw = np.zeros((n,))
dj_db = 0.
for i in range(m):
err = (np.dot(X[i], w) + b) - y[i]
for j in range(n):
dj_dw[j] = dj_dw[j] + err * X[i, j]
dj_db = dj_db + err
dj_dw = dj_dw / m
dj_db = dj_db / m
return dj_db, dj_dw
#Compute and display gradient
tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f'dj_db at initial w,b: {tmp_dj_db}')
print(f'dj_dw at initial w,b: \n {tmp_dj_dw}')
输出如下
dj_db at initial w,b: -1.6739251122999121e-06
dj_dw at initial w,b:
[-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
预期结果:
dj_db at initial w,b: -1.6739251122999121e-06
dj_dw at initial w,b:
[-2.73e-03 -6.27e-06 -2.22e-06 -6.92e-05]
Gradient Descent with Multiple Variables
下面的例程实现上面的等式 (5)
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
"""
Performs batch gradient descent to learn theta. Updates theta by taking
num_iters gradient steps with learning rate alpha
Args:
X (ndarray (m,n)) : Data, m examples with n features
y (ndarray (m,)) : target values
w_in (ndarray (n,)) : initial model parameters
b_in (scalar) : initial model parameter
cost_function : function to compute cost
gradient_function : function to compute the gradient
alpha (float) : Learning rate
num_iters (int) : number of iterations to run gradient descent
Returns:
w (ndarray (n,)) : Updated values of parameters
b (scalar) : Updated value of parameter
"""
# An array to store cost J and w's at each iteration primarily for graphing later
J_history = []
w = copy.deepcopy(w_in) #avoid modifying global w within function
b = b_in
for i in range(num_iters):
# Calculate the gradient and update the parameters
dj_db,dj_dw = gradient_function(X, y, w, b) ##None
# Update Parameters using w, b, alpha and gradient
w = w - alpha * dj_dw ##None
b = b - alpha * dj_db ##None
# Save cost J at each iteration
if i<100000: # prevent resource exhaustion
J_history.append( cost_function(X, y, w, b))
# Print cost every at intervals 10 times or as many iterations if < 10
if i% math.ceil(num_iters / 10) == 0:
print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f} ")
return w, b, J_history #return final w,b and J history for graphing
In the next cell you will test the implementation.
# initialize parameters
initial_w = np.zeros_like(w_init)
initial_b = 0.
# some gradient descent settings
iterations = 1000
alpha = 5.0e-7
# run gradient descent
w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
compute_cost, compute_gradient,
alpha, iterations)
print(f"b,w found by gradient descent: {b_final:0.2f},{w_final} ")
m,_ = X_train.shape
for i in range(m):
print(f"prediction: {np.dot(X_train[i], w_final) + b_final:0.2f}, target value: {y_train[i]}")
输出如下
Iteration 0: Cost 2529.46
Iteration 100: Cost 695.99
Iteration 200: Cost 694.92
Iteration 300: Cost 693.86
Iteration 400: Cost 692.81
Iteration 500: Cost 691.77
Iteration 600: Cost 690.73
Iteration 700: Cost 689.71
Iteration 800: Cost 688.70
Iteration 900: Cost 687.69
b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
预期输出:
b,w found by gradient descent: -0.00,[ 0.2 0. -0.01 -0.07]
prediction: 426.19, target value: 460
prediction: 286.17, target value: 232
prediction: 171.47, target value: 178
# plot cost versus iteration
fig, (ax1, ax2) = plt.subplots(1, 2, constrained_layout=True, figsize=(12, 4))
ax1.plot(J_hist)
ax2.plot(100 + np.arange(len(J_hist[100:])), J_hist[100:])
ax1.set_title("Cost vs. iteration"); ax2.set_title("Cost vs. iteration (tail)")
ax1.set_ylabel('Cost') ; ax2.set_ylabel('Cost')
ax1.set_xlabel('iteration step') ; ax2.set_xlabel('iteration step')
plt.show()
结果并不准确!cost仍在下降,预测也不太准确,下一个lab将探索如何改进这一点
Congratulations!
In this lab you:
- Redeveloped the routines for linear regression, now with multiple variables.
- Utilized NumPy
np.dot
to vectorize the implementations