Coursera Machine Learning Specialization by Andrew Ng, C1W2 Notes
To use multiple linear regression, we first introduce matrices and vectors, then apply the same gradient descent algorithm as before to obtain the result. To make gradient descent more efficient, we may need to preprocess the variables first, e.g. feature scaling or feature engineering, and choose a suitable learning rate. In addition, the scikit-learn toolkit can help us implement linear regression faster.
Overall steps for multiple regression:
Step 1: Learn to use vectors and matrices
Step 2: Use the gradient descent algorithm
Step 3: Learn to use feature scaling and choose a learning rate
Step 4: Learn to use feature engineering
Step 5: Use the scikit-learn toolkit
Step 1: Learn to use vectors and matrices
1. Import required tools
import numpy as np
import time
2. Create vectors
a = np.zeros(4); print(f"np.zeros(4) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.zeros((4,)); print(f"np.zeros(4,) : a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.random_sample(4); print(f"np.random.random_sample(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.arange(4.); print(f"np.arange(4.): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.random.rand(4); print(f"np.random.rand(4): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5,4,3,2]); print(f"np.array([5,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
a = np.array([5.,4,3,2]); print(f"np.array([5.,4,3,2]): a = {a}, a shape = {a.shape}, a data type = {a.dtype}")
np.zeros(4) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.zeros(4,) : a = [0. 0. 0. 0.], a shape = (4,), a data type = float64
np.random.random_sample(4): a = [0.28243325 0.88875704 0.60911388 0.73204096], a shape = (4,), a data type = float64
np.arange(4.): a = [0. 1. 2. 3.], a shape = (4,), a data type = float64
np.random.rand(4): a = [0.43634179 0.41345104 0.41762148 0.08765631], a shape = (4,), a data type = float64
np.array([5,4,3,2]): a = [5 4 3 2], a shape = (4,), a data type = int64
np.array([5.,4,3,2]): a = [5. 4. 3. 2.], a shape = (4,), a data type = float64
3. Use vectors
Vectors support indexing, slicing, and arithmetic operations. Pay particular attention to vector-vector multiplication, i.e. the dot product:
$$ x = \sum_{i=0}^{n-1} a_i b_i $$
We could implement this with a for loop, i.e. for i in range(a.shape[0]) with x = x + a[i] * b[i], as sketched below. However, using NumPy directly is more efficient.
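For comparison, a minimal sketch of the loop version (the helper name my_dot is mine, not from the course):

def my_dot(a, b):
    """Dot product of two equal-length 1-D vectors, element by element."""
    x = 0.0
    for i in range(a.shape[0]):
        x = x + a[i] * b[i]
    return x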
a = np.array([1, 2, 3, 4])
b = np.array([-1, 4, 3, 2])
c = np.dot(a, b)
print(f"NumPy 1-D np.dot(a, b) = {c}, np.dot(a, b).shape = {c.shape} ")
c = np.dot(b, a)
print(f"NumPy 1-D np.dot(b, a) = {c}, np.dot(a, b).shape = {c.shape} ")
NumPy 1-D np.dot(a, b) = 24, np.dot(a, b).shape = ()
NumPy 1-D np.dot(b, a) = 24, np.dot(b, a).shape = ()
When the amount of computation is large, the for loop takes much longer. We can use time.time() to capture elapsed time:
tic = time.time()  # start timing
# code to be timed
toc = time.time()  # stop timing
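To see the difference, a minimal sketch (the vector length of 10 million is my own choice for illustration; my_dot is the loop helper sketched above):

np.random.seed(1)
a = np.random.rand(10_000_000)  # large vectors so the difference is visible
b = np.random.rand(10_000_000)

tic = time.time()
c = np.dot(a, b)        # vectorized version
toc = time.time()
print(f"np.dot(a, b) = {c:.4f}, duration: {1000*(toc-tic):.4f} ms")

tic = time.time()
c = my_dot(a, b)        # loop version from above
toc = time.time()
print(f"my_dot(a, b) = {c:.4f}, duration: {1000*(toc-tic):.4f} ms")

On typical hardware the vectorized version is orders of magnitude faster, because NumPy dispatches to optimized low-level routines.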
4. Create matrices
a = np.zeros((1, 5))
print(f"a shape = {a.shape}, a = {a}")
a = np.zeros((2, 1))
print(f"a shape = {a.shape}, a = {a}")
a = np.random.random_sample((1, 1))
print(f"a shape = {a.shape}, a = {a}")
# assign values manually
a = np.array([[5], [4], [3]]); print(f" a shape = {a.shape}, np.array: a = {a}")
a = np.array([[5],
[4],
[3]]);
print(f" a shape = {a.shape}, np.array: a = {a}")
a shape = (1, 5), a = [[0. 0. 0. 0. 0.]]
a shape = (2, 1), a = [[0.] [0.]]
a shape = (1, 1), a = [[0.82318089]]
 a shape = (3, 1), np.array: a = [[5] [4] [3]]
 a shape = (3, 1), np.array: a = [[5] [4] [3]]
5. Use matrices
Matrices support indexing, reshaping (e.g. a = np.arange(6).reshape(-1, 2)), and slicing (e.g. a[:, :]).
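A quick sketch of these operations (the comments are mine):

a = np.arange(6).reshape(-1, 2)   # 6 elements, 2 columns -> shape (3, 2)
print(f"a = {a}")
print(f"a[2, 0] = {a[2, 0]}")     # indexing: element in row 2, column 0
print(f"a[0, :] = {a[0, :]}")     # slicing: all of row 0
print(f"a[:, 1] = {a[:, 1]}")     # slicing: all of column 1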
Step 2: Use the gradient descent algorithm
1. Import required tools
import copy, math
import numpy as np
import matplotlib.pyplot as plt
2. Describe the problem
In X_train, write down the values of all features, with each example's data in its own list, so X_train is a list of lists. Each example has exactly one target value, so y is a list with as many entries as X_train has rows. Here X_train is a matrix and y_train is a vector; both are of Type:<class 'numpy.ndarray'>.
X_train = np.array([[2104, 5, 1, 45], [1416, 3, 2, 40], [852, 2, 1, 35]])
y_train = np.array([460, 232, 178])
We also need initial values for w and b. Here w has four entries (one per feature, matching the columns of X_train) while b is a single scalar. For example:
b_init = 785.1811367994083
w_init = np.array([ 0.39133535, 18.75376741, -53.36032453, -26.42131618])
print(f"w_init shape: {w_init.shape}, b_init type: {type(b_init)}")
This prints w_init shape: (4,), b_init type: <class 'float'>.
3. Build the model
Written out in full, the multiple linear regression model is:
$$ f_{\mathbf{w},b}(\mathbf{x}) = w_0x_0 + w_1x_1 +... + w_{n-1}x_{n-1} + b \tag{1}$$
Using the dot product, this becomes:
$$ f_{\mathbf{w},b}(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b \tag{2} $$
So instead of a for loop, we can implement it with a dot product:
def predict(x, w, b):
    """
    single predict using linear regression
    Args:
      x (ndarray): Shape (n,) example with multiple features
      w (ndarray): Shape (n,) model parameters
      b (scalar):  model parameter
    Returns:
      p (scalar):  prediction
    """
    p = np.dot(x, w) + b
    return p
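A quick usage sketch on the first training example:

x_vec = X_train[0, :]                  # first example, shape (4,)
f_wb = predict(x_vec, w_init, b_init)  # scalar prediction
print(f"x_vec = {x_vec}, prediction = {f_wb}")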
4. Compute the cost function
The cost function is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 \tag{3}$$
where:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = \mathbf{w} \cdot \mathbf{x}^{(i)} + b \tag{4} $$
Computing the cost with a for loop:
def compute_cost(X, y, w, b):
    """
    compute cost
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      cost (scalar): cost
    """
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b      # (n,)(n,) = scalar (see np.dot)
        cost = cost + (f_wb_i - y[i])**2  # scalar
    cost = cost / (2 * m)                 # scalar
    return cost
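A usage sketch evaluating the cost at the initial parameters:

cost = compute_cost(X_train, y_train, w_init, b_init)
print(f"Cost at initial w, b: {cost}")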
5. Gradient descent
$$\begin{align*} \text{repeat}&\text{ until convergence:} \; \lbrace \newline\;
& w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{5} \; & \text{for j = 0..n-1}\newline
&b\ \ = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \newline \rbrace
\end{align*}$$
where the partial derivatives are:
$$\begin{align}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{6} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{7}
\end{align}$$
The code is the same as before. First, compute the gradient:
def compute_gradient(X, y, w, b):
    """
    Computes the gradient for linear regression
    Args:
      X (ndarray (m,n)): Data, m examples with n features
      y (ndarray (m,)) : target values
      w (ndarray (n,)) : model parameters
      b (scalar)       : model parameter
    Returns:
      dj_db (scalar):       The gradient of the cost w.r.t. the parameter b.
      dj_dw (ndarray (n,)): The gradient of the cost w.r.t. the parameters w.
    """
    m, n = X.shape  # (number of examples, number of features)
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err = (np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err * X[i, j]
        dj_db = dj_db + err
    dj_dw = dj_dw / m
    dj_db = dj_db / m
    return dj_db, dj_dw
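A usage sketch at the initial parameters:

tmp_dj_db, tmp_dj_dw = compute_gradient(X_train, y_train, w_init, b_init)
print(f"dj_db at initial w, b: {tmp_dj_db}")
print(f"dj_dw at initial w, b: {tmp_dj_dw}")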
Then run gradient descent:
def gradient_descent(X, y, w_in, b_in, cost_function, gradient_function, alpha, num_iters):
    """
    Performs batch gradient descent to learn w and b. Updates w and b by taking
    num_iters gradient steps with learning rate alpha
    Args:
      X (ndarray (m,n))   : Data, m examples with n features
      y (ndarray (m,))    : target values
      w_in (ndarray (n,)) : initial model parameters
      b_in (scalar)       : initial model parameter
      cost_function       : function to compute cost
      gradient_function   : function to compute the gradient
      alpha (float)       : Learning rate
      num_iters (int)     : number of iterations to run gradient descent
    Returns:
      w (ndarray (n,)) : Updated values of parameters
      b (scalar)       : Updated value of parameter
      J_history (list) : Cost at each iteration, primarily for graphing
    """
    # An array to store cost J at each iteration, primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)  # avoid modifying global w within function
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient
        dj_db, dj_dw = gradient_function(X, y, w, b)
        # Update parameters using w, b, alpha and the gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:  # prevent resource exhaustion
            J_history.append(cost_function(X, y, w, b))
        # Print cost at 10 evenly spaced intervals (or every iteration if num_iters < 10)
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]:8.2f} ")
    return w, b, J_history  # return final w, b and J history for graphing
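A usage sketch putting it all together; the settings below (zero initialization, alpha = 5.0e-7, 1000 iterations) follow the course lab as I recall it, so treat them as a starting point:

initial_w = np.zeros_like(w_init)  # start from zeros
initial_b = 0.
iterations = 1000
alpha = 5.0e-7  # small, because the features are unscaled

w_final, b_final, J_hist = gradient_descent(X_train, y_train, initial_w, initial_b,
                                            compute_cost, compute_gradient,
                                            alpha, iterations)
print(f"b, w found by gradient descent: {b_final:0.2f}, {w_final}")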
Step 3: Learn to use feature scaling and choose a learning rate
Three methods of feature scaling (see the sketch after this list for the third):
- Feature scaling: essentially dividing each feature by a user-selected value to produce a range between -1 and 1.
- Mean normalization: $x_i := \dfrac{x_i - \mu_i}{max - min} $
- Z-score normalization: $x_i := \dfrac{x_i - \mu_i}{\sigma_i}$, where $\mu_i$ is the mean and $\sigma_i$ the standard deviation of feature $i$.
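A minimal sketch of z-score normalization (the function name zscore_normalize_features follows the course lab; returning mu and sigma lets us apply the same transform to new data later):

def zscore_normalize_features(X):
    """
    Normalize each column of X to zero mean and unit standard deviation.
    Args:
      X (ndarray (m,n)): input data, m examples, n features
    Returns:
      X_norm (ndarray (m,n)): normalized data
      mu (ndarray (n,))     : per-feature mean
      sigma (ndarray (n,))  : per-feature standard deviation
    """
    mu = np.mean(X, axis=0)     # mean of each column
    sigma = np.std(X, axis=0)   # standard deviation of each column
    X_norm = (X - mu) / sigma   # element-wise, broadcast over rows
    return X_norm, mu, sigma

X_norm, X_mu, X_sigma = zscore_normalize_features(X_train)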
As for the learning rate α: choose it by experiment. Run a few iterations with each candidate value and watch the cost; if the cost oscillates or increases, α is too large, and if it decreases very slowly, α can be raised.
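A rough sketch of that experiment (the candidate values are my own illustration; with these unscaled features the largest rate will likely diverge, which is exactly the signal to look for):

# Try several learning rates for a few iterations each and compare the cost
for alpha in [1e-7, 5e-7, 1e-6]:
    w, b, J_hist = gradient_descent(X_train, y_train, np.zeros(X_train.shape[1]), 0.,
                                    compute_cost, compute_gradient, alpha, 10)
    print(f"alpha = {alpha}: cost after 10 iterations = {J_hist[-1]:0.2f}")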
Step 4: Learn to use feature engineering
Optional; to be filled in later. In short, it means transforming or combining the existing features to create new, more useful ones.
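As a minimal sketch of the idea (the data here is made up; polynomial terms are one common example of engineered features):

x = np.arange(0, 20, 1)
y = 1 + x**2                # made-up target, quadratic in x
X = np.c_[x, x**2, x**3]    # engineered features x, x^2, x^3, so a linear model can fit the curve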
Step 5: Use the scikit-learn toolkit
1. Import required tools
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler
# import course-provided helpers
from lab_utils_multi import load_house_data
from lab_utils_common import dlc
np.set_printoptions(precision=2)
2. Load the data
X_train, y_train = load_house_data()
X_features = ['size(sqft)','bedrooms','floors','age']
Now we can run gradient descent.
Scikit-learn provides the gradient-descent regression model sklearn.linear_model.SGDRegressor,
and sklearn.preprocessing.StandardScaler performs z-score normalization.
3. Feature scaling
StandardScaler() computes the z-scores, so:
scaler = StandardScaler()
X_norm = scaler.fit_transform(X_train)
print(f"Peak to Peak range by column in Raw X:{np.ptp(X_train,axis=0)}")
print(f"Peak to Peak range by column in Normalized X:{np.ptp(X_norm,axis=0)}")
4. Create and fit the model
SGDRegressor() performs stochastic gradient descent; max_iter sets the maximum number of passes over the training data:
sgdr = SGDRegressor(max_iter=1000)
sgdr.fit(X_norm, y_train)
print(sgdr)
print(f"number of iterations completed: {sgdr.n_iter_}, number of weight updates: {sgdr.t_}")
5. Inspect the parameters
b_norm = sgdr.intercept_
w_norm = sgdr.coef_
print(f"model parameters: w: {w_norm}, b:{b_norm}")
print( "model parameters from previous lab: w: [110.56 -21.27 -32.71 -37.97], b: 363.16")
6. Make predictions
# make a prediction using sgdr.predict()
y_pred_sgd = sgdr.predict(X_norm)
# make a prediction using w,b.
y_pred = np.dot(X_norm, w_norm) + b_norm
print(f"prediction using np.dot() and sgdr.predict match: {(y_pred == y_pred_sgd).all()}")
print(f"Prediction on training set:\n{y_pred[:4]}" )
print(f"Target values \n{y_train[:4]}")