Notes for Coursera's Machine Learning Specialization (Andrew Ng), Course 1 Week 3 (C1W3)
Supervised learning generally comes in two flavors: regression and classification. Unlike regression, which can output any numeric prediction, classification outputs a category, so the model only produces values such as 0 and 1 that stand for class labels. For this reason we use logistic regression, rather than the linear regression model used before, to predict the class. What stays the same is that classification still requires computing a cost function, and the optimal parameters can still be found with gradient descent.
Overall workflow for classification:
Step 1: Understand what classification is
Step 2: Use the sigmoid function and logistic regression
Step 3: Determine the decision boundary
Step 4: Compute the logistic loss
Step 5: Compute the cost function
Step 6: Run gradient descent
Step 7: Address overfitting
Step 1: Understand what classification is
Example:
x_train1 = np.array([0., 1, 2, 3, 4, 5])
y_train1 = np.array([0, 0, 0, 1, 1, 1])
X_train2 = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train2 = np.array([0, 0, 0, 1, 1, 1])
Here train1 is a classification problem with a single feature, while train2 has two features, so each x in X_train2 is a pair of values.
Both training sets are binary classifications, with the two classes labeled pos = 1 and neg = 0.
Step 2: Use the sigmoid function and logistic regression
① Getting to know the sigmoid function
$$g(z) = \frac{1}{1+e^{-z}}$$
Here z is the output of the linear regression model, and the exponential $e^{z}$ can be computed with NumPy's `np.exp()` function.
import numpy as np
# array input
input_array = np.array([1,2,3])
exp_array = np.exp(input_array)
print("Input to exp:", input_array)
print("Output of exp:", exp_array)
# scalar input
input_val = 1
exp_val = np.exp(input_val)
print("Input to exp:", input_val)
print("Output of exp:", exp_val)
The sigmoid can therefore be written as a function:
def sigmoid(z):
    # z can be a scalar or a NumPy array; the sigmoid is applied element-wise
    g = 1/(1+np.exp(-z))
    return g
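As a quick sanity check (a minimal sketch using the sigmoid defined above), evaluating it on a few z values shows that every output lies in (0, 1):
z_tmp = np.array([-10., -1., 0., 1., 10.])
print("sigmoid(z):", sigmoid(z_tmp))   # near 0 for large negative z, 0.5 at z = 0, near 1 for large positive z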
② Using logistic regression
Logistic regression simply applies the sigmoid to a linear-regression-style model:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b ) $$
where:
$$g(z) = \frac{1}{1+e^{-z}}$$
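As a minimal sketch of evaluating this model on the train2 data from Step 1 (w_tmp and b_tmp are made-up parameter values for illustration, not learned ones):
w_tmp = np.array([1., 1.])    # illustrative weights
b_tmp = -3.                   # illustrative bias
f_wb = sigmoid(np.dot(X_train2, w_tmp) + b_tmp)   # one probability per training example
print(f_wb)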
Step 3: Determine the decision boundary
Given a logistic regression model, how do we set the decision rule?
By the properties of the sigmoid, every real number is mapped into the interval (0, 1); most of the change happens near the axis of symmetry, and the curve crosses the y-axis at y = 0.5. Taking 0.5 as the cut-off, points to the left of the y-axis are classified as 0 and points to the right as 1.
With f = g(z): if g(z) ≥ 0.5 then y = 1; if g(z) < 0.5 then y = 0.
Since g(z) is the sigmoid, g(z) ≥ 0.5 exactly when z ≥ 0, and g(z) < 0.5 when z < 0.
And since z = w·x + b, we predict y = 1 when w·x + b ≥ 0 and y = 0 when w·x + b < 0.
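A minimal sketch of this decision rule, reusing the f_wb probabilities from the sketch above (the 0.5 threshold is the one just discussed):
y_hat = (f_wb >= 0.5).astype(int)   # 1 where the predicted probability is at least 0.5, else 0
print("predictions:", y_hat)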
Step 4: Compute the logistic loss
① Why the squared-error loss is not used
In linear regression, the cost is based on the squared error:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 $$
where, for linear regression:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b $$
So can we keep the same cost and simply redefine f as the sigmoid of the linear model?
$$f_{w,b}(x^{(i)}) = sigmoid(wx^{(i)} + b )$$
No. The result is unsatisfactory: plotting the resulting cost shows that it is not a smooth convex function, so gradient descent can get stuck in local minima.
② The loss formula actually used
\begin{equation}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = \begin{cases}
- \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=1$}\\
- \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=0$}
\end{cases}
\end{equation}
In practice, an equivalent single-expression form of this formula is used:
$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)$$
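A minimal sketch that evaluates this single-expression form for one example (log_loss is a hypothetical helper name, used here only to make the equivalence with the piecewise definition concrete):
def log_loss(f_wb_i, y_i):
    # when y_i == 1 only the first term survives; when y_i == 0 only the second does
    return -y_i*np.log(f_wb_i) - (1 - y_i)*np.log(1 - f_wb_i)
print(log_loss(0.9, 1), log_loss(0.9, 0))   # small loss for a confident correct prediction, large loss otherwise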
Step 5: Compute the cost function
The loss measures the error on a single example; the cost aggregates the losses over the entire training set.
def compute_cost_logistic(X, y, w, b):
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i],w) + b        # linear part
        f_wb_i = sigmoid(z_i)           # model prediction
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)   # logistic loss
    cost = cost / m
    return cost
In short, the loss formula above is placed inside a for loop, summed over all examples, and averaged.
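A quick usage sketch on the two-feature training set from Step 1 (w_tmp and b_tmp are illustrative values, not learned parameters):
w_tmp = np.array([1., 1.])
b_tmp = -3.
print(compute_cost_logistic(X_train2, y_train2, w_tmp, b_tmp))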
Step 6: Run gradient descent
① Computing it from the formulas
Recall that the gradient descent algorithm repeatedly applies the gradient update:
$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
& \; \; \;w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} \; & \text{for j := 0..n-1} \\
& \; \; \; \; \;b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$
Taking the partial derivatives gives:
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3}
\end{align*}$$
We use the following data set as an example:
X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
def compute_gradient_logistic(X, y, w, b):
    m,n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)   # prediction for example i
        err_i = f_wb_i - y[i]                  # prediction error
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    return dj_db, dj_dw
Compared with linear regression, the only change is in f_wb_i, which now passes the linear output through the sigmoid.
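A quick usage sketch on the data set above (w_tmp and b_tmp are illustrative starting values):
w_tmp = np.array([2., 3.])
b_tmp = 1.
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_train, y_train, w_tmp, b_tmp)
print("dj_db:", dj_db_tmp)
print("dj_dw:", dj_dw_tmp)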
import copy, math

def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)   # avoid modifying global w within function
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient and update the parameters
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)
        # Update parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:        # prevent resource exhaustion
            J_history.append(compute_cost_logistic(X, y, w, b))
        # Print cost at intervals 10 times, or every iteration if num_iters < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}")
    return w, b, J_history    # return final w, b and J history for graphing
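A quick usage sketch that runs the optimizer on the training data (the initial values, learning rate, and iteration count here are illustrative choices):
w_init = np.zeros(X_train.shape[1])
b_init = 0.
w_out, b_out, _ = gradient_descent(X_train, y_train, w_init, b_init, alpha=0.1, num_iters=10000)
print(f"updated parameters: w = {w_out}, b = {b_out}")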
② Using scikit-learn
1. Create the dataset
import numpy as np
X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
2. Fit the model
from sklearn.linear_model import LogisticRegression
lr_model = LogisticRegression()
lr_model.fit(X, y)
3. Make predictions
y_pred = lr_model.predict(X)
print("Prediction on training set:", y_pred)
4. Compute the accuracy
print("Accuracy on training set:", lr_model.score(X, y))
Step 7: Address overfitting
To reduce overfitting we add regularization to the model. The regularized cost function (shown here for linear regression) is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 $$
Compared with the original cost function there is an extra term $$\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$, which penalizes large weights and therefore drives the parameters w toward smaller values.
For regularized linear regression:
def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost
    total_cost = cost + reg_cost
    return total_cost
For regularized logistic regression:
def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    m,n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)
    cost = cost/m
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost
    total_cost = cost + reg_cost
    return total_cost
Gradient descent then works the same way; the only change is that the gradient with respect to each $w_j$ gains an extra $\frac{\lambda}{m} w_j$ term (the gradient with respect to $b$ is not regularized), as sketched below.
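A minimal sketch of the regularized gradient for logistic regression (the name compute_gradient_logistic_reg is my own; it simply adds the regularization term to the earlier gradient code):
def compute_gradient_logistic_reg(X, y, w, b, lambda_ = 1):
    m,n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err_i = sigmoid(np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] += err_i * X[i,j]
        dj_db += err_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    for j in range(n):
        dj_dw[j] += (lambda_/m) * w[j]   # extra regularization term; b is not regularized
    return dj_db, dj_dw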