Notes for Coursera's Machine Learning Specialization (Andrew Ng), Course 1 Week 3 (C1W3)
Supervised learning generally comes in two flavors: regression and classification. Unlike regression, which can output any numeric prediction, classification outputs a category, so the model only produces values such as 0 and 1 that stand for class labels. For this reason we use logistic regression, rather than the linear regression model used before, to predict the class. What stays the same is that classification still requires computing a cost function, and the optimal parameters can still be found with gradient descent.
Overall workflow for classification:
Step 1: Understand what classification is
Step 2: Use the sigmoid function and logistic regression
Step 3: Determine the decision boundary
Step 4: Compute the logistic loss
Step 5: Compute the cost function
Step 6: Run gradient descent
Step 7: Address overfitting
Step 1: Understand what classification is
Example:
x_train1 = np.array([0., 1, 2, 3, 4, 5])
y_train1 = np.array([0, 0, 0, 1, 1, 1])
X_train2 = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train2 = np.array([0, 0, 0, 1, 1, 1])
Here train1 is a classification problem with a single feature, while train2 has two features, so each x in X_train2 is a pair of values.
Both training sets are binary classifications, with the two classes labeled pos = 1 and neg = 0.
Step 2: Use the sigmoid function and logistic regression
① Getting to know the sigmoid function
$$g(z) = \frac{1}{1+e^{-z}}$$
Here z is the output of the linear regression model, and the exponential $e^{z}$ can be computed with NumPy's `np.exp()` function.
import numpy as np
# array input
input_array = np.array([1,2,3])
exp_array = np.exp(input_array)
print("Input to exp:", input_array)
print("Output of exp:", exp_array)
# scalar input
input_val = 1
exp_val = np.exp(input_val)
print("Input to exp:", input_val)
print("Output of exp:", exp_val)
The sigmoid can therefore be written as a function:
def sigmoid(z):
    # z can be a scalar or a NumPy array; the sigmoid is applied element-wise
    g = 1/(1+np.exp(-z))
    return g
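As a quick sanity check (a minimal sketch using the sigmoid defined above), evaluating it on a few z values shows that every output lies in (0, 1):
z_tmp = np.array([-10., -1., 0., 1., 10.])
print("sigmoid(z):", sigmoid(z_tmp))   # near 0 for large negative z, 0.5 at z = 0, near 1 for large positive z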
② Using logistic regression
Logistic regression simply applies the sigmoid to a linear-regression-style model:
$$ f_{\mathbf{w},b}(\mathbf{x}^{(i)}) = g(\mathbf{w} \cdot \mathbf{x}^{(i)} + b ) $$
where:
$$g(z) = \frac{1}{1+e^{-z}}$$
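As a minimal sketch of evaluating this model on the train2 data from Step 1 (w_tmp and b_tmp are made-up parameter values for illustration, not learned ones):
w_tmp = np.array([1., 1.])    # illustrative weights
b_tmp = -3.                   # illustrative bias
f_wb = sigmoid(np.dot(X_train2, w_tmp) + b_tmp)   # one probability per training example
print(f_wb)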
Step 3: Determine the decision boundary
Given a logistic regression model, how do we set the decision rule?
By the properties of the sigmoid, every real number is mapped into the interval (0, 1); most of the change happens near the axis of symmetry, and the curve crosses the y-axis at y = 0.5. Taking 0.5 as the cut-off, points to the left of the y-axis are classified as 0 and points to the right as 1.
With f = g(z): if g(z) ≥ 0.5 then y = 1; if g(z) < 0.5 then y = 0.
Since g(z) is the sigmoid, g(z) ≥ 0.5 exactly when z ≥ 0, and g(z) < 0.5 when z < 0.
And since z = w·x + b, we predict y = 1 when w·x + b ≥ 0 and y = 0 when w·x + b < 0.
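A minimal sketch of this decision rule, reusing the f_wb probabilities from the sketch above (the 0.5 threshold is the one just discussed):
y_hat = (f_wb >= 0.5).astype(int)   # 1 where the predicted probability is at least 0.5, else 0
print("predictions:", y_hat)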
Step 4: Compute the logistic loss
① Why the squared-error loss is not used
In linear regression, the cost is based on the squared error:
$$J(w,b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{w,b}(x^{(i)}) - y^{(i)})^2 $$
where, for linear regression:
$$f_{w,b}(x^{(i)}) = wx^{(i)} + b $$
So can we keep the same cost and simply redefine f as the sigmoid of the linear model?
$$f_{w,b}(x^{(i)}) = sigmoid(wx^{(i)} + b )$$
No. The result is unsatisfactory: plotting the resulting cost shows that it is not a smooth convex function, so gradient descent can get stuck in local minima.
② The loss formula actually used
\begin{equation}
loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = \begin{cases}
- \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=1$}\\
- \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) & \text{if $y^{(i)}=0$}
\end{cases}
\end{equation}
In practice, an equivalent single-expression form of this formula is used:
$$loss(f_{\mathbf{w},b}(\mathbf{x}^{(i)}), y^{(i)}) = -y^{(i)} \log\left(f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right) - \left( 1 - y^{(i)}\right) \log \left( 1 - f_{\mathbf{w},b}\left( \mathbf{x}^{(i)} \right) \right)$$
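A minimal sketch that evaluates this single-expression form for one example (log_loss is a hypothetical helper name, used here only to make the equivalence with the piecewise definition concrete):
def log_loss(f_wb_i, y_i):
    # when y_i == 1 only the first term survives; when y_i == 0 only the second does
    return -y_i*np.log(f_wb_i) - (1 - y_i)*np.log(1 - f_wb_i)
print(log_loss(0.9, 1), log_loss(0.9, 0))   # small loss for a confident correct prediction, large loss otherwise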
Step 5: Compute the cost function
The loss measures the error on a single example; the cost aggregates the losses over the entire training set.
def compute_cost_logistic(X, y, w, b):
    m = X.shape[0]
    cost = 0.0
    for i in range(m):
        z_i = np.dot(X[i],w) + b        # linear part
        f_wb_i = sigmoid(z_i)           # model prediction
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)   # logistic loss
    cost = cost / m
    return cost
In short, the loss formula above is placed inside a for loop, summed over all examples, and averaged.
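A quick usage sketch on the two-feature training set from Step 1 (w_tmp and b_tmp are illustrative values, not learned parameters):
w_tmp = np.array([1., 1.])
b_tmp = -3.
print(compute_cost_logistic(X_train2, y_train2, w_tmp, b_tmp))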
Step 6: Run gradient descent
① Computing it from the formulas
Recall that the gradient descent algorithm repeatedly applies the gradient update:
$$\begin{align*}
&\text{repeat until convergence:} \; \lbrace \\
& \; \; \;w_j = w_j - \alpha \frac{\partial J(\mathbf{w},b)}{\partial w_j} \tag{1} \; & \text{for j := 0..n-1} \\
& \; \; \; \; \;b = b - \alpha \frac{\partial J(\mathbf{w},b)}{\partial b} \\
&\rbrace
\end{align*}$$
Taking the partial derivatives gives:
$$\begin{align*}
\frac{\partial J(\mathbf{w},b)}{\partial w_j} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})x_{j}^{(i)} \tag{2} \\
\frac{\partial J(\mathbf{w},b)}{\partial b} &= \frac{1}{m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)}) \tag{3}
\end{align*}$$
We use the following data set as an example:
X_train = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
def compute_gradient_logistic(X, y, w, b):
    m,n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        f_wb_i = sigmoid(np.dot(X[i],w) + b)   # prediction for example i
        err_i = f_wb_i - y[i]                  # prediction error
        for j in range(n):
            dj_dw[j] = dj_dw[j] + err_i * X[i,j]
        dj_db = dj_db + err_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    return dj_db, dj_dw
Compared with linear regression, the only change is in f_wb_i, which now passes the linear output through the sigmoid.
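A quick usage sketch on the data set above (w_tmp and b_tmp are illustrative starting values):
w_tmp = np.array([2., 3.])
b_tmp = 1.
dj_db_tmp, dj_dw_tmp = compute_gradient_logistic(X_train, y_train, w_tmp, b_tmp)
print("dj_db:", dj_db_tmp)
print("dj_dw:", dj_dw_tmp)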
import copy, math

def gradient_descent(X, y, w_in, b_in, alpha, num_iters):
    # An array to store cost J and w's at each iteration primarily for graphing later
    J_history = []
    w = copy.deepcopy(w_in)   # avoid modifying global w within function
    b = b_in
    for i in range(num_iters):
        # Calculate the gradient and update the parameters
        dj_db, dj_dw = compute_gradient_logistic(X, y, w, b)
        # Update parameters using w, b, alpha and gradient
        w = w - alpha * dj_dw
        b = b - alpha * dj_db
        # Save cost J at each iteration
        if i < 100000:        # prevent resource exhaustion
            J_history.append(compute_cost_logistic(X, y, w, b))
        # Print cost at intervals 10 times, or every iteration if num_iters < 10
        if i % math.ceil(num_iters / 10) == 0:
            print(f"Iteration {i:4d}: Cost {J_history[-1]}")
    return w, b, J_history    # return final w, b and J history for graphing
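A quick usage sketch that runs the optimizer on the training data (the initial values, learning rate, and iteration count here are illustrative choices):
w_init = np.zeros(X_train.shape[1])
b_init = 0.
w_out, b_out, _ = gradient_descent(X_train, y_train, w_init, b_init, alpha=0.1, num_iters=10000)
print(f"updated parameters: w = {w_out}, b = {b_out}")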
② Using scikit-learn
1. Create the dataset
import numpy as np
X = np.array([[0.5, 1.5], [1,1], [1.5, 0.5], [3, 0.5], [2, 2], [1, 2.5]])
y = np.array([0, 0, 0, 1, 1, 1])
2. Fit the model
from sklearn.linear_model import LogisticRegression
lr_model = LogisticRegression()
lr_model.fit(X, y)
3. Make predictions
y_pred = lr_model.predict(X)
print("Prediction on training set:", y_pred)
4. Compute the accuracy
print("Accuracy on training set:", lr_model.score(X, y))
Step 7: Address overfitting
To reduce overfitting we add regularization to the model. The regularized cost function (shown here for linear regression) is:
$$J(\mathbf{w},b) = \frac{1}{2m} \sum\limits_{i = 0}^{m-1} (f_{\mathbf{w},b}(\mathbf{x}^{(i)}) - y^{(i)})^2 + \frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2 $$
Compared with the original cost function there is an extra term $$\frac{\lambda}{2m} \sum_{j=0}^{n-1} w_j^2$$, which penalizes large weights and therefore drives the parameters w toward smaller values.
For regularized linear regression:
def compute_cost_linear_reg(X, y, w, b, lambda_ = 1):
    m = X.shape[0]
    n = len(w)
    cost = 0.
    for i in range(m):
        f_wb_i = np.dot(X[i], w) + b
        cost = cost + (f_wb_i - y[i])**2
    cost = cost / (2 * m)
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost
    total_cost = cost + reg_cost
    return total_cost
For regularized logistic regression:
def compute_cost_logistic_reg(X, y, w, b, lambda_ = 1):
    m,n = X.shape
    cost = 0.
    for i in range(m):
        z_i = np.dot(X[i], w) + b
        f_wb_i = sigmoid(z_i)
        cost += -y[i]*np.log(f_wb_i) - (1-y[i])*np.log(1-f_wb_i)
    cost = cost/m
    reg_cost = 0
    for j in range(n):
        reg_cost += (w[j]**2)
    reg_cost = (lambda_/(2*m)) * reg_cost
    total_cost = cost + reg_cost
    return total_cost
Gradient descent then works the same way; the only change is that the gradient with respect to each $w_j$ gains an extra $\frac{\lambda}{m} w_j$ term (the gradient with respect to $b$ is not regularized), as sketched below.
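A minimal sketch of the regularized gradient for logistic regression (the name compute_gradient_logistic_reg is my own; it simply adds the regularization term to the earlier gradient code):
def compute_gradient_logistic_reg(X, y, w, b, lambda_ = 1):
    m,n = X.shape
    dj_dw = np.zeros((n,))
    dj_db = 0.
    for i in range(m):
        err_i = sigmoid(np.dot(X[i], w) + b) - y[i]
        for j in range(n):
            dj_dw[j] += err_i * X[i,j]
        dj_db += err_i
    dj_dw = dj_dw/m
    dj_db = dj_db/m
    for j in range(n):
        dj_dw[j] += (lambda_/m) * w[j]   # extra regularization term; b is not regularized
    return dj_db, dj_dw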