Machine Learning (Andrew Ng) Study Notes (Part 2)

GLinttsd

Tags: machine learning, logistic regression, python

Article link: https://blog.csdn.net/GLinttsd/article/details/107192003

1. Logistic Regression
2. Cost Function
3. Gradient Descent
4. Code Review

1. Logistic Regression

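We previously discussed linear models for regression tasks. How should we approach classification? One option is to map the real-valued output of a linear model to 0 or 1 (binary classification), much like the unit step function:
$y=\begin{cases} 0, & z<0 \\ 0.5, & z=0 \\ 1, & z>0 \end{cases}\tag{1}$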

Figure 1.1: Unit step function

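But such a function is too idealized: it jumps at $z=0$ and is therefore not differentiable there. We would like a monotonic, differentiable substitute, which makes theoretical analysis easier. The logistic function (also called the log-odds function) is exactly such a function:
$y=\frac{1}{1+e^{-z}} \tag{2}$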

Figure 1.2: Logistic function
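To make the contrast concrete, the following minimal sketch evaluates Eq. (1) and Eq. (2) at a few points. The helper names `unit_step` and `logistic` and the sample points are illustrative, not part of the original notes.

```python
import numpy as np

def unit_step(z):
    # Eq. (1): 0 for z < 0, 0.5 at z == 0, 1 for z > 0
    z = np.asarray(z, dtype=float)
    return np.where(z > 0, 1.0, np.where(z < 0, 0.0, 0.5))

def logistic(z):
    # Eq. (2): smooth and strictly increasing, with values in (0, 1)
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
step_values = unit_step(z)
logistic_values = logistic(z)
print(step_values)
print(np.round(logistic_values, 4))
```

Both functions equal 0.5 at $z=0$, but the logistic function approaches 0 and 1 gradually instead of jumping, so it has a derivative everywhere.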

Substituting the linear model $\bm \theta^T\bm x+b$ for $z$ gives

$h_\theta(\bm x)=\frac{1}{1+e^{-(\bm \theta^T\bm x+b)}} \tag{3}$
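The alternative name "log-odds" follows directly from Eq. (3). As a short worked step, write $y=h_\theta(\bm x)$ and read $y$ as the probability of the positive class (an interpretation assumed here; the notes do not state it explicitly). Then $1-y=\frac{e^{-(\bm \theta^T\bm x+b)}}{1+e^{-(\bm \theta^T\bm x+b)}}$, so

$\frac{y}{1-y}=e^{\bm \theta^T\bm x+b}\quad\Longrightarrow\quad \ln\frac{y}{1-y}=\bm \theta^T\bm x+b$

In other words, the linear model predicts the logarithm of the odds $y/(1-y)$.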

2. Cost Function

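In the previous part we defined the cost function of the linear regression model as the sum of squared errors over all training examples. With the logistic hypothesis, however, that definition makes the cost function non-convex, which is an undesirable property because gradient descent can then get stuck in a local minimum. We therefore redefine the cost function:
$J(\bm \theta)=\frac{1}{m}\sum_{i=1}^{m}Cost(h_\theta (x^{(i)}),y^{(i)})\tag{4}$
where
$Cost(h_\theta(x),y)=\begin{cases} -\log(h_\theta(x)), & \text{if } y=1 \\ -\log(1-h_\theta(x)), & \text{if } y=0 \end{cases}\tag{5}$
It follows from this definition that when the true label $y$ agrees with the output $h_\theta(x)$ of the logistic function, the cost is 0; otherwise, the further apart the two are, the larger the cost.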
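The two branches of the per-example cost can be folded into one expression, $-y\log(h)-(1-y)\log(1-h)$, because one of the two terms vanishes for each label. A minimal sketch with made-up predictions (the name `example_cost` and the sample values are illustrative only):

```python
import numpy as np

def example_cost(h, y):
    # Single-expression form of the per-example cost; valid for y in {0, 1} and 0 < h < 1
    return -y * np.log(h) - (1 - y) * np.log(1 - h)

# Hypothetical model outputs, ordered from "confident 1" to "confident 0"
h = np.array([0.99, 0.9, 0.5, 0.1, 0.01])
costs_y1 = example_cost(h, 1)  # true label is 1: cost rises as h moves away from 1
costs_y0 = example_cost(h, 0)  # true label is 0: cost falls as h moves toward 0
print(np.round(costs_y1, 4))
print(np.round(costs_y0, 4))
```

The cost is close to 0 when the prediction is near the true label and grows without bound as the prediction approaches the opposite label.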

3. Gradient Descent

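With the cost function in hand, we can use the gradient descent method from the previous part to find the optimal parameters. With learning rate $\alpha$, and with the intercept $b$ absorbed into $\bm \theta$ as $\theta_0$ by setting $x_0=1$, repeat until convergence:
$\theta_j := \theta_j - \alpha \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)} \quad (\text{simultaneously update all } \theta_j)\tag{6}$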
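The update rule comes from differentiating the cost. A sketch of the derivation for a single example, writing $h=h_\theta(x)=\sigma(\bm \theta^T x)$ with $\sigma(z)=\frac{1}{1+e^{-z}}$, using the single-expression cost $-y\log h-(1-y)\log(1-h)$ and the identity $\sigma'(z)=\sigma(z)(1-\sigma(z))$:

$\frac{\partial}{\partial \theta_j}Cost(h,y)=\left(-\frac{y}{h}+\frac{1-y}{1-h}\right)h(1-h)\,x_j=\left(-y(1-h)+(1-y)h\right)x_j=(h-y)\,x_j$

Averaging over the $m$ training examples gives

$\frac{\partial J(\bm \theta)}{\partial \theta_j}=\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)})-y^{(i)}\right)x_j^{(i)}$

which is the term multiplied by $\alpha$ in the update.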

4. Code Review

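Suppose we want to predict whether a student will be admitted to a university from the scores on two exams.
Following the six steps mentioned earlier, we choose logistic regression as the model and define the sigmoid function.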

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

We load a set of labeled examples for training the model and plot them:

path = 'ex2data1.txt'
data = pd.read_csv(path, header=None, names=['Exam 1', 'Exam 2', 'Admitted'])
positive = data[data['Admitted'].isin([1])]
negative = data[data['Admitted'].isin([0])]
fig, ax = plt.subplots(figsize=(12,8))
ax.scatter(positive['Exam 1'], positive['Exam 2'], s=50, c='b', marker='o', label='Admitted')
ax.scatter(negative['Exam 1'], negative['Exam 2'], s=50, c='r', marker='x', label='Not Admitted')
ax.legend()
ax.set_xlabel('Exam 1 Score')
ax.set_ylabel('Exam 2 Score')
plt.show()

The resulting plot:

Figure 1.3: Scatter plot of the data

Next we take Eq. (4) as the cost function and implement it.

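def cost(theta, X, y):
    # Plain ndarrays replace the legacy np.mat matrix type
    theta = np.asarray(theta, dtype=float).ravel()  # shape (n,)
    X = np.asarray(X, dtype=float)                  # shape (m, n)
    y = np.asarray(y, dtype=float).ravel()          # shape (m,)
    h = sigmoid(X @ theta)                          # predicted probabilities
    first = -y * np.log(h)
    second = (1 - y) * np.log(1 - h)
    return np.sum(first - second) / len(X)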

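Then we minimize the cost function with gradient descent. Note that the function below only computes the gradient of the cost with respect to each parameter; it does not perform the update step itself.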

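def gradient(theta, X, y):
    # Plain ndarrays replace the legacy np.mat matrix type
    theta = np.asarray(theta, dtype=float).ravel()  # shape (n,)
    X = np.asarray(X, dtype=float)                  # shape (m, n)
    y = np.asarray(y, dtype=float).ravel()          # shape (m,)

    # Residual between predicted probability and label for each example
    error = sigmoid(X @ theta) - y

    # Vectorized form of (1/m) * sum_i error_i * x_j^(i) for every j
    return X.T @ error / len(X)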
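A gradient function alone does not minimize anything, so here is a minimal sketch of the batch update loop built on top of it. The helper `gradient_descent`, the learning rate, the iteration count, and the synthetic data are all assumptions made for illustration; the notes do not say which optimizer or settings were used.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return np.mean(-y * np.log(h) - (1 - y) * np.log(1 - h))

def gradient(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(X)

def gradient_descent(X, y, alpha=0.1, iterations=2000):
    # Hypothetical helper: repeats the simultaneous update of all theta_j
    theta = np.zeros(X.shape[1])
    history = [cost(theta, X, y)]
    for _ in range(iterations):
        theta = theta - alpha * gradient(theta, X, y)
        history.append(cost(theta, X, y))
    return theta, history

# Synthetic, made-up data: noisy labels so the classes overlap
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 2))
noise = rng.normal(scale=1.0, size=200)
y = (features.sum(axis=1) + noise > 0).astype(float)
X = np.column_stack([np.ones(len(features)), features])  # x_0 = 1 column for the intercept

theta, history = gradient_descent(X, y)
print(history[0], history[-1])
```

Starting from $\bm \theta=0$ every prediction is 0.5, so the initial cost is $\log 2$; the cost then decreases as the updates proceed. On real exam scores, which are on a much larger scale than this synthetic data, feature scaling or a smaller learning rate would likely be needed.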

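Once the optimal $\theta$ is found, substituting it back into Eq. (3) gives the prediction model. Finally, feeding the original data into the prediction model shows how well the model fits.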
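One way to measure the fit is to threshold the model output and compare with the labels. The sketch below assumes a decision threshold of 0.5 and uses a hypothetical helper `predict` with made-up parameters and inputs; none of these values come from the original notes.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def predict(theta, X, threshold=0.5):
    # Hypothetical helper: label 1 when the model output reaches the threshold
    return (sigmoid(X @ theta) >= threshold).astype(int)

# Made-up parameters and inputs, purely for illustration; first column is x_0 = 1
theta = np.array([-1.0, 2.0])
X = np.array([[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]])
y = np.array([0, 1, 1])

predictions = predict(theta, X)
accuracy = np.mean(predictions == y)  # fraction of examples classified correctly
print(predictions, accuracy)
```

Accuracy measured on the same data used for training tends to be optimistic, which is why a held-out test set is useful.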

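This example again has no held-out test set. To create one, we can split the original dataset with the train_test_split function from sklearn (by default the training-to-test ratio is 3:1, i.e. test_size=0.25).

from sklearn.model_selection import train_test_split
# Pass only the two exam scores as features; passing the whole frame would leak the label
X_train, X_test, y_train, y_test = train_test_split(
    data[['Exam 1', 'Exam 2']], data['Admitted'], random_state=0, stratify=data['Admitted'])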