Machine Learning (2) - Classification Problems: Classification and Representation

ShiKiScarlet

Category: Machine Learning

Article link: https://blog.csdn.net/zp753951zpzp/article/details/71908561

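Classification Problems

1. Introduction to classification problems

(1) Examples

Typical examples:
- Spam email classification
- Whether an online transaction is fraudulent
- Whether a tumor is benign or malignant

$y \in \{0, 1\}$
where
1 is called the positive class, denoted +
0 is called the negative class, denoted -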

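(2) Why linear regression should not be used

To apply linear regression to a classification problem, we would need to:
1. Fit the data
2. Find the threshold point where h(x) = 0.5
Drawbacks:
1. For irregularly distributed data (for example, data containing outliers), the fitted line can produce very large classification errors
2. The linear regression h(x) can take values outside the range 0 to 1, whereas the logistic regression h(x) always lies between 0 and 1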

2. Logistic Regression Model

(1) Hypothesis function

$h_\theta(x) = g(\theta^Tx) \\ g(z)= \frac{1}{1+e^{-z}}$

where
g(z) is called the logistic function or sigmoid (S-shaped) function.
Substituting gives:

$h_\theta(x) = \frac{1}{1+e^{-\theta^Tx}}$

Meaning of the hypothesis function:
$h_\theta(x)$ is the probability that y = 1 given the input x and the parameters θ, that is, $h_\theta(x) = P(y = 1 | x ; \theta)$
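The hypothesis function can be sketched in a few lines of plain Python (used here instead of Octave so the example needs nothing beyond the standard library). The parameter values and the helper name `hypothesis` are illustrative assumptions, not part of the original notes; each input vector is assumed to start with the bias feature x0 = 1.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def hypothesis(theta, x):
    """h_theta(x) = g(theta^T x), read as an estimate of P(y = 1 | x; theta)."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

# Hypothetical parameters; x[0] is the bias feature x0 = 1.
theta = [-3.0, 1.0]
p_on_boundary = hypothesis(theta, [1.0, 3.0])   # theta^T x = 0
p_positive = hypothesis(theta, [1.0, 5.0])      # theta^T x = 2
print(p_on_boundary, p_positive)
```

Note that `math.exp(-z)` overflows for very negative z, so a production implementation would likely use a numerically stable variant.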

(2) Decision boundary

Hypothesis function:

$h_\theta(x) = g(z) \\ g(z)= \frac{1}{1+e^{-z}}$
where z is the expression we choose to define the decision boundary.
To obtain a discrete 0 or 1 classification, the output of the hypothesis function is mapped as follows:

$\begin{align*}& h_\theta(x) \geq 0.5 \rightarrow y = 1 \newline& h_\theta(x) < 0.5 \rightarrow y = 0 \newline\end{align*}$
The logistic function g outputs a value greater than or equal to 0.5 whenever its input is greater than or equal to zero:

$\begin{align*}& g(z) \geq 0.5 \newline& when \; z \geq 0\end{align*}$
If we choose z to be $\theta^T x$, this means:

$\begin{align*}& \theta^T x \geq 0 \Rightarrow y = 1 \newline& \theta^T x < 0 \Rightarrow y = 0 \newline\end{align*}$
Plotting the curve z = 0 (here $\theta^T x = 0$) gives the decision boundary.
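As a small illustration, thresholding the hypothesis at 0.5 and checking the sign of $\theta^T x$ give the same labels. The parameters below are hypothetical and describe the boundary x1 + x2 = 3; the function names are made up for this sketch.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x, threshold=0.5):
    """Predict 1 when h_theta(x) >= threshold, else 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if sigmoid(z) >= threshold else 0

def predict_by_sign(theta, x):
    """Equivalent rule for threshold 0.5: predict 1 when theta^T x >= 0."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return 1 if z >= 0 else 0

# Hypothetical parameters: theta^T x = -3 + x1 + x2, so the boundary is x1 + x2 = 3.
theta = [-3.0, 1.0, 1.0]
points = [[1.0, 1.0, 1.0], [1.0, 2.0, 2.0], [1.0, 1.5, 1.5], [1.0, 0.0, 2.5]]
labels = [predict(theta, p) for p in points]
labels_by_sign = [predict_by_sign(theta, p) for p in points]
print(labels, labels_by_sign)
```

The third point lies exactly on the boundary, where $h_\theta(x) = 0.5$, so it is assigned to class 1 under the rule above.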

(3) Cost function for logistic regression

We cannot use the same cost function as for linear regression, because with the logistic function inside it the squared-error cost becomes wavy, with many local optima. In other words, it is not a convex function.

Instead, the cost function we use for logistic regression is:

$\begin{align*}& J(\theta) = \dfrac{1}{m} \sum_{i=1}^m \mathrm{Cost}(h_\theta(x^{(i)}),y^{(i)}) \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(h_\theta(x)) \; & \text{if y = 1} \newline & \mathrm{Cost}(h_\theta(x),y) = -\log(1-h_\theta(x)) \; & \text{if y = 0}\end{align*}$
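To get a feel for this piecewise cost, the following sketch evaluates it for a few predicted probabilities. The function name `cost` and the sample values are chosen for illustration only.

```python
import math

def cost(h, y):
    """Per-example cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    if y == 1:
        return -math.log(h)
    return -math.log(1.0 - h)

confident_and_right = cost(0.99, 1)   # small penalty
confident_and_wrong = cost(0.01, 1)   # large penalty
undecided = cost(0.5, 0)              # same penalty for either label
print(confident_and_right, confident_and_wrong, undecided)
```

The cost approaches 0 as the prediction approaches the true label and grows without bound as the prediction approaches the wrong label.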

(4) Gradient descent for logistic regression

1. Simplifying the cost function

The two cases can be combined into a single expression:

$\mathrm{Cost}(h_\theta(x),y) = - y \; \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
Substituting into J(θ) gives:

$J(\theta) = - \frac{1}{m} \displaystyle \sum_{i=1}^m [y^{(i)}\log (h_\theta (x^{(i)})) + (1 - y^{(i)})\log (1 - h_\theta(x^{(i)}))]$
In vectorized form:

$\begin{align*} & h = g(X\theta)\newline & J(\theta) = \frac{1}{m} \cdot \left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right) \end{align*}$
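A minimal sketch of J(θ) in plain Python, written as an explicit loop over the training examples rather than with matrix operations. The tiny data set is invented for illustration; each row of `X` begins with the bias feature x0 = 1.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_function(theta, X, y):
    """J(theta) = -(1/m) * sum(y*log(h) + (1 - y)*log(1 - h))."""
    m = len(y)
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1.0 - h)
    return total / m

# Made-up training set: one feature plus the bias column.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

j_zero = cost_function([0.0, 0.0], X, y)    # every h is 0.5, so J = log(2)
j_better = cost_function([-4.5, 2.0], X, y) # hypothetical parameters that separate the classes
print(j_zero, j_better)
```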

2. Implementing gradient descent

The general form of gradient descent is:

$\begin{align*}& Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j}J(\theta) \newline & \rbrace\end{align*}$

Working out the derivative with calculus and substituting gives:

$\begin{align*} & Repeat \; \lbrace \newline & \; \theta_j := \theta_j - \frac{\alpha}{m} \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)} \newline & \rbrace \end{align*}$
Note that this update rule has the same form as the one used in linear regression; the only difference is that $h_\theta(x)$ is now the sigmoid of $\theta^T x$. We still have to update all values in theta simultaneously.

In vectorized form:

$\theta := \theta - \frac{\alpha}{m} X^{T} (g(X \theta ) - \vec{y})$
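The update rule can be sketched as follows in plain Python. The data set, the learning rate and the iteration count are assumptions picked for this toy example, not values from the original notes.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def gradient_descent(X, y, alpha=0.1, iterations=5000):
    """Batch gradient descent for logistic regression, starting from theta = 0."""
    m, n = len(y), len(X[0])
    theta = [0.0] * n
    for _ in range(iterations):
        # h_theta(x^(i)) - y^(i) for every training example
        errors = [sigmoid(dot(theta, xi)) - yi for xi, yi in zip(X, y)]
        # partial derivative of J with respect to each theta_j
        grads = [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(n)]
        # simultaneous update: all gradients were computed from the old theta
        theta = [t - alpha * g for t, g in zip(theta, grads)]
    return theta

# Made-up training set: bias column plus one feature.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]]
y = [0, 0, 1, 1]

theta = gradient_descent(X, y)
predictions = [1 if sigmoid(dot(theta, xi)) >= 0.5 else 0 for xi in X]
print(theta, predictions)
```

Computing every gradient before changing any parameter is what makes the update simultaneous.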

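(5) Advanced optimization algorithms (alternatives to gradient descent)

1. Other algorithms for minimizing the cost function

Suppose we have code that computes the following:
- J(θ)
- the partial derivatives of J(θ)

$\begin{align*} & J(\theta) \newline & \dfrac{\partial}{\partial \theta_j}J(\theta)\end{align*}$
Optimization algorithms:
- Gradient descent
- Conjugate gradient
- BFGS (a quasi-Newton, or variable metric, method)
- L-BFGS (limited-memory BFGS)
The advanced algorithms have advantages and disadvantages, for example:
- Advantages: (1) there is no need to choose the learning rate α manually; (2) they are often faster than gradient descent
- Disadvantage: they are more complex

Usage in Octave

Example
Given the cost function $J(\theta)=(\theta_1-5)^2 + (\theta_2-5)^2$,
the minimum is clearly at $\theta_1 = 5$, $\theta_2 = 5$.
Implementation
We can write a single function that returns both values (the cost and its gradient):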

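function [jVal, gradient] = costFunction(theta)
    % Cost: J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2
    jVal = (theta(1)-5)^2 + (theta(2)-5)^2;
    % Gradient: partial derivatives of J with respect to theta1 and theta2
    gradient = zeros(2,1);
    gradient(1) = 2*(theta(1)-5);
    gradient(2) = 2*(theta(2)-5);
end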

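Calling it:

%  Options
%  GradObj: set to 'on' to tell the optimizer that your function also returns the gradient
%  MaxIter: maximum number of iterations
options = optimset('GradObj', 'on', 'MaxIter', 100);
initTheta = zeros(2,1); % initial value of theta
%  Arguments
%  @costFunction: handle to the cost function
%  initTheta: initial value of theta
%  options: the options configured above

%  Return values
%  optTheta: the optimal theta
%  functionVal: value of the cost function at optTheta
%  exitFlag: 1 means the optimization has converged
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initTheta, options);

Note that initTheta must be a vector with at least 2 dimensions.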
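For readers without Octave, the same "one function returns both the cost and the gradient" pattern can be sketched in plain Python. This is not `fminunc`: the `minimize` helper below is a hypothetical stand-in that runs ordinary gradient descent with a fixed step size, which is enough for this simple quadratic.

```python
def cost_function(theta):
    """Return (jVal, gradient) for J(theta) = (theta1 - 5)^2 + (theta2 - 5)^2."""
    j_val = (theta[0] - 5) ** 2 + (theta[1] - 5) ** 2
    gradient = [2 * (theta[0] - 5), 2 * (theta[1] - 5)]
    return j_val, gradient

def minimize(fn, init_theta, alpha=0.1, max_iter=100):
    """Hypothetical stand-in for an optimizer: fixed-step gradient descent."""
    theta = list(init_theta)
    for _ in range(max_iter):
        _, grad = fn(theta)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    j_val, _ = fn(theta)
    return theta, j_val

opt_theta, function_val = minimize(cost_function, [0.0, 0.0])
print(opt_theta, function_val)
```

With these settings the distance to the minimum shrinks by a constant factor on every step, so 100 iterations bring both parameters very close to 5.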

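(6) Multiclass classification

We now turn to classifying data when there are more than two categories. Instead of y = {0,1}, we extend the definition so that y = {0,1 … n}.

Since y = {0,1 … n}, we divide the problem into n + 1 (+1 because the index starts at 0) binary classification problems; in each one, we predict the probability that y is a member of one of our classes.

$\begin{align*}& y \in \lbrace0, 1 ... n\rbrace \newline& h_\theta^{(0)}(x) = P(y = 0 | x ; \theta) \newline& h_\theta^{(1)}(x) = P(y = 1 | x ; \theta) \newline& \cdots \newline& h_\theta^{(n)}(x) = P(y = n | x ; \theta) \newline& \mathrm{prediction} = \max_i( h_\theta ^{(i)}(x) )\newline\end{align*}$
That is,
the multiclass problem is converted into several binary classification problems.
For $h_\theta^{(0)}$, map y = 0 to the positive class and every other value to the negative class. The hypothesis obtained by running the binary classification algorithm on these labels is $h_\theta^{(0)}$; the remaining classes are handled the same way.
For a new input x, the value of i that maximizes $h_\theta^{(i)}(x)$ is the predicted class.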
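The one-vs-all procedure can be sketched as below. The three parameter vectors are hypothetical (in practice each would come from training a binary classifier on relabeled data), and the helper names are invented for this example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_labels(y, k):
    """Relabel for classifier k: class k becomes 1, every other class becomes 0."""
    return [1 if label == k else 0 for label in y]

def predict_one_vs_all(all_theta, x):
    """Return the class index i whose h_theta^(i)(x) is largest."""
    probs = [sigmoid(sum(t * v for t, v in zip(theta, x))) for theta in all_theta]
    return max(range(len(probs)), key=lambda i: probs[i])

# Hypothetical, already-trained parameters for classes 0, 1 and 2.
all_theta = [
    [1.0, -2.0, 0.0],
    [-1.0, 1.0, -1.0],
    [-2.0, 0.0, 2.0],
]

print(binary_labels([0, 1, 2, 1], 1))
print(predict_one_vs_all(all_theta, [1.0, 2.0, 0.5]))
```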

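3. The Overfitting Problem

1. Some concepts about fitting

- Underfitting (high bias): the model does not fit the data well
- Overfitting (high variance): the hypothesis has too many parameters. Without enough training data to constrain them, the learned parameters fit the training set almost perfectly (the cost is very close to 0), but the model does not generalize and gives wrong outputs on new data
- Just right: the model fits the data well and generalizes, so it predicts well on new data

2. Ways to address overfitting

Reduce the number of features (this discards some information)
- Manually select which features to keep
- Use a model selection algorithm
Regularization
- Keep all the features, but reduce the magnitude of the parameters θj
- Regularization works well when we have many useful features.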

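3. Regularization and the cost function

If our hypothesis function overfits, we can reduce the weight of some of its parameters by adding penalty terms for them to the cost function.

We change the cost function to:

$\min_\theta\ \dfrac{1}{2m}\left[ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})^2 + \lambda\ \sum_{j=1}^n \theta_j^2 \right]$
Here λ is the regularization parameter. It determines how heavily the θ parameters are penalized.

With the extra summation in the cost function above, we can smooth the output of the hypothesis function and reduce overfitting. If λ is chosen too large, the function may be smoothed too much, which causes underfitting. If λ = 0, or is too small, overfitting may still occur.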
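A minimal sketch of the regularized cost for linear regression, assuming the penalty shares the $\frac{1}{2m}$ factor with the squared-error term (so that its derivative is the $\frac{\lambda}{m}\theta_j$ term that appears in the regularized gradient descent update). The data and the name `regularized_cost` are illustrative.

```python
def regularized_cost(theta, X, y, lam):
    """(1 / 2m) * [sum of squared errors + lam * sum(theta_j^2 for j >= 1)]."""
    m = len(y)
    squared_errors = sum(
        (sum(t * v for t, v in zip(theta, xi)) - yi) ** 2 for xi, yi in zip(X, y)
    )
    # theta[0] (the intercept) is deliberately left out of the penalty
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return (squared_errors + penalty) / (2 * m)

# Made-up data that the line y = x fits exactly.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]

unregularized = regularized_cost([0.0, 1.0], X, y, lam=0.0)
regularized = regularized_cost([0.0, 1.0], X, y, lam=3.0)
print(unregularized, regularized)
```

Even a perfect fit has a nonzero cost once λ > 0, because the penalty charges for the size of θ1.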

4. Regularized linear regression

(1) Gradient descent
We modify the gradient descent update to treat θ0 separately from the other parameters, because we do not want to penalize θ0.

$\begin{align*} & \text{Repeat}\ \lbrace \newline & \theta_0 := \theta_0 - \alpha\ \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_0^{(i)} \newline & \theta_j := \theta_j - \alpha\ \left[ \left( \frac{1}{m}\ \sum_{i=1}^m (h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \right] & \,\,\,\,\, j \in \lbrace 1,2...n\rbrace\newline & \rbrace \end{align*}$
Rearranging gives:

$\theta_j := \theta_j(1 - \alpha\frac{\lambda}{m}) - \alpha\frac{1}{m}\sum_{i=1}^m(h_\theta(x^{(i)}) - y^{(i)})x_j^{(i)}$
The factor in the first term of this equation, $1 - \alpha\frac{\lambda}{m}$, is always less than 1 (for α > 0 and λ > 0). Intuitively, it shrinks the value of θj a little on every update.
The second term is exactly the same as before.
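The two forms of the update can be checked against each other numerically. The sketch below performs a single step of each for linear regression; the data, α and λ are arbitrary illustrative values.

```python
def dot(a, b):
    return sum(u * v for u, v in zip(a, b))

def gradients(theta, X, y):
    """Unregularized partial derivatives (1/m) * sum((h - y) * x_j), with h = theta^T x."""
    m = len(y)
    errors = [dot(theta, xi) - yi for xi, yi in zip(X, y)]
    return [sum(e * xi[j] for e, xi in zip(errors, X)) / m for j in range(len(theta))]

def step_original(theta, X, y, alpha, lam):
    """theta_j := theta_j - alpha * [grad_j + (lam / m) * theta_j] for j >= 1."""
    m, g = len(y), gradients(theta, X, y)
    new_theta = [theta[0] - alpha * g[0]]  # theta_0 is not penalized
    new_theta += [theta[j] - alpha * (g[j] + lam / m * theta[j]) for j in range(1, len(theta))]
    return new_theta

def step_shrink(theta, X, y, alpha, lam):
    """theta_j := theta_j * (1 - alpha * lam / m) - alpha * grad_j for j >= 1."""
    m, g = len(y), gradients(theta, X, y)
    new_theta = [theta[0] - alpha * g[0]]
    new_theta += [theta[j] * (1 - alpha * lam / m) - alpha * g[j] for j in range(1, len(theta))]
    return new_theta

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 2.0, 3.0]
theta = [0.5, 2.0]

a = step_original(theta, X, y, alpha=0.1, lam=1.0)
b = step_shrink(theta, X, y, alpha=0.1, lam=1.0)
print(a, b)
```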

(2) Normal equation

$\begin{align*}& \theta = \left( X^TX + \lambda \cdot L \right)^{-1} X^Ty \newline& \text{where}\ \ L = \begin{bmatrix} 0 & & & & \newline & 1 & & & \newline & & 1 & & \newline & & & \ddots & \newline & & & & 1 \newline\end{bmatrix}\end{align*}$
L is an (n+1)×(n+1) matrix with 0 in the top-left entry, 1 in the remaining diagonal entries and 0 everywhere else.
Regularization not only helps avoid overfitting, it also avoids the non-invertible case.
Recall that if m is less than n, then $X^TX$ is not invertible. After adding the term λ⋅L (with λ > 0), however, $X^TX + \lambda L$ becomes invertible.
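The invertibility claim can be checked on the smallest possible case: a single training example with one feature plus the bias column, so X has fewer rows than columns and $X^TX$ is singular. The sketch below hard-codes the 2×2 inverse formula; the function name and the data are illustrative assumptions.

```python
def regularized_normal_equation_2x2(X, y, lam):
    """Solve theta = (X^T X + lam * L)^(-1) X^T y for two parameters, with L = diag(0, 1)."""
    # Entries of the symmetric matrix A = X^T X + lam * L = [[a, b], [b, d]]
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X) + lam
    det = a * d - b * b
    if det == 0:
        raise ValueError("X^T X + lam * L is not invertible")
    # Right-hand side v = X^T y
    v0 = sum(r[0] * yi for r, yi in zip(X, y))
    v1 = sum(r[1] * yi for r, yi in zip(X, y))
    # Apply the closed-form inverse of a 2x2 matrix
    return [(d * v0 - b * v1) / det, (-b * v0 + a * v1) / det]

# One training example, two parameters: X^T X = [[1, 2], [2, 4]] has determinant 0.
X = [[1.0, 2.0]]
y = [3.0]

try:
    regularized_normal_equation_2x2(X, y, lam=0.0)
    singular_without_regularization = False
except ValueError:
    singular_without_regularization = True

theta = regularized_normal_equation_2x2(X, y, lam=1.0)
print(singular_without_regularization, theta)
```

With λ = 1 the matrix becomes [[1, 2], [2, 5]], whose determinant is 1, so a unique solution exists.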

5. Regularized logistic regression

We can regularize logistic regression in much the same way as linear regression, in order to avoid overfitting.

(1) Cost function
Recall that the cost function for logistic regression is:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)})) \large]$
We can regularize this equation by adding a term at the end:

$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \large[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)}))\large] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$
The second sum,