Cost Function
Training set:
$\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
m training examples
$h_\theta(x) = \dfrac{1}{1 + e^{-\theta^T x}}$
How do we choose the parameters $\theta$?
Cost function
Linear regression:
$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \dfrac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
$\mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = \dfrac{1}{2} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$
Logistic regression:
$\mathrm{Cost}(h_\theta(x), y) = \begin{cases} -\log(h_\theta(x)) & \text{if } y = 1 \\ -\log(1 - h_\theta(x)) & \text{if } y = 0 \end{cases}$
Note: y = 0 or 1 always
This is easier to understand together with the plots of the two branches.
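For intuition, here is a small Octave sketch (mine, not from the notes) that plots both branches of the cost over the possible values of $h_\theta(x)$:

```matlab
% Plot the two branches of the logistic cost for intuition.
h = linspace(0.001, 0.999, 200);     % possible values of h_theta(x)
plot(h, -log(h), 'b', h, -log(1 - h), 'r');
legend('y = 1: -log(h)', 'y = 0: -log(1 - h)');
xlabel('h_\theta(x)'); ylabel('Cost');
```

The $y = 1$ branch blows up as $h_\theta(x) \to 0$, so a confident wrong prediction is penalized heavily, and symmetrically for $y = 0$.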
Simplified cost function and gradient descent
$\mathrm{Cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
$J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \mathrm{Cost}(h_\theta(x^{(i)}), y^{(i)}) = -\dfrac{1}{m} \left[ \sum_{i=1}^{m} y^{(i)} \log h_\theta(x^{(i)}) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right]$
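This $J(\theta)$ vectorizes naturally in Octave. A minimal sketch (the name `logisticCost` and the shapes of `X` and `y` are my assumptions, not from the notes):

```matlab
function jVal = logisticCost(theta, X, y)
  % X: m x (n+1) design matrix, y: m x 1 vector of 0/1 labels
  m = length(y);
  h = 1 ./ (1 + exp(-X * theta));   % h_theta(x^(i)) for all examples at once
  jVal = -(1/m) * (y' * log(h) + (1 - y)' * log(1 - h));
end
```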
To fit the parameters $\theta$:
$\min_\theta J(\theta)$
To make a prediction for a new input x:
Output $h_\theta(x)$
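One common convention (an addition here, not in the notes) is to predict $y = 1$ when $h_\theta(x) \geq 0.5$, which is equivalent to $\theta^T x \geq 0$:

```matlab
prob = 1 / (1 + exp(-theta' * x));   % h_theta(x) for a single (n+1) x 1 input x
prediction = (prob >= 0.5);          % predict y = 1 when h_theta(x) >= 0.5
```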
Want $\min_\theta J(\theta)$:
Repeat {
    $\theta_j := \theta_j - \alpha \dfrac{\partial}{\partial \theta_j} J(\theta)$
}
(simultaneously update all $\theta_j$)
$\dfrac{\partial}{\partial \theta_j} J(\theta) = \dfrac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$
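This update has the same form as linear regression's; only the definition of $h_\theta(x)$ has changed. One iteration vectorizes as follows (a sketch; `alpha`, `X`, `y` are assumed names):

```matlab
m = length(y);
h = 1 ./ (1 + exp(-X * theta));   % sigmoid hypothesis on all m examples
grad = (1/m) * X' * (h - y);      % all partial derivatives at once
theta = theta - alpha * grad;     % simultaneous update of every theta_j
```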
Advanced Optimization
Optimization algorithms:
Gradient descent
Conjugate gradient
BFGS (a variable metric method)
L-BFGS (limited-memory BFGS)
Advantages of the latter three algorithms:
No need to manually pick the learning rate α
Usually converge faster than gradient descent
Disadvantage: more complex
Example:
$\theta = \begin{bmatrix} \theta_1 \\ \theta_2 \end{bmatrix}$
$J(\theta) = (\theta_1 - 5)^2 + (\theta_2 - 5)^2$
$\dfrac{\partial}{\partial \theta_1} J(\theta) = 2(\theta_1 - 5)$
$\dfrac{\partial}{\partial \theta_2} J(\theta) = 2(\theta_2 - 5)$
```matlab
% Save as costFunction.m: returns both J(theta) and its gradient.
function [jVal, gradient] = costFunction(theta)
  jVal = (theta(1) - 5)^2 + (theta(2) - 5)^2;   % J(theta)
  gradient = zeros(2, 1);                       % the two partials derived above
  gradient(1) = 2*(theta(1) - 5);
  gradient(2) = 2*(theta(2) - 5);
end
```

```matlab
% 'GradObj','on' tells fminunc that costFunction also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 100);
initialTheta = zeros(2, 1);
[optTheta, functionVal, exitFlag] = fminunc(@costFunction, initialTheta, options);
```
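For this $J(\theta)$, `optTheta` should converge to roughly $[5; 5]$ with `functionVal` near 0, and an `exitFlag` of 1 indicates fminunc reports convergence.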
Multiclass Classification: One-vs-all
One-vs-all (one-vs-rest)
$h_\theta^{(i)}(x) = P(y = i \mid x; \theta)$  $(i = 1, 2, 3)$
Given a new input x, pick the class $i$ that maximizes $h_\theta^{(i)}(x)$:
$\max_i h_\theta^{(i)}(x)$
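A minimal prediction sketch in Octave, assuming `allTheta` stacks the K trained parameter vectors as its rows (my naming, not from the notes):

```matlab
% x: (n+1) x 1 new input, including the bias term
h = 1 ./ (1 + exp(-allTheta * x));    % h: K x 1, one value per class
[maxProb, predictedClass] = max(h);   % class i with the largest h_theta^(i)(x)
```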