Logistic / Sigmoid function
$$g(x)=\frac{1}{1+e^{-x}}=\frac{e^x}{1+e^x},\qquad g'(x)=g(x)\,[1-g(x)]$$
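A minimal NumPy sketch of $g$, plus a finite-difference check of the derivative identity $g'(x)=g(x)[1-g(x)]$ that the derivation below relies on. Function and variable names here are illustrative, not from the original:

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x)), computed stably."""
    # For x >= 0 use 1/(1+exp(-x)); for x < 0 use exp(x)/(1+exp(x))
    # so that exp() never overflows.
    out = np.empty_like(x, dtype=float)
    pos = x >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-x[pos]))
    ex = np.exp(x[~pos])
    out[~pos] = ex / (1.0 + ex)
    return out

# Check g'(x) = g(x) * (1 - g(x)) against central finite differences.
x = np.linspace(-5.0, 5.0, 11)
h = 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2.0 * h)
analytic = sigmoid(x) * (1.0 - sigmoid(x))
print(np.max(np.abs(numeric - analytic)))   # close to zero
```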
Cost function
Logistic Regression
$$h_\theta(X)=g(X^\top\theta)=P(y=1\mid X;\theta)$$
Let $z = X^\top\theta$. Then

$$\begin{aligned}
\ln P(y=y\mid X;\theta)
&= y\ln P(y=1\mid X;\theta) + (1-y)\ln P(y=0\mid X;\theta)\\
&= y\ln h_\theta(X) + (1-y)\ln[1-h_\theta(X)]\\
&= y\ln g(z) + (1-y)\ln[1-g(z)]
\end{aligned}$$
Therefore, using $g'(z)=g(z)[1-g(z)]$,

$$\begin{aligned}
d\ln P(y=y\mid X;\theta)
&= y\,d\ln g(z) + (1-y)\,d\ln[1-g(z)]\\
&= y\cdot\frac{1}{g(z)}\,g(z)[1-g(z)]\,dz + (1-y)\,\frac{1}{1-g(z)}\,(-1)\,g(z)[1-g(z)]\,dz\\
&= \big\{y\,[1-g(z)] - (1-y)\,g(z)\big\}\,dz\\
&= [y - g(z)]\,dz\\
&= [y - g(X^\top\theta)]\,X^\top d\theta
\end{aligned}$$
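The last line says the per-example gradient is $\nabla_\theta \ln P(y\mid X;\theta) = [y-g(X^\top\theta)]\,X$. A small numerical check of that formula with finite differences; all names are hypothetical and only illustrate the conventions above:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_prob(theta, x, y):
    """ln P(y | x; theta) for a single example (y is 0 or 1)."""
    p = sigmoid(x @ theta)
    return y * np.log(p) + (1 - y) * np.log(1 - p)

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # one example X
theta = rng.normal(size=4)
y = 1

analytic = (y - sigmoid(x @ theta)) * x   # [y - g(X^T theta)] X

# Central finite differences of ln P with respect to each theta_j.
h = 1e-6
numeric = np.zeros_like(theta)
for j in range(theta.size):
    e = np.zeros_like(theta)
    e[j] = h
    numeric[j] = (log_prob(theta + e, x, y) - log_prob(theta - e, x, y)) / (2 * h)

print(np.max(np.abs(analytic - numeric)))   # close to zero
```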
The log-likelihood function is

$$L(\theta)=\ln\left[\prod_{i=1}^m P(y=y_i\mid X_i;\theta)\right]=\sum_{i=1}^m \ln P(y=y_i\mid X_i;\theta)$$
Let

$$\begin{aligned}
\mathrm{cost}(\theta) &= -\frac{1}{m}L(\theta) = -\frac{1}{m}\sum_{i=1}^m \ln P(y=y_i\mid X_i;\theta)\\
&= -\frac{1}{m}\sum_{i=1}^m \Big\{y_i\ln h_\theta(X_i) + (1-y_i)\ln[1-h_\theta(X_i)]\Big\}\\
&= -\frac{1}{m}\sum_{i=1}^m \Big\{y_i\ln g(z_i) + (1-y_i)\ln[1-g(z_i)]\Big\}
\end{aligned}$$

where $z_i = X_i^\top\theta$.
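A vectorized sketch of $\mathrm{cost}(\theta)$ under the conventions above ($X$ an $m\times(n+1)$ matrix whose rows are $X_i^\top$, $y\in\{0,1\}^m$); the names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    """cost(theta) = -(1/m) sum_i [ y_i ln g(z_i) + (1 - y_i) ln(1 - g(z_i)) ],
    with z_i = X_i^T theta."""
    m = X.shape[0]
    p = sigmoid(X @ theta)                      # vector of g(z_i)
    return -(y @ np.log(p) + (1 - y) @ np.log(1 - p)) / m

# Toy usage: at theta = 0 every prediction is 0.5, so cost = ln 2 ≈ 0.693.
X = np.array([[1.0, 0.5], [1.0, -1.2], [1.0, 2.3]])
y = np.array([1.0, 0.0, 1.0])
print(cost(np.zeros(2), X, y))
```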
Then $\max_\theta L(\theta) = -m\,\min_\theta \mathrm{cost}(\theta)$: maximizing the likelihood is equivalent to minimizing $\mathrm{cost}(\theta)$, which serves as the cost function.
Let $J(\theta) = -L(\theta)$ denote the (un-normalized) negative log-likelihood. Then

$$\begin{aligned}
d\,J(\theta) &= -\sum_{i=1}^m \big[y_i - g(X_i^\top\theta)\big]\,X_i^\top d\theta\\
&= \sum_{i=1}^m \big[g(X_i^\top\theta) - y_i\big]\,X_i^\top d\theta
\end{aligned}$$
Therefore

$$\nabla J(\theta) = \sum_{i=1}^m \big[g(X_i^\top\theta) - y_i\big]\,X_i = X^\top\big[g(X\theta) - y\big]$$

where

$$X = \begin{pmatrix}X_1^\top\\ \vdots\\ X_m^\top\end{pmatrix},\qquad
y = \begin{pmatrix}y_1\\ \vdots\\ y_m\end{pmatrix},\qquad
g(X\theta) = \begin{pmatrix}g(X_1^\top\theta)\\ \vdots\\ g(X_m^\top\theta)\end{pmatrix}$$

with $g$ applied componentwise.
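The vectorized gradient $\nabla J(\theta)=X^\top[g(X\theta)-y]$ translates directly into code; a sketch that also runs a few batch gradient-descent steps on $\mathrm{cost}(\theta)=\frac{1}{m}J(\theta)$ (function names and the learning rate are illustrative choices):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad_J(theta, X, y):
    """grad J(theta) = X^T (g(X theta) - y): gradient of the
    (un-normalized) negative log-likelihood."""
    return X.T @ (sigmoid(X @ theta) - y)

def fit(X, y, alpha=0.1, iters=1000):
    """Batch gradient descent on cost(theta) = J(theta) / m."""
    m = X.shape[0]
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        theta -= alpha * grad_J(theta, X, y) / m
    return theta
```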
Then

$$\begin{aligned}
d\,\{\nabla J(\theta)\} &= \sum_{i=1}^m d\big[g(X_i^\top\theta)\big]\,X_i\\
&= \sum_{i=1}^m g'(X_i^\top\theta)\,(X_i^\top d\theta)\,X_i\\
&= \sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top\, d\theta
\end{aligned}$$

Therefore the Hessian is

$$H_J(\theta) = \sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top$$
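In matrix form this Hessian is $H_J(\theta)=X^\top W X$ with $W=\mathrm{diag}\big(g'(X_i^\top\theta)\big)$; a short sketch with illustrative names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hessian_J(theta, X):
    """H_J(theta) = sum_i g'(X_i^T theta) X_i X_i^T = X^T diag(w) X,
    where w_i = g(z_i) (1 - g(z_i))."""
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)                 # g'(z_i)
    return X.T @ (w[:, None] * X)     # avoids forming diag(w) explicitly
```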
Note:

$$\frac{\partial}{\partial\theta_j} J(\theta) = \sum_{i=1}^m \big[g(X_i^\top\theta) - y_i\big]\,x_{ij},\qquad j\in\mathbb{N},\ 0\le j\le n$$
Regularized Logistic Regression
$$\mathrm{cost}(\theta) = -\frac{1}{m}\sum_{i=1}^m \Big\{y_i\ln h_\theta(X_i) + (1-y_i)\ln[1-h_\theta(X_i)]\Big\} + \frac{\lambda}{2n}\sum_{j=1}^n \theta_j^2$$
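A sketch of the regularized cost and its gradient, keeping the document's $\frac{\lambda}{2n}$ scaling and leaving $\theta_0$ unpenalized (function and variable names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def reg_cost_and_grad(theta, X, y, lam):
    """Regularized cost and gradient; theta[0] (the bias) is not penalized.
    Regularization term: (lam / (2 n)) * sum_{j>=1} theta_j^2."""
    m, n_plus_1 = X.shape
    n = n_plus_1 - 1
    p = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0                                   # do not penalize theta_0
    cost = -(y @ np.log(p) + (1 - y) @ np.log(1 - p)) / m \
           + lam / (2 * n) * (reg @ reg)
    grad = X.T @ (p - y) / m + (lam / n) * reg
    return cost, grad
```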
Then

$$H_{\mathrm{cost}}(\theta) = \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,X_i X_i^\top + \frac{\lambda}{n}\begin{pmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{pmatrix}$$
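With this Hessian available, one Newton update is $\theta \leftarrow \theta - H_{\mathrm{cost}}(\theta)^{-1}\nabla\mathrm{cost}(\theta)$; a minimal sketch using the $\frac{1}{m}$ and $\frac{\lambda}{n}$ factors derived above (names are illustrative). The positive-definiteness property shown next is what makes the linear solve well-posed:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_step(theta, X, y, lam):
    """One Newton update for the regularized cost."""
    m, n_plus_1 = X.shape
    n = n_plus_1 - 1
    p = sigmoid(X @ theta)
    reg = theta.copy()
    reg[0] = 0.0
    grad = X.T @ (p - y) / m + (lam / n) * reg
    D = np.eye(n_plus_1)
    D[0, 0] = 0.0                                  # diag(0, 1, ..., 1)
    H = X.T @ ((p * (1 - p))[:, None] * X) / m + (lam / n) * D
    return theta - np.linalg.solve(H, grad)
```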
Property: $H_{\mathrm{cost}}(\theta)$ is positive definite.

Proof:
For all $Z = (z_0, \dots, z_n)^\top \in \mathbb{R}^{n+1}$,

$$\begin{aligned}
Z^\top H_{\mathrm{cost}}(\theta)\,Z
&= \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,Z^\top X_i X_i^\top Z + \frac{\lambda}{n}\sum_{j=1}^n z_j^2\\
&= \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,(X_i^\top Z)^2 + \frac{\lambda}{n}\sum_{j=1}^n z_j^2 \;\ge\; 0
\end{aligned}$$
If $Z^\top H_{\mathrm{cost}}(\theta)\,Z = 0$, then every (nonnegative) term in the sum must vanish, so $z_j = 0$ for all $j\in\mathbb{N},\ 1\le j\le n$. Hence, using $x_{i0}=1$,

$$Z^\top H_{\mathrm{cost}}(\theta)\,Z = \frac{1}{m}\sum_{i=1}^m g'(X_i^\top\theta)\,z_0^2 = 0 \;\Rightarrow\; z_0 = 0$$

since $g'(\cdot) > 0$. Thus $Z = 0$, and therefore $H_{\mathrm{cost}}(\theta)$ is positive definite.
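A numerical sanity check of the claim: for random data and $\lambda>0$, a Cholesky factorization of $H_{\mathrm{cost}}(\theta)$ should succeed and all eigenvalues should be positive (illustrative sketch, not part of the proof):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
m, n = 50, 3
X = np.hstack([np.ones((m, 1)), rng.normal(size=(m, n))])   # x_{i0} = 1
theta = rng.normal(size=n + 1)
lam = 0.1

p = sigmoid(X @ theta)
D = np.eye(n + 1)
D[0, 0] = 0.0
H = X.T @ ((p * (1 - p))[:, None] * X) / m + (lam / n) * D

np.linalg.cholesky(H)                      # raises LinAlgError if not PD
print(np.linalg.eigvalsh(H).min() > 0)     # True
```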
Neural Network for Classification
$$\begin{aligned}
\mathrm{cost}(\theta) = -\frac{1}{m}\sum_{i=1}^m\sum_{k=1}^K &\Big\{y_{ik}\,\big(\ln h_\theta(X_i)\big)_k + (1-y_{ik})\,\big(\ln[1-h_\theta(X_i)]\big)_k\Big\}\\
&+ \frac{\lambda}{2m}\sum_{l=1}^{L-1}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\big(\theta^{(l)}_{ij}\big)^2
\end{aligned}$$
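A sketch of this multi-class cost for network outputs $h_\theta(X_i)\in(0,1)^K$ and one-vs-all targets $y_{ik}\in\{0,1\}$. It assumes each weight matrix $\theta^{(l)}$ is stored with its bias weights in column 0, which the regularization term skips; the forward pass producing `H` and all names are illustrative, not from the original:

```python
import numpy as np

def nn_cost(H, Y, thetas, lam):
    """H: (m, K) outputs h_theta(X_i); Y: (m, K) one-hot labels;
    thetas: list of weight matrices theta^(l), bias weights in column 0."""
    m = H.shape[0]
    data_term = -np.sum(Y * np.log(H) + (1 - Y) * np.log(1 - H)) / m
    reg_term = lam / (2 * m) * sum(np.sum(t[:, 1:] ** 2) for t in thetas)
    return data_term + reg_term
```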