Logistic Regression + Newton's Method

Logistic Regression

These notes are based on the CS229 lectures, supplemented with material from Baidu Baike and other blog posts.
Logistic regression is a classification model, although, since it outputs a probability in [0, 1], it can also be viewed as regressing that probability.
WARNING: do not use linear regression to solve classification problems.

Logistic regression

sigmoid function: $g(x) = \frac{1}{1+e^{-x}}$
define $h_{\theta}(x) = g(\theta^T x) = \frac{1}{1+e^{-\theta^T x}}$, $P(y=1 \mid x;\theta) = h_{\theta}(x)$, $P(y=0 \mid x;\theta) = 1-h_{\theta}(x)$
combine these two equations: $P(y \mid x;\theta) = (h_{\theta}(x))^y(1-h_{\theta}(x))^{1-y}$
use maximum likelihood estimation (MLE):
likelihood: $L(\theta)=P(\vec{y} \mid x;\theta)=\prod\limits_{i=1}^m(h_{\theta}(x^{(i)}))^{y^{(i)}}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}$
A sum is easier to work with than a product, so take the log-likelihood: $l(\theta)=\sum\limits_{i=1}^m\left[y^{(i)}\log(h_{\theta}(x^{(i)}))+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]$
This function (negated) is the cross-entropy loss for binary classification; we will come back to it later.
use gradient ascent to maximize $l(\theta)$:
$\theta_j := \theta_j + \alpha\frac{\partial}{\partial\theta_j}l(\theta)$
$\theta_j := \theta_j + \alpha\sum\limits_{i=1}^m(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$
$l(\theta)$ is concave (so $-l(\theta)$ is convex) and has no local optima other than the global one.
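A minimal NumPy sketch of logistic regression trained with the batch gradient-ascent update above; the toy data, learning rate, and iteration count are illustrative assumptions, not from the lecture notes:

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, alpha=0.1, n_iters=1000):
    """Batch gradient ascent on the log-likelihood l(theta).
    X: (m, n) design matrix (include a column of ones for the intercept),
    y: (m,) labels in {0, 1}."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)          # h_theta(x^{(i)}) for every example
        theta += alpha * X.T @ (y - h)  # theta_j += alpha * sum_i (y^{(i)} - h_i) x_j^{(i)}
    return theta

# toy usage (assumed data): first column is the intercept term
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 3.0], [1.0, 4.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta = fit_logistic(X, y)
print(theta, sigmoid(X @ theta).round(3))
```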

Softmax Regression

You can regard softmax regression as multiclass logistic regression.
define $K$ – the number of classes
$y$ – label; $y$ is a one-hot vector
$\theta$ – parameters, $\theta = \begin{bmatrix} -\theta_1^T- \\ -\theta_2^T- \\ \vdots \\ -\theta_K^T- \end{bmatrix}$, where $\theta_k$ is the parameter vector of the $k^{th}$ class
$h_\theta(x)$ – hypothesis, $h_\theta(x^{(i)})= \begin{bmatrix} P(y^{(i)}_1=1 \mid x^{(i)};\theta) \\ \vdots \\ P(y^{(i)}_K=1 \mid x^{(i)};\theta) \end{bmatrix} = \frac{1}{\sum\limits_{j=1}^K \exp(\theta_j^T x^{(i)})}\begin{bmatrix} \exp(\theta_1^T x^{(i)}) \\ \vdots \\ \exp(\theta_K^T x^{(i)})\end{bmatrix}$
Each entry of the hypothesis vector is the predicted probability of the corresponding class.
Cross entropy error function:
$Loss = - \sum\limits_i 1\{a\}\ln p_i$
$1\{a\}$ is the indicator function: $1\{a\} = 1$ if $a$ is true, and $1\{a\} = 0$ otherwise.
Either $\log$ or $\ln$ can be used; the base only rescales the loss by a constant factor.
$J(\theta) = -\frac1m\left[\sum\limits_{i=1}^m\sum\limits_{j=1}^K 1\{y^{(i)}_j=1\}\ln\frac{\exp(\theta^T_j x^{(i)})}{\sum\limits_{l=1}^K \exp(\theta_l^T x^{(i)})}\right]$
Usually we add a weight decay term because the softmax parameterization is redundant (over-parameterized): adding the same vector to every $\theta_j$ leaves the predictions unchanged, and the penalty makes the minimizer unique:
$J(\theta) = -\frac1m\left[\sum\limits_{i=1}^m\sum\limits_{j=1}^K 1\{y^{(i)}_j=1\}\ln\frac{\exp(\theta^T_j x^{(i)})}{\sum\limits_{l=1}^K \exp(\theta_l^T x^{(i)})}\right]+\frac\lambda2\sum\limits_{i=1}^K\sum\limits_{j=1}^n\theta_{ij}^2$, where $n$ is the number of features.
use gradient descent to solve:
$\theta_j := \theta_j -\alpha\nabla_{\theta_j}J(\theta)$

$\theta_j := \theta_j +\alpha\left( \frac1m\sum\limits_{i=1}^m\left[x^{(i)}\left(1\{y^{(i)}_j=1\}- P(y^{(i)}_j=1 \mid x^{(i)};\theta)\right)\right]-\lambda\theta_j\right)$
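A NumPy sketch of regularized softmax regression using this update; the one-hot labels `Y`, the decay strength `lam`, the learning rate, and the toy data are assumptions for illustration:

```python
import numpy as np

def softmax(scores):
    # subtract the row-wise max for numerical stability; rows sum to 1
    scores = scores - scores.max(axis=1, keepdims=True)
    e = np.exp(scores)
    return e / e.sum(axis=1, keepdims=True)

def fit_softmax(X, Y, lam=1e-3, alpha=0.5, n_iters=500):
    """X: (m, n) features, Y: (m, K) one-hot labels.
    Theta stores one row theta_j per class, matching the definition above."""
    m, n = X.shape
    K = Y.shape[1]
    Theta = np.zeros((K, n))
    for _ in range(n_iters):
        P = softmax(X @ Theta.T)                 # P[i, j] = P(y_j^{(i)} = 1 | x^{(i)}; theta)
        grad = -(Y - P).T @ X / m + lam * Theta  # gradient of J(theta), incl. weight decay
        Theta -= alpha * grad                    # gradient descent step
    return Theta

# toy usage (assumed data): 3 classes, intercept plus one feature
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
Y = np.eye(3)[[0, 0, 1, 2]]
Theta = fit_softmax(X, Y)
print(softmax(X @ Theta.T).round(2))
```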

Softmax vs. Logistic (one-vs-all)

If the classes are mutually exclusive (each example belongs to exactly one class), use Softmax Regression (it is faster than training $K$ separate classifiers).
If the classes are not mutually exclusive, use Logistic Regression with a one-versus-all strategy: train one binary classifier per class, as in the sketch below.
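A minimal one-vs-all sketch in NumPy, assuming integer class labels and batch gradient ascent for each binary classifier (the function names, learning rate, and iteration count are illustrative, not from the notes):

```python
import numpy as np

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_one_vs_all(X, labels, K, alpha=0.1, n_iters=1000):
    """Train K independent binary logistic-regression classifiers.
    X: (m, n) features, labels: (m,) integer class ids in {0, ..., K-1}."""
    Theta = np.zeros((K, X.shape[1]))
    for k in range(K):
        y_k = (labels == k).astype(float)        # class k vs. the rest
        for _ in range(n_iters):
            h = _sigmoid(X @ Theta[k])
            Theta[k] += alpha * X.T @ (y_k - h)  # same gradient-ascent step as the binary case
    return Theta

def predict_one_vs_all(X, Theta):
    # pick the class whose classifier assigns the highest probability
    return np.argmax(_sigmoid(X @ Theta.T), axis=1)
```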

Newton’s Method

To maximize $l(\theta)$, look for a zero of its derivative: let $f = l'$ and apply Newton's root-finding update
$\theta^{(t+1)} := \theta^{(t)} - \frac{f(\theta^{(t)})}{f'(\theta^{(t)})}$
which, written in terms of $l$, is
$\theta^{(t+1)} := \theta^{(t)} - \frac{l'(\theta^{(t)})}{l''(\theta^{(t)})}$
and in the multivariate case becomes
$\theta^{(t+1)} := \theta^{(t)} - H^{-1}\nabla_\theta l$
$H$ is the Hessian matrix of $l(\theta)$.
Use Newton's method when the number of parameters is small: each iteration requires building and inverting an $n \times n$ Hessian, but far fewer iterations are needed than with gradient descent.
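A sketch of this update for the logistic-regression log-likelihood, where $\nabla_\theta l = X^T(y - h)$ and $H = -X^T \mathrm{diag}(h(1-h))X$; the function name and iteration count are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_logistic(X, y, n_iters=10):
    """Newton's method on the logistic-regression log-likelihood l(theta):
    gradient = X^T (y - h),  Hessian H = -X^T diag(h * (1 - h)) X."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        h = sigmoid(X @ theta)
        grad = X.T @ (y - h)
        H = -(X.T * (h * (1.0 - h))) @ X   # n x n Hessian of l(theta)
        theta -= np.linalg.solve(H, grad)  # theta := theta - H^{-1} * gradient
    return theta
```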

Common loss functions

Classification Error: $J(\theta) = \frac{\text{error items}}{\text{all items}}$

Mean Squared Error (MSE): $J(\theta)=\frac1n\sum\limits_{i=1}^n(\hat{y}_i-y_i)^2$

Cross Entropy Error Function: $Loss = - \frac1N\sum\limits_i 1\{a\}\ln p_i$
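A small NumPy sketch evaluating each of these losses on toy data (all arrays below are assumed examples):

```python
import numpy as np

# Classification error: fraction of misclassified items
y_true = np.array([0, 2, 1, 2])
y_pred = np.array([0, 2, 2, 2])
classification_error = np.mean(y_pred != y_true)   # 1/4

# Mean squared error for a regression-style prediction
y_hat = np.array([2.5, 0.0, 2.1])
y = np.array([3.0, -0.5, 2.0])
mse = np.mean((y_hat - y) ** 2)

# Cross-entropy: -1/N * sum_i ln p_i, where p_i is the predicted
# probability assigned to the true class of example i
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.2, 0.5, 0.3],
                  [0.3, 0.3, 0.4]])
p_true = probs[np.arange(len(y_true)), y_true]
cross_entropy = -np.mean(np.log(p_true))

print(classification_error, mse, cross_entropy)
```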
