- why is it named logistic regression? (because of the logistic function)
- what is the model?
- how to solve the minimization/maximization problem?
2-class problem
// the logistic function σ(z) = 1/(1 + exp(−z)) also lies in (0, 1), so it can serve as a probability
In all, assuming we have already learned the optimal θ, classification is done by computing P(1|x,θ): if it is > 0.5, the odds p/(1−p) > 1 and we predict class 1; if it is < 0.5, then p/(1−p) < 1 and we predict class 0.
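As a minimal sketch of this decision rule (assuming the standard sigmoid model P(1|x,θ) = 1/(1 + exp(−θᵀx)); the names `sigmoid` and `predict` are illustrative, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    # logistic function: its output always lies in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def predict(theta, x):
    # P(y=1 | x, theta) under the logistic regression model
    p = sigmoid(theta @ x)
    # threshold at 0.5, i.e. compare the odds p/(1-p) against 1
    return 1 if p > 0.5 else 0
```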
The learning objective is to maximize the probability of the entire training set, i.e. the likelihood L(θ) = ∏_i P(y_i|x_i,θ);
then gradient descent could be applied to solve the MLE problem.
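A minimal sketch of this in NumPy (gradient ascent on the average log-likelihood, equivalently gradient descent on the negative log-likelihood; the function name `fit_logistic`, the learning rate, and the iteration count are illustrative choices, not from the notes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, n_iters=1000):
    # X: (n, d) design matrix, y: (n,) labels in {0, 1}
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        p = sigmoid(X @ theta)        # P(y=1 | x_i, theta) for every sample
        grad = X.T @ (y - p) / n      # gradient of the average log-likelihood
        theta += lr * grad            # ascend the log-likelihood (descend the NLL)
    return theta
```

In practice one would monitor the log-likelihood for convergence rather than run a fixed number of iterations.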
Multi-class problem
Of vital importance is the constraint on the class probabilities: each P(i|x,θ) must be ≥ 0 and they must sum to 1 over all classes, which is what the softmax parameterization guarantees.
The cost function is the negative log-likelihood (cross-entropy) over the training set:
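One common way to write this cost (a sketch using the indicator 1{·}, which is my notation, and the same parameters θ as above):

$$
J(\theta) = -\sum_{i=1}^{n} \sum_{k=1}^{m} \mathbf{1}\{y_i = k\}\, \log P(k \mid x_i, \theta)
$$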
SMLR - Sparse multinomial Logistic Regression
In total there are m classes, and the input vector/feature is x ∈ R^d. Because the class probabilities sum to one, the weight vector for one of the classes need not be estimated. Without loss of generality, we thus set w(m) = 0, and the only parameters to be learned are the weight vectors w(i) for i ∈ {1, …, m−1}. For the remainder of the paper, we use w to denote the (d(m−1))-dimensional vector of parameters to be learned.
For ordinary softmax regression (also named multinomial logistic regression, MLR), the probability that x belongs to class i is

$$
P(y = i \mid x, w) = \frac{\exp\big(w^{(i)\top} x\big)}{\sum_{j=1}^{m} \exp\big(w^{(j)\top} x\big)}
= \frac{\exp\big(w^{(i)\top} x\big)}{1 + \sum_{j=1}^{m-1} \exp\big(w^{(j)\top} x\big)} \quad (\text{using } w^{(m)} = 0).
$$
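A small NumPy sketch of these class probabilities (with the last class's weight vector fixed to zero as above; storing the m−1 learned weight vectors in a matrix `W` is my own illustrative layout):

```python
import numpy as np

def class_probs(W, x):
    # W: (m-1, d) learned weight vectors w(1), ..., w(m-1); w(m) = 0 is implicit
    scores = W @ x                        # w(i)^T x for i = 1, ..., m-1
    scores = np.append(scores, 0.0)       # the last class has score w(m)^T x = 0
    exps = np.exp(scores - scores.max())  # subtract the max for numerical stability
    return exps / exps.sum()              # softmax: non-negative and sums to 1
```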
On top of this, SMLR adds a sparsity constraint to the cost function through a Laplacian prior on the weights:
in SMLR, p(w) ∝ exp(−λ||w||_1).
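Taking the log of this prior turns MAP estimation into an ℓ1-penalized log-likelihood; a sketch of the resulting objective (this particular way of writing it is mine, following the definitions above):

$$
\hat{w} = \arg\max_{w} \Big[\, l(w) - \lambda \|w\|_1 \,\Big],
\qquad
l(w) = \sum_{i=1}^{n} \log P(y_i \mid x_i, w)
$$

The ℓ1 term drives many components of w exactly to zero, which is what makes the learned classifier sparse.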