Suppose we have a dataset containing two classes of data, labeled $y \in \{0, 1\}$.
Let $P(\widehat{y}=1|x,w)$ denote the probability that $\widehat{y}=1$. Since $y$ takes only the values 0 and 1, we have $P(\widehat{y}=1|x,w) + P(\widehat{y}=0|x,w) = 1$. Logistic regression models this probability by passing a linear function through the sigmoid function, which constrains the output to $0 \leq P \leq 1$:
$$h_w(x) = g(z)$$
$$g(z) = \frac{1}{1 + e^{-z}}$$
$$z = w^T x$$
A threshold is then applied to obtain the predicted class:
$$y = \begin{cases} 0 & h_w(x) < threshold \\ 1 & h_w(x) \geq threshold \end{cases}$$
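The model above can be sketched in NumPy as follows. This is a minimal illustration, assuming each row of `X` is one sample that already includes a bias feature, and `w` is the weight vector; the function names are placeholders, not the linked repository's API.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(w, X):
    """h_w(x) = g(w^T x), computed for every row of X at once."""
    return sigmoid(X @ w)

def predict(w, X, threshold=0.5):
    """Classify as 1 when h_w(x) >= threshold, else 0."""
    return (hypothesis(w, X) >= threshold).astype(int)
```

With `threshold=0.5`, predicting 1 is equivalent to checking whether $z = w^T x \geq 0$, since the sigmoid crosses 0.5 exactly at $z = 0$.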
A first candidate for the loss function is the mean squared error:
$$loss = \frac{1}{m}\sum_{i=1}^{m} \frac{1}{2}\left(h_w(x^{(i)}) - y^{(i)}\right)^2$$
$$Cost(h_w(x), y) = \frac{1}{2}\left(h_w(x) - y\right)^2$$
However, because of the sigmoid function, this loss is non-convex, so the cost must be reshaped into a more suitable form:
$$Cost(h_w(x), y) = \begin{cases} -\log(h_w(x)) & y = 1 \\ -\log(1 - h_w(x)) & y = 0 \end{cases}$$
Analyzing this Cost function:
- When $y=1$ and $h_w(x)=1$, $Cost=0$: predicting $y=1$ when $P(y=1|w,x)=1$ is highly accurate, so it incurs no penalty.
- When $y=1$ and $h_w(x)=0$, $Cost \to \infty$: predicting $y=1$ when $P(y=1|w,x)=0$ is highly inaccurate, so it is penalized without bound.
- The case $y=0$ is symmetric.
The two branches combine into the following single expression for the Cost:
$$Cost(h_w(x), y) = -y\log(h_w(x)) - (1-y)\log(1 - h_w(x))$$
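The combined cross-entropy cost above can be sketched as follows. The `eps` clipping is an implementation detail added here (not from the derivation) to keep `log` finite when the sigmoid saturates to 0 or 1.

```python
import numpy as np

def cross_entropy_loss(w, X, y, eps=1e-12):
    """Mean cross-entropy: -(1/m) * sum(y*log(h) + (1-y)*log(1-h)).

    X: (m, n) samples (bias feature included), y: (m,) labels in {0, 1}.
    eps clips probabilities away from 0 and 1 to avoid log(0).
    """
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # h_w(x) for every sample
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

Note that for $w = 0$ every prediction is 0.5, so the loss is exactly $\log 2 \approx 0.693$, a useful sanity check when debugging.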
The optimization problem to solve is then:
$$\min_w \; loss(w) = \frac{1}{m}\sum_{i=1}^{m} Cost\!\left(h_w(x^{(i)}), y^{(i)}\right)$$
Using the sigmoid derivative $g'(z) = g(z)(1-g(z))$, the gradient of the Cost for a single sample is:
$$\frac{\partial}{\partial w_j} Cost(w) = -y\,\frac{1}{h_w(x)}\,h_w(x)(1-h_w(x))\,x_j - (1-y)\,\frac{-h_w(x)}{1-h_w(x)}\,(1-h_w(x))\,x_j$$
$$= (h_w(x) - y)\,x_j$$
$$\frac{\partial}{\partial w_j} loss(w) = \frac{1}{m}\sum_{i=1}^{m}\left(h_w(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$
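The gradient above and a plain batch gradient-descent loop can be sketched as follows. The learning rate and iteration count are illustrative choices, not values from the text.

```python
import numpy as np

def gradient(w, X, y):
    """d loss / d w_j = (1/m) * sum_i (h_w(x^(i)) - y^(i)) * x_j^(i), vectorized."""
    m = X.shape[0]
    p = 1.0 / (1.0 + np.exp(-(X @ w)))        # h_w(x) for every sample
    return X.T @ (p - y) / m

def gradient_descent(X, y, lr=0.1, n_iters=1000):
    """Minimize the logistic loss by repeatedly stepping against the gradient."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        w -= lr * gradient(w, X, y)
    return w
```

Because the cross-entropy loss is convex in $w$, this simple loop converges to the global minimum for any sufficiently small learning rate.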
The above outlines the basic idea of the logistic regression algorithm; a more detailed description is still to be added.
A NumPy-based implementation is available at: https://github.com/Alnlll/ML/tree/master/lgr