Machine Learning Notes ---- Logistic Regression

This post walks through Logistic Regression for classification problems: the shortcomings of linear regression for classification, the Logistic Regression model, the sigmoid function, decision boundaries, the cost function, gradient descent and other optimization algorithms, multiclass classification, overfitting, and regularization strategies (in particular L2-regularized linear regression), providing a theoretical foundation for understanding and practice.

Logistic Regression

1. Problems of Linear Regression When Applied to Classification Problem

1) $h(x)$ may fall outside the range $[0, 1]$
2) a few unusual (outlier) feature values can shift the fitted line enough to misclassify examples

2. Logistic Regression Model

1) $h_{\theta}(x)=g(\theta^{T}x) = P(y=1 \mid x;\theta)$

where $g(z)=\frac{1}{1+e^{-z}}$ is called the Sigmoid Function / Logistic Function
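
A minimal NumPy sketch of the model above (the names `sigmoid` and `hypothesis` are illustrative; `X` is assumed to be an m×(n+1) design matrix whose first column is all ones):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, X):
    """h_theta(x) = g(theta^T x), computed for every row of X at once."""
    return sigmoid(X @ theta)
```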

3. Decision Boundary

y=1 → $h_{\theta}(x)>0.5$, i.e. $\theta^{T}x>0$
y=0 → $h_{\theta}(x)<0.5$, i.e. $\theta^{T}x<0$
decision boundary: $h_{\theta}(x)=0.5$, i.e. $\theta^{T}x=0$ (may be nonlinear when $x$ contains polynomial features)
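
Building on the sketch above, prediction is just thresholding (`predict` is an illustrative name; at the default threshold 0.5, $h_{\theta}(x)\geq 0.5$ is exactly $\theta^{T}x\geq 0$):

```python
def predict(theta, X, threshold=0.5):
    """Predict y = 1 where h_theta(x) >= threshold, else y = 0."""
    return (hypothesis(theta, X) >= threshold).astype(int)
```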

4. Cost Function

$Cost(h(x),y)=\begin{cases} -\log(h(x)), & y=1\\ -\log(1-h(x)), & y=0 \end{cases} = -y\log(h(x))-(1-y)\log(1-h(x))$
$J(\theta)=\frac{1}{m}\sum_{i=1}^{m}Cost(h(x^{(i)}),y^{(i)}) = \frac{1}{m}\left(-y^{T}\log(h)-(1-y)^{T}\log(1-h)\right)$
where $h=g(X\theta)$ (the second form is the vectorized version)
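
A vectorized sketch of $J(\theta)$, reusing the `sigmoid` helper above (the small `eps` guard is an implementation detail, not part of the formula):

```python
def cost(theta, X, y):
    """Vectorized J(theta) = (1/m)(-y^T log(h) - (1-y)^T log(1-h))."""
    m = len(y)
    h = sigmoid(X @ theta)
    eps = 1e-12  # numerical guard: avoids log(0) when h saturates
    return (-(y @ np.log(h + eps)) - ((1 - y) @ np.log(1 - h + eps))) / m
```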

5. Iteration Formula

$\theta_{j}:=\theta_{j}-\alpha\frac{1}{m}\sum_{i=1}^{m}(h(x^{(i)}) - y^{(i)})\,x_j^{(i)}$
vectorized formula:
$\theta:=\theta-\alpha\frac{1}{m}X^{T}(g(X\theta)-y)$
(the update rule is identical in form to linear regression, but here $h_{\theta}(x)$ is the sigmoid of $\theta^{T}x$)
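
A sketch of batch gradient descent using the vectorized update above (`alpha` and `iters` are illustrative defaults, not tuned values):

```python
def gradient_descent(X, y, alpha=0.1, iters=1000):
    """Repeat theta := theta - (alpha/m) * X^T (g(X theta) - y)."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        theta -= alpha / m * (X.T @ (sigmoid(X @ theta) - y))
    return theta
```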

6. Some Optimization Algorithms

Conjugate Gradient / BFGS / L-BFGS
These need no manual choice of the learning rate $\alpha$ and usually converge faster, but they are more complex.
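
In practice one rarely implements these by hand; a sketch that hands the `cost` and gradient defined above to SciPy's L-BFGS-B implementation (assuming `X` and `y` are already loaded):

```python
from scipy.optimize import minimize

def gradient(theta, X, y):
    """Gradient of J(theta): (1/m) X^T (g(X theta) - y)."""
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

res = minimize(cost, x0=np.zeros(X.shape[1]), args=(X, y),
               jac=gradient, method="L-BFGS-B")
theta_opt = res.x  # no learning rate alpha to pick
```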

7. Multiclass Classification: one-vs-all

Train a classifier $h_{\theta}^{(i)}(x)$ for each class $i$, treating class $i$ as positive and all other classes as negative.
When predicting, pick the class $i$ that maximizes $h_{\theta}^{(i)}(x)$.
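
A one-vs-all sketch on top of the `gradient_descent` helper above (assumes `y` holds integer class labels 0..num_classes-1):

```python
def train_one_vs_all(X, y, num_classes, alpha=0.1, iters=1000):
    """Train one binary classifier per class: class i vs. all the rest."""
    thetas = []
    for i in range(num_classes):
        y_i = (y == i).astype(float)     # relabel: 1 for class i, 0 otherwise
        thetas.append(gradient_descent(X, y_i, alpha, iters))
    return np.array(thetas)              # shape (num_classes, n+1)

def predict_one_vs_all(thetas, X):
    """Pick the class whose classifier reports the highest probability."""
    return np.argmax(sigmoid(X @ thetas.T), axis=1)
```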

8. Overfitting Problems

underfit: high bias, typically too few features
overfit: high variance, typically too many features; the model fits the training set well but fails to generalize to new examples


2 solutions:

  1. Reduce the number of features
  2. Regularization: keep all features, but reduce the magnitudes of the parameters $\theta_j$

9. Regularization

adding $\frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}$ to $J(\theta)$
Note that the sum does not include $\theta_{0}$!
$\lambda$: the regularization parameter; a larger $\lambda$ pushes the $\theta_{j}$ toward smaller values
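
A sketch of the regularized logistic cost, reusing the `cost` helper above; note that `theta[0]` is excluded from the penalty:

```python
def cost_regularized(theta, X, y, lam):
    """J(theta) + (lambda / 2m) * sum_{j>=1} theta_j^2 (theta_0 excluded)."""
    m = len(y)
    return cost(theta, X, y) + lam / (2 * m) * np.sum(theta[1:] ** 2)
```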

10. Regularized Linear Regression

(1) Linear Regression

$J(\theta) = \frac{1}{2m}\left(\sum_{i=1}^{m}(h(x^{(i)})-y^{(i)})^2+\lambda\sum_{j=1}^{n}\theta_{j}^{2}\right)$
Note that the regularization term does not include $\theta_{0}$!


$\theta_{j}:=\theta_{j}-\alpha\left(\frac{1}{m}\sum_{i=1}^{m}(h(x^{(i)}) - y^{(i)})\,x_j^{(i)}+\frac{\lambda}{m}\theta_{j}\right)$ for $j\neq 0$
which is also
$\theta_{j}:=\theta_{j}\left(1-\alpha\frac{\lambda}{m}\right)-\alpha\frac{1}{m}\sum_{i=1}^{m}(h(x^{(i)}) - y^{(i)})\,x_j^{(i)}$ for $j\neq 0$
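
A sketch of the regularized update for linear regression (here $h(x)=\theta^{T}x$, with no sigmoid; `alpha` and `iters` are illustrative):

```python
def gradient_descent_regularized(X, y, lam, alpha=0.01, iters=1000):
    """Each step shrinks theta_j by (1 - alpha*lambda/m), except theta_0."""
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # linear regression: h(x) = theta^T x
        grad[1:] += lam / m * theta[1:]    # do not regularize theta_0
        theta -= alpha * grad
    return theta
```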

(2) Normal Equation

$\theta=\left(X^{T}X+\lambda\,\mathrm{diag}(0,1,1,\dots,1,1)\right)^{-1}X^{T}y$, where the diag() matrix has size $(n+1)\times(n+1)$
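
The closed form translates directly to code; a sketch using `np.linalg.solve` instead of an explicit matrix inverse:

```python
def normal_equation_regularized(X, y, lam):
    """theta = (X^T X + lambda * diag(0,1,...,1))^{-1} X^T y."""
    n_plus_1 = X.shape[1]        # n+1 columns, including the bias column
    L = np.eye(n_plus_1)
    L[0, 0] = 0.0                # leave theta_0 unregularized
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)
```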
