DataWhale

LOGISTIC REGRESSION

Linear regression & Logistic regression

  • The purpose of linear regression is to predict a continuous variable $y$ from input $x$.
  • Logistic regression uses input $x$ to classify samples into discrete categories.

The principle of logistic regression

Like linear regression, logistic regression also has a predictive function $h_\theta(x)$ (also called the hypothesis or classification function), built from a linear or non-linear combination of the inputs.

  • first, compute the linear score $\theta^T x$ from the predictive function

  • second, pass it through the sigmoid function $g(z)=\frac{1}{1+e^{-z}}$, giving $h_\theta(x)=g(\theta^T x)$, a value in $(0,1)$ read as the probability of the positive class

  • finally, build the loss function and optimize the parameters (a runnable sketch follows the optimization list below)

    loss function

    $J(\theta)=\frac{1}{N}\sum_{i=1}^{N}\mathrm{Cost}\left(h_\theta(x^{(i)}),\,y^{(i)}\right)$

    optimization

    1. batch gradient descent
      $\theta = \theta + \alpha\,\frac{1}{N}\,X^T\left(y - g(X\theta)\right)$
      (repeat until convergence)

    2. stochastic gradient descent
      for i = 1 to N:
      $\theta_j = \theta_j + \alpha\left(y^{(i)} - h_\theta(x^{(i)})\right)x_j^{(i)}$
      (repeat until convergence)
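
As a concrete illustration of the two update rules above, here is a minimal NumPy sketch of both optimizers (a sketch only; the names X, y, lr, n_iters, n_epochs are illustrative and not from the original post):

import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def batch_gradient_descent(X, y, lr=0.1, n_iters=1000):
    # batch rule: theta += lr * (1/N) * X^T (y - g(X theta))
    N, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        preds = sigmoid(X @ theta)           # h_theta(x) for all N samples at once
        theta += lr * X.T @ (y - preds) / N  # average gradient over the whole batch
    return theta

def stochastic_gradient_descent(X, y, lr=0.1, n_epochs=10):
    # stochastic rule: update theta after each individual sample
    N, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_epochs):
        for i in np.random.permutation(N):
            pred = sigmoid(X[i] @ theta)
            theta += lr * (y[i] - pred) * X[i]
    return theta

Batch descent uses the full dataset per step and converges smoothly; the stochastic variant trades noisier steps for much cheaper updates, which matters on large datasets.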

Regularization

$J(\theta)=-\frac{1}{N}\sum_{i=1}^{N}\left[y^{(i)}\log h_\theta(x^{(i)})+\left(1-y^{(i)}\right)\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2N}\sum_{j=1}^{n}\theta_j^2$

The first term is the cross-entropy (negative log-likelihood) cost; the second is an L2 penalty over the $n$ feature weights that shrinks large coefficients and helps prevent overfitting.
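
The regularized cost takes only a few lines to compute; below is a minimal sketch (eps is a hypothetical small constant added only to avoid log(0), and the penalty skips the intercept theta[0], following the usual convention):

import numpy as np

def regularized_cost(theta, X, y, lam, eps=1e-12):
    N = X.shape[0]
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))              # h_theta(x) for all samples
    ce = -np.mean(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps))
    penalty = (lam / (2 * N)) * np.sum(theta[1:] ** 2)  # L2 penalty, intercept excluded
    return ce + penalty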

Model evaluation index

  • receiver operating characteristic curve (ROC)
    horizontal axis: false positive rate (FPR)
    vertical axis: true positive rate (TPR)
    every point on the curve corresponds to a classification threshold

  • area under the curve (AUC)
    AUC is the area under the ROC curve; the closer it is to 1, the better the classifier ranks positive samples above negative ones
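
Both quantities are available in scikit-learn; a minimal sketch (the synthetic dataset here is purely illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # one (fpr, tpr) point per threshold
print("AUC:", roc_auc_score(y_test, scores))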


Advantages

  1. easy to compute, understand, and implement
  2. efficient in both time and memory
  3. robust to small amounts of noise in the data

Disadvantages

  1. prone to underfitting
  2. classification accuracy is often not high
  3. does not work well with a very large feature space

Sample imbalance issue

With imbalanced classes, the logistic regression model tends to ignore the features of the minority class.

  • solutions
  1. oversampling, undersampling and combined sampling
  2. weight adjustment
  3. kernel function correction
  4. model correction
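
Of these, weight adjustment is the easiest to try in scikit-learn through the class_weight parameter; a minimal sketch:

from sklearn.linear_model import LogisticRegression

# 'balanced' reweights classes inversely proportional to their frequency,
# so mistakes on the minority class cost more during training
clf = LogisticRegression(class_weight='balanced')

# an explicit mapping also works, e.g. make errors on class 1 ten times as costly
clf = LogisticRegression(class_weight={0: 1, 1: 10})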

sklearn parameters

sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False, n_jobs=1)
  1. regularization: penalty ('l1' or 'l2')
  2. optimization: solver ('liblinear', 'lbfgs', 'newton-cg', or 'sag')
  3. multi-class strategy: multi_class ('ovr' or 'multinomial')
  4. class weight: class_weight
  5. sample weight: sample_weight (an argument of fit(), not of the constructor)
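
A quick usage sketch tying these parameters together (the data here is illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
w = np.ones(100)  # per-sample weights, passed to fit()

clf = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=200)
clf.fit(X, y, sample_weight=w)
print(clf.coef_, clf.intercept_)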

