Day two
LOGISTIC REGRESSION
Linear regression & Logistic regression
- The purpose of linear regression is to predict a continuous variable $y$ from input $x$.
- The purpose of logistic regression is to classify input $x$ into different categories.
The principle of logistic regression
Like linear regression, logistic regression also has a predictive function $h_\theta(x)$ (called the classification function; it can be linear or non-linear).
- first, compute the predictive function
- second, apply the sigmoid function
- finally, compute the loss function and optimize the parameters
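The three steps above can be sketched in a few lines. This is a minimal sketch in which $h_\theta$ is assumed to be linear; the names `sigmoid` and `h` are illustrative:

```python
import numpy as np

def sigmoid(z):
    # squashes any real number into the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # linear predictive function x.theta passed through the sigmoid,
    # interpreted as the probability that x belongs to the positive class
    return sigmoid(np.dot(x, theta))
```

With `theta` at zero, every sample gets probability 0.5, and training (below) moves `theta` so that the probabilities match the labels.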
loss function
$J(\theta)=\frac{1}{N}\sum_{i=1}^{N}\mathrm{Cost}\left(h_\theta(x^{(i)}),y^{(i)}\right)$
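The usual choice for $\mathrm{Cost}$ in logistic regression is the cross-entropy. A minimal sketch, with illustrative function names:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(h_x, y):
    # per-example cross-entropy cost: near 0 when the prediction
    # agrees with y, large when it disagrees
    return -(y * np.log(h_x) + (1 - y) * np.log(1 - h_x))

def J(theta, X, y):
    # average cost over all N samples
    return np.mean(cost(sigmoid(X @ theta), y))
```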
optimization
- batch gradient descent (repeat until convergence)
  $\theta=\theta+\alpha\cdot\frac{1}{N}X^{\top}\left(y-g(X\theta)\right)$
- stochastic gradient descent (repeat until convergence)
  for i = 1 to N:
  $\theta_j=\theta_j+\alpha\left(y_i-h_\theta(x^{(i)})\right)x_j^{(i)}$
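Both update rules can be sketched as follows, assuming a design matrix `X` whose first column is all ones (the intercept); `alpha`, `n_iter` and `n_epoch` are illustrative defaults:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def batch_gd(X, y, alpha=0.1, n_iter=1000):
    # batch: each step uses the average gradient over all N samples
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        theta += alpha * X.T @ (y - sigmoid(X @ theta)) / len(y)
    return theta

def sgd(X, y, alpha=0.1, n_epoch=100):
    # stochastic: theta is updated once per sample
    theta = np.zeros(X.shape[1])
    for _ in range(n_epoch):
        for i in range(len(y)):
            theta += alpha * (y[i] - sigmoid(X[i] @ theta)) * X[i]
    return theta
```

Batch descent gives a smoother path to the optimum; stochastic descent is noisier per step but much cheaper when N is large.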
Regularization
$J(\theta)=-\frac{1}{N}\sum_{i=1}^{N}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log\left(1-h_\theta(x^{(i)})\right)\right]+\frac{\lambda}{2N}\sum_{j=1}^{n}\theta_j^2$
(the leading minus sign makes $J$ a loss to be minimized, and the L2 penalty sums over the $n$ feature weights $\theta_j$)
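The regularized loss above can be sketched as follows; by convention the intercept $\theta_0$ is not penalized, and the function name is illustrative:

```python
import numpy as np

def J_reg(theta, X, y, lam):
    # cross-entropy loss plus an L2 penalty on the feature weights
    # (theta[0], the intercept, is left out of the penalty)
    h_x = 1.0 / (1.0 + np.exp(-(X @ theta)))
    loss = -np.mean(y * np.log(h_x) + (1 - y) * np.log(1 - h_x))
    penalty = lam / (2 * len(y)) * np.sum(theta[1:] ** 2)
    return loss + penalty
```

A larger `lam` shrinks the weights more strongly, which is how regularization counters overfitting.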
Model evaluation metrics
- receiver operating characteristic curve (ROC)
  - horizontal axis: false positive rate (FPR)
  - vertical axis: true positive rate (TPR)
  - every point on the curve corresponds to a classification threshold
- area under the curve (AUC)
  - the AUC is the area under the ROC curve; the closer it is to 1, the better the model ranks positives above negatives
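Both metrics are available in scikit-learn; a small sketch with made-up scores:

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])  # predicted probabilities

# one (fpr, tpr) point per threshold
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# single-number summary of the ROC curve
auc = roc_auc_score(y_true, y_score)
```

Plotting `tpr` against `fpr` draws the ROC curve itself; `auc` condenses it into one score.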
Advantages
- easy to compute, understand and implement
- efficient in time and memory
- robust to small amounts of noise in the data
Disadvantages
- underfitting happens easily
- classification accuracy is not high
- does not work well with a large feature space
Sample imbalance issue
When classes are imbalanced, the logistic regression model tends to ignore the minority class.
- solutions
- oversampling, undersampling and combined sampling
- weight adjustment
- kernel function correction
- model correction
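The weight-adjustment solution is built into scikit-learn via `class_weight`. A minimal sketch on a made-up imbalanced dataset (6 negatives, 2 positives):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[0.0], [0.2], [0.4], [0.6], [0.8], [1.0], [2.8], [3.0]])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])  # minority class: only 2 of 8 samples

# 'balanced' reweights each class inversely to its frequency,
# so the 2 minority samples count as much as the 6 majority ones
clf = LogisticRegression(class_weight='balanced', solver='liblinear').fit(X, y)
```

Oversampling and undersampling change the data instead of the weights, but aim at the same effect.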
sklearn parameters
```python
sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0,
    fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None,
    solver='liblinear', max_iter=100, multi_class='ovr', verbose=0, warm_start=False,
    n_jobs=1)
```
- regularization parameter: penalty ('l1' or 'l2')
- optimization: solver ('liblinear', 'lbfgs', 'newton-cg', 'sag')
- classification: multi_class ('ovr' or 'multinomial')
- class weight: class_weight
- sample weight: sample_weight (a parameter of fit, not of the constructor)
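A short usage sketch tying the parameters together, on the built-in iris dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# L2-regularized model; smaller C means stronger regularization
clf = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=200)
clf.fit(X, y)

acc = clf.score(X, y)  # mean accuracy on the training data
```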