Logistic Regression
- Logistic regression is essentially linear regression with one extra function mapping on top: when producing the output, the continuous value is squashed into the interval (0, 1), which enables classification.
The Sigmoid Function
$$g(z)=\frac{1}{1+e^{-z}}$$
$$g'(z)=\left(\frac{1}{1+e^{-z}}\right)'=\frac{e^{-z}}{(1+e^{-z})^2}=g(z) \cdot (1-g(z))$$
- The derivative of the sigmoid is largest at $z=0$, i.e. exactly where the prediction $g(z)=0.5$. This means that predictions near 0.5 receive the largest gradient, so training pushes them toward 0 or 1 fastest.
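A quick NumPy check of the identity $g'(z)=g(z)(1-g(z))$ against a central finite difference, confirming that the derivative peaks at $z=0$ (a minimal sketch; the helper name is ours):

```python
import numpy as np

def sigmoid(z):
    """g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.linspace(-6, 6, 13)
g = sigmoid(z)
analytic = g * (1 - g)                                 # g'(z) = g(z)(1 - g(z))
h = 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
print(np.allclose(analytic, numeric, atol=1e-8))       # True
print(z[np.argmax(analytic)])                          # 0.0 -- the derivative peaks at z = 0
```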
The hypothesis of logistic regression is given below; the linear regression hypothesis is just $\theta^Tx$:

$$h_\theta(x)=g(\theta^Tx)=\frac{1}{1+e^{-\theta^{T}x}}$$
Why Use the Sigmoid Function
First, if a probability distribution can be written in the form

$$p(y ; \eta)=b(y) \exp \left(\eta^{T} T(y)-a(\eta)\right)$$

then it is said to belong to the exponential family.
- Note: the Bernoulli, Gaussian, Poisson, Beta, and Dirichlet distributions are all members of the exponential family.
Logistic regression, i.e. adding a sigmoid mapping on top of the linear model, assumes a Bernoulli distribution, whose probability can be written as:

$$\begin{aligned} p(y ; \phi) &=\phi^{y}(1-\phi)^{1-y} \\ &=\exp (y \log \phi+(1-y) \log (1-\phi)) \\ &=\exp \left(\left(\log \left(\frac{\phi}{1-\phi}\right)\right) y+\log (1-\phi)\right) \end{aligned}$$
Here $\eta=\log(\frac{\phi}{1-\phi})$, and solving for $\phi$ gives $\phi=\frac{1}{1+e^{-\eta}}$.
This is why logistic regression uses the sigmoid function.
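A small numeric check (with hypothetical values) that $\eta=\log(\phi/(1-\phi))$ and $\phi=1/(1+e^{-\eta})$ are indeed inverses of each other:

```python
import numpy as np

phi = np.array([0.1, 0.3, 0.5, 0.9])    # Bernoulli parameters
eta = np.log(phi / (1 - phi))            # natural parameter eta = log(phi/(1-phi))
phi_back = 1.0 / (1.0 + np.exp(-eta))    # invert: phi = 1/(1+e^{-eta}), the sigmoid
print(np.allclose(phi, phi_back))        # True
```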
Logistic Regression Loss Functions
- Given that an event has already happened, ask what value of the unknown parameter makes that outcome most probable: reasoning backward from effect to cause.
- To make the estimate use all the data, multiply the probabilities of the individual observations and maximize the resulting product.
This is the method of maximum likelihood. Two loss functions are introduced here: one uses 0/1 as the class labels, the other uses 1/-1.
- First loss function ($y_i$ is the label)
  - Probability that a sample is positive: $P(y=1 | x)=\frac{\exp (w \cdot x)}{1+\exp (w \cdot x)}$
  - Probability that a sample is negative: $P(y=0 | x)=\frac{1}{1+\exp (w \cdot x)}$
  - Likelihood: $\prod_{i=1}^{N}\left[P\left(y=1 | x_{i}\right)\right]^{y_{i}}\left[P\left(y=0 | x_{i}\right)\right]^{1-y_{i}}$
  - Cost function: $\operatorname{cost}(w)=-\frac{1}{N} \sum_{i=1}^{N}\left[y_{i} \log P\left(y=1 | x_{i}\right)+\left(1-y_{i}\right) \log P\left(y=0 | x_{i}\right)\right]$
  - Solve for the parameter $w$ by gradient descent
- Second loss function (labels $y_i \in \{1,-1\}$)
  - Probability that a sample is positive: $P(y=1 | x)=\frac{\exp (w \cdot x)}{1+\exp (w \cdot x)}$
  - Probability that a sample is negative: $P(y=-1 | x)=\frac{1}{1+\exp (w \cdot x)}$
  - Likelihood: $\prod^{N}_{i=1} \frac{1}{1+ \exp (-y_{i}w \cdot x_{i})}$
  - Cost function: $\operatorname{cost}(w)=\frac{1}{N} \sum_{i=1}^{N} \log \left(1+\exp \left(-y_{i} w \cdot x_{i}\right)\right)$
  - Solve for the parameter $w$ by gradient descent (a NumPy sketch follows this list):
    - Objective: $Obj(w)=\frac{1}{N} \sum_{i=1}^{N} \log \left(1+\exp \left(-y_{i} w \cdot x_{i}\right)\right)+\lambda \frac{1}{2}\|w\|_{2}^{2}$
    - Gradient of the objective:
$$\begin{aligned} Obj_{w}^{\prime} &=\frac{1}{N} \sum_{i=1}^{N} \frac{\exp \left(-y_{i} w \cdot x_{i}\right)}{1+\exp \left(-y_{i} w \cdot x_{i}\right)}\left(-y_{i} x_{i}\right)+\lambda w \\ &=\frac{1}{N} \sum_{i=1}^{N}-y_{i}\left(1-\frac{1}{1+\exp \left(-y_{i} w \cdot x_{i}\right)}\right) x_{i}+\lambda w \end{aligned}$$
    - Weight update: $w = w - \gamma\, Obj'_{w}$
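A minimal NumPy sketch of this update rule for the ±1-label objective. The function name, learning rate, and toy data are our own choices for illustration, not from the original:

```python
import numpy as np

def fit_logreg_pm1(X, y, lam=0.01, lr=0.1, n_iter=1000):
    """Gradient descent on Obj(w) = (1/N) sum log(1+exp(-y_i w.x_i)) + lam/2 ||w||^2,
    for labels y_i in {-1, +1}. No intercept term, for simplicity."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        margins = y * (X @ w)                                # y_i * (w . x_i)
        s = 1.0 / (1.0 + np.exp(margins))                    # = 1 - 1/(1+exp(-y_i w.x_i))
        grad = (X * (-y * s)[:, None]).mean(axis=0) + lam * w
        w -= lr * grad                                       # w <- w - gamma * Obj'_w
    return w

# toy usage: two linearly separable clusters labelled -1 / +1
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])
w = fit_logreg_pm1(X, y)
print(np.mean(np.sign(X @ w) == y))   # training accuracy, close to 1.0
```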
Softmax for Multi-class Classification
- Softmax is the generalization of logistic regression to multiple classes.
Example
- The softmax formula (component $i$ of a score vector $z \in \mathbb{R}^n$):

$$f(z)_i=\frac{e^{z_i}}{\sum^n_{j=1}e^{z_j}}$$
- Given an input score vector, softmax maps it to a probability vector, and the sample is assigned to the class with the highest probability.
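A minimal sketch of this mapping; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result:

```python
import numpy as np

def softmax(z):
    """Map a score vector z to a probability vector."""
    e = np.exp(z - np.max(z))   # shift by max(z) to avoid overflow
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
p = softmax(z)
print(p, p.sum(), np.argmax(p))   # probabilities summing to 1; class 0 wins
```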
Theory
Suppose the target $y$ can take $k$ values. Let $\phi_{i}=p(y=i ; \phi)$ with $\sum_{i=1}^{k} \phi_{i}=1$, so that $p(y=k ; \phi)=1-\sum_{i=1}^{k-1} \phi_{i}$.
Writing this as a generalized linear model, with $T(y)$ the $(k-1)$-vector whose $i$-th component is the indicator $(T(y))_i = 1\{y=i\}$:
$$\begin{aligned} p(y ; \phi) &=\phi_{1}^{1\{y=1\}} \phi_{2}^{1\{y=2\}} \ldots \phi_{k}^{1\{y=k\}} \\ &=\phi_{1}^{1\{y=1\}} \phi_{2}^{1\{y=2\}} \ldots \phi_{k}^{1-\sum_{i=1}^{k-1} 1\{y=i\}} \\ &=\phi_{1}^{(T(y))_{1}} \phi_{2}^{(T(y))_{2}} \ldots \phi_{k}^{1-\sum_{i=1}^{k-1}(T(y))_{i}} \\ &=\exp \left((T(y))_{1} \log \left(\phi_{1}\right)+(T(y))_{2} \log \left(\phi_{2}\right)+\right.\\ &\left.\cdots+\left(1-\sum_{i=1}^{k-1}(T(y))_{i}\right) \log \left(\phi_{k}\right)\right) \\ &=\exp \left((T(y))_{1} \log \left(\phi_{1} / \phi_{k}\right)+(T(y))_{2} \log \left(\phi_{2} / \phi_{k}\right)+\right.\\ &\left.\cdots+(T(y))_{k-1} \log \left(\phi_{k-1} / \phi_{k}\right)+\log \left(\phi_{k}\right)\right) \end{aligned}$$
This yields:
$$\begin{aligned} \eta &=\left[\begin{array}{c} \log \left(\phi_{1} / \phi_{k}\right) \\ \log \left(\phi_{2} / \phi_{k}\right) \\ \vdots \\ \log \left(\phi_{k-1} / \phi_{k}\right) \end{array}\right] \\ a(\eta) &=-\log \left(\phi_{k}\right) \\ b(y) &=1 \end{aligned}$$
Finally we obtain:
$$p(y=i|x)=\phi_{i}=\frac{e^{\eta_{i}}}{\sum_{j=1}^{k} e^{\eta_{j}}}$$
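A quick numeric check (hypothetical values) that the link function derived above really inverts to the softmax: start from a class distribution $\phi$, compute $\eta_i=\log(\phi_i/\phi_k)$, and the softmax recovers $\phi$:

```python
import numpy as np

phi = np.array([0.2, 0.3, 0.5])              # a class distribution; phi_k = 0.5
eta = np.log(phi / phi[-1])                   # eta_i = log(phi_i / phi_k), so eta_k = 0
recovered = np.exp(eta) / np.exp(eta).sum()   # phi_i = e^{eta_i} / sum_j e^{eta_j}
print(np.allclose(recovered, phi))            # True
```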
Finally, the log-likelihood to maximize is:
$$\begin{aligned} \ell(\theta) &=\sum_{i=1}^{m} \log p\left(y^{(i)} | x^{(i)} ; \theta\right) \\ &=\sum_{i=1}^{m} \log \prod_{l=1}^{k}\left(\frac{e^{\theta_{l}^{T} x^{(i)}}}{\sum_{j=1}^{k} e^{\theta_{j}^{T} x^{(i)}}}\right)^{1\left\{y^{(i)}=l\right\}} \end{aligned}$$
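A direct numeric evaluation of this log-likelihood can serve as a sanity check. The sketch below assumes $\theta$ is stored as a $k \times d$ matrix (one row $\theta_l$ per class); the names and toy data are hypothetical:

```python
import numpy as np

def log_likelihood(theta, X, y):
    """ell(theta) = sum_i log p(y_i | x_i; theta) for softmax regression.
    theta: (k, d) weight matrix, X: (m, d), y: integer labels in {0..k-1}."""
    scores = X @ theta.T                                   # (m, k): theta_l . x_i
    scores -= scores.max(axis=1, keepdims=True)            # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    return log_probs[np.arange(len(y)), y].sum()           # pick log p(y_i | x_i)

# toy usage
rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))
y = np.array([0, 1, 2, 0, 1, 2])
theta = rng.normal(size=(3, 3))
print(log_likelihood(theta, X, y))
```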
Classification Evaluation Metrics
Confusion matrix
- Accuracy = correct predictions / all predictions = (TP+TN)/(TP+FN+FP+TN)
- Precision = TP/(TP+FP)
- Recall = TP/(TP+FN)
- The F1 score is the harmonic mean of precision and recall (see the sketch below):
F1 = 2 * (precision * recall) / (precision + recall)
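The following sketch checks these formulas against scikit-learn's metric functions on a small hypothetical label set:

```python
from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, f1_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print((tp + tn) / (tp + tn + fp + fn), accuracy_score(y_true, y_pred))  # accuracy
print(tp / (tp + fp), precision_score(y_true, y_pred))                  # precision
print(tp / (tp + fn), recall_score(y_true, y_pred))                     # recall
print(2 * tp / (2 * tp + fp + fn), f1_score(y_true, y_pred))            # F1
```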
Common metrics
PR Curve
A model scores each sample in a dataset with a probability of being positive. To output a hard 0 or 1, this probability must be thresholded. Different thresholds give different precision (true positives / samples the model calls positive) and recall (true positives / all actual positives); plotting precision against recall across all thresholds produces the PR curve.
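A minimal sketch of drawing a PR curve with scikit-learn; the labels and scores are hypothetical:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.45])  # predicted P(y=1)
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
plt.plot(recall, precision)   # one point per candidate threshold
plt.xlabel('recall')
plt.ylabel('precision')
plt.show()
```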
ROC Curve and AUC
ROC stands for Receiver Operating Characteristic. The area under the ROC curve is the AUC (Area Under the Curve), which measures the performance (generalization ability) of a binary classifier.
True positive rate (TPR): among all actual positives, the fraction predicted as positive:

$$TPR=\frac{TP}{TP+FN}$$
False positive rate (FPR): among all actual negatives, the fraction predicted as positive:

$$FPR=\frac{FP}{TN+FP}$$
Using sklearn
Logistic regression and softmax with scikit-learn:
# python3.7
# -*- coding: utf-8 -*-
#@Author : huinono
#@Software : PyCharm
import warnings
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import load_iris,load_breast_cancer
from sklearn.metrics import classification_report,roc_auc_score
from sklearn.metrics import precision_score,recall_score,f1_score
from sklearn.metrics import confusion_matrix,roc_curve,accuracy_score
from sklearn.model_selection import train_test_split,GridSearchCV
from sklearn.preprocessing import LabelEncoder,OneHotEncoder,StandardScaler

warnings.filterwarnings('ignore')
mpl.rcParams['font.sans-serif'] = 'SimHei'
mpl.rcParams['axes.unicode_minus'] = False
plt.rcParams['font.sans-serif'] = 'SimHei'
plt.rcParams['axes.unicode_minus'] = False

def LR_example():
    # A tiny hand-built dataset: the last column is the label.
    data = np.array([[1,1,0],[1,2,0],[0,0,1],[-1,0,1]])
    x = data[:,:-1]
    y = data[:,-1:]
    model = LogisticRegression()
    model.fit(x,y.ravel())
    y_ = model.predict(x)
    print(model.predict_proba(x))  # per-class probabilities for each sample
    w = model.coef_        # learned weights
    b = model.intercept_   # learned intercept

def Label_OneHot():
    iris_data = load_iris()
    x = iris_data['data']
    y = iris_data['target']
    # LabelEncoder maps class values to consecutive integers
    label = LabelEncoder()
    y_label = label.fit_transform(y)
    # OneHotEncoder maps integer classes to one-hot vectors
    onehot = OneHotEncoder()
    y_onehot = onehot.fit_transform(y.reshape(-1,1))
    print(y_onehot.toarray())

def Breast_cancer():
    breast_cancer_data = load_breast_cancer()
    x = breast_cancer_data['data']
    y = breast_cancer_data['target']
    '''
    Handling class imbalance: oversample the minority class until the
    class frequencies are equal or close.
    '''
    x = StandardScaler().fit_transform(x)  # standardize the features
    x = pd.DataFrame(x)
    y = pd.DataFrame(y)
    x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.33,random_state=42)
    '''
    solver: the optimization algorithm
    C: inverse of the regularization strength
    fit_intercept: whether to fit an intercept term
    max_iter: maximum number of iterations
    multi_class: 'multinomial' fits a softmax over classes, 'ovr' one-vs-rest
    '''
    model = LogisticRegression(solver="sag",C=1.5,fit_intercept=True,max_iter=500,multi_class='multinomial')
    model.fit(x_train,y_train.values.ravel())
    score = model.score(x_test,y_test)
    y_predict = model.predict(x_test)
    print(confusion_matrix(y_test,y_predict))
    print(classification_report(y_test,y_predict))
    print(roc_auc_score(y_test,y_predict))  # AUC (here from hard 0/1 predictions)
    fpr,tpr,th = roc_curve(y_test,y_predict)
    plt.plot(fpr,tpr)
    plt.show()

def softmax():
    iris_data = load_iris()
    x = iris_data['data']
    y = iris_data['target']
    x = StandardScaler().fit_transform(x)
    y = LabelEncoder().fit_transform(y)
    x_train,x_test,y_train,y_test = train_test_split(x,y,random_state=66,test_size=0.3)
    LR = LogisticRegression()
    # grid-search the regularization strength C
    model = GridSearchCV(LR,param_grid={'C':[1,10,20,50]})
    model.fit(x_train,y_train)
    print(model.score(x_test,y_test))
    print(model.best_params_)
    y_predict = model.predict(x_test)

if __name__ == '__main__':
    Breast_cancer()