References
1. Li Hang, 《统计学习方法》 (Statistical Learning Methods)
2. GitHub: https://github.com/fengdu78/lihang-code
The logistic regression model and the maximum entropy model are both log-linear models; whether a model counts as linear depends on whether it is linear in its trainable parameters.
Logistic Regression Model
Logistic Distribution
Let $X$ be a continuous random variable. $X$ follows a logistic distribution if $X$ has the following distribution function and density function:
F(x)=P(X \leq x)=\frac{1}{1+e^{-(x-\mu)/\gamma}}
f(x)=F'(x)=\frac{e^{-(x-\mu)/\gamma}}{\gamma(1+e^{-(x-\mu)/\gamma})^2}
where $\mu$ is the location parameter and $\gamma>0$ is the shape parameter.
import matplotlib.pyplot as plt
import numpy as np

def draw_logistic_distribution(mu, gamma):
    # Plot the logistic CDF F(x) and PDF f(x) defined above
    x = np.arange(-10, 10, 0.01)
    cdf = 1.0 / (1 + np.exp(-(x - mu) / gamma))
    # Note the gamma factor in the denominator, per the density formula
    pdf = np.exp(-(x - mu) / gamma) / (gamma * (1 + np.exp(-(x - mu) / gamma)) ** 2)
    plt.figure(figsize=(7, 5))
    plt.plot(x, cdf, 'b-', label='Cumulative Distribution Function')
    plt.plot(x, pdf, 'r-', label='Probability Density Function')
    plt.xlabel('x')
    plt.ylabel('y')
    plt.legend(loc='upper left')
    plt.show()

draw_logistic_distribution(0, 1)
Logistic Regression Model
Binomial Logistic Regression Model
P(Y=1|x)=\frac{e^{w \cdot x + b}}{1+e^{w \cdot x + b}}
P(Y=0|x)=\frac{1}{1+e^{w \cdot x + b}}
where $x \in R^n$ is the input, $y \in \{0, 1\}$ is the output, and $w \in R^n$, $b \in R$ are the parameters.
For a given input $x$, compute $P(Y=1|x)$ and $P(Y=0|x)$ from the formulas above, compare the two conditional probabilities, and assign $x$ to the class with the larger probability.
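As a minimal sketch of this decision rule (the weights, bias, and input below are made-up values for illustration):

import numpy as np

w, b = np.array([1.0, -2.0]), 0.5      # hypothetical parameters
x = np.array([0.3, 0.1])               # hypothetical input

z = np.dot(w, x) + b
p1 = np.exp(z) / (1 + np.exp(z))       # P(Y=1|x)
p0 = 1 / (1 + np.exp(z))               # P(Y=0|x)
label = 1 if p1 > p0 else 0            # pick the class with the larger probability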
The odds of an event: the ratio of the probability that the event occurs to the probability that it does not.
If the probability of an event is $p$, its odds are $\frac{p}{1-p}$ (e.g., $p = 0.8$ gives odds $0.8/0.2 = 4$).
Log-odds (the logit function):
\mathrm{logit}(p)=\log\frac{p}{1-p}
For logistic regression,
\log \frac{P(Y=1|x)}{P(Y=0|x)} = \log e^{w \cdot x + b} = w \cdot x + b
That is, in the logistic regression model, the log-odds of the output $Y=1$ is a linear function of the input $x$.
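A quick numerical check (with the same kind of made-up parameters) that the log-odds recovers the linear function $w \cdot x + b$:

import numpy as np

w, b = np.array([1.0, -2.0]), 0.5      # hypothetical parameters
x = np.array([0.3, 0.1])

z = np.dot(w, x) + b
p1 = np.exp(z) / (1 + np.exp(z))
assert np.isclose(np.log(p1 / (1 - p1)), z)   # log-odds == w·x + b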
Model Parameter Estimation
Given a training set $T=\{(x_1, y_1),(x_2, y_2),\dots,(x_N, y_N)\}$, where $x_i \in R^n$ and $y_i \in \{0, 1\}$, the model parameters can be estimated by maximum likelihood.
Let $P(Y=1|x)=\pi(x)$ and $P(Y=0|x)=1-\pi(x)$. The likelihood function is:
\prod_{i=1}^{N}[\pi(x_i)]^{y_i}[1-\pi(x_i)]^{1-y_i}
The log-likelihood function is:
\begin{aligned} L(w)&=\sum_{i=1}^{N}\bigl[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\bigr]\\ &=\sum_{i=1}^{N}\Bigl[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\Bigr]\\ &=\sum_{i=1}^{N}\bigl[y_i(w \cdot x_i + b)-\log(1+\exp(w \cdot x_i + b))\bigr] \end{aligned}
Maximizing $L(w)$ thus becomes an optimization problem with the log-likelihood as the objective function.
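As a sketch, the final form of $L(w)$ can be evaluated directly with NumPy; `X`, `y`, `w`, and `b` here are illustrative placeholders:

import numpy as np

def log_likelihood(w, b, X, y):
    # L(w) = sum_i [ y_i (w·x_i + b) - log(1 + exp(w·x_i + b)) ]
    z = X @ w + b
    return np.sum(y * z - np.log(1 + np.exp(z)))

X = np.array([[0.5, 1.0], [1.5, -0.5], [-1.0, 0.3]])   # tiny made-up dataset
y = np.array([1, 0, 1])
print(log_likelihood(np.zeros(2), 0.0, X, y))          # -3*log(2) at w=0, b=0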
Multinomial Logistic Regression Model
P(Y=k|x)=\frac{\exp(w_k \cdot x)}{1+\sum_{j=1}^{K-1}\exp(w_j \cdot x)},\quad k=1,2,\dots,K-1
P(Y=K|x)=\frac{1}{1+\sum_{j=1}^{K-1}\exp(w_j \cdot x)}
x
∈
R
n
+
1
,
w
k
∈
R
n
+
1
x\in R^{n+1}, w_k \in R^{n+1}
x∈Rn+1,wk∈Rn+1
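A minimal sketch of these multinomial probabilities for $K = 3$; the weight vectors below are made up, and the leading 1 in the augmented input absorbs the bias:

import numpy as np

x = np.array([1.0, 0.3, 0.1])              # augmented input in R^{n+1}
W = np.array([[0.2, 1.0, -0.5],            # w_1, ..., w_{K-1}, each in R^{n+1}
              [-0.4, 0.7, 0.9]])

scores = np.exp(W @ x)                     # exp(w_k · x) for k = 1..K-1
denom = 1 + scores.sum()
p = np.append(scores / denom, 1 / denom)   # P(Y=k|x) for k = 1..K
assert np.isclose(p.sum(), 1.0)            # probabilities sum to 1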
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

def create_data():
    # Take the first 100 iris samples (classes 0 and 1) and the first two features
    iris = load_iris()
    df = pd.DataFrame(iris.data, columns=iris.feature_names)
    df['label'] = iris.target
    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
    data = np.array(df.iloc[:100, [0, 1, -1]])
    return data[:, :2], data[:, -1]

X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
Logistic Regression
g(z)=\frac{1}{1+e^{-z}},\quad g'(z)=g(z)(1-g(z))
f_w(x)=g(w^Tx)=\frac{1}{1+e^{-w^Tx}}
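A quick finite-difference check of the identity $g'(z)=g(z)(1-g(z))$, which is what makes the gradient below come out so cleanly:

import numpy as np

g = lambda z: 1 / (1 + np.exp(-z))

z, h = 0.7, 1e-6                           # arbitrary test point, small step
finite_diff = (g(z + h) - g(z - h)) / (2 * h)
assert np.isclose(finite_diff, g(z) * (1 - g(z)))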
The log-likelihood function is:
\begin{aligned} L(w)&=\sum_{i=1}^{N}\bigl[y_i\log\pi(x_i)+(1-y_i)\log(1-\pi(x_i))\bigr]\\ &=\sum_{i=1}^{N}\Bigl[y_i\log\frac{\pi(x_i)}{1-\pi(x_i)}+\log(1-\pi(x_i))\Bigr] \end{aligned}
Taking the partial derivative with respect to $w_j$ (writing $\pi(x_i)=f_w(x_i)$, and $x_i^{(j)}$ for the $j$-th component of $x_i$):
\begin{aligned} \frac{\partial L(w)}{\partial w_j} &= \frac{\partial}{\partial w_j}\sum_{i=1}^{N}\bigl[y_i\log f_w(x_i)+(1-y_i)\log(1-f_w(x_i))\bigr]\\ &=\sum_{i=1}^{N}\Bigl[y_i \frac{f_w'(x_i)}{f_w(x_i)} - \frac{1-y_i}{1-f_w(x_i)} f_w'(x_i)\Bigr]\\ &=\sum_{i=1}^{N}\Bigl(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\Bigr)f_w'(x_i)\\ &=\sum_{i=1}^{N}\Bigl(\frac{y_i}{f_w(x_i)}-\frac{1-y_i}{1-f_w(x_i)}\Bigr)f_w(x_i)(1-f_w(x_i))\frac{\partial w^Tx_i}{\partial w_j}\\ &=\sum_{i=1}^{N}\bigl(y_i(1-f_w(x_i))-(1-y_i)f_w(x_i)\bigr)x_i^{(j)}\\ &=\sum_{i=1}^{N}\bigl(y_i-f_w(x_i)\bigr)x_i^{(j)} \end{aligned}
Parameter update (stochastic gradient ascent, one sample at a time):
w_j := w_j + \alpha(y_i - f_w(x_i))x_i^{(j)}
class LogisticRegressionClassifier(object):
    def __init__(self, max_iter=200, learning_rate=0.01):
        self.max_iter = max_iter
        self.learning_rate = learning_rate

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def data_matrix(self, X):
        # Prepend a constant 1 to each sample so the bias is absorbed into the weights
        data_mat = []
        for d in X:
            data_mat.append([1.0, *d])
        return data_mat

    def fit(self, X, y):
        data_mat = self.data_matrix(X)
        self.weights = np.zeros((len(data_mat[0]), 1), dtype=np.float32)
        for iter_ in range(self.max_iter):
            for i in range(len(X)):
                result = self.sigmoid(np.dot(data_mat[i], self.weights))
                error = y[i] - result
                # stochastic gradient ascent: w := w + alpha * (y_i - f_w(x_i)) * x_i
                self.weights += self.learning_rate * error * np.transpose([data_mat[i]])
        print("LogisticRegression Model learning_rate={}, max_iter={}".format(
            self.learning_rate, self.max_iter))

    def score(self, X_test, y_test):
        right = 0
        X_test = self.data_matrix(X_test)
        for x, y in zip(X_test, y_test):
            # w^T x > 0 corresponds to P(Y=1|x) > 0.5
            result = np.dot(x, self.weights)
            if (result > 0 and y == 1) or (result < 0 and y == 0):
                right += 1
        return right / len(X_test)
lg_clf = LogisticRegressionClassifier()
lg_clf.fit(X_train, y_train)
lg_clf.score(X_test, y_test)
x_points = np.arange(4, 8)
# decision boundary: w0 + w1*x1 + w2*x2 = 0  =>  x2 = -(w1*x1 + w0)/w2
y_ = -(lg_clf.weights[1] * x_points + lg_clf.weights[0]) / lg_clf.weights[2]
plt.plot(x_points, y_)
plt.scatter(X[:50, 0], X[:50, 1], label='0')
plt.scatter(X[50:, 0], X[50:, 1], label='1')
plt.legend()
Using sklearn's built-in implementation
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression(max_iter=200)
clf.fit(X_train, y_train)
clf.score(X_test, y_test)

x_points = np.arange(4, 8)
# decision boundary from the fitted coefficients and intercept
y_ = -(clf.coef_[0][0] * x_points + clf.intercept_) / clf.coef_[0][1]
plt.plot(x_points, y_)
plt.plot(X[:50, 0], X[:50, 1], 'o', color='blue', label='0')
plt.plot(X[50:, 0], X[50:, 1], 'o', color='orange', label='1')
plt.xlabel('sepal length')
plt.ylabel('sepal width')
plt.legend()