1.sigmoid函数
在逻辑回归中,我们引入了一个函数,sigmoid函数。
y
=
1
1
+
e
−
x
y=\frac{1}{1+e^{-x}}
y=1+e−x1
该函数有一个很好的特性就是在实轴域上y的取值在(0,1),且有很好的对称性,对极大值和极小值不敏感(因为在取向无论是正无穷还是负无穷的时候函数的y几乎很稳定)。由于sigmoid函数的值域在(0,1)之间,这正好可以表示一个概率值,令
P
(
Y
=
1
∣
X
)
=
1
1
+
e
−
θ
x
+
b
P(Y=1 |X)=\frac{1}{1+e^{-\theta{x}+b}}
P(Y=1∣X)=1+e−θx+b1其中,
θ
,
x
\theta ,x
θ,x为向量。
上式表示在给定x的值后,预测y=1的概率。所以有假设函数
h
θ
(
x
)
=
1
1
+
e
−
θ
x
+
b
h_{\theta}(x)=\frac{1}{1+e^{-\theta{x}+b}}
hθ(x)=1+e−θx+b1
当Y=1时,
P
(
Y
=
1
)
=
1
1
+
e
−
θ
x
+
b
=
h
θ
(
x
)
P(Y=1)=\frac{1}{1+e^{-\theta{x}+b}}=h_{\theta}(x)
P(Y=1)=1+e−θx+b1=hθ(x)
显然Y=0时,
P
(
Y
=
0
)
=
1
−
P
(
Y
=
1
)
=
1
−
h
θ
(
x
)
P(Y=0)=1-P(Y=1)=1-h_{\theta}(x)
P(Y=0)=1−P(Y=1)=1−hθ(x)
,将其整合为一个公式:
P
=
(
Y
=
y
)
=
y
h
θ
(
x
)
+
(
1
−
y
)
(
1
−
h
θ
(
x
)
)
,
(
y
∈
(
0
,
1
)
)
P=(Y=y)=yh_{\theta}(x)+(1-y)(1-h_{\theta}(x)),(y∈{(0,1}))
P=(Y=y)=yhθ(x)+(1−y)(1−hθ(x)),(y∈(0,1))
2.损失函数
1对数损失函数
这里引入对数似然函数:
L
(
Y
,
P
(
Y
∣
X
)
)
=
−
l
o
g
P
(
Y
∣
X
)
L(Y,P(Y|X))=-logP(Y|X)
L(Y,P(Y∣X))=−logP(Y∣X)
当Y=1时,损失函数为:
−
l
o
g
P
(
Y
=
1
∣
X
)
-logP(Y=1|X)
−logP(Y=1∣X)
当Y=0时,损失函数为:
−
l
o
g
(
1
−
P
(
Y
=
1
∣
X
)
)
-log(1-P(Y=1|X))
−log(1−P(Y=1∣X))
则将两个公式合并就得到了一个样本的损失函数:
L
(
Y
∣
X
)
=
−
y
l
o
g
P
(
Y
=
1
∣
X
)
−
(
1
−
y
)
l
o
g
(
1
−
P
(
Y
=
1
∣
X
)
)
=
−
(
y
l
o
g
(
h
θ
(
x
)
)
+
(
1
−
y
)
l
o
g
(
1
−
h
θ
(
x
)
)
)
L(Y|X)=-ylogP(Y=1|X)-(1-y)log(1-P(Y=1|X))=-(ylog(h_{\theta}(x))+(1-y)log(1-h_{\theta}(x)))
L(Y∣X)=−ylogP(Y=1∣X)−(1−y)log(1−P(Y=1∣X))=−(ylog(hθ(x))+(1−y)log(1−hθ(x)))
所以m个样本的损失函数为:
J
(
θ
)
=
−
1
m
∑
i
=
1
m
(
y
i
l
o
g
(
h
θ
(
x
i
)
)
+
(
1
−
y
i
)
l
o
g
(
1
−
h
θ
(
x
i
)
)
)
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}{(y_{i}log(h_{\theta}(x_{i}))+(1-y_{i})log(1-h_{\theta}(x_{i})))}
J(θ)=−m1i=1∑m(yilog(hθ(xi))+(1−yi)log(1−hθ(xi)))
2.最大似然函数
在这里我们将y的概率公式合并为
P
(
Y
∣
X
)
=
(
h
θ
(
x
i
)
)
y
i
(
1
−
h
θ
(
x
i
)
)
(
1
−
y
i
)
P(Y |X)=(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i}))^{(1-y_{i})}
P(Y∣X)=(hθ(xi))yi(1−hθ(xi))(1−yi)
由于每个样本的概率密度函数是一样的,所以对于每一个样本来说都是同分布的,所以我们构造出这m个样本的最大似然函数:
L
(
θ
∣
(
(
x
1
,
y
1
)
,
(
x
2
,
y
2
)
.
.
.
(
x
m
,
y
m
=
π
i
=
1
m
(
h
θ
(
x
i
)
)
y
i
(
1
−
h
θ
(
x
i
)
(
1
−
y
i
)
L(\theta | ((x_{1},y_{1}),(x_{2},y_{2})...(x_{m},y_{m}=\pi_{i=1}^{m}{(h_{\theta}(x_{i}))^{y_{i}}(1-h_{\theta}(x_{i})^{(1-y_{i})}}
L(θ∣((x1,y1),(x2,y2)...(xm,ym=πi=1m(hθ(xi))yi(1−hθ(xi)(1−yi)
对似然函数取对数的得到对数似然函数:
l
o
g
L
(
θ
)
=
∑
i
=
1
m
(
y
i
l
o
g
(
h
θ
(
x
i
)
)
+
(
1
−
y
i
)
l
o
g
(
1
−
h
θ
(
x
i
)
)
)
logL(\theta)=\sum_{i=1}^{m}{(y_{i}log(h_{\theta}(x_{i}))+(1-y_{i})log(1-h_{\theta}(x_{i})))}
logL(θ)=i=1∑m(yilog(hθ(xi))+(1−yi)log(1−hθ(xi)))
似然函数是要求似然函数的最大值,而我们引入代价函数
J
(
θ
)
=
−
1
m
l
o
g
L
(
θ
)
J(\theta)=-\frac{1}{m}logL(\theta)
J(θ)=−m1logL(θ)来用梯度下降法来求最小值。
J
(
θ
)
=
−
1
m
∑
i
=
1
m
(
y
i
l
o
g
(
h
θ
(
x
i
)
)
+
(
1
−
y
i
)
l
o
g
(
1
−
h
θ
(
x
i
)
)
)
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}{(y_{i}log(h_{\theta}(x_{i}))+(1-y_{i})log(1-h_{\theta}(x_{i})))}
J(θ)=−m1i=1∑m(yilog(hθ(xi))+(1−yi)log(1−hθ(xi)))
3.对 θ \theta θ求偏导
∂ θ J ( θ ) = − 1 m ∑ i = 1 m ( y i 1 h θ ( x i ) ∂ θ h θ ( x i ) − ( 1 − y i ) ( 1 1 − h θ ( x ) ) ( ∂ θ h θ ( x i ) ) ) \frac{\partial}{\theta}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}({y_{i}\frac{1}{h_{\theta}(x_{i})}\frac{\partial}{\theta}h_{\theta}(x_{i})}-(1-y_{i})(\frac{1}{1-h_{\theta}(x)})(\frac{\partial}{\theta}h_{\theta}(x_{i}))) θ∂J(θ)=−m1∑i=1m(yihθ(xi)1θ∂hθ(xi)−(1−yi)(1−hθ(x)1)(θ∂hθ(xi)))
= − 1 m ∑ i = 1 m ( y i 1 h θ ( x i ) − ( 1 − y i ) ( 1 1 − h θ ( x i ) ) ∂ θ h θ ( x i ) =-\frac{1}{m}\sum_{i=1}^{m}(y_{i}\frac{1}{h_{\theta}(x_{i})}-(1-y_{i})(\frac{1}{1-h_{\theta}(x_{i})}) \frac{\partial}{\theta}h_{\theta}(x_{i}) =−m1∑i=1m(yihθ(xi)1−(1−yi)(1−hθ(xi)1)θ∂hθ(xi)
sigmoid函数求导:
g ( z ) = 1 1 + e − z g(z)=\frac{1}{1+e^{-z}} g(z)=1+e−z1
g ′ ( z ) = e − z ( 1 + e − z ) 2 g^{'}(z)=\frac{e^{-z}}{(1+e^{-z})^{2}} g′(z)=(1+e−z)2e−z
= 1 1 + e − z e − z 1 + e − z =\frac{1}{1+e^{-z}}\frac{e^{-z}}{1+e^{-z}} =1+e−z11+e−ze−z
= 1 1 + e − z ( 1 + e − z 1 + e − z − 1 1 + e − z ) =\frac{1}{1+e^{-z}}(\frac{1+e^{-z}}{1+e^{-z}}-\frac{1}{1+e^{-z}}) =1+e−z1(1+e−z1+e−z−1+e−z1)
= g ( z ) ( 1 − g ( z ) ) =g(z)(1-g(z)) =g(z)(1−g(z))
由于 h θ ( x ) = g ( θ x ) = 1 1 + e − θ x h_{\theta}(x)=g(\theta{x})=\frac{1}{1+e^{-\theta x}} hθ(x)=g(θx)=1+e−θx1
所以: ∂ θ h θ ( x ) = h θ ( x ) ( 1 − h θ ( x ) ) x \frac{\partial}{\theta}h_{\theta}(x)=h_{\theta}({x})(1-h_{\theta}(x))x θ∂hθ(x)=hθ(x)(1−hθ(x))x
∂ θ J ( θ ) \frac{\partial}{\theta}J(\theta) θ∂J(θ)
= − 1 m ∑ i = 1 m ( y i 1 h θ ( x i ) − ( 1 − y i ) ( 1 1 − h θ ( x i ) ) ( h θ ( x i ) ( 1 − h θ ( x i ) ) x i =-\frac{1}{m}\sum_{i=1}^{m}(y_{i}\frac{1}{h_{\theta}(x_{i})}-(1-y_{i})(\frac{1}{1-h_{\theta}(x_{i})})(h_{\theta}({x_{i}})(1-h_{\theta}(x_{i}))x_{i} =−m1∑i=1m(yihθ(xi)1−(1−yi)(1−hθ(xi)1)(hθ(xi)(1−hθ(xi))xi
= − 1 m ∑ i = 1 m ( y i ( 1 − h θ ( x i ) ) − ( 1 − y i ) h θ ( x i ) ) x i =-\frac{1}{m}\sum_{i=1}^{m}(y_{i}(1-h_{\theta}(x_{i}))-(1-y_{i})h_{\theta}(x_{i}))x_{i} =−m1∑i=1m(yi(1−hθ(xi))−(1−yi)hθ(xi))xi
= − 1 m ∑ i = 1 m ( y i − y i h θ ( x i ) − h θ ( x i ) + y i h θ ( x i ) ) x i =-\frac{1}{m}\sum_{i=1}^{m}(y_{i}-y_{i}h_{\theta}(x_{i})-h_{\theta}(x_{i})+y_{i}h_{\theta}(x_{i}))x_{i} =−m1∑i=1m(yi−yihθ(xi)−hθ(xi)+yihθ(xi))xi
= 1 m ∑ i = 1 m ( h θ ( x i ) − y i ) x i =\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})x_{i} =m1∑i=1m(hθ(xi)−yi)xi
最终的结果与线性回归求得的结果一样
4.代码演示
import numpy as np
import matplotlib.pyplot as plt
```python
x=np.array([0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,
2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50])
x=x.reshape(-1,1)
y=np.array([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
x=np.concatenate((np.ones((x.shape[0],1)),x),axis=1)#添加偏置项
y=y.reshape(-1,1)
y.shape
(20, 1)
plt.scatter(x[:,1],y)
plt.xlabel('Hours')
plt.ylabel('pass')
plt.title('exam-data')
plt.show()
J
(
θ
)
=
−
1
m
∑
i
=
1
m
(
y
i
l
o
g
(
h
θ
(
x
i
)
)
+
(
1
−
y
i
)
l
o
g
(
1
−
h
θ
(
x
i
)
)
)
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}{(y_{i}log(h_{\theta}(x_{i}))+(1-y_{i})log(1-h_{\theta}(x_{i})))}
J(θ)=−m1i=1∑m(yilog(hθ(xi))+(1−yi)log(1−hθ(xi)))
∂
θ
J
(
θ
)
=
1
m
∑
i
=
1
m
(
h
θ
(
x
i
)
−
y
i
)
x
i
\frac{\partial}{\theta}J(\theta)=\frac{1}{m}\sum_{i=1}^{m}(h_{\theta}(x_{i})-y_{i})x_{i}
θ∂J(θ)=m1∑i=1m(hθ(xi)−yi)xi
def sigmoid(x):#sigmoid函数
return 1.0/(1+np.exp(-x))
def cost(x,y,theta):#代价函数
x=np.matrix(x)
y=np.matrix(y)
theta=np.matrix(theta)
first=np.multiply(y,np.log(sigmoid(x*theta)))
second=np.multiply(1-y,np.log(1-sigmoid(x*theta)))
return np.sum(first+second)/(-len(x))
def grad(x,y,theta,epochs=1000,lr=0.001):#进行梯度下降
x=np.matrix(x)
y=np.matrix(y)
theta=np.matrix(theta)
#print(x.shape,' ',theta.shape)
m=x.shape[0]
costList=[]
for i in range(epochs+1):
#print(x.shape,' ',theta.shape)
h=sigmoid(x*theta)
delta=x.T*(h-y)/m
theta=theta-lr*delta
if(i%50==0):
costList.append(cost(x,y,theta))#计算损失值
return theta,costList
theta=np.ones((x.shape[1],1))
#print(theta.shape)
theta,costList=grad(x,y,theta,3000,0.3)
a=np.linspace(0,3000,61)#生成61个数
plt.plot(a,costList,c='y')
plt.show()
from sklearn.linear_model import LogisticRegression
x=np.array([0.50,0.75,1.00,1.25,1.50,1.75,1.75,2.00,2.25,2.50,
2.75,3.00,3.25,3.50,4.00,4.25,4.50,4.75,5.00,5.50])
x=x.reshape(-1,1)
y=np.array([0,0,0,0,0,0,1,0,1,0,1,0,1,0,1,1,1,1,1,1])
y=y.reshape(-1,1)
model=LogisticRegression()
model.fit(x,y)
b=model.intercept_
a=model.coef_
print(a,b)
print(theta)
[[1.14860386]] [-3.13952411]
[[-4.07770898]
[ 1.50464392]]
D:\IDEA\environment\py38\lib\site-packages\sklearn\utils\validation.py:985: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
y = column_or_1d(y, warn=True)
from sklearn.metrics import classification_report
def predect(x,theta):
x=np.matrix(x)
theta=np.matrix(theta)
return [1 if i>0.5 else 0 for i in (sigmoid(x*theta))]
x2=np.concatenate((np.ones((x.shape[0],1)),x),axis=1)#添加偏置项
prediction=predect(x2,theta)
print(classification_report(y,prediction))
precision recall f1-score support
0 0.80 0.80 0.80 10
1 0.80 0.80 0.80 10
accuracy 0.80 20
macro avg 0.80 0.80 0.80 20
weighted avg 0.80 0.80 0.80 20
可以看出正确率有80%
5.(补充)reshape函数
①numpy.arange(n).reshape(a, b) 依次生成n个自然数,并且以a行b列的数组形式显示
②mat (or array).reshape(c, -1) 必须是矩阵格式或者数组格式,才能使用 .reshape(c, -1) 函数, 表示将此矩阵或者数组重组,以 c行d列的形式表示
③reshape(1,-1)转化成1行
reshape(2,-1)转换成两行
reshape(-1,1)转换成1列
reshape(-1,2)转化成两列
>>> import numpy as np
>>> np.arange(16).reshape(2,8)#生成16个数,以2行8列形式显示
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15]])
>>> a=np.arange(16).reshape(2,8)#生成16个数,以2行8列形式显示
>>> a.shape
(2, 8)
>>> a.reshape(4,-1)#改变为m行,d列(-1表示列数自动计算,d=a*b/m)
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]])
>>> a.reshape(-1,2)#改变为d行,m列(-1表示行数自动计算,d=a*b/m)
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]])
>>> np.array(1,12,2)#(a,b.c) 从数字a起,步长为c,到b结束
>>> np.arange(1,12,2)#(a,b.c) 从数字a起数,步长为c,到b结束
array([ 1, 3, 5, 7, 9, 11])
>>> a.reshape(1,-1)#1行
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]])
>>> a.reshape(2,-1)#2行
array([[ 0, 1, 2, 3, 4, 5, 6, 7],
[ 8, 9, 10, 11, 12, 13, 14, 15]])
>>> a.reshape(-1,1)#一列
array([[ 0],
[ 1],
[ 2],
[ 3],
[ 4],
[ 5],
[ 6],
[ 7],
[ 8],
[ 9],
[10],
[11],
[12],
[13],
[14],
[15]])
>>> a.reshape(-1,2)#2列
array([[ 0, 1],
[ 2, 3],
[ 4, 5],
[ 6, 7],
[ 8, 9],
[10, 11],
[12, 13],
[14, 15]])
>>>