First, the loss function of logistic regression is not the same as that of linear regression: it is not the squared-error function, because plugging the squared error into the logistic model yields a non-convex cost function with multiple local minima.
The derivation is as follows:
Hypothesis function:
h_\theta(x)=g(\theta^{T}x)=\frac{1}{1+e^{-\theta^{T}x}}
The sigmoid function and its derivative:
g(x)=\frac{1}{1+e^{-x}},\qquad g'(x)=g(x)[1-g(x)]
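The derivative identity above is easy to sanity-check numerically. The following sketch (plain NumPy; the function names are illustrative) compares the analytic derivative against a central-difference estimate:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    # Analytic derivative from the identity g'(x) = g(x)[1 - g(x)]
    g = sigmoid(x)
    return g * (1.0 - g)

x = np.linspace(-5.0, 5.0, 11)
eps = 1e-6
# Central-difference approximation of g'(x)
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
print(np.max(np.abs(numeric - sigmoid_grad(x))))  # tiny (finite-difference error only)
```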
Probability distribution:
\begin{cases} P(y=1|x;\theta)=h_\theta(x) \\ P(y=0|x;\theta)=1-h_\theta(x) \end{cases}
The two cases combine into a single expression:
P(y|x;\theta)=[h_\theta(x)]^{y}[1-h_\theta(x)]^{1-y}
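Since y only takes the values 0 and 1, the exponents simply select one of the two cases. A quick check, with h standing in for an arbitrary value of the hypothesis:

```python
h = 0.7  # an arbitrary illustrative value of h_theta(x)

# P(y|x;θ) = h^y (1-h)^(1-y) reduces to h when y=1 and to 1-h when y=0
p1 = h**1 * (1 - h)**(1 - 1)
p0 = h**0 * (1 - h)**(1 - 0)
print(p1, p0)  # 0.7 and (1 - 0.7)
```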
We estimate the parameters by maximum likelihood. The likelihood function is:
L(\theta)=\prod_{i=1}^{m}P(y^{(i)}|x^{(i)};\theta)=\prod_{i=1}^{m}[h_\theta(x^{(i)})]^{y^{(i)}}[1-h_\theta(x^{(i)})]^{1-y^{(i)}}
Taking the logarithm of the likelihood:
l(\theta)=\log L(\theta)=\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]
So to maximize the likelihood, we could apply gradient ascent to find the maximum of l(\theta).
In Andrew Ng's machine learning course, the loss function is defined as:
J(\theta)=-\frac{1}{m}l(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]
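As a sketch, J(θ) can be computed with vectorized NumPy; the data and names here are hypothetical. One useful sanity check: with θ = 0, h = 0.5 for every sample, so J(0) = log 2 regardless of the labels:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    # J(θ) = -(1/m) Σ [ y log h + (1-y) log(1-h) ]
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

# Hypothetical toy data: an intercept column plus two features
X = np.array([[1.0,  0.5,  1.2],
              [1.0, -1.0,  0.3],
              [1.0,  2.0, -0.7]])
y = np.array([1.0, 0.0, 1.0])
print(cost(np.zeros(3), X, y))  # log(2) ≈ 0.6931
```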
Maximizing the likelihood l(\theta) is therefore equivalent to minimizing the loss J(\theta), which we do with gradient descent.
So the per-sample error function Andrew Ng gives directly for logistic regression is:
Cost(h_\theta(x),y)=\begin{cases} -\log h_\theta(x), & y=1 \\ -\log(1-h_\theta(x)), & y=0 \end{cases}=-y\log h_\theta(x)-(1-y)\log(1-h_\theta(x))
Gradient derivation:
J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)})+(1-y^{(i)})\log(1-h_\theta(x^{(i)}))\right]
\frac{\partial}{\partial\theta_j}J(\theta)=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\frac{1}{h_\theta(x^{(i)})}\frac{\partial}{\partial\theta_j}h_\theta(x^{(i)})+(1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\frac{\partial}{\partial\theta_j}(1-h_\theta(x^{(i)}))\right]
=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\frac{1}{h_\theta(x^{(i)})}\frac{\partial}{\partial\theta_j}h_\theta(x^{(i)})-(1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\frac{\partial}{\partial\theta_j}h_\theta(x^{(i)})\right]
=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\frac{1}{h_\theta(x^{(i)})}-(1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\right]\frac{\partial}{\partial\theta_j}h_\theta(x^{(i)})
Substituting the sigmoid derivative, \frac{\partial}{\partial\theta_j}h_\theta(x^{(i)})=h_\theta(x^{(i)})[1-h_\theta(x^{(i)})]x_j^{(i)}:
=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\frac{1}{h_\theta(x^{(i)})}-(1-y^{(i)})\frac{1}{1-h_\theta(x^{(i)})}\right]h_\theta(x^{(i)})[1-h_\theta(x^{(i)})]x_j^{(i)}
=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}(1-h_\theta(x^{(i)}))-(1-y^{(i)})h_\theta(x^{(i)})\right]x_j^{(i)}
=-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}-h_\theta(x^{(i)})\right]x_j^{(i)}
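The closed-form gradient above can be verified against a finite-difference approximation of J(θ). This sketch uses hypothetical random data; the helper names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(theta, X, y):
    h = sigmoid(X @ theta)
    return -np.mean(y * np.log(h) + (1 - y) * np.log(1 - h))

def grad(theta, X, y):
    # ∂J/∂θ_j = -(1/m) Σ [y - h] x_j, computed for all j at once
    h = sigmoid(X @ theta)
    return -(X.T @ (y - h)) / len(y)

# Hypothetical data: 5 samples, intercept column plus two features
rng = np.random.default_rng(0)
X = np.hstack([np.ones((5, 1)), rng.normal(size=(5, 2))])
y = np.array([1.0, 0.0, 1.0, 1.0, 0.0])
theta = rng.normal(size=3)

# Central-difference estimate of each component of the gradient
eps = 1e-6
numeric = np.array([(cost(theta + eps * e, X, y) - cost(theta - eps * e, X, y)) / (2 * eps)
                    for e in np.eye(3)])
print(np.max(np.abs(numeric - grad(theta, X, y))))  # tiny (finite-difference error only)
```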
Gradient update:
\theta_j:=\theta_j-\alpha\frac{\partial}{\partial\theta_j}J(\theta)=\theta_j-\alpha\left(-\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}-h_\theta(x^{(i)})\right]x_j^{(i)}\right)
=\theta_j+\alpha\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}-h_\theta(x^{(i)})\right]x_j^{(i)}
1. Data preparation
testSet.txt:
-0.017612 14.053064 0
-1.395634 4.662541 1
-0.752157 6.538620 0
-1.322371 7.152853 0
0.423363 11.054677 0
0.406704 7.067335 1
0.667394 12.741452 0
-2.460150 6.866805 1
0.569411 9.548755 0
-0.026632 10.427743 0
0.850433 6.920334 1
1.347183 13.175500 0
1.176813 3.167020 1
-1.781871 9.097953 0
-0.566606 5.749003 1
0.931635 1.589505 1
-0.024205 6.151823 1
-0.036453 2.690988 1
-0.196949 0.444165 1
1.014459 5.754399 1
1.985298 3.230619 1
-1.693453 -0.557540 1
-0.576525 11.778922 0
-0.346811 -1.678730 1
-2.124484 2.672471 1
1.217916 9.597015 0
-0.733928 9.098687 0
-3.642001 -1.618087 1
0.315985 3.523953 1
1.416614 9.619232 0
-0.386323 3.989286 1
0.556921 8.294984 1
1.224863 11.587360 0
-1.347803 -2.406051 1
1.196604 4.951851 1
0.275221 9.543647 0
0.470575 9.332488 0
-1.889567 9.542662 0
-1.527893 12.150579 0
-1.185247 11.309318 0
-0.445678 3.297303 1
1.042222 6.105155 1
-0.618787 10.320986 0
1.152083 0.548467 1
0.828534 2.676045 1
-1.237728 10.549033 0
-0.683565 -2.166125 1
0.229456 5.921938 1
-0.959885 11.555336 0
0.492911 10.993324 0
0.184992 8.721488 0
-0.355715 10.325976 0
-0.397822 8.058397 0
0.824839 13.730343 0
1.507278 5.027866 1
0.099671 6.835839 1
-0.344008 10.717485 0
1.785928 7.718645 1
-0.918801 11.560217 0
-0.364009 4.747300 1
-0.841722 4.119083 1
0.490426 1.960539 1
-0.007194 9.075792 0
0.356107 12.447863 0
0.342578 12.281162 0
-0.810823 -1.466018 1
2.530777 6.476801 1
1.296683 11.607559 0
0.475487 12.040035 0
-0.783277 11.009725 0
0.074798 11.023650 0
-1.337472 0.468339 1
-0.102781 13.763651 0
-0.147324 2.874846 1
0.518389 9.887035 0
1.015399 7.571882 0
-1.658086 -0.027255 1
1.319944 2.171228 1
2.056216 5.019981 1
-0.851633 4.375691 1
-1.510047 6.061992 0
-1.076637 -3.181888 1
1.821096 10.283990 0
3.010150 8.401766 1
-1.099458 1.688274 1
-0.834872 -1.733869 1
-0.846637 3.849075 1
1.400102 12.628781 0
1.752842 5.468166 1
0.078557 0.059736 1
0.089392 -0.715300 1
1.825662 12.693808 0
0.197445 9.744638 0
0.126117 0.922311 1
-0.679797 1.220530 1
0.677983 2.556666 1
0.761349 10.693862 0
-2.168791 0.143632 1
1.388610 9.341997 0
0.317029 14.739025 0
Each row of testSet.txt holds x_1, x_2, and a 0/1 label. The model fitted below is:
h_\theta(x)=\theta_0+\theta_1x_1+\theta_2x_2,\qquad y=\frac{1}{1+e^{-h_\theta(x)}}
2. Gradient descent
Reference: https://blog.csdn.net/achuo/article/details/51160101
'''Load the data'''
import numpy as np

def loadDataSet():
    f = open('testSet.txt')
    data = []
    label = []
    for line in f.readlines():
        tmp = line.strip().split()
        # Prepend the constant feature 1 for the intercept term
        data.append((1.0, float(tmp[0]), float(tmp[1])))
        label.append(int(tmp[2]))
    # data: (100, 3)
    # label: (100,)
    return data, label

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

'''Gradient update (ascent on l(θ), equivalently descent on J(θ))'''
def gradAscent(data, label):
    # data: (100, 3), label: (100,)
    dataM = np.mat(data)                # (100, 3)
    labelM = np.mat(label).transpose()  # (100, 1)
    # m samples, n features (including the constant feature 1.0)
    m, n = np.shape(data)
    alpha = 0.001
    iters = 500
    weights = np.ones((n, 1))           # (3, 1)
    for i in range(iters):
        h = sigmoid(dataM * weights)    # (100, 3) * (3, 1) = (100, 1)
        error = labelM - h              # matches the gradient update derived above
        weights = weights + alpha * dataM.transpose() * error
    return weights
y=\frac{1}{1+e^{-\theta^{T}x}}=\frac{1}{1+e^{-(\theta_0+\theta_1x_1+\theta_2x_2)}}
Update formula:
\theta_j:=\theta_j-\alpha\sum_{i=1}^{m}(h_\theta(x^{(i)})-y^{(i)})\,x_j^{(i)}
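The same update can be written with plain NumPy arrays instead of np.mat. This is a sketch using the same α and iteration count as the code above, with hypothetical synthetic data standing in for testSet.txt:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_descent(X, y, alpha=0.001, iters=500):
    # θ := θ - α Σ (h_θ(x) - y) x, vectorized over all components of θ
    theta = np.ones(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ theta)
        theta -= alpha * (X.T @ (h - y))
    return theta

# Hypothetical synthetic data: labels determined by the sign of x1 + x2
rng = np.random.default_rng(1)
X = np.hstack([np.ones((100, 1)), rng.normal(size=(100, 2))])
y = (X[:, 1] + X[:, 2] > 0).astype(float)

theta = gradient_descent(X, y)
accuracy = np.mean((sigmoid(X @ theta) > 0.5) == y)
print(accuracy)  # high training accuracy on this separable data
```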