# Logistic Regression

## Logistic Distribution

$$F(x) = P(X \le x) = \frac{1}{1 + e^{-(x-\mu)/\gamma}}$$

$$f(x) = F'(x) = \frac{e^{-(x-\mu)/\gamma}}{\gamma\left(1 + e^{-(x-\mu)/\gamma}\right)^2}$$

The plots of $f(x)$ and $F(x)$ are shown below. The distribution function is symmetric about the point $(\mu, \tfrac{1}{2})$; the smaller $\gamma$ is, the faster the curve changes near the center.
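As a quick sanity check of these formulas, a small NumPy sketch (taking $\mu = 0$ and leaving $\gamma$ as a parameter) can verify the symmetry of $F$ about $(\mu, \tfrac{1}{2})$ and that $f$ really is the derivative of $F$:

```python
import numpy as np

def logistic_cdf(x, mu=0.0, gamma=1.0):
    """F(x) = 1 / (1 + exp(-(x - mu) / gamma))"""
    return 1.0 / (1.0 + np.exp(-(x - mu) / gamma))

def logistic_pdf(x, mu=0.0, gamma=1.0):
    """f(x) = F'(x) = exp(-(x - mu)/gamma) / (gamma * (1 + exp(-(x - mu)/gamma))**2)"""
    e = np.exp(-(x - mu) / gamma)
    return e / (gamma * (1.0 + e) ** 2)

# Symmetry about (mu, 1/2): F(mu + t) + F(mu - t) == 1
t = 1.7
assert abs(logistic_cdf(t) + logistic_cdf(-t) - 1.0) < 1e-12

# f matches the numerical derivative of F
h = 1e-6
numeric = (logistic_cdf(t + h) - logistic_cdf(t - h)) / (2 * h)
assert abs(numeric - logistic_pdf(t)) < 1e-6
```

Note also that $f(\mu) = \frac{1}{4\gamma}$, so a smaller $\gamma$ indeed gives a steeper curve at the center.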

## Logistic Regression Model

$$P(Y=1 \mid x) = \frac{\exp(w \cdot x + b)}{1 + \exp(w \cdot x + b)}, \qquad P(Y=0 \mid x) = \frac{1}{1 + \exp(w \cdot x + b)}$$
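Note that $P(Y=1 \mid x)$ is just the sigmoid applied to the linear score $w \cdot x + b$, and the two probabilities sum to 1. A minimal sketch, with arbitrary example values for `w`, `x`, and `b`:

```python
import numpy as np

w = np.array([0.5, -1.2])   # example weights (arbitrary)
x = np.array([2.0, 1.0])    # example input (arbitrary)
b = 0.3

z = np.dot(w, x) + b
p1 = np.exp(z) / (1.0 + np.exp(z))   # P(Y=1|x)
p0 = 1.0 / (1.0 + np.exp(z))         # P(Y=0|x)

assert abs(p1 + p0 - 1.0) < 1e-12
# P(Y=1|x) is the same as sigmoid(z) = 1/(1+e^{-z})
assert abs(p1 - 1.0 / (1.0 + np.exp(-z))) < 1e-12
```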

### Parameter Estimation

Write $P(Y=1 \mid x) = \pi(x)$ and $P(Y=0 \mid x) = 1 - \pi(x)$. The likelihood of the observed data is then

$$\prod_{i=1}^{N}\left[\pi(x_i)\right]^{y_i}\left[1 - \pi(x_i)\right]^{1 - y_i}$$

The log-likelihood is

$$L(w) = \sum_{i=1}^{N}\left[y_i \log \pi(x_i) + (1 - y_i)\log\left(1 - \pi(x_i)\right)\right] = \sum_{i=1}^{N}\left[y_i \log\frac{\pi(x_i)}{1 - \pi(x_i)} + \log\left(1 - \pi(x_i)\right)\right]$$
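Taking the derivative of $L(w)$ and using $\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)$, the gradient takes a simple form:

$$\nabla_w L(w) = \sum_{i=1}^{N}\left(y_i - \pi(x_i)\right)x_i$$

Each training example contributes its input $x_i$ weighted by the prediction error $y_i - \pi(x_i)$; this is exactly the `error` term computed in the training code.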

## Determining the Regression Coefficients by Gradient Ascent

The sigmoid function used in logistic regression:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

The linear score is

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

or, in vector form,

$$y = w_0 + w^T x$$

The gradient-ascent update rule is

$$w := w + \alpha \nabla_w f(w)$$
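As a toy illustration of this update rule (not the regression code itself), gradient ascent on the concave function $f(w) = -(w-3)^2$, whose gradient is $f'(w) = -2(w-3)$, climbs to the maximizer $w = 3$:

```python
alpha = 0.1      # step size
w = 0.0          # initial guess
for _ in range(100):
    grad = -2.0 * (w - 3.0)   # gradient of f(w) = -(w - 3)^2
    w += alpha * grad          # gradient-ascent update: w := w + alpha * grad
# after enough steps, w has converged to the maximizer w = 3
assert abs(w - 3.0) < 1e-6
```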

### Training Algorithm

- Initialize every regression coefficient to 1
- Repeat N times:
  - Compute the gradient over the entire data set
  - Update the weight vector with $\alpha \cdot \nabla f(w)$
- Return the regression coefficients
The code below implements three trainers: batch gradient ascent, stochastic gradient ascent, and the improved stochastic version with a dynamic step size:

```python
#!/usr/bin/env python
# encoding:utf-8

import numpy
import time
import matplotlib.pyplot as plt


def sigmoid(x):
    return 1.0 / (1 + numpy.exp(-x))


def loadDataSet():
    dataMat = []
    laberMat = []
    with open("test.txt", 'r') as f:
        for line in f.readlines():
            arry = line.strip().split()
            dataMat.append([1.0, float(arry[0]), float(arry[1])])
            laberMat.append(float(arry[2]))
    return numpy.mat(dataMat), numpy.mat(laberMat).transpose()


def gradAscent(dataMat, laberMat, alpha=0.001, maxCycle=500):
    """batch gradient ascent: each step uses the whole data set"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(maxCycle):
        h = sigmoid(dataMat * weights)
        error = laberMat - h
        weights += alpha * dataMat.transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def stocGradAscent0(dataMat, laberMat, alpha=0.01):
    """stochastic gradient ascent: one sample per update, a single pass"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for i in range(m):
        h = sigmoid(dataMat[i] * weights)
        error = laberMat[i] - h
        weights += alpha * dataMat[i].transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def stocGradAscent1(dataMat, laberMat, numIter=150):
    """better one, use a dynamic alpha that shrinks as training proceeds"""
    start_time = time.time()
    m, n = numpy.shape(dataMat)
    weights = numpy.ones((n, 1))
    for j in range(numIter):
        for i in range(m):
            # 4/(1.0+j+i) avoids integer division; +0.01 keeps alpha above 0
            alpha = 4 / (1.0 + j + i) + 0.01
            h = sigmoid(dataMat[i] * weights)
            error = laberMat[i] - h
            weights += alpha * dataMat[i].transpose() * error
    duration = time.time() - start_time
    print("duration of time:", duration)
    return weights


def show(dataMat, laberMat, weights):
    m, n = numpy.shape(dataMat)
    min_x = dataMat[:, 1].min()
    max_x = dataMat[:, 1].max()
    xcoord1 = []; ycoord1 = []
    xcoord2 = []; ycoord2 = []
    for i in range(m):
        if int(laberMat[i, 0]) == 0:
            xcoord1.append(dataMat[i, 1]); ycoord1.append(dataMat[i, 2])
        elif int(laberMat[i, 0]) == 1:
            xcoord2.append(dataMat[i, 1]); ycoord2.append(dataMat[i, 2])
    fig = plt.figure()
    ax = fig.add_subplot(111)
    ax.scatter(xcoord1, ycoord1, s=30, c="red", marker="s")
    ax.scatter(xcoord2, ycoord2, s=30, c="green")
    x = numpy.arange(min_x, max_x, 0.1)
    # decision boundary: w0 + w1*x1 + w2*x2 = 0, solved for x2
    y = (-weights[0, 0] - weights[1, 0] * x) / weights[2, 0]
    ax.plot(x, y)
    plt.xlabel("x1"); plt.ylabel("x2")
    plt.show()


if __name__ == "__main__":
    dataMat, laberMat = loadDataSet()
    weights = stocGradAscent1(dataMat, laberMat)
    show(dataMat, laberMat, weights)
```

Optimizing $\alpha$ with a dynamically adjusted step size likewise reduces the number of iterations needed, and because the code adapts alpha over the course of training, the accuracy of the result improves.
