This post mainly presents an implementation of the Logistic regression algorithm. For the theory and derivation of Logistic regression, see textbooks such as Machine Learning, Machine Learning in Action, and Statistical Learning Methods, as well as articles like http://blog.csdn.net/dongtingzhizi/article/details/15962797
and http://blog.csdn.net/u011197534/article/details/53492915.
Language: Python. Dataset: the preprocessed horse colic dataset from Machine Learning in Action, used to predict mortality in horses from colic symptoms.
import numpy as np

def LoadData(filename):  # load the dataset from a whitespace-separated text file
    dataset = []
    with open(filename) as file_object:  # 'with' closes the file automatically
        file_lines = file_object.readlines()
        for line in file_lines:
            dataset.append(line.strip().split())
    return dataset
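As a sketch of how LoadData's output feeds the rest of the pipeline (the file name tiny.txt and its two rows are invented for illustration), the lists of strings can be stacked into a 2-D NumPy array whose last column is the class label:

```python
import numpy as np

# Illustrative sketch: tiny.txt and its contents are made up here.
# LoadData returns a list of string lists; np.array stacks them into
# a 2-D array of strings with the label in the last column.
def LoadData(filename):
    dataset = []
    with open(filename) as file_object:
        for line in file_object.readlines():
            dataset.append(line.strip().split())
    return dataset

with open("tiny.txt", "w") as f:
    f.write("1.0 2.0 1\n0.5 0.1 0\n")

dat = np.array(LoadData("tiny.txt"))
print(dat.shape)  # (2, 3): two samples, two feature columns plus the label
```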
def gradAscent(dat):  # batch gradient ascent
    dataset = dat[:, :-1]
    label = dat[:, -1]
    dataset = dataset.astype(np.float64)
    label = label.astype(np.float64)
    label.shape = (1, label.shape[0])  # a 1-D array must be reshaped before transposing
    label = label.T  # column vector
    m, n = np.shape(dataset)
    oneline = np.ones((m, 1))
    dataset = np.concatenate((dataset, oneline), axis=1)  # append the bias column
    w = np.ones((n + 1, 1))
    alpha = 0.01  # learning rate
    for num in range(100):
        h = Sigmod(np.dot(dataset, w))  # predicted probabilities
        err = label - h
        w = w + alpha * np.dot(dataset.T, err)  # update: w += alpha * X^T (y - h)
    return w
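To check the update rule, here is a minimal self-contained sketch of the same gradient-ascent loop on an invented, linearly separable toy dataset (the points and labels are made up; sigmoid stands in for Sigmod so the snippet runs on its own):

```python
import numpy as np

def sigmoid(z):  # stand-in for Sigmod so this snippet is self-contained
    return 1.0 / (1 + np.exp(-z))

# Invented, linearly separable toy data: two features, binary label.
X = np.array([[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.9, 0.8]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

Xb = np.concatenate((X, np.ones((X.shape[0], 1))), axis=1)  # bias column, as in gradAscent

w = np.ones((Xb.shape[1], 1))
alpha = 0.1
for _ in range(1000):
    err = y - sigmoid(np.dot(Xb, w))   # same rule: w += alpha * X^T (y - h)
    w = w + alpha * np.dot(Xb.T, err)

preds = (sigmoid(np.dot(Xb, w)) > 0.5).astype(int).ravel()
print(preds)  # the two classes should be separated
```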
def resultcalc(inX, w):  # predict for a single sample's feature vector
    return Sigmod(np.dot(np.array(inX), w[:-1]) + w[-1])  # bias is the last weight
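A quick sketch of a single-sample prediction under the convention used in gradAscent (the bias weight stored in the last row of w); the numbers in w and inX are invented:

```python
import numpy as np

def sigmoid(z):  # stand-in for Sigmod so this snippet is self-contained
    return 1.0 / (1 + np.exp(-z))

# Hypothetical trained weights: two feature weights plus the bias in the last row.
w = np.array([[2.0], [-1.0], [0.5]])
inX = [1.0, 1.0]  # invented single sample
p = sigmoid(np.dot(np.array(inX), w[:-1]) + w[-1])  # sigmoid(2 - 1 + 0.5)
print(p.item())  # sigmoid(1.5) ≈ 0.8176
```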
def Sigmod(z):  # sigmoid function
    return 1.0 / (1 + np.exp(-z))
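One caveat: computing 1/(1 + e^(-z)) directly can overflow for large negative z (with Python's math.e ** (-z) it raises OverflowError outright). A numerically stable variant, sketched here as stable_sigmoid (a name not in the original), splits on the sign of z so the exponent is always non-positive:

```python
import numpy as np

# Numerically stable sigmoid sketch: for z < 0, compute e^z / (1 + e^z)
# instead of 1 / (1 + e^(-z)), so np.exp never sees a large positive argument.
def stable_sigmoid(z):
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])          # safe: z < 0 here, so e^z <= 1
    out[~pos] = ez / (1.0 + ez)
    return out

vals = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(vals)  # [0.  0.5 1. ]
```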
def fliter(inLab):  # threshold probabilities into 0/1 labels
    label = inLab.copy()  # copy, so the caller's array is not modified
    for i in range(len(label)):
        if label[i] > 0.5:
            label[i] = 1
        else:
            label[i] = 0
    return label
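The same thresholding can be written without the element-by-element loop; this np.where sketch (probs is an invented sample of model outputs) is equivalent and likewise leaves the input untouched:

```python
import numpy as np

# Vectorized equivalent of the loop: np.where picks 1.0 where the
# probability exceeds 0.5 and 0.0 elsewhere, returning a new array.
probs = np.array([[0.9], [0.3], [0.51]])
labels = np.where(probs > 0.5, 1.0, 0.0)
print(labels.ravel())  # [1. 0. 1.]
```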
def accuratecalc(dat, w):  # compute classification accuracy
    dataset = dat[:, :-1]
    label = dat[:, -1]
    label.shape = (1, label.shape[0])  # a 1-D array must be reshaped before transposing
    label = label.T  # column vector
    m, n = np.shape(dataset)
    oneline = np.ones((m, 1))
    dataset = np.concatenate((dataset, oneline), axis=1)  # append the bias column
    dataset = dataset.astype(np.float64)
    label = label.astype(np.float64)
    h = np.dot(dataset, w)
    labelcalc = fliter(Sigmod(h))
    err = label - labelcalc
    rightcounter = np.count_nonzero(err)  # nonzero entries are misclassifications
    return 1 - rightcounter / (m * 1.0)
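To sanity-check the accuracy arithmetic (the label and pred arrays below are invented): the nonzero entries of (label - prediction) are exactly the misclassified rows, and the `* 1.0` guards against integer division under Python 2:

```python
import numpy as np

# Invented labels/predictions to check the accuracy formula:
# count_nonzero(label - pred) counts the misclassified samples.
label = np.array([[1.0], [0.0], [1.0], [1.0]])
pred = np.array([[1.0], [0.0], [0.0], [1.0]])
errors = np.count_nonzero(label - pred)
accuracy = 1 - errors / (label.shape[0] * 1.0)  # * 1.0 forces float division on Python 2
print(accuracy)  # 0.75
```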