This post mainly presents an implementation of the Logistic regression algorithm. For the theory and derivation of Logistic regression, see textbooks such as Machine Learning, Machine Learning in Action, and Statistical Learning Methods, as well as articles like http://blog.csdn.net/dongtingzhizi/article/details/15962797
and http://blog.csdn.net/u011197534/article/details/53492915.
Language: Python. Dataset: the preprocessed horse colic dataset from Machine Learning in Action, used to predict mortality in horses from colic symptoms.
import numpy as np

def LoadData(filename):  # load the dataset from a whitespace-separated text file
    dataset = []
    with open(filename) as file_object:  # 'with' closes the file automatically
        file_lines = file_object.readlines()
        for line in file_lines:
            dataset.append(line.strip().split())
    return dataset
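As a sketch of how LoadData's output feeds the rest of the pipeline (the file name tiny.txt and its two rows are invented for illustration), the lists of strings can be stacked into a 2-D NumPy array whose last column is the class label:

```python
import numpy as np

# Illustrative sketch: tiny.txt and its contents are made up here.
# LoadData returns a list of string lists; np.array stacks them into
# a 2-D array of strings with the label in the last column.
def LoadData(filename):
    dataset = []
    with open(filename) as file_object:
        for line in file_object.readlines():
            dataset.append(line.strip().split())
    return dataset

with open("tiny.txt", "w") as f:
    f.write("1.0 2.0 1\n0.5 0.1 0\n")

dat = np.array(LoadData("tiny.txt"))
print(dat.shape)  # (2, 3): two samples, two feature columns plus the label
```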
def gradAscent(dat):  # batch gradient ascent
    dataset = dat[:, :-1]
    label = dat[:, -1]
    dataset = dataset.astype(np.float64)
    label = label.astype(np.float64)
    label.shape = (1, label.shape[0])  # a 1-D array must be reshaped before transposing
    label = label.T  # column vector
    m, n = np.shape(dataset)
    oneline = np.ones((m, 1))
    dataset = np.concatenate((dataset, oneline), axis=1)  # append the bias column
    w = np.ones((n + 1, 1))
    alpha = 0.01  # learning rate
    for num in range(100):
        h = Sigmod(np.dot(dataset, w))  # predicted probabilities
        err = label - h
        w = w + alpha * np.dot(dataset.T, err)  # update: w += alpha * X^T (y - h)
    return w
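To check the update rule, here is a minimal self-contained sketch of the same gradient-ascent loop on an invented, linearly separable toy dataset (the points and labels are made up; sigmoid stands in for Sigmod so the snippet runs on its own):

```python
import numpy as np

def sigmoid(z):  # stand-in for Sigmod so this snippet is self-contained
    return 1.0 / (1 + np.exp(-z))

# Invented, linearly separable toy data: two features, binary label.
X = np.array([[0.0, 0.0], [0.2, 0.3], [1.0, 1.0], [0.9, 0.8]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])

Xb = np.concatenate((X, np.ones((X.shape[0], 1))), axis=1)  # bias column, as in gradAscent

w = np.ones((Xb.shape[1], 1))
alpha = 0.1
for _ in range(1000):
    err = y - sigmoid(np.dot(Xb, w))   # same rule: w += alpha * X^T (y - h)
    w = w + alpha * np.dot(Xb.T, err)

preds = (sigmoid(np.dot(Xb, w)) > 0.5).astype(int).ravel()
print(preds)  # the two classes should be separated
```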
def resultcalc(inX, w):  # predict for a single sample's feature vector
    return Sigmod(np.dot(np.array(inX), w[:-1]) + w[-1])  # bias is the last weight
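A quick sketch of a single-sample prediction under the convention used in gradAscent (the bias weight stored in the last row of w); the numbers in w and inX are invented:

```python
import numpy as np

def sigmoid(z):  # stand-in for Sigmod so this snippet is self-contained
    return 1.0 / (1 + np.exp(-z))

# Hypothetical trained weights: two feature weights plus the bias in the last row.
w = np.array([[2.0], [-1.0], [0.5]])
inX = [1.0, 1.0]  # invented single sample
p = sigmoid(np.dot(np.array(inX), w[:-1]) + w[-1])  # sigmoid(2 - 1 + 0.5)
print(p.item())  # sigmoid(1.5) ≈ 0.8176
```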
def Sigmod(z):  # sigmoid function
    return 1.0 / (1 + np.exp(-z))
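One caveat: computing 1/(1 + e^(-z)) directly can overflow for large negative z (with Python's math.e ** (-z) it raises OverflowError outright). A numerically stable variant, sketched here as stable_sigmoid (a name not in the original), splits on the sign of z so the exponent is always non-positive:

```python
import numpy as np

# Numerically stable sigmoid sketch: for z < 0, compute e^z / (1 + e^z)
# instead of 1 / (1 + e^(-z)), so np.exp never sees a large positive argument.
def stable_sigmoid(z):
    z = np.asarray(z, dtype=np.float64)
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    ez = np.exp(z[~pos])          # safe: z < 0 here, so e^z <= 1
    out[~pos] = ez / (1.0 + ez)
    return out

vals = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))
print(vals)  # [0.  0.5 1. ]
```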
def fliter(inLab):  # threshold probabilities into 0/1 labels
    label = inLab.copy()  # copy, so the caller's array is not modified
    for i in range(len(label)):
        if label[i] > 0.5:
            label[i] = 1
        else:
            label[i] = 0
    return label
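The same thresholding can be written without the element-by-element loop; this np.where sketch (probs is an invented sample of model outputs) is equivalent and likewise leaves the input untouched:

```python
import numpy as np

# Vectorized equivalent of the loop: np.where picks 1.0 where the
# probability exceeds 0.5 and 0.0 elsewhere, returning a new array.
probs = np.array([[0.9], [0.3], [0.51]])
labels = np.where(probs > 0.5, 1.0, 0.0)
print(labels.ravel())  # [1. 0. 1.]
```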
def accuratecalc(dat, w):  # compute classification accuracy
    dataset = dat[:, :-1]
    label = dat[:, -1]
    label.shape = (1, label.shape[0])  # a 1-D array must be reshaped before transposing
    label = label.T  # column vector
    m, n = np.shape(dataset)
    oneline = np.ones((m, 1))
    dataset = np.concatenate((dataset, oneline), axis=1)  # append the bias column
    dataset = dataset.astype(np.float64)
    label = label.astype(np.float64)
    h = np.dot(dataset, w)
    labelcalc = fliter(Sigmod(h))
    err = label - labelcalc
    rightcounter = np.count_nonzero(err)  # nonzero entries are misclassifications
    return 1 - rightcounter / (m * 1.0)
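To sanity-check the accuracy arithmetic (the label and pred arrays below are invented): the nonzero entries of (label - prediction) are exactly the misclassified rows, and the `* 1.0` guards against integer division under Python 2:

```python
import numpy as np

# Invented labels/predictions to check the accuracy formula:
# count_nonzero(label - pred) counts the misclassified samples.
label = np.array([[1.0], [0.0], [1.0], [1.0]])
pred = np.array([[1.0], [0.0], [0.0], [1.0]])
errors = np.count_nonzero(label - pred)
accuracy = 1 - errors / (label.shape[0] * 1.0)  # * 1.0 forces float division on Python 2
print(accuracy)  # 0.75
```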