Machine Learning In Action - Chapter 5 Logistic Regression

Chapter 5 - Logistic Regression

For the logistic regression classifier we’ll take our features and multiply each one by a weight and then add them up. This result will be put into the sigmoid, and we’ll get a number between 0 and 1. Anything above 0.5 we’ll classify as a 1, and anything below 0.5 we’ll classify as a 0. You can also think of logistic regression as a probability estimate.

  • Principle

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = w^T x = w_0 x_0 + w_1 x_1 + \cdots + w_n x_n$$

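Here w is the weight vector we need to learn and x is the input feature vector. Both have n+1 components (w_0, ..., w_n and x_0, ..., x_n), where x_0 = 1 supplies the constant term.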

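If σ(z) is greater than 0.5 the sample is assigned to class 1; if it is less than 0.5, to class 0. Since σ(0) = 0.5, this is the same as checking whether z > 0.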
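A minimal sketch of the formula and the 0.5 threshold rule, assuming NumPy. The helper name `classify` and the weight values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps any real z into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def classify(x, w):
    # z = w^T x; x[0] is the constant 1.0 paired with w0
    z = np.dot(w, x)
    # Class 1 if the estimated probability exceeds 0.5, else class 0
    return 1 if sigmoid(z) > 0.5 else 0

w = np.array([0.5, 2.0, -1.0])   # hypothetical trained weights
x = np.array([1.0, 1.0, 1.0])    # x0 = 1.0 for the constant term

print(sigmoid(0.0))              # 0.5
print(classify(x, w))            # z = 1.5, so class 1
```

Because σ is monotonic, thresholding σ(z) at 0.5 and thresholding z at 0 give the same decisions.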

  • Implementation

(1) Loading the data: the loader contains the line

dataMat.append([1.0, float(lineArr[0]), float(lineArr[1])])

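This prepends a constant feature x0 = 1.0 to every sample, so the linear model has a constant term w0, which shifts the fitted line up or down.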
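A minimal sketch of such a loader, assuming each input line holds two whitespace-separated features followed by a 0/1 label. The function name `load_data`, the file layout, and the toy data are assumptions for illustration; only the `append` line mirrors the original.

```python
import io

def load_data(f):
    # Hypothetical layout per line: x1 <tab> x2 <tab> label
    data_mat, label_mat = [], []
    for line in f:
        line_arr = line.strip().split()
        if not line_arr:
            continue  # skip blank lines
        # Prepend x0 = 1.0 so that w0 acts as the constant (intercept) term
        data_mat.append([1.0, float(line_arr[0]), float(line_arr[1])])
        label_mat.append(int(line_arr[2]))
    return data_mat, label_mat

# Made-up toy input standing in for a real data file
sample = io.StringIO("0.5\t2.0\t1\n-1.0\t3.0\t0\n")
data, labels = load_data(sample)
print(data)    # [[1.0, 0.5, 2.0], [1.0, -1.0, 3.0]]
print(labels)  # [1, 0]
```

With two features the decision boundary is the line w0 + w1·x1 + w2·x2 = 0; without the constant column it would be forced through the origin.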

(2) Training

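# asmatrix is used in place of the book's mat(), which newer NumPy releases no longer provide
from numpy import asmatrix, shape, ones, exp

def sigmoid(inX):
  # Logistic function, applied element-wise
  return 1.0 / (1 + exp(-inX))

def gradAscent(dataMatIn, classLabels):
  dataMatrix = asmatrix(dataMatIn)               # m x n features; column 0 is the constant 1.0
  labelMat = asmatrix(classLabels).transpose()   # m x 1 column of 0/1 labels
  m, n = shape(dataMatrix)
  alpha = 0.001                                  # step size
  maxCycles = 500                                # number of full-batch iterations
  weights = ones((n, 1))                         # n x 1 weights, initialized to 1
  for k in range(maxCycles):
    h = sigmoid(dataMatrix * weights)            # m x 1 predictions (matrix product)
    error = (labelMat - h)                       # m x 1 residuals y - h
    weights = weights + alpha * dataMatrix.transpose() * error   # step along X^T (y - h)
  return weights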

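The hard part to understand is the second-to-last line of the function, weights = weights + alpha * dataMatrix.transpose() * error. Why does this amount to gradient ascent? A long derivation lies behind it; see

http://blog.csdn.net/dongtingzhizi/article/details/15962797

which walks through it in great detail.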
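A compressed sketch of that derivation, assuming the usual maximum-likelihood objective with labels y_i ∈ {0, 1}, h_i = σ(w^T x_i), X the m × n data matrix whose rows are the samples x_i, and x_{ij} the j-th feature of sample i:

$$\ell(w) = \sum_{i=1}^{m} \left[ y_i \log h_i + (1 - y_i) \log (1 - h_i) \right], \qquad h_i = \sigma(w^T x_i)$$

$$\sigma'(z) = \sigma(z)\,\bigl(1 - \sigma(z)\bigr)$$

$$\frac{\partial \ell}{\partial w_j} = \sum_{i=1}^{m} \left[ \frac{y_i}{h_i} - \frac{1 - y_i}{1 - h_i} \right] h_i (1 - h_i)\, x_{ij} = \sum_{i=1}^{m} (y_i - h_i)\, x_{ij}$$

$$\nabla_w \ell = X^T (y - h), \qquad w \leftarrow w + \alpha\, X^T (y - h)$$

The last update is exactly weights + alpha * dataMatrix.transpose() * error with error = labelMat - h: each iteration takes a step of size alpha up the gradient of the log-likelihood.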

(3) Improving efficiency

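Stochastic gradient ascent: the gradient ascent above uses the entire dataset for every weight update, which becomes expensive on large datasets. An alternative is to update the weights using only one sample at a time. This is known as stochastic gradient ascent. Stochastic gradient ascent is an example of an online learning algorithm; it is called online because the classifier can be updated incrementally as new data comes in, rather than all at once.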
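A minimal sketch of stochastic gradient ascent, assuming NumPy arrays rather than matrices. The function name `stoc_grad_ascent`, the random visiting order, the shrinking step-size schedule `alpha = 4/(1 + j + i) + 0.01`, and the toy data are illustrative choices, not necessarily the author's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def stoc_grad_ascent(data, labels, num_iter=150, seed=0):
    """Update the weights from one sample at a time.

    data:   (m, n) array whose first column is the constant 1.0
    labels: length-m array of 0/1 labels
    """
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels, dtype=float)
    m, n = data.shape
    weights = np.ones(n)
    rng = np.random.default_rng(seed)
    for j in range(num_iter):
        # Visit the samples in a fresh random order on each pass
        for i, idx in enumerate(rng.permutation(m)):
            # Step size shrinks over time but stays above 0.01 (assumed schedule)
            alpha = 4.0 / (1.0 + j + i) + 0.01
            h = sigmoid(np.dot(data[idx], weights))  # scalar prediction for one sample
            error = labels[idx] - h
            weights = weights + alpha * error * data[idx]
    return weights

# Toy, linearly separable data (made up): class 1 when x1 > 0
X = np.array([[1.0, 2.0, 0.0],
              [1.0, 1.5, 1.0],
              [1.0, -2.0, 0.5],
              [1.0, -1.5, -1.0]])
y = np.array([1, 1, 0, 0])
w = stoc_grad_ascent(X, y)
preds = (sigmoid(X @ w) > 0.5).astype(int)
print(preds.tolist())  # [1, 1, 0, 0]
```

Each inner step touches one row, so a new sample arriving later can be folded in with one more update instead of a pass over the whole dataset.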

(4) Missing values in some features

Common ways to handle them:

1.Use the feature’s mean value from all the available data.
2.Fill in the unknown with a special value like -1.
3.Ignore the instance.
4.Use a mean value from similar items.
5.Use another machine learning algorithm to predict the value.
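A minimal sketch of options 1 to 3, assuming missing entries are stored as `np.nan` in a NumPy array. The toy matrix is made up for illustration.

```python
import numpy as np

# Toy feature matrix; np.nan marks a missing value (hypothetical data)
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# Option 1: replace each missing entry with its column's mean over observed values
col_means = np.nanmean(X, axis=0)
X_mean = np.where(np.isnan(X), col_means, X)

# Option 2: fill with a special value such as -1
X_special = np.where(np.isnan(X), -1.0, X)

# Option 3: drop every row (instance) that has a missing value
X_dropped = X[~np.isnan(X).any(axis=1)]

print(X_mean)     # [[1. 2.] [2. 4.] [3. 3.]]
print(X_dropped)  # [[1. 2.]]
```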

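For logistic regression, the approach used here is to set missing values to 0. This does not distort training, because in the update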

weights = weights + alpha * error * dataMatrix[randIndex]

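if a feature's value is 0, the change to its weight is also 0, so that weight is not pushed in either direction. A zero-valued feature also contributes nothing to z = w^T x, so filling missing values with 0 keeps them from influencing the learned weights.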
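A small numeric check of this argument, assuming NumPy. The weights, sample, label, and step size are hypothetical values; the update line follows the form shown above, with `x` standing in for `dataMatrix[randIndex]`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

weights = np.array([1.0, 0.5, -0.3])  # hypothetical current weights
x = np.array([1.0, 2.0, 0.0])         # third feature was missing, filled with 0
label = 1
alpha = 0.01

h = sigmoid(np.dot(x, weights))       # the zero feature adds nothing to z
error = label - h
new_weights = weights + alpha * error * x

# The weight paired with the zero-filled feature is unchanged;
# the weights of the observed features still move
print(new_weights[2] == weights[2])   # True
print(new_weights[1] != weights[1])   # True
```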
