Fit a curve to the known data, with minimizing the mean squared error as the optimization criterion. This yields a set of parameters, and saving them is what "saving the model" means. Given a new sample, multiplying it by the parameters gives the prediction; this is typically used for regression.
(the first example Ng covers) For classification, add an activation function on top, such as the sigmoid function.
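All the snippets below rely on NumPy and the book's sigmoid helper. A minimal sketch of the two prediction modes (the weights and the sample here are made-up toy values):

from numpy import *

def sigmoid(inX):
    # logistic function: squashes any real score into (0, 1)
    return 1.0 / (1.0 + exp(-inX))

w = array([0.5, -1.2, 0.3])   # hypothetical learned parameters
x = array([1.0, 2.0, 0.7])    # one sample; the leading 1.0 is the bias term

y_reg = dot(w, x)             # regression: the prediction is just the inner product
y_clf = sigmoid(dot(w, x))    # classification: sigmoid turns the score into a probability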
Details:
Gradient descent: # each iteration uses the entire dataset, which is very expensive when the data is large (also called Batch Gradient Descent). The book's code climbs the log-likelihood, hence the name gradAscent.
def gradAscent(dataMatIn, classLabels):
    dataMatrix = mat(dataMatIn)              # convert to a NumPy matrix, m x n
    labelMat = mat(classLabels).transpose()  # labels as an m x 1 column vector
    m, n = shape(dataMatrix)
    alpha = 0.001                            # learning rate
    maxCycles = 500                          # number of iterations
    weights = ones((n, 1))
    for k in range(maxCycles):
        h = sigmoid(dataMatrix * weights)    # predictions for ALL samples, m x 1
        error = labelMat - h                 # m x 1 error vector
        weights = weights + alpha * dataMatrix.transpose() * error
    return weights
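A quick toy usage (the four samples are made up; the leading 1.0 column is the bias feature):

data = [[1.0, 2.0, 1.1],
        [1.0, 1.5, 0.7],
        [1.0, -1.2, -0.3],
        [1.0, -0.8, -1.5]]
labels = [1, 1, 0, 0]
w = gradAscent(data, labels)    # n x 1 weight matrix
p = sigmoid(mat(data) * w)      # predicted probabilities for the training samples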
Stochastic gradient ascent: # updates the weights with one sample at a time; prone to overfitting and to local optima (SGD)
def stoGradAscent(dataMatIn, classLabels):
    dataMatrix = array(dataMatIn)   # plain array: the operations below are element-wise
    m, n = shape(dataMatrix)
    alpha = 0.01
    weights = ones(n)               # one weight per feature (n, not m)
    for i in range(m):              # one update per sample
        h = sigmoid(sum(dataMatrix[i] * weights))   # prediction for a single sample
        error = classLabels[i] - h                  # scalar error
        weights = weights + alpha * error * dataMatrix[i]
    return weights
Improved stochastic gradient ascent:
def stoGradAscent1(dataMatIn, classLabels, numIters=200):
    dataMatrix = array(dataMatIn)
    m, n = shape(dataMatrix)
    weights = ones(n)                         # one weight per feature
    for j in range(numIters):                 # number of passes over the data
        dataIndex = list(range(m))            # sample indices not yet used in this pass
        for i in range(m):                    # m updates per pass
            alpha = 4 / (1.0 + j + i) + 0.01  # step size decays over time but never reaches 0
            randIndex = int(random.uniform(0, len(dataIndex)))
            sampleIdx = dataIndex[randIndex]  # pick a random remaining sample
            h = sigmoid(sum(dataMatrix[sampleIdx] * weights))
            error = classLabels[sampleIdx] - h
            weights = weights + alpha * error * dataMatrix[sampleIdx]
            del dataIndex[randIndex]          # drop it from the pool: sampling without replacement
    return weights
Mini-batch gradient descent: (this is the approach Keras uses)
# each step draws a small batch of data, computes the gradient for each sample, and averages them to update the weights (MBGD)
This reduces the randomness of the updates, so the descent direction is more representative; see the sketch below.
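A minimal sketch of the mini-batch variant in the same style (batchSize, the shuffling scheme, and the function name are my own choices, not from the book):

def miniBatchGradAscent(dataMatIn, classLabels, batchSize=16, numIters=200, alpha=0.01):
    dataMatrix = array(dataMatIn)
    labels = array(classLabels)
    m, n = shape(dataMatrix)
    weights = ones(n)
    for j in range(numIters):
        perm = random.permutation(m)                   # shuffle the sample order each pass
        for start in range(0, m, batchSize):
            batch = perm[start:start + batchSize]      # indices of this mini-batch
            h = sigmoid(dot(dataMatrix[batch], weights))   # predictions for the whole batch
            error = labels[batch] - h                      # one error per batch sample
            # average the per-sample gradients over the batch
            weights = weights + alpha * dot(dataMatrix[batch].T, error) / len(batch)
    return weights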
Reference: 机器学习实战 (Machine Learning in Action)