摘要:本文是spark ml的逻辑回归实现,共包括5个部分,分别为LogisticCostFun、binaryUpdataInPlace、multinomialUpdataInPlace、LogisticAggregator和train。其中,
(一)LogisticCostFun:计算损失函数
(二)binaryUpdataInPlace:对2分类模型,进行参数更新
(三)multinomialUpdataInPlace:对多分类模型,进行参数更新
(四)LogisticAggregator:对分布式的LR,进行整合
(五)train:训练
得到结论如下:
1)spark 只对样本实现了分布式,并没有对参数实现分布式
2)样本增加时,训练不会有问题
3)参数增加时,训练会有影响。(spark还是将LR的所有参数广播到各个节点)
(一) LogisticCostFun
/**
* LogisticCostFun implements Breeze's DiffFunction[T] for a multinomial (softmax) logistic loss
* function, as used in multi-class classification (it is also used in binary logistic regression).
* It returns the loss and gradient with L2 regularization at a particular point (coefficients).
* It's used in Breeze's convex optimization routines.
*/
private class LogisticCostFun(
instances: RDD[Instance],
numClasses: Int,
fitIntercept: Boolean,
standardization: Boolean,
bcFeaturesStd: Broadcast[Array[Double]],
regParamL2: Double,
multinomial: Boolean,
aggregationDepth: Int) extends DiffFunction[BDV[Double]] {
override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
val coeffs = Vectors.fromBreeze(coefficients)
val bcCoeffs = instances.context.broadcast(coeffs)
val featuresStd = bcFeaturesStd.value
val numFeatures = featuresStd.length
val numCoefficientSets = if (multinomial) numClasses else 1
val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures
val logisticAggregator = {
val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)
instances.treeAggregate(
new LogisticAggregator(bcCoeffs, bcFeaturesStd, numClasses, fitIntercept,
multinomial)
)(seqOp, combOp, aggregationDepth)
}
val totalGradientMatrix = logisticAggregator.gradient
val coefMatrix = new DenseMatrix(numCoefficientSets, numFeaturesPlusIntercept, coeffs.toArray)
// regVal is the sum of coefficients squares excluding intercept for L2 regularization.
val regVal = if (regParamL2 == 0.0) {
0.0
} else {
var sum = 0.0
coefMatrix.foreachActive { case (classIndex, featureIndex, value) =>
// We do not apply regularization to the intercepts
val isIntercept = fitIntercept && (featureIndex == numFeatures)
if (!isIntercept) {
// The following code will compute the loss of the regularization; also
// the gradient of the regularization, and add back to totalGradientArray.
sum += {
if (standardization) {
val gradValue = totalGradientMatrix(classIndex, featureIndex) //取梯度矩阵中(classIndex, featureIndex)对应的值
totalGradientMatrix.update(classIndex, featureIndex, gradValue + regParamL2 * value) //正则项的梯度为regParamL2 * value
value * value //正则项的loss为value * value
} else {
if (featuresStd(featureIndex) != 0.0) {
// If `standardization` is false, we still standardize the data
// to improve the rate of convergence; as a result, we have to
// perform this reverse standardization by penalizing each component
// differently to get effectively the same objective function when
// the training dataset is not standardized.
val temp = value / (featuresStd(featureIndex) * featuresStd(featureIndex)) //梯度除以相应类别的特征的方差
val gradValue = totalGradientMatrix(classIndex, featureIndex)
totalGradientMatrix.update(classIndex, featureIndex, gradValue + regParamL2 * temp)
value * temp //loss也相应除以方差
} else { //如果该类别的特征的标准差为0.0,不需要scaling
0.0
}
}
}
}
}
0.5 * regParamL2 * sum //正则项总的loss
}
bcCoeffs.destroy(blocking = false)
(logisticAggregator.loss + regVal, new BDV(totalGradientMatrix.toArray)) //返回(loss, gradient)
}
}
(二)binaryUpdateInPlace
/** Update gradient and loss using binary loss function. */
private def binaryUpdateInPlace(
features: Vector,
weight: Double,
label: Double): Unit = {
val localFeaturesStd = bcFeaturesStd.value
val localCoefficients = bcCoefficients.value
val localGradientArray = gradientSumArray
val margin = - {
var sum = 0.0
features.foreachActive { (index, value) =>
if (localFeaturesStd(index) != 0.0 && value != 0.0) {
sum += localCoefficients(index) * value / localFeaturesStd(index) // sum = x * W^{T}
}
}
if (fitIntercept) sum += localCoefficients(numFeaturesPlusIntercept - 1) // sum = x * W^{T} + b
sum
}
val multiplier = weight * (1.0 / (1.0 + math.exp(margin)) - label) //weight是该样本的权重
features.foreachActive { (index, value) =>
if (localFeaturesStd(index) != 0.0 && value != 0.0) {
localGradientArray(index) += multiplier * value / localFeaturesStd(index) //更新梯度
}
}
if (fitIntercept) {
localGradientArray(numFeaturesPlusIntercept - 1) += multiplier
} //更新截距b对应的梯度
if (label > 0) {
// 在MLUtils.log1pExp(x)实现中,if(x > 0) log(1+exp(x)) = x + log(1+exp(-x))
lossSum += weight * MLUtils.log1pExp(margin) //loss = log(1+exp(margin))
} else {
lossSum += weight * (MLUtils.log1pExp(margin) - margin) //loss = margin - log(1+exp(margin))
}
}
(三)multinomialUpdateInPlace
Note that there is a difference between multinomial (softmax) and binary loss. The binary case uses one outcome class as a “pivot” and regresses the other class against the pivot. In the multinomial case, the softmax loss function is used to model each class probability independently. Using softmax loss produces K
sets of coefficients, while using a pivot class produces K - 1
sets of coefficients (a single coefficient vector in the binary case). In the binary case, we can say that the coefficients are shared between the positive and negative classes. When regularization is applied, multinomial (softmax) loss will produce a result different from binary loss since the positive and negative don’t share the coefficients while the binary regression shares the coefficients between positive and negative.
The following is a mathematical derivation for the multinomial (softmax) loss. The probability of the multinomial outcome y y taking on any of the K possible outcomes is:
The model coefficients