[Source Code Analysis] Logistic Regression


Abstract: This post walks through the logistic regression implementation in Spark ML. It consists of five parts: LogisticCostFun, binaryUpdateInPlace, multinomialUpdateInPlace, LogisticAggregator, and train, where
(1) LogisticCostFun computes the loss function;
(2) binaryUpdateInPlace updates the parameters of the binary model;
(3) multinomialUpdateInPlace updates the parameters of the multiclass model;
(4) LogisticAggregator merges the partial results of the distributed LR computation;
(5) train drives the training.
The conclusions are:
1) Spark only distributes the samples; it does not distribute the parameters.
2) When the number of samples grows, training is not a problem.
3) When the number of parameters grows, training is affected (Spark still broadcasts all of the LR coefficients to every node).
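
Before diving into the internals, here is a minimal usage sketch of the public API (the dataset path and parameter values are illustrative, not taken from the original post); fit() eventually calls the train() method analyzed in part (5), which builds LogisticCostFun and LogisticAggregator over the distributed instances.

import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.sql.SparkSession

// Minimal sketch: load a LIBSVM dataset and fit a binary LR model.
val spark = SparkSession.builder().appName("lr-sketch").getOrCreate()
val training = spark.read.format("libsvm").load("data/sample_libsvm_data.txt")

val lr = new LogisticRegression()
  .setMaxIter(100)          // max optimizer iterations
  .setRegParam(0.01)        // regParamL2 below is derived from this (and elasticNetParam)
  .setElasticNetParam(0.0)  // 0.0 = pure L2, so the LBFGS path is taken
  .setFitIntercept(true)
  .setStandardization(true)

val model = lr.fit(training)
println(s"coefficients=${model.coefficients} intercept=${model.intercept}")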

(1) LogisticCostFun

/**
 * LogisticCostFun implements Breeze's DiffFunction[T] for a multinomial (softmax) logistic loss
 * function, as used in multi-class classification (it is also used in binary logistic regression).
 * It returns the loss and gradient with L2 regularization at a particular point (coefficients).
 * It's used in Breeze's convex optimization routines.
 */
private class LogisticCostFun(
    instances: RDD[Instance],
    numClasses: Int,
    fitIntercept: Boolean,
    standardization: Boolean,
    bcFeaturesStd: Broadcast[Array[Double]],
    regParamL2: Double,
    multinomial: Boolean,
    aggregationDepth: Int) extends DiffFunction[BDV[Double]] {
   

  override def calculate(coefficients: BDV[Double]): (Double, BDV[Double]) = {
    val coeffs = Vectors.fromBreeze(coefficients)
    val bcCoeffs = instances.context.broadcast(coeffs)
    val featuresStd = bcFeaturesStd.value
    val numFeatures = featuresStd.length
    val numCoefficientSets = if (multinomial) numClasses else 1
    val numFeaturesPlusIntercept = if (fitIntercept) numFeatures + 1 else numFeatures

    val logisticAggregator = {
      val seqOp = (c: LogisticAggregator, instance: Instance) => c.add(instance)
      val combOp = (c1: LogisticAggregator, c2: LogisticAggregator) => c1.merge(c2)

      instances.treeAggregate(
        new LogisticAggregator(bcCoeffs, bcFeaturesStd, numClasses, fitIntercept,
          multinomial)
      )(seqOp, combOp, aggregationDepth)
    }

    val totalGradientMatrix = logisticAggregator.gradient
    val coefMatrix = new DenseMatrix(numCoefficientSets, numFeaturesPlusIntercept, coeffs.toArray)
    // regVal is the sum of coefficients squares excluding intercept for L2 regularization.
    val regVal = if (regParamL2 == 0.0) {
      0.0
    } else {
      var sum = 0.0
      coefMatrix.foreachActive { case (classIndex, featureIndex, value) =>
        // We do not apply regularization to the intercepts
        val isIntercept = fitIntercept && (featureIndex == numFeatures)
        if (!isIntercept) {
          // The following code will compute the loss of the regularization; also
          // the gradient of the regularization, and add back to totalGradientArray.
          sum += {
            if (standardization) {
              val gradValue = totalGradientMatrix(classIndex, featureIndex)  // read the current gradient at (classIndex, featureIndex)
              totalGradientMatrix.update(classIndex, featureIndex, gradValue + regParamL2 * value)  // the gradient of the L2 term is regParamL2 * value
              value * value  // the loss contribution of the L2 term is value * value
            } else {
              if (featuresStd(featureIndex) != 0.0) {
                // If `standardization` is false, we still standardize the data
                // to improve the rate of convergence; as a result, we have to
                // perform this reverse standardization by penalizing each component
                // differently to get effectively the same objective function when
                // the training dataset is not standardized.
                val temp = value / (featuresStd(featureIndex) * featuresStd(featureIndex))  // divide by the variance of this feature
                val gradValue = totalGradientMatrix(classIndex, featureIndex)
                totalGradientMatrix.update(classIndex, featureIndex, gradValue + regParamL2 * temp)
                value * temp  // the loss contribution is divided by the variance as well
              } else {  // if the standard deviation of this feature is 0.0, no scaling is needed
                0.0
              }
            }
          }
        }
      }
      0.5 * regParamL2 * sum  // total loss of the L2 regularization term
    }
    bcCoeffs.destroy(blocking = false)

    (logisticAggregator.loss + regVal, new BDV(totalGradientMatrix.toArray))  // return (loss, gradient)
  }
}
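
Because LogisticCostFun implements Breeze's DiffFunction, the driver simply hands it to Breeze's L-BFGS (or OWL-QN) optimizer, which calls calculate repeatedly and receives (loss, gradient) back; every such call triggers the treeAggregate pass shown above. The following self-contained sketch illustrates that contract with a toy quadratic objective in place of the real logistic one (nothing here is Spark-specific; the names and parameter values are illustrative):

import breeze.linalg.{DenseVector => BDV}
import breeze.optimize.{DiffFunction, LBFGS}

// A toy DiffFunction with the same contract as LogisticCostFun.calculate:
// given the current coefficients, return (loss, gradient).
val toyCostFun = new DiffFunction[BDV[Double]] {
  override def calculate(x: BDV[Double]): (Double, BDV[Double]) = {
    val diff = x - 3.0           // f(x) = ||x - 3||^2
    (diff dot diff, diff * 2.0)  // gradient = 2 * (x - 3)
  }
}

// The same optimizer family Spark runs on the driver.
val lbfgs = new LBFGS[BDV[Double]](maxIter = 100, m = 10, tolerance = 1e-6)
val solution = lbfgs.minimize(toyCostFun, BDV.zeros[Double](5))
println(solution)  // every component converges to ~3.0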

(2) binaryUpdateInPlace

/** Update gradient and loss using binary loss function. */
  private def binaryUpdateInPlace(
      features: Vector,
      weight: Double,
      label: Double): Unit = {

    val localFeaturesStd = bcFeaturesStd.value
    val localCoefficients = bcCoefficients.value
    val localGradientArray = gradientSumArray
    val margin = - {
      var sum = 0.0
      features.foreachActive { (index, value) =>
        if (localFeaturesStd(index) != 0.0 && value != 0.0) {
          sum += localCoefficients(index) * value / localFeaturesStd(index)  // sum = x * W^{T}
        }
      }
      if (fitIntercept) sum += localCoefficients(numFeaturesPlusIntercept - 1) // sum = x * W^{T} + b
      sum
    }

    val multiplier = weight * (1.0 / (1.0 + math.exp(margin)) - label)  // weight is the sample weight of this instance

    features.foreachActive { (index, value) =>
      if (localFeaturesStd(index) != 0.0 && value != 0.0) {
        localGradientArray(index) += multiplier * value / localFeaturesStd(index)  // update the gradient
      }
    }

    if (fitIntercept) {
      localGradientArray(numFeaturesPlusIntercept - 1) += multiplier  // update the gradient of the intercept b
    }

    if (label > 0) {
      // MLUtils.log1pExp(x) computes log(1 + exp(x)); when x > 0 it uses x + log(1 + exp(-x)) for numerical stability
      lossSum += weight * MLUtils.log1pExp(margin)  // loss = log(1 + exp(margin))
    } else {
      lossSum += weight * (MLUtils.log1pExp(margin) - margin)  // loss = log(1 + exp(margin)) - margin
    }
  }
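
To make the arithmetic concrete, here is a plain-Scala sketch (no Spark; all values are illustrative and the feature standard deviations are assumed to be 1.0, so the scaling by localFeaturesStd drops out) that reproduces the margin, multiplier, gradient, and loss computed by binaryUpdateInPlace for a single instance:

object BinaryUpdateSketch {
  def main(args: Array[String]): Unit = {
    val features = Array(1.0, 2.0)        // x
    val coefficients = Array(0.5, -0.25)  // W
    val intercept = 0.1                   // b
    val label = 1.0
    val weight = 1.0                      // sample weight

    // margin = -(x . W + b)
    val margin = -(features.zip(coefficients).map { case (x, w) => x * w }.sum + intercept)

    // multiplier = weight * (predicted probability - label),
    // where 1 / (1 + exp(margin)) = sigmoid(x . W + b)
    val multiplier = weight * (1.0 / (1.0 + math.exp(margin)) - label)

    // gradient w.r.t. W and b
    val gradW = features.map(_ * multiplier)
    val gradB = multiplier

    // log1pExp(x) = log(1 + exp(x)), computed stably as in MLUtils.log1pExp
    def log1pExp(x: Double): Double =
      if (x > 0) x + math.log1p(math.exp(-x)) else math.log1p(math.exp(x))

    val loss =
      if (label > 0) weight * log1pExp(margin)
      else weight * (log1pExp(margin) - margin)

    println(s"gradW=${gradW.mkString(",")} gradB=$gradB loss=$loss")
  }
}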

(3) multinomialUpdateInPlace

Note that there is a difference between multinomial (softmax) and binary loss. The binary case uses one outcome class as a "pivot" and regresses the other class against the pivot. In the multinomial case, the softmax loss function is used to model each class probability independently. Using softmax loss produces K sets of coefficients, while using a pivot class produces K - 1 sets of coefficients (a single coefficient vector in the binary case). In the binary case, we can say that the coefficients are shared between the positive and negative classes. When regularization is applied, multinomial (softmax) loss will therefore produce a result different from binary loss, since the positive and negative classes do not share coefficients, while binary regression shares a single set of coefficients between them. A worked illustration of the pivot argument follows.
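
As a worked illustration of the pivot argument (using the same notation as the equations below): the binary model can be read as a two-class softmax in which the pivot class's coefficients are pinned to zero, leaving a single coefficient vector.

$$
P(y_i = 1 \mid x_i, \beta)
  = \frac{e^{x_i^T \beta_1}}{e^{x_i^T \beta_0} + e^{x_i^T \beta_1}}
  = \frac{1}{1 + e^{-x_i^T (\beta_1 - \beta_0)}}
  \;\xrightarrow{\;\beta_0 = 0\;}\;
  \frac{1}{1 + e^{-x_i^T \beta_1}}
$$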

The following is a mathematical derivation for the multinomial (softmax) loss. The probability of the multinomial outcome $y$ taking on any of the K possible outcomes is:


$$
\begin{aligned}
P(y_i = 0 \mid x_i, \beta) &= \frac{e^{x_i^T \beta_0}}{\sum_{k=0}^{K-1} e^{x_i^T \beta_k}} \\
P(y_i = 1 \mid x_i, \beta) &= \frac{e^{x_i^T \beta_1}}{\sum_{k=0}^{K-1} e^{x_i^T \beta_k}} \\
&\;\;\vdots \\
P(y_i = K-1 \mid x_i, \beta) &= \frac{e^{x_i^T \beta_{K-1}}}{\sum_{k=0}^{K-1} e^{x_i^T \beta_k}}
\end{aligned}
$$

The model coefficients $\beta = (\beta_0, \beta_1, \dots, \beta_{K-1})$ become a matrix of dimension $K \times (N+1)$ when the intercepts are added (N being the number of features).
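
Completing the derivation with a standard step not spelled out in the original post: given the class probabilities above, the weighted negative log-likelihood of a single instance $(x_i, y_i, w_i)$, which is what multinomialUpdateInPlace accumulates into lossSum, is

$$
\ell(\beta, x_i) = -w_i \log P(y_i \mid x_i, \beta)
  = w_i \left[ \log\!\left(\sum_{k=0}^{K-1} e^{x_i^T \beta_k}\right) - x_i^T \beta_{y_i} \right],
$$

and its gradient with respect to the coefficients of class $k$, which drives the multiplier used when updating gradientSumArray, is

$$
\frac{\partial \ell(\beta, x_i)}{\partial \beta_k}
  = w_i \left( P(y_i = k \mid x_i, \beta) - \mathbb{1}[y_i = k] \right) x_i.
$$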
