Reading the spark.mllib Source: Optimization Algorithms (2) - Updater

Updater is the component Spark uses to update model parameters during machine learning. An update step takes:

1. the parameter vector w(i) solved for in iteration i;

2. the gradient computed in iteration i+1;

3. the regularization option;

and from these computes the parameter vector w(i+1) to be solved for in iteration i+1.
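As a minimal, Spark-free sketch of how these inputs combine (plain arrays and illustrative names; the real Updaters below operate on mllib Vectors), an unregularized step looks like:

```scala
import scala.math.sqrt

// One unregularized update step: w(i+1) = w(i) - rate * gradient,
// where rate = stepSize / sqrt(iter) is the decaying learning rate
// the mllib Updaters use. Array-based sketch, not the real API.
def updateStep(
    wOld: Array[Double],      // w(i) from iteration i
    gradient: Array[Double],  // gradient computed in iteration i+1
    stepSize: Double,
    iter: Int): Array[Double] = {
  val rate = stepSize / sqrt(iter)
  wOld.zip(gradient).map { case (w, g) => w - rate * g }
}
```

Regularization then modifies this basic step, which is exactly what distinguishes the three Updaters below.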

 

Spark implements three concrete Updaters: SimpleUpdater, L1Updater, and SquaredL2Updater. All three extend the abstract base class Updater.


SimpleUpdater:

An Updater with no regularization. It updates the parameters directly from the gradient: w(i+1) = w(i) - rate * gradient.

The implementation:

import breeze.linalg.{axpy => brzAxpy, Vector => BV}

import org.apache.spark.mllib.linalg.{Vector, Vectors}

class SimpleUpdater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    // Decaying learning rate: stepSize / sqrt(iter)
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    // In-place update: brzWeights -= thisIterStepSize * gradient
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)

    // The second tuple element is the regularization value, 0 here
    (Vectors.fromBreeze(brzWeights), 0)
  }
}

L1Updater:

The L1-regularized Updater constrains the parameter values during the update (soft thresholding), which pushes w toward sparsity. The original doc comment describes the process well:

* If w (a weight component) is greater than shrinkageVal (a value computed from the current step size and the regularization parameter), set the component to w - shrinkageVal.
* If w is less than -shrinkageVal, set the component to w + shrinkageVal.
* If w is in (-shrinkageVal, shrinkageVal), set the component to 0.

The implementation is also straightforward:

import scala.math.{abs, max, signum}

import breeze.linalg.{axpy => brzAxpy, norm => brzNorm, Vector => BV}

import org.apache.spark.mllib.linalg.{Vector, Vectors}

class L1Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    // Take gradient step
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    // Apply proximal operator (soft thresholding)
    val shrinkageVal = regParam * thisIterStepSize
    var i = 0
    val len = brzWeights.length
    while (i < len) {
      val wi = brzWeights(i)
      brzWeights(i) = signum(wi) * max(0.0, abs(wi) - shrinkageVal)
      i += 1
    }
    // Regularization value: regParam * ||w||_1
    (Vectors.fromBreeze(brzWeights), brzNorm(brzWeights, 1.0) * regParam)
  }
}
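The while loop above applies the soft-thresholding operator per component; pulling it out as a standalone function (a Spark-free sketch) makes the three cases easy to check:

```scala
import scala.math.{abs, max, signum}

// Per-component soft-thresholding operator: the single expression
// signum(w) * max(0, |w| - shrinkageVal) covers all three cases
// of the doc comment above.
def softThreshold(w: Double, shrinkageVal: Double): Double =
  signum(w) * max(0.0, abs(w) - shrinkageVal)
```

For shrinkageVal = 0.25: softThreshold(0.75, 0.25) gives 0.5, softThreshold(-0.75, 0.25) gives -0.5, and softThreshold(0.2, 0.25) gives 0.0, matching the three cases in order.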

SquaredL2Updater:

The L2-regularized Updater adds (regParam / 2) * ||w||^2 to the original loss function. Taking the gradient of the regularized loss yields the parameter update

w' = w - thisIterStepSize * (gradient + regParam * w)
   = (1 - thisIterStepSize * regParam) * w - thisIterStepSize * gradient
import breeze.linalg.{axpy => brzAxpy, norm => brzNorm, Vector => BV}

import org.apache.spark.mllib.linalg.{Vector, Vectors}

class SquaredL2Updater extends Updater {
  override def compute(
      weightsOld: Vector,
      gradient: Vector,
      stepSize: Double,
      iter: Int,
      regParam: Double): (Vector, Double) = {
    val thisIterStepSize = stepSize / math.sqrt(iter)
    val brzWeights: BV[Double] = weightsOld.asBreeze.toDenseVector
    // Shrink the weights first: w *= (1 - thisIterStepSize * regParam)
    brzWeights :*= (1.0 - thisIterStepSize * regParam)
    // Then take the plain gradient step
    brzAxpy(-thisIterStepSize, gradient.asBreeze, brzWeights)
    // Regularization value: (regParam / 2) * ||w||_2^2
    val norm = brzNorm(brzWeights, 2.0)
    (Vectors.fromBreeze(brzWeights), 0.5 * regParam * norm * norm)
  }
}
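The factored form the code uses (shrink the weights, then take a plain gradient step) is algebraically identical to one gradient step on the L2-regularized loss; a quick numeric check with illustrative values:

```scala
val (w, g) = (2.0, 0.5)           // one weight component and its gradient
val (rate, regParam) = (0.1, 0.3) // thisIterStepSize and regParam

// One gradient step on the L2-regularized loss: w - rate * (g + regParam * w)
val direct = w - rate * (g + regParam * w)
// Spark's factored form: shrink the weight, then take the plain gradient step
val factored = (1.0 - rate * regParam) * w - rate * g

assert(math.abs(direct - factored) < 1e-12)
```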

To implement a custom regularization scheme, extend the abstract base class Updater and implement its parameter-update method compute.
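As an illustration of that extension point, here is a hypothetical elastic-net style update combining the L2 shrink with the L1 soft threshold. This is a Spark-free sketch: arrays stand in for mllib Vectors, and the function name and the split of regParam into two parameters are invented for the example:

```scala
import scala.math.{abs, max, signum, sqrt}

// Hypothetical elastic-net update mirroring the shape of Updater.compute.
// Returns (new weights, regularization value), like the real compute.
def elasticNetCompute(
    weightsOld: Array[Double],
    gradient: Array[Double],
    stepSize: Double,
    iter: Int,
    l1RegParam: Double,
    l2RegParam: Double): (Array[Double], Double) = {
  val rate = stepSize / sqrt(iter)
  val shrinkageVal = l1RegParam * rate
  val newWeights = weightsOld.zip(gradient).map { case (w, g) =>
    // L2 shrink plus gradient step, then the L1 soft threshold
    val stepped = (1.0 - rate * l2RegParam) * w - rate * g
    signum(stepped) * max(0.0, abs(stepped) - shrinkageVal)
  }
  // Regularization value: l1 * ||w||_1 + (l2 / 2) * ||w||_2^2
  val regVal = l1RegParam * newWeights.map(abs).sum +
    0.5 * l2RegParam * newWeights.map(x => x * x).sum
  (newWeights, regVal)
}
```

With l2RegParam = 0 this reduces to the L1Updater step, and with l1RegParam = 0 to the SquaredL2Updater step, so the sketch is consistent with the two built-in Updaters above.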

