Spark MLlib SVM Algorithm

sunbow0

Categories: Spark, Spark MLlib. Tags: spark, mllib, svm

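Copyright notice: This is an original article by the blog author, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Article link: https://blog.csdn.net/sunbow0/article/details/45582771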

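This article describes the SVM (support vector machine) algorithm in Spark MLlib in detail, covering basic SVM theory, a source-code analysis of SVMWithSGD, and the core GradientDescent optimization algorithm. The SVMWithSGD class holds the key configuration for gradient descent, the regularization parameter, and the number of iterations. Optimization proceeds iteratively through the runMiniBatchSGD method, with HingeGradient computing the gradient and SquaredL2Updater updating the weights. An MLlib SVM example is also given, showing how to load data, split it into training and test sets, build a model, and evaluate prediction accuracy.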

1.1 The SVM (Support Vector Machine) Algorithm

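For the theory behind support vector machines, see the following articles:

Support Vector Machines (SVM), Part 1

http://www.cnblogs.com/jerrylead/archive/2011/03/13/1982639.html

Support Vector Machines (SVM), Part 2

http://www.cnblogs.com/jerrylead/archive/2011/03/13/1982684.html

Support Vector Machines, Part 3: Kernel Functions

http://www.cnblogs.com/jerrylead/archive/2011/03/18/1988406.html

Support Vector Machines, Part 4

http://www.cnblogs.com/jerrylead/archive/2011/03/18/1988415.html

Support Vector Machines, Part 5: The SMO Algorithm

http://www.cnblogs.com/jerrylead/archive/2011/03/18/1988419.html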

The SVM objective function and the gradient-descent update formula are as follows (figure not preserved):
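As a hedged reconstruction, the formulas can be written as below. This assumes the linear, L2-regularized hinge-loss formulation that HingeGradient and SquaredL2Updater suggest; the symbols (lambda for regParam, B_t for the mini-batch at iteration t, eta_t for the per-iteration step size, s for stepSize) are notation introduced here, not taken from the original post.

```latex
% Objective: L2 penalty plus average hinge loss, with labels rescaled to y_i in {-1, +1}
\min_{w}\; f(w) = \frac{\lambda}{2}\,\lVert w \rVert_2^2
  + \frac{1}{n}\sum_{i=1}^{n} \max\bigl(0,\; 1 - y_i\, w^{\top} x_i\bigr)

% Mini-batch subgradient of the hinge term at iteration t
g_t = \frac{1}{\lvert B_t \rvert} \sum_{i \in B_t}
  \begin{cases}
    -\,y_i\, x_i, & y_i\, w_t^{\top} x_i < 1 \\
    0,            & \text{otherwise}
  \end{cases}

% Update with L2 shrinkage and a step size decaying as 1/\sqrt{t}
w_{t+1} = (1 - \eta_t \lambda)\, w_t - \eta_t\, g_t,
\qquad \eta_t = \frac{s}{\sqrt{t}}
```

Here labels given as {0, 1} are mapped to {-1, +1} through y = 2 * label - 1 before the margin is computed.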

The code structure of SVM in MLlib is as follows (figure not preserved):

1.2 Spark MLlib SVM Source Code Analysis

1.2.1 SVMWithSGD

SVM training starts from the train method, which is defined in the SVMWithSGD companion object; inside train, a new SVMWithSGD instance is created.

package org.apache.spark.mllib.classification

// 1. Class: SVMWithSGD
class SVMWithSGD private (
    private var stepSize: Double,
    private var numIterations: Int,
    private var regParam: Double,
    private var miniBatchFraction: Double)
  extends GeneralizedLinearAlgorithm[SVMModel] with Serializable {

  private val gradient = new HingeGradient()
  private val updater = new SquaredL2Updater()
  override val optimizer = new GradientDescent(gradient, updater)
    .setStepSize(stepSize)
    .setNumIterations(numIterations)
    .setRegParam(regParam)
    .setMiniBatchFraction(miniBatchFraction)
  override protected val validators = List(DataValidators.binaryLabelValidator)

  /**
   * Construct a SVM object with default parameters: {stepSize: 1.0, numIterations: 100,
   * regParam: 0.01, miniBatchFraction: 1.0}.
   */
  def this() = this(1.0, 100, 0.01, 1.0)

  override protected def createModel(weights: Vector, intercept: Double) = {
    new SVMModel(weights, intercept)
  }
}

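Parameters of the SVMWithSGD class:

stepSize: step size for each iteration; default 1.0

numIterations: number of iterations; default 100

regParam: regularization parameter; default 0.01 (the value passed by the no-argument constructor)

miniBatchFraction: fraction of the samples used in each iteration; default 1.0

gradient: HingeGradient(), computes the gradient (subgradient) of the hinge loss;

updater: SquaredL2Updater(), updates the weights with L2-norm regularization;

optimizer: GradientDescent(gradient, updater), runs the gradient-descent optimization.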
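To make the roles of gradient and updater concrete, here is a minimal plain-Scala sketch of what a hinge-loss gradient and an L2 updater compute for dense vectors. It does not use Spark: HingeSketch, hingeGradient, and l2Update are hypothetical names chosen for illustration, and the step-size decay stepSize / sqrt(iter) reflects my understanding of SquaredL2Updater rather than anything stated in the text above.

```scala
// Illustrative sketch only; names are hypothetical, not MLlib classes.
object HingeSketch {
  // Dot product of two equal-length dense vectors.
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum

  // Hinge-loss subgradient and loss for one example.
  // label01 is in {0, 1}; it is rescaled to {-1, +1} before use.
  def hingeGradient(features: Array[Double], label01: Double,
                    weights: Array[Double]): (Array[Double], Double) = {
    val y = 2.0 * label01 - 1.0
    val margin = y * dot(weights, features)
    if (margin < 1.0) (features.map(x => -y * x), 1.0 - margin)
    else (Array.fill(features.length)(0.0), 0.0)
  }

  // L2-regularized step: shrink the weights, then move against the gradient.
  // Assumption: the step size decays as stepSize / sqrt(iter).
  def l2Update(weights: Array[Double], gradient: Array[Double],
               stepSize: Double, iter: Int, regParam: Double): Array[Double] = {
    val eta = stepSize / math.sqrt(iter.toDouble)
    weights.zip(gradient).map { case (w, g) => w * (1.0 - eta * regParam) - eta * g }
  }
}
```

An example whose margin is at least 1 contributes a zero gradient, so only misclassified or near-boundary points move the weights; the regularization term still shrinks them on every step.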

// 2. The train method
object SVMWithSGD {

  /**
   * Train a SVM model given an RDD of (label, features) pairs. We run a fixed number
   * of iterations of gradient descent using the specified step size. Each iteration uses
   * `miniBatchFraction` fraction of the data to calculate the gradient. The weights used in
   * gradient descent are initialized using the initial weights provided.
   *
   * NOTE: Labels used in SVM should be {0, 1}.
   *
   * @param input RDD of (label, array of features) pairs.
   * @param numIterations Number of iterations of gradient descent to run.
   * @param stepSize Step size to be used for each iteration of gradient descent.
   * @param regParam Regularization parameter.
   * @param miniBatchFraction Fraction of data to be used per iteration.
   * @param initialWeights Initial set of weights to be used. Array should be equal in size to
   *        the number of features in the data.
   */
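The pattern described by this doc comment (a companion-object train method that builds the algorithm object and runs a fixed number of mini-batch gradient-descent iterations) can be sketched in plain Scala on an in-memory dataset. This is an illustration under assumptions, not the MLlib implementation: ToySvm, ToyModel, and run are hypothetical names, the data is a local Seq rather than an RDD, and the fixed random seed is only there to make the sketch repeatable.

```scala
import scala.util.Random

// Hypothetical stand-in for a trained model: the sign of w.x decides the class.
final case class ToyModel(weights: Array[Double]) {
  def predict(x: Array[Double]): Double =
    if (weights.zip(x).map { case (w, v) => w * v }.sum > 0.0) 1.0 else 0.0
}

// Private constructor: callers go through ToySvm.train.
class ToySvm private (stepSize: Double, numIterations: Int,
                      regParam: Double, miniBatchFraction: Double) {

  def run(data: Seq[(Double, Array[Double])], initial: Array[Double]): ToyModel = {
    val rng = new Random(42L) // fixed seed so runs are repeatable
    var w = initial.clone()
    for (iter <- 1 to numIterations) {
      // Sample roughly miniBatchFraction of the data for this iteration.
      val batch = data.filter(_ => rng.nextDouble() < miniBatchFraction)
      if (batch.nonEmpty) {
        // Average hinge subgradient over the batch (labels {0, 1} -> {-1, +1}).
        val grad = Array.fill(w.length)(0.0)
        for ((label, x) <- batch) {
          val y = 2.0 * label - 1.0
          val margin = y * w.zip(x).map { case (a, b) => a * b }.sum
          if (margin < 1.0) for (j <- grad.indices) grad(j) -= y * x(j) / batch.size
        }
        // L2-regularized step with step size decaying as 1 / sqrt(iter).
        val eta = stepSize / math.sqrt(iter.toDouble)
        w = w.zip(grad).map { case (wi, gi) => wi * (1.0 - eta * regParam) - eta * gi }
      }
    }
    ToyModel(w)
  }
}

object ToySvm {
  // Factory with the same parameter order as the doc comment: build, then run.
  def train(data: Seq[(Double, Array[Double])], numIterations: Int, stepSize: Double,
            regParam: Double, miniBatchFraction: Double,
            initialWeights: Array[Double]): ToyModel =
    new ToySvm(stepSize, numIterations, regParam, miniBatchFraction).run(data, initialWeights)
}
```

Calling ToySvm.train(data, 100, 1.0, 0.01, 1.0, Array(0.0, 0.0)) on a small separable dataset with {0, 1} labels returns a ToyModel whose predict method can then be checked against held-out points.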
