Spark Vector and Matrix Types

Start with an ordinary array:

<pre><code>scala> val arr = Array(1.0, 2, 3, 4)
arr: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)</code></pre>

It can be converted into an MLlib Vector:

<pre><code>scala> import org.apache.spark.mllib.linalg._
scala> val vec = Vectors.dense(arr)
vec: org.apache.spark.mllib.linalg.Vector = [1.0,2.0,3.0,4.0]</code></pre>
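`Vectors.dense` stores every entry explicitly. MLlib also has a sparse representation, built with `Vectors.sparse`. The sketch below is not part of the original session: the vector values are chosen only for illustration, and it assumes the Spark MLlib jar is on the classpath.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Dense form: every entry is stored explicitly.
val dense: Vector = Vectors.dense(1.0, 0.0, 0.0, 4.0)

// Sparse form: size 4, with nonzero values only at indices 0 and 3.
val sparse: Vector = Vectors.sparse(4, Array(0, 3), Array(1.0, 4.0))

// Both describe the same mathematical vector.
println(dense.toArray.sameElements(sparse.toArray)) // true
println(sparse.size)                                // 4
```

The sparse form tends to pay off only when most entries are zero; for a short vector like `arr` the dense form is the natural choice.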

Next, build an RDD[Vector] with three rows: the array itself, the array times 10, and the array times 100:

<pre><code>scala> val rdd = sc.makeRDD(Seq(Vectors.dense(arr), Vectors.dense(arr.map(_ * 10)), Vectors.dense(arr.map(_ * 100))))
rdd: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = ParallelCollectionRDD[6] at makeRDD at &lt;console&gt;:26</code></pre>

From this RDD we can build a distributed matrix, in which each Vector becomes one row:

<pre><code>scala> import org.apache.spark.mllib.linalg.distributed._
scala> val mat: RowMatrix = new RowMatrix(rdd)
mat: org.apache.spark.mllib.linalg.distributed.RowMatrix = org.apache.spark.mllib.linalg.distributed.RowMatrix@3133b850
scala> val m = mat.numRows()
m: Long = 3
scala> val n = mat.numCols()
n: Long = 4</code></pre>
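For comparison, the same 3 x 4 values also fit in a local (non-distributed) matrix. This sketch is an addition, not part of the original session. It assumes Spark MLlib is on the classpath and uses `Matrices.dense`, which expects its values in column-major order.

```scala
import org.apache.spark.mllib.linalg.{Matrices, Matrix}

// The same 3 x 4 values as the RowMatrix, stored locally in column-major order.
val local: Matrix = Matrices.dense(3, 4, Array(
  1.0, 10.0, 100.0,   // column 0
  2.0, 20.0, 200.0,   // column 1
  3.0, 30.0, 300.0,   // column 2
  4.0, 40.0, 400.0))  // column 3

println(local.numRows) // 3
println(local.numCols) // 4
println(local(1, 2))   // entry at row 1, column 2: 30.0
```

A local matrix lives on a single machine, which is why its dimensions are `Int`, while `RowMatrix` reports `Long` dimensions because its rows are spread across an RDD.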

Try the statistics utilities and compute the mean of each column:

<pre><code>scala> import org.apache.spark.mllib.stat.Statistics
scala> val summary = Statistics.colStats(rdd)
scala> summary.mean
res7: org.apache.spark.mllib.linalg.Vector = [37.0,74.0,111.0,148.0]</code></pre>
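The column means can be checked by hand without Spark. The following plain-Scala sketch rebuilds the same three rows and averages each column; it only illustrates the arithmetic and is not how `colStats` is implemented.

```scala
// The same three rows that were placed in the RDD.
val arr = Array(1.0, 2.0, 3.0, 4.0)
val rows = Seq(arr, arr.map(_ * 10), arr.map(_ * 100))

// Transpose so that each inner sequence is one column, then average it.
val colMeans = rows.map(_.toSeq).transpose.map(col => col.sum / col.size)

println(colMeans.mkString("[", ",", "]")) // [37.0,74.0,111.0,148.0]
```

For example, the first column is (1 + 10 + 100) / 3 = 37, which matches the first entry of the mean vector.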

Copyright notice: this is an original article by the blog author and may not be reproduced without the author's permission.

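A confusion matrix is an important tool for measuring classifier performance: it summarizes the classifier's results on a test dataset. It is a two-dimensional matrix. In the matrix returned by Spark MLlib's `MulticlassMetrics`, each row corresponds to an actual class and each column to a predicted class, so each entry counts how many instances of one actual class were predicted as a given class.

In Scala, you can compute a confusion matrix with the `MulticlassMetrics` class from Spark MLlib. Here is a simple example:

```scala
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.rdd.RDD

// Assume predictions is an RDD[(Double, Double)] whose elements are
// (predictedLabel, trueLabel) tuples
val metrics = new MulticlassMetrics(predictions)

// Get the confusion matrix
val confusionMatrix = metrics.confusionMatrix

// Print the confusion matrix
println(s"Confusion matrix:\n${confusionMatrix.toString}")
```

The Support Vector Machine (SVM) is a common classification algorithm. SVMs in general handle binary problems and, through schemes such as one-vs-rest, multiclass ones; MLlib's `SVMWithSGD` covers the binary case. Here is a simple example of building an SVM model with it in Scala:

```scala
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.evaluation.MulticlassMetrics
import org.apache.spark.mllib.util.MLUtils

// Load the data (LIBSVM format)
val data = MLUtils.loadLibSVMFile(sc, "data/mllib/sample_libsvm_data.txt")

// Split the data into a training set and a test set
val splits = data.randomSplit(Array(0.6, 0.4), seed = 11L)
val training = splits(0).cache()
val test = splits(1)

// Train the SVM model
val numIterations = 100
val model = SVMWithSGD.train(training, numIterations)

// Predict on the test set, pairing each prediction with its true label.
// MulticlassMetrics expects (prediction, label) tuples, in that order.
val predictionAndLabels = test.map { point =>
  val prediction = model.predict(point.features)
  (prediction, point.label)
}

// Evaluate model performance
val metrics = new MulticlassMetrics(predictionAndLabels)
val accuracy = metrics.accuracy
println(s"Test accuracy: $accuracy")
```

In the example above, we first load a sample dataset in LIBSVM format and split it into a training set and a test set. We then train an SVM model with `SVMWithSGD` and use it to predict on the test set. Finally, we use `MulticlassMetrics` to compute the model's accuracy.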
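To make the layout of a confusion matrix concrete, here is a small plain-Scala sketch that counts one by hand. The (prediction, label) pairs are made up for illustration. The orientation, with rows as true labels and columns as predictions, follows the layout documented for Spark's `MulticlassMetrics.confusionMatrix`, but the code itself does not use Spark.

```scala
// Hypothetical (prediction, label) pairs for a 2-class problem.
val predictionAndLabels = Seq(
  (0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0))

val numClasses = 2

// counts(actual)(predicted): rows are true labels, columns are predictions.
val counts = Array.ofDim[Int](numClasses, numClasses)
for ((prediction, label) <- predictionAndLabels)
  counts(label.toInt)(prediction.toInt) += 1

// Prints "2 1" and then "1 2".
counts.foreach(row => println(row.mkString(" ")))

// Accuracy is the diagonal (correct predictions) divided by the total count.
val accuracy =
  (0 until numClasses).map(i => counts(i)(i)).sum.toDouble / predictionAndLabels.size
println(accuracy)
```

Here four of the six pairs lie on the diagonal, so the accuracy is 4/6. Swapping the two tuple positions would transpose the matrix but leave the accuracy unchanged, which is why a wrong tuple order can go unnoticed when only accuracy is reported.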