Assorted k-means

Original post · 2016-08-28 23:20:46

1. Parsing LIBSVM-format lines in the spark-shell

A first pass, typed straight into the spark-shell: trim each line, drop blanks and comments, then split each record into a label and (index, value) pairs.

scala> def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int, minPartitions: Int) = {
     |   val parsed = sc.textFile(path, minPartitions)
     |     .map(_.trim)
     |     .filter(line => !(line.isEmpty || line.startsWith("#")))
     |     .map { line =>
     |       val items = line.split(' ')
     |       val label = items.head.toDouble
     |       val (indices, values) = items.tail.filter(_.nonEmpty).map { item =>
     |         val indexAndValue = item.split(':')
     |         val index = indexAndValue(0).toInt - 1  // LIBSVM indices are 1-based
     |         val value = indexAndValue(1).toDouble
     |         (index, value)
     |       }.unzip
     |       (label, indices.toArray, values.toArray)
     |     }
     |   parsed
     | }
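For reference, each LIBSVM record looks like "label index1:value1 index2:value2 ...", with 1-based indices. A minimal sketch of the per-record parsing in plain Scala, using a made-up sample line:

val line = "1 3:2.5 7:1.0"    // hypothetical LIBSVM record
val items = line.split(' ')
val label = items.head.toDouble    // 1.0
val (indices, values) = items.tail.map { item =>
  val Array(i, v) = item.split(':')
  (i.toInt - 1, v.toDouble)    // shift to 0-based indices
}.unzip
// indices == Array(2, 6), values == Array(2.5, 1.0)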
2. The complete loadLibSVMFile

The same parsing, completed into a full function: after parsing, it determines the feature dimension d and builds sparse LabeledPoints.

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

def loadLibSVMFile(sc: SparkContext, path: String, numFeatures: Int, minPartitions: Int): RDD[LabeledPoint] = {
  val parsed = sc.textFile(path, minPartitions)
    .map(_.trim)
    .filter(line => !(line.isEmpty || line.startsWith("#")))
    .map { line =>
      val items = line.split(' ')
      val label = items.head.toDouble
      val (indices, values) = items.tail.filter(_.nonEmpty).map { item =>
        val indexAndValue = item.split(':')
        val index = indexAndValue(0).toInt - 1  // LIBSVM indices are 1-based
        val value = indexAndValue(1).toDouble
        (index, value)
      }.unzip
      (label, indices.toArray, values.toArray)
    }

  // Feature dimension: either supplied by the caller, or inferred as
  // (largest index seen) + 1. Inference scans the data, so cache it first.
  val d = if (numFeatures > 0) {
    numFeatures
  } else {
    parsed.persist(StorageLevel.MEMORY_ONLY)
    parsed.map { case (label, indices, values) =>
      indices.lastOption.getOrElse(0)
    }.reduce(math.max) + 1
  }

  parsed.map { case (label, indices, values) =>
    LabeledPoint(label, Vectors.sparse(d, indices, values))
  }
}
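A quick sketch of how the loader might be used; the path is a placeholder, and passing numFeatures = 0 makes the function infer the dimension from the data. (In practice, MLlib ships an equivalent MLUtils.loadLibSVMFile, and the features can feed straight into its KMeans.)

import org.apache.spark.mllib.clustering.KMeans

val points = loadLibSVMFile(sc, "data/sample_libsvm_data.txt", 0, 2)  // hypothetical path
val model = KMeans.train(points.map(_.features), 2, 20)  // k = 2, at most 20 iterations
model.clusterCenters.foreach(println)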
3. The Spark example: SparkKMeans

Finally, the naive k-means example that ships with Spark (org.apache.spark.examples.SparkKMeans):
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */


// scalastyle:off println
package org.apache.spark.examples


import breeze.linalg.{Vector, DenseVector, squaredDistance}


import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._


/**
 * K-means clustering.
 *
 * This is an example implementation for learning how to use Spark. For more conventional use,
 * please refer to org.apache.spark.mllib.clustering.KMeans
 */
object SparkKMeans {


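  // Parse one line of space-separated numbers into a dense Breeze vector.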
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split(' ').map(_.toDouble))
  }


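  // Return the index of the center closest to p, by squared Euclidean distance.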
  def closestPoint(p: Vector[Double], centers: Array[Vector[Double]]): Int = {
    var bestIndex = 0
    var closest = Double.PositiveInfinity


    for (i <- 0 until centers.length) {
      val tempDist = squaredDistance(p, centers(i))
      if (tempDist < closest) {
        closest = tempDist
        bestIndex = i
      }
    }


    bestIndex
  }


  def showWarning() {
    System.err.println(
      """WARN: This is a naive implementation of KMeans Clustering and is given as an example!
        |Please use the KMeans method found in org.apache.spark.mllib.clustering
        |for more conventional use.
      """.stripMargin)
  }


  def main(args: Array[String]) {


    if (args.length < 3) {
      System.err.println("Usage: SparkKMeans <file> <k> <convergeDist>")
      System.exit(1)
    }


    showWarning()


    val sparkConf = new SparkConf().setAppName("SparkKMeans")
    val sc = new SparkContext(sparkConf)
    val lines = sc.textFile(args(0))
    val data = lines.map(parseVector _).cache()
    val K = args(1).toInt
    val convergeDist = args(2).toDouble


    val kPoints = data.takeSample(withReplacement = false, K, 42).toArray
    var tempDist = 1.0


    while(tempDist > convergeDist) {
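      // Assign each point to its nearest current center, paired with a count of 1.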
      val closest = data.map (p => (closestPoint(p, kPoints), (p, 1)))


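      // Per center: the sum of its assigned points and how many there are.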
      val pointStats = closest.reduceByKey{case ((p1, c1), (p2, c2)) => (p1 + p2, c1 + c2)}


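      // New center = the mean of the points assigned to it.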
      val newPoints = pointStats.map {pair =>
        (pair._1, pair._2._1 * (1.0 / pair._2._2))}.collectAsMap()


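      // Total squared movement of the centers in this iteration.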
      tempDist = 0.0
      for (i <- 0 until K) {
        tempDist += squaredDistance(kPoints(i), newPoints(i))
      }


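      // Install the new centers for the next round.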
      for (newP <- newPoints) {
        kPoints(newP._1) = newP._2
      }
      println("Finished iteration (delta = " + tempDist + ")")
    }


    println("Final centers:")
    kPoints.foreach(println)
    sc.stop()
  }
}
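The example expects a plain text file with one point per line, coordinates separated by spaces (see parseVector). Assuming a Spark distribution is at hand, an invocation along these lines should work; the file contents, k = 2 and the 0.01 threshold are made-up values:

$ cat points.txt
0.0 0.0
1.0 1.0
9.0 8.0
8.0 9.0
$ ./bin/run-example SparkKMeans points.txt 2 0.01

One caveat: if a center ends up with no points assigned to it, newPoints has no entry for that index and the lookup newPoints(i) throws; the MLlib KMeans recommended in the warning is the robust choice for real use.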