
RDD[Vector]

This post records a spark-shell session that reads a plain-text data file into an RDD[org.apache.spark.mllib.linalg.Vector] for use with Spark MLlib. The sample file /home/sc/Desktop/data.txt contains one space-separated pair of doubles per line:

1.629502 1.66991
1.871226 1.898365
1.46171 1.91306
1.58579 1.537943
2.018275 1.836801
1.98899 2.006619
1.599317 1.991072
1.991236 1.235661
1.057009 1.601767
1.889463 1.86318
1.368395 1.213885
1.251551 1.821578
1.904642 1.523114
1.383058 1.641584
1.182018 1.286603
1.030947 1.093305
2.050907 1.327946
1.74832 2.008842
2.02456 1.23564
1.02345 1.25648
scala> val data_path="/home/sc/Desktop/data.txt"
data_path: String = /home/sc/Desktop/data.txt


scala> val data = sc.textFile(data_path).map(_.split(" ")).map(f => f.map(f => f.toDouble))
16/08/12 06:03:54 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 38.8 KB, free 135.9 KB)
16/08/12 06:03:54 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.2 KB, free 140.1 KB)
16/08/12 06:03:54 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:50455 (size: 4.2 KB, free: 517.4 MB)
16/08/12 06:03:54 INFO SparkContext: Created broadcast 4 from textFile at <console>:35
data: org.apache.spark.rdd.RDD[Array[Double]] = MapPartitionsRDD[13] at map at <console>:35
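
The two chained map calls split each line on a single space and convert every field to a Double, yielding an RDD[Array[Double]]. A minimal standalone sketch of the same step, with an added guard against blank lines (the guard is an assumption; the original session relies on the file being clean):

    import org.apache.spark.rdd.RDD

    val data: RDD[Array[Double]] = sc.textFile(data_path)
      .filter(_.trim.nonEmpty)             // skip blank lines before parsing
      .map(_.split(" ").map(_.toDouble))   // one Array[Double] per input row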


scala> import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors


scala> val datal = data.map(f => Vectors.dense(f))
datal: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MapPartitionsRDD[14] at map at <console>:39
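
Vectors.dense wraps each Array[Double] in an MLlib dense vector, which stores every entry explicitly. For mostly-zero data, the same package also offers a sparse representation; a small illustrative sketch, not part of the original session (the values are taken from the first data row):

    // dense: all entries stored explicitly
    val dense = Vectors.dense(1.629502, 1.66991)
    // sparse: vector size, indices of non-zero entries, their values
    val sparse = Vectors.sparse(2, Array(0), Array(1.629502))

(In Spark 2.x, the newer DataFrame-based API uses org.apache.spark.ml.linalg instead; the RDD-based API shown here lives in org.apache.spark.mllib.linalg.)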


scala> datal.collect
16/08/12 06:04:14 INFO FileInputFormat: Total input paths to process : 1
16/08/12 06:04:14 INFO SparkContext: Starting job: collect at <console>:42
16/08/12 06:04:14 INFO DAGScheduler: Got job 2 (collect at <console>:42) with 1 output partitions
16/08/12 06:04:14 INFO DAGScheduler: Final stage: ResultStage 2 (collect at <console>:42)
16/08/12 06:04:14 INFO DAGScheduler: Parents of final stage: List()
16/08/12 06:04:14 INFO DAGScheduler: Missing parents: List()
16/08/12 06:04:14 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at map at <console>:39), which has no missing parents
16/08/12 06:04:14 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.6 KB, free 143.7 KB)
16/08/12 06:04:14 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2027.0 B, free 145.7 KB)
16/08/12 06:04:14 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:50455 (size: 2027.0 B, free: 517.4 MB)
16/08/12 06:04:14 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
16/08/12 06:04:14 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at map at <console>:39)
16/08/12 06:04:14 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/08/12 06:04:14 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2133 bytes)
16/08/12 06:04:14 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
16/08/12 06:04:14 INFO HadoopRDD: Input split: file:/home/sc/Desktop/data.txt:0+351
16/08/12 06:04:14 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2786 bytes result sent to driver
16/08/12 06:04:14 INFO DAGScheduler: ResultStage 2 (collect at <console>:42) finished in 0.166 s
16/08/12 06:04:14 INFO DAGScheduler: Job 2 finished: collect at <console>:42, took 0.257591 s
16/08/12 06:04:14 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 163 ms on localhost (1/1)
16/08/12 06:04:14 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
res3: Array[org.apache.spark.mllib.linalg.Vector] = Array([1.629502,1.66991], [1.871226,1.898365], [1.46171,1.91306], [1.58579,1.537943], [2.018275,1.836801], [1.98899,2.006619], [1.599317,1.991072], [1.991236,1.235661], [1.057009,1.601767], [1.889463,1.86318], [1.368395,1.213885], [1.251551,1.821578], [1.904642,1.523114], [1.383058,1.641584], [1.182018,1.286603], [1.030947,1.093305], [2.050907,1.327946], [1.74832,2.008842], [2.02456,1.23564], [1.02345,1.25648])
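
With the data materialized as RDD[Vector], the RDD-based MLlib APIs can consume it directly. As a follow-up sketch that was not run in the original session, column summary statistics could be computed with Statistics.colStats:

    import org.apache.spark.mllib.stat.{MultivariateStatisticalSummary, Statistics}

    // per-column mean, variance, and extrema over the 20 rows
    val summary: MultivariateStatisticalSummary = Statistics.colStats(datal)
    println(summary.mean)      // vector of column means
    println(summary.variance)  // vector of column variances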

