RDD[Vector]

Contents of the input file /home/sc/Desktop/data.txt (two space-separated values per line):


1.629502 1.66991
1.871226 1.898365
1.46171 1.91306
1.58579 1.537943
2.018275 1.836801
1.98899 2.006619
1.599317 1.991072
1.991236 1.235661
1.057009 1.601767
1.889463 1.86318
1.368395 1.213885
1.251551 1.821578
1.904642 1.523114
1.383058 1.641584
1.182018 1.286603
1.030947 1.093305
2.050907 1.327946
1.74832 2.008842
2.02456 1.23564
1.02345 1.25648
scala> val data_path="/home/sc/Desktop/data.txt"
data_path: String = /home/sc/Desktop/data.txt


scala> val data = sc.textFile(data_path).map(_.split(" ")).map(f => f.map(f => f.toDouble))
16/08/12 06:03:54 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 38.8 KB, free 135.9 KB)
16/08/12 06:03:54 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.2 KB, free 140.1 KB)
16/08/12 06:03:54 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:50455 (size: 4.2 KB, free: 517.4 MB)
16/08/12 06:03:54 INFO SparkContext: Created broadcast 4 from textFile at <console>:35
data: org.apache.spark.rdd.RDD[Array[Double]] = MapPartitionsRDD[13] at map at <console>:35


scala> import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.Vectors


scala> val datal = data.map(f => Vectors.dense(f))
datal: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MapPartitionsRDD[14] at map at <console>:39


scala> datal.collect
16/08/12 06:04:14 INFO FileInputFormat: Total input paths to process : 1
16/08/12 06:04:14 INFO SparkContext: Starting job: collect at <console>:42
16/08/12 06:04:14 INFO DAGScheduler: Got job 2 (collect at <console>:42) with 1 output partitions
16/08/12 06:04:14 INFO DAGScheduler: Final stage: ResultStage 2 (collect at <console>:42)
16/08/12 06:04:14 INFO DAGScheduler: Parents of final stage: List()
16/08/12 06:04:14 INFO DAGScheduler: Missing parents: List()
16/08/12 06:04:14 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[14] at map at <console>:39), which has no missing parents
16/08/12 06:04:14 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.6 KB, free 143.7 KB)
16/08/12 06:04:14 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2027.0 B, free 145.7 KB)
16/08/12 06:04:14 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:50455 (size: 2027.0 B, free: 517.4 MB)
16/08/12 06:04:14 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
16/08/12 06:04:14 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[14] at map at <console>:39)
16/08/12 06:04:14 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/08/12 06:04:14 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2133 bytes)
16/08/12 06:04:14 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
16/08/12 06:04:14 INFO HadoopRDD: Input split: file:/home/sc/Desktop/data.txt:0+351
16/08/12 06:04:14 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2786 bytes result sent to driver
16/08/12 06:04:14 INFO DAGScheduler: ResultStage 2 (collect at <console>:42) finished in 0.166 s
16/08/12 06:04:14 INFO DAGScheduler: Job 2 finished: collect at <console>:42, took 0.257591 s
16/08/12 06:04:14 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 163 ms on localhost (1/1)
16/08/12 06:04:14 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
res3: Array[org.apache.spark.mllib.linalg.Vector] = Array([1.629502,1.66991], [1.871226,1.898365], [1.46171,1.91306], [1.58579,1.537943], [2.018275,1.836801], [1.98899,2.006619], [1.599317,1.991072], [1.991236,1.235661], [1.057009,1.601767], [1.889463,1.86318], [1.368395,1.213885], [1.251551,1.821578], [1.904642,1.523114], [1.383058,1.641584], [1.182018,1.286603], [1.030947,1.093305], [2.050907,1.327946], [1.74832,2.008842], [2.02456,1.23564], [1.02345,1.25648])


scala> 
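
The session above builds the RDD[Vector] in two steps (RDD[Array[Double]] first, then Vectors.dense). As a minimal sketch, the same conversion can be wrapped in one function that also tolerates blank lines and repeated whitespace. The helper name `toVectors` is hypothetical and does not appear in the original session. `sc` is assumed to be the SparkContext that spark-shell provides.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

// Hypothetical helper (not part of the original session): turns raw text
// lines into dense vectors, one vector per non-blank line.
def toVectors(lines: RDD[String]): RDD[Vector] = {
  lines
    .map(_.trim)
    .filter(_.nonEmpty)                                // skip blank lines
    .map(line => line.split("\\s+").map(_.toDouble))   // split on any run of whitespace
    .map(values => Vectors.dense(values))              // Array[Double] -> Vector
}

// Same file path as in the session above.
val vectors = toVectors(sc.textFile("/home/sc/Desktop/data.txt"))
vectors.take(3).foreach(println)
```

A line containing a non-numeric token would still fail with a NumberFormatException when the job runs, because `toDouble` is applied to every token. This sketch only guards against empty lines and extra spaces.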