Parallel summation with Spark and Scala

Original post, 2016-08-28 23:18:57
scala> val text=sc.textFile("/home/sc/Desktop/data.txt")
16/08/08 02:57:19 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 38.8 KB, free 124.7 KB)
16/08/08 02:57:24 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.2 KB, free 128.9 KB)
16/08/08 02:57:24 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on localhost:51836 (size: 4.2 KB, free: 517.4 MB)
16/08/08 02:57:24 INFO SparkContext: Created broadcast 4 from textFile at <console>:27
text: org.apache.spark.rdd.RDD[String] = /home/sc/Desktop/data.txt MapPartitionsRDD[14] at textFile at <console>:27

scala> val int=text.flatMap(line => line.split(" "))
int: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[15] at flatMap at <console>:29

scala> val double = int.map(_.toDouble)
double: org.apache.spark.rdd.RDD[Double] = MapPartitionsRDD[16] at map at <console>:31
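
One thing to watch in the two steps above: `line.split(" ")` produces empty strings when numbers are separated by more than one space, and `"".toDouble` throws a `NumberFormatException`. The following is a minimal sketch using plain Scala collections rather than Spark. The sample lines and the helper name `parseLine` are hypothetical, since the contents of data.txt are not shown in the post.

```scala
// Hypothetical sample lines standing in for the contents of data.txt.
val sampleLines = List("1.5  2.5", " 3.0", "")

// Hypothetical helper: split on runs of whitespace and drop empty
// tokens before converting each remaining token to Double.
def parseLine(line: String): List[Double] =
  line.trim.split("\\s+").filter(_.nonEmpty).map(_.toDouble).toList

// flatMap flattens the per-line lists into one list of numbers,
// mirroring the flatMap + map steps in the transcript.
val numbers = sampleLines.flatMap(parseLine)
println(numbers)
```

The same function could be passed to `flatMap` on an `RDD[String]` in spark-shell, assuming the input should tolerate irregular spacing.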

scala> val rdd1 = double.reduce(_ + _)
16/08/08 02:59:45 INFO FileInputFormat: Total input paths to process : 1
16/08/08 02:59:47 INFO SparkContext: Starting job: reduce at <console>:33
16/08/08 02:59:47 INFO DAGScheduler: Got job 1 (reduce at <console>:33) with 1 output partitions
16/08/08 02:59:47 INFO DAGScheduler: Final stage: ResultStage 2 (reduce at <console>:33)
16/08/08 02:59:47 INFO DAGScheduler: Parents of final stage: List()
16/08/08 02:59:47 INFO DAGScheduler: Missing parents: List()
16/08/08 02:59:48 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[16] at map at <console>:31), which has no missing parents
16/08/08 02:59:54 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.6 KB, free 132.4 KB)
16/08/08 03:00:07 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2046.0 B, free 134.4 KB)
16/08/08 03:00:07 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on localhost:51836 (size: 2046.0 B, free: 517.4 MB)
16/08/08 03:00:07 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
16/08/08 03:00:07 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[16] at map at <console>:31)
16/08/08 03:00:07 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
16/08/08 03:00:09 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, partition 0,PROCESS_LOCAL, 2133 bytes)
16/08/08 03:00:09 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
16/08/08 03:00:09 INFO HadoopRDD: Input split:file:/home/sc/Desktop/data.txt:0+351
16/08/08 03:00:10 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 2163 bytes result sent to driver
16/08/08 03:00:10 INFO DAGScheduler: ResultStage 2 (reduce at <console>:33) finished in 2.840 s
16/08/08 03:00:10 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 2858 ms on localhost (1/1)
16/08/08 03:00:10 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool
16/08/08 03:00:10 INFO DAGScheduler: Job 1 finished: reduce at <console>:33, took 23.077075 s
rdd1: Double = 64.023721
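
Note that the log reports "1 output partitions" and a single task, so in this run the whole sum was computed on one partition. When an RDD has several partitions, `reduce` combines the values inside each partition and then merges the partial results, which is why the operator passed to it should be associative and commutative. The sketch below imitates that two-step shape with plain Scala collections, not Spark. The values and the chunk size of 3 are hypothetical, and `grouped` only stands in for real partitioning.

```scala
// Hypothetical values; the real contents of data.txt are not shown.
val values = Vector(1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0)

// Simulate partitions by cutting the data into chunks of at most 3 elements.
val partitions = values.grouped(3).toVector

// Step 1: each simulated partition reduces its own values independently.
val partialSums = partitions.map(_.reduce(_ + _))

// Step 2: the partial results are merged with the same operator.
val total = partialSums.reduce(_ + _)

println(s"partial sums = $partialSums, total = $total")
```

In spark-shell, the pipeline from the transcript can also be chained as `sc.textFile(path).flatMap(_.split(" ")).map(_.toDouble).reduce(_ + _)`, and `sum()` is available on an `RDD[Double]` as an alternative to `reduce(_ + _)`. `textFile` accepts an optional `minPartitions` argument if more than one partition is wanted. Because floating-point addition is not strictly associative, the last digits of the total may vary with the partitioning. The name `rdd1` in the transcript holds a plain `Double`, not an RDD.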

Copyright notice: This is an original article by the blog author and may not be reproduced without the author's permission.