Traversing an RDD with Spark's collect

The defining trait of Spark's `collect` action is that it pulls the data from the remote cluster back to the driver over the network. With a large dataset this puts heavy pressure on the network (and on driver memory). The difference from `foreach` is that `foreach` traverses the RDD's elements on the cluster itself, where the data already lives; when running in local mode the two behave almost the same. In general, prefer `foreach` and avoid `collect`.

Straight to the code:

@SuppressWarnings("unchecked")
public static void myCollect() {
    SparkConf conf = new SparkConf()
            .setAppName("myCollect")
            .setMaster("local");
    JavaSparkContext sc = new JavaSparkContext(conf);
    List<Double> list = Arrays.asList(1.0, 2.0, 3.0, 4.0);
    // Distribute the local list across 2 partitions.
    JavaDoubleRDD doubleRdd = sc.parallelizeDoubles(list, 2);
    JavaRDD<Double> mapRdd = doubleRdd.map(new Function<Double, Double>() {

        private static final long serialVersionUID = 1L;

        @Override
        public Double call(Double arg0) throws Exception {
            return arg0 + 2;
        }
    });

    // collect() ships every element of the RDD back to the driver
    // as a local List -- fine for 4 numbers, dangerous at scale.
    List<Double> doubleList = mapRdd.collect();
    for (Double d : doubleList) {
        System.out.println("d:" + d);
    }
    sc.close();
}
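For comparison, the same traversal with `foreach` keeps the work on the executors instead of shipping the data back to the driver. The sketch below is a hypothetical standalone counterpart to the method above (the class name `MyForeach` and the `main` wrapper are assumptions, not part of the original post); the printing happens in each executor's stdout, so the order across partitions is not guaranteed.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.VoidFunction;

public class MyForeach {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("myForeach")
                .setMaster("local");
        JavaSparkContext sc = new JavaSparkContext(conf);
        List<Double> list = Arrays.asList(1.0, 2.0, 3.0, 4.0);
        JavaDoubleRDD doubleRdd = sc.parallelizeDoubles(list, 2);

        // foreach runs on the executors: each element is processed where
        // its partition lives, and nothing is returned to the driver.
        doubleRdd.foreach(new VoidFunction<Double>() {

            private static final long serialVersionUID = 1L;

            @Override
            public void call(Double d) throws Exception {
                System.out.println("d:" + (d + 2));
            }
        });
        sc.close();
    }
}
```

In local mode this prints the same four values as the `collect` version, but on a real cluster the output lands in the executor logs rather than the driver console, which is exactly the point: no bulk transfer to the driver.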



The output:

d:3.0
d:4.0
d:5.0
d:6.0


The full log is as follows:


16/05/03 21:43:54 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/05/03 21:43:54 INFO SecurityManager: Changing view acls to: admin
16/05/03 21:43:54 INFO SecurityManager: Changing modify acls to: admin
16/05/03 21:43:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(admin); users with modify permissions: Set(admin)
16/05/03 21:43:56 INFO Utils: Successfully started service 'sparkDriver' on port 52849.
16/05/03 21:43:57 INFO Slf4jLogger: Slf4jLogger started
16/05/03 21:43:58 INFO Remoting: Starting remoting
16/05/03 21:43:58 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@192.168.213.1:52862]
16/05/03 21:43:58 INFO Utils: Successfully started service 'sparkDriverActorSystem' on port 52862.
16/05/03 21:43:58 INFO SparkEnv: Registering MapOutputTracker
16/05/03 21:43:58 INFO SparkEnv: Registering BlockManagerMaster
16/05/03 21:43:58 INFO DiskBlockManager: Created local directory at C:\Users\admin\AppData\Local\Temp\blockmgr-6f3b927d-6959-40fc-9da4-bf8caf4c8283
16/05/03 21:43:58 INFO MemoryStore: MemoryStore started with capacity 2.4 GB
16/05/03 21:43:59 INFO SparkEnv: Registering OutputCommitCoordinator
16/05/03 21:43:59 INFO Utils: Successfully started service 'SparkUI' on port 4040.
16/05/03 21:43:59 INFO SparkUI: Started SparkUI at http://192.168.213.1:4040
16/05/03 21:44:00 INFO Executor: Starting executor ID driver on host localhost
16/05/03 21:44:00 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 52869.
16/05/03 21:44:00 INFO NettyBlockTransferService: Server created on 52869
16/05/03 21:44:00 INFO BlockManagerMaster: Trying to register BlockManager
16/05/03 21:44:00 INFO BlockManagerMasterEndpoint: Registering block manager localhost:52869 with 2.4 GB RAM, BlockManagerId(driver, localhost, 52869)
16/05/03 21:44:00 INFO BlockManagerMaster: Registered BlockManager
16/05/03 21:44:02 INFO SparkContext: Starting job: collect at ActionOperation.java:56
16/05/03 21:44:02 INFO DAGScheduler: Got job 0 (collect at ActionOperation.java:56) with 2 output partitions
16/05/03 21:44:02 INFO DAGScheduler: Final stage: ResultStage 0 (collect at ActionOperation.java:56)
16/05/03 21:44:02 INFO DAGScheduler: Parents of final stage: List()
16/05/03 21:44:02 INFO DAGScheduler: Missing parents: List()
16/05/03 21:44:02 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[2] at map at ActionOperation.java:44), which has no missing parents
16/05/03 21:44:02 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.6 KB, free 2.6 KB)
16/05/03 21:44:02 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1627.0 B, free 4.2 KB)
16/05/03 21:44:02 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:52869 (size: 1627.0 B, free: 2.4 GB)
16/05/03 21:44:02 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1006
16/05/03 21:44:02 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[2] at map at ActionOperation.java:44)
16/05/03 21:44:02 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
16/05/03 21:44:02 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,PROCESS_LOCAL, 2037 bytes)
16/05/03 21:44:02 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
16/05/03 21:44:02 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 1009 bytes result sent to driver
16/05/03 21:44:02 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,PROCESS_LOCAL, 2037 bytes)
16/05/03 21:44:02 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
16/05/03 21:44:02 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 202 ms on localhost (1/2)
16/05/03 21:44:02 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 1009 bytes result sent to driver
16/05/03 21:44:02 INFO DAGScheduler: ResultStage 0 (collect at ActionOperation.java:56) finished in 0.279 s
16/05/03 21:44:02 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 44 ms on localhost (2/2)
16/05/03 21:44:02 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/05/03 21:44:02 INFO DAGScheduler: Job 0 finished: collect at ActionOperation.java:56, took 0.863032 s
d:3.0
d:4.0
d:5.0
d:6.0
16/05/03 21:44:02 INFO SparkContext: Invoking stop() from shutdown hook
16/05/03 21:44:03 INFO SparkUI: Stopped Spark web UI at http://192.168.213.1:4040
16/05/03 21:44:03 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/05/03 21:44:03 INFO MemoryStore: MemoryStore cleared
16/05/03 21:44:03 INFO BlockManager: BlockManager stopped
16/05/03 21:44:03 INFO BlockManagerMaster: BlockManagerMaster stopped
16/05/03 21:44:03 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/05/03 21:44:03 INFO SparkContext: Successfully stopped SparkContext
16/05/03 21:44:03 INFO ShutdownHookManager: Shutdown hook called
16/05/03 21:44:03 INFO ShutdownHookManager: Deleting directory C:\Users\admin\AppData\Local\Temp\spark-b75f1e09-fb73-474c-8581-f4d0ab1b3697



