Getting Started with Spark: Running RDD map and filter locally in IntelliJ IDEA

map applies the function inside the parentheses to every element of the RDD.
filter evaluates the predicate inside the parentheses on every element of the RDD and keeps only the elements for which it returns true.
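A minimal, Spark-free sketch of these two operators: plain Scala collections expose the same map/filter API shape as RDDs, so the semantics can be checked without starting a SparkContext (the object name MapFilterDemo is made up for illustration):

```scala
object MapFilterDemo {
  def main(args: Array[String]): Unit = {
    val nums = (1 to 9).toList
    // map: apply x * 3 to every element
    val tripled = nums.map(x => x * 3)     // 3, 6, 9, ..., 27
    // filter: keep only the elements greater than 10
    val big = tripled.filter(x => x > 10)  // 12, 15, 18, 21, 24, 27
    println(big.mkString(","))             // prints 12,15,18,21,24,27
  }
}
```

This is exactly the computation the program below performs on an RDD, and the expected values match the job output in the log further down.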

import org.apache.spark.{SparkConf, SparkContext}

object testone {

  def main(args: Array[String]): Unit = {
    // The master is supplied via the -Dspark.master=local JVM option
    // in the IDEA run configuration (see the launch command below).
    val conf = new SparkConf().setAppName("test one")
    val sc = new SparkContext(conf)
    // A parallelized collection is created by calling SparkContext's
    // parallelize method on an existing Scala collection (a Seq).
    // The elements of the collection are copied to form a distributed
    // dataset that can be operated on in parallel.
    // Here is how to create a parallelized collection from a Scala List:
    val rdd = sc.parallelize(List(1, 2, 3, 4, 5, 6, 7, 8, 9))
    val mappedRDD = rdd.map(x => x * 3)
    // collect is an action: like toArray, it returns the distributed
    // RDD as a local Scala Array on the driver.
    mappedRDD.collect.foreach(println)
    val filterRDD = mappedRDD.filter(x => x > 10)
    /*
    The usual way to print all elements of an RDD is rdd.foreach(println)
    or rdd.map(println). On a single machine this produces the expected
    output. In cluster mode, however, the output goes to each executor's
    stdout, not to the driver, so nothing appears on the driver's stdout.
    To print all elements on the driver, first bring the RDD to the driver
    with collect: rdd.collect().foreach(println). This can cause the driver
    to run out of memory, because collect gathers the entire RDD onto one
    machine. If you only want to print a few elements, a safer approach is
    rdd.take(100).foreach(println).
    */
    filterRDD.collect().foreach(println)
    // In Spark, most operations can be chained into a single line.
    // Note: foreach returns Unit, so assigning its result to a val is pointless.
    sc.parallelize(List(1, 2, 3, 4, 5)).map(x => x * 2).filter(x => x > 4).collect().foreach(println)



  }

}
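To double-check the chained one-liner, and the take-based safe printing mentioned in the comments, here is a sketch on plain Scala collections (ChainDemo is a made-up name; on a real RDD, take(n) returns only the first n elements to the driver instead of collecting everything):

```scala
object ChainDemo {
  def main(args: Array[String]): Unit = {
    // Same pipeline as the one-liner in the program, on a plain List;
    // an RDD version would need collect() before foreach on the driver.
    val result = List(1, 2, 3, 4, 5).map(x => x * 2).filter(x => x > 4)
    println(result.mkString(","))          // prints 6,8,10 — the last job's output
    // Safer printing of a large dataset: inspect only the first few items.
    println(result.take(2).mkString(","))  // prints 6,8
  }
}
```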
/usr/lib/jvm/java-7-sun/bin/java -Dspark.master=local -Didea.launcher.port=7533 -Didea.launcher.bin.path=/opt/idea/bin -Dfile.encoding=UTF-8 -classpath /usr/lib/jvm/java-7-sun/jre/lib/jfr.jar:/usr/lib/jvm/java-7-sun/jre/lib/javaws.jar:/usr/lib/jvm/java-7-sun/jre/lib/resources.jar:/usr/lib/jvm/java-7-sun/jre/lib/plugin.jar:/usr/lib/jvm/java-7-sun/jre/lib/jfxrt.jar:/usr/lib/jvm/java-7-sun/jre/lib/jsse.jar:/usr/lib/jvm/java-7-sun/jre/lib/charsets.jar:/usr/lib/jvm/java-7-sun/jre/lib/deploy.jar:/usr/lib/jvm/java-7-sun/jre/lib/management-agent.jar:/usr/lib/jvm/java-7-sun/jre/lib/rt.jar:/usr/lib/jvm/java-7-sun/jre/lib/jce.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/sunpkcs11.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/sunjce_provider.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/sunec.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/dnsns.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/zipfs.jar:/usr/lib/jvm/java-7-sun/jre/lib/ext/localedata.jar:/opt/IdeaProjects/SparkTest/target/scala-2.10/classes:/home/xuyao/.sbt/boot/scala-2.10.4/lib/scala-library.jar:/home/xuyao/spark/lib/spark-assembly-1.4.0-hadoop2.4.0.jar:/opt/idea/lib/idea_rt.jar com.intellij.rt.execution.application.AppMain testone
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/07/14 20:57:41 INFO SparkContext: Running Spark version 1.4.0
15/07/14 20:57:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/07/14 20:57:44 WARN Utils: Your hostname, hadoop resolves to a loopback address: 127.0.1.1; using 192.168.73.129 instead (on interface eth0)
15/07/14 20:57:44 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/07/14 20:57:44 INFO SecurityManager: Changing view acls to: xuyao
15/07/14 20:57:44 INFO SecurityManager: Changing modify acls to: xuyao
15/07/14 20:57:44 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(xuyao); users with modify permissions: Set(xuyao)
15/07/14 20:57:46 INFO Slf4jLogger: Slf4jLogger started
15/07/14 20:57:46 INFO Remoting: Starting remoting
15/07/14 20:57:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@192.168.73.129:37946]
15/07/14 20:57:47 INFO Utils: Successfully started service 'sparkDriver' on port 37946.
15/07/14 20:57:47 INFO SparkEnv: Registering MapOutputTracker
15/07/14 20:57:47 INFO SparkEnv: Registering BlockManagerMaster
15/07/14 20:57:47 INFO DiskBlockManager: Created local directory at /tmp/spark-eaaea5ed-bb48-4980-a40d-517c8fdbc043/blockmgr-34f5b064-af4a-44c7-880f-135ce2a50e8c
15/07/14 20:57:47 INFO MemoryStore: MemoryStore started with capacity 131.6 MB
15/07/14 20:57:47 INFO HttpFileServer: HTTP File server directory is /tmp/spark-eaaea5ed-bb48-4980-a40d-517c8fdbc043/httpd-623c7c2f-b6bd-4815-9de3-2c649fcf42b8
15/07/14 20:57:47 INFO HttpServer: Starting HTTP Server
15/07/14 20:57:48 INFO Utils: Successfully started service 'HTTP file server' on port 57984.
15/07/14 20:57:48 INFO SparkEnv: Registering OutputCommitCoordinator
15/07/14 20:57:53 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/07/14 20:57:53 INFO SparkUI: Started SparkUI at http://192.168.73.129:4040
15/07/14 20:57:54 INFO Executor: Starting executor ID driver on host localhost
15/07/14 20:57:54 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 35800.
15/07/14 20:57:54 INFO NettyBlockTransferService: Server created on 35800
15/07/14 20:57:54 INFO BlockManagerMaster: Trying to register BlockManager
15/07/14 20:57:54 INFO BlockManagerMasterEndpoint: Registering block manager localhost:35800 with 131.6 MB RAM, BlockManagerId(driver, localhost, 35800)
15/07/14 20:57:54 INFO BlockManagerMaster: Registered BlockManager
15/07/14 20:57:55 INFO SparkContext: Starting job: collect at testone.scala:14
15/07/14 20:57:55 INFO DAGScheduler: Got job 0 (collect at testone.scala:14) with 1 output partitions (allowLocal=false)
15/07/14 20:57:55 INFO DAGScheduler: Final stage: ResultStage 0(collect at testone.scala:14)
15/07/14 20:57:55 INFO DAGScheduler: Parents of final stage: List()
15/07/14 20:57:55 INFO DAGScheduler: Missing parents: List()
15/07/14 20:57:55 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at testone.scala:12), which has no missing parents
15/07/14 20:57:56 WARN SizeEstimator: Failed to check whether UseCompressedOops is set; assuming yes
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(1904) called with curMem=0, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 1904.0 B, free 131.6 MB)
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(1201) called with curMem=1904, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1201.0 B, free 131.6 MB)
15/07/14 20:57:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:35800 (size: 1201.0 B, free: 131.6 MB)
15/07/14 20:57:56 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:874
15/07/14 20:57:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at testone.scala:12)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/07/14 20:57:56 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1345 bytes)
15/07/14 20:57:56 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/07/14 20:57:56 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 639 bytes result sent to driver
15/07/14 20:57:56 INFO DAGScheduler: ResultStage 0 (collect at testone.scala:14) finished in 0.221 s
15/07/14 20:57:56 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 198 ms on localhost (1/1)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
15/07/14 20:57:56 INFO DAGScheduler: Job 0 finished: collect at testone.scala:14, took 0.658227 s
3
6
9
12
15
18
21
24
27
15/07/14 20:57:56 INFO SparkContext: Starting job: collect at testone.scala:25
15/07/14 20:57:56 INFO DAGScheduler: Got job 1 (collect at testone.scala:25) with 1 output partitions (allowLocal=false)
15/07/14 20:57:56 INFO DAGScheduler: Final stage: ResultStage 1(collect at testone.scala:25)
15/07/14 20:57:56 INFO DAGScheduler: Parents of final stage: List()
15/07/14 20:57:56 INFO DAGScheduler: Missing parents: List()
15/07/14 20:57:56 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[2] at filter at testone.scala:15), which has no missing parents
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(2088) called with curMem=3105, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.0 KB, free 131.6 MB)
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(1272) called with curMem=5193, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1272.0 B, free 131.6 MB)
15/07/14 20:57:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:35800 (size: 1272.0 B, free: 131.6 MB)
15/07/14 20:57:56 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:874
15/07/14 20:57:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[2] at filter at testone.scala:15)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
15/07/14 20:57:56 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, PROCESS_LOCAL, 1345 bytes)
15/07/14 20:57:56 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/07/14 20:57:56 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 627 bytes result sent to driver
15/07/14 20:57:56 INFO DAGScheduler: ResultStage 1 (collect at testone.scala:25) finished in 0.006 s
15/07/14 20:57:56 INFO DAGScheduler: Job 1 finished: collect at testone.scala:25, took 0.029612 s
12
15
18
21
24
27
15/07/14 20:57:56 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 10 ms on localhost (1/1)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 
15/07/14 20:57:56 INFO SparkContext: Starting job: collect at testone.scala:27
15/07/14 20:57:56 INFO DAGScheduler: Got job 2 (collect at testone.scala:27) with 1 output partitions (allowLocal=false)
15/07/14 20:57:56 INFO DAGScheduler: Final stage: ResultStage 2(collect at testone.scala:27)
15/07/14 20:57:56 INFO DAGScheduler: Parents of final stage: List()
15/07/14 20:57:56 INFO DAGScheduler: Missing parents: List()
15/07/14 20:57:56 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[5] at filter at testone.scala:27), which has no missing parents
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(2088) called with curMem=6465, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 2.0 KB, free 131.5 MB)
15/07/14 20:57:56 INFO MemoryStore: ensureFreeSpace(1259) called with curMem=8553, maxMem=137948037
15/07/14 20:57:56 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1259.0 B, free 131.5 MB)
15/07/14 20:57:56 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:35800 (size: 1259.0 B, free: 131.6 MB)
15/07/14 20:57:56 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:874
15/07/14 20:57:56 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[5] at filter at testone.scala:27)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
15/07/14 20:57:56 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, localhost, PROCESS_LOCAL, 1329 bytes)
15/07/14 20:57:56 INFO Executor: Running task 0.0 in stage 2.0 (TID 2)
15/07/14 20:57:56 INFO Executor: Finished task 0.0 in stage 2.0 (TID 2). 615 bytes result sent to driver
15/07/14 20:57:56 INFO TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 17 ms on localhost (1/1)
15/07/14 20:57:56 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 
15/07/14 20:57:56 INFO DAGScheduler: ResultStage 2 (collect at testone.scala:27) finished in 0.013 s
15/07/14 20:57:56 INFO DAGScheduler: Job 2 finished: collect at testone.scala:27, took 0.027730 s
6
8
10
15/07/14 20:57:56 INFO SparkContext: Invoking stop() from shutdown hook
15/07/14 20:57:56 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:35800 in memory (size: 1259.0 B, free: 131.6 MB)
15/07/14 20:57:56 INFO SparkUI: Stopped Spark web UI at http://192.168.73.129:4040
15/07/14 20:57:56 INFO DAGScheduler: Stopping DAGScheduler
15/07/14 20:57:56 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
15/07/14 20:57:56 INFO Utils: path = /tmp/spark-eaaea5ed-bb48-4980-a40d-517c8fdbc043/blockmgr-34f5b064-af4a-44c7-880f-135ce2a50e8c, already present as root for deletion.
15/07/14 20:57:56 INFO MemoryStore: MemoryStore cleared
15/07/14 20:57:57 INFO BlockManager: BlockManager stopped
15/07/14 20:57:57 INFO BlockManagerMaster: BlockManagerMaster stopped
15/07/14 20:57:57 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
15/07/14 20:57:57 INFO SparkContext: Successfully stopped SparkContext
15/07/14 20:57:57 INFO Utils: Shutdown hook called
15/07/14 20:57:57 INFO Utils: Deleting directory /tmp/spark-eaaea5ed-bb48-4980-a40d-517c8fdbc043
15/07/14 20:57:57 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.

Process finished with exit code 0