Running Spark's bundled example: setting up a Spark 2 development environment in Scala IDE and running an example

This post sets up a Spark 2.0 development environment on Windows using Scala IDE.

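1. Create a Maven project

2. Edit pom.xml

I spent a lot of time getting pom.xml right. The listings on the Maven repository and the pom.xml files of projects on GitHub are useful references.

My final pom.xml is as follows: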

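<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.big</groupId>
  <artifactId>mydata</artifactId>
  <version>0.0.1-SNAPSHOT</version>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.10.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>2.0.0</version>
    </dependency>
  </dependencies>

  <!-- The wrapper element name did not survive in the source; pluginRepositories is assumed. -->
  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <groupId>org.scala-tools</groupId>
        <artifactId>maven-scala-plugin</artifactId>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
          </execution>
        </executions>
        <!-- The source lists one more value, "compile", whose enclosing element could not be recovered. -->
      </plugin>
    </plugins>
  </build>
</project>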

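3. Adjust the project configuration

Right-click the project and choose Configure -> Add Scala Nature.

Right-click the project again, open Properties, and adjust the Scala compiler and related settings in the dialog. In particular, the Scala version used by the IDE should match the 2.10 line declared in pom.xml.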
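
One quick way to confirm that the compiler setting took effect is to print the Scala version the project actually runs with. The sketch below is not part of the original setup: `VersionCheck`, `binaryVersion`, and `expectedBinaryVersion` are hypothetical names, and the expected value `2.10` is taken from the scala-library version in the pom.xml above. It relies only on `scala.util.Properties.versionNumberString` from the standard library.

```scala
import scala.util.Properties

// Hypothetical helper: reports which Scala library version is on the classpath.
object VersionCheck {
  // Binary version assumed from the pom.xml in this post (scala-library 2.10.6).
  val expectedBinaryVersion = "2.10"

  // Keeps the first two components of a version string, e.g. "2.10.6" becomes "2.10".
  def binaryVersion(full: String): String =
    full.split('.').take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    val actual = Properties.versionNumberString
    println(s"Scala library version: $actual")
    if (binaryVersion(actual) != expectedBinaryVersion)
      println(s"Warning: expected Scala $expectedBinaryVersion.x to match spark-core_2.10")
  }
}
```

Running this as a Scala application before writing any Spark code can make a version mismatch visible early, rather than as a harder-to-read linkage error at job start.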

4. Write a Spark example

Under src/main/scala, create SimpleApp.scala:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "C:/spark/README.md" // Should be some file on your system
    // "local" runs Spark inside this JVM with a single worker thread
    val conf = new SparkConf().setAppName("Simple Application").setMaster("local")
    val sc = new SparkContext(conf)
    // Read the file with at least 2 partitions and cache it, because it is scanned twice
    val logData = sc.textFile(logFile, 2).cache()
    // Count the lines that contain "a" and the lines that contain "b"
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
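
The counting logic in SimpleApp can be tried without Spark on the classpath, since `filter` and `count` on an RDD behave much like the same operations on an ordinary Scala collection. The following is a minimal sketch under that assumption; `LocalLineCount`, `countContaining`, and the sample lines are hypothetical and only stand in for the contents of README.md.

```scala
// Hypothetical stand-in for SimpleApp that uses a plain Seq instead of an RDD.
object LocalLineCount {
  // Counts the lines that contain the given substring,
  // mirroring logData.filter(line => line.contains(...)).count().
  def countContaining(lines: Seq[String], needle: String): Long =
    lines.count(_.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    // Made-up sample data; the real job reads C:/spark/README.md.
    val lines = Seq("apache spark", "big data", "hello")
    val numAs = countContaining(lines, "a")
    val numBs = countContaining(lines, "b")
    // Prints: Lines with a: 2, Lines with b: 1
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
```

Checking the logic this way separates mistakes in the program itself from problems in the Spark or Maven setup.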

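5. Run the example

Right-click pom.xml and choose Run As -> Maven build.

If the build succeeds, right-click SimpleApp.scala and choose Run As -> Scala Application.

Log output like the following appears in the console, which means the run succeeded.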

............................

16/10/13 16:32:20 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 43 ms on localhost (2/2)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
16/10/13 16:32:20 INFO DAGScheduler: Job 0 finished: count at SimpleApp.scala:15, took 0.414282 s
16/10/13 16:32:20 INFO SparkContext: Starting job: count at SimpleApp.scala:16
16/10/13 16:32:20 INFO DAGScheduler: Got job 1 (count at SimpleApp.scala:16) with 2 output partitions
16/10/13 16:32:20 INFO DAGScheduler: Final stage: ResultStage 1 (count at SimpleApp.scala:16)
16/10/13 16:32:20 INFO DAGScheduler: Parents of final stage: List()
16/10/13 16:32:20 INFO DAGScheduler: Missing parents: List()
16/10/13 16:32:20 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:16), which has no missing parents
16/10/13 16:32:20 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 3.2 KB, free 897.5 MB)
16/10/13 16:32:20 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 1965.0 B, free 897.5 MB)
16/10/13 16:32:20 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.56.182.2:52894 (size: 1965.0 B, free: 897.6 MB)
16/10/13 16:32:20 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
16/10/13 16:32:20 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[3] at filter at SimpleApp.scala:16)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
16/10/13 16:32:20 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, localhost, partition 0, PROCESS_LOCAL, 5278 bytes)
16/10/13 16:32:20 INFO Executor: Running task 0.0 in stage 1.0 (TID 2)
16/10/13 16:32:20 INFO BlockManager: Found block rdd_1_0 locally
16/10/13 16:32:20 INFO Executor: Finished task 0.0 in stage 1.0 (TID 2). 954 bytes result sent to driver
16/10/13 16:32:20 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, localhost, partition 1, PROCESS_LOCAL, 5278 bytes)
16/10/13 16:32:20 INFO Executor: Running task 1.0 in stage 1.0 (TID 3)
16/10/13 16:32:20 INFO BlockManager: Found block rdd_1_1 locally
16/10/13 16:32:20 INFO Executor: Finished task 1.0 in stage 1.0 (TID 3). 954 bytes result sent to driver
16/10/13 16:32:20 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 32 ms on localhost (1/2)
16/10/13 16:32:20 INFO DAGScheduler: ResultStage 1 (count at SimpleApp.scala:16) finished in 0.038 s
16/10/13 16:32:20 INFO DAGScheduler: Job 1 finished: count at SimpleApp.scala:16, took 0.069170 s
16/10/13 16:32:20 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 21 ms on localhost (2/2)
16/10/13 16:32:20 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
Lines with a: 61, Lines with b: 27
16/10/13 16:32:20 INFO SparkContext: Invoking stop() from shutdown hook
16/10/13 16:32:20 INFO SparkUI: Stopped Spark web UI at http://10.56.182.2:4040
16/10/13 16:32:20 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
16/10/13 16:32:20 INFO MemoryStore: MemoryStore cleared
16/10/13 16:32:20 INFO BlockManager: BlockManager stopped
16/10/13 16:32:20 INFO BlockManagerMaster: BlockManagerMaster stopped
16/10/13 16:32:20 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
16/10/13 16:32:20 INFO SparkContext: Successfully stopped SparkContext
16/10/13 16:32:20 INFO ShutdownHookManager: Shutdown hook called
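
When the console output is long, the job timings can be pulled out of the `DAGScheduler` lines with a small amount of plain Scala. This is only a sketch: `JobTimings` and `parse` are hypothetical names, and the regular expression assumes the `Job N finished: ..., took X s` wording seen in the log above, which may differ in other Spark versions.

```scala
// Hypothetical helper for extracting job durations from Spark driver log lines.
object JobTimings {
  // Matches e.g. "Job 1 finished: count at SimpleApp.scala:16, took 0.069170 s"
  // anywhere inside a longer log line.
  private val Finished =
    """Job (\d+) finished: (.+), took ([0-9.]+) s""".r.unanchored

  // Returns (job id, description, seconds) when the line reports a finished job.
  def parse(line: String): Option[(Int, String, Double)] = line match {
    case Finished(id, what, secs) => Some((id.toInt, what, secs.toDouble))
    case _                        => None
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      "16/10/13 16:32:20 INFO DAGScheduler: Job 1 finished: count at SimpleApp.scala:16, took 0.069170 s",
      "16/10/13 16:32:20 INFO DAGScheduler: Missing parents: List()"
    )
    // Lines that do not report a finished job are skipped.
    sample.flatMap(line => parse(line)).foreach { case (id, what, secs) =>
      println(s"job $id ($what): $secs s")
    }
  }
}
```

In the log above, this would pick out Job 0 and Job 1, one for each `count()` call in SimpleApp.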
