Learning Spark

1. Packaging a Spark jar in IDEA:

Under the projectName/project/ directory, create a new assembly.sbt file with the following content:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

build.sbt:

name := "ScalaTest"

version := "0.1"

scalaVersion := "2.11.8"

libraryDependencies += "mysql" % "mysql-connector-java" % "8.0.11" % "compile"

libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0" % "provided"

libraryDependencies += "org.apache.spark" % "spark-streaming_2.11" % "2.3.0" % "provided"

libraryDependencies += "org.apache.spark" % "spark-streaming-kafka-0-10_2.11" % "2.3.0" % "compile"

libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.0" % "compile"

Note: dependencies scoped as compile are bundled into the assembled jar; those scoped as provided are not.

If the assembly fails because of dependency conflicts between jars (the same file pulled in by several dependencies), add the following merge strategy to build.sbt:
assemblyMergeStrategy in assembly := {
    case PathList("javax", "servlet", xs @ _*) => MergeStrategy.last
    case PathList("javax", "activation", xs @ _*) => MergeStrategy.last
    case PathList("org", "apache", xs @ _*) => MergeStrategy.last
    case PathList("com", "google", xs @ _*) => MergeStrategy.last
    case PathList("com", "esotericsoftware", xs @ _*) => MergeStrategy.last
    case PathList("com", "codahale", xs @ _*) => MergeStrategy.last
    case PathList("com", "yammer", xs @ _*) => MergeStrategy.last
    case "about.html" => MergeStrategy.rename
    case "META-INF/ECLIPSEF.RSA" => MergeStrategy.last
    case "META-INF/mailcap" => MergeStrategy.last
    case "META-INF/mimetypes.default" => MergeStrategy.last
    case "plugin.properties" => MergeStrategy.last
    case "log4j.properties" => MergeStrategy.last
    case x =>
        val oldStrategy = (assemblyMergeStrategy in assembly).value
        oldStrategy(x)
}

Run -> Edit Configurations -> create a new sbt Task and set Tasks to: assembly

Run -> Run 'assembly'
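To see how the compile/provided split plays out, here is a minimal sketch of the kind of application this build packages: spark-core and spark-streaming are provided by the cluster at runtime, while the Kafka integration and kafka-clients ride inside the assembled jar. The bootstrap servers, group id and topic name below are placeholders, not values from any real setup.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ScalaTestApp {
    def main(args: Array[String]): Unit = {
        // spark.* classes come from the provided dependencies; the cluster supplies them.
        val conf = new SparkConf().setAppName("ScalaTest")
        val ssc = new StreamingContext(conf, Seconds(5))

        // Kafka classes come from the compile-scoped dependencies packed into the fat jar.
        val kafkaParams = Map[String, Object](
            "bootstrap.servers" -> "localhost:9092",           // placeholder
            "key.deserializer" -> classOf[StringDeserializer],
            "value.deserializer" -> classOf[StringDeserializer],
            "group.id" -> "scala-test",                        // placeholder
            "auto.offset.reset" -> "latest"
        )

        val stream = KafkaUtils.createDirectStream[String, String](
            ssc, PreferConsistent, Subscribe[String, String](Seq("test-topic"), kafkaParams))

        stream.map(_.value).print()

        ssc.start()
        ssc.awaitTermination()
    }
}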


2. Submitting a job from the local machine to an external cluster fails with:

Service 'sparkDriver' could not bind on a random free port. You may check whether configuring an appropriate binding address.

Fix: in conf/spark-env.sh, set:

export SPARK_LOCAL_IP=localhost (or the local IP)
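If the job is launched from the IDE rather than through spark-submit, roughly the same thing can be done in code instead of spark-env.sh, via the spark.driver.host / spark.driver.bindAddress settings. A sketch only; the master URL and IP are placeholders for your own cluster and local address:

import org.apache.spark.SparkConf

// Programmatic alternative to SPARK_LOCAL_IP for the driver.
val conf = new SparkConf()
    .setAppName("ScalaTest")
    .setMaster("spark://sparkmaster:7077")         // placeholder: standalone master URL
    .set("spark.driver.host", "192.168.1.100")     // placeholder: IP of this machine as seen by the cluster
    .set("spark.driver.bindAddress", "0.0.0.0")    // let the driver bind on any local interface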


3. Insufficient resources:

If a Spark job is run without enough resources, it keeps printing:

WARN  TaskSchedulerImpl:66 - Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

When resources are insufficient, a Spark job does not exit right away; it keeps waiting. If a streaming job is involved, it will hold on to whatever resources it has and never release them, so other jobs keep piling up in the Spark cluster without ever being executed.

Open sparkmaster:8080 to see whether cores or memory are the bottleneck. If cores are short, the fix is:

sparkConf.set("spark.cores.max", "2")

See http://wenda.chinahadoop.cn/question/2433

and the Resource Scheduling section of http://spark.apache.org/docs/latest/spark-standalone.html
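A slightly fuller sketch of the fix above, capping both cores and executor memory when the SparkContext is created. The master URL and the numbers are placeholders; size them from what sparkmaster:8080 actually shows.

import org.apache.spark.{SparkConf, SparkContext}

// Cap what this application may claim on the standalone cluster so that a
// long-running streaming job leaves cores and memory for other applications.
val sparkConf = new SparkConf()
    .setAppName("ScalaTest")
    .setMaster("spark://sparkmaster:7077")      // placeholder master URL
    .set("spark.cores.max", "2")                // at most 2 cores in total for this app
    .set("spark.executor.memory", "1g")         // stay within what each worker offers
val sc = new SparkContext(sparkConf)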
