Scenario: run test code locally from IDEA and have the job submitted to a Spark standalone cluster running in local virtual machines.
Prerequisite: the host machine and the virtual machines can reach each other over the network.
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object CountByKeyDemo {
  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("spark://hostname:7077")
      .appName("countByKey")
      // path of the jar produced by packaging this project
      .config("spark.jars", "D:\\xxxx\\sparkdemo-1.0.jar")
      // explicitly set the IP of the Driver, i.e. the machine submitting the job
      .config("spark.driver.host", "<host machine IP>")
      .getOrCreate()
    val sc = sparkSession.sparkContext
    val rdd2: RDD[(String, Int)] = sc.parallelize(List(
      ("zhangsan", 18),
      ("zhangsan", 19),
      ("lisi", 20),
      ("lisi", 20),
      ("wangwu", 18)
    ), 1)
    val countByKey: collection.Map[String, Long] = rdd2.countByKey()
    println("countByKey = " + countByKey)
    sc.stop()
  }
}
Error 1
20/04/03 21:25:45 WARN StandaloneAppClient$ClientEndpoint: Failed to connect to master bigdata131:7077
org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:205)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:100)
at org.apache.spark.rpc.RpcEnv.setupEndpointRef(RpcEnv.scala:108)
at org.apache.spark.deploy.client.StandaloneAppClient$ClientEndpoint$$anonfun$tryRegisterAllMasters$1$$anon$1.run(StandaloneAppClient.scala:106)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.io.StreamCorruptedException: invalid stream header: 01000D31
Cause: the Spark version used by the project does not match the version running on the cluster.
Fix: set the <spark.version> property in pom.xml to the cluster's version; here the cluster runs Spark 2.1.0, so the property should read <spark.version>2.1.0</spark.version>.
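For example, with a cluster on Spark 2.1.0 the relevant part of pom.xml would look roughly like the sketch below (the dependency list is a minimal illustration; the actual artifacts depend on the project):

```xml
<properties>
    <!-- Must match the Spark version installed on the cluster -->
    <spark.version>2.1.0</spark.version>
</properties>
<dependencies>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_2.11</artifactId>
        <version>${spark.version}</version>
    </dependency>
</dependencies>
```

Keeping the version in a single property makes it easy to keep all Spark artifacts consistent with the cluster.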
Error 2
20/04/03 21:36:20 WARN TaskSchedulerImpl: Initial job has not accepted any resources;
check your cluster UI to ensure that workers are registered and have sufficient resources
Cause: because of the network environment, the cluster cannot resolve the correct IP address of the Driver while the job is running.
Fix: specify the Driver host manually with .config("spark.driver.host", "<host machine IP>"); the default Driver port can be kept.
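On a host with several network interfaces, hard-coding the wrong address is easy; one way to discover which local address actually routes to the cluster is to open a UDP socket "toward" the master. This is a sketch (the helper name and the master hostname are illustrative, not from the original code):

```scala
import java.net.{DatagramSocket, InetAddress}

// Connect a UDP socket toward the cluster master to ask the OS which
// local address it would use for that route (no packets are sent).
def routableLocalAddress(masterHost: String): String = {
  val socket = new DatagramSocket()
  try {
    socket.connect(InetAddress.getByName(masterHost), 7077)
    socket.getLocalAddress.getHostAddress
  } finally socket.close()
}
```

The returned address can then be passed to .config("spark.driver.host", ...) instead of a hard-coded IP.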
Error 3
Caused by: java.lang.ClassNotFoundException: test.xf.cn.SQLContextTest2$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
Fix: add .config("spark.jars", "D:\\xxxx\\sparkdemo-1.0.jar"), pointing at the jar produced by compiling and packaging the current project, so the executors can load the application's classes (including the anonymous-function classes generated for closures).
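As an alternative to setting spark.jars in the builder, the same jar can be shipped after the context exists via SparkContext.addJar (a sketch using the example path from above; remember to re-run mvn package after code changes so the jar matches the submitted closures):

```scala
// Ship the packaged application jar to the executors so that closure
// classes such as SQLContextTest2$$anonfun$1 can be deserialized there
sc.addJar("D:\\xxxx\\sparkdemo-1.0.jar")
```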