1. Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
Fixed today by switching to two nodes with more memory and changing the master address to the master's hostname:
val conf = new SparkConf().setMaster("spark://ue191:7077").setAppName("LdaSpark")
  .set("spark.executor.memory", "6g").set("spark.cores.max", "5")
setMaster has to be given the hostname, otherwise this error appears.
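For reference, a minimal sketch of wiring this conf into a context (the trivial count job at the end is a hypothetical sanity check, not part of the original program):

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(conf)
// hypothetical sanity check: this trivial job only finishes once executors have actually registered
println(sc.parallelize(1 to 100).count())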
2. java.io.EOFException
Add the Hadoop dependencies to the sbt build file (a sketch of the kind of HDFS read that triggers this error follows the build file):
name := "simple"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "0.20.2"
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"
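This EOFException typically surfaces on the first HDFS read when the Hadoop client on the classpath does not match the cluster; a rough sketch of the kind of call that trips it (the HDFS path and port are made up):

// reading from HDFS is where the Hadoop client version on the classpath matters;
// without a matching hadoop-client this read fails with EOFException
val lines = sc.textFile("hdfs://ue191:9000/user/test/input.txt")
println(lines.count())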
3. ClassNotFound error for user-defined classes
Add the application jar to the SparkContext:
sc.addJar("target/scala-2.10/ldaspark_2.10-1.0.jar")
4. The jar has to be repackaged before every run (sbt package, then run), because sbt run on its own does not guarantee a fresh compile and package.
One sbt compile takes roughly 20 s.
5. java.net.NoRouteToHostException
Every machine needs its firewall turned off or the relevant ports opened.
6. org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4
Server IPC version 9 means the cluster is running Hadoop 2.x, so the Hadoop dependency in the xxx.sbt file has to match that version:
libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"
7. libraryDependencies lines must be separated by a blank line (sbt requires a blank line between settings in a .sbt file).
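For example, the two dependency lines used above have to be written as separate settings with a blank line in between:

libraryDependencies += "org.apache.spark" %% "spark-core" % "0.9.1"

libraryDependencies += "org.apache.hadoop" % "hadoop-client" % "2.2.0"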