Environment: Spark 2.1.2 (which bundles Scala 2.11.8), Scala IDE for Eclipse 4.7.
In Eclipse, create a WordCount project, set the Scala Library Container to 2.11.11, and add all the jars under spark/jars to the project's Referenced Libraries.
Code:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

object RunWordCount {
  def main(args: Array[String]): Unit = {
    // Silence Spark's verbose logging and the console progress bar
    Logger.getLogger("org").setLevel(Level.OFF)
    System.setProperty("spark.ui.showConsoleProgress", "false")
    println("Starting RunWordCount")
    val sc = new SparkContext(new SparkConf().setAppName("wordCount").setMaster("local[1]"))
    //val sc = new SparkContext(new SparkConf().setAppName("wordCount").setMaster("local[1]").set("spark.testing.memory", "536870912"))
    println("Reading the text file...")
    val textFile = sc.textFile("data/LICENSE.txt")
    println("Building the RDD...")
    val countsRDD = textFile.flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    println("Saving to a text file...")
    try {
      countsRDD.saveAsTextFile("data/output")
      println("Saved successfully")
    } catch {
      case e: Exception => println("The output directory already exists; delete it first")
    }
  }
}
Running it throws the following error:
Exception in thread "main" java.lang.IllegalArgumentException: System memory 251396096 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.
at org.apache.spark.memory.UnifiedMemoryManager$.getMaxMemory(UnifiedMemoryManager.scala:216)
at org.apache.spark.memory.UnifiedMemoryManager$.apply(UnifiedMemoryManager.scala:198)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:330)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:174)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:257)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:432)
at RunWordCount$.main(RunWordCount.scala:17)
at RunWordCount.main(RunWordCount.scala)
This happens because the memory the JVM was launched with is below Spark's minimum.
Reference: http://blog.csdn.net/shenshendeai/article/details/54631237
The fix is to set spark.testing.memory in the configuration. Through experimentation, there are two places where it can be set:
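As for where the 471859200-byte minimum in the error message comes from: in Spark 2.x, UnifiedMemoryManager reserves 300 MB of system memory and rejects any heap smaller than 1.5 times that reserve. A minimal sketch of the arithmetic (the constant names here are illustrative, not Spark's actual identifiers):

```scala
// Sketch of the bound checked in UnifiedMemoryManager.getMaxMemory (Spark 2.x):
// 300 MB is reserved for the system, and total memory must be >= 1.5x the reserve.
object MemoryCheck {
  val ReservedSystemMemoryBytes: Long = 300L * 1024 * 1024            // 314572800
  val MinSystemMemoryBytes: Long = (ReservedSystemMemoryBytes * 1.5).toLong

  def main(args: Array[String]): Unit = {
    // 471859200 -- exactly the number quoted in the exception
    println(MinSystemMemoryBytes)
  }
}
```

So any value of spark.testing.memory (or driver heap) at or above 471859200 bytes, i.e. 450 MB, clears the check.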
1. In your own source code, right after creating the conf:
val conf = new SparkConf().setAppName("word count")
conf.set("spark.testing.memory", "2147480000") // any value larger than 512 MB works
2. In Eclipse's Run Configuration, on the Arguments tab, add the following line under VM arguments (again, any value larger than 512 MB works):
-Dspark.testing.memory=1073741824
Other parameters can also be set dynamically here in the same way, for example -Dspark.master=spark://hostname:7077.
With either change in place, the error no longer occurs. I used the first method: switching to the commented-out line in the listing above makes the program run without error.
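For completeness, here is the program with the method-1 fix applied. The only functional change from the listing above is the extra .set("spark.testing.memory", ...) on the SparkConf; the sc.stop() in the finally block is an addition of mine to release resources cleanly. This sketch assumes the Spark 2.1 jars are on the classpath:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RunWordCountFixed {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("wordCount")
      .setMaster("local[1]")
      .set("spark.testing.memory", "536870912") // 512 MB, above the 471859200-byte minimum
    val sc = new SparkContext(conf)
    try {
      val counts = sc.textFile("data/LICENSE.txt")
        .flatMap(_.split(" "))
        .map((_, 1))
        .reduceByKey(_ + _)
      counts.saveAsTextFile("data/output") // still fails if data/output already exists
    } finally {
      sc.stop() // shut the context down even if the save fails
    }
  }
}
```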