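Why set this at all? The following is quoted from the official Spark 2.3.0 documentation: http://spark.apache.org/docs/latest/running-on-yarn.html#preparations
(My understanding: these are the jars Spark needs at runtime. If the setting is left unset, they have to be uploaded to the cache of the YARN-managed nodes on every run, which is tedious and hurts performance. If it is set, for example to a location on HDFS, the jars are read from HDFS instead of being uploaded each time, which is a little faster...)
(That said, if HDFS is configured to keep only three replicas of the data and a Spark job needs 20 nodes, what happens then...)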
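As a rough illustration of what such a setting could look like, here is a minimal sketch for `conf/spark-defaults.conf`. It uses the `spark.yarn.jars` property described on the documentation page linked above (`spark.yarn.archive` is the alternative listed there). The HDFS directory `/spark/jars` is a hypothetical path chosen for this example, and the sketch assumes the jars were copied there beforehand.

```properties
# conf/spark-defaults.conf
# Assumption: the jars under $SPARK_HOME/jars were uploaded to HDFS beforehand, e.g.
#   hdfs dfs -mkdir -p /spark/jars
#   hdfs dfs -put $SPARK_HOME/jars/*.jar /spark/jars/
# The path /spark/jars is hypothetical; adjust it to your own cluster.
spark.yarn.jars    hdfs:///spark/jars/*.jar
```

If neither property is set, Spark packages everything under `$SPARK_HOME/jars` into a zip file and uploads it to the distributed cache on each submission, which is the repeated upload described above.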
Running Spark on YARN requires a binary distribution of Spark which is built with YARN support. Binary distributions can be downloaded from the |