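Spark on YARN: the jar upload problem
During spark-submit, Spark's jars are uploaded to the submitting user's staging directory on HDFS (/user/hadoop/.sparkStaging for user hadoop, /user/wzj/.sparkStaging in the logs below) and removed once the application finishes. This upload is fairly time-consuming.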
WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
When neither spark.yarn.jars nor spark.yarn.archive is set, Spark packs all the jars under SPARK_HOME into a zip file and uploads it on every submit:
INFO yarn.Client: Uploading resource file:/tmp/spark-668107c8-8b33-46ba-abea-ec3d6ccf12ef/__spark_libs__1763828378893967375.zip -> hdfs://hadoop001:9000/user/wzj/.sparkStaging/application_1585137346352_0005/__spark_libs__1763828378893967375.zip
INFO yarn.Client: Uploading resource file:/tmp/spark-668107c8-8b33-46ba-abea-ec3d6ccf12ef/__spark_conf__1888492531721785739.zip -> hdfs://hadoop001:9000/user/wzj/.sparkStaging/application_1585137346352_0005/__spark_conf__.zip
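The two properties named in the warning are ordinary Spark settings, so they can go in spark-defaults.conf or be passed with --conf on spark-submit. The fragment below is only a sketch of what such a setting might look like: the namenode address is taken from the logs above, but the directory /spark/jars is a hypothetical path chosen for illustration, and the jars must already exist there for the setting to work.

```
# conf/spark-defaults.conf (sketch only)
# /spark/jars is a hypothetical HDFS directory, not a path from this write-up.
# With this set, YARN reads the Spark jars from HDFS and the client no longer
# zips and uploads everything under SPARK_HOME on each submit.
spark.yarn.jars    hdfs://hadoop001:9000/spark/jars/*.jar
```

spark.yarn.archive is the alternative: it points at a single archive with the jars in its root directory and, when set, takes precedence over spark.yarn.jars.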
Optimization
1. Create a directory on HDFS and upload all of Spark's jars to it
[wzj@hadoop001 logs]$ hadoop fs