Spark execution optimization: uploading dependencies to HDFS (using spark.yarn.jars and spark.yarn.archive)

1. Overview

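When a Spark application is submitted to YARN and neither spark.yarn.archive nor spark.yarn.jars is configured, the log prints the warning "Neither spark.yarn.jars nor spark.yarn.archive is set" and the client then uploads local jars to HDFS one after another, as shown below. This step is very time-consuming. Adding spark.yarn.archive or spark.yarn.jars to spark-defaults.conf shortens the startup time of the Spark application.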

 Will allocate AM container, with 896 MB memory including 384 MB overhead
2020-12-01 11:16:11 INFO  Client:54 - Setting up container launch context for our AM
2020-12-01 11:16:11 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-01 11:16:11 INFO  Client:54 - Preparing resources for our AM container
2020-12-01 11:16:12 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2020-12-01 11:16:14 INFO  Client:54 - Uploading resource file:/tmp/spark-897c6291-e0bd-47e6-8d42-7f67225c4819/__spark_libs__5294834939010995385.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606792499194_0001/__spark_libs__5294834939010995385.zip
2020-12-01 11:16:18 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606792499194_0001/wordcount.jar
2020-12-01 11:16:18 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zookeeper-3.4.6.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606792499194_0001/zookeeper-3.4.6.jar
2020-12-01 11:16:18 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xz-1.0.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606792499194_0001/xz-1.0.jar
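Whether a submission will hit this fallback can be checked in advance by looking for either key in the configuration file. The following is a minimal sketch, assuming the settings live in a spark-defaults.conf file; the function name has_yarn_libs_setting is made up for illustration, and settings passed with --conf on the command line are not detected.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if the given spark-defaults.conf
# sets spark.yarn.jars or spark.yarn.archive on a non-comment line.
has_yarn_libs_setting() {
    conf_file="$1"
    # The key must start the line (after optional whitespace) and be followed
    # by a separator: whitespace, '=' or ':'. Lines starting with '#' never match.
    grep -Eq '^[[:space:]]*spark\.yarn\.(jars|archive)[[:space:]=:]' "$conf_file"
}

# Example (the path is an assumption, adjust it to your installation):
# has_yarn_libs_setting "$SPARK_HOME/conf/spark-defaults.conf" \
#     || echo "libraries under SPARK_HOME will be uploaded on every submit"
```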

2. How the official Spark documentation describes these two settings

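[Two screenshots were here: the spark.yarn.jars and spark.yarn.archive entries from the official Spark documentation, and a Chinese paraphrase of them.]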

3. Using spark.yarn.jars

3.1 Upload all jars in the jars directory under the Spark root to HDFS
 hadoop fs -mkdir -p  /spark-yarn/jars
 hadoop fs -put /opt/module/spark-2.3.2-bin-hadoop2.7/jars/* /spark-yarn/jars/
3.2 Edit spark-defaults.conf
spark.yarn.jars hdfs://hadoop122:9000/spark-yarn/jars/*.jar
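The same property can also be supplied for a single submission through spark-submit's --conf option instead of editing spark-defaults.conf. The wrapper below is only a sketch: the function name and the SPARK_SUBMIT override variable are illustrative (SPARK_SUBMIT is not a Spark variable), and the HDFS path is the one used in this article.

```shell
#!/bin/sh
# Sketch: pass spark.yarn.jars for one submission only.
# SPARK_SUBMIT is an illustrative hook for choosing the launcher binary;
# it defaults to the spark-submit found on PATH.
submit_with_hdfs_jars() {
    # The property value is quoted so the shell does not expand the glob.
    "${SPARK_SUBMIT:-spark-submit}" \
        --master yarn \
        --conf "spark.yarn.jars=hdfs://hadoop122:9000/spark-yarn/jars/*.jar" \
        "$@"
}

# Example (the class name is a placeholder, not taken from this article):
# submit_with_hdfs_jars --class com.example.WordCount /home/workspace/wordcount/wordcount.jar
```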
3.3 Result
2020-12-01 13:53:52 INFO  Client:54 - Setting up container launch context for our AM
2020-12-01 13:53:52 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-01 13:53:52 INFO  Client:54 - Preparing resources for our AM container
2020-12-01 13:53:53 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/JavaEWAH-0.3.2.jar
2020-12-01 13:53:53 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/RoaringBitmap-0.5.11.jar
3.4 Possible error

ERROR client.TransportClient: Failed to send RPC RPC

Caused by: java.io.IOException: Failed to send RPC 5353749227723805834 to /192.168.10.122:58244: java.nio.channels.ClosedChannelException
	at org.apache.spark.network.client.TransportClient.lambda$sendRpc$2(TransportClient.java:237)
	at io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
	at io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
	at io.netty.util.concurrent.DefaultPromise.access$000(DefaultPromise.java:34)
	at io.netty.util.concurrent.DefaultPromise$1.run(DefaultPromise.java:431)
	at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
	at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:403)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:463)
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
	at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.channels.ClosedChannelException
	at io.netty.channel.AbstractChannel$AbstractUnsafe.write(...)(Unknown Source)

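This is a closed-channel exception and looks like a timeout. The same problem can also appear when running spark-shell --master yarn-client. It often means the NodeManager killed the container for exceeding its physical or virtual memory limit, which is why adding the following to yarn-site.xml (turning both checks off) resolves it: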

<property>
		<name>yarn.nodemanager.pmem-check-enabled</name>
		<value>false</value>
</property>
<property>
		<name>yarn.nodemanager.vmem-check-enabled</name>
		<value>false</value>
</property>

4. Using spark.yarn.archive

4.1 Zip all jars in the jars directory under the Spark root and upload the archive to HDFS

When packaging, make sure every jar sits in the root directory of the zip archive.

cd /opt/module/spark-2.3.2-bin-hadoop2.7/jars/
zip -q -r spark_jars_2.3.2.zip *
hadoop fs -mkdir -p /spark-yarn/zip
hadoop fs -put spark_jars_2.3.2.zip /spark-yarn/zip/
4.2 Edit spark-defaults.conf
spark.yarn.archive hdfs://hadoop122:9000/spark-yarn/zip/spark_jars_2.3.2.zip
4.3 Result
2020-12-01 14:41:53 INFO  Client:54 - Setting up container launch context for our AM
2020-12-01 14:41:53 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-01 14:41:53 INFO  Client:54 - Preparing resources for our AM container
2020-12-01 14:41:54 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/zip/spark_jars_2.3.2.zip
2020-12-01 14:41:54 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/wordcount.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/zstd-jni-1.3.2-2.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zookeeper-3.4.6.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/zookeeper-3.4.6.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xz-1.0.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/xz-1.0.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xmlenc-0.52.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/xmlenc-0.52.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xml-apis-1.3.04.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/xml-apis-1.3.04.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xercesImpl-2.9.1.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/xercesImpl-2.9.1.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/xbean-asm5-shaded-4.4.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/xbean-asm5-shaded-4.4.jar
2020-12-01 14:41:55 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/spark-core_2.11-2.3.2.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606801972366_0009/spark-core_2.11-2.3.2.jar
4.4 Possible error

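Driver log of the application:

Error: Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster

Packaging the archive as shown below keeps the directory hierarchy inside the zip and causes the error above: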

zip -q -r spark_jars_2.3.2.zip /opt/module/spark-2.3.2-bin-hadoop2.7/jars/*
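The layout can be checked before uploading by listing the archive entries and confirming that none of them contains a directory component. The helper below is a sketch under that assumption; its name is made up for illustration, and it reads entry names from standard input so it does not depend on a particular zip tool.

```shell
#!/bin/sh
# Hypothetical helper: read archive entry names on stdin (one per line) and
# succeed only if no entry contains a '/', i.e. everything is at the root.
# Note: an empty listing also passes this check.
entries_at_archive_root() {
    # grep -q succeeds when some line contains '/'; '!' inverts that status.
    ! grep -q '/'
}

# Example, assuming unzip is installed (-Z1 prints one entry name per line):
# unzip -Z1 spark_jars_2.3.2.zip | entries_at_archive_root \
#     || echo "archive contains nested paths, repackage from inside the jars directory"
```

Running zip from inside the jars directory, as in 4.1, gives the flat layout; zip's -j option (junk directory names) is another way to drop the paths when zipping from elsewhere.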


5. Comparing the results

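The official Spark documentation mentions these two settings in two places.
The Preparations section notes that if neither spark.yarn.archive nor spark.yarn.jars is specified, Spark creates a zip file containing all jars under $SPARK_HOME/jars and uploads it to the distributed cache.
The entry for spark.yarn.archive notes that if both settings are configured, spark.yarn.archive takes precedence: it replaces spark.yarn.jars, and the archive is used in all of the application's containers.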

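To make the effect easier to see, I submitted a yarn-cluster job under each of the following configurations and looked at the log printed to the console.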
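To compare the runs by number rather than by eye, the time between the first and last timestamped log line can be computed. This is a rough sketch: the function name is illustrative, it assumes the "date time level ..." line format seen in these logs, and it assumes all lines fall on the same day.

```shell
#!/bin/sh
# Hypothetical helper: read a client log on stdin and print the number of
# seconds between the first and the last line whose second field is HH:MM:SS.
elapsed_seconds() {
    awk '
        $2 ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ {
            split($2, t, ":")
            s = t[1] * 3600 + t[2] * 60 + t[3]
            if (!seen) { first = s; seen = 1 }
            last = s
        }
        # Lines without a timestamp (such as "...") are ignored.
        END { if (seen) print last - first }
    '
}

# Example: elapsed_seconds < submit.log
```

Read this way, the excerpt in 5.1 runs from 14:40:01 to 14:40:14, while the excerpt in 5.5 runs from 15:19:30 to 15:19:32. These excerpts cover only the resource-preparation phase, not the whole job.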

5.1 Neither spark.yarn.jars nor spark.yarn.archive configured

Uploaded: __spark_conf__xxx.zip, __spark_libs__xxx.zip, the dependency jars, wordcount.jar

2020-12-02 14:40:01 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2020-12-02 14:40:01 INFO  Client:54 - Setting up container launch context for our AM
2020-12-02 14:40:01 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-02 14:40:01 INFO  Client:54 - Preparing resources for our AM container
2020-12-02 14:40:02 WARN  Client:66 - Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2020-12-02 14:40:04 INFO  Client:54 - Uploading resource file:/tmp/spark-343f5dda-9476-4330-b2ed-407ec6aa00e9/__spark_libs__2239070030081220213.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/__spark_libs__2239070030081220213.zip
2020-12-02 14:40:07 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/wordcount.jar
2020-12-02 14:40:07 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/zstd-jni-1.3.2-2.jar
2020-12-02 14:40:07 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zookeeper-3.4.6.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/zookeeper-3.4.6.jar
...
...
2020-12-02 14:40:14 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/aopalliance-repackaged-2.4.0-b34.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/aopalliance-repackaged-2.4.0-b34.jar
2020-12-02 14:40:14 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/activation-1.1.1.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/activation-1.1.1.jar
2020-12-02 14:40:14 INFO  Client:54 - Uploading resource file:/tmp/spark-343f5dda-9476-4330-b2ed-407ec6aa00e9/__spark_conf__7169080322038125856.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0003/__spark_conf__.zip
2020-12-02 14:40:14 INFO  SecurityManager:54 - Changing view acls to: root
...
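5.2 spark.yarn.jars configured

Uploaded: __spark_conf__xxx.zip, wordcount.jar, and only the dependency jars whose names are not already under /spark-yarn/jars (jars with the same name as one already there are skipped)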

2020-12-02 14:47:07 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2020-12-02 14:47:07 INFO  Client:54 - Setting up container launch context for our AM
2020-12-02 14:47:07 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-02 14:47:07 INFO  Client:54 - Preparing resources for our AM container
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/JavaEWAH-0.3.2.jar
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/RoaringBitmap-0.5.11.jar
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/ST4-4.0.4.jar
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/activation-1.1.1.jar
...
...
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/zookeeper-3.4.6.jar
2020-12-02 14:47:08 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/zstd-jni-1.3.2-2.jar
2020-12-02 14:47:08 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/wordcount.jar
2020-12-02 14:47:09 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar added multiple times to distributed cache
2020-12-02 14:47:09 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/zookeeper-3.4.6.jar added multiple times to distributed cache
...
...
2020-12-02 14:47:09 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scala-parser-combinators_2.11-1.0.4.jar added multiple times to distributed cache
2020-12-02 14:47:09 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/scalap-2.11.0.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/scalap-2.11.0.jar
2020-12-02 14:47:09 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/scala-logging_2.11-3.5.0.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/scala-logging_2.11-3.5.0.jar
2020-12-02 14:47:09 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/scala-library-2.11.12.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/scala-library-2.11.12.jar
2020-12-02 14:47:09 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/scala-compiler-2.11.12.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/scala-compiler-2.11.12.jar
...
...
2020-12-02 14:47:11 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/aopalliance-repackaged-2.4.0-b34.jar added multiple times to distributed cache
2020-12-02 14:47:11 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/activation-1.1.1.jar added multiple times to distributed cache
2020-12-02 14:47:11 INFO  Client:54 - Uploading resource file:/tmp/spark-b763698e-77bd-4004-8d71-c0eca5a1006d/__spark_conf__5511617263425666988.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0004/__spark_conf__.zip
2020-12-02 14:47:11 INFO  SecurityManager:54 - Changing view acls to: root
5.3 spark.yarn.archive configured

Uploaded: __spark_conf__xxx.zip, the dependency jars, wordcount.jar

2020-12-02 14:53:43 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2020-12-02 14:53:43 INFO  Client:54 - Setting up container launch context for our AM
2020-12-02 14:53:43 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-02 14:53:43 INFO  Client:54 - Preparing resources for our AM container
2020-12-02 14:53:44 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/zip/spark_jars_2.3.2.zip
2020-12-02 14:53:44 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/wordcount.jar
2020-12-02 14:53:44 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/zstd-jni-1.3.2-2.jar
2020-12-02 14:53:45 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zookeeper-3.4.6.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/zookeeper-3.4.6.jar
...
...
2020-12-02 14:53:51 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/aopalliance-repackaged-2.4.0-b34.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/aopalliance-repackaged-2.4.0-b34.jar
2020-12-02 14:53:51 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/activation-1.1.1.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/activation-1.1.1.jar
2020-12-02 14:53:51 INFO  Client:54 - Uploading resource file:/tmp/spark-0ce3c5f1-6083-499a-b5cd-1a5700e74bf3/__spark_conf__6154114087017136415.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0005/__spark_conf__.zip
2020-12-02 14:53:51 INFO  SecurityManager:54 - Changing view acls to: root
5.4 Both spark.yarn.jars and spark.yarn.archive configured

Uploaded: __spark_conf__xxx.zip, the dependency jars, wordcount.jar

2020-12-02 14:59:08 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2020-12-02 14:59:08 INFO  Client:54 - Setting up container launch context for our AM
2020-12-02 14:59:08 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-02 14:59:08 INFO  Client:54 - Preparing resources for our AM container
2020-12-02 14:59:09 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/zip/spark_jars_2.3.2.zip
2020-12-02 14:59:09 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/wordcount.jar
2020-12-02 14:59:10 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/zstd-jni-1.3.2-2.jar
2020-12-02 14:59:10 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/zookeeper-3.4.6.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/zookeeper-3.4.6.jar
...
...
2020-12-02 14:59:16 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/aopalliance-repackaged-2.4.0-b34.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/aopalliance-repackaged-2.4.0-b34.jar
2020-12-02 14:59:16 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/lib/activation-1.1.1.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/activation-1.1.1.jar
2020-12-02 14:59:16 INFO  Client:54 - Uploading resource file:/tmp/spark-3451a4e5-fb97-45a5-85dc-36f645fb7db3/__spark_conf__4794321975937827120.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0006/__spark_conf__.zip
2020-12-02 14:59:16 INFO  SecurityManager:54 - Changing view acls to: root
5.5 spark.yarn.jars configured, with all of the application's dependency jars also uploaded to /spark-yarn/jars/ on HDFS

Uploaded: __spark_conf__xxx.zip, wordcount.jar

2020-12-02 15:19:30 INFO  Client:54 - Will allocate AM container, with 1408 MB memory including 384 MB overhead
2020-12-02 15:19:30 INFO  Client:54 - Setting up container launch context for our AM
2020-12-02 15:19:30 INFO  Client:54 - Setting up the launch environment for our AM container
2020-12-02 15:19:30 INFO  Client:54 - Preparing resources for our AM container
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/JavaEWAH-0.3.2.jar
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/RoaringBitmap-0.5.11.jar
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/ST4-4.0.4.jar
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/activation-1.1.1.jar
...
...
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/zookeeper-3.4.6.jar
2020-12-02 15:19:31 INFO  Client:54 - Source and destination file systems are the same. Not copying hdfs://hadoop122:9000/spark-yarn/jars/zstd-jni-1.3.2-2.jar
2020-12-02 15:19:31 INFO  Client:54 - Uploading resource file:/home/workspace/wordcount/wordcount.jar -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0007/wordcount.jar
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/zstd-jni-1.3.2-2.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/zookeeper-3.4.6.jar added multiple times to distributed cache
...
...
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scala-parser-combinators_2.11-1.0.4.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scalap-2.11.0.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scala-logging_2.11-3.5.0.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scala-library-2.11.12.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/scala-compiler-2.11.12.jar added multiple times to distributed cache
...
...
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/aopalliance-repackaged-2.4.0-b34.jar added multiple times to distributed cache
2020-12-02 15:19:31 WARN  Client:66 - Same name resource file:///home/workspace/wordcount/lib/activation-1.1.1.jar added multiple times to distributed cache
2020-12-02 15:19:32 INFO  Client:54 - Uploading resource file:/tmp/spark-e9eca080-9f38-4fe5-ae5f-663a3e54c718/__spark_conf__3182593855268881202.zip -> hdfs://hadoop122:9000/user/root/.sparkStaging/application_1606887365975_0007/__spark_conf__.zip
2020-12-02 15:19:32 INFO  SecurityManager:54 - Changing view acls to: root
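Conclusion:

Approach 5.5 (spark.yarn.jars configured, with all of the application's dependency jars also uploaded to /spark-yarn/jars/ on HDFS) reduces resource uploads the most.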
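To get to the 5.5 setup, the application's dependency jars that are not yet on HDFS have to be found and uploaded. The sketch below only computes the difference between two name lists; the function name and the list files are illustrative, and the commented commands for producing the lists are one possible way to do it, not the only one. Matching is by file name only, in line with the "Same name resource" messages in the logs.

```shell
#!/bin/sh
# Hypothetical helper: print the jar names listed in $1 (local names, one per
# line) that do not appear in $2 (names already on HDFS, one per line).
missing_jars() {
    local_list="$1"
    remote_list="$2"
    # Compare FILENAME with ARGV[1] rather than using NR==FNR, so that an
    # empty remote list still yields every local name.
    awk 'FILENAME == ARGV[1] { seen[$0] = 1; next } !($0 in seen)' \
        "$remote_list" "$local_list"
}

# Example (paths are the ones used in this article; the list files are assumptions):
# ls /home/workspace/wordcount/lib > local.txt
# hadoop fs -ls /spark-yarn/jars | awk '{print $NF}' | grep '\.jar$' | sed 's|.*/||' > remote.txt
# missing_jars local.txt remote.txt | while read -r jar; do
#     hadoop fs -put "/home/workspace/wordcount/lib/$jar" /spark-yarn/jars/
# done
```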

