Spark-rapids error diagnosis: Could not load cudf jni library... | ai.rapids.cudf.NativeDepsLoader.loadNativeDeps(NativeDepsLoader.java:91) java.io.IOException: Error loading dependencies
Submitting the job with spark-shell
spark-shell \
--master yarn \
--driver-memory 1G \
--conf spark.executor.memory=1G \
--conf spark.executor.cores=2 \
--conf spark.executor.resource.gpu.amount=1 \
--conf spark.rapids.memory.pinnedPool.size=1G \
--conf spark.locality.wait=0s \
--conf spark.plugins=com.nvidia.spark.SQLPlugin \
--conf spark.executor.resource.gpu.discoveryScript=./getGpusResources.sh \
--files ${SPARK_RAPIDS_DIR}/getGpusResources.sh \
--jars ${SPARK_CUDF_JAR},${SPARK_RAPIDS_PLUGIN_JAR}
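The command above points `spark.executor.resource.gpu.discoveryScript` at `getGpusResources.sh`, which Spark runs on each executor node to find GPUs. For context, here is a minimal sketch of what such a script does, modeled on the example script that ships with Spark; the exact contents of the script used here are an assumption, and on a node without `nvidia-smi` the sketch just emits an empty address list:

```shell
#!/usr/bin/env bash
# Sketch of a Spark GPU discovery script (the real getGpusResources.sh
# ships with Spark). Spark expects a single JSON ResourceInformation
# object on stdout, e.g. {"name":"gpu","addresses":["0","1"]}.

# Query GPU indices; errors are suppressed so the sketch stays runnable
# on machines without nvidia-smi (yielding an empty index list).
INDICES=$(nvidia-smi --query-gpu=index --format=csv,noheader 2>/dev/null)

# Join the newline-separated indices into a quoted JSON array body.
ADDRS=$(echo "$INDICES" | sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/","/g')

RESULT="{\"name\":\"gpu\",\"addresses\":[\"$ADDRS\"]}"
echo "$RESULT"
```

Because the script is shipped with `--files`, it lands in each container's working directory, which is why the relative path `./getGpusResources.sh` in the config works.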
Error symptoms and how the problem was traced
When submitting a Spark job with Spark-rapids under YARN's GPU scheduling, the job failed at launch. After trying many approaches I finally found the root cause; at the time of writing, neither Google nor Baidu turned up any related blog posts.
The error message is as follows (IPs, paths, and similar details have been replaced with xxx):
[2021-02-27 17:46:46.784]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/xxx/spark-archive-3x.zip/slf4j-log4j12-1.7.30.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in <