Showing Spark's classpath
Scenario:
The program runs but fails with a NoSuchMethodError or a ClassNotFoundException. Both belong to the same class of problem: you need to find out which jars are on Spark's classpath, because either a jar has the wrong version or the required jar is missing altogether. For example:
15/05/28 12:46:46 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at com.ldamodel.LdaModel$$anonfun$5$$anonfun$apply$5.apply(LdaModel.scala:22)
at com.ldamodel.LdaModel$$anonfun$5$$anonfun$apply$5.apply(LdaModel.scala:22)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
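Before reaching for the solution below, it can help to pin down which jar (if any) actually provides the missing class. A minimal sketch in Python (the jars directory path is an assumption taken from this machine; point it at your own installation):

import glob
import zipfile

def find_class_in_jars(jars_dir, class_name):
    # Print every jar under jars_dir that contains the given class.
    entry = class_name.replace(".", "/") + ".class"
    for jar in sorted(glob.glob(jars_dir + "/*.jar")):
        with zipfile.ZipFile(jar) as zf:
            if entry in zf.namelist():
                print(jar)

# Assumed path; adjust to your Spark installation's jars directory.
find_class_in_jars(
    "/home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars",
    "scala.Predef")

If nothing is printed, the jar is missing (ClassNotFoundException); if a jar is found but the error persists, the class exists without the expected method signature, which points to a version mismatch (NoSuchMethodError).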
Solution:
Set the following environment variable before launching; Spark's launcher will then print the exact java command it executes, including the full classpath (-cp):
export SPARK_PRINT_LAUNCH_COMMAND=true
(pyspark2.4.3) darren@ubuntu:~$ export SPARK_PRINT_LAUNCH_COMMAND=true
(pyspark2.4.3) darren@ubuntu:~$ pyspark
Spark Command: python
========================================
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Spark Command: /home/darren/program/java/bin/java -cp /home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/conf:/home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell
========================================
21/02/03 17:50:07 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.0.4 instead (on interface enp0s3)
21/02/03 17:50:07 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/03 17:50:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
Using Python version 3.7.0 (default, Oct 9 2018 10:31:47)
SparkSession available as 'spark'.
The Spark Command is now printed, and it is immediately clear where the jars are loaded from: /home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars/*
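Once you know the jars directory, a mismatch like the NoSuchMethodError above (typically caused by compiling against one Scala version and running on another) can be confirmed by listing the Scala jars Spark actually ships. A small sketch, with the path assumed from the launch command above:

import glob

jars_dir = "/home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars"
# The version is encoded in the file name, e.g. a scala-library-2.11.x.jar
# means application code must be compiled against Scala 2.11.
for jar in sorted(glob.glob(jars_dir + "/scala*.jar")):
    print(jar)

If your own artifact was built against a different Scala binary version than the scala-library jar listed here, calls such as scala.Predef$.ArrowAssoc can fail exactly as in the stack trace above.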
For comparison, the behavior without the environment variable:
(pyspark2.4.3) darren@ubuntu:~$ pyspark
Python 3.7.0 (default, Oct 9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
21/02/03 17:48:11 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.0.4 instead (on interface enp0s3)
21/02/03 17:48:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/03 17:48:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/
Using Python version 3.7.0 (default, Oct 9 2018 10:31:47)
SparkSession available as 'spark'.
Of course, setting this environment variable is not strictly necessary: you can also inspect the classpath in the Spark UI, under the Environment tab (see the Classpath Entries section there).
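If you already have a session open, you can also query the driver JVM directly instead of opening the UI. A sketch via PySpark's Py4J gateway (note that _jvm is an internal attribute, so treat this as a debugging trick rather than a public API):

# Run inside an active pyspark shell, where `spark` is the SparkSession.
cp = spark.sparkContext._jvm.java.lang.System.getProperty("java.class.path")
for entry in cp.split(":"):   # ":" is the classpath separator on Linux
    print(entry)

This prints the driver's classpath entries one per line, which is often easier to search than the single long -cp string in the launch command.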