Commonly Used Auxiliary Spark Configurations

Displaying Spark's classpath

Scenario:

A program runs but fails with a NoSuchMethodError or a ClassNotFoundException. For this class of problem you need to find out exactly which jars Spark loaded onto its classpath: either a jar is present in the wrong version, or the required jar is missing altogether. (A quick way to scan the jars for the offending class is sketched right after the stack trace below.)

15/05/28 12:46:46 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
    at com.ldamodel.LdaModel$$anonfun$5$$anonfun$apply$5.apply(LdaModel.scala:22)
    at com.ldamodel.LdaModel$$anonfun$5$$anonfun$apply$5.apply(LdaModel.scala:22)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
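
As a quick check, the sketch below scans a jars directory for the class named in the stack trace. The path is the pyspark jars directory that also shows up in the launch command printed further down; treat it as an example and point it at your own installation.

# Minimal sketch: find which jar (if any) on Spark's classpath provides a given class.
import glob
import zipfile

jars_dir = "/home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars"
wanted = "scala/Predef"  # class prefix taken from the NoSuchMethodError above

for jar in glob.glob(jars_dir + "/*.jar"):
    with zipfile.ZipFile(jar) as zf:
        if any(name.startswith(wanted) for name in zf.namelist()):
            print(jar)  # this jar contains the class; check whether its version matches

If nothing is printed, the class is missing from the classpath; if a jar is printed, compare its version with what your application was compiled against.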

Solution:

export SPARK_PRINT_LAUNCH_COMMAND=true
(pyspark2.4.3) darren@ubuntu:~$ export SPARK_PRINT_LAUNCH_COMMAND=true
(pyspark2.4.3) darren@ubuntu:~$ pyspark
Spark Command: python
========================================
Python 3.7.0 (default, Oct  9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
Spark Command: /home/darren/program/java/bin/java -cp /home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/conf:/home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars/* -Xmx1g org.apache.spark.deploy.SparkSubmit --name PySparkShell pyspark-shell
========================================
21/02/03 17:50:07 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.0.4 instead (on interface enp0s3)
21/02/03 17:50:07 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/03 17:50:07 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.7.0 (default, Oct  9 2018 10:31:47)
SparkSession available as 'spark'.

The Spark Command is printed, and it is immediately clear which jars were loaded: everything under /home/darren/anaconda3/envs/pyspark2.4.3/lib/python3.7/site-packages/pyspark/jars/*.
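
If a shell is already running, the driver JVM's effective classpath can also be read through PySpark's py4j gateway. This is a small sketch that relies on the internal _jvm attribute (not a public API), so treat it as a debugging aid only:

# Inside the PySpark shell, `spark` is already defined ("SparkSession available as 'spark'").
# java.class.path is the standard JVM system property holding the effective classpath.
cp = spark.sparkContext._jvm.java.lang.System.getProperty("java.class.path")
for entry in cp.split(":"):
    print(entry)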

What it looks like when the environment variable is not set:

(pyspark2.4.3) darren@ubuntu:~$ pyspark
Python 3.7.0 (default, Oct  9 2018, 10:31:47)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
21/02/03 17:48:11 WARN Utils: Your hostname, ubuntu resolves to a loopback address: 127.0.1.1; using 192.168.0.4 instead (on interface enp0s3)
21/02/03 17:48:11 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
21/02/03 17:48:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.4.3
      /_/

Using Python version 3.7.0 (default, Oct  9 2018 10:31:47)
SparkSession available as 'spark'.

Of course, setting this environment variable is not strictly necessary: the classpath can also be viewed in the Spark UI under the Environment tab, for example:

(Screenshot: Spark UI, Environment tab)
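
The Spark Properties section of that tab is populated from the SparkConf, so classpath-related settings such as spark.driver.extraClassPath or spark.jars can also be checked programmatically. A minimal sketch, run inside the pyspark shell where spark already exists:

# Dump the Spark properties that the Environment tab displays.
for key, value in sorted(spark.sparkContext.getConf().getAll()):
    print(key, "=", value)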