PySpark 2.4: Programmatically adding Maven JAR coordinates stopped working

This post looks at how to load Maven JAR coordinates programmatically in PySpark 2.4, and in particular what to do when that loading breaks: environment variables are cleaned up, Spark settings are given explicitly, and a SparkConf is used to make sure the Kafka-related Spark extension libraries are loaded correctly.

import sys, os, multiprocessing
from pyspark.sql import DataFrame, DataFrameStatFunctions, DataFrameNaFunctions
from pyspark.conf import SparkConf
from pyspark.sql import SparkSession
from pyspark.sql import functions as sFn
from pyspark.sql.types import *
from pyspark.sql.types import Row
# ------------------------------------------
# Note: Row() in .../pyspark/sql/types.py
# isn't included in its '__all__' list, so
# we must import it by name here.
# ------------------------------------------

num_cpus = multiprocessing.cpu_count()         # Number of CPUs for SPARK Local mode.
os.environ.pop('SPARK_MASTER_HOST', None)      # Since we're using pip/pySpark, these three ENVs
os.environ.pop('SPARK_MASTER_PORT', None)      # aren't needed; and we ensure pySpark doesn't
os.environ.pop('SPARK_HOME', None)             # get confused by them, should they be set.
os.environ.pop('PYTHONSTARTUP', None)          # Just in case pySpark 2.x attempts to read this.
os.environ['PYSPARK_PYTHON'] = sys.executable  # Make SPARK Workers use the same Python as the Master.
os.environ['JAVA_HOME'] = '/usr/lib/jvm/jre'   # Oracle JAVA for our pip/python3/pySpark 2.4 (CDH's JRE won't work).
JARS_IVE_REPO = '/home/jdoe/SPARK.JARS.REPO.d/'

# ======================================================================
# Maven coordinates for JARs (and their dependencies) needed to plug
# extra functionality into Spark 2.x (e.g. Kafka SQL and Streaming).
# A one-time internet connection is necessary for Spark to automatically
# download the JARs specified by the coordinates (and their dependencies).
# ======================================================================
spark_jars_packages = ','.join([
    'org.apache.spark:spark-streaming-kafka-0-10_2.11:2.4.0',
    'org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0',
])
# ======================================================================
spark_conf = SparkConf()
spark_conf.setAll([
    ('spark.master', 'local[{}]'.format(num_cpus)),
    ('spark.app.name', 'myApp'),
    ('spark.submit.deployMode', 'client'),
    ('spark.ui.showConsoleProgress', 'true'),
    ('spark.eventLog.enabled', 'false'),
    ('spark.logConf', 'false'),
    ('spark.jars.repositories', 'file:/' + JARS_IVE_REPO),
    ('spark.jars.ivy', JARS_IVE_REPO),
    ('spark.jars.packages', spark_jars_packages),
])

spark_sesn = SparkSession.builder.config(conf=spark_conf).getOrCreate()
spark_ctxt = spark_sesn.sparkContext
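If the SparkConf route above stops resolving the coordinates (the problem this post is about), a commonly used workaround in pip-installed PySpark is to hand the same coordinates to the launcher through the PYSPARK_SUBMIT_ARGS environment variable, which pySpark consults when it starts the JVM. This is a sketch, not the post's own method; it reuses spark_jars_packages from above and must execute before getOrCreate() is called:

# ----------------------------------------------------------------------
# Workaround sketch: pass the Maven coordinates via PYSPARK_SUBMIT_ARGS.
# Must be set BEFORE SparkSession.builder...getOrCreate() launches the
# JVM. The trailing 'pyspark-shell' token is required by the launcher.
# ----------------------------------------------------------------------
os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages ' + spark_jars_packages + ' pyspark-shell'

# Either way, once the session exists you can check what it actually saw:
print(spark_ctxt.getConf().get('spark.jars.packages', 'NOT SET'))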

spark_reader = spark_sesn.read                   # Batch DataFrameReader.

spark_streamReader = spark_sesn.readStream       # Streaming DataStreamReader.

spark_ctxt.setLogLevel("WARN")                   # Quiet Spark's INFO-level chatter.
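With the two Kafka packages on the classpath, both readers above should recognize format('kafka'). Below is a minimal smoke test; the broker address and topic name are placeholders invented for illustration, not values from the original post. If the coordinates failed to resolve, this is typically where Spark 2.4 raises an AnalysisException ("Failed to find data source: kafka"):

# Batch read from Kafka (placeholder broker/topic, purely illustrative).
kafka_df = (spark_reader
            .format('kafka')
            .option('kafka.bootstrap.servers', 'localhost:9092')   # placeholder broker
            .option('subscribe', 'events')                         # placeholder topic
            .load())
kafka_df.selectExpr('CAST(key AS STRING)', 'CAST(value AS STRING)').show(5)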
