Fixing the missing spark-streaming-kafka-0-8-assembly.jar error when submitting a Kafka job with pyspark
1. Start the Kafka console producer

[root@hadoop102 ~]# kafka-console-producer --broker-list hadoop102:9092 --topic test1

2. The PySpark receiver script, test2.py:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext('local[*]', 'myapp')

# 5-second micro-batches
ssc = StreamingContext(sc, 5)

zkQuorum = 'hadoop102:2181'
groupId = 'kafka1'
topics = {'test1': 1}  # topic name -> number of receiver threads
kafkaParams = {'auto.offset.reset': 'earliest'}
# Receiver-based stream; each record is a (key, value) tuple, keep the value.
# Note: pass kafkaParams explicitly, otherwise it is silently ignored.
dstream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topics,
                                  kafkaParams=kafkaParams).map(lambda t: t[1])

# Word count over each batch
result = dstream.flatMap(lambda line: line.split(' ')) \
                .map(lambda word: (word, 1)) \
                .reduceByKey(lambda a, b: a + b)

# Print the first results of each batch
result.pprint()

# Start the streaming job and block until terminated
ssc.start()
ssc.awaitTermination()
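The word-count pipeline above is the classic flatMap/map/reduceByKey pattern. As a minimal sketch of what happens to one micro-batch of lines, the same transformation can be written in plain Python (no Spark required; the sample lines are made up):

```python
# Plain-Python illustration of flatMap / map / reduceByKey on one batch.
lines = ["hello world", "hello kafka"]

# flatMap: split each line into individual words
words = [w for line in lines for w in line.split(" ")]

# map: pair each word with an initial count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per word
counts = {}
for word, n in pairs:
    counts[word] = counts.get(word, 0) + n

print(counts)  # {'hello': 2, 'world': 1, 'kafka': 1}
```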

3. The submission fails with the following error:

[root@hadoop102 ~]# spark-submit test2.py 
21/07/07 20:18:21 WARN lineage.LineageWriter: Lineage directory /var/log/spark/lineage doesn't exist or is not writable. Lineage for this application will be disabled.

________________________________________________________________________________________________

  Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.4.0-cdh6.2.1 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.0-cdh6.2.1.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

________________________________________________________________________________________________


Traceback (most recent call last):
  File "/root/test2.py", line 13, in <module>
    dstream=KafkaUtils.createStream(ssc,zkQuorum,groupId,topics).map(lambda t:t[1])
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 78, in createStream
  File "/opt/cloudera/parcels/CDH-6.2.1-1.cdh6.2.1.p0.1425774/lib/spark/python/lib/pyspark.zip/pyspark/streaming/kafka.py", line 217, in _get_helper
TypeError: 'JavaPackage' object is not callable
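The final TypeError is Py4J's way of saying the JVM class behind KafkaUtils was never found: when a class is missing from the classpath, Py4J resolves the attribute to a JavaPackage placeholder instead of a class, and calling a placeholder fails. A minimal sketch of that behaviour, using a hypothetical stand-in class (not the real py4j.java_gateway.JavaPackage):

```python
# Hypothetical stand-in mimicking py4j's JavaPackage placeholder:
# unknown attributes silently resolve to nested "package" nodes.
class FakeJavaPackage:
    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # A missing class just becomes another package node...
        return FakeJavaPackage(self._name + "." + attr)


pkg = FakeJavaPackage("org.apache.spark.streaming.kafka")
helper = pkg.KafkaUtilsPythonHelper  # still a package node, not a class
try:
    helper()  # ...and package nodes are not callable, hence the TypeError
except TypeError as e:
    print(e)  # the real py4j message reads: 'JavaPackage' object is not callable
```

Putting the assembly jar on the classpath makes the real class resolve, which is why the fix below works.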

4. The fix, following the second suggestion in the error message:

2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.4.0-cdh6.2.1.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

The error message says to go to http://search.maven.org/ and download the spark-streaming-kafka-0-8-assembly jar matching version 2.4.0.

JAR download link:

https://download.csdn.net/download/qq_22905163/20068310?spm=1001.2014.3001.5501

5. I downloaded spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar

My environment is CDH 6.2.1. I placed spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar under /root/ and pointed spark-submit at it with --jars.

Note: I had previously dropped this jar into Spark's jars/ directory. The test2.py program below then ran fine, but launching the pyspark shell failed outright during SparkSession initialization, so passing the jar explicitly is the safer route.
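A middle ground between copying the jar into Spark's jars/ directory and typing --jars on every submit is to register it once via the standard spark.jars property in spark-defaults.conf (a sketch; the /root path and the CDH config location are assumptions for this setup):

```properties
# /etc/spark/conf/spark-defaults.conf (typical CDH layout)
spark.jars /root/spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar
```

Unlike modifying jars/, this only affects jobs launched through spark-submit defaults, so the pyspark shell can still be started with or without it.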

6. Successful run

[root@hadoop102 ~]# spark-submit --jars spark-streaming-kafka-0-8-assembly_2.11-2.4.0.jar test2.py
