PySpark-Kafka integration: missing lib

To get started on a project with Kafka, I followed the Databricks instructions at this address:

Code:

# coding: utf-8

import sys
import os, time
import json

sys.path.append("/usr/local/lib/python2.7/dist-packages")

from pyspark.sql import SparkSession, Row
from pyspark import SparkContext, SQLContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
from pyspark.sql.types import *
import pyspark.sql.functions

spark = SparkSession.builder.appName("Kafka-test").getOrCreate()
spark.sparkContext.setLogLevel('WARN')

# Schema for the product records expected on the topic
trainingSchema = StructType([
    StructField("code", StringType(), True),
    StructField("ean", StringType(), True),
    StructField("name", StringType(), True),
    StructField("description", StringType(), True),
    StructField("category", StringType(), True),
    StructField("attributes", StringType(), True)
])

# 'sc' was never defined in this script; use the session's SparkContext
trainingDF = spark.createDataFrame(spark.sparkContext.emptyRDD(), trainingSchema)

broker, topic = ['kafka.partner.stg.some.domain:9092', 'hybris.products']

df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", broker) \
    .option("subscribe", topic) \
    .option("startingOffsets", "earliest") \
    .load()

My Hadoop version is 2.6 and my Spark version is 2.3.0.

The spark-submit command line is:

spark-submit --jars jars/spark-sql-kafka-0-10_2.11-2.3.0.jar kafka-test-002.py
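A note on this invocation (my own observation, not something from the original post): `--jars` adds only the single jar listed, but spark-sql-kafka-0-10 has a transitive dependency on kafka-clients, which is the artifact that actually provides `ByteArrayDeserializer`. One way to pull in the whole dependency tree is to let spark-submit resolve the package from Maven instead:

```shell
# Resolve spark-sql-kafka and its transitive dependencies
# (including kafka-clients) from Maven Central, rather than
# passing a single jar with --jars.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  kafka-test-002.py
```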

Error message:

Py4JJavaError: An error occurred while calling o48.load.
: java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<init>(KafkaSourceProvider.scala:413)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider$.<clinit>(KafkaSourceProvider.scala)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.validateStreamOptions(KafkaSourceProvider.scala:360)
    at org.apache.spark.sql.kafka010.KafkaSourceProvider.sourceSchema(KafkaSourceProvider.scala:64)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:231)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:94)
    at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:94)
    at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:33)
    at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:170)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

As you can check on the site I mentioned above, the jar file I am importing is exactly the same one. So I have no idea why this happens. Maybe there is another module that wasn't mentioned? I'm really lost.
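Since jar files are ordinary zip archives, it is easy to verify whether a given jar actually bundles the class the JVM says is missing before submitting the job. A small diagnostic sketch (the helper name `jar_contains_class` is mine, and the in-memory archive below only stands in for a real jar path such as `jars/spark-sql-kafka-0-10_2.11-2.3.0.jar`):

```python
import io
import zipfile

def jar_contains_class(jar, class_name):
    """Return True if the jar (a zip archive) contains the given class.

    class_name uses dotted notation, e.g.
    'org.apache.kafka.common.serialization.ByteArrayDeserializer'.
    """
    entry = class_name.replace(".", "/") + ".class"
    with zipfile.ZipFile(jar) as zf:
        return entry in zf.namelist()

# Synthetic in-memory jar standing in for a real file on disk;
# it contains one (empty) class entry for demonstration.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("org/apache/spark/sql/kafka010/KafkaSourceProvider.class", b"")

print(jar_contains_class(
    buf, "org.apache.spark.sql.kafka010.KafkaSourceProvider"))      # True
print(jar_contains_class(
    buf, "org.apache.kafka.common.serialization.ByteArrayDeserializer"))  # False
```

Running this against the real sql-kafka jar would show whether `ByteArrayDeserializer` is inside it or has to come from a separate kafka-clients jar on the classpath.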
