spark + kafka + spark-streaming (Python edition) (pitfalls recorded) (one-stop setup)

This guide mainly follows the blog tutorials of Prof. Lin Ziyu (林子雨) at Xiamen University.

Follow the steps below, working through the reference links in the order given.

A recommended way to install Python packages is to use pip3 with the Douban mirror. For example, to install tensorflow:

pip3 install tensorflow -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
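If you would rather not repeat the `-i` flag on every install, the mirror can be written into pip's per-user config file once. A sketch for Linux, where modern pip reads `~/.config/pip/pip.conf`:

```shell
# Write the Douban mirror into pip's per-user config file (Linux path).
PIP_CONF_DIR="$HOME/.config/pip"
mkdir -p "$PIP_CONF_DIR"
cat > "$PIP_CONF_DIR/pip.conf" <<'EOF'
[global]
index-url = http://pypi.douban.com/simple/
trusted-host = pypi.douban.com
EOF
```

After this, a plain `pip3 install tensorflow` goes through the mirror automatically.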

See the link below for switching the apt software sources:

https://blog.csdn.net/zjb1314th/article/details/105428244

For convenience, I have bundled the matching versions of spark + hadoop + spark-streaming + kafka that I used into a single download. If you would rather not struggle with version mismatches or slow downloads from inside China, you can grab this one-stop bundle here: 资源++

1. Setting up Spark

http://dblab.xmu.edu.cn/blog/2441-2/

http://dblab.xmu.edu.cn/blog/1689-2/

2. Kafka

http://dblab.xmu.edu.cn/blog/1096-2/

3. Integrating Kafka into Spark

http://dblab.xmu.edu.cn/blog/1743-2/

The part of that post quoted below has some problems; note the corrections.

This puts the spark-streaming-kafka-0-8_2.11-2.1.0.jar file under the "/usr/local/spark/jars/kafka" directory.
At the same time, we also need to edit the conf/spark-env.sh file under the Spark directory and modify its SPARK_DIST_CLASSPATH variable:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):$(/usr/local/hbase/bin/hbase classpath):/usr/local/spark/examples/jars/*:/usr/local/spark/jars/kafka/*:/usr/local/kafka/libs/*`

1. Do not copy the final character of that line (the stray backtick). Also note that the variable includes HBase classpath entries; if you have not installed HBase, they will cause errors.

If you followed only the steps above (no HBase), the variable should be:

export SPARK_DIST_CLASSPATH=$(/usr/local/hadoop/bin/hadoop classpath):/usr/local/spark/examples/jars/*:/usr/local/spark/jars/kafka/*:/usr/local/kafka/libs/*
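To keep a single spark-env.sh line working both with and without HBase, the variable can be assembled conditionally, adding the Hadoop and HBase classpaths only when those installs are actually present. A sketch using the tutorial's paths:

```shell
# Sketch for conf/spark-env.sh: build SPARK_DIST_CLASSPATH, adding the
# Hadoop and HBase classpaths only when those binaries exist, so the
# same line works on machines without HBase (paths as in the tutorial).
SPARK_DIST_CLASSPATH=""
if [ -x /usr/local/hadoop/bin/hadoop ]; then
    SPARK_DIST_CLASSPATH="$(/usr/local/hadoop/bin/hadoop classpath)"
fi
if [ -x /usr/local/hbase/bin/hbase ]; then
    SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:$(/usr/local/hbase/bin/hbase classpath)"
fi
SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/local/spark/examples/jars/*:/usr/local/spark/jars/kafka/*:/usr/local/kafka/libs/*"
export SPARK_DIST_CLASSPATH
```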

2. The spark-streaming-kafka-0-8_2.11-2.1.0.jar must match the Spark and Scala versions you installed earlier, or jobs will fail. In the filename, 2.11 is the Scala version and 2.1.0 is the Spark version. My own Spark version did not match, which produced the following error:

 ERROR executor.Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.AbstractMethodError
      at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala:99)
      at org.apache.spark.streaming.kafka.KafkaReceiver.initializeLogIfNecessary(KafkaInputDStream.scala:68)
      at org.apache.spark.internal.Logging$class.log(Logging.scala:46)
      at org.apache.spark.streaming.kafka.KafkaReceiver.log(KafkaInputDStream.scala:68)
      at org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
      at org.apache.spark.streaming.kafka.KafkaReceiver.logInfo(KafkaInputDStream.scala:68)
      at org.apache.spark.streaming.kafka.KafkaReceiver.onStart(KafkaInputDStream.scala:90)
      at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
      at org.apache.spark.streaming.receiver.ReceiverSupervisor.start(ReceiverSupervisor.scala:131)
      at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:601)
      at org.apache.spark.streaming.scheduler.ReceiverTracker$ReceiverTrackerEndpoint$$anonfun$9.apply(ReceiverTracker.scala:591)
      at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
      at org.apache.spark.SparkContext$$anonfun$37.apply(SparkContext.scala:2212)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
      at org.apache.spark.scheduler.Task.run(Task.scala:123)
      at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
      at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
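A mismatch like the one above can be caught before submitting a job by reading the versions off the jar's filename and comparing them against your installation. A small sketch (the helper function and its name are mine, not from the tutorial):

```python
import re

def kafka_jar_versions(jar_name):
    """Parse (scala_version, spark_version) out of a spark-streaming-kafka
    jar filename, following the convention noted above, e.g.
    'spark-streaming-kafka-0-8_2.11-2.1.0.jar' -> ('2.11', '2.1.0')."""
    m = re.match(
        r"spark-streaming-kafka-[\d-]+_(\d+\.\d+)-(\d+\.\d+\.\d+)\.jar$",
        jar_name)
    if m is None:
        raise ValueError("unrecognized jar name: " + jar_name)
    return m.group(1), m.group(2)

scala_ver, spark_ver = kafka_jar_versions(
    "spark-streaming-kafka-0-8_2.11-2.1.0.jar")
print(scala_ver, spark_ver)  # prints: 2.11 2.1.0
```

Compare the two values against the versions reported by `spark-submit --version` before running anything.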

