【Spark】Integrating Spark Streaming with Flume

Official guide: http://spark.apache.org/docs/1.3.0/streaming-flume-integration.html

There are two integration approaches:

1. Flume pushes data to Spark Streaming (push-based)

2. Spark Streaming pulls data from Flume (pull-based)

 

This walkthrough uses approach 1 (push-based).

A Flume pipeline has three components: source -> channel -> sink (here the sink delivers events to Spark Streaming)
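For orientation, here is a minimal sketch of how the two approaches differ on the Spark side (assuming the Spark 1.3 spark-streaming-flume API; the hostname and port are placeholders and must match your Flume agent configuration):

import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// sc is the SparkContext provided by spark-shell
val ssc = new StreamingContext(sc, Seconds(5))

// Approach 1 (push): Spark starts an Avro receiver on host:port and Flume's avro sink pushes events to it
val pushedStream = FlumeUtils.createStream(ssc, "hadoop-senior.ibeifeng.com", 9999)

// Approach 2 (pull): Flume buffers events in a custom SparkSink and Spark Streaming polls it
// val pulledStream = FlumeUtils.createPollingStream(ssc, "hadoop-senior.ibeifeng.com", 9999)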

 

1. In Flume's conf directory, add a file named flume-spark-push.sh with the following content:

# The configuration file needs to define the sources, 
# the channels and the sinks.
# Sources, channels and sinks are defined per agent, 
# in this case called 'a2'


#define agent name a2, and configuration
a2.sources = r2
a2.channels = c2
a2.sinks = k2


#define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /opt/datas/spark-flume/wctotal.log
a2.sources.r2.shell = /bin/bash -c


#define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100


#define sink
a2.sinks.k2.type = avro
a2.sinks.k2.hostname = hadoop-senior.ibeifeng.com
a2.sinks.k2.port = 9999


#bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2

2. In the /opt/datas/spark-flume directory, prepare the file wctotal.log with some seed data:

hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop

3. Copy the dependency jars into Spark's externallibs directory. They come from two places: the spark-streaming-flume jar built from the Spark source, and two jars from the Flume installation's lib directory.

cp /opt/modules/spark-1.3.0-src/external/flume/target/spark-streaming-flume_2.10-1.3.0.jar  /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/


cp /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/lib/flume-avro-source-1.5.0-cdh5.3.6.jar /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/
cp /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/lib/flume-ng-sdk-1.5.0-cdh5.3.6.jar /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/

4. Run in local mode: launch spark-shell and load the dependency jars with --jars, separating multiple jars with commas.

bin/spark-shell --jars \
/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/spark-streaming-flume_2.10-1.3.0.jar,/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/flume-avro-source-1.5.0-cdh5.3.6.jar,/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/flume-ng-sdk-1.5.0-cdh5.3.6.jar

5. At the spark-shell prompt, enter the following commands:

import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.flume._
import org.apache.spark.storage.StorageLevel


// create a StreamingContext with a 5-second batch interval (sc is provided by spark-shell)
val ssc = new StreamingContext(sc, Seconds(5))

// push-based receiver: listens on this host/port for events pushed by Flume's avro sink
val stream = FlumeUtils.createStream(ssc, "hadoop-senior.ibeifeng.com", 9999, StorageLevel.MEMORY_ONLY_SER_2)

// print how many Flume events were received in each batch
stream.count().map(cnt => "Received " + cnt + " flume event.").print()

ssc.start()
ssc.awaitTermination()
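The example above only counts events. To process the log contents themselves, the payload of each SparkFlumeEvent can be decoded from its body; here is a hedged word-count sketch (define it in place of the count()/print() line, before ssc.start(); variable names are illustrative):

// each element is a SparkFlumeEvent; the payload bytes are in event.getBody (a ByteBuffer)
val lines = stream.map(e => new String(e.event.getBody.array()))
// classic word count over each 5-second batch
val wordCounts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
wordCounts.print()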

6. Start Flume:

bin/flume-ng agent -c conf -n a2 -f conf/flume-spark-push.sh -Dflume.root.logger=DEBUG,console

7. Go to the /opt/datas/spark-flume directory and append data to the end of wctotal.log:

echo "hadoop spark hadoop" >> wctotal.log 

8. In the spark-shell window you should see output indicating that the newly appended content was picked up:

Received 1 flume event.
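To stop the job, note that ssc.awaitTermination() blocks the shell, so either interrupt it with Ctrl+C, or omit awaitTermination() in step 5 and stop the StreamingContext manually (a sketch; after stopping, a new StreamingContext must be created to restart the job):

// stop the streaming computation but keep spark-shell's SparkContext alive
ssc.stop(stopSparkContext = false, stopGracefully = true)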

 
