Official guide: http://spark.apache.org/docs/1.3.0/streaming-flume-integration.html
There are two integration approaches:
1. Flume pushes data to Spark Streaming (push-based)
2. Spark Streaming pulls data from Flume (pull-based, see the sketch below)
This walkthrough is based on approach 1.
A Flume agent has three components: source -> channel -> sink (here the avro sink pushes events to Spark Streaming)
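For contrast, approach 2 (pull) would configure a custom SparkSink (org.apache.spark.streaming.flume.sink.SparkSink, from the spark-streaming-flume-sink artifact) on the Flume agent, and the Spark side would poll it. A minimal receiver-side sketch with placeholder host/port, not used in the rest of this walkthrough:
import org.apache.spark.streaming._
import org.apache.spark.streaming.flume._
// poll the SparkSink running inside the Flume agent (placeholder host/port)
val ssc = new StreamingContext(sc, Seconds(5))
val pollingStream = FlumeUtils.createPollingStream(ssc, "flume-agent-host", 9988)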
1. In Flume's conf directory, add a file flume-spark-push.sh with the following contents:
# The configuration file needs to define the sources,
# the channels and the sinks.
# Sources, channels and sinks are defined per agent,
# in this case called 'a2'
#define agent name a2, and configuration
a2.sources = r2
a2.channels = c2
a2.sinks = k2
#define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /opt/datas/spark-flume/wctotal.log
a2.sources.r2.shell = /bin/bash -c
#define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100
#define sink
a2.sinks.k2.type = avro
a2.sinks.k2.hostname = hadoop-senior.ibeifeng.com
a2.sinks.k2.port = 9999
#define the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
2. In the /opt/datas/spark-flume directory, prepare the file wctotal.log with some seed data:
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
hadoop spark hadoop
3. Copy the dependency jars into Spark's externallibs directory. They come from two places: the first is the jar built from the Spark source tree, the other two are jars from the Flume installation's lib directory:
cp /opt/modules/spark-1.3.0-src/external/flume/target/spark-streaming-flume_2.10-1.3.0.jar /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/
cp /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/lib/flume-avro-source-1.5.0-cdh5.3.6.jar /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/
cp /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/lib/flume-ng-sdk-1.5.0-cdh5.3.6.jar /opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/
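As an optional alternative to copying jars by hand, spark-shell in this release should also accept --packages, which resolves the integration artifact from Maven (assuming the machine has network access); a sketch, not used in the steps below:
bin/spark-shell --packages org.apache.spark:spark-streaming-flume_2.10:1.3.0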
4. Run in local mode: start spark-shell and load the dependency jars with --jars; multiple jars are separated by commas, with no spaces (a note on local cores follows the command):
bin/spark-shell --jars \
/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/spark-streaming-flume_2.10-1.3.0.jar,\
/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/flume-avro-source-1.5.0-cdh5.3.6.jar,\
/opt/cdh-5.3.6/spark-1.3.0-bin-2.5.0-cdh5.3.6/externallibs/flume-ng-sdk-1.5.0-cdh5.3.6.jar
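One caveat for local runs: the Flume receiver occupies one core, so if you pass an explicit master, give it at least two threads (the spark-shell default usually has enough). For example:
bin/spark-shell --master local[2] --jars <same jar list as above>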
5. At the spark-shell prompt, enter the following commands:
import org.apache.spark._
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.streaming.flume._
import org.apache.spark.storage.StorageLevel
// 5-second batch interval, reusing spark-shell's SparkContext (sc)
val ssc = new StreamingContext(sc, Seconds(5))
// the receiver binds to the host/port that the Flume avro sink pushes to
val stream = FlumeUtils.createStream(ssc, "hadoop-senior.ibeifeng.com", 9999, StorageLevel.MEMORY_ONLY_SER_2)
stream.count().map(cnt => "Received " + cnt + " flume event.").print()
ssc.start()
ssc.awaitTermination()
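If you wanted an actual word count over the pushed lines (the file is named wctotal.log, after all) rather than just an event count, something like the following could be added before ssc.start(); this is a hypothetical extension that assumes each event body is a plain text line:
// decode each Flume event body to a string, then split into words and count
val words = stream.map(e => new String(e.event.getBody.array())).flatMap(_.split(" "))
val counts = words.map(word => (word, 1)).reduceByKey(_ + _)
counts.print()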
6. Start the Flume agent:
bin/flume-ng agent -c conf -n a2 -f conf/flume-spark-push.sh -Dflume.root.logger=DEBUG,console
7. Go to the /opt/datas/spark-flume directory and append data to the end of wctotal.log:
echo "hadoop spark hadoop" >> wctotal.log
8. In the spark-shell window you should see log output showing that the new content was picked up:
Received 1 flume event.
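If nothing shows up, one way to check that the Spark receiver is actually listening on port 9999 is to push a test file straight at it with Flume's avro-client (bypassing the exec source); this assumes the same Flume installation:
bin/flume-ng avro-client -H hadoop-senior.ibeifeng.com -p 9999 -F /opt/datas/spark-flume/wctotal.log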