Spark Streaming and Flume Integration

Maven dependency:

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming-flume_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

There are two approaches to integrating Spark Streaming with Flume:
1. Flume-style Push-based Approach

Flume configuration:

avro-sink-agent.sources = netcat-source
avro-sink-agent.sinks = avro-sink
avro-sink-agent.channels = netcat-memory-channel

avro-sink-agent.sources.netcat-source.type = netcat
avro-sink-agent.sources.netcat-source.bind = localhost
avro-sink-agent.sources.netcat-source.port = 44444

avro-sink-agent.channels.netcat-memory-channel.type = memory

avro-sink-agent.sinks.avro-sink.type = avro
avro-sink-agent.sinks.avro-sink.hostname = localhost
avro-sink-agent.sinks.avro-sink.port = 41414

avro-sink-agent.sources.netcat-source.channels = netcat-memory-channel
avro-sink-agent.sinks.avro-sink.channel = netcat-memory-channel

Spark code:

package com.ruoze.spark.SparkStreaming_flume
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils
object ssf {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("ssf")
    val ssc = new StreamingContext(sparkConf, Seconds(10))
    // The hostname/port here must match the avro sink's hostname/port in the
    // Flume config: the Spark receiver binds to this address and Flume pushes to it
    val lines = FlumeUtils.createStream(ssc, "ruozehadoop000", 41414)
    // Business logic goes here; a WordCount in this example
    // SparkFlumeEvent ==> String
    lines.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(",")).map((_, 1)).reduceByKey(_ + _)
      .print()
    ssc.start()
    ssc.awaitTermination()
  }
}
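As a quick sanity check of the WordCount logic (split each line on commas, count each token), the same computation on one sample line can be sketched in plain shell. This is illustrative only and not part of the Spark job:

```shell
# Split a sample comma-separated line into tokens and count each one,
# mirroring the flatMap/map/reduceByKey chain above.
echo "hello,spark,hello" | tr ',' '\n' | sort | uniq -c
```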

In push mode, start the Spark application first, then Flume:
spark-submit --master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.3.0 \
--class com.ruoze.spark.SparkStreaming_flume.ssf \
/home/hadoop/lib/g3-spark-1.0.jar

flume-ng agent \
--name avro-sink-agent \
--conf $FLUME_HOME/conf \
--conf-file /home/hadoop/script/flume/flume_push_streaming.conf \
-Dflume.root.logger=INFO,console &
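Once both processes are up, test data can be pushed through the netcat source (assuming `nc` is installed; `telnet localhost 44444` works as well). This requires the running agent above, so it is a usage sketch rather than a standalone script:

```shell
# Send a comma-separated test line to the netcat source on port 44444;
# the word counts should appear in the Spark console within one 10s batch.
echo "hello,spark,hello" | nc localhost 44444
```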

2. Pull-based Approach using a Custom Sink (start Flume first, then Spark). This approach is generally preferred: the custom sink buffers events in the channel and Spark pulls them transactionally, which gives stronger reliability and fault-tolerance guarantees than push mode.

Flume configuration:

avro-sink-agent.sources = netcat-source
avro-sink-agent.sinks = spark-sink
avro-sink-agent.channels = netcat-memory-channel

avro-sink-agent.sources.netcat-source.type = netcat
avro-sink-agent.sources.netcat-source.bind = localhost
avro-sink-agent.sources.netcat-source.port = 44444

avro-sink-agent.channels.netcat-memory-channel.type = memory

avro-sink-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
avro-sink-agent.sinks.spark-sink.hostname = localhost
avro-sink-agent.sinks.spark-sink.port = 41414

avro-sink-agent.sources.netcat-source.channels = netcat-memory-channel
avro-sink-agent.sinks.spark-sink.channel = netcat-memory-channel

Compared with approach 1, the Spark code only needs a different API call; everything else stays the same:
val lines = FlumeUtils.createPollingStream(ssc, "ruozehadoop000", 41414)
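Note that pull mode also requires the custom sink and its dependencies on Flume's classpath before the agent starts, otherwise the `SparkSink` class cannot be loaded. A sketch of copying them into Flume's lib directory (jar versions and source paths are illustrative; use the ones matching your Spark and Scala versions):

```shell
# The SparkSink class and its dependencies must be visible to the Flume agent.
# Paths and versions below are assumptions; adjust to your environment.
cp spark-streaming-flume-sink_2.11-2.3.0.jar $FLUME_HOME/lib/
cp scala-library-2.11.8.jar $FLUME_HOME/lib/
cp commons-lang3-3.5.jar $FLUME_HOME/lib/
```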
