Push mode
1. Write the code (using word count as an example)
First, add the Maven dependency:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
Then write the business logic (word count):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TestSp extends App {
  // Create a Spark StreamingContext with a 5-second batch interval
  private val sparkConf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("StreamingDemo2")
  private val ssc = new StreamingContext(sparkConf, Seconds(5))
  // Push mode: Flume's Avro sink pushes events to this receiver on hadoopt:55555
  private val flumeStream: ReceiverInputDStream[SparkFlumeEvent] = FlumeUtils.createStream(ssc, "hadoopt", 55555)
  flumeStream.map(x => new String(x.event.getBody.array()).trim)
    .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
  ssc.start()
  ssc.awaitTermination()
}
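The DStream chain above is the classic word count. Its core transformation can be checked in plain Scala on a local collection, without Spark running at all (a standalone sketch; the object name, helper function, and sample strings are made up for illustration, with `groupBy` standing in for `reduceByKey`):

```scala
// Standalone check of the same word-count pipeline on a plain Scala
// collection; each string stands in for the body of one Flume event.
object WordCountSketch {
  def countWords(events: Seq[String]): Map[String, Int] =
    events
      .map(_.trim)                 // mirrors new String(...getBody.array()).trim
      .flatMap(_.split(" "))       // split each event body into words
      .map((_, 1))                 // pair each word with a count of 1
      .groupBy(_._1)               // local stand-in for reduceByKey(_ + _)
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit =
    println(countWords(Seq("hello spark ", "hello flume")).toList.sortBy(_._1))
    // prints List((flume,1), (hello,2), (spark,1))
}
```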
Then package the Scala file into a jar.
2. Write the Flume configuration file (on the Linux VM)
agent.sources = s1
agent.channels = c1
agent.sinks = sk1
# Source type is netcat; it uses channel c1
agent.sources.s1.type = netcat
agent.sources.s1.bind = hadoopt
agent.sources.s1.port = 44444
agent.sources.s1.channels = c1
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
# The Avro sink pushes data to Spark on port 55555
# (matches the push-mode createStream above)
agent.sinks.sk1.type = avro
agent.sinks.sk1.hostname = hadoopt
agent.sinks.sk1.port = 55555
agent.sinks.sk1.channel = c1
3. Test
From the directory containing the jar, run:
spark-submit --class spStreaming1.test1.TestSp spStreamingPractice-1.0-SNAPSHOT.jar
spStreaming1.test1.TestSp is the fully qualified class name; spStreamingPractice-1.0-SNAPSHOT.jar is the jar file name. (If the spark-streaming-flume classes are not bundled into the jar, you may need to add --packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 to spark-submit.)
Then open another window, go to the directory containing the Flume configuration file, and run:
flume-ng agent -f spark-streaming-flume.conf -n agent
spark-streaming-flume.conf is the configuration file written above.
Open one more window and run:
telnet hadoopt 44444
Type some words separated by spaces; the word counts then appear in the first window's output.
At this point, the push-mode test succeeds.
Poll mode
1. Add the same Maven dependency as in push mode
Then write the business logic (word count):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.dstream.ReceiverInputDStream
import org.apache.spark.streaming.flume.{FlumeUtils, SparkFlumeEvent}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object TestPoll extends App {
  // Create a Spark StreamingContext with a 5-second batch interval
  private val sparkConf: SparkConf = new SparkConf().setMaster("local[2]").setAppName("StreamingDemo2")
  private val ssc = new StreamingContext(sparkConf, Seconds(5))
  // Poll mode: Spark pulls events from Flume's SparkSink on hadoopt:55555
  private val flumePollStream: ReceiverInputDStream[SparkFlumeEvent] = FlumeUtils.createPollingStream(ssc, "hadoopt", 55555)
  flumePollStream.map(x => new String(x.event.getBody.array()).trim)
    .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()
  ssc.start()
  ssc.awaitTermination()
}
Package it into a jar the same way and upload it to the Linux VM.
2. Write the Flume configuration file (on the Linux VM)
agent.sources = s1
agent.channels = c1
agent.sinks = sk1
# Source type is netcat; it uses channel c1
agent.sources.s1.type = netcat
agent.sources.s1.bind = hadoopt
agent.sources.s1.port = 44444
agent.sources.s1.channels = c1
# SparkSink: requires spark-streaming-flume-sink_2.11-x.x.x.jar in Flume's lib directory
agent.sinks.sk1.type = org.apache.spark.streaming.flume.sink.SparkSink
agent.sinks.sk1.hostname = hadoopt
agent.sinks.sk1.port = 55555
agent.sinks.sk1.channel = c1
# Channel settings: memory channel
agent.channels.c1.type = memory
agent.channels.c1.capacity = 1000
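Before starting this agent, the custom SparkSink class and its dependencies must be on Flume's classpath, or the agent will fail to load the sink. A sketch of the required copies (the exact jar versions, source paths, and $FLUME_HOME are assumptions; match them to your Spark and Scala versions):

```shell
# Copy the custom SparkSink and its dependencies into Flume's lib directory.
# Versions and paths below are examples, not the only valid ones.
cp spark-streaming-flume-sink_2.11-2.2.0.jar "$FLUME_HOME/lib/"
cp scala-library-2.11.8.jar "$FLUME_HOME/lib/"
cp commons-lang3-3.5.jar "$FLUME_HOME/lib/"
```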
3. Test
The test procedure is almost the same as for push mode, but the startup order differs: start the Flume agent first (so the SparkSink is up before Spark polls it), then start the Spark Streaming job, and finally telnet to port 44444 and send some data.
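The poll-mode test therefore boils down to three windows, run in this order (a sketch; the config file name spark-streaming-flume-poll.conf and the TestPoll package path are assumptions based on the push-mode example):

```shell
# Window 1: start the Flume agent first, so the SparkSink is listening on 55555
flume-ng agent -f spark-streaming-flume-poll.conf -n agent

# Window 2: then submit the Spark Streaming job, which polls the sink
spark-submit --class spStreaming1.test1.TestPoll spStreamingPractice-1.0-SNAPSHOT.jar

# Window 3: finally, connect and type space-separated words
telnet hadoopt 44444
```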