1. Flume agent configuration
The agent below wires a netcat source to an Avro sink through a memory channel. This is the push-based approach: Flume pushes events to a receiver that the Spark Streaming job runs as an Avro server.
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = bigdata
simple-agent.sources.netcat-source.port = 44444
simple-agent.sinks.avro-sink.type = avro
simple-agent.sinks.avro-sink.hostname = bigdata
simple-agent.sinks.avro-sink.port = 41414
simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
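Assuming the configuration above is saved as simple-agent.conf under $FLUME_HOME/conf (the file name and paths are assumptions), the agent could be launched roughly like this:

```shell
# Launch the agent named simple-agent; config path and file name are assumptions
flume-ng agent \
  --name simple-agent \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/simple-agent.conf \
  -Dflume.root.logger=INFO,console
```

The --name value must match the prefix used in the properties file (simple-agent), otherwise Flume starts an empty agent.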
2. Developing the Spark Streaming application
import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
object FlumePushWordCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("FlumePushWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Push-based receiver: Spark listens on bigdata:41414 for Avro events
    // pushed by the Flume avro sink configured above
    val flumeStream = FlumeUtils.createStream(ssc, "bigdata", 41414)

    // Decode each Flume event body to a string, then word-count each 5s batch
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
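FlumeUtils lives in the spark-streaming-flume module, which is not part of Spark core. A sketch of the Maven dependency (the Scala suffix and version shown are assumptions and must match your Spark build; note this module was removed in Spark 3.x):

```xml
<!-- Version and Scala suffix are assumptions; align them with your Spark version -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.11</artifactId>
  <version>2.4.8</version>
</dependency>
```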
3. Local debugging
In the Spark Streaming application, change bigdata to the local bind address 0.0.0.0 so the receiver accepts connections on any interface.
In the Flume configuration, change the sink's hostname to the local IP.
These two changes are only for testing locally.
Then: (1) start the Spark Streaming job first, since it is the Avro server that Flume pushes to;
(2) start the Flume agent;
(3) feed data with telnet localhost 44444 and watch the results on the console.
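Step (3) might look like the following session (the input line is illustrative):

```shell
$ telnet localhost 44444
hello spark hello flume
```

For that input, the next 5-second batch on the Spark console would print pairs such as (hello,2), (spark,1), (flume,1).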
4. Testing in the server environment
Package the local project with Maven: mvn clean package -DskipTests
Copy the resulting jar to the server.
Submit the Spark job with spark-submit.
Start the Flume agent.
Feed data through telnet.
Observe the results on the server.
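A possible spark-submit invocation for the steps above (the jar name and the --packages coordinate are assumptions that must match your build; the class name is from section 2):

```shell
# Jar name and package version are assumptions
spark-submit \
  --class FlumePushWordCount \
  --master local[2] \
  --packages org.apache.spark:spark-streaming-flume_2.11:2.4.8 \
  flume-push-wordcount-1.0.jar
```

Note that the application hardcodes setMaster("local[2]"); for cluster submission it is usual to drop that call from the code and pass --master on the command line instead.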