1. Flume agent configuration
simple-agent.sources = netcat-source
simple-agent.sinks = spark-sink
simple-agent.channels = memory-channel

simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = bigdata
simple-agent.sources.netcat-source.port = 44444

simple-agent.sinks.spark-sink.type = org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname = bigdata
simple-agent.sinks.spark-sink.port = 41414

simple-agent.channels.memory-channel.type = memory

simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.spark-sink.channel = memory-channel
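Note that the pull-based SparkSink is not bundled with Flume, so the sink and its dependencies must be placed on Flume's classpath before the agent starts; otherwise the agent fails to load org.apache.spark.streaming.flume.sink.SparkSink. A minimal sketch, assuming Spark 2.2.0 with Scala 2.11 (the exact jar versions are assumptions and should match your installation):

```shell
# Copy the custom sink and its dependencies into Flume's lib directory.
# Versions below are assumptions -- align them with your Spark/Scala setup.
cp spark-streaming-flume-sink_2.11-2.2.0.jar $FLUME_HOME/lib/
cp scala-library-2.11.8.jar $FLUME_HOME/lib/
cp commons-lang3-3.5.jar $FLUME_HOME/lib/
```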
2. Spark Streaming application development
import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

object FlumePullWordCount {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: FlumePullWordCount <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setAppName("FlumePullWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Pull-based approach: Spark Streaming polls the Flume SparkSink for events
    val flumeStream = FlumeUtils.createPollingStream(ssc, hostname, port.toInt)

    // Each Flume event body is a byte buffer; decode it to a line, then word-count
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
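To compile the code above, the project's pom.xml needs the Spark Streaming Flume integration module. A sketch of the relevant dependencies, using the same 2.2.0 / Scala 2.11 coordinates that the spark-submit command below resolves via --packages:

```xml
<!-- Spark Streaming core (often marked provided when submitting to a cluster) -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
<!-- Flume integration providing FlumeUtils.createPollingStream -->
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.11</artifactId>
  <version>2.2.0</version>
</dependency>
```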
3. Local debugging
First start Flume (in the pull-based approach the agent must be running before the Spark application polls it):
flume-ng agent --name simple-agent --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/flume_pull_streaming.conf -Dflume.root.logger=INFO,console
Send data with telnet: telnet 192.168.254.128 44444
Start the Spark Streaming application and check the word-count output.
4. Server-side testing
Package the Spark Streaming application: mvn clean package -DskipTests
Copy the built jar to the server
Start Flume and telnet
Submit the application with spark-submit:
spark-submit \
--class com.zbw.spark.FlumePullWordCount \
--master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
/home/bigdata/lib/sparktrain-1.0.jar \
bigdata 41414
Type some data into the telnet session and check the word-count output.
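The --packages option above downloads the integration module from a Maven repository at submit time. On a server without internet access, an alternative sketch is to ship a pre-downloaded jar with --jars instead (the jar path is an assumption, and transitive dependencies that --packages would have resolved may also need to be listed):

```shell
# Offline alternative: pass the integration jar explicitly via --jars.
# The jar path below is an assumption -- point it at your downloaded copy.
spark-submit \
  --class com.zbw.spark.FlumePullWordCount \
  --master local[2] \
  --jars /home/bigdata/lib/spark-streaming-flume_2.11-2.2.0.jar \
  /home/bigdata/lib/sparktrain-1.0.jar \
  bigdata 41414
```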