[Reference: http://spark.apache.org/docs/2.1.0/streaming-flume-integration.html]
1. Environment
Spark 2.1.0
Flume 1.6.0
2. Flume configuration file flume_push_streaming.conf
(1) Flume's job here is to push data from the server to a port on the local Windows machine
(2) 192.168.57.1 is the IP of the local Windows machine
simple-agent.sources = netcat-source
simple-agent.sinks = avro-sink
simple-agent.channels = memory-channel
simple-agent.sources.netcat-source.type = netcat
simple-agent.sources.netcat-source.bind = hadoop
simple-agent.sources.netcat-source.port = 44444
simple-agent.sinks.avro-sink.type = avro
simple-agent.sinks.avro-sink.hostname = 192.168.57.1
simple-agent.sinks.avro-sink.port = 41414
simple-agent.channels.memory-channel.type = memory
simple-agent.sources.netcat-source.channels = memory-channel
simple-agent.sinks.avro-sink.channel = memory-channel
3. Scala code
(1) Dependency
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-flume_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
(2) Code
package _0918MukeSpark

import org.apache.spark.SparkConf
import org.apache.spark.streaming.flume.FlumeUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}

/**
 * Spark Streaming integration with Flume, approach 1 (push-based)
 */
object FlumePushWordCount_product {
  def main(args: Array[String]): Unit = {
    // For production use: read the bind host and port from the arguments
    if (args.length != 2) {
      System.err.println("Usage: FlumePushWordCount_product <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setMaster("local[2]").setAppName("FlumePushWordCount")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // Create a push-based stream that listens for events from Flume's Avro sink
    // val flumeStream = FlumeUtils.createStream(ssc, "0.0.0.0", 41414)
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    // Decode each event body, split into words, and count per 5-second batch
    flumeStream.map(x => new String(x.event.getBody.array()).trim)
      .flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
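The per-batch transformation above can be sketched on a plain Scala collection (a local illustration only, not Spark code; the sample event bodies are made up): each Flume event body arrives as a byte buffer, is decoded to a string, trimmed, split on spaces, and counted.

```scala
import java.nio.ByteBuffer

object WordCountSketch {
  // Simulated Flume event bodies, as the Avro sink would deliver them
  val bodies: Seq[ByteBuffer] =
    Seq("hello spark", "hello flume").map(s => ByteBuffer.wrap(s.getBytes("UTF-8")))

  // Mirrors the DStream pipeline: decode body, trim, split, map to (word, 1), sum
  def wordCount(events: Seq[ByteBuffer]): Map[String, Int] =
    events
      .map(b => new String(b.array(), "UTF-8").trim)
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (w, ws) => (w, ws.size) }

  def main(args: Array[String]): Unit =
    // Counts: hello -> 2, spark -> 1, flume -> 1
    println(wordCount(bodies))
}
```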
4. Testing
(1) Start the FlumePushWordCount code first (in push mode Flume's Avro sink pushes to the Spark application, so the receiver must already be running)
Click the triangle next to Run -> Edit Configurations -> Program arguments, and enter: 0.0.0.0 41414
(2) Start Flume
bin/flume-ng agent \
--name simple-agent \
--conf conf \
--conf-file conf/flume_push_streaming.conf \
-Dflume.root.logger=INFO,console
(3) Send test data via telnet
[root@bigdata /]# telnet hadoop 44444
Trying 192.168.31.3...
Connected to hadoop.
Escape character is '^]'.
fe
OK
sef