Approach 1: Spark Streaming integrated with Flume (push mode)
Flume uses a netcat-memory-avro pipeline (netcat source -> memory channel -> avro sink).
Local test
1. Start the Spark Streaming application locally, listening on 0.0.0.0:10000.
2. Start the Flume agent on the server.
3. Use telnet to send data to the netcat source port, and watch the output in the local IDEA console.
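The receiver class used below (com.tuzhihai.flumespark.FlumePushSpark) is not shown in these notes; a minimal sketch of what it might look like, using the push-based FlumeUtils.createStream API from spark-streaming-flume, is given here. The word-count processing is an assumption for illustration, not the original code:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.flume.FlumeUtils

// Push-mode receiver sketch: createStream starts an Avro server on
// hostname:port and waits for Flume's avro sink to push events to it.
object FlumePushSpark {
  def main(args: Array[String]): Unit = {
    if (args.length != 2) {
      System.err.println("Usage: FlumePushSpark <hostname> <port>")
      System.exit(1)
    }
    val Array(hostname, port) = args

    val sparkConf = new SparkConf().setAppName("FlumePushSpark")
    val ssc = new StreamingContext(sparkConf, Seconds(5))

    // The Flume avro sink must point at this same hostname:port.
    val flumeStream = FlumeUtils.createStream(ssc, hostname, port.toInt)

    // Assumed processing: a simple word count over the event bodies.
    flumeStream
      .map(event => new String(event.event.getBody.array()).trim)
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because createStream binds a listening socket, this application must be running before Flume starts, which is why the startup order below matters.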
Server test
Package with Maven: mvn clean package -DskipTests
Upload the jar to the server.
Start Spark first:
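The mvn clean package step assumes the project declares the Spark Streaming and Flume-integration dependencies; a plausible pom.xml fragment (versions inferred from the --packages coordinate used below) might be:

```xml
<!-- Assumed dependencies; versions match the --packages coordinate -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-flume_2.11</artifactId>
    <version>2.2.0</version>
</dependency>
```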
spark-submit \
--class com.tuzhihai.flumespark.FlumePushSpark \
--master local[2] \
--packages org.apache.spark:spark-streaming-flume_2.11:2.2.0 \
192.168.145.128 10000
Then start Flume:
flume-ng agent \
--name netcat-memory-avro \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/netcat-memory-avro.conf \
-Dflume.root.logger=INFO,console
Send data to the netcat source port (9999, as configured below):
telnet 192.168.145.128 9999
Watch the Flume console output.
Why does push mode require starting Spark first and Flume second?
Because this uses Flume push: Flume pushes data to a receiver, and that receiver must already exist before anything can be pushed to it. So start the Spark application (the Avro server that receives the data) first, then start Flume (the tool that collects and pushes the data).
netcat-memory-avro.conf (the Flume config used above):
# example netcat-memory-avro
netcat-memory-avro.sources = netcat-source
netcat-memory-avro.sinks = avro-sink
netcat-memory-avro.channels = memory-channel
# Describe/configure the source
netcat-memory-avro.sources.netcat-source.type = netcat
netcat-memory-avro.sources.netcat-source.bind = 192.168.145.128
netcat-memory-avro.sources.netcat-source.port = 9999
# Describe/configure the sink
netcat-memory-avro.sinks.avro-sink.type = avro
netcat-memory-avro.sinks.avro-sink.hostname = 192.168.145.128
netcat-memory-avro.sinks.avro-sink.port = 10000
# Use a channel which buffers events in memory
netcat-memory-avro.channels.memory-channel.type = memory
# Bind the source and sink to the channel
netcat-memory-avro.sources.netcat-source.channels = memory-channel
netcat-memory-avro.sinks.avro-sink.channel = memory-channel