Goal
Collect the logs on server A and ship them to server B in real time.
Technology Selection
Version Notes
flume:apache-flume-1.6.0-cdh5.15.1-bin
kafka:kafka_2.11-0.9.0.0
Configuration Files
Flume configuration file on server A: exec_memory_avro.conf
exec_memory_avro.sources = exec_source
exec_memory_avro.sinks = avro_sink
exec_memory_avro.channels = memory_channel
exec_memory_avro.sources.exec_source.type = exec
exec_memory_avro.sources.exec_source.command = tail -F /data/log/access_10000.log
exec_memory_avro.sources.exec_source.shell = /bin/sh -c
exec_memory_avro.sinks.avro_sink.type = avro
exec_memory_avro.sinks.avro_sink.hostname = hlsijx
exec_memory_avro.sinks.avro_sink.port = 44444
exec_memory_avro.channels.memory_channel.type = memory
exec_memory_avro.sources.exec_source.channels = memory_channel
exec_memory_avro.sinks.avro_sink.channel = memory_channel
Flume configuration file on server B: avro_memory_kafka.conf
avro_memory_kafka.sources = avro_source
avro_memory_kafka.sinks = kafka_sink
avro_memory_kafka.channels = memory_channel
avro_memory_kafka.sources.avro_source.type = avro
avro_memory_kafka.sources.avro_source.bind = hlsijx
avro_memory_kafka.sources.avro_source.port = 44444
avro_memory_kafka.sinks.kafka_sink.type = org.apache.flume.sink.kafka.KafkaSink
avro_memory_kafka.sinks.kafka_sink.kafka.bootstrap.servers = hlsijx:9092
avro_memory_kafka.sinks.kafka_sink.kafka.topic = hello-spark
avro_memory_kafka.sinks.kafka_sink.flumeBatchSize = 3
avro_memory_kafka.channels.memory_channel.type = memory
avro_memory_kafka.sources.avro_source.channels = memory_channel
avro_memory_kafka.sinks.kafka_sink.channel = memory_channel
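The Kafka sink above writes to the topic hello-spark on the broker at hlsijx:9092. If the broker is not set up to create topics automatically, the topic has to exist before the agent on server B starts. Below is a minimal sketch of creating it with the Kafka 0.9 command-line tools. The ZooKeeper address hlsijx:2181 and the KAFKA_HOME variable are assumptions (neither is stated in this post), and the function names are hypothetical helpers.

```shell
# Assumptions: ZooKeeper address and KAFKA_HOME are not given in this post.
ZK_CONNECT="${ZK_CONNECT:-hlsijx:2181}"
TOPIC="${TOPIC:-hello-spark}"

# Create the topic with one partition and one replica,
# which is enough for a single-broker test setup.
create_topic() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --create \
    --zookeeper "$ZK_CONNECT" \
    --replication-factor 1 \
    --partitions 1 \
    --topic "$TOPIC"
}

# List all topics to confirm that the new topic exists.
list_topics() {
  "$KAFKA_HOME/bin/kafka-topics.sh" --list --zookeeper "$ZK_CONNECT"
}
```

Running create_topic once and then list_topics on the Kafka host should show hello-spark in the list.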
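Testing
1. Start the agent on server B (B must be started first, because its Avro source listens on port 44444, which the Avro sink on A connects to)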
flume-ng agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/avro_memory_kafka.conf \
--name avro_memory_kafka \
-Dflume.root.logger=INFO,console
2. Start the agent on server A
flume-ng agent \
--conf $FLUME_HOME/conf \
--conf-file $FLUME_HOME/conf/exec_memory_avro.conf \
--name exec_memory_avro \
-Dflume.root.logger=INFO,console
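3. Open a new terminal window on server A and, from /data/log (the directory of the file tailed by the exec source), run echo hello flume >> access_10000.log
With the Flume agents and Kafka both running, the consumed message can then be seen on server B, for example in a Kafka consumer subscribed to hello-spark.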
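The check in step 3 can be sketched as two small helpers: one appends several test lines on server A, the other attaches a console consumer to the topic on server B. This is only a sketch. The ZooKeeper address hlsijx:2181 and the KAFKA_HOME variable are assumptions that do not appear above, and both function names are hypothetical.

```shell
# Log file tailed by the exec source on server A (from exec_memory_avro.conf).
LOG_FILE="${LOG_FILE:-/data/log/access_10000.log}"
# Assumptions: ZooKeeper address and KAFKA_HOME are not given in this post.
ZK_CONNECT="${ZK_CONNECT:-hlsijx:2181}"
TOPIC="${TOPIC:-hello-spark}"

# Server A: append N numbered test lines to the tailed log file.
append_test_lines() {
  count="$1"
  i=1
  while [ "$i" -le "$count" ]; do
    echo "hello flume $i" >> "$LOG_FILE"
    i=$((i + 1))
  done
}

# Server B: print every message on the topic, starting from the oldest one.
consume_topic() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --zookeeper "$ZK_CONNECT" \
    --topic "$TOPIC" \
    --from-beginning
}
```

Start consume_topic on server B first, then call append_test_lines 5 on server A. If the pipeline is wired correctly, the appended lines should appear in the consumer window.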