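Requirement: collect the logs on server A and ship them to server B in real time.
Description: as shown in the figure below.
Machine A corresponds to my hadoop01 virtual machine.
Machine B corresponds to my hadoop02 virtual machine.
Machine A monitors a log file. When data is written to that file on machine A, the avro sink sends the newly produced log lines to the hostname and port of the matching avro source, and the agent that owns the avro source prints the logs to the console (Kafka).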
Technology selection:
exec source + memory channel + avro sink (configured on machine A / hadoop01)
avro source + memory channel + logger sink (configured on machine B / hadoop02)
On hadoop01: vim exec-memory-avro.conf
# Name the components of this agent
exec-memory-avro.sources = exec-sources
exec-memory-avro.sinks = avro-sink
exec-memory-avro.channels = memory-channel
exec-memory-avro.sources.exec-sources.type = exec
exec-memory-avro.sources.exec-sources.command = tail -f /opt/bigdatas/flumedata.log
exec-memory-avro.sources.exec-sources.shell = /bin/sh -c
# Point the avro sink at hadoop02
exec-memory-avro.sinks.avro-sink.type = avro
exec-memory-avro.sinks.avro-sink.hostname = hadoop02
exec-memory-avro.sinks.avro-sink.port = 44444
exec-memory-avro.channels.memory-channel.type = memory
exec-memory-avro.sources.exec-sources.channels = memory-channel
exec-memory-avro.sinks.avro-sink.channel = memory-channel
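One practical detail the exec source depends on: `tail -f` exits with an error if the file it is asked to follow does not exist, so the log file should be in place before the agent starts. A minimal sketch, assuming a POSIX shell; the helper name `ensure_log_file` is hypothetical and not part of Flume:

```shell
# Hypothetical helper (not part of Flume): create the log file and its
# parent directory if they are missing, leaving existing content untouched.
ensure_log_file() {
  log_file=$1
  mkdir -p "$(dirname "$log_file")" && touch "$log_file"
}
```

On hadoop01 this could be called as `ensure_log_file /opt/bigdatas/flumedata.log` before the agent is started.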
On hadoop02: vim avro-memory-logger.conf
# Name the components of this agent
avro-memory-logger.sources = avro-sources
avro-memory-logger.sinks = logger-sink
avro-memory-logger.channels = memory-channel
avro-memory-logger.sources.avro-sources.type = avro
avro-memory-logger.sources.avro-sources.bind = hadoop02
avro-memory-logger.sources.avro-sources.port = 44444
avro-memory-logger.sinks.logger-sink.type = logger
avro-memory-logger.channels.memory-channel.type = memory
avro-memory-logger.sources.avro-sources.channels = memory-channel
avro-memory-logger.sinks.logger-sink.channel = memory-channel
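First start the avro-memory-logger agent on hadoop02 (run from $FLUME_HOME/bin), so that the avro source is listening before the avro sink on hadoop01 tries to connect: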
./flume-ng agent --name avro-memory-logger --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/avro-memory-logger.conf -Dflume.root.logger=INFO,console
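The avro sink on hadoop01 needs something listening on port 44444, which is why the hadoop02 agent goes first. One way to confirm that before moving on is to look for a listening socket in the output of `ss -ltn`. The sketch below assumes the `ss` utility is installed on hadoop02; `listening_on` is a hypothetical helper name, not a Flume command.

```shell
# Hypothetical helper: reads `ss -ltn` style output on stdin and succeeds
# only if some LISTEN socket has a local address ending in ":<port>".
listening_on() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
    END { exit !found }
  '
}
```

On hadoop02 it could be used as `ss -ltn | listening_on 44444 && echo "avro source is up"`.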
Then start the exec-memory-avro agent on hadoop01:
./flume-ng agent --name exec-memory-avro --conf $FLUME_HOME/conf --conf-file $FLUME_HOME/conf/exec-memory-avro.conf -Dflume.root.logger=INFO,console
Open another terminal on hadoop01 and write some data to the log file.
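Writing data to the log file can be as simple as appending a few lines to it. A minimal sketch, assuming a POSIX shell; `append_test_events` and the "test event N" line format are made up for illustration:

```shell
# Hypothetical helper: append COUNT numbered test lines (default 5) to the
# given log file, so that `tail -f` on that file picks them up.
append_test_events() {
  log_file=$1
  count=${2:-5}
  i=1
  while [ "$i" -le "$count" ]; do
    printf 'test event %s\n' "$i" >> "$log_file"
    i=$((i + 1))
  done
}
```

With the configuration above, the call on hadoop01 would be `append_test_events /opt/bigdatas/flumedata.log 5`.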
On hadoop02, the console shows the data flowing from the hadoop01 machine to the hadoop02 machine.