Environment setup
Installation
Install flume-1.7.0
Main configuration
flume-env.sh
# Environment variables can be set here.
export JAVA_HOME=/home/user/soft/jdk1.8.0/
# Give Flume more memory and pre-allocate, enable remote monitoring via JMX
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
Start the agent
flume-ng agent --conf conf --conf-file /usr/local/flume/conf/*.conf --name a1 -Dflume.root.logger=INFO,console
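Here *.conf stands for the configuration file of the agent to start (for example netcat.conf in the case below). Give one concrete file name, because an unquoted glob is expanded by the shell into every matching file. --conf conf is a relative path that works because the command is run from the Flume installation root. The full command with absolute paths is: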
flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/*.conf --name a1 -Dflume.root.logger=INFO,console
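When the agent is started from a script rather than by hand, building the argument list with absolute paths removes the dependence on the working directory and on shell glob expansion. A minimal sketch, assuming the install root /usr/local/flume used above and a flume-ng launcher on PATH; flume_ng_command is a hypothetical name:

```python
from pathlib import PurePosixPath


def flume_ng_command(flume_home: str, conf_name: str, agent: str) -> list[str]:
    """Build the flume-ng argv with absolute paths (hypothetical helper)."""
    conf_dir = PurePosixPath(flume_home) / "conf"
    return [
        "flume-ng", "agent",
        "--conf", str(conf_dir),                    # directory holding flume-env.sh
        "--conf-file", str(conf_dir / conf_name),   # one concrete file, never a glob
        "--name", agent,                            # which agent in the file to start
        "-Dflume.root.logger=INFO,console",
    ]
```

Passing the list to subprocess.run(flume_ng_command("/usr/local/flume", "netcat.conf", "a1")) starts flume-ng without a shell, so nothing in the arguments gets expanded.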
Model
Examples
netcat
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
# Acts as the server: the address and port the source listens on
a1.sources.r1.bind = 192.168.56.101
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
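The last two lines are what connect the components: a source lists its channel(s) under the plural key channels, while a sink names exactly one channel under the singular key channel. The sketch below is a hypothetical helper, not part of Flume, that parses a simplified key = value agent configuration and reports wiring mistakes such as a sink bound to an undeclared channel. It ignores the rest of Flume's validation and the less common Java properties syntax (':' separators, line continuations).

```python
# Hypothetical helper, not part of Flume: checks that every source and sink of
# one agent is bound to a channel declared in <agent>.channels.


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple 'key = value' lines; blank lines and '#' comments are skipped.

    Simplification: ':' separators and backslash line continuations, which Java
    properties files also allow, are not handled.
    """
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def check_wiring(props: dict[str, str], agent: str) -> list[str]:
    """Return wiring problems for one agent; an empty list means none were found."""

    def names(kind: str) -> list[str]:
        # e.g. "a1.channels = c1" -> ["c1"]; component lists are space-separated
        return props.get(f"{agent}.{kind}", "").split()

    declared = set(names("channels"))
    problems: list[str] = []
    for src in names("sources"):
        # A source may feed several channels, hence the plural key.
        bound = props.get(f"{agent}.sources.{src}.channels", "").split()
        if not bound:
            problems.append(f"source {src} is not bound to any channel")
        problems += [f"source {src} uses undeclared channel {ch}"
                     for ch in bound if ch not in declared]
    for sink in names("sinks"):
        # A sink drains exactly one channel, hence the singular key.
        ch = props.get(f"{agent}.sinks.{sink}.channel")
        if ch is None:
            problems.append(f"sink {sink} is not bound to a channel")
        elif ch not in declared:
            problems.append(f"sink {sink} uses undeclared channel {ch}")
    return problems


# Example: the typo "a1.sinks.k1.channels = c1" (plural) is reported as
# "sink k1 is not bound to a channel".
```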
Run on the client (the agent started with the command below must already be running):
/usr/local/flume >telnet 192.168.56.101 44444
Trying 192.168.56.101...
Connected to 192.168.56.101.
Escape character is '^]'.
hello
OK
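telnet is only a convenient TCP client, so the same exchange can be scripted. A minimal sketch, assuming the NetCat source's default ack-every-event = true, under which the source writes one reply line (OK) for every line it accepts; send_lines is a hypothetical name, and the host and port are the ones from netcat.conf:

```python
import socket


def send_lines(host: str, port: int, lines: list[str], timeout: float = 5.0) -> list[str]:
    """Send each line to a Flume NetCat source and return the reply read after it.

    Assumes ack-every-event=true (the NetCat source default), so the source
    answers every accepted line with one reply line.
    """
    replies: list[str] = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8") as reader:
            for line in lines:
                # Terminate with LF only; telnet sends CR LF, and the CR then
                # becomes part of the event body.
                sock.sendall((line + "\n").encode("utf-8"))
                replies.append(reader.readline().rstrip("\r\n"))
    return replies


# Example against the agent above (requires it to be running):
#   send_lines("192.168.56.101", 44444, ["hello"])
```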
Agent side, command and console output:
flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/netcat.conf --name a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /usr/local/flume/conf/flume-env.sh
Info: Including Hive libraries found via () for Hive access
+ exec /home/user/soft/jdk1.8.0//bin/java -Xms100m -Xmx2000m -Dcom.sun.management.jmxremote -Dflume.root.logger=INFO,console -cp '/usr/local/flume/conf:/usr/local/flume/lib/*:/lib/*' -Djava.library.path= org.apache.flume.node.Application --conf-file /usr/local/flume/conf/netcat.conf --name a1
2017-05-02 11:11:36,454 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2017-05-02 11:11:36,458 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:134)] Reloading configuration file:/usr/local/flume/conf/netcat.conf
2017-05-02 11:11:36,462 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: k1 Agent: a1
2017-05-02 11:11:36,462 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2017-05-02 11:11:36,462 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:k1
2017-05-02 11:11:36,471 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [a1]
2017-05-02 11:11:36,471 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2017-05-02 11:11:36,478 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory
2017-05-02 11:11:36,481 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel c1
2017-05-02 11:11:36,481 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source r1, type netcat
2017-05-02 11:11:36,487 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: logger
2017-05-02 11:11:36,489 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel c1 connected to [r1, k1]
2017-05-02 11:11:36,495 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:137)] Starting new configuration:{ sourceRunners:{r1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:r1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@39496a6d counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2017-05-02 11:11:36,502 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:144)] Starting Channel c1
2017-05-02 11:11:36,506 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2017-05-02 11:11:36,506 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started
2017-05-02 11:11:36,508 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:171)] Starting Sink k1
2017-05-02 11:11:36,508 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:182)] Starting Source r1
2017-05-02 11:11:36,508 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2017-05-02 11:11:36,525 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:169)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.56.101:44444]
2017-05-02 11:11:58,541 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:95)] Event: { headers:{} body: 68 65 6C 6C 6F 0D hello. }
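The last log line is the event produced by typing hello in telnet. telnet ends each line with CR LF, while the NetCat source splits on LF only, so the carriage return (0x0D) stays in the body. The logger sink then prints the body as hex bytes followed by a printable rendering in which non-printable bytes appear as a dot. The sketch below roughly reproduces that rendering; the function names are hypothetical, and the 16-byte cap mirrors the logger sink's default maxBytesToLog.

```python
def split_netcat_lines(data: bytes) -> list[bytes]:
    """Split received bytes into event bodies, ending each line at LF only.

    A telnet line ends with CR LF, so each body keeps its trailing CR (0x0D).
    Bytes after the last LF (an unfinished line) are not returned.
    """
    *complete, _unfinished = data.split(b"\n")
    return complete


def dump_body(body: bytes, max_bytes: int = 16) -> str:
    """Approximate the logger sink's rendering: hex bytes, then ASCII with '.' for non-printables."""
    shown = body[:max_bytes]  # longer bodies are cut off, like maxBytesToLog
    hex_part = " ".join(f"{b:02X}" for b in shown)
    text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in shown)
    return f"{hex_part} {text_part}"


# What typing "hello" + Enter in telnet produces:
body = split_netcat_lines(b"hello\r\n")[0]
print(dump_body(body))  # 68 65 6C 6C 6F 0D hello.
```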
Distributed log collection with Avro
Reference article: http://blog.csdn.net/alphags/article/details/52862578
log.conf configuration
# Agent a1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Source configuration
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/user/logs/app.log
# Sink configuration: address of the collector server
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=1.1.1.1
a1.sinks.k1.port=4545
# Channel configuration
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
#Agent a2
a2.sources=r2
a2.sinks=k2
a2.channels=c2
# a2 source configuration: acts as the server
a2.sources.r2.type=avro
a2.sources.r2.bind=1.1.1.1
a2.sources.r2.port=4545
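# a2 sink configuration: write the merged log data to the /data/local/collector directory
# The logger sink truncates long event bodies (by default only the first 16 bytes are logged), so file_roll is used instead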
#a2.sinks.k2.type = logger
a2.sinks.k2.type = file_roll
a2.sinks.k2.sink.directory = /data/local/collector
a2.sinks.k2.sink.rollInterval=3600
# a2 channel configuration
#a2.channels.c2.type = memory
#a2.channels.c2.capacity = 1000
#a2.channels.c2.transactionCapacity = 100
a2.channels.c2.type = file
a2.channels.c2.checkpointDir=/data/local/channels/checkpoint
a2.channels.c2.dataDirs = /data/local/channels/data
# Bind the source and sink to the channel
a2.sources.r2.channels=c2
a2.sinks.k2.channel=c2
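Agents a1 and a2 are connected only through the Avro address: each a1 avro sink sends to hostname:port, and the a2 avro source listens on bind:port, so the two must agree (1.1.1.1:4545 in this file). The hypothetical check below compares them on properties already loaded into a dict of strings. It does no DNS resolution, so a host name and an IP address for the same machine would be reported as a mismatch.

```python
# Hypothetical helper, not part of Flume.
def avro_link_ok(props: dict[str, str], sender: str, sink: str,
                 receiver: str, source: str) -> bool:
    """Check that <sender>.sinks.<sink> targets the address <receiver>.sources.<source> listens on."""
    host = props.get(f"{sender}.sinks.{sink}.hostname")
    port = props.get(f"{sender}.sinks.{sink}.port")
    bind = props.get(f"{receiver}.sources.{source}.bind")
    listen_port = props.get(f"{receiver}.sources.{source}.port")
    if None in (host, port, bind, listen_port):
        return False  # a required key is missing
    # 0.0.0.0 means "all interfaces", so then only the port has to match.
    # Strings are compared literally: no DNS lookup is done.
    return port == listen_port and bind in ("0.0.0.0", host)


log_conf = {
    "a1.sinks.k1.hostname": "1.1.1.1",
    "a1.sinks.k1.port": "4545",
    "a2.sources.r2.bind": "1.1.1.1",
    "a2.sources.r2.port": "4545",
}
print(avro_link_ok(log_conf, "a1", "k1", "a2", "r2"))  # True
```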
On each log source server, run:
flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/log.conf --name a1 -Dflume.root.logger=INFO,console
On the log collector server, run:
flume-ng agent --conf /usr/local/flume/conf --conf-file /usr/local/flume/conf/log.conf --name a2 -Dflume.root.logger=INFO,console
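Both commands load the same log.conf; the --name argument decides which agent is built, and only the properties whose key starts with that name followed by a dot belong to it. The hypothetical helper below imitates that selection on an already-parsed dict of properties, to show why one file can describe both the senders (a1) and the collector (a2). Flume's own parsing and validation do considerably more than this.

```python
# Hypothetical helper, not part of Flume.
def agent_properties(props: dict[str, str], agent: str) -> dict[str, str]:
    """Keep only the keys of one agent and strip the '<agent>.' prefix.

    The trailing dot matters: agent 'a' must not pick up keys of agent 'a1'.
    """
    prefix = agent + "."
    return {key[len(prefix):]: value
            for key, value in props.items() if key.startswith(prefix)}


log_conf = {
    "a1.sources.r1.type": "exec",
    "a1.sinks.k1.type": "avro",
    "a2.sources.r2.type": "avro",
    "a2.sinks.k2.type": "file_roll",
}
print(agent_properties(log_conf, "a2"))
# {'sources.r2.type': 'avro', 'sinks.k2.type': 'file_roll'}
```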
The collected log files:
-rw-r--r-- 1 user dev 55022 Apr 28 12:19 1493350131057-1
-rw-r--r-- 1 user dev 52214 Apr 28 13:19 1493350131057-2
-rw-r--r-- 1 user dev 28611 Apr 28 14:04 1493350131057-3
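The file names follow the pattern epoch-milliseconds, a dash, then a counter: the file_roll sink records a millisecond timestamp when it is set up and increments the counter each time it rolls to a new file, here every rollInterval = 3600 seconds. 1493350131057 is 2017-04-28 03:28:51.057 UTC (11:28:51 in UTC+8). A small decoder, assuming no file prefix or extension is configured; parse_roll_name is a hypothetical name:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_roll_name(name: str) -> tuple[datetime, int]:
    """Split a file_roll name '<epoch-millis>-<counter>' into (UTC start time, counter)."""
    millis, counter = name.rsplit("-", 1)
    # timedelta keeps millisecond precision exactly (no float rounding)
    return EPOCH + timedelta(milliseconds=int(millis)), int(counter)


# Example:
#   parse_roll_name("1493350131057-2")
#   -> (datetime(2017, 4, 28, 3, 28, 51, 57000, tzinfo=timezone.utc), 2)
```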