1. Installation: follow the official documentation
https://ccp.cloudera.com/display/CDH4DOC/Flume+Installation
2. Configuration file
conf/flume.conf (the file that the agent loads in step 5):
agent1.sources = log01
agent1.sinks = hdfs01
agent1.channels = momery01
agent1.sources.log01.type = exec
agent1.sources.log01.command = tail -F /home/bi/data-integration/biserver-ce/tomcat/logs/catalina.out
agent1.sources.log01.channels = momery01
agent1.sinks.hdfs01.type = hdfs
agent1.sinks.hdfs01.hdfs.path = hdfs://hdfs:8020/flume/webdata
agent1.sinks.hdfs01.channel = momery01
agent1.channels.momery01.type = memory
agent1.channels.momery01.capacity = 1000
agent1.channels.momery01.transactionCapacity = 100
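
Because the channel name (here spelled momery01) has to match in three places, a quick consistency check can catch a mismatch before the agent is started. The following is a minimal sketch, not part of Flume: check_channels is a hypothetical helper that prints any channel name referenced by a source or sink but missing from the "agent1.channels" declaration. It assumes plain "key = value" lines and space-separated channel lists.

```shell
#!/bin/sh
# check_channels FILE AGENT  (hypothetical helper, not a Flume command)
# Prints every channel name that a source or sink of AGENT refers to
# but that is not listed in "AGENT.channels = ...".
# Exit status is 0 when all references resolve, 1 otherwise.
check_channels() {
    conf_file=$1
    agent=$2
    awk -v agent="$agent" '
        # Skip comment lines and lines without "=".
        /^[ \t]*#/ || !/=/ { next }
        {
            # Split at the first "=" and strip surrounding blanks.
            key = $0; sub(/=.*/, "", key); gsub(/[ \t]/, "", key)
            val = $0; sub(/^[^=]*=/, "", val); gsub(/^[ \t]+|[ \t]+$/, "", val)
        }
        # Declaration: agent.channels = c1 c2 ...
        key == agent ".channels" {
            n = split(val, names, /[ \t]+/)
            for (i = 1; i <= n; i++) declared[names[i]] = 1
            next
        }
        # References: agent.sources.X.channels and agent.sinks.X.channel
        index(key, agent ".sources.") == 1 && key ~ /\.channels$/ ||
        index(key, agent ".sinks.") == 1 && key ~ /\.channel$/ {
            n = split(val, names, /[ \t]+/)
            for (i = 1; i <= n; i++) used[names[i]] = 1
        }
        END {
            for (name in used)
                if (!(name in declared)) { print name; bad = 1 }
            exit bad
        }
    ' "$conf_file"
}
```

For example, "check_channels conf/flume.conf agent1" prints nothing for the configuration above, since every reference uses the same spelling as the declaration.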
3. Start the flume-ng-agent service
(Note: the service may not come up on the first attempt; if so, start it a second time.)
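
If the second start has to be scripted, a small retry wrapper is one option. This is only a sketch: retry is a hypothetical helper, and the commented invocation assumes the CDH package installed an init script named flume-ng-agent that can be driven with the service command.

```shell
#!/bin/sh
# retry MAX DELAY CMD...  (hypothetical helper)
# Runs CMD until it succeeds or MAX attempts have been made,
# sleeping DELAY seconds between attempts.
retry() {
    max=$1; delay=$2; shift 2
    attempt=1
    while :; do
        "$@" && return 0                      # success: stop retrying
        [ "$attempt" -ge "$max" ] && return 1 # attempts exhausted
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Assumed usage, two attempts five seconds apart:
# retry 2 5 sudo service flume-ng-agent start
```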
4. Switch to the hdfs user
$ su hdfs
5. Run the agent to start collecting
$ bin/flume-ng agent -c conf -f conf/flume.conf -n agent1
...
12/11/28 16:03:53 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 1
12/11/28 16:03:53 INFO node.FlumeNode: Flume node starting - agent1
12/11/28 16:03:53 INFO nodemanager.DefaultLogicalNodeManager: Node manager starting
12/11/28 16:03:53 INFO lifecycle.LifecycleSupervisor: Starting lifecycle supervisor 10
12/11/28 16:03:53 INFO properties.PropertiesFileConfigurationProvider: Configuration provider starting
12/11/28 16:03:53 INFO properties.PropertiesFileConfigurationProvider: Reloading configuration file:conf/flume.conf
12/11/28 16:03:53 INFO conf.FlumeConfiguration: Processing:hdfs01
12/11/28 16:03:53 INFO conf.FlumeConfiguration: Processing:hdfs01
12/11/28 16:03:53 INFO conf.FlumeConfiguration: Processing:hdfs01
12/11/28 16:03:53 INFO conf.FlumeConfiguration: Added sinks: hdfs01 Agent: agent1
12/11/28 16:03:53 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [agent1]
12/11/28 16:03:53 INFO properties.PropertiesFileConfigurationProvider: Creating channels
12/11/28 16:03:53 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: CHANNEL, name: momery01, registered successfully.
12/11/28 16:03:53 INFO properties.PropertiesFileConfigurationProvider: created channel momery01
12/11/28 16:03:54 INFO sink.DefaultSinkFactory: Creating instance of sink: hdfs01, type: hdfs
12/11/28 16:03:54 INFO hdfs.HDFSEventSink: Hadoop Security enabled: false
12/11/28 16:03:54 INFO instrumentation.MonitoredCounterGroup: Monitoried counter group for type: SINK, name: hdfs01, registered successfully.
12/11/28 16:03:54 INFO nodemanager.DefaultLogicalNodeManager: Starting new configuration:{ sourceRunners:{log01=EventDrivenSourceRunner: { source:org.apache.flume.source.ExecSource{name:log01,state:IDLE} }} sinkRunners:{hdfs01=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4ebac9b9 counterGroup:{ name:null counters:{} } }} channels:{momery01=org.apache.flume.channel.MemoryChannel{name: momery01}} }
12/11/28 16:03:54 INFO nodemanager.DefaultLogicalNodeManager: Starting Channel momery01
12/11/28 16:03:54 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: momery01 started
12/11/28 16:03:54 INFO nodemanager.DefaultLogicalNodeManager: Waiting for channel: momery01 to start. Sleeping for 500 ms
12/11/28 16:03:55 INFO nodemanager.DefaultLogicalNodeManager: Starting Sink hdfs01
12/11/28 16:03:55 INFO nodemanager.DefaultLogicalNodeManager: Starting Source log01
12/11/28 16:03:55 INFO source.ExecSource: Exec source starting with command:tail -F /home/bi/data-integration/biserver-ce/tomcat/logs/catalina.out
12/11/28 16:03:55 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: hdfs01 started
6. Append new lines to the log file catalina.out; the console then shows:
12/11/28 16:04:03 INFO hdfs.BucketWriter: Creating hdfs://hdfs004:8020/flume/webdata/FlumeData.1354089841687.tm
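
To confirm which files the sink has opened, the BucketWriter lines can be filtered out of the console output. The sketch below assumes the log format shown above; created_files is a hypothetical helper name, and agent.log stands in for wherever the console output was saved.

```shell
#!/bin/sh
# created_files  (hypothetical helper)
# Reads Flume agent log lines on stdin and prints the HDFS paths that
# BucketWriter reports creating, one per line.
created_files() {
    sed -n 's|.*hdfs\.BucketWriter: Creating \(hdfs://[^ ]*\).*|\1|p'
}

# Assumed usage, with the console output saved to agent.log:
# created_files < agent.log
# The target directory can then be listed on HDFS with:
# hadoop fs -ls /flume/webdata
```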