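Flume features:
A distributed, reliable, and highly available system for collecting, aggregating, and transporting large volumes of log data.
It sits between data producers and consumers and coordinates the flow of data between them.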
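How Flume works:
Data moves through Flume as events. An event is Flume's basic unit of data: it carries the log data as its body, plus optional header information.
A source consumes data from systems outside the agent and wraps it into events, formatting it in a source-specific way. The source then pushes each event into a channel.
The channel holds the event until a sink has finished processing it. The sink persists the event, forwards it to the source of another agent, or writes it to a store such as HDFS or HBase.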
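To make the event, source, channel, sink flow concrete, here is a toy in-process model in Python. It is not Flume's API: the names Event, MemoryChannel, source and logger_sink are hypothetical, and the model leaves out the channel transactions Flume uses so that an event is only removed from the channel once the sink commits.

```python
from dataclasses import dataclass, field
from queue import Queue
from typing import Dict, List


@dataclass
class Event:
    """Toy stand-in for a Flume event: a byte body plus optional headers."""
    body: bytes
    headers: Dict[str, str] = field(default_factory=dict)


class MemoryChannel:
    """Bounded in-memory buffer between a source and a sink (hypothetical model)."""

    def __init__(self, capacity: int = 1000) -> None:
        self._queue: "Queue[Event]" = Queue(maxsize=capacity)

    def put(self, event: Event) -> None:
        # Blocks when full, so a fast source cannot overrun a slow sink.
        self._queue.put(event)

    def take(self) -> Event:
        # Real Flume takes inside a transaction; this toy removes immediately.
        return self._queue.get()

    def __len__(self) -> int:
        return self._queue.qsize()


def source(lines: List[str], channel: MemoryChannel) -> None:
    """Wrap each external line as an event and push it into the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def logger_sink(channel: MemoryChannel) -> List[str]:
    """Drain the channel; a real sink would log, write to HDFS/HBase, or forward."""
    delivered = []
    while len(channel) > 0:
        delivered.append(channel.take().body.decode("utf-8"))
    return delivered


channel = MemoryChannel(capacity=10)
source(["hello world", "hello flume"], channel)
print(logger_sink(channel))  # ['hello world', 'hello flume']
```

The bounded queue is the point of the sketch: the channel decouples the rate at which the source produces events from the rate at which the sink consumes them.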
Flume configuration:
Collecting data with netcat (Netcat Source):
Create a Flume configuration file: /soft/flume/conf/xxx.conf
# example.conf: A single-node Flume configuration

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume (from /soft/flume):
$> bin/flume-ng agent --conf conf --conf-file conf/xxx.conf --name a1 -Dflume.root.logger=INFO,console
Connect a client to Flume (the IP and port are already set in the configuration file):
$> nc localhost 44444
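If nc is not available, a few lines of Python can play the client role. This is a sketch under assumptions: send_lines is a hypothetical helper, it treats each newline-terminated line as one event, and it expects the Netcat Source's default behavior of replying "OK" to every event (the ack-every-event setting). If you turn acknowledgements off, drop the readline call.

```python
import socket
from typing import List


def send_lines(lines: List[str], host: str = "localhost", port: int = 44444,
               timeout: float = 5.0) -> List[str]:
    """Send newline-terminated events to a Netcat Source and collect its replies."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with sock.makefile("r", encoding="utf-8", newline="\n") as reader:
            for line in lines:
                # The source splits the byte stream on '\n': one line = one event.
                sock.sendall((line + "\n").encode("utf-8"))
                # Assumes the default per-event acknowledgement ("OK").
                replies.append(reader.readline().strip())
    return replies


if __name__ == "__main__":
    print(send_lines(["hello world"]))
```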
Connection test:
Type data in the client:
hello world
Flume collects the client data (logged by the sinkRunner-PollingRunner-DefaultSinkProcessor thread):
Event: { headers:{} body: 68 65 6C 6C 6F 20 77 6F 72 6C 64 hello world }
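The logger sink prints the event body as hex bytes followed by the readable text. A quick way to check what a body should look like is to compute that hex yourself. The helper below is hypothetical, only reproduces the hex part (not the sink's exact padding), and mirrors the logger sink's maxBytesToLog setting, which truncates the logged body to 16 bytes by default.

```python
def logger_body_hex(body: bytes, max_bytes: int = 16) -> str:
    """Render a body as the space-separated uppercase hex the logger sink shows.

    max_bytes mimics maxBytesToLog: longer bodies are cut off in the log line.
    """
    return " ".join(f"{b:02X}" for b in body[:max_bytes])


print(logger_body_hex(b"hello world"))  # 68 65 6C 6C 6F 20 77 6F 72 6C 64
```

If the log line shows fewer bytes than you sent, the 16-byte limit is the usual reason, not data loss.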
Real-time collection (Exec Source):
Configuration (conf/execSource.conf):
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/ubuntu/data/flume/execSource.txt

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume:
$> bin/flume-ng agent --conf conf --conf-file conf/execSource.conf --name a1 -Dflume.root.logger=INFO,console
Append a line to the monitored execSource.txt file:
$> echo hello world >> /home/ubuntu/data/flume/execSource.txt
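To watch the Exec Source pick up lines continuously, something has to keep appending to the tailed file. A small Python sketch of such a producer; append_log_lines is a hypothetical helper and the path is the one from the configuration above.

```python
import time
from pathlib import Path
from typing import Iterable


def append_log_lines(path: str, lines: Iterable[str], interval: float = 0.0) -> int:
    """Append lines to a file one at a time, the way a growing log would."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    written = 0
    with target.open("a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
            f.flush()  # hand the line to the OS so `tail -F` sees it right away
            written += 1
            time.sleep(interval)  # pause between lines to mimic a live producer
    return written


if __name__ == "__main__":
    append_log_lines("/home/ubuntu/data/flume/execSource.txt",
                     [f"hello world {i}" for i in range(5)], interval=1.0)
```

Each appended line should show up as a separate event in the logger output, because tail -F emits the file line by line.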
Batch collection (Spooling Directory Source):
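Configuration file:
The Spooling Directory Source watches the spoolDir directory for newly moved-in files. When a file appears, the source reads and processes it; when it is done, it renames the file (by default adding the .COMPLETED suffix) or deletes it.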
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/ubuntu/data/flume/flumeSpool
a1.sources.r1.fileHeader = true

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Create the spool directory:
$> mkdir -p /home/ubuntu/data/flume/flumeSpool
Start Flume (same command as above, with --conf-file pointing to this configuration file).
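To test the source, put a finished file into the spool directory. Flume expects a file to be complete and unchanging once it appears there, and to have a name that is never reused; otherwise it logs an error and stops processing. A common way to satisfy this is to write the file elsewhere on the same filesystem and then rename it in. The sketch below does that; drop_into_spool and the sibling "flumeSpool.staging" directory are hypothetical choices, not part of Flume.

```python
import os
from pathlib import Path
from typing import List


def drop_into_spool(spool_dir: str, name: str, lines: List[str]) -> Path:
    """Write a file outside the spool dir, then atomically move it in."""
    spool = Path(spool_dir)
    # Hypothetical staging dir next to the spool dir, so the rename stays on one filesystem.
    staging = spool.parent / (spool.name + ".staging")
    spool.mkdir(parents=True, exist_ok=True)
    staging.mkdir(parents=True, exist_ok=True)

    target = spool / name
    # Reusing a name (pending or already marked .COMPLETED) makes Flume stop with an error.
    if target.exists() or target.with_name(name + ".COMPLETED").exists():
        raise FileExistsError(f"{name} was already used in {spool}")

    tmp = staging / name
    tmp.write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    os.replace(tmp, target)  # atomic rename: Flume never sees a half-written file
    return target


if __name__ == "__main__":
    drop_into_spool("/home/ubuntu/data/flume/flumeSpool", "batch-001.log",
                    ["hello world", "hello flume"])
```

After Flume finishes a file you should see it renamed with the .COMPLETED suffix, unless the configuration tells it to delete processed files.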
Testing with the Sequence Generator Source:
Configure the .conf file:
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = seq

a1.sinks.k1.type = logger

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start Flume (same command as above, with --conf-file pointing to this configuration file).
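The sequence generator is handy for testing because its output is predictable: it keeps emitting events whose counter starts at 0 and increases by 1, until an optional totalEvents limit is reached. The Python generator below imitates the bodies you should see in the logger output; seq_bodies is hypothetical, and encoding the counter as its decimal text is an assumption about how the body is rendered.

```python
from itertools import count, islice
from typing import Iterator, Optional


def seq_bodies(total_events: Optional[int] = None) -> Iterator[bytes]:
    """Yield bodies like the seq source: b'0', b'1', b'2', ... (assumed encoding)."""
    counter: Iterator[int] = count(0)
    if total_events is not None:
        # Mirrors the totalEvents limit; without it the stream is effectively endless.
        counter = islice(counter, total_events)
    for n in counter:
        yield str(n).encode("ascii")


print(list(seq_bodies(3)))  # [b'0', b'1', b'2']
```

Because the stream never stops on its own, setting a limit such as a1.sources.r1.totalEvents = 100 keeps the console readable during a test.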
Flume official documentation: Flume User Guide