Flume principles
How Flume works:
Flume's data flow is built around events, which run through the entire pipeline. An event is Flume's basic unit of data: it carries the log payload together with optional header information. Events are built from data generated outside the agent; when a source captures that data it formats it into events and pushes them into a channel, which stores each event until a sink has finished processing it. The sink then persists the event or forwards it onward, e.g. to another agent's source, or into HDFS or HBase.
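The source → channel → sink pipeline above can be modeled in a few lines. This is a minimal Python sketch of the concepts only; the names `Event`, `Channel`, `source_ingest`, and `sink_drain` are illustrative and are not Flume's API:

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """A Flume-style event: a byte payload plus string headers."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """Buffers events between source and sink (like the memory channel)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()

    def put(self, event):
        if len(self.queue) >= self.capacity:
            raise RuntimeError("channel full")
        self.queue.append(event)

    def take(self):
        return self.queue.popleft() if self.queue else None


def source_ingest(channel, raw_line):
    # The source wraps raw input into an event and pushes it to the channel.
    channel.put(Event(body=raw_line.encode(), headers={"source": "netcat"}))


def sink_drain(channel, store):
    # The sink takes events off the channel and persists them (here: a list).
    while (ev := channel.take()) is not None:
        store.append(ev.body.decode())
```

The channel is what decouples the source's ingest rate from the sink's write rate: the event stays buffered until the sink has handled it.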
Flume sinks:
HDFS sink:
- Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/%y-%m-%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.useLocalTimeStamp = true

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
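The `hdfs.round` / `roundValue` / `roundUnit` settings round the event timestamp down to a multiple of the given unit before it is substituted into the `%y-%m-%d/%H/%M/%S` path escapes, so events are bucketed into coarser time directories. A rough sketch of that rounding for the minute case; this is illustrative, not Flume's actual code:

```python
def round_down_minutes(epoch_ms, round_value):
    """Round an epoch-millis timestamp down to a multiple of
    `round_value` minutes, mimicking what hdfs.round=true with
    roundUnit=minute does before filling in the path escapes."""
    bucket_ms = round_value * 60 * 1000
    return (epoch_ms // bucket_ms) * bucket_ms
```

For example, a timestamp at minute 17 of an hour rounds down to minute 10 when `roundValue = 10`.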
- Start Flume
$> hdfs dfs -mkdir /flume
flume $> bin/flume-ng agent --conf conf --conf-file conf/hdfs.conf --name a1
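The netcat source accepts newline-terminated lines over TCP, which is what `nc localhost 44444` delivers. Equivalently, data can be pushed from any TCP client; a minimal Python sketch (the helper name `send_event` is hypothetical, and it assumes the agent above is listening on localhost:44444):

```python
import socket


def send_event(host, port, line):
    """Open a TCP connection and send one newline-terminated line,
    the way `nc` would deliver it to Flume's netcat source."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall((line + "\n").encode("utf-8"))
```

Usage: `send_event("localhost", 44444, "hello  tom")`.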
- Result
ubuntu@s100:~$ hdfs dfs -lsr /home/ubuntu/flume
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/11
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/11/50
-rw-r--r-- 3 ubuntu supergroup 125 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/11/50/events-.1544260319853
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/00
-rw-r--r-- 3 ubuntu supergroup 130 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/00/events-.1544260327730
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/20
-rw-r--r-- 3 ubuntu supergroup 206 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/20/events-.1544260345929.tmp
drwxr-xr-x - ubuntu supergroup 0 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/30
-rw-r--r-- 3 ubuntu supergroup 126 2018-12-08 01:12 /home/ubuntu/flume/18-12-08/01/12/30/events-.1544260358119.tmp
ubuntu@s100:~$ hdfs dfs -text /home/ubuntu/flume/18-12-08/01/11/50/events-.1544260319853
1544260321568 68 65 6c 6c 6f 20 20 74 6f 6d
ubuntu@s100:~$
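`hdfs dfs -text` renders each record as a key (apparently the event timestamp in epoch milliseconds) followed by the event body as space-separated hex bytes. The body from the output above decodes back to the original text:

```python
# Hex bytes as printed by `hdfs dfs -text` for the record above.
payload_hex = "68 65 6c 6c 6f 20 20 74 6f 6d"

# Convert each hex pair to a byte and decode as UTF-8.
body = bytes(int(b, 16) for b in payload_hex.split())
text = body.decode("utf-8")  # "hello  tom" (two spaces, as typed into nc)
```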
HBase sink:
- Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = hbase
a1.sinks.k1.table = foo_table
a1.sinks.k1.columnFamily = bar_cf
a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
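`RegexHbaseEventSerializer` splits the event body with a regular expression and writes each capture group to its own column; with its defaults the whole body lands in a single `payload` column, which is what the scan output below shows. A rough Python sketch of that behavior; this is illustrative, not the Java implementation:

```python
import re


def serialize(body, regex=r"(.*)", col_names=("payload",)):
    """Apply the regex to the event body and map each capture group
    to a column name, mimicking RegexHbaseEventSerializer defaults."""
    m = re.match(regex, body)
    if not m:
        return {}
    return dict(zip(col_names, m.groups()))
```

With a custom regex such as `(\d+)\s+(\w+)` and `col_names=("id", "word")`, a body like `42 hello` would be split across two columns of the column family.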
- Start Flume
flume $> bin/flume-ng agent --conf conf --conf-file conf/hbase.conf --name a1
- Create the table
hbase shell $> create 'foo_table','bar_cf'
$> nc localhost 44444
$> hello world
hbase shell $> scan 'foo_table'
- Input
hello world
OK
hello tom
OK
hello world
OK
hello world
OK
hello world
OK
- Result
hbase(main):002:0> scan 'foo_table'
ROW COLUMN+CELL
1544274602241-L46siThioB-0 column=bar_cf:payload, timestamp=1544274605757, value=hello world
1544274605837-L46siThioB-1 column=bar_cf:payload, timestamp=1544274608840, value=hello tom
1544274610202-L46siThioB-2 column=bar_cf:payload, timestamp=1544274613204, value=hello world
1544274613297-L46siThioB-3 column=bar_cf:payload, timestamp=1544274616309, value=hello world
1544274617162-L46siThioB-4 column=bar_cf:payload, timestamp=1544274620164, value=hello world
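The row keys in the scan follow the pattern `<event-millis>-<random-prefix>-<counter>`: the timestamp makes keys sort by arrival time, the random prefix (here `L46siThioB`, chosen fresh per agent run) keeps different agents from colliding, and the counter disambiguates events within the same millisecond. A sketch reconstructing the pattern (the helper `make_row_keys` is hypothetical):

```python
from itertools import count


def make_row_keys(timestamps_ms, agent_prefix="L46siThioB"):
    """Build row keys in the <millis>-<prefix>-<counter> shape seen in
    the scan output. The prefix is random per run; "L46siThioB" is just
    the value from this session, used here for illustration."""
    nonce = count()
    return [f"{ts}-{agent_prefix}-{next(nonce)}" for ts in timestamps_ms]
```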
Kafka sink (writes to Kafka): to be continued
Flume official documentation:
flume.apache.org/FlumeUserGuide.html