Using Flume
Case 1: Monitoring Port Data
http://flume.apache.org/FlumeUserGuide.html#a-simple-example
- Create a directory dedicated to Flume configuration files
mkdir -p /opt/bdp/apache-flume-1.6.0-bin/options
- Create the configuration file
vim example.conf
## Add the following content
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Install telnet
yum install telnet
-
Send data to port 44444
telnet localhost 44444
-
Quit: in the window where the agent is running, press
Ctrl + C
Note: Memory Channel configuration
capacity: the maximum number of events the channel can hold (default 100)
transactionCapacity: the maximum number of events taken from a source or given to a sink per transaction (default 100)
keep-alive: the timeout, in seconds, for adding an event to the channel or removing one from it
byteCapacity: the limit on the total bytes of events in the channel, counting only the event body
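As a sketch, the keep-alive and byteCapacity knobs described above can be set alongside the channel definition already shown; the values here are illustrative, not recommendations:

```properties
# Memory channel with the tuning knobs described above (illustrative values)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Seconds to wait for space when adding / for an event when removing
a1.channels.c1.keep-alive = 3
# Upper bound, in bytes, on the sum of event bodies held in the channel
a1.channels.c1.byteCapacity = 800000
```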
Case 2: A Two-Agent Flume Chain
-
On node01, the configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Tail a local file and forward events to node02 over Avro
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/bdp/flume.txt
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node02
a1.sinks.k1.port = 45454
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
On node02, install Flume (steps omitted) and create the configuration file:
# Receive events over Avro and log them to the console
a2.sources = r1
a2.sinks = k1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.bind = node02
a2.sources.r1.port = 45454
a2.sinks.k1.type = logger
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
-
Start Flume on node02 first
flume-ng agent -n a2 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Then start Flume on node01
flume-ng agent -n a1 -c options/ -f example.conf2
-
To test, append data to /opt/bdp/flume.txt on node01 (the file tailed by the exec source) and watch the events appear on node02's console
Case 3: Exec Source
http://flume.apache.org/FlumeUserGuide.html#exec-source
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/bdp/flume.exec.log
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Create an empty file for the demo (touch flume.exec.log), then append data in a loop:
for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done
Or generate a continuous stream into a log file:
ping www.baidu.com >> baidu.log
Case 4: Spooling Directory Source
http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
-
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sinks.k1.type = logger
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
Copy a file into the spool directory to demonstrate:
mkdir logs
cp flume.exec.log logs/
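Once the agent ingests a file from the spool directory it marks it as done by renaming it; the suffix (and whether to delete the file instead) is configurable. A fragment showing the documented defaults:

```properties
# Spooling-directory housekeeping (defaults shown)
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.deletePolicy = never
```

With these defaults, flume.exec.log becomes flume.exec.log.COMPLETED after ingestion; setting deletePolicy to immediate removes the file instead.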
Case 5: HDFS Sink
http://flume.apache.org/FlumeUserGuide.html#hdfs-sink
-
Configuration file
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/logs
a1.sources.r1.fileHeader = true
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hdfs-bdp/flume/%Y-%m-%d/%H%M
## Roll a new file every 60 s, or once it reaches 10 KB (rollSize is in bytes)
## Number of events per file before rolling; 0 = do not roll based on event count
a1.sinks.k1.hdfs.rollCount = 0
## Seconds before rolling the current file; 0 = do not roll based on time
a1.sinks.k1.hdfs.rollInterval = 60
## File size in bytes before rolling; 0 = do not roll based on size
a1.sinks.k1.hdfs.rollSize = 10240
## If no data is written to the open temporary file for this many seconds, close it and rename it to the target file
a1.sinks.k1.hdfs.idleTimeout = 3
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true
## Create a new directory every five minutes:
## Whether to round the event timestamp down; if enabled, this affects every time escape except %t
a1.sinks.k1.hdfs.round = true
## The value to round down to
a1.sinks.k1.hdfs.roundValue = 5
## The unit of the round-down value: second, minute or hour
a1.sinks.k1.hdfs.roundUnit = minute
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
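To illustrate the round/roundValue/roundUnit settings above: with roundValue=5 and roundUnit=minute, an event timestamped 10:23 is written under the %H%M directory for 10:20. A small shell sketch of that bucketing (the arithmetic only; the hour and minute values are made up):

```shell
hour=10; min=23                          # hypothetical event time 10:23
bucket=$(( min / 5 * 5 ))                # integer division floors to the 5-minute mark
dir=$(printf '%02d%02d' "$hour" "$bucket")
echo "events land in .../flume/$dir"     # -> .../flume/1020
```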
-
Create the HDFS directory
hadoop fs -mkdir /flume
-
Start Flume
flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
-
List the files on HDFS
hadoop fs -ls /flume/*