Flume Usage Examples

Case 1: Monitoring Port Data

http://flume.apache.org/FlumeUserGuide.html#a-simple-example

  • Create a dedicated directory for the Flume configuration files
mkdir -p /opt/bdp/apache-flume-1.6.0-bin/options

  • Create the configuration file
vim example.conf

## Add the following content

# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

  • Start Flume

    flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
    
    
  • Install telnet

    yum install telnet
    
    
  • Send data to port 44444

    telnet localhost 44444
    
  • To exit, stop the agent in the window where it was started

    ctrl + c
    

Tip: Memory Channel configuration

capacity: the maximum number of events the channel can hold; the default is 100.

transactionCapacity: the maximum number of events taken from the source, or given to the sink, in one transaction; the default is also 100.

keep-alive: the timeout (in seconds) allowed for adding an event to, or removing one from, the channel.

byteCapacity: the limit on the total byte size of events in the channel; only the event body is counted.

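Under these definitions, a Memory Channel with all four limits set explicitly would look like the fragment below (the keep-alive and byteCapacity values here are illustrative, not taken from the example above):

```properties
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c1.keep-alive = 3
a1.channels.c1.byteCapacity = 800000
```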
Case 2: Two Flume Agents Chained Together

  • On node01, create the configuration file (example.conf2):

    # Name the components on this agent
    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    a1.sources.r1.type = exec
    a1.sources.r1.command = tail -F /opt/bdp/flume.txt
    
    a1.sinks.k1.type = avro
    a1.sinks.k1.hostname = node02
    a1.sinks.k1.port = 45454
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    
    
  • On node02, install Flume (steps omitted) and create the configuration file (example.conf):

    a2.sources = r1
    a2.sinks = k1
    a2.channels = c1
    
    a2.sources.r1.type = avro
    a2.sources.r1.bind = node02
    a2.sources.r1.port = 45454
    
    a2.sinks.k1.type = logger
    
    a2.channels.c1.type = memory
    a2.channels.c1.capacity = 1000
    a2.channels.c1.transactionCapacity = 100
    
    a2.sources.r1.channels = c1
    a2.sinks.k1.channel = c1
    
  • Start the Flume agent on node02 first

    flume-ng agent -n a2 -c options/ -f example.conf -Dflume.root.logger=INFO,console
    
  • Then start the Flume agent on node01

    flume-ng agent -n a1 -c options/ -f example.conf2
    
  • Append data to /opt/bdp/flume.txt on node01 and watch the output on the node02 console
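Since node01's source tails /opt/bdp/flume.txt, events can be generated simply by appending lines to that file (a minimal sketch; the path matches the exec source command above):

```shell
# Append a few test lines; tail -F on node01 picks them up,
# and the avro sink forwards them to node02:45454.
for i in {1..5}; do
  echo "line $i via avro" >> /opt/bdp/flume.txt
done
```

Each appended line should show up as one event on the node02 console, since a2's logger sink prints incoming events at INFO level.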

Case 3: Exec Source

http://flume.apache.org/FlumeUserGuide.html#exec-source

Configuration file

a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/bdp/flume.exec.log

# Describe the sink

a1.sinks.k1.type = logger

# Use a channel which buffers events in memory

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
  • Start Flume

    flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
    
  • Create an empty file with touch flume.exec.log, then append data to it in a loop

    for i in {1..50}; do echo "$i hi flume" >> flume.exec.log ; sleep 0.1; done
    
    ping  www.baidu.com >> baidu.log
    

Case 4: Spooling Directory Source

http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source

  • Configuration file

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    a1.sinks.k1.type = logger
    
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /home/logs
    a1.sources.r1.fileHeader = true
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    
  • Start Flume

    flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
    
    
  • Copy a file into the spooled directory to generate events

    mkdir logs
    
    cp flume.exec.log logs/
    
    
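Once a file has been fully ingested, the spooling source renames it by appending a suffix (.COMPLETED by default), so the same file name must not be reused in the directory. The related knobs, shown here with their default values:

```properties
a1.sources.r1.fileSuffix = .COMPLETED
a1.sources.r1.deletePolicy = never
a1.sources.r1.ignorePattern = ^$
```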

Case 5: HDFS Sink

http://flume.apache.org/FlumeUserGuide.html#hdfs-sink

  • Configuration file

    a1.sources = r1
    a1.sinks = k1
    a1.channels = c1
    
    a1.sources.r1.type = spooldir
    a1.sources.r1.spoolDir = /home/logs
    a1.sources.r1.fileHeader = true
    
    a1.sinks.k1.type=hdfs
    a1.sinks.k1.hdfs.path=hdfs://hdfs-bdp/flume/%Y-%m-%d/%H%M
    
    ## Roll a new file every 60 s, or when the file exceeds 10 KB

    ## Number of events per file before rolling; 0 = do not roll based on event count

    a1.sinks.k1.hdfs.rollCount=0

    ## Seconds before rolling a new file; 0 = do not roll based on time

    a1.sinks.k1.hdfs.rollInterval=60

    ## File size (in bytes) before rolling; 0 = do not roll based on size (10240 bytes = 10 KB)

    a1.sinks.k1.hdfs.rollSize=10240

    ## If the currently open temporary file receives no data for this many seconds, close it and rename it to its final name

    a1.sinks.k1.hdfs.idleTimeout=3

    a1.sinks.k1.hdfs.fileType=DataStream
    a1.sinks.k1.hdfs.useLocalTimeStamp=true

    ## Create a new directory every five minutes:

    ## Whether to enable "rounding down" of the timestamp. If enabled, it affects all time-based escape sequences except %t

    a1.sinks.k1.hdfs.round=true

    ## The value to round down to

    a1.sinks.k1.hdfs.roundValue=5

    ## The unit of rounding: second, minute, or hour

    a1.sinks.k1.hdfs.roundUnit=minute
    
    a1.channels.c1.type = memory
    a1.channels.c1.capacity = 1000
    a1.channels.c1.transactionCapacity = 100
    
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1
    
  • Create the HDFS directory

    hadoop fs -mkdir /flume
    
  • Start Flume

    flume-ng agent -n a1 -c options/ -f example.conf -Dflume.root.logger=INFO,console
    
    
  • Inspect the files on HDFS

    hadoop fs -ls /flume/*
    
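The %Y-%m-%d/%H%M escapes in hdfs.path are filled in from the event timestamp (here the local clock, because useLocalTimeStamp=true). With round=true, roundValue=5 and roundUnit=minute, the minute is truncated down to a multiple of 5 before substitution. The truncation can be sketched in plain shell (an illustration of the arithmetic, not Flume code):

```shell
# Mimic hdfs.round=true / roundValue=5 / roundUnit=minute:
# truncate the current minute down to a multiple of 5.
minute=$(date +%M)
rounded=$(( (10#$minute / 5) * 5 ))   # 10# avoids octal parsing of "07", "08", "09"
# The directory Flume would substitute into hdfs://hdfs-bdp/flume/%Y-%m-%d/%H%M:
printf '%s/%s%02d\n' "$(date +%Y-%m-%d)" "$(date +%H)" "$rounded"
```

At 14:23, for example, the minute part becomes 20, so events land under a …/1420 directory; a new directory starts at 14:25.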
