Getting Started with Flume

Flume
  A framework developed by Cloudera, Flume is an excellent log-collection component, similar to Logstash. It is typically deployed as an agent on application servers to collect local log files.
  It collects data in (near) real time.
  
  Typical log sources: apache/nginx
  
  Typical pipeline: Kafka/Flume + Storm/Spark (Spark Streaming)
  
  Web Server -> Source -> Channel -> Sink -> HDFS
                -----------Agent----------
  Source  Collects data; this is where the data stream is produced, and it passes the stream on to the Channel.
  Channel Connects the Source and the Sink; works like a queue.
  Sink    Pulls data from the Channel and writes it to the destination, which can be the next agent's Source, HDFS, or HBase.

  • source: the data input end; it specifies where Flume collects its data (stream) from. Flume supports many source types, e.g. "Avro Source" (RPC-like, receives data entities sent by remote Avro clients), "Thrift Source" (data sent by Thrift clients), "Exec Source" (lines returned by a Linux command), "Kafka Source", "Syslog Source", "HTTP Source", and so on.

    This article mainly uses two of them, Spooling Directory and Taildir. Taildir is new in Flume 1.7; before that, tail-like behavior had to be emulated with an "Exec Source" or implemented in custom code (see the sketch after this list).

  • channel: simply put, a buffer for the data stream. Several sources can feed one channel, and inside the channel data can be cached, spilled to temporary storage, and traffic-shaped. Flume currently provides "Memory Channel" (data held in a bounded amount of memory), "JDBC Channel" (data staged in a database so it can be recovered), "Kafka Channel" (staged in Kafka), and "File Channel" (staged in local files). All channels except Memory support persistence, which guards against message loss and absorbs traffic bursts during failure recovery, or when the sink is offline or absent.

  • sink: the output end of the stream. Each channel can be wired to a sink, and each sink targets one type of storage. Commonly used sink types include "HDFS Sink" (stores data in HDFS), "Hive Sink", "Logger Sink" (a special case that writes events at INFO level to the console, usually for testing), "Avro Sink", "Thrift Sink", "File Roll Sink" (rolls data into the local file system), and so on.
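  For reference, a minimal sketch of a Taildir source definition (requires Flume 1.7 or later, so it is not available in the flume-1.5.0-cdh5.3.6 build used below; the agent/source names and paths are illustrative):

  ###define a taildir source (Flume >= 1.7)
  a1.sources.r1.type = TAILDIR
  # JSON file where Flume records the last read position of each tailed file
  a1.sources.r1.positionFile = /opt/flume/taildir_position.json
  a1.sources.r1.filegroups = f1
  # tail every .log file under this directory
  a1.sources.r1.filegroups.f1 = /opt/logs/.*\\.log
  a1.sources.r1.channels = c1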

  Runs wherever the logs are
  OS: Linux
  Requires a JVM/JDK
  Lightweight (comparable to e.g. ZooKeeper)
 
Installation
  vi flume-env.sh
  export JAVA_HOME=/opt/jdk1.8.0_171
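  A quick way to verify the installation after setting JAVA_HOME is to print the version info:
  bin/flume-ng version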
  

# Main command-line options
[root@hadoop-senior01 flume-1.5.0-cdh5.3.6]# bin/flume-ng 

Usage: bin/flume-ng <command> [options]...

commands:
  agent                     run a Flume agent

global options:
  --conf,-c <conf>          use configs in <conf> directory
  -Dproperty=value          sets a Java system property value

agent options:
  --name,-n <name>          the name of this agent (required)
  --conf-file,-f <file>     specify a config file (required if -z missing)

  
-----------------------------------------------------------------------------------
Case 1: netcat source -> memory channel -> logger sink
bin/flume-ng agent --conf conf --name agent-test --conf-file test.conf

bin/flume-ng agent -c conf -n agent-test -f test.conf

# Write the config file: vi a1.conf
# The configuration file needs to define the sources,
# the channels and the sinks.

###define agent
a1.sources = r1
a1.channels = c1
a1.sinks = k1

###define sources
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-senior01.zhangbk.com
a1.sources.r1.port = 44444

###define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

###define sink
a1.sinks.k1.type = logger

###bind the sources and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
# Run the agent:
bin/flume-ng agent \
-c conf \
-n a1 \
-f conf/a1.conf \
-Dflume.root.logger=DEBUG,console
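
Once the agent is running, you can push test data into the netcat source with telnet. A minimal session sketch, assuming the host and port configured in a1.conf above:

telnet hadoop-senior01.zhangbk.com 44444
hello flume
# each line typed is sent as one event; the agent console should log something like:
#   Event: { headers:{} body: 68 65 6C 6C 6F 20 66 6C 75 6D 65       hello flume }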


Test: telnet ip port, then type data; each line becomes one event (as sketched above).
-----------------------------------------------------------------------------------
Case 2: collect a running log (Hive's log file)
  source: tail of /opt/hive-0.13.1-cdh5.3.6/logs/hive.log
  channel: memory channel
  sink: HDFS, under
    /user/flume/hive-logs/
  handled by a single Flume agent
  

# The configuration file needs to define the sources,
# the channels and the sinks.

###define agent
a2.sources = r2
a2.channels = c2
a2.sinks = k2

###define sources
a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /opt/hive-0.13.1-cdh5.3.6/logs/hive.log
a2.sources.r2.shell = /bin/bash -c

###define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 100

###define sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.fileType = DataStream 
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior01.zhangbk.com:8020/user/flume/hive-logs/
#a2.sinks.k2.hdfs.path = hdfs://ns1/user/flume/hive-logs/
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 10

###bind the sources and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
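
Optionally (not part of the config above), the HDFS sink's roll settings control how often a new file is opened on HDFS; with the defaults (roll every 30 s, 1 KB, or 10 events) you tend to get many small files. A sketch of time/size based rolling instead:
#a2.sinks.k2.hdfs.rollInterval = 600
#a2.sinks.k2.hdfs.rollSize = 134217728
#a2.sinks.k2.hdfs.rollCount = 0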

Run the agent:
bin/flume-ng agent \
-c conf \
-n a2 \
-f conf/flume-tail.conf \
-Dflume.root.logger=DEBUG,console

Note: writing to HDFS requires the following Hadoop jars on Flume's classpath:
/opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar
/opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar
/opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar
/opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/commons-configuration-1.6.jar
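
One way to get them on the classpath (a sketch; alternatively FLUME_CLASSPATH can be set in flume-env.sh) is to copy them into Flume's lib directory:

cp /opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar \
   /opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar \
   /opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar \
   /opt/hadoop-2.5.0-cdh5.3.6/share/hadoop/common/lib/commons-configuration-1.6.jar \
   /opt/flume-1.5.0-cdh5.3.6/lib/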
--------------------------------------------------------------------------------------
  
Case 3: monitor a directory of log files (spooling directory source)
  the application's log4j rolls its log files by size (see the sketch below)
  File Channel for durable buffering
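
A minimal log4j 1.x sketch for the producing application's appender (hypothetical paths; any mechanism that places finished log files in the spooled directory works). The actively written file keeps the .log suffix, which the ignorePattern below excludes, so only completed rolled files are collected:

# log4j.properties of the application producing the logs
log4j.rootLogger=INFO, R
log4j.appender.R=org.apache.log4j.RollingFileAppender
log4j.appender.R.File=/opt/flume-1.5.0-cdh5.3.6/spoollogs/app.log
# roll to app.log.1, app.log.2, ... once the file reaches 10 MB
log4j.appender.R.MaxFileSize=10MB
log4j.appender.R.MaxBackupIndex=5
log4j.appender.R.layout=org.apache.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%d %p %c - %m%n

One caveat: the spooling directory source expects files not to be renamed or modified after it starts reading them, so in practice completed files are often rolled outside the spool directory and then moved in by a separate step (e.g. a cron job).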
  

# The configuration file needs to define the sources,
# the channels and the sinks.

###define agent
a3.sources = r3
a3.channels = c3
a3.sinks = k3

###define sources
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/flume-1.5.0-cdh5.3.6/spoollogs
a3.sources.r3.ignorePattern = ^(.)*\\.log$
a3.sources.r3.fileSuffix = .delete

###define channels
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /opt/flume-1.5.0-cdh5.3.6/filechannel/checkpoint
a3.channels.c3.dataDirs = /opt/flume-1.5.0-cdh5.3.6/filechannel/data

###define sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.path = hdfs://hadoop-senior01.zhangbk.com:8020/user/flume/splogs/%Y%m%d
a3.sinks.k3.hdfs.useLocalTimeStamp = true
#a3.sinks.k3.hdfs.path = hdfs://ns1/user/flume/hive-logs/
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10

###bind the sources and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3

Run the agent:
bin/flume-ng agent \
-c conf \
-n a3 \
-f conf/flume-app.conf \
-Dflume.root.logger=DEBUG,console
-------------------------------------------------------------------------------------------------
