Flume Usage Guide

Introduction

Flume is a distributed, reliable, and highly available system for aggregating massive amounts of log data. It supports customizable data senders for collecting data, and it can perform simple processing on that data before writing it to various (also customizable) data receivers.

Flume Environment Setup

Install the CDH build: flume-ng-1.5.0-cdh5.3.6.tar.gz

tar -zxvf flume-ng-1.5.0-cdh5.3.6.tar.gz -C /opt/cdh-5.3.6
Configuration: edit flume-env.sh and set JAVA_HOME:
export JAVA_HOME=/opt/modules/jdk1.7.0_67
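To verify the installation, you can print the Flume version (a quick sanity check; the directory name assumes the extraction path used above):

$ cd /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6
$ bin/flume-ng version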

Writing the First Agent: Reading Data in Real Time (see the official documentation for details)

In the conf directory:

cp flume-conf.properties.template a1.conf
vi a1.conf
The content is as follows:

### define agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

### define sources 
a1.sources.r1.type = netcat
a1.sources.r1.bind = hadoop-senior.ibeifeng.com
a1.sources.r1.port = 44444

### define channels
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 1000

### define sink
a1.sinks.k1.type = logger

#### bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start the agent:

$ bin/flume-ng agent \
--conf conf \
--conf-file a1.conf \
--name a1 \
-Dflume.root.logger=INFO,console
Open another terminal and run the following command. If the telnet command is not available, install it first.
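For example, on a CentOS/RHEL host (an assumption; use your distribution's package manager), telnet can be installed with:

$ sudo yum install -y telnet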
$ telnet localhost 44444
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.
Hello world! <ENTER>
OK
In the agent's terminal you should see output like the following:
12/06/19 15:32:19 INFO source.NetcatSource: Source starting
12/06/19 15:32:19 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:44444]
12/06/19 15:32:34 INFO sink.LoggerSink: Event: { headers:{} body: 48 65 6C 6C 6F 20 77 6F 72 6C 64 21 0D          Hello world!. }

The Second Agent: Tailing a Log File in Real Time and Storing It in HDFS

* Collect the log
    Hive's run log: /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
    read it with: tail -f
* memory
    in-memory channel
* hdfs
    storage location
    /user/beifeng/flume/hive-logs/

vi flume-tail.conf
### define agent
a2.sources = r2
a2.sinks = k2
a2.channels = c2

### define sources 
a2.sources.r2.type = exec
a2.sources.r2.command = tail -f /opt/cdh-5.3.6/hive-0.13.1-cdh5.3.6/logs/hive.log
a2.sources.r2.shell = /bin/bash -c

### define channels
a2.channels.c2.type = memory
a2.channels.c2.capacity = 1000
a2.channels.c2.transactionCapacity = 1000

### define sink
a2.sinks.k2.type = hdfs
a2.sinks.k2.hdfs.path = hdfs://hadoop-senior.ibeifeng.com:8020/user/beifeng/flume/hive-logs/
### bin/hdfs dfs -mkdir -p /user/beifeng/flume/hive-logs/ (create this directory on HDFS first)
a2.sinks.k2.hdfs.fileType = DataStream
a2.sinks.k2.hdfs.writeFormat = Text
a2.sinks.k2.hdfs.batchSize = 10

#### bind the source and sink to the channel
a2.sources.r2.channels = c2
a2.sinks.k2.channel = c2
Start the agent:

$ bin/flume-ng agent \
--conf conf \
--conf-file flume-tail.conf \
--name a2 \
-Dflume.root.logger=INFO,console
Running it directly fails because some Hadoop jars are missing from Flume's classpath. Copy the following Hadoop jars into Flume:
commons-configuration-1.6.jar
hadoop-hdfs-2.5.0-cdh5.3.6.jar
hadoop-common-2.5.0-cdh5.3.6.jar
hadoop-auth-2.5.0-cdh5.3.6.jar
Put them into Flume's lib directory; the agent then starts successfully.
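For reference, a minimal sketch of copying these jars, assuming Hadoop is installed under /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6 (the exact locations under share/hadoop may differ in your installation):

$ cd /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6
$ cp share/hadoop/common/lib/commons-configuration-1.6.jar \
     share/hadoop/common/lib/hadoop-auth-2.5.0-cdh5.3.6.jar \
     share/hadoop/common/hadoop-common-2.5.0-cdh5.3.6.jar \
     share/hadoop/hdfs/hadoop-hdfs-2.5.0-cdh5.3.6.jar \
     /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/lib/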

Result:

Next, run some Hive statements so that Hive produces log output; Flume ships the log data to HDFS in real time.
Open http://hadoop-senior.ibeifeng.com:50070 and you will see files appearing under /user/beifeng/flume/hive-logs.
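You can also verify from the command line (run from the Hadoop installation directory):

$ bin/hdfs dfs -ls /user/beifeng/flume/hive-logs/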

A Practical Flume Case: Monitoring a Log Directory and Extracting Data to HDFS in Real Time

Spooling Directory Source
1. The exec source gives good real-time behavior but poor reliability: if the source process fails or the Linux command is interrupted, data is lost, and data integrity cannot be guaranteed until the agent recovers.
2. The Spooling Directory Source collects log data by watching a directory for newly added files and reading their contents; in production it is usually combined with log4j. Once a file has been fully transferred, a suffix (.COMPLETED by default, configurable) is appended to its name.

Example:

Monitor a directory of log files:
/app/logs/2014-12-20
....
/app/logs/2016-11-12
    zz.log    ->    the log file still being written; not collected
    xx.log.comp    ->    20M
    yy.log.comp    ->    20M
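Create the spool directory referenced in the configuration below before starting the agent (the spooldir source will not start if the directory does not exist):

$ mkdir -p /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/spoollogs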

vi flume-app.conf
### define agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

### define sources 
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/spoollogs
a3.sources.r3.ignorePattern = ^(.)*\\.log$
a3.sources.r3.fileSuffix = .delete

### define channels
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/filechannel/checkpoint
a3.channels.c3.dataDirs = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/filechannel/data

### define sink
a3.sinks.k3.type = hdfs
a3.sinks.k3.hdfs.path = hdfs://hadoop.ibeifeng.com:8020/user/beifeng/flume/splogs/
### bin/hdfs dfs -mkdir -p /user/beifeng/flume/splogs/ (create this directory on HDFS first)
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10

#### bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
Start the agent:

$ bin/flume-ng agent \
--conf conf \
--conf-file flume-app.conf \
--name a3 \
-Dflume.root.logger=INFO,console
Result:

Copy some data files into the spoollogs directory. Files ending in .log are not extracted to HDFS; all other files are extracted, and the suffix .delete is appended to their names after transfer.
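A quick way to test it (the file name is just an example): copy a file into the spool directory, then list the directory to see it renamed once it has been consumed:

$ cp /tmp/xx.log.comp /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/spoollogs/
$ ls /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/spoollogs/
xx.log.comp.delete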

Adding the Creation Time Automatically

When the HDFS sink path contains time escape sequences such as %Y%m%d, each event needs a timestamp. Setting hdfs.useLocalTimeStamp = true makes the sink use the Flume agent's local time instead of requiring a timestamp header on the event:

a3.sinks.k3.hdfs.useLocalTimeStamp = true

vi flume-app.conf
### define agent
a3.sources = r3
a3.sinks = k3
a3.channels = c3

### define sources 
a3.sources.r3.type = spooldir
a3.sources.r3.spoolDir = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/spoollogs
a3.sources.r3.ignorePattern = ^(.)*\\.log$
a3.sources.r3.fileSuffix = .delete

### define channels
a3.channels.c3.type = file
a3.channels.c3.checkpointDir = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/filechannel/checkpoint
a3.channels.c3.dataDirs = /opt/cdh-5.3.6/flume-1.5.0-cdh5.3.6/filechannel/data

### define sink
a3.sinks.k3.type = hdfs
### a3.sinks.k3.hdfs.path = hdfs://ns1/user/beifeng/flume/splogs/%Y%m%d
a3.sinks.k3.hdfs.path = hdfs://hadoop.ibeifeng.com:8020/user/beifeng/flume/splogs/
### bin/hdfs dfs -mkdir -p /user/beifeng/flume/splogs/ (create this directory on HDFS first)
a3.sinks.k3.hdfs.fileType = DataStream
a3.sinks.k3.hdfs.writeFormat = Text
a3.sinks.k3.hdfs.batchSize = 10
a3.sinks.k3.hdfs.useLocalTimeStamp = true

#### bind the source and sink to the channel
a3.sources.r3.channels = c3
a3.sinks.k3.channel = c3
Start the agent:

$ bin/flume-ng agent \
--conf conf \
--conf-file flume-app.conf \
--name a3 \
-Dflume.root.logger=INFO,console
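With hdfs.useLocalTimeStamp enabled and a time-escaped path such as the commented %Y%m%d example above, files land in one directory per day. This can be checked with something like the following (the date is just an example):

$ bin/hdfs dfs -ls /user/beifeng/flume/splogs/20161112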


