Apache Flume

Latest recommended article published on 2024-04-26 17:10:13

代码编制世界

Categories: flume, apache. Tags: flume

Article link: https://blog.csdn.net/qq_44962429/article/details/113995089

Official documentation: http://flume.apache.org

1. Overview

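Flume is a distributed, reliable, and highly available system for efficiently collecting, aggregating, and transporting log data. Its simple, flexible architecture is based on data flows. Flume is built to tolerate failures and offers several fault-tolerance and recovery mechanisms. Its simple, extensible data model also makes it possible to build online analytic applications on top of it.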

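Flume Event: an Event is the unit of data in a Flume data flow. Its payload (body) is one collected record, and its headers can carry optional key-value pairs that describe the record.

Flume Agent: an Agent is a JVM process that hosts three core components (Source, Channel, and Sink) and moves data from an external system to a destination where it is stored.

Agent Source: the Source receives incoming data, wraps each record in an Event (headers [k=v] + body [one record]), and passes the Event to the Channel.

Agent Channel: the Channel acts like a write buffer. It is essentially a queue of Events, consumed in first-in, first-out (FIFO) order.

Agent Sink: the Sink takes Events from the Channel and performs the final step, writing the collected data to the configured external storage system.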
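
The relationship between the three components can be sketched in a few lines of Python. This is only a toy model for illustration, not Flume's actual Java API: the `Event`, `Channel`, `source`, and `sink` names are hypothetical, and a plain list stands in for the external storage system.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """Toy stand-in for a Flume Event: optional headers plus a body."""
    body: bytes
    headers: dict = field(default_factory=dict)


class Channel:
    """FIFO buffer that sits between a source and a sink."""

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event):
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        # Returns None when the channel is empty.
        return self._queue.popleft() if self._queue else None


def source(lines, channel):
    """Wrap each collected record in an Event and hand it to the channel."""
    for line in lines:
        channel.put(Event(body=line.encode("utf-8")))


def sink(channel, store):
    """Drain the channel and write each event body to the destination."""
    while (event := channel.take()) is not None:
        store.append(event.body.decode("utf-8"))


channel = Channel(capacity=10)
stored = []  # stands in for the external storage system
source(["first", "second", "third"], channel)
sink(channel, stored)
print(stored)  # ['first', 'second', 'third']
```

The records leave the channel in the order they entered it, which is the FIFO behaviour described above.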

2. Environment Setup

[root@hadoop ~]# tar -zxf apache-flume-1.7.0-bin.tar.gz -C /usr
[root@hadoop ~]# cd /usr/apache-flume-1.7.0-bin/

Start command:

bin/flume-ng agent -n $agent_name -c conf -f conf/flume-conf.properties.template

$agent_name: the name of the agent.
conf/flume-conf.properties.template: the path to the Flume configuration file.

3. Usage Example

Goal: Flume reads data from a netcat source and prints it to the console.

# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

Start (this assumes the configuration above was saved as conf/simple.conf):

[root@hadoop apache-flume-1.7.0-bin]# bin/flume-ng agent --conf conf --conf-file conf/simple.conf --name a1 -Dflume.root.logger=INFO,console
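
Once the agent is running, the source can be exercised with any TCP client such as telnet or nc. The same check can be sketched in Python. The helper name `send_lines` is hypothetical, and the sketch assumes the netcat source acknowledges each line with a one-line reply (`OK` in its default configuration).

```python
import socket


def send_lines(host, port, lines, timeout=5.0):
    """Send each line to a TCP listener and collect one reply line per send."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            # The netcat source treats each newline-terminated line as one record.
            conn.sendall((line + "\n").encode("utf-8"))
            replies.append(reader.readline().strip())
    return replies
```

With the agent from this section listening, `send_lines("localhost", 44444, ["hello flume"])` should make the logger sink print the received event in the agent's console output.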

4. Common Sources, Channels, and Sinks

4.1 Source

① netcat: commonly used in test environments. The source starts a listener, and clients send data to it over TCP/IP for collection.

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

② exec: uses the output of a Linux command as the data source.

a1.sources.r1.type = exec
a1.sources.r1.command = tail -f /usr/apache-flume-1.7.0-bin/access.log

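③ Spooling Directory: uses the contents of text files in a directory on the Linux file system as the data source.

Note: once a file in the data directory has been fully ingested, it is automatically renamed with the .COMPLETED suffix. Files added to the monitored directory later are picked up in the same way and are also renamed with .COMPLETED once they have been read.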

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/data
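
The rename behaviour described in the note can be imitated with a short Python sketch. It is an illustration of the idea only, not Flume's implementation: `drain_spool_dir` is a hypothetical helper, and a temporary directory stands in for /root/data.

```python
import os
import tempfile


def drain_spool_dir(spool_dir, suffix=".COMPLETED"):
    """Read every unprocessed file line by line, then mark it as done."""
    records = []
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if name.endswith(suffix) or not os.path.isfile(path):
            continue  # already ingested, or not a regular file
        with open(path, encoding="utf-8") as handle:
            records.extend(line.rstrip("\n") for line in handle)
        os.rename(path, path + suffix)
    return records


with tempfile.TemporaryDirectory() as spool:
    with open(os.path.join(spool, "access.log"), "w", encoding="utf-8") as f:
        f.write("GET /index\nGET /about\n")
    first_pass = drain_spool_dir(spool)
    second_pass = drain_spool_dir(spool)
    remaining = os.listdir(spool)

print(first_pass)   # ['GET /index', 'GET /about']
print(second_pass)  # []
print(remaining)    # ['access.log.COMPLETED']
```

The second pass returns nothing because the renamed file is skipped, which is why a file is only collected once.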

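④ kafka: uses messages from a Kafka message queue as the data source.

# Source type
# Note: zookeeperConnect, topic, and groupId are the older property names; the Flume 1.7 user guide
# lists them as deprecated in favor of kafka.bootstrap.servers, kafka.topics, and kafka.consumer.group.id
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# Address of the ZooKeeper instance used by Kafka
a1.sources.r1.zookeeperConnect = 192.168.139.156:2181
# Kafka topic to consume
a1.sources.r1.topic = kafkasource
# Consumer group id
a1.sources.r1.groupId = flume
# Consumer timeout. Other Kafka consumer options can be set the same way: properties that start with kafka. are consumer settings
a1.sources.r1.kafka.consumer.timeout.ms = 100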

4.2 Channel

① Memory: stores Events in memory. Events held in a Memory channel can be lost, for example when the agent process stops.

a1.channels.c1.type = memory

② JDBC: stores Events in an embedded Derby database.

a1.channels.c1.type = jdbc

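③ Spillable Memory Channel: a channel that spills from memory to disk. When the number of Events held in memory reaches the threshold, further Events are automatically written to disk.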
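
The spill-over idea can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not how Flume implements the channel: `SpillableChannel` is a hypothetical class, and a second in-memory deque stands in for the on-disk storage.

```python
from collections import deque


class SpillableChannel:
    """Toy FIFO channel: a bounded in-memory queue plus an overflow queue.

    The overflow deque stands in for on-disk storage in this sketch.
    """

    def __init__(self, memory_capacity):
        self.memory_capacity = memory_capacity
        self.memory = deque()
        self.overflow = deque()

    def put(self, event):
        # Spill once memory is full, and keep spilling while older events
        # are still waiting in the overflow, so arrival order is preserved.
        if self.overflow or len(self.memory) >= self.memory_capacity:
            self.overflow.append(event)
        else:
            self.memory.append(event)

    def take(self):
        if self.memory:
            return self.memory.popleft()
        if self.overflow:
            return self.overflow.popleft()
        return None


channel = SpillableChannel(memory_capacity=2)
for n in range(5):
    channel.put(f"event-{n}")
spilled = len(channel.overflow)
drained = [channel.take() for _ in range(5)]
print(spilled)  # 3
print(drained)  # ['event-0', 'event-1', 'event-2', 'event-3', 'event-4']
```

Two events fit in memory, the remaining three are spilled, and all five are still consumed in arrival order.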

4.3 Sink

① Logger: writes the data to the console as INFO-level log messages.

a1.sinks.k1.type = logger

② HDFS: persists the data to HDFS.

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop:9000/flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
# Create a new data directory every 10 minutes; events within the same 10-minute window go to the same directory
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.fileType = DataStream
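
To see which directory an event would land in, the rounding can be worked through in Python. This is a rough sketch of the arithmetic only: `rounded_path` is a hypothetical helper, it handles only the minute unit, and it assumes the escape sequences in the path behave like the matching `strftime` codes.

```python
from datetime import datetime


def rounded_path(template, when, round_minutes=10):
    """Round `when` down to a multiple of `round_minutes`, then expand the template."""
    floored = when.replace(
        minute=when.minute - when.minute % round_minutes,
        second=0,
        microsecond=0,
    )
    return floored.strftime(template)


template = "/flume/events/%y-%m-%d/%H%M/%S"
print(rounded_path(template, datetime(2021, 2, 23, 11, 54, 34)))
# /flume/events/21-02-23/1150/00
```

An event stamped 11:54:34 is rounded down to 11:50:00, so every event between 11:50:00 and 11:59:59 shares the same directory.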

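Exception encountered: Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null

Solution: the time escape sequences in hdfs.path are resolved from the timestamp header of each Event, so add a timestamp to the Event headers with an interceptor.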

a1.sources.r1.type = netcat
a1.sources.r1.bind = 192.168.12.129
a1.sources.r1.port = 44444
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
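
Conceptually, the timestamp interceptor stamps each event with the current time before it reaches the channel. The following Python sketch illustrates that step under stated assumptions: events are modelled as plain dicts, `timestamp_interceptor` is a hypothetical function, and the header is assumed to hold epoch milliseconds under the key `timestamp`.

```python
import time


def timestamp_interceptor(event, preserve_existing=False):
    """Add a `timestamp` header (epoch milliseconds) to a dict-based event."""
    headers = event.setdefault("headers", {})
    if preserve_existing and "timestamp" in headers:
        return event  # keep the timestamp that is already present
    headers["timestamp"] = str(int(time.time() * 1000))
    return event


event = timestamp_interceptor({"body": b"hello flume"})
print(event["headers"]["timestamp"])
```

With this header in place, the HDFS sink has a time value from which to resolve the escape sequences in its path.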

By default, the HDFS Sink stores collected data in the SequenceFile format. To keep the raw content of the data, set hdfs.fileType to DataStream.

5. Comprehensive Example
