Flume supports the following ways of reading log stream data:
- Avro
- Thrift
- Syslog
- Netcat
The Avro source and Avro sink are essential to Flume's flexible topologies.
- Avro source: accepts events sent by an Avro client or by another agent's Avro sink. It is designed as a scalable RPC server that receives data into a Flume agent, either from the Avro sink of another Flume agent or from a client application sending data with the Flume SDK.
- Avro sink: takes events from the channel, wraps them as Avro events, and sends them to the configured hostname and port.
a1.channels = c1
a1.sinks = k1
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
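The Avro sink snippet above has a natural counterpart on the receiving side. A minimal Avro source sketch (agent name, channel name, and bind address are illustrative):

```properties
# Minimal Avro source: listen on a port and feed events into channel c1
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4545
a1.sources.r1.channels = c1
```

The sink's hostname/port must match the source's bind/port for the two agents to connect.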
Chaining multiple Flume agents
If the data flow needs to pass through multiple Flume agents, connect them with Avro.
Example:
Monitor a log for changes on machine master, write the events through an Avro sink to the Avro source on machine slaver01, and print them to the console with a logger sink.
Machine A, master (the client):
Configuration file:
agent11.sources = r1
agent11.sinks = k1
agent11.channels = c1
# Describe/configure the source
agent11.sources.r1.type = TAILDIR
agent11.sources.r1.positionFile = /opt/flume/tail_dir_connection.json
agent11.sources.r1.filegroups = f1
agent11.sources.r1.filegroups.f1 = /opt/flume/files/zhangxu.*
# Describe the sink
agent11.sinks.k1.type = avro
agent11.sinks.k1.hostname = slaver01
agent11.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent11.channels.c1.type = memory
agent11.channels.c1.capacity = 1000
agent11.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent11.sources.r1.channels = c1
agent11.sinks.k1.channel = c1
Start:
./bin/flume-ng agent -c conf/ -n agent11 -f ./job/conf/flume-avro-connection.conf -Dflume.root.logger=INFO,console
Machine B, slaver01 (the server):
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = avro
agent1.sources.r1.bind = slaver01
agent1.sources.r1.port = 40444
# Describe the sink
agent1.sinks.k1.type = logger
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
Start:
/opt/flume/bin/flume-ng agent -c /opt/flume/conf/ -n agent1 -f flume-avro-connection.conf -Dflume.root.logger=INFO,console
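Start machine B first so the Avro source is already listening when machine A's Avro sink connects. Once both agents are running, the link can also be smoke-tested with Flume's built-in Avro client (the file path here is illustrative):

```shell
# Send each line of a local file as an event to the Avro source on slaver01:40444
/opt/flume/bin/flume-ng avro-client -H slaver01 -p 40444 -F /tmp/test-events.txt
```

If the connection works, the events appear in machine B's logger output on the console.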
Aggregating multiple Flume agents
A typical scenario: many clients produce logs, and Flume aggregates them onto a single agent in front of the storage system for unified collection and processing. Each first-tier agent is configured with an Avro sink, and all of them point at the Avro source of the aggregating agent.
Example:
Three machines: master, slaver01, and slaver02. slaver01 and slaver02 collect logs with a Taildir source and an exec source respectively, then aggregate to master through Avro sinks; master prints the events with a logger sink:
- master:
Configuration file:
agent11.sources = r1
agent11.sinks = k1
agent11.channels = c1
# Describe/configure the source
agent11.sources.r1.type = avro
agent11.sources.r1.bind = master
agent11.sources.r1.port = 40444
# Describe the sink
agent11.sinks.k1.type = logger
# Use a channel which buffers events in memory
agent11.channels.c1.type = memory
agent11.channels.c1.capacity = 1000
agent11.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent11.sources.r1.channels = c1
agent11.sinks.k1.channel = c1
Start:
./bin/flume-ng agent -c conf/ -n agent11 -f job/conf/flume-avro-connection.conf -Dflume.root.logger=INFO,console
- slaver01
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = TAILDIR
agent1.sources.r1.positionFile = /opt/flume/reduce_tail_reduce_dir.json
agent1.sources.r1.filegroups = f1
agent1.sources.r1.filegroups.f1 = /opt/flume/files/xiaomao.*
# Describe the sink
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = master
agent1.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
Start:
./bin/flume-ng agent -c conf/ -n agent1 -f job/conf/flume-reduce-tail.conf -Dflume.root.logger=INFO,console
- slaver02
agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -f /opt/flume/files/xiaomao.txt
agent1.sources.r1.shell = /bin/bash -c
# Describe the sink
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = master
agent1.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
Start:
./bin/flume-ng agent -c conf/ -n agent1 -f job/conf/flume-reduce-exec.conf -Dflume.root.logger=INFO,console
Multiplexing and load balancing (failover) require a look at Flume's internals first.
Replicating and multiplexing
One source can send events to multiple channels, and sinks bound to different channels can then process the events differently.
The channel selector is configured through the Flume Channel Selectors property selector.type.
If it is not configured, the default is replicating.
A minimal example:
a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3
Events are sent to c1, c2, and c3 at the same time, but c3 is optional: if writing to c3 fails, the put transaction is not rolled back, whereas a failure on c1 or c2 triggers a rollback.
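The section title also mentions multiplexing. With selector.type = multiplexing, events are routed to channels based on the value of an event header instead of being replicated. A sketch, assuming events carry a "state" header (the header name and its values are illustrative):

```properties
# Route events by the value of the "state" header
a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2
a1.sources.r1.selector.default = c3
```

Events whose "state" header matches no mapping go to the default channel c3.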
Example:
Flume1 monitors a log through an exec source and writes the events to two channels. Each channel feeds an Avro sink that forwards to another Flume agent; one of those agents writes to HDFS and the other to the local file system.
Only Flume1's configuration file is given here.
Configuration file:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data flow to every channel (this is the default)
a1.sources.r1.selector.type = replicating
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
# On the sink side, avro acts as a data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
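The downstream agents are ordinary Avro-source agents. A sketch of the HDFS-writing agent on hadoop102 (the agent name, channel sizing, and the HDFS path/namenode port are assumptions, not from the original):

```properties
# Receive from Flume1's k1 sink and write to HDFS
a2.sources = r1
a2.sinks = k1
a2.channels = c1
a2.sources.r1.type = avro
a2.sources.r1.bind = hadoop102
a2.sources.r1.port = 4141
a2.sinks.k1.type = hdfs
a2.sinks.k1.hdfs.path = hdfs://hadoop102:9000/flume/%Y%m%d
# Use the agent's local time to resolve %Y%m%d (no timestamp header required)
a2.sinks.k1.hdfs.useLocalTimeStamp = true
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
```

The file-system agent is the same shape with a file_roll sink in place of the hdfs sink.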
Load balancing and failover
Agent1's events are sent via Avro to Agent2, Agent3, and Agent4. The Flume Sink Processors properties control load balancing and failover: several sinks are grouped into a sink group, and the group's processor then provides either the load-balancing or the failover behavior.
By default, a sink processor passes events to a single sink. processor.type needs to be set to default, failover, or load_balance, for example:
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
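For load_balance, two further processor properties are worth knowing: selector picks the distribution strategy (round_robin, the default, or random), and backoff temporarily blacklists a sink that fails so it is skipped for a while instead of being retried immediately:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = random
a1.sinkgroups.g1.processor.backoff = true
```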
Example:
A failover setup. Only Flume1's configuration file is given here.
# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
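With this configuration, k2 (priority 10) receives all events while it is healthy; if its downstream agent dies, traffic fails over to k1 (priority 5), and maxpenalty caps the backoff applied to the failed sink. To exercise the netcat source, a line can be sent from the same host (assumes the agent is running and nc is installed):

```shell
# Push one test event into the netcat source listening on localhost:44444
echo "hello failover" | nc localhost 44444
```

Stopping the agent behind k2 and sending another line should show the event arriving through k1 instead.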