Common Flume Topologies


Flume supports several ways of reading log stream data over the network, including:

  1. Avro
  2. Thrift
  3. Syslog
  4. Netcat
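Of these, netcat is the easiest to try out: it listens on a TCP port and turns each received line of text into one event. A minimal single-agent sketch (the agent name, port, and logger sink here are illustrative placeholders, not from the case studies below):

```properties
# Minimal netcat quick-start: one source, one memory channel, one logger sink
a1.sources = r1
a1.sinks = k1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinks.k1.type = logger
a1.channels.c1.type = memory
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

After starting the agent, lines typed into `nc localhost 44444` show up on the agent's console.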

The Avro source and Avro sink are central to Flume's flexible topologies.

  • Avro source: accepts events sent by an Avro client, or events sent by another agent's Avro sink.

    The Avro source is designed as a scalable RPC server: it receives data into a Flume agent either from another Flume agent's Avro sink, or from a client application that sends data using the Flume SDK.

  • Avro sink: takes events from its channel, wraps them as Avro events, and sends them to the configured hostname and port:

    a1.channels = c1
    a1.sinks = k1
    a1.sinks.k1.type = avro
    a1.sinks.k1.channel = c1
    a1.sinks.k1.hostname = 10.10.10.10
    a1.sinks.k1.port = 4545
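For completeness, the receiving side would run a matching Avro source; a minimal sketch (the bind address is a placeholder, and the port must equal the upstream sink's port):

```properties
# Avro source listening for events from an upstream Avro sink or client
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4545
a1.sources.r1.channels = c1
```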
    
Chaining Multiple Flume Agents

If the data stream needs to pass through multiple Flume agents, connect them with Avro (Avro sink on the upstream agent, Avro source on the downstream agent).


Example:

Monitor changes to a log file on machine master, send the events through an Avro sink to the Avro source on machine slaver01, then print them to the console with a logger sink.

Machine A, master (acting as the client):

Configuration file:

agent11.sources = r1
agent11.sinks = k1
agent11.channels = c1
# Describe/configure the source
agent11.sources.r1.type = TAILDIR
agent11.sources.r1.positionFile = /opt/flume/tail_dir_connection.json
agent11.sources.r1.filegroups = f1
agent11.sources.r1.filegroups.f1 = /opt/flume/files/zhangxu.*
# Describe the sink
agent11.sinks.k1.type = avro
agent11.sinks.k1.hostname = slaver01
agent11.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent11.channels.c1.type = memory
agent11.channels.c1.capacity = 1000
agent11.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent11.sources.r1.channels = c1
agent11.sinks.k1.channel = c1

Start:

./bin/flume-ng agent -c conf/ -n agent11 -f ./job/conf/flume-avro-connection.conf -Dflume.root.logger=INFO,console

Machine B, slaver01 (acting as the server):

agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = avro
agent1.sources.r1.bind = slaver01
agent1.sources.r1.port = 40444

# Describe the sink
agent1.sinks.k1.type = logger
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1

Start:

/opt/flume/bin/flume-ng agent -c /opt/flume/conf/ -n agent1 -f flume-avro-connection.conf -Dflume.root.logger=INFO,console
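To smoke-test the chain, append lines to a file matching the TAILDIR filegroup pattern on machine A (/opt/flume/files/zhangxu.* in the config above); each appended line should appear as one event on machine B's logger console. The sketch below uses a temp directory and a hypothetical file name zhangxu.log so it is runnable anywhere:

```shell
# Append test events to a file matching the filegroup pattern zhangxu.*
# (on the real machine A this would live under /opt/flume/files).
FILES_DIR=$(mktemp -d)
echo "hello flume" >> "$FILES_DIR/zhangxu.log"
echo "second event" >> "$FILES_DIR/zhangxu.log"
# The TAILDIR source ships each appended line as one event body;
# verify the file contents locally:
cat "$FILES_DIR/zhangxu.log"
```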

Aggregating Multiple Flume Agents

A typical scenario: multiple clients produce logs, and Flume aggregates them onto one agent at the storage system for unified collection and processing. Each first-tier agent is configured with an Avro sink, and all of them point at the Avro source of the aggregating agent on the storage system.


Example:

Three machines: master, slaver01, slaver02. slaver01 collects with a Taildir source and slaver02 with an exec source; both forward through Avro sinks and are aggregated on master, where a logger sink prints the events (a file_roll sink could be used instead to write them to local files):

  • master:

Configuration file:

agent11.sources = r1
agent11.sinks = k1
agent11.channels = c1
# Describe/configure the source
agent11.sources.r1.type = avro
agent11.sources.r1.bind = master
agent11.sources.r1.port = 40444
# Describe the sink
agent11.sinks.k1.type = logger
# Use a channel which buffers events in memory
agent11.channels.c1.type = memory
agent11.channels.c1.capacity = 1000
agent11.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent11.sources.r1.channels = c1
agent11.sinks.k1.channel = c1

Start:

./bin/flume-ng agent -c conf/ -n agent11 -f job/conf/flume-avro-connection.conf -Dflume.root.logger=INFO,console

  • slaver01

agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = TAILDIR
agent1.sources.r1.positionFile = /opt/flume/reduce_tail_reduce_dir.json
agent1.sources.r1.filegroups = f1
agent1.sources.r1.filegroups.f1 = /opt/flume/files/xiaomao.*
# Describe the sink
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = master
agent1.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1

Start:

./bin/flume-ng agent -c conf/ -n agent1 -f job/conf/flume-reduce-tail.conf -Dflume.root.logger=INFO,console

  • slaver02

agent1.sources = r1
agent1.sinks = k1
agent1.channels = c1
# Describe/configure the source
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -f /opt/flume/files/xiaomao.txt
agent1.sources.r1.shell = /bin/bash -c
# Describe the sink
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = master
agent1.sinks.k1.port = 40444
# Use a channel which buffers events in memory
agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1

Start:

./bin/flume-ng agent -c conf/ -n agent1 -f job/conf/flume-reduce-exec.conf -Dflume.root.logger=INFO,console

Multiplexing, load balancing, and failover build on two pieces of Flume's internals: channel selectors and sink processors.

Replicating and Multiplexing

A single source can write events to multiple channels, and the sinks bound to those different channels can then process the events in different ways. The structure looks like this:


If the channel selector property selector.type is not configured, it defaults to replicating.

A simple example:

a1.sources = r1
a1.channels = c1 c2 c3
a1.sources.r1.selector.type = replicating
a1.sources.r1.channels = c1 c2 c3
a1.sources.r1.selector.optional = c3

Events are sent to c1, c2, and c3 simultaneously, but c3 is optional: a failed write to c3 does not roll back the put transaction, whereas a failed write to c1 or c2 triggers a transaction rollback.

Example:

Flume1 reads a monitored log through an exec source and writes it to two channels; each channel feeds an Avro sink that forwards to one of two downstream Flume agents, which write to HDFS and to the local file system respectively.

Only Flume1's configuration file is given here.

Configuration file:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1 c2
# Replicate the data stream to all channels (the default)
a1.sources.r1.selector.type = replicating
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/module/hive/logs/hive.log
a1.sources.r1.shell = /bin/bash -c
# Describe the sink
# An avro sink acts as the data sender
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
a1.channels.c2.type = memory
a1.channels.c2.capacity = 1000
a1.channels.c2.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1 c2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
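This section's title also covers multiplexing, for which no example is given above. With a multiplexing selector, events are routed to channels based on a header value. A sketch adapted from the Flume User Guide (the header name state and the mapped values CZ/US are illustrative): events whose state header is CZ go to c1, US goes to c2 and c3, and everything else falls through to c4.

```properties
a1.sources = r1
a1.channels = c1 c2 c3 c4
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = state
a1.sources.r1.selector.mapping.CZ = c1
a1.sources.r1.selector.mapping.US = c2 c3
a1.sources.r1.selector.default = c4
```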

Load Balancing and Failover

Agent1 sends its events over Avro to Agent2, Agent3, and Agent4. By configuring the sink processor properties, you can control load balancing and failover: multiple sinks are grouped into a sink group, and the group's processor then provides the load-balancing or failover behavior.


By default, a sink processor accepts only a single sink. processor.type needs to be one of default, failover, or load_balance:

a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
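The load_balance processor also accepts a sink selector (round_robin, the default, or random) and an optional exponential backoff for failed sinks. A sketch with those knobs set explicitly, per the Flume User Guide:

```properties
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
# Temporarily blacklist a failed sink, with exponentially growing timeout
a1.sinkgroups.g1.processor.backoff = true
# round_robin (default) or random
a1.sinkgroups.g1.processor.selector = round_robin
```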

Example:

A failover case; only Flume1's configuration file is given. Note that k2 has the higher priority, so it is the active sink and k1 takes over only if k2 fails.

# Name the components on this agent
a1.sources = r1
a1.channels = c1
a1.sinkgroups = g1
a1.sinks = k1 k2
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 5
a1.sinkgroups.g1.processor.priority.k2 = 10
a1.sinkgroups.g1.processor.maxpenalty = 10000
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop102
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop102
a1.sinks.k2.port = 4142
# Describe the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1