Section 4.1.2 Flume topology structures

1 Multiple applications connecting to a single Flume Agent
Following the description in the logback-Flume integration section, the obvious approach is to have every application share the same logback and Flume configuration, which makes it easy to write log data into HDFS.
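As a rough sketch, the shared agent configuration would look something like the following. The NameNode address, HDFS path, and component names are placeholders assumed for illustration, not values from the original setup; port 44444 matches the agent ports used later in this section.

# shared agent: avro source (fed by the logback Flume appender) -> memory channel -> HDFS sink
# host names and paths below are illustrative assumptions
agent.sources = avro-source
agent.channels = mem-channel
agent.sinks = hdfs-sink

agent.sources.avro-source.type = avro
agent.sources.avro-source.bind = 0.0.0.0
agent.sources.avro-source.port = 44444

agent.channels.mem-channel.type = memory
agent.channels.mem-channel.capacity = 10000

agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://namenode:8020/flume/logs/%Y%m%d
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
agent.sinks.hdfs-sink.hdfs.useLocalTimeStamp = true

agent.sources.avro-source.channels = mem-channel
agent.sinks.hdfs-sink.channel = mem-channel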
The weakness of this design is equally obvious: the single Flume Agent is a single point of failure.
2 A Flume Agent cluster
The natural next step is to put HAProxy in front of multiple agents as a proxy.
The HAProxy configuration is fairly simple:

frontend flume_hdfs_front
    bind *:9095
    mode tcp
    log global
    option tcplog
    timeout client 3600s
    backlog 4096
    maxconn 1000000
    default_backend flume_hdfs_back

backend flume_hdfs_back
    mode tcp
    option log-health-checks
    option redispatch
    option tcplog
    balance roundrobin
    timeout connect 1s
    timeout queue 5s
    timeout server 3600s
    server f1 192.168.5.174:44444 check inter 2000 rise 3 fall 3 weight 1
    server f2 192.168.5.173:44444 check inter 2000 rise 3 fall 3 weight 1

I configured the HDFS sink to roll files at 10 MB. When I stopped one of the machines, 174, I found that each Flume agent produces its own file rather than appending to the file the other agent had already created.
When I then brought 174 back up and stopped 173, yet another new file was produced.
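This behavior is expected, since two HDFS sink instances cannot append to the same open file. A sketch of the relevant sink settings, assuming a host interceptor is acceptable in this setup (the property names are standard Flume options; the values and component names are illustrative):

# roll a new file every 10 MB (10485760 bytes); disable count- and time-based rolling
agent.sinks.hdfs-sink.hdfs.rollSize = 10485760
agent.sinks.hdfs-sink.hdfs.rollCount = 0
agent.sinks.hdfs-sink.hdfs.rollInterval = 0

# stamp each event with the agent's hostname and use it in the file prefix,
# so concurrent agents write clearly separated files instead of colliding
agent.sources.avro-source.interceptors = host-int
agent.sources.avro-source.interceptors.host-int.type = host
agent.sinks.hdfs-sink.hdfs.filePrefix = events-%{host}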
The drawback of scheme 2 is that the Flume agents connect to HDFS directly: once several agents write to HDFS at the same time, the HDFS NameNode comes under heavy pressure.
3 Flume writing to different databases by business line
Here Flume again acts as the front node for HBase. Once Flume has been explored to this depth there is no need to reinvent the wheel, and deliberately repeating the missteps others have already made is unwise; see the earlier posts in this series, "Flume high-concurrency optimization (2): streamlining the structure" and "Flume high-concurrency optimization (4): Kafka channel".
Our company's Flume topology still needs further streamlining.
4 Using a Kafka channel and writing to HBase
When several sinks consume the same channel, an event handed over by the source is processed by exactly one of them; once that sink succeeds, the other sinks will not process it again. To land the same stream in two HBase tables, the source therefore replicates every event into two separate channels, each drained by its own sink (a contrasting sketch with two sinks sharing one channel follows the configuration below):

# avro source -> kafka channels (buffered in Kafka topics) -> asynchbase sinks
dzm-agent.sources = dzm-source
dzm-agent.channels = dzm-channel dzm-channel-detail
dzm-agent.sinks = dzm-sink dzm-sink-detail

# source
dzm-agent.sources.dzm-source.type=avro
dzm-agent.sources.dzm-source.bind=0.0.0.0
dzm-agent.sources.dzm-source.port=44443
#dzm-agent.sources.dzm-source.selector.type = replicating
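# (replicating is already the default selector: every event is copied to each channel listed for this source)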

# channel
dzm-agent.channels.dzm-channel.type = org.apache.flume.channel.kafka.KafkaChannel
dzm-agent.channels.dzm-channel.kafka.bootstrap.servers = ceshi185:19092,ceshi186:19092,ceshi185:19092
dzm-agent.channels.dzm-channel.kafka.topic = flume_dzm_channel
dzm-agent.channels.dzm-channel.kafka.consumer.group.id = flume_dzm_channel

# sink
dzm-agent.sinks.dzm-sink.type = asynchbase
dzm-agent.sinks.dzm-sink.table = t_invoice_ticket
dzm-agent.sinks.dzm-sink.columnFamily = i
dzm-agent.sinks.dzm-sink.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceHbaseSerializer

# detail-channel
dzm-agent.channels.dzm-channel-detail.type = org.apache.flume.channel.kafka.KafkaChannel
dzm-agent.channels.dzm-channel-detail.kafka.bootstrap.servers = ceshi185:19092,ceshi186:19092,ceshi185:19092
dzm-agent.channels.dzm-channel-detail.kafka.topic = flume_dzm_detail_channel
dzm-agent.channels.dzm-channel-detail.kafka.consumer.group.id = flume_dzm_detail_channel

# detail-sink
dzm-agent.sinks.dzm-sink-detail.type = asynchbase
dzm-agent.sinks.dzm-sink-detail.table = t_invoice_detail_ticket
dzm-agent.sinks.dzm-sink-detail.columnFamily = i
dzm-agent.sinks.dzm-sink-detail.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceDetailHbaseSerializer

# assemble
dzm-agent.sources.dzm-source.channels = dzm-channel dzm-channel-detail
dzm-agent.sinks.dzm-sink.channel = dzm-channel
dzm-agent.sinks.dzm-sink-detail.channel = dzm-channel-detail
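For contrast, if the goal were only to drain a single channel faster, the sinks could be attached directly to the same channel with no group at all; Flume then runs them in parallel and, as noted above, each event is still processed by exactly one sink. A minimal sketch with hypothetical component names:

par-agent.channels = shared-channel
par-agent.sinks = sink-a sink-b
par-agent.sinks.sink-a.channel = shared-channel
par-agent.sinks.sink-b.channel = shared-channel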

5 Sink groups
The sinks in a sink group are never all active at once; at any given moment only one of them is used to send data, so a sink group should not be used as a way to drain a channel faster.
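What sink groups are actually designed for is choosing among alternative destinations, typically failover. A minimal failover sketch with hypothetical component names (the processor properties are standard Flume options):

agent.sinkGroups = sg
agent.sinkGroups.sg.sinks = sink1 sink2
agent.sinkGroups.sg.processor.type = failover
agent.sinkGroups.sg.processor.priority.sink1 = 10
agent.sinkGroups.sg.processor.priority.sink2 = 5
agent.sinkGroups.sg.processor.maxpenalty = 10000

The configuration below instead uses the load_balance processor, spreading events across three identical HBase sinks per channel: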

# avro source -> kafka channels -> load-balanced asynchbase sink groups
dzm-agent.sources = dzm-source
dzm-agent.channels = dzm-channel dzm-channel-detail
dzm-agent.sinks = dzm-sink1 dzm-sink2 dzm-sink3 dzm-sink-detail1 dzm-sink-detail2 dzm-sink-detail3
dzm-agent.sinkGroups = dzm-sg dzm-sg-detail

# source
dzm-agent.sources.dzm-source.type=avro
dzm-agent.sources.dzm-source.bind=0.0.0.0
dzm-agent.sources.dzm-source.port=44443
#dzm-agent.sources.dzm-source.selector.type = replicating

# channel
dzm-agent.channels.dzm-channel.type = org.apache.flume.channel.kafka.KafkaChannel
dzm-agent.channels.dzm-channel.kafka.bootstrap.servers = ceshi185:19092,ceshi186:19092,ceshi185:19092
dzm-agent.channels.dzm-channel.kafka.topic = flume_dzm_channel
dzm-agent.channels.dzm-channel.kafka.consumer.group.id = flume_dzm_channel

# sink group
dzm-agent.sinkGroups.dzm-sg.sinks = dzm-sink1 dzm-sink2 dzm-sink3
dzm-agent.sinkGroups.dzm-sg.processor.type = load_balance
dzm-agent.sinkGroups.dzm-sg.processor.backoff = true

# sink
dzm-agent.sinks.dzm-sink1.type = asynchbase
dzm-agent.sinks.dzm-sink1.table = t_invoice_ticket
dzm-agent.sinks.dzm-sink1.columnFamily = i
dzm-agent.sinks.dzm-sink1.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceHbaseSerializer

dzm-agent.sinks.dzm-sink2.type = asynchbase
dzm-agent.sinks.dzm-sink2.table = t_invoice_ticket
dzm-agent.sinks.dzm-sink2.columnFamily = i
dzm-agent.sinks.dzm-sink2.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceHbaseSerializer

dzm-agent.sinks.dzm-sink3.type = asynchbase
dzm-agent.sinks.dzm-sink3.table = t_invoice_ticket
dzm-agent.sinks.dzm-sink3.columnFamily = i
dzm-agent.sinks.dzm-sink3.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceHbaseSerializer

# detail-channel
dzm-agent.channels.dzm-channel-detail.type = org.apache.flume.channel.kafka.KafkaChannel
dzm-agent.channels.dzm-channel-detail.kafka.bootstrap.servers = ceshi185:19092,ceshi186:19092,ceshi185:19092
dzm-agent.channels.dzm-channel-detail.kafka.topic = flume_dzm_detail_channel
dzm-agent.channels.dzm-channel-detail.kafka.consumer.group.id = flume_dzm_detail_channel

# detail-sink group
dzm-agent.sinkGroups.dzm-sg-detail.sinks = dzm-sink-detail1 dzm-sink-detail2 dzm-sink-detail3
dzm-agent.sinkGroups.dzm-sg-detail.processor.type = load_balance
dzm-agent.sinkGroups.dzm-sg-detail.processor.backoff = true

# detail-sink
dzm-agent.sinks.dzm-sink-detail1.type = asynchbase
dzm-agent.sinks.dzm-sink-detail1.table = t_invoice_detail_ticket
dzm-agent.sinks.dzm-sink-detail1.columnFamily = i
dzm-agent.sinks.dzm-sink-detail1.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceDetailHbaseSerializer

dzm-agent.sinks.dzm-sink-detail2.type = asynchbase
dzm-agent.sinks.dzm-sink-detail2.table = t_invoice_detail_ticket
dzm-agent.sinks.dzm-sink-detail2.columnFamily = i
dzm-agent.sinks.dzm-sink-detail2.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceDetailHbaseSerializer

dzm-agent.sinks.dzm-sink-detail3.type = asynchbase
dzm-agent.sinks.dzm-sink-detail3.table = t_invoice_detail_ticket
dzm-agent.sinks.dzm-sink-detail3.columnFamily = i
dzm-agent.sinks.dzm-sink-detail3.serializer = com.bwjf.flume.invoice.dzm.sink.InvoiceDetailHbaseSerializer

# assemble
dzm-agent.sources.dzm-source.channels = dzm-channel dzm-channel-detail
dzm-agent.sinks.dzm-sink1.channel = dzm-channel
dzm-agent.sinks.dzm-sink2.channel = dzm-channel
dzm-agent.sinks.dzm-sink3.channel = dzm-channel
dzm-agent.sinks.dzm-sink-detail1.channel = dzm-channel-detail
dzm-agent.sinks.dzm-sink-detail2.channel = dzm-channel-detail
dzm-agent.sinks.dzm-sink-detail3.channel = dzm-channel-detail

References:

  1. Using Flume (《Flume构建高可用、可扩展的海量日志采集系统》), Hari Shreedharan; Chinese translation by Ma Yanhui and Shi Dongjie.
  2. Flume high-concurrency optimization (3): HAProxy.
  3. Flume high-concurrency optimization (9): managing configuration files with ZooKeeper.