Flume Configuration in Production

A three-Flume-agent setup on a five-node cluster

Online material on Flume is long on theory and short on production-ready configuration, so this post shares the configuration our cluster actually runs, for reference.

1. Basic information

Flume runs as a three-node cluster (one agent per node).
Flume resource allocation: export JAVA_OPTS="-Xms8192m -Xmx8192m -Dcom.sun.management.jmxremote"
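This setting normally lives in conf/flume-env.sh so the agent picks it up at startup. A minimal sketch (the 8 GB heap is what suits our cluster, not a universal recommendation):

## conf/flume-env.sh
## Fixed heap (Xms = Xmx avoids runtime heap resizing) plus remote JMX for monitoring
export JAVA_OPTS="-Xms8192m -Xmx8192m -Dcom.sun.management.jmxremote"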

2. A must-have Flume test conf

## Name the three components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

## Source options for testing
## netcat source listening on local port 9999 (highly recommended; works great for testing)
## install netcat with: yum -y install nc
## write data from a local client with: nc localhost 9999
a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 9999

## Alternative source: pull data from Kafka (use either this block or the netcat block above, not both)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 2000
a1.sources.r1.batchDurationMillis = 5000
a1.sources.r1.kafka.bootstrap.servers = *:9092,*:9092,*:9092
a1.sources.r1.kafka.topics = nats_robotindex
a1.sources.r1.kafka.consumer.group.id = robotindex_interceptor

## These are the interceptor setups I use most in testing; pick whichever fits your needs
## Custom interceptor
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.yogo.flume.TimeStampInterceptor$Builder
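com.yogo.flume.TimeStampInterceptor is an in-house class, so here is only a minimal sketch of what such an interceptor typically looks like. It assumes the event body is JSON carrying a millisecond ts field; the regex-based parsing is illustrative, not our actual implementation:

package com.yogo.flume;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.interceptor.Interceptor;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TimeStampInterceptor implements Interceptor {

    // Illustrative: extract a 13-digit millisecond "ts" field from a JSON body
    private static final Pattern TS = Pattern.compile("\"ts\"\\s*:\\s*(\\d{13})");

    @Override
    public void initialize() {
        // nothing to set up
    }

    @Override
    public Event intercept(Event event) {
        String body = new String(event.getBody(), StandardCharsets.UTF_8);
        Matcher m = TS.matcher(body);
        if (m.find()) {
            // The HDFS sink reads the "timestamp" header to resolve %Y-%m-%d in hdfs.path
            event.getHeaders().put("timestamp", m.group(1));
        }
        return event;
    }

    @Override
    public List<Event> intercept(List<Event> events) {
        for (Event e : events) {
            intercept(e);
        }
        return events;
    }

    @Override
    public void close() {
        // nothing to clean up
    }

    // Flume builds the interceptor through this nested class, which is why
    // the conf line above ends in TimeStampInterceptor$Builder
    public static class Builder implements Interceptor.Builder {
        @Override
        public Interceptor build() {
            return new TimeStampInterceptor();
        }

        @Override
        public void configure(Context context) {
            // no parameters for this sketch
        }
    }
}

Package the class into a jar and drop it into the agent's lib/ directory (or a plugins.d entry) so the type above can be resolved.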

## Regex interceptor
a1.sources.r1.interceptors = regex
a1.sources.r1.interceptors.regex.type = REGEX_FILTER
## With excludeEvents = true, the interceptor passes only events whose body does NOT match, i.e. contains neither rm nor kill
a1.sources.r1.interceptors.regex.regex = (rm)|(kill)
a1.sources.r1.interceptors.regex.excludeEvents = true
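With the netcat source and logger sink from this test conf, that behavior is easy to verify by hand: typing hello world into nc localhost 9999 shows up at the sink, while rm -rf /tmp is dropped because its body matches (rm)|(kill) and excludeEvents is true.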

## Channels: pick one of the two
## memory channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 100000

## file channel
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /cluster/vdb/flume/checkpoint/robot_index
a1.channels.c1.dataDirs = /cluster/vdb/flume/data/robot_index/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
a1.channels.c1.keep-alive = 15

## Sink options
## Log to the console (recommended for testing; the logger sink shows each event's headers)
a1.sinks.k1.type = logger

## Sink to Kafka
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = *:9092,*:9092,*:9092
a1.sinks.k1.kafka.topic = first
a1.sinks.k1.kafka.producer.acks = 1

## Sink to HDFS
a1.sinks.k1.hdfs.filePrefix = 105-
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/robot/db/t_robot_index/%Y-%m-%d
a1.sinks.k1.hdfs.round = false
a1.sinks.k1.hdfs.rollInterval = 3600
## Adjust rollSize to your needs; we use LZO compression and want large files on disk (393216000 bytes = 375 MB)
a1.sinks.k1.hdfs.rollSize = 393216000
a1.sinks.k1.hdfs.batchSize = 5000
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.minBlockReplicas = 1
## resolve %Y-%m-%d from the agent's local clock rather than from a timestamp header
a1.sinks.k1.hdfs.useLocalTimeStamp = true

## Output file format
## Our cluster writes LZO-compressed output (the lzop codec requires the hadoop-lzo libraries on the cluster); for a plain test these two lines can be left out
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = lzop

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
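To run the test conf end to end, save it (here assumed as conf/test.conf, agent name a1) and start the agent with console logging so the logger sink's output is visible:

bin/flume-ng agent --conf conf --conf-file conf/test.conf --name a1 -Dflume.root.logger=INFO,console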

3. Flume conf for reading from production Kafka

## Components
a1.channels = c1
a1.sinks = k1

## channel1: a KafkaChannel consumes from Kafka directly, so no source is needed
## Our production Kafka ingests about 200 million log records a day; with the settings below we have seen no message backlog
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.parseAsFlumeEvent = false
a1.channels.c1.kafka.bootstrap.servers = *:9092,*:9092,*:9092
a1.channels.c1.kafka.topic = topic
a1.channels.c1.kafka.consumer.group.id = topic_group
## Note: maxFileSize / capacity / keep-alive are file-channel settings; the Kafka channel ignores them
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
a1.channels.c1.keep-alive = 15

## sink1
a1.sinks.k1.hdfs.filePrefix = 105-
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/robot/db/robot_behavior/%Y-%m-%d
a1.sinks.k1.hdfs.round = false

## HDFS roll settings
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 393216000
a1.sinks.k1.hdfs.batchSize = 5000
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

## Output file format (LZO-compressed, as in the test conf)
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = lzop

## Wiring: no source to bind here; the sink attaches directly to the Kafka channel
a1.sinks.k1.channel = c1
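Whether the agent keeps up with the topic (the no-backlog claim above) can be double-checked from the Kafka side by describing the consumer group's lag, along these lines (the broker address is a placeholder):

bin/kafka-consumer-groups.sh --bootstrap-server broker1:9092 --describe --group topic_group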

4. Flume conf for production business data (no data loss allowed)

A Kafka source in front of a file channel keeps events durable end to end: the source commits Kafka offsets only after a batch has been written into the channel, and the file channel persists events on disk until the HDFS sink's transaction commits, so an agent restart replays rather than loses data.

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 20000
a1.sources.r1.batchDurationMillis = 1000
a1.sources.r1.kafka.bootstrap.servers = *:9092,*:9092,*:9092
a1.sources.r1.kafka.topics = topic
a1.sources.r1.kafka.consumer.group.id = topic_group


a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /cluster/vdb/flume/checkpoint/robot
a1.channels.c1.dataDirs = /cluster/vdb/flume/data/robot/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
## transactionCapacity must be at least the source batchSize (20000) so a whole batch fits in one channel transaction
a1.channels.c1.transactionCapacity = 20000
a1.channels.c1.checkpointInterval = 60000
a1.channels.c1.keep-alive = 15


## HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/robot/db/roboterror_behavior/%Y-%m-%d
a1.sinks.k1.hdfs.round = false

## HDFS roll settings
a1.sinks.k1.hdfs.rollInterval = 3600
a1.sinks.k1.hdfs.rollSize = 393216000
a1.sinks.k1.hdfs.batchSize = 5000
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.minBlockReplicas = 1
a1.sinks.k1.hdfs.useLocalTimeStamp = true

## Output file format (LZO-compressed, as above)
a1.sinks.k1.hdfs.fileType = CompressedStream
a1.sinks.k1.hdfs.codeC = lzop

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
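One common way to keep an agent like this running in the background (conf file name and log path are placeholders):

nohup bin/flume-ng agent --conf conf --conf-file conf/robot.conf --name a1 > /cluster/vdb/flume/logs/robot.log 2>&1 &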

The configurations above can be taken as-is for testing and production use. If you would like more detail on running Flume in production, leave a comment.
