Flume Notes

Problems I ran into and lessons learned while first working with Flume on the job.


Startup command with monitoring enabled

nohup bin/flume-ng agent --conf conf --conf-file conf/storm-log-flume.conf --name a1 -Dflume.monitoring.type=http -Dflume.monitoring.port=1234 > flume.log &   (the monitoring port can be any free port, as long as it does not clash with another service)


Querying the metrics
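With HTTP monitoring enabled as above, the agent serves its counters as JSON on the monitoring port. A simple way to pull them (replace the host placeholder with the agent's address):

curl http://<flume-host>:1234/metrics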

Meaning of the reported fields:
"SINK.k2": {
    "ConnectionCreatedCount": "0",          // number of connections created with the next hop or storage system (e.g. a new file created on HDFS)
    "BatchCompleteCount": "0",              // number of batches that were exactly the maximum batch size
    "EventDrainAttemptCount": "0",          // total number of events the sink attempted to write to storage
    "BatchEmptyCount": "0",                 // number of empty batches; a large value means the source is writing data much more slowly than the sink can drain it
    "StartTime": "1511140384263",
    "BatchUnderflowCount": "0",             // number of batches smaller than the configured maximum batch size; a high value also means the sink is faster than the source
    "ConnectionFailedCount": "0",           // number of connections to the next hop or storage system closed due to an error (e.g. an HDFS file closed because of a timeout)
    "ConnectionClosedCount": "0",           // number of connections to the next hop or storage system that were closed (e.g. closing a file on HDFS)
    "Type": "SINK",
    "RollbackCount": "45",
    "EventDrainSuccessCount": "4403509000", // total number of events the sink successfully wrote to storage
    "KafkaEventSendTimer": "3241483501",
    "StopTime": "0"
},
"CHANNEL.c2": {
    "ChannelCapacity": "1000000",           // capacity of the channel
    "ChannelFillPercentage": "0.0468",      // how full the channel currently is, as a percentage
    "Type": "CHANNEL",
    "ChannelSize": "468",                   // number of events currently in the channel
    "EventTakeSuccessCount": "4403509000",  // total number of events successfully taken from the channel by sinks
    "EventTakeAttemptCount": "4403554469",  // total number of times sinks attempted to take events from the channel; an attempt does not always return an event, since the channel may be empty
    "StartTime": "1511140384257",           // channel start time, in milliseconds since the Epoch
    "EventPutAttemptCount": "4403508486",   // total number of events sources attempted to put into the channel
    "EventPutSuccessCount": "4403508486",   // total number of events successfully put into the channel and committed
    "StopTime": "0"                         // channel stop time, in milliseconds since the Epoch
},
"SOURCE.r2": {
    "EventReceivedCount": "4403508788",     // total number of events the source has received so far
    "AppendBatchAcceptedCount": "0",        // total number of event batches successfully committed to the channel
    "Type": "SOURCE",
    "EventAcceptedCount": "4403508486",     // total number of events successfully written to the channel, for which the source returned success to the sink or RPC client that created them
    "AppendReceivedCount": "0",             // total number of events that arrived one event per batch (equivalent to a single append call in the RPC protocol)
    "StartTime": "1511140384258",           // source start time, in milliseconds since the Epoch
    "OpenConnectionCount": "0",             // number of connections currently open with clients or sinks (only the avro source exposes this metric)
    "AppendAcceptedCount": "10",            // total number of individually delivered events that were written to the channel and acknowledged
    "AppendBatchReceivedCount": "10",       // total number of event batches received
    "StopTime": "0"                         // source stop time, in milliseconds since the Epoch
}
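A quick backlog check, assuming curl is available on the monitoring host: if ChannelFillPercentage keeps climbing, the sinks are not keeping up with the source.

curl -s http://<flume-host>:1234/metrics | grep -o '"ChannelFillPercentage"[^,}]*'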


Configuration files:

Collecting Storm archived logs with Flume

Configuration on the log-collection side:
a1.sources = r1
a1.sinks = k1 k2 k3
a1.channels = c1
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000
# Optional load balancing across the three sinks; note that it affects performance
#a1.sinkgroups = g1
#a1.sinkgroups.g1.sinks = k1 k2 k3
#a1.sinkgroups.g1.processor.type = load_balance
#a1.sinkgroups.g1.processor.backoff = true
#a1.sinkgroups.g1.processor.selector = round_robin
# Source that tails the log files for changes
a1.sources.r1.type = org.apache.flume.source.taildir.TaildirSource
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /data/storm/logs/.*
a1.sources.r1.channels = c1

# Interceptors: add the host header and filter events with a regex
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = host
a1.sources.r1.interceptors.i2.type = regex_filter
# with the default regex_filter settings, only events matching this pattern pass through
a1.sources.r1.interceptors.i2.regex = 消费过程信息

# Describe the sinks: avro sinks that forward events to the receiving (analysis) agent
# hostname is the IP of the Flume machine that receives the events; port is the port it listens on
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = ip
a1.sinks.k1.port = 4545

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = ip
a1.sinks.k2.port = 4545

a1.sinks.k3.type = avro
a1.sinks.k3.channel = c1
a1.sinks.k3.hostname = ip
a1.sinks.k3.port = 4545
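One optional Taildir property worth knowing that is not shown in the config above (the path below is only an example): the position file stores the read offset of every tailed file, so the source resumes from the right place after a restart.

# optional: where the Taildir source keeps its per-file read offsets
a1.sources.r1.positionFile = /data/flume/taildir_position.json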


Configuration on the log-analysis side:
# Configuring several sinks speeds up processing, but it also raises CPU load; more is not always better
a1.sources = r1
a1.sinks = k1 k2 k3
a1.channels = c1

#a1.sinkgroups = g1
#a1.sinkgroups.g1.sinks = k1 k2 k3
#a1.sinkgroups.g1.processor.type = load_balance
#a1.sinkgroups.g1.processor.backoff = true
#a1.sinkgroups.g1.processor.selector = round_robin

# Use a file channel which buffers events on disk
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /data/iot/flume/flow_event/checkpoint
a1.channels.c1.useDualCheckpoints = true
a1.channels.c1.backupCheckpointDir = /data/iot/flume/flow_event/backup
a1.channels.c1.dataDirs = /data/iot/flume/flow_event/data
# Channel capacity and transaction size should be kept moderate; values that are too large eat memory and can make Flume hang
a1.channels.c1.transactionCapacity = 60000
a1.channels.c1.capacity = 500000
a1.channels.c1.checkpointInterval = 60000
a1.channels.c1.keep-alive = 5
# roughly 80% of the JVM memory
a1.channels.c1.maxFileSize = 5368709120
# Avro source that receives the data; the port must match the collection-side sinks.
# Raising the thread count speeds up ingestion; bind is the local IP of this analysis machine.
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.threads = 8
a1.sources.r1.bind = ip
a1.sources.r1.port = 4545

# Describe the sinks: the type is the entry class of the custom sink (fully-qualified class name); each sink is wired to the channel
a1.sinks.k1.type = sink
a1.sinks.k2.type = sink
a1.sinks.k3.type = sink
a1.sinks.k3.channel = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
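For reference, a minimal sketch of what such a custom sink can look like against the standard Flume sink API; the class name LogAnalysisSink and the processing step are placeholders, not the actual sink used in this setup. The compiled jar goes onto the Flume classpath (e.g. under plugins.d), and its fully-qualified class name replaces the "sink" placeholder above.

import org.apache.flume.Channel;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.Transaction;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;

public class LogAnalysisSink extends AbstractSink implements Configurable {

    @Override
    public void configure(Context context) {
        // read any sink-specific properties (a1.sinks.k1.xxx) here
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        Transaction tx = channel.getTransaction();
        tx.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                // channel is empty, tell the sink runner to back off for a while
                tx.commit();
                return Status.BACKOFF;
            }
            // placeholder: analyse / forward event.getBody() here
            tx.commit();
            return Status.READY;
        } catch (Throwable t) {
            tx.rollback();
            throw new EventDeliveryException("Failed to deliver event", t);
        } finally {
            tx.close();
        }
    }
}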


Other things that may come in handy:

1. If server resources allow, performance can be improved by increasing the JVM memory in conf/flume-env.sh (see the example after this list).
2. If no logs are printed to the console, add a log4j.properties file under the conf directory and point it at the directory where the log files should be stored.
3. Use top -Hp <pid> to inspect the threads of the Flume process.
4. To avoid data loss, start the analysis-side agent before the collection-side agent; when shutting down, stop the collection side first and the analysis side last.
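
A minimal example for tip 1, with placeholder heap sizes that need to be tuned to the actual server:

# conf/flume-env.sh
export JAVA_OPTS="-Xms2g -Xmx4g"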

------------------------------------------------------------------------------------------------------------------------------------------------------

These are just notes taken while learning; if anything here is wrong, corrections are welcome so we can learn together.

