Flume Usage Notes

乐活每天

Tags: flume

Article link: https://blog.csdn.net/u014434044/article/details/79569244

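1. Installation
1) Download the package
http://apache.fayea.com/flume/1.6.0/
2) Upload it to the server and extract it
tar -xzvf apache-flume-1.6.0-bin.tar.gz
3) Prepare the configuration files (in the conf directory)
(a) mv flume-env.sh.template flume-env.sh
vi flume-env.sh    # set JAVA_HOME
(b) mv flume-conf.properties.template flume-conf.properties
vi flume-conf.properties
The Flume configuration file can have any name you like.
The three components (source, channel, sink) take different properties depending on the type each one uses.
4) Start Flume
bin/flume-ng agent -c ./conf/ -f ./conf/flume-conf.properties -n agent0 -Dflume.root.logger=INFO,console
The name passed to -n (agent0 here) must match the agent name used as the property prefix in the configuration file.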

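2. Introduction
Flume is a log collection system developed by Cloudera. It was originally known as Flume OG; its last release was 0.9.4, and it suffered from serious instability in log transport. On October 22, 2011, Cloudera completed FLUME-728, a milestone change that refactored Flume's core components, core configuration and core architecture. The refactored Flume is known as Flume NG. Flume was also brought under the Apache umbrella and is now called Apache Flume.
Flume is built from three main components: source, sink and channel.
The source collects data, wraps it into events and puts them into the channel within transactions;
the channel provides simple buffering of the data;
the sink takes data out of the channel and stores it in a database or file system, or sends it to a remote server.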
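3. Integrating Flume with Kafka
1) Download flume-kafka-plus
2) Copy its flume-conf.properties into Flume's conf directory and edit it
3) Copy the jars from the plugin package's libs directory into Flume's lib directory
4) Copy the plugin jar from the plugin package's package directory into Flume's lib directory
5) The plugin package is a complete Maven project, so the Flume and Kafka versions it depends on can be changed to the ones you actually use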
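4. Managing Flume configuration with ZooKeeper
1) Download the ZooKeeper jar (3.4.6) and put it in Flume's lib directory
2) Start Flume with:
bin/flume-ng agent -c conf -f conf/flume-conf.properties -z 172.19.1.1:2181 -p /flume -n a1 -Dflume.root.logger=INFO,console
Here -z gives the address of the ZooKeeper ensemble, and -p is the ZooKeeper path under which the Flume agent configuration is stored.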
5. Flume configuration
5.1 Source types
1) Avro
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

# Describe the sink
a1.sinks.k1.type = logger

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
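2) spool
The spooling directory source watches the configured directory for newly added files and reads the data out of them. Two caveats:
    (a) A file that has been copied into the spool directory must not be opened and edited again.
    (b) The spool directory must not contain subdirectories.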
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/hadoop/flume-1.5.0-bin/logs
a1.sources.r1.fileHeader = true
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
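
The first caveat above means a file should be complete before it shows up in the spool directory. One common way to arrange that is to write the file somewhere else and rename it into place. The following is only a minimal Python sketch of that idea: the helper name `drop_into_spool` and the staging directory are hypothetical, and the staging directory is assumed to be on the same filesystem as the spool directory so that the rename is atomic.

```python
import os
import tempfile
from typing import Iterable


def drop_into_spool(spool_dir: str, staging_dir: str, name: str,
                    lines: Iterable[str]) -> str:
    """Write a complete file in staging_dir, then move it into spool_dir.

    Hypothetical helper: the file is never modified after it becomes
    visible inside spool_dir.
    """
    final_path = os.path.join(spool_dir, name)
    # Refuse to reuse a file name that is already present in the spool directory.
    if os.path.exists(final_path):
        raise FileExistsError(final_path)

    os.makedirs(staging_dir, exist_ok=True)
    # Write everything to a temporary file outside the spool directory.
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")

    # os.replace is atomic when both paths are on the same filesystem.
    os.replace(tmp_path, final_path)
    return final_path
```

Keeping the staging directory outside the spool directory also respects the second caveat, since no subdirectory is created under the spool directory.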
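3) exec
The exec source runs a given command and uses its output as the data source. When using the tail command, the file must contain enough data before any output becomes visible.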
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.channels = c1
a1.sources.r1.command = tail -F /home/hadoop/flume-1.5.0-bin/log_exec_tail
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
4) syslogtcp
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
5) HTTP source (JSONHandler)
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.http.HTTPSource
a1.sources.r1.port = 8888
a1.sources.r1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
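
To try this source out, events can be posted to port 8888 as a JSON array in which each element has a `headers` object and a `body` string, which is the format the HTTP source's default JSONHandler accepts. The following Python sketch assumes the agent above is running on localhost; the function names `build_payload` and `post_events` and the example header are made up for illustration.

```python
import json
import urllib.request


def build_payload(bodies, headers=None):
    """Build a JSON array of events, each with 'headers' and 'body'."""
    headers = headers or {}
    events = [{"headers": dict(headers), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(url, bodies, headers=None):
    """POST events to a running HTTP source and return the HTTP status code."""
    req = urllib.request.Request(
        url,
        data=build_payload(bodies, headers),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status


# Example call (needs the agent to be up; host and header are assumptions):
# post_events("http://localhost:8888", ["hello flume"], {"host": "web01"})
```

With the logger sink configured above, each posted event should then appear in the agent's console output.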
5.2 Sink types
1) logger
2) hdfs
3) kafka
4) avro
5) file_roll
agent.sinks.s1.type=file_roll
agent.sinks.s1.sink.directory=dir
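5.3 Channel selector types
The official documentation describes two types of channel selector:
Replicating Channel Selector (default)
Multiplexing Channel Selector
The difference between them: Replicating sends every event coming from the source to all channels, whereas Multiplexing can choose which channels each event is sent to.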
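
The routing difference can be illustrated with a toy model. This is not Flume's implementation, only a Python sketch of the semantics: the function names are hypothetical, and the multiplexing variant assumes routing on the value of a single event header with a default fallback, which mirrors the `selector.header`, `selector.mapping.*` and `selector.default` properties of the multiplexing selector.

```python
def replicating(event, channels):
    """Replicating: every event goes to all configured channels."""
    return list(channels)


def multiplexing(event, header, mapping, default):
    """Multiplexing: pick channels from the value of one event header.

    Falls back to the default channels when the header is missing
    or its value has no mapping.
    """
    value = event.get("headers", {}).get(header)
    return list(mapping.get(value, default))


# Illustrative routing table (header name and channel names are made up).
mapping = {"CZ": ["c1"], "US": ["c2", "c3"]}
event = {"headers": {"state": "US"}, "body": "example"}

all_targets = replicating(event, ["c1", "c2", "c3"])
chosen = multiplexing(event, "state", mapping, default=["c4"])
```

Here `all_targets` contains all three channels, while `chosen` contains only the channels mapped to the event's `state` header.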

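6. Flume monitoring
Flume natively supports two ways of exposing monitoring metrics: HTTP and Ganglia.
For HTTP, add -Dflume.monitoring.type=http -Dflume.monitoring.port=xxxx to the startup parameters; the metrics can then be fetched from the resulting page in JSON format.
For Ganglia, Ganglia must be installed, and -Dflume.monitoring.type=ganglia -Dflume.monitoring.hosts=<Ganglia address> must be added to the startup parameters.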
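
As an illustration of consuming the HTTP metrics, the Python sketch below fetches the JSON and extracts the fill percentage of each channel. It assumes the metrics are served under the `/metrics` path, keyed by entries such as `CHANNEL.c1`, with values encoded as strings; the function names and the sample values are made up.

```python
import json
import urllib.request


def fetch_metrics(host, port):
    """Fetch the metrics JSON from an agent started with HTTP monitoring."""
    url = f"http://{host}:{port}/metrics"
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def channel_fill(metrics):
    """Return {component name: ChannelFillPercentage as float} for channels."""
    result = {}
    for component, values in metrics.items():
        if component.startswith("CHANNEL.") and "ChannelFillPercentage" in values:
            result[component] = float(values["ChannelFillPercentage"])
    return result


# Made-up sample in the assumed shape; a live agent would be queried with
# fetch_metrics("localhost", <port passed to -Dflume.monitoring.port>).
sample = json.loads(
    '{"CHANNEL.c1": {"Type": "CHANNEL", "ChannelFillPercentage": "12.5"},'
    ' "SOURCE.r1": {"Type": "SOURCE"}}'
)
fills = channel_fill(sample)
```

Parsing is kept separate from fetching so the extraction logic can be checked without a running agent.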

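Custom monitoring requires implementing the org.apache.flume.instrumentation.MonitorService interface and adding -Dflume.monitoring.type=<custom monitoring class> -Dflume.monitoring.node=<reporting address> to Flume's startup parameters.
Flume natively provides three kinds of counters: SourceCounter, SinkCounter and ChannelCounter. All three extend MonitoredCounterGroup and implement the corresponding MBean.
MonitoredCounterGroup starts and stops the counters and registers the MBeans. The MBeans must follow the JMX standard. In Flume's source code each MBean is an interface that declares get or set methods for its attributes;
the presence of a get or set method indicates whether the attribute is readable or writable.

ambari-metrics obtains Flume metrics by calling JMXPollUtil.getAllMBeans() to read the attributes of all MBeans, and then reports the ones it needs to AMS via HTTP POST.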
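7. Connecting Flume to ambari-metrics
To report Flume metrics to AMS, AMS provides a custom Flume monitoring implementation that sends the collected metrics to the AMS metrics collector via HTTP POST.
Problem when Flume reports KafkaChannel metrics:
Version: Apache Flume 1.7
Affected metric: ChannelFillPercentage
Causes:
1) KafkaChannel does not set the ChannelSize and ChannelCapacity metrics, so ChannelFillPercentage takes the value Double.MAX_VALUE.
2) When reporting channel metrics, Flume collects all values from a period of time and reports them to AMS together. On receiving them, AMS sums these values, so storing the result as a double fails with a NumberFormatException.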
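
The arithmetic behind this can be reproduced outside Flume. Python floats are IEEE 754 doubles like Java's double, so the sketch below shows that adding two Double.MAX_VALUE samples overflows to infinity. How AMS turns that into the NumberFormatException is not shown here. The `safe_sum` filter is only a hypothetical workaround idea, not something taken from Flume or AMS.

```python
import math
import sys

# Largest finite double; the same value as Java's Double.MAX_VALUE.
DOUBLE_MAX = sys.float_info.max


def naive_sum(samples):
    """Add up all samples, as an aggregator that sums blindly would."""
    total = 0.0
    for s in samples:
        total += s
    return total


def safe_sum(samples):
    """Add up samples, skipping the Double.MAX_VALUE sentinel and non-finite values."""
    total = 0.0
    for s in samples:
        if math.isfinite(s) and s != DOUBLE_MAX:
            total += s
    return total


# Two reports of the sentinel value within one reporting period.
overflowed = naive_sum([DOUBLE_MAX, DOUBLE_MAX])
filtered = safe_sum([DOUBLE_MAX, 1.5, DOUBLE_MAX])
```

Here `overflowed` is infinite, while `filtered` keeps only the one real sample.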
