What is Flume?
Flume is a distributed, reliable, and highly available tool for efficiently collecting, aggregating, and moving large volumes of log data.
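What is Flume's architecture?
agent: the smallest unit in which Flume runs. Each agent runs in its own JVM and contains one or more sources, channels, and sinks.
client: the place where the data is produced.
source: collects data from the client and puts it into one or more channels.
channel: the buffer between source and sink. It holds the events a source puts into it until a sink takes them out.
sink: pulls data from a channel and writes it to the target storage system, or forwards it to the next agent.
Event: the unit of data in Flume. One event corresponds to one record and carries headers plus a body.
interceptor: attached to a source. It can modify or drop events before they reach the channel, for example filtering on a condition. Interceptors can also be chained.
selector: the channel selector. Flume has two built-in types: replicating (the default) and multiplexing.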
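The flow through one agent can be sketched with a toy model. This is not Flume's API: `Event`, `ToyChannel`, `run_source`, and `run_sink` are hypothetical names used only to show that the source puts events into the channel and the sink pulls them out.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    """One record: a body plus string headers (toy stand-in for a Flume event)."""
    body: str
    headers: dict = field(default_factory=dict)


class ToyChannel:
    """Bounded FIFO buffer; it never pushes, it only holds events."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._queue = deque()

    def put(self, event):
        if len(self._queue) >= self.capacity:
            raise OverflowError("channel is full")
        self._queue.append(event)

    def take(self):
        return self._queue.popleft() if self._queue else None


def run_source(lines, channel):
    """Source side: wrap each incoming line in an Event and put it on the channel."""
    for line in lines:
        channel.put(Event(body=line))


def run_sink(channel, store):
    """Sink side: pull events until the channel is empty and write them to `store`."""
    while (event := channel.take()) is not None:
        store.append(event.body)


channel = ToyChannel(capacity=1000)
stored = []
run_source(["line 1", "line 2"], channel)
run_sink(channel, stored)
print(stored)  # ['line 1', 'line 2']
```

The channel is passive in this model, which matches the roles above: the source writes, the sink reads, and the channel only buffers.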
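What types of channel are there?
memory:
Advantage: fast
Disadvantage: events still in the channel are lost if the agent process dies
file:
Advantage: durable, because events are persisted to disk
Disadvantage: slower than the memory channel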
What types of sink are there?
logger sink, hdfs sink, avro sink, and others.
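Differences between Flume 0.9.x and 1.x:
1. 1.x is called Flume NG; 0.9.x and earlier are called Flume OG.
2. 1.x supports custom component development; developing custom components for 0.9.x was harder.
3. 1.x no longer distinguishes logical nodes; every physical node is simply called an agent.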
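Installation:
Example: avro source + memory channel + logger sink
# Define the agent's components (a1 is the agent's name, used when starting the agent)
a1.sources=r1
a1.channels=c1
a1.sinks=s1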
# Define each component's properties (every component needs at least a type)
a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.216.121
a1.sources.r1.port=6666
# Channel properties
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
# Define the logger sink
a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16
# Connect the source and the sink to the channel
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
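Two rules are easy to break when copying a configuration like this one: every component needs a `type`, and every source and sink must be connected to a declared channel. The following is a minimal sketch of a checker for those two rules only; `parse_properties` and `check_agent` are hypothetical helpers, not part of Flume, and the sample config is a trimmed copy of the example above.

```python
def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_agent(props, agent):
    """Return a list of wiring problems for one agent (an empty list means OK)."""
    declared = {kind: props.get(f"{agent}.{kind}", "").split()
                for kind in ("sources", "channels", "sinks")}
    problems = []
    for kind, names in declared.items():
        for name in names:
            if f"{agent}.{kind}.{name}.type" not in props:
                problems.append(f"{kind[:-1]} {name} has no type")
    channels = set(declared["channels"])
    for name in declared["sources"]:
        wired = props.get(f"{agent}.sources.{name}.channels", "").split()
        if not wired or not set(wired) <= channels:
            problems.append(f"source {name} is not wired to a declared channel")
    for name in declared["sinks"]:
        if props.get(f"{agent}.sinks.{name}.channel") not in channels:
            problems.append(f"sink {name} is not wired to a declared channel")
    return problems


CONFIG = """
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=avro
a1.channels.c1.type=memory
a1.sinks.s1.type=logger
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
"""

print(check_agent(parse_properties(CONFIG), "a1"))  # []
```

Note the asymmetry the checker relies on: a source uses the plural `channels` (it may feed several), while a sink uses the singular `channel`.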
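Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send the file /home/flume/log.00 to the avro source at 192.168.216.121:6666, for example with flume-ng avro-client.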
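Example: exec source + memory channel + logger sink
a2.sources=r1
a2.channels=c1
a2.sinks=s1
a2.sources.r1.type=exec
# Command whose output becomes events; tail -F on the log file is assumed here
a2.sources.r1.command=tail -F /home/flume/log.01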
a2.channels.c1.type=memory
a2.channels.c1.capacity=1000
a2.channels.c1.transactionCapacity=100
a2.channels.c1.keep-alive=3
a2.channels.c1.byteCapacityBufferPercentage=20
a2.channels.c1.byteCapacity=800000
a2.sinks.s1.type=logger
a2.sinks.s1.maxBytesToLog=30
a2.sources.r1.channels=c1
a2.sinks.s1.channel=c1
Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a2 -Dflume.root.logger=INFO,console
Test: append lines to /home/flume/log.01.
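Example: spooldir source + memory channel + logger sink (the monitored directory must be created first)
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/flume/spool
a1.sources.r1.fileSuffix=.COMPLETED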
a1.sources.r1.deletePolicy=never
a1.sources.r1.fileHeader=false
a1.sources.r1.fileHeaderKey=file
a1.sources.r1.basenameHeader=false
a1.sources.r1.basenameHeaderKey=basename
a1.sources.r1.batchSize=100
a1.sources.r1.inputCharset=UTF-8
a1.sources.r1.bufferMaxLines=1000
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
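Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: use a shell loop to write a file /home/flume/spool/$i for each value of i.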
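Example: syslogtcp source + memory channel + logger sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666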
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
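Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test (nc must be installed first): send a line of text to hadoop01 port 6666 with nc.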
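Where nc is not available, the same test can be sketched in Python. The host and port come from the configuration above; `syslog_line` and `send_line` are hypothetical helper names, and the priority value (facility 1, severity 5) is only an illustrative choice.

```python
import socket


def syslog_line(message, facility=1, severity=5):
    """Prefix `message` with a syslog priority: <facility * 8 + severity>."""
    return f"<{facility * 8 + severity}>{message}\n".encode("utf-8")


def send_line(host, port, payload, timeout=5.0):
    """Open a TCP connection, send the bytes, and close it again."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(payload)


payload = syslog_line("hello flume")
print(payload)  # b'<13>hello flume\n'
# send_line("hadoop01", 6666, payload)  # run this on a machine that can reach the agent
```

The trailing newline matters: the source treats each newline-terminated line as one event.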
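Example: http source + memory channel + logger sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=http
a1.sources.r1.bind=hadoop01
a1.sources.r1.port=6666
# handler must name a class on the agent's classpath; omit the two handler lines to use the default JSONHandler
a1.sources.r1.handler=org.example.rest.RestHandler
a1.sources.r1.handler.nickname=random props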
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: POST events to http://hadoop01:6666
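Example: syslogtcp source + memory channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666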
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=flume-hdfs
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
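# The %Y/%m/%d/%H%M escapes in hdfs.path need a timestamp for every event; no timestamp interceptor is configured here, so use the agent's local time
a1.sinks.s1.hdfs.useLocalTimeStamp=true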
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
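The three roll settings above (rollInterval=60 seconds, rollSize=1024 bytes, rollCount=10 events) each close the current file independently, and a value of 0 disables that trigger. A simplified model of that rule, assuming nothing about Flume's internals beyond it (`should_roll` is a hypothetical function):

```python
def should_roll(age_seconds, size_bytes, event_count,
                roll_interval=60, roll_size=1024, roll_count=10):
    """Return the names of the triggers that fire; a value of 0 disables a trigger."""
    fired = []
    if roll_interval > 0 and age_seconds >= roll_interval:
        fired.append("rollInterval")
    if roll_size > 0 and size_bytes >= roll_size:
        fired.append("rollSize")
    if roll_count > 0 and event_count >= roll_count:
        fired.append("rollCount")
    return fired


print(should_roll(age_seconds=5, size_bytes=200, event_count=10))   # ['rollCount']
print(should_roll(age_seconds=61, size_bytes=100, event_count=2))   # ['rollInterval']
print(should_roll(age_seconds=5, size_bytes=100, event_count=2))    # []
```

With values this small, whichever trigger fires first produces many small HDFS files, which is fine for a demo but usually not what a production setup wants.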
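Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send a line of text to hadoop01 port 6666 with nc.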
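Example: syslogtcp source + file channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666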
a1.channels.c1.type=file
a1.channels.c1.dataDirs=/home/flume/filechannel/data
a1.channels.c1.checkpointDir=/home/flume/filechannel/point
a1.channels.c1.transactionCapacity=10000
a1.channels.c1.checkpointInterval=30000
a1.channels.c1.capacity=1000000
a1.channels.c1.keep-alive=3
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=flume-hdfs
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
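With useLocalTimeStamp=true, the time escapes in hdfs.path are filled from the agent's own clock. For the four escapes used here (%Y, %m, %d, %H%M), Python's `strftime` happens to use the same letters, so the resulting directory can be previewed with a short sketch. `resolve_path` is a hypothetical helper, and the date is an arbitrary example.

```python
from datetime import datetime


def resolve_path(template, when):
    """Fill the %Y/%m/%d/%H%M escapes in an hdfs.path template from one timestamp."""
    return when.strftime(template)


template = "hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M"
print(resolve_path(template, datetime(2024, 1, 2, 3, 4, 59)))
# hdfs://hadoop01:9000/flume/2024/01/02/0304
```

Because the last path component is %H%M, events land in a new directory every minute. This sketch does not cover Flume's other escapes, such as %{header}.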
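Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send a line of text to hadoop01 port 6666 with nc.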
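Interceptors:
Timestamp Interceptor
Host Interceptor
Static Interceptor
Example: syslogtcp source with timestamp, host, and static interceptors + memory channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.interceptors=i1 i2 i3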
a1.sources.r1.interceptors.i1.type=timestamp
a1.sources.r1.interceptors.i1.preserveExisting=false
a1.sources.r1.interceptors.i2.type=host
a1.sources.r1.interceptors.i2.preserveExisting=false
a1.sources.r1.interceptors.i2.useIP=true
a1.sources.r1.interceptors.i2.hostHeader=hostname
a1.sources.r1.interceptors.i3.type=static
a1.sources.r1.interceptors.i3.preserveExisting=false
a1.sources.r1.interceptors.i3.key=hn
a1.sources.r1.interceptors.i3.value=hadoop01
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=%{hostname}
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=false
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
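What the three interceptors in this example do to an event's headers can be sketched as follows. This is a toy model, not Flume code: `set_header` and `apply_chain` are hypothetical names, and the IP address 192.0.2.10 is a placeholder for whatever address the agent's host actually has.

```python
def set_header(headers, key, value, preserve_existing=False):
    """Write a header unless it already exists and preserve_existing is set."""
    if preserve_existing and key in headers:
        return
    headers[key] = value


def apply_chain(headers, now_ms, agent_ip):
    """Apply toy versions of i1 (timestamp), i2 (host), and i3 (static) in order."""
    set_header(headers, "timestamp", str(now_ms))  # i1: event time in milliseconds
    set_header(headers, "hostname", agent_ip)      # i2: useIP=true, hostHeader=hostname
    set_header(headers, "hn", "hadoop01")          # i3: key=hn, value=hadoop01
    return headers


headers = apply_chain({}, now_ms=1700000000000, agent_ip="192.0.2.10")
file_prefix = "%{hostname}".replace("%{hostname}", headers["hostname"])
print(headers)
print(file_prefix)  # 192.0.2.10
```

The sink side then consumes these headers: the timestamp header feeds the %Y/%m/%d/%H%M escapes in hdfs.path (which is why useLocalTimeStamp can stay false here), and hdfs.filePrefix=%{hostname} picks up the header written by the host interceptor.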
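Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send a line of text to hadoop01 port 6666 with nc.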
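Example: syslogtcp source with a regex_filter interceptor + memory channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=regex_filter
# Set a1.sources.r1.interceptors.i1.regex to the pattern to match; do not wrap the regex in quotes
a1.sources.r1.interceptors.i1.excludeEvents=false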
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=%{hostname}
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
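# No timestamp interceptor is configured in this example, so take the time for the hdfs.path escapes from the agent's clock
a1.sinks.s1.hdfs.useLocalTimeStamp=true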
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
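Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send lines that do and do not match the regex to hadoop01 port 6666 with nc.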
------------selector: channel selector
Applied at the source. It decides which channel or channels each event goes to, and therefore which sinks receive it.
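The difference between the two selector types can be sketched in a few lines. This is a toy model of the routing decision only; the header name `kind` and the mapping values `A` and `B` are placeholders chosen for illustration.

```python
def replicating(event_headers, channels):
    """Replicating selector: every event goes to every configured channel."""
    return list(channels)


def multiplexing(event_headers, header, mapping, default):
    """Multiplexing selector: look up one header value; fall back to the default channels."""
    return mapping.get(event_headers.get(header), default)


channels = ["c1", "c2"]
# Placeholder header values for illustration.
mapping = {"A": ["c1"], "B": ["c2"]}

print(replicating({"kind": "A"}, channels))                    # ['c1', 'c2']
print(multiplexing({"kind": "B"}, "kind", mapping, ["c1"]))    # ['c2']
print(multiplexing({"kind": "zzz"}, "kind", mapping, ["c1"]))  # ['c1']
print(multiplexing({}, "kind", mapping, ["c1"]))               # ['c1']
```

Replicating ignores the headers entirely, while multiplexing depends on them, so a multiplexing setup needs a source or interceptor that actually sets the routing header.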
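Example: replicating selector with two memory channels and two hdfs sinks
a1.sources=r1
a1.channels=c1 c2
a1.sinks=s1 s2
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666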
a1.sources.r1.selector.type=replicating
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/rep
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true
a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/rep
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true
a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2
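Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: send a line of text to hadoop01 port 6666 with nc; both sinks should write it.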
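Example: multiplexing selector on an http source, routing on the state header
a1.sources=r1
a1.channels=c1 c2
a1.sinks=s1 s2
a1.sources.r1.type=http
a1.sources.r1.bind=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=state
# Add one a1.sources.r1.selector.mapping.<state value> line per route: one value maps to c1, another to c2
a1.sources.r1.selector.default=c1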
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000
a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100
a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/mul
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true
a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/mul
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true
a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2
Start the agent:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Test: POST one event for each state value to http://hadoop01:6666
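An http source without a handler setting uses the default JSONHandler, which accepts a JSON array of events, each with a `headers` object and a `body` string. A minimal sketch of building and posting such a request from Python follows. `build_payload` and `post_events` are hypothetical helpers, and `value-for-c1` is a placeholder for a state value that the selector mapping routes to c1.

```python
import json
import urllib.request


def build_payload(events):
    """Encode (headers, body) pairs as a JSON array of events."""
    return json.dumps([{"headers": h, "body": b} for h, b in events]).encode("utf-8")


def post_events(url, payload, timeout=5.0):
    """POST the payload and return the HTTP status code."""
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Placeholder state value; replace it with one used in the selector mapping.
payload = build_payload([({"state": "value-for-c1"}, "first event")])
print(payload.decode("utf-8"))
# post_events("http://hadoop01:6666", payload)  # needs a running agent
```

Sending a second event whose state value maps to c2, and a third with an unmapped value, exercises all three routes: c1, c2, and the default.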
------------------------Flume cluster----
The slave configurations are all much the same.
Configuration for 192.168.216.121:
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=syslogtcp
a1.sources.r1.host=192.168.216.121
a1.sources.r1.port=6666
a1.channels.c1.type=memory
a1.sinks.s1.type=avro
a1.sinks.s1.hostname=192.168.216.123
a1.sinks.s1.port=6666
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
Configuration for 192.168.216.122:
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=http
a1.sources.r1.bind=192.168.216.122
a1.sources.r1.port=6666
a1.channels.c1.type=memory
a1.sinks.s1.type=avro
a1.sinks.s1.hostname=192.168.216.123
a1.sinks.s1.port=6666
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
Configuration for 192.168.216.123 (the master):
a1.sources=r1
a1.channels=c1
a1.sinks=s1
a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.216.123
a1.sources.r1.port=6666
a1.channels.c1.type=memory
a1.sinks.s1.type=logger
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1
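The cluster only works if each slave's avro sink points at the master's avro source. A small sketch of that check, using only the relevant properties copied from the three configurations above (`points_at` is a hypothetical helper):

```python
# Only the properties that matter for the fan-in are copied here.
slaves = {
    "192.168.216.121": {"a1.sinks.s1.type": "avro",
                        "a1.sinks.s1.hostname": "192.168.216.123",
                        "a1.sinks.s1.port": "6666"},
    "192.168.216.122": {"a1.sinks.s1.type": "avro",
                        "a1.sinks.s1.hostname": "192.168.216.123",
                        "a1.sinks.s1.port": "6666"},
}
master = {"a1.sources.r1.type": "avro",
          "a1.sources.r1.bind": "192.168.216.123",
          "a1.sources.r1.port": "6666"}


def points_at(slave, master):
    """True if the slave's avro sink targets the master's avro source."""
    return (slave["a1.sinks.s1.type"] == "avro"
            and master["a1.sources.r1.type"] == "avro"
            and slave["a1.sinks.s1.hostname"] == master["a1.sources.r1.bind"]
            and slave["a1.sinks.s1.port"] == master["a1.sources.r1.port"])


for host, props in slaves.items():
    print(host, points_at(props, master))
```

Note the property names differ on the two ends: the avro sink uses `hostname`, while the avro source uses `bind`.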
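Start the master (192.168.216.123) first:
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
Then start the slaves (192.168.216.121 and 192.168.216.122):
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console
flume-ng agent -c <conf-dir> -f <config-file> -n a1 -Dflume.root.logger=INFO,console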