Flume Getting-Started Examples

What is Flume?
A distributed, reliable, and available tool for efficiently collecting, aggregating, and moving large volumes of log data.

What is Flume's structure?
agent: the smallest unit in which Flume runs; each agent runs independently in a single JVM and contains one or more sources, channels, and sinks (a minimal wiring is sketched after this list).
client: the client, i.e. the place where the data is produced.
source: collects data from wherever the client produces it.
channel: the data pipe; it receives data from the source end and pushes it on to the corresponding sink.
sink: pulls data from the channel and writes it to the corresponding persistent store.
Event: an event; one event corresponds to one record of data.
interceptor: works on the source side and filters events that match its conditions; Flume also allows interceptor chains.
selector: Flume has two selectors; replicating is the default, and the other is multiplexing.
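To make these roles concrete, here is a minimal sketch of a single agent wiring one source, one channel, and one sink together (the netcat source and port 44444 are illustrative choices, not taken from the examples below):

#name the components of agent a1
a1.sources=r1
a1.channels=c1
a1.sinks=s1

#source: where events enter the agent (netcat listens for text lines on a TCP port)
a1.sources.r1.type=netcat
a1.sources.r1.bind=localhost
a1.sources.r1.port=44444

#channel: buffers events between source and sink
a1.channels.c1.type=memory

#sink: where events leave the agent (logger prints them to the console)
a1.sinks.s1.type=logger

#wire the source and the sink to the channel
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1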

What channel types are there?
memory:
pros: fast
cons: data is easily lost (events live only in memory)
file:
pros: high data safety
cons: slow

What sink types are there?
Common ones include the logger sink, hdfs sink, and avro sink (all three are used below), among others.

Differences between 1.x and 0.9x:
1. The 1.x line is called NG (next generation); 0.9x and earlier is called OG (original generation).
2. 1.x supports developing custom components; in 0.9x, custom component development was difficult.
3. 1.x no longer distinguishes logical nodes; all physical nodes are uniformly called agents.

Installation:
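A minimal install sketch, assuming the binary tarball has been downloaded (the version number and install path are placeholders):

#unpack the binary distribution (1.8.0 is a placeholder; use your version)
tar -zxvf apache-flume-1.8.0-bin.tar.gz -C /usr/local/
#set FLUME_HOME and put the flume-ng launcher on the PATH
export FLUME_HOME=/usr/local/apache-flume-1.8.0-bin
export PATH=$PATH:$FLUME_HOME/bin
#verify the install
flume-ng version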

------------avro source + memory channel + logger sink
#name the agent's components (a1 is the agent's name, used when starting it)
a1.sources=r1
a1.channels=c1
a1.sinks=s1

#define each component's properties (every component has at least a type)
a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.216.121
a1.sources.r1.port=6666

#the channel's properties
a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

#define the logger sink
a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16

#connect the source and the sink to the channel
a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent (here and throughout, <conf-file> stands for the path of the config file you saved):
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console

Test (send a file to the avro source with the avro client):
flume-ng avro-client -c conf -H 192.168.216.121 -p 6666 -F /home/flume/log.00

------------exec source + memory channel + logger sink
a2.sources=r1
a2.channels=c1
a2.sinks=s1

a2.sources.r1.type=exec
a2.sources.r1.command=tail -F /home/flume/log.01

a2.channels.c1.type=memory
a2.channels.c1.capacity=1000
a2.channels.c1.transactionCapacity=100
a2.channels.c1.keep-alive=3
a2.channels.c1.byteCapacityBufferPercentage=20
a2.channels.c1.byteCapacity=800000

a2.sinks.s1.type=logger
a2.sinks.s1.maxBytesToLog=30

a2.sources.r1.channels=c1
a2.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a2 -Dflume.root.logger=INFO,console
Test (append lines to the tailed file):
echo "hello exec source" >> /home/flume/log.01

------------spooldir source + memory channel + logger sink
(the monitored directory must be created first)
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=spooldir
a1.sources.r1.spoolDir=/home/flume/spool
a1.sources.r1.fileSuffix=.COMPLETED
a1.sources.r1.deletePolicy=never
a1.sources.r1.fileHeader=false
a1.sources.r1.fileHeaderKey=file
a1.sources.r1.basenameHeader=false
a1.sources.r1.basenameHeaderKey=basename
a1.sources.r1.batchSize=100
a1.sources.r1.inputCharset=UTF-8
a1.sources.r1.bufferMaxLines=1000

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (each file dropped into the directory is ingested and then renamed with the .COMPLETED suffix):
for i in `seq 1 10`; do echo "spool test $i" > /home/flume/spool/$i; done

------------syslogtcp source + memory channel + logger sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (nc must be installed first):
echo "hello syslog" | nc hadoop01 6666
------------http source + memory channel + logger sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=http
a1.sources.r1.bind=hadoop01
a1.sources.r1.port=6666
#the handler is optional; without a custom class on the classpath, drop these
#two lines and the default JSONHandler is used
a1.sources.r1.handler=org.example.rest.RestHandler
a1.sources.r1.handler.nickname=random props

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=logger
a1.sinks.s1.maxBytesToLog=16

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (POST a JSON-encoded list of events):
curl -X POST -d '[{"headers":{"h1":"v1"},"body":"hello http source"}]' http://hadoop01:6666

------------syslogtcp source + memory channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=flume-hdfs
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
#the %Y/%m/%d escapes in the path need a timestamp; plain nc input carries no
#timestamp header, so use the local time (or add a timestamp interceptor)
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test:
echo "hello hdfs sink" | nc hadoop01 6666
------------syslogtcp source + file channel + hdfs sink
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666

a1.channels.c1.type=file
a1.channels.c1.dataDirs=/home/flume/filechannel/data
a1.channels.c1.checkpointDir=/home/flume/filechannel/point
a1.channels.c1.transactionCapacity=10000
a1.channels.c1.checkpointInterval=30000
a1.channels.c1.capacity=1000000
a1.channels.c1.keep-alive=3

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=flume-hdfs
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test:
echo "hello file channel" | nc hadoop01 6666


------------interceptors
Timestamp Interceptor: adds a timestamp header to each event
Host Interceptor: adds a host/IP header
Static Interceptor: adds a fixed key/value header

------------syslogtcp source + memory channel + hdfs sink, with an interceptor chain

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.interceptors=i1 i2 i3
a1.sources.r1.interceptors.i1.type=timestamp
a1.sources.r1.interceptors.i1.preserveExisting=false
a1.sources.r1.interceptors.i2.type=host
a1.sources.r1.interceptors.i2.preserveExisting=false
a1.sources.r1.interceptors.i2.useIP=true
a1.sources.r1.interceptors.i2.hostHeader=hostname
a1.sources.r1.interceptors.i3.type=static
a1.sources.r1.interceptors.i3.preserveExisting=false
a1.sources.r1.interceptors.i3.key=hn
a1.sources.r1.interceptors.i3.value=hadoop01

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
a1.sinks.s1.hdfs.filePrefix=%{hostname}
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=false

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (the resulting HDFS files are prefixed with the value of the hostname header):
echo "hello interceptors" | nc hadoop01 6666

------------syslogtcp source + memory channel + hdfs sink, with a regex_filter interceptor

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.interceptors=i1
a1.sources.r1.interceptors.i1.type=regex_filter
#do not wrap the regex in quotes (example pattern: keep only events whose body starts with a digit)
a1.sources.r1.interceptors.i1.regex=^[0-9].*$
a1.sources.r1.interceptors.i1.excludeEvents=false

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/%H%M
#no host interceptor is configured here, so use a literal prefix instead of %{hostname}
a1.sinks.s1.hdfs.filePrefix=flume-hdfs
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
#no timestamp header on plain nc input, so use the local time
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (the first line matches the pattern and is kept; the second is filtered out):
echo "1 hello regex" | nc hadoop01 6666
echo "hello regex" | nc hadoop01 6666

------------selector
Works at the source stage and decides which channel(s), and therefore which sink(s), an event goes to.

------------replicating selector (copies every event to all of the source's channels)
a1.sources=r1
a1.channels=c1 c2
a1.sinks=s1 s2

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.selector.type=replicating

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/rep
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/rep
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (the same event should appear in both the s1sink and s2sink files):
echo "hello replicating" | nc hadoop01 6666
------------multiplexing selector (routes events by a header value)
a1.sources=r1
a1.channels=c1 c2
a1.sinks=s1 s2

a1.sources.r1.type=http
a1.sources.r1.bind=hadoop01
a1.sources.r1.port=6666
a1.sources.r1.selector.type=multiplexing
a1.sources.r1.selector.header=state
#the mapping values below follow the Flume user-guide example; adjust them to your own header values
a1.sources.r1.selector.mapping.CZ=c1
a1.sources.r1.selector.mapping.US=c2
a1.sources.r1.selector.default=c1

a1.channels.c1.type=memory
a1.channels.c1.capacity=1000
a1.channels.c1.transactionCapacity=100
a1.channels.c1.keep-alive=3
a1.channels.c1.byteCapacityBufferPercentage=20
a1.channels.c1.byteCapacity=800000

a1.channels.c2.type=memory
a1.channels.c2.capacity=1000
a1.channels.c2.transactionCapacity=100

a1.sinks.s1.type=hdfs
a1.sinks.s1.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/mul
a1.sinks.s1.hdfs.filePrefix=s1sink
a1.sinks.s1.hdfs.fileSuffix=.log
a1.sinks.s1.hdfs.inUseSuffix=.tmp
a1.sinks.s1.hdfs.rollInterval=60
a1.sinks.s1.hdfs.rollSize=1024
a1.sinks.s1.hdfs.rollCount=10
a1.sinks.s1.hdfs.idleTimeout=0
a1.sinks.s1.hdfs.batchSize=100
a1.sinks.s1.hdfs.fileType=DataStream
a1.sinks.s1.hdfs.writeFormat=Text
a1.sinks.s1.hdfs.round=true
a1.sinks.s1.hdfs.roundValue=1
a1.sinks.s1.hdfs.roundUnit=second
a1.sinks.s1.hdfs.useLocalTimeStamp=true

a1.sinks.s2.type=hdfs
a1.sinks.s2.hdfs.path=hdfs://hadoop01:9000/flume/%Y/%m/%d/mul
a1.sinks.s2.hdfs.filePrefix=s2sink
a1.sinks.s2.hdfs.fileSuffix=.log
a1.sinks.s2.hdfs.inUseSuffix=.tmp
a1.sinks.s2.hdfs.rollInterval=60
a1.sinks.s2.hdfs.rollSize=1024
a1.sinks.s2.hdfs.rollCount=10
a1.sinks.s2.hdfs.idleTimeout=0
a1.sinks.s2.hdfs.batchSize=100
a1.sinks.s2.hdfs.fileType=DataStream
a1.sinks.s2.hdfs.writeFormat=Text
a1.sinks.s2.hdfs.round=true
a1.sinks.s2.hdfs.roundValue=1
a1.sinks.s2.hdfs.roundUnit=second
a1.sinks.s2.hdfs.useLocalTimeStamp=true

a1.sources.r1.channels=c1 c2
a1.sinks.s1.channel=c1
a1.sinks.s2.channel=c2

Start the agent:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Test (the state header picks the channel: CZ goes to c1/s1, US to c2/s2):
curl -X POST -d '[{"headers":{"state":"CZ"},"body":"to c1"}]' http://hadoop01:6666
curl -X POST -d '[{"headers":{"state":"US"},"body":"to c2"}]' http://hadoop01:6666

------------------------flume cluster----
Two slaves collect data and forward it over avro to one master (192.168.216.123), which logs it; the slave configurations are much the same as the single-agent ones above.
Configure slave 192.168.216.121 (syslogtcp source → avro sink):

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=syslogtcp
a1.sources.r1.host=192.168.216.121
a1.sources.r1.port=6666

a1.channels.c1.type=memory

a1.sinks.s1.type=avro
a1.sinks.s1.hostname=192.168.216.123
a1.sinks.s1.port=6666

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Configure slave 192.168.216.122 (http source → avro sink):
a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=http
a1.sources.r1.bind=192.168.216.122
a1.sources.r1.port=6666

a1.channels.c1.type=memory

a1.sinks.s1.type=avro
a1.sinks.s1.hostname=192.168.216.123
a1.sinks.s1.port=6666

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Configure master 192.168.216.123 (avro source → logger sink):

a1.sources=r1
a1.channels=c1
a1.sinks=s1

a1.sources.r1.type=avro
a1.sources.r1.bind=192.168.216.123
a1.sources.r1.port=6666

a1.channels.c1.type=memory

a1.sinks.s1.type=logger

a1.sources.r1.channels=c1
a1.sinks.s1.channel=c1

Start the master (192.168.216.123) first:
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
Then start the slaves (192.168.216.121 and 192.168.216.122):
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console
flume-ng agent -c conf -f <conf-file> -n a1 -Dflume.root.logger=INFO,console

Test (both events should show up in the master's logger output):
echo "hello cluster" | nc 192.168.216.121 6666
curl -X POST -d '[{"headers":{"h1":"v1"},"body":"hello cluster"}]' http://192.168.216.122:6666
