Flume is a highly available, highly reliable, distributed system from Cloudera for collecting, aggregating, and transporting massive amounts of log data. Flume supports customizable data senders in the logging system for collecting data, and it also provides the ability to do simple processing on the data and write it to various (customizable) data receivers.
Source:
Receives data from a data generator and passes it, wrapped in Flume's event format, to one or more channels. Flume supports many ways of receiving data, such as Avro, Thrift, and the Twitter 1% firehose.
Channel:
A channel is a transient storage container: it buffers the event-format data received from the source until it is consumed by sinks, acting as a bridge between source and sink. A channel is fully transactional, which guarantees data consistency between sending and receiving, and it can be connected to any number of sources and sinks. Supported types include the JDBC channel, File channel, and Memory channel.
sink:
The sink stores data in a centralized store such as HBase or HDFS. It consumes data (events) from channels and delivers it to the destination, which may be the next Flume agent in the pipeline, or a store such as HDFS or HBase.
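To make the source → channel → sink wiring concrete, here is a minimal single-agent configuration sketch using Flume's built-in netcat source, memory channel, and logger sink (the agent name a2 and the port are illustrative only, not part of this article's setup):

```
# name the components of agent a2
a2.sources = s1
a2.channels = c1
a2.sinks = k1

# netcat source: listens on a TCP port, one event per line of input
a2.sources.s1.type = netcat
a2.sources.s1.bind = localhost
a2.sources.s1.port = 44444

# memory channel buffering events between source and sink
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000

# logger sink: writes events to the Flume log
a2.sinks.k1.type = logger

# wire them together: a source may feed several channels, a sink drains exactly one
a2.sources.s1.channels = c1
a2.sinks.k1.channel = c1
```

Note the asymmetry in the last two lines: `channels` (plural) on the source, `channel` (singular) on the sink, matching the fan-out model described above.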
Rename the flume-env.sh.template file under flume/conf to flume-env.sh, then configure flume-env.sh:
mv flume-env.sh.template flume-env.sh
vi flume-env.sh
export JAVA_HOME=/opt/module/jdk1.8.0_144
In the /opt/module/flume/conf directory, create the file file-flume-kafka.conf:
vim file-flume-kafka.conf
a1.sources=r1
a1.channels=c1 c2
#configure source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/flume/test/log_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /tmp/logs/app.+
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1 c2
#interceptor
a1.sources.r1.interceptors = i1 i2
a1.sources.r1.interceptors.i1.type = com.flume.interceptor.LogETLInterceptor$Builder
a1.sources.r1.interceptors.i2.type = com.flume.interceptor.LogTypeInterceptor$Builder
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = topic
a1.sources.r1.selector.mapping.topic_start = c1
a1.sources.r1.selector.mapping.topic_event = c2
#configure channel
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = hadoop2:9092,hadoop3:9092,hadoop4:9092
a1.channels.c1.kafka.topic = topic_start
a1.channels.c1.parseAsFlumeEvent = false
a1.channels.c1.kafka.consumer.group.id = flume-consumer
a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.kafka.bootstrap.servers = hadoop2:9092,hadoop3:9092,hadoop4:9092
a1.channels.c2.kafka.topic = topic_event
a1.channels.c2.parseAsFlumeEvent = false
a1.channels.c2.kafka.consumer.group.id = flume-consumer
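The i2 interceptor and the multiplexing selector above work as a pair: the interceptor inspects each event and writes a "topic" header, and the selector then routes the event to the channel mapped to that header value. The real logic lives in the Java class com.flume.interceptor.LogTypeInterceptor and Flume's built-in multiplexing selector; the following Python sketch only illustrates the routing decision, and the convention that startup logs contain a "start" marker is an assumption for the example:

```python
# Channel mapping taken from the selector config above.
SELECTOR_MAPPING = {"topic_start": "c1", "topic_event": "c2"}

def tag_event(event: dict) -> dict:
    """Mimic the type interceptor: set the 'topic' header from the body.

    Assumed convention: startup logs contain the marker '"start"';
    everything else is treated as an event log.
    """
    topic = "topic_start" if '"start"' in event["body"] else "topic_event"
    event.setdefault("headers", {})["topic"] = topic
    return event

def route(event: dict) -> str:
    """Mimic the multiplexing selector: pick a channel from the header."""
    return SELECTOR_MAPPING[event["headers"]["topic"]]

events = [
    {"body": '{"action": "start", "ts": 1}'},
    {"body": '{"action": "click", "ts": 2}'},
]
print([route(tag_event(e)) for e in events])  # ['c1', 'c2']
```

With this in place, startup events end up in the Kafka topic topic_start via channel c1, and all other events in topic_event via c2, without needing any sink in this agent: the KafkaChannel itself writes to Kafka.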
Flume start/stop script
vim f1.sh
#! /bin/bash
case $1 in
"start"){
	for i in hadoop102 hadoop103
	do
		echo " -------- starting Flume collection on $i --------"
		ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/test1 2>&1 &"
	done
};;
"stop"){
	for i in hadoop102 hadoop103
	do
		echo " -------- stopping Flume collection on $i --------"
		ssh $i "ps -ef | grep file-flume-kafka | grep -v grep | awk '{print \$2}' | xargs kill"
	done
};;
esac
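The TAILDIR source records how far it has read each file in the positionFile configured earlier (/opt/module/flume/test/log_position.json), which is how it resumes without data loss after the stop/start cycle above. The file holds a JSON array of records with "inode", "pos", and "file" keys; the sample record below is made up for illustration. A small Python sketch of reading such a file:

```python
import json

# Illustrative content of a TAILDIR position file; the inode, offset,
# and file name here are invented sample values, not real output.
sample = '[{"inode": 593511, "pos": 2048, "file": "/tmp/logs/app.2020-01-01.log"}]'

for rec in json.loads(sample):
    # pos is the byte offset already consumed; on restart, tailing resumes there
    print(f'{rec["file"]}: resume at byte {rec["pos"]} (inode {rec["inode"]})')
```

Tracking files by inode is what lets TAILDIR keep its place even when log files are rotated or renamed.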