一、source
1、avro source
侦听Avro端口并从外部Avro客户端流接收事件。 当与另一个(上一跳)Flume代理上的内置Avro Sink配对时,它可以创建分层集合拓扑。
channels | – | |
type | – | The component type name, needs to be avro |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
使用场景:分层的数据收集。
例如:两层的日志收集:
使用flume将Nginx日志文件上传到hdfs上,要求hdfs上的目录使用日期归档
Flume:
agent的配置 source channel sink
flume的部署模式:
两层模式:
第一层:Flume agent 与每台nginx部署在一起
exec source + memory channel/file channel + avro sink
第二层:(收集汇集层)
avro source + memory channel + hdfs sink
flume agent启动过程:
先启动第二层flume agent avro 服务端
先打印日志到控制台,检查是否报错:
bin/flume-ng agent --name a2 --conf conf/ --conf-file conf/agents/flume_a2.conf -Dflume.root.logger=INFO,console
查看端口:
netstat -tlnup | grep prot
再启动第一层 flume agent
其中第一层的conf-file如下:
a1.conf
# exec source + memory channel + avro sink
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /opt/datas/nginx/user_logs/access.log
# Describe the sink avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = rainbow.com.cn
a1.sinks.k1.port = 4545
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# combine Source channel sink
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a2.conf
# avro source + memory channel + hdfs sink
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
a2.sources.r1.bind = rainbow.com.cn
a2.sources.r1.port = 4545
# hdfs sink
a2.sinks.k1.type = hdfs
a2.sinks.k1.channel = c1
a2.sinks.k1.hdfs.path = /nginx_logs/events/%y-%m-%d/
a2.sinks.k1.hdfs.filePrefix = events-
# hfds上文件目录创建的频率
#a2.sinks.k1.hdfs.round = true
#a2.sinks.k1.hdfs.roundValue =