I. Flume configuration
One machine (hadoop01) performs load balancing; two servers (hadoop02, hadoop03) receive the events and store them in HDFS.
============================================
hadoop01
============================================
# Declare the agent's components
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1
# Declare the source
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /root/work/data/flumeData
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
# Declare the sinks
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = hadoop02
a1.sinks.k1.port = 44444
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = hadoop03
a1.sinks.k2.port = 44444
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = random
# Declare the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sinks to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
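The collector agent can then be launched with the standard flume-ng command. A minimal sketch, assuming the configuration above is saved as hadoop01.conf (an assumed file name) and the command is run from the Flume installation directory; start it only after the agents on hadoop02 and hadoop03 are listening on port 44444, otherwise its avro sinks have nowhere to deliver events.
# start the load-balancing collector agent on hadoop01
flume-ng agent --conf conf --conf-file hadoop01.conf --name a1 -Dflume.root.logger=INFO,console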
============================================
hadoop02 hadoop03
============================================
# Declare the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Declare the source
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 44444
# Declare the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /zebra/reportTime=%Y-%m-%d %H-00-00
# Avoid producing large numbers of small files: roll on a time interval only, and since this is a single-node setup set minBlockReplicas to 1 so under-replication does not trigger extra rolls
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.minBlockReplicas = 1
# Declare the channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
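hadoop02 and hadoop03 share this configuration and are started the same way; hdfs.conf is again an assumed file name, and both should be running before hadoop01. A quick end-to-end check, using a placeholder file test.log, is to drop a file into the spooling directory on hadoop01 and then list the target path on HDFS:
# on hadoop02 and hadoop03: start the HDFS-writing agents first
flume-ng agent --conf conf --conf-file hdfs.conf --name a1 -Dflume.root.logger=INFO,console
# on hadoop01: feed a test file into the spooling directory watched by the spooldir source
cp test.log /root/work/data/flumeData/
# verify that events were written under the configured /zebra path
hdfs dfs -ls /zebra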
============================================
II. Problems encountered during startup
1. The firewalls on the servers had not been turned off, so the agents could not connect to them (see the example commands after this list).
2. hadoop01 reported errors while reading files: the channel capacity configured on hadoop01 was too small, so increase a1.channels.c1.capacity (and, if needed, a1.channels.c1.transactionCapacity).
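For issue 1, a hedged example of disabling the firewall, assuming CentOS/RHEL servers (adapt to the actual OS in use):
# CentOS 7: stop and disable firewalld
systemctl stop firewalld
systemctl disable firewalld
# CentOS 6: stop iptables and keep it off across reboots
service iptables stop
chkconfig iptables off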
Reference HDFS sink configuration:
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hadoop11:9000/flumedata
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.rollInterval = 30
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.minBlockReplicas = 1