Software download and installation
Flume download location: http://archive.apache.org/dist/flume/
-> Extract the archive
tar -zxf /opt/softwares/flume-ng-1.6.0-cdh5.10.2.tar.gz
-> Edit the configuration file flume-env.sh
export JAVA_HOME=/opt/apps/jdk1.7.0_67
-> Verify the installation
bin/flume-ng version
The flume-ng command
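Putting the steps above together, a minimal install sketch might look like the following. The target directory and the directory name inside the tarball are assumptions; adjust them to your environment.

```shell
# Install sketch; assumes the tarball from the download page above is already
# in /opt/softwares and that JDK 1.7 is installed at the path shown.
tar -zxf /opt/softwares/flume-ng-1.6.0-cdh5.10.2.tar.gz -C /opt/apps
cd /opt/apps/apache-flume-1.6.0-cdh5.10.2-bin   # unpacked directory name may differ

# Point Flume at the JDK
cp conf/flume-env.sh.template conf/flume-env.sh
echo 'export JAVA_HOME=/opt/apps/jdk1.7.0_67' >> conf/flume-env.sh

# Verify: should print the Flume version banner
bin/flume-ng version
```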
Usage: bin/flume-ng <command> [options]...
commands:
agent run a Flume agent
avro-client run an avro Flume client
global options:
--conf,-c <conf> use configs in <conf> directory
agent options:
--name,-n <name> the name of this agent (required)
--conf-file,-f <file> specify a config file (required if -z missing)
avro-client options:
--rpcProps,-P <file> RPC client properties file with server connection params
--host,-H <host> hostname to which events will be sent
--port,-p <port> port of the avro source
--dirname <dir> directory to stream to avro source
--filename,-F <file> text file to stream to avro source (default: std input)
--headerFile,-R <file> File containing event headers as key/value pairs on each new line
Commands for submitting a job:
bin/flume-ng agent --conf conf --name agent --conf-file conf/test.properties
bin/flume-ng agent -c conf -n agent -f conf/test.properties
bin/flume-ng avro-client --conf conf --host ibeifeng.class --port 8080
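Combining the avro-client options listed above, a hedged usage sketch (the host, port, and file path are placeholders; an avro source must already be listening there):

```shell
# Stream a local file to a running avro source (hypothetical host/port/file)
bin/flume-ng avro-client -c conf -H ibeifeng.class -p 8080 -F /tmp/access.log

# With no -F or --dirname, events are read from standard input instead
echo "hello flume" | bin/flume-ng avro-client -c conf -H ibeifeng.class -p 8080
```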
Choosing a configuration based on the deployment:
Flume installed on a node inside the Hadoop cluster:
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
Flume installed inside the Hadoop cluster, with HDFS HA enabled:
- The HDFS access entry point changes (clients address the nameservice, not a single NameNode)
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
- Also copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory
Flume installed outside the Hadoop cluster:
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
- Also copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory
- Copy the required Hadoop jars into Flume's lib directory (the jar versions must match the cluster's Hadoop version)
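For the "Flume outside the Hadoop cluster" case, the copy steps can be sketched as follows. $HADOOP_HOME and $FLUME_HOME are assumed to point at the respective install directories, and the exact jar set depends on the Hadoop version in use.

```shell
# Copy the client-side Hadoop configuration into Flume's conf directory
cp $HADOOP_HOME/etc/hadoop/core-site.xml $FLUME_HOME/conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $FLUME_HOME/conf/

# Copy the HDFS client jars the HDFS sink needs (illustrative list;
# match versions to the cluster's Hadoop distribution)
cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar   $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-*.jar       $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-*.jar $FLUME_HOME/lib/
```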
Run
bin/flume-ng agent --conf conf --conf-file conf/flume-agent.properties --name a1 -Dflume.root.logger=INFO,console
Configure the agent in conf/flume-agent.properties. Taking consuming data from Kafka and writing it to HDFS as an example, the configuration is as follows:
# Names of the agent's source, channel, and sink
agent.sources = r1
agent.channels = c1
agent.sinks = k1
# Source definition
# Source type
agent.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# ZooKeeper quorum of the Kafka cluster (used by the Kafka source in Flume 1.6 and earlier)
agent.sources.r1.zookeeperConnect = dbtest1:2181,dbtest2:2182,dbtest3:2183
# Kafka broker list (used by the Kafka source in Flume 1.7 and later)
agent.sources.r1.kafka.bootstrap.servers = dbtest1:9092,dbtest2:9093,dbtest3:9094
# Kafka topic to consume from
agent.sources.r1.topic = my-replicated-topic5
#agent.sources.r1.kafka.consumer.timeout.ms = 100
# Consumer group id (Flume 1.7+ form; Flume 1.6 uses agent.sources.r1.groupId)
agent.sources.r1.kafka.consumer.group.id = flume
# Custom interceptor (optional)
#agent.sources.r1.interceptors=i1
#agent.sources.r1.interceptors.i1.type=com.hadoop.flume.FormatInterceptor$Builder
# Channel definition
# Channel type
agent.channels.c1.type = memory
# Maximum number of events the channel can hold
agent.channels.c1.capacity = 10000
# Maximum number of events per transaction
agent.channels.c1.transactionCapacity = 100
# Sink definition
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://dbtest1:8020/test/%Y%m%d
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
# Roll files every 3 seconds or at ~1 MB, never by event count
agent.sinks.k1.hdfs.rollInterval = 3
agent.sinks.k1.hdfs.rollSize = 1024000
agent.sinks.k1.hdfs.rollCount = 0
# File name prefix and suffix
agent.sinks.k1.hdfs.fileSuffix = .data
agent.sinks.k1.hdfs.filePrefix = localhost-%Y-%m-%d
# Use local time for the %Y%m%d escapes instead of a timestamp event header
agent.sinks.k1.hdfs.useLocalTimeStamp = true
# Close idle files after 60 seconds
agent.sinks.k1.hdfs.idleTimeout = 60
# Mark in-progress files so downstream jobs can skip files still being written
#agent.sinks.k1.hdfs.inUsePrefix=_
#agent.sinks.k1.hdfs.inUseSuffix=
# Wire the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
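The memory channel above is fast but loses any buffered events if the agent crashes. A durable alternative is the file channel; a minimal sketch (the checkpoint and data paths are placeholders, not from the original setup):

```properties
# File channel: persists events to local disk between source and sink
agent.channels.c1.type = file
agent.channels.c1.checkpointDir = /opt/apps/flume/checkpoint
agent.channels.c1.dataDirs = /opt/apps/flume/data
agent.channels.c1.capacity = 100000
agent.channels.c1.transactionCapacity = 100
```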