Software download and installation
Flume download location: http://archive.apache.org/dist/flume/
-> Extract the archive
tar -zxf /opt/softwares/flume-ng-1.6.0-cdh5.10.2.tar.gz
-> Edit the configuration file flume-env.sh
export JAVA_HOME=/opt/apps/jdk1.7.0_67
-> Verify the installation
bin/flume-ng version
The flume-ng command
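Putting the steps above together, a minimal install sketch might look like the following. The target directory and the directory name inside the tarball are assumptions; adjust them to your environment.

```shell
# Install sketch; assumes the tarball from the download page above is already
# in /opt/softwares and that JDK 1.7 is installed at the path shown.
tar -zxf /opt/softwares/flume-ng-1.6.0-cdh5.10.2.tar.gz -C /opt/apps
cd /opt/apps/apache-flume-1.6.0-cdh5.10.2-bin   # unpacked directory name may differ

# Point Flume at the JDK
cp conf/flume-env.sh.template conf/flume-env.sh
echo 'export JAVA_HOME=/opt/apps/jdk1.7.0_67' >> conf/flume-env.sh

# Verify: should print the Flume version banner
bin/flume-ng version
```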
Usage: bin/flume-ng <command> [options]...
commands:
agent run a Flume agent
avro-client run an avro Flume client
global options:
--conf,-c <conf> use configs in <conf> directory
agent options:
--name,-n <name> the name of this agent (required)
--conf-file,-f <file> specify a config file (required if -z missing)
avro-client options:
--rpcProps,-P <file> RPC client properties file with server connection params
--host,-H <host> hostname to which events will be sent
--port,-p <port> port of the avro source
--dirname <dir> directory to stream to avro source
--filename,-F <file> text file to stream to avro source (default: std input)
--headerFile,-R <file> File containing event headers as key/value pairs on each new line
Commands for submitting a job:
bin/flume-ng agent --conf conf --name agent --conf-file conf/test.properties
bin/flume-ng agent -c conf -n agent -f conf/test.properties
bin/flume-ng avro-client --conf conf --host ibeifeng.class --port 8080
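Combining the avro-client options listed above, a hedged usage sketch (the host, port, and file path are placeholders; an avro source must already be listening there):

```shell
# Stream a local file to a running avro source (hypothetical host/port/file)
bin/flume-ng avro-client -c conf -H ibeifeng.class -p 8080 -F /tmp/access.log

# With no -F or --dirname, events are read from standard input instead
echo "hello flume" | bin/flume-ng avro-client -c conf -H ibeifeng.class -p 8080
```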
Choosing a configuration based on the deployment:
Flume installed on a node inside the Hadoop cluster:
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
Flume installed inside the Hadoop cluster, with HDFS HA enabled:
- The HDFS access entry point changes (clients address the nameservice, not a single NameNode)
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
- Also copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory
Flume installed outside the Hadoop cluster:
- Set JAVA_HOME: export JAVA_HOME=/opt/apps/jdk1.7.0_67
- Also copy Hadoop's core-site.xml and hdfs-site.xml into Flume's conf directory
- Copy the required Hadoop jars into Flume's lib directory (the jar versions must match the cluster's Hadoop version)
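For the "Flume outside the Hadoop cluster" case, the copy steps can be sketched as follows. $HADOOP_HOME and $FLUME_HOME are assumed to point at the respective install directories, and the exact jar set depends on the Hadoop version in use.

```shell
# Copy the client-side Hadoop configuration into Flume's conf directory
cp $HADOOP_HOME/etc/hadoop/core-site.xml $FLUME_HOME/conf/
cp $HADOOP_HOME/etc/hadoop/hdfs-site.xml $FLUME_HOME/conf/

# Copy the HDFS client jars the HDFS sink needs (illustrative list;
# match versions to the cluster's Hadoop distribution)
cp $HADOOP_HOME/share/hadoop/common/hadoop-common-*.jar   $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/hdfs/hadoop-hdfs-*.jar       $FLUME_HOME/lib/
cp $HADOOP_HOME/share/hadoop/common/lib/hadoop-auth-*.jar $FLUME_HOME/lib/
```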
Run
bin/flume-ng agent --conf conf --conf-file conf/flume-agent.properties --name a1 -Dflume.root.logger=INFO,console
Configure the agent in conf/flume-agent.properties. Taking consuming data from Kafka and writing it to HDFS as an example, the configuration is as follows:
# Names of the agent's source, channel, and sink
agent.sources = r1
agent.channels = c1
agent.sinks = k1
# Source definition
# Source type
agent.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
# ZooKeeper quorum of the Kafka cluster (used by the Kafka source in Flume 1.6 and earlier)
agent.sources.r1.zookeeperConnect = dbtest1:2181,dbtest2:2182,dbtest3:2183
# Kafka broker list (used by the Kafka source in Flume 1.7 and later)
agent.sources.r1.kafka.bootstrap.servers = dbtest1:9092,dbtest2:9093,dbtest3:9094
# Kafka topic to consume from
agent.sources.r1.topic = my-replicated-topic5
#agent.sources.r1.kafka.consumer.timeout.ms = 100
# Consumer group id (Flume 1.7+ form; Flume 1.6 uses agent.sources.r1.groupId)
agent.sources.r1.kafka.consumer.group.id = flume
# Custom interceptor (optional)
#agent.sources.r1.interceptors=i1
#agent.sources.r1.interceptors.i1.type=com.hadoop.flume.FormatInterceptor$Builder
# Channel definition
# Channel type
agent.channels.c1.type = memory
# Maximum number of events the channel can hold
agent.channels.c1.capacity = 10000
# Maximum number of events per transaction
agent.channels.c1.transactionCapacity = 100
# Sink definition
agent.sinks.k1.type = hdfs
agent.sinks.k1.hdfs.path = hdfs://dbtest1:8020/test/%Y%m%d
agent.sinks.k1.hdfs.fileType = DataStream
agent.sinks.k1.hdfs.writeFormat = Text
# Roll files every 3 seconds or at ~1 MB, never by event count
agent.sinks.k1.hdfs.rollInterval = 3
agent.sinks.k1.hdfs.rollSize = 1024000
agent.sinks.k1.hdfs.rollCount = 0
# File name prefix and suffix
agent.sinks.k1.hdfs.fileSuffix = .data
agent.sinks.k1.hdfs.filePrefix = localhost-%Y-%m-%d
# Use local time for the %Y%m%d escapes instead of a timestamp event header
agent.sinks.k1.hdfs.useLocalTimeStamp = true
# Close idle files after 60 seconds
agent.sinks.k1.hdfs.idleTimeout = 60
# Mark in-progress files so downstream jobs can skip files still being written
#agent.sinks.k1.hdfs.inUsePrefix=_
#agent.sinks.k1.hdfs.inUseSuffix=
# Wire the source and sink to the channel
agent.sources.r1.channels = c1
agent.sinks.k1.channel = c1
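The memory channel above is fast but loses any buffered events if the agent crashes. A durable alternative is the file channel; a minimal sketch (the checkpoint and data paths are placeholders, not from the original setup):

```properties
# File channel: persists events to local disk between source and sink
agent.channels.c1.type = file
agent.channels.c1.checkpointDir = /opt/apps/flume/checkpoint
agent.channels.c1.dataDirs = /opt/apps/flume/data
agent.channels.c1.capacity = 100000
agent.channels.c1.transactionCapacity = 100
```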