【Flume】Flume, a Data Collection Engine

超新星X

Article link: https://blog.csdn.net/xin93/article/details/80671642
1. Overview

 
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

 
2. Flume Architecture

Inside an agent, data flows along one path: source → channel → sink.

 
3. Flume Components and Their Roles

 
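client: the producer of events. It sends data to an agent (for example, the flume-ng avro-client command).

source: the data source. It receives data from clients or external systems and writes that data as events into one or more channels.

channel: a buffer between source and sink. It holds the events that the source writes until a sink takes them.

sink: takes (pulls) events from a channel and either persists them to a storage system or forwards them to the next agent.

interceptor: modifies or drops events in flight. Interceptors are attached to a source. Several can be chained, and they run in the configured order.

selector: the channel selector, attached to a source. It decides which channel(s) each event is sent to. The replicating selector sends every event to all channels. The multiplexing selector picks the channel by a header value.

event: a Flume event, the unit of data (one record). It consists of a set of headers plus a byte-array body.

agent: a Flume process. Each agent runs in its own JVM and is the smallest independently running unit of Flume.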
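The following minimal configuration fragment shows how interceptors and a selector are wired. It assumes an agent named a1 whose source r1 writes to two channels, c1 and c2. All of these names are illustrative, and the source type and the channel and sink definitions are omitted. The fragment chains the built-in timestamp and host interceptors, then routes events with a multiplexing selector on a header named type. The header values order and click are hypothetical; the client or an upstream component would have to set them.

```properties
# Fragment for a hypothetical agent a1 (source type, channels and sinks defined elsewhere)
a1.sources.r1.channels = c1 c2

# Interceptor chain on the source: i1 runs first, then i2
a1.sources.r1.interceptors = i1 i2
# Adds a "timestamp" header (epoch milliseconds)
a1.sources.r1.interceptors.i1.type = timestamp
# Adds a "host" header carrying the agent's host address
a1.sources.r1.interceptors.i2.type = host

# Multiplexing selector: choose the channel from the "type" header (hypothetical header name)
a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = type
a1.sources.r1.selector.mapping.order = c1
a1.sources.r1.selector.mapping.click = c2
# Events whose "type" header is missing or unmapped
a1.sources.r1.selector.default = c2
```

If selector.type is omitted, Flume falls back to the replicating selector. That selector copies every event to all channels listed for the source.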

 
Source types:

 
avro, exec, spooling directory (spooldir), syslogtcp, http, kafka
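Here is a sketch of the two file-oriented source types. It assumes an agent a1 with a channel c1 defined elsewhere. The log path and spool directory are placeholders, not paths from the original setup.

```properties
# Hypothetical agent a1 with two sources feeding channel c1 (c1 defined elsewhere)
a1.sources = r1 r2

# exec: runs a command and turns each output line into an event
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# spooldir: ingests files placed into a directory; fully read files are renamed with a .COMPLETED suffix
a1.sources.r2.type = spooldir
a1.sources.r2.spoolDir = /root/flumedata/spool
a1.sources.r2.channels = c1
```

The exec source offers no delivery guarantee if the agent stops or the channel fills up. The spooldir source tracks which files it has fully ingested, so it is the safer choice for files that are complete when they land in the directory.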

 
Channel types:

 
file, memory, jdbc, kafka
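The sketch below contrasts the durable file channel with the memory channel used in this post's example. It assumes an agent a1. The directories and capacity numbers are illustrative, not tuned values.

```properties
# Hypothetical agent a1 declaring one durable and one in-memory channel
a1.channels = c1 c2

# file channel: events are written to disk and survive an agent restart
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flumedata/checkpoint
a1.channels.c1.dataDirs = /root/flumedata/data

# memory channel: faster, but buffered events are lost if the JVM dies
a1.channels.c2.type = memory
# maximum number of events held in the channel
a1.channels.c2.capacity = 10000
# maximum number of events per put/take transaction
a1.channels.c2.transactionCapacity = 1000
```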

 
Sink types:

 
avro, hdfs, kafka, hbase, logger
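Here is a sketch of an HDFS sink. It assumes an agent a1 with a channel c1 and a NameNode reachable at the placeholder address namenode:8020. Note the singular channel key: a sink reads from exactly one channel.

```properties
# Hypothetical agent a1 writing events from channel c1 into HDFS
a1.sinks = s1
a1.sinks.s1.type = hdfs
a1.sinks.s1.channel = c1

# Time escapes such as %Y-%m-%d need a timestamp header, or local time via useLocalTimeStamp
a1.sinks.s1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.s1.hdfs.useLocalTimeStamp = true

# Write plain text instead of the default SequenceFile format
a1.sinks.s1.hdfs.fileType = DataStream

# Roll to a new file every 300 seconds; 0 disables size- and count-based rolling
a1.sinks.s1.hdfs.rollInterval = 300
a1.sinks.s1.hdfs.rollSize = 0
a1.sinks.s1.hdfs.rollCount = 0
```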

 
Data flow models:

 
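Single data flow model: one agent moves events along one path, source → channel → sink. The avro + memory + logger example in this post is an instance.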

 
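Multiple data flow model: several agents are chained, with the avro sink of one agent sending to the avro source of the next. Alternatively (or additionally), one source fans out to several channels and sinks through a channel selector.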
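The sketch below shows the simplest multi-agent flow. It assumes two hosts. Agent a1 tails a log file (placeholder path) and forwards events through an avro sink to agent a2. Agent a2's avro source listens on 192.168.243.11:44444, the address reused from the example in this post, and prints events with a logger sink.

```properties
# Agent a1 on host A: collect locally, forward over Avro RPC
a1.sources = r1
a1.channels = c1
a1.sinks = s1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1
a1.channels.c1.type = memory
a1.sinks.s1.type = avro
a1.sinks.s1.hostname = 192.168.243.11
a1.sinks.s1.port = 44444
a1.sinks.s1.channel = c1

# Agent a2 on host B (192.168.243.11): receive Avro RPC, log events
a2.sources = r1
a2.channels = c1
a2.sinks = s1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 44444
a2.sources.r1.channels = c1
a2.channels.c1.type = memory
a2.sinks.s1.type = logger
a2.sinks.s1.channel = c1
```

Both agents can share one properties file, because every property is prefixed with its agent name. Start each agent with its own -n a1 or -n a2. Launch a2 first so that a1's avro sink has a listener to connect to.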

 
4. Flume Installation

 
Flume 0.9 vs. 1.x

 
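1. Versions 0.9.x and earlier are called Flume OG (original generation). The 1.x line is called Flume NG (next generation).

2. 0.9 distinguishes between logical and physical nodes. 1.x makes no such distinction: every agent is simply a standalone service.

3. 0.9 requires a master node and ZooKeeper. 1.x needs neither.

4. 0.9 is not very flexible to develop against. 1.x is much more flexible: agents are defined in a plain properties file, and sources, channels and sinks are pluggable.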

 
5. Flume Example

 
avro source + memory channel + logger sink

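  vi agentconf/avro-logger.conf

  # Name the components on agent a1
  a1.sources = r1
  a1.channels = c1
  a1.sinks = s1

  # Describe the source: an Avro RPC server listening on 192.168.243.11:44444
  a1.sources.r1.type = avro
  a1.sources.r1.bind = 192.168.243.11
  a1.sources.r1.port = 44444

  # Describe the channel: buffer events in memory
  a1.channels.c1.type = memory

  # Describe the sink: print events to the log at INFO level
  a1.sinks.s1.type = logger

  # Bind the source and sink to the channel
  # (a source may feed several channels: "channels"; a sink reads exactly one: "channel")
  a1.sources.r1.channels = c1
  a1.sinks.s1.channel = c1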

 
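Start the agent. The flags are -c for the configuration directory, -f for the agent config file, and -n for the agent name, which must match the a1 prefix used in the file: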

  bin/flume-ng agent -c ./conf -f ./agentconf/avro-logger.conf -n a1 -Dflume.root.logger=INFO,console 

 
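Test by sending a local file to the avro source with Flume's built-in avro client. Each line of the file becomes one event, and the events should appear in the agent's console output: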

  bin/flume-ng avro-client -c ./conf -H 192.168.243.11 -p 44444 -F /root/flumedata/test.dat 
