I. Introduction to Flume
----------------------------------------------------------
    1. A service for collecting, aggregating, and moving large amounts of log data
    2. Based on a streaming-data architecture; used for online log analysis and processing
    3. Acts as a buffer between producers and consumers
    4. Provides transactional guarantees, ensuring that every message is delivered and processed
    5. Supports a wide variety of input sources and output formats
    6. Multi-hop flows -- the output of one Flume agent can serve as the input of another

II. Source, Channel, Sink
----------------------------------------------------------
    1. Source
        The entry point that receives data; many data types are supported
    2. Channel
        Temporarily stores and buffers the received data until a sink consumes it
    3. Sink
        Pulls data from the channel and writes it to centralized storage (HBase, HDFS)

III. Installing Flume
-----------------------------------------------------------
    1. Download apache-flume-1.7.0-bin.tar.gz
    2. Untar it
    3. Create a symbolic link
    4. Configure environment variables
        FLUME_HOME="/soft/flume"
        PATH=......:/soft/flume/bin
    5. Verify the installation
        $> flume-ng version

IV. Configuring and Using Flume
-----------------------------------------------------------------
    1. Create the configuration file [/soft/flume/conf/hello.conf]
        # declare the three components
        a1.sources = r1
        a1.channels = c1
        a1.sinks = k1

        # define the source
        a1.sources.r1.type=netcat
        a1.sources.r1.bind=localhost
        a1.sources.r1.port=8888

        # define the sink
        a1.sinks.k1.type=logger

        # define the channel
        a1.channels.c1.type=memory

        # bind them together
        a1.sources.r1.channels=c1
        a1.sinks.k1.channel=c1

    2. Run -- using nc as the data source
        a) Start the flume agent
            $flume/bin> ./flume-ng agent -f ../conf/hello.conf -n a1 -Dflume.root.logger=INFO,console
        b) Start an nc client
            $> nc localhost 8888
            $nc> hello world
        c) "hello world" is printed in the flume terminal.
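The Source → Channel → Sink flow described above can be sketched conceptually in Python. This is a toy illustration, not Flume code: the class and function names here (`MemoryChannel`, `source`, `logger_sink`) are hypothetical stand-ins showing how the channel buffers events between a producing source and a consuming sink.

```python
from queue import Queue

class MemoryChannel:
    """Buffers events until a sink consumes them (like a1.channels.c1.type=memory)."""
    def __init__(self, capacity=100):
        self.queue = Queue(maxsize=capacity)

    def put(self, event):
        # A full channel would block the source here -- the buffering role.
        self.queue.put(event)

    def take(self):
        return self.queue.get()

def source(channel, lines):
    # A source pushes incoming events into its channel.
    for line in lines:
        channel.put(line)

def logger_sink(channel, n):
    # A sink pulls events from the channel and "logs" them.
    return [channel.take() for _ in range(n)]

ch = MemoryChannel()
source(ch, ["hello world", "second event"])
print(logger_sink(ch, 2))  # ['hello world', 'second event']
```

The source and sink never talk to each other directly; only the channel connects them, which is why the config binds both `a1.sources.r1.channels` and `a1.sinks.k1.channel` to `c1`.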
V. Flume Sources
----------------------------------------------------------------
    1. netcat

    2. exec
        Real-time log collection: runs a command and picks up lines as they are appended to a log file
        a) Configuration file [/soft/flume/conf/exec.conf]
            a1.sources = r1
            a1.sinks = k1
            a1.channels = c1

            a1.sources.r1.type=exec
            a1.sources.r1.command=tail -F /home/ubuntu/test.log
            a1.sinks.k1.type=logger
            a1.channels.c1.type=memory
            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1
        b) $> bin/flume-ng agent -f ../conf/exec.conf -n a1 -Dflume.root.logger=INFO,console

    3. spooldir (batch collection)
        Watches a directory; the files dropped in must be static (no longer being written to).
        After a file has been collected, it is renamed with the .COMPLETED suffix.
        a) Configuration file [/soft/flume/conf/spooldir.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type=spooldir
            a1.sources.r1.spoolDir=/home/ubuntu/spool
            a1.sources.r1.fileHeader=true

            a1.sinks.k1.type=logger
            a1.channels.c1.type=memory
            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1
        b) Create the directory
            $> mkdir ~/spool
        c) Start flume
            $> bin/flume-ng agent -f ../conf/spooldir.conf -n a1 -Dflume.root.logger=INFO,console

    4. seq source
        a) Create the configuration file [/soft/flume/conf/seq.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            # the source generates a sequence of events, at most 1000 in total
            a1.sources.r1.type=seq
            a1.sources.r1.totalEvents=1000

            a1.sinks.k1.type=logger
            a1.channels.c1.type=memory
            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1
        b) [Run]
            $> bin/flume-ng agent -f ../conf/seq.conf -n a1 -Dflume.root.logger=INFO,console

    5. stress source
        For stress testing: generates a large volume of events in a short burst
        a) Create the configuration file [/soft/flume/conf/stress.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type = org.apache.flume.source.StressSource
            a1.sources.r1.size = 10240
            a1.sources.r1.maxTotalEvents = 1000000
            a1.sources.r1.channels = c1

            a1.channels.c1.type=memory
            a1.sinks.k1.channel=c1
            a1.sinks.k1.type=logger
        b) [Run]
            $> bin/flume-ng agent -f ../conf/stress.conf -n a1 -Dflume.root.logger=INFO,console

VI. Sinks
----------------------------------------------------------------
    1. HDFS -- collect logs (e.g. from Tomcat) and write them to HDFS
        a) Create the configuration file [/soft/flume/conf/hdfs.conf]
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            a1.sources.r1.type = netcat
            a1.sources.r1.bind = localhost
            a1.sources.r1.port = 8888

            a1.sinks.k1.type = hdfs
            a1.sinks.k1.hdfs.path = /user/ubuntu/flume/events/%y-%m-%d/%H/%M/%S
            a1.sinks.k1.hdfs.filePrefix = events-

            # round the timestamp used in the path down; with roundValue=10 and
            # roundUnit=second, a new directory is created every 10 seconds,
            # e.g. /user/ubuntu/flume/events/17-12-12/10/30/20
            a1.sinks.k1.hdfs.round = true
            a1.sinks.k1.hdfs.roundValue = 10
            a1.sinks.k1.hdfs.roundUnit = second
            a1.sinks.k1.hdfs.useLocalTimeStamp=true

            # when to roll over to a new file: after 10 seconds, 10 bytes,
            # or 3 events -- whichever limit is hit first
            a1.sinks.k1.hdfs.rollInterval=10
            a1.sinks.k1.hdfs.rollSize=10
            a1.sinks.k1.hdfs.rollCount=3

            a1.channels.c1.type=memory
            a1.sources.r1.channels = c1
            a1.sinks.k1.channel = c1
        b) [Run]
            $> bin/flume-ng agent -f ../conf/hdfs.conf -n a1

    2. Hive
        (omitted)

    3. HBase
        a) Create the configuration file [/soft/flume/conf/hbase.conf]
            # declare the three components
            a1.sources = r1
            a1.channels = c1
            a1.sinks = k1

            # configure the source
            a1.sources.r1.type = netcat
            a1.sources.r1.bind = localhost
            a1.sources.r1.port = 8888

            # configure the channel
            a1.channels.c1.type=memory

            # configure the sink
            a1.sinks.k1.type = hbase
            a1.sinks.k1.table = ns1:t12
            a1.sinks.k1.columnFamily = f1
            a1.sinks.k1.serializer = org.apache.flume.sink.hbase.RegexHbaseEventSerializer

            # bind the source and sink to the channel
            a1.sources.r1.channels = c1
            a1.sinks.k1.channel = c1
        b) [Run]
            $> bin/flume-ng agent -f ../conf/hbase.conf -n a1

    4. Kafka
        (omitted)

VII. Multi-hop Agents with AvroSource and AvroSink
-------------------------------------------------------------
    1. Create the configuration file [avro_hop.conf]
        # a1: nc input / avro output
        a1.sources = r1
        a1.sinks = k1
        a1.channels = c1

        a1.sources.r1.type=netcat
        a1.sources.r1.bind=localhost
        a1.sources.r1.port=8888

        a1.sinks.k1.type = avro
        a1.sinks.k1.hostname=localhost
        a1.sinks.k1.port=9999

        a1.channels.c1.type=memory
        a1.sources.r1.channels = c1
        a1.sinks.k1.channel = c1

        # a2: avro input / logger output
        a2.sources = r2
        a2.sinks = k2
        a2.channels = c2

        a2.sources.r2.type=avro
        a2.sources.r2.bind=localhost
        a2.sources.r2.port=9999

        a2.sinks.k2.type = logger
        a2.channels.c2.type=memory
        a2.sources.r2.channels = c2
        a2.sinks.k2.channel = c2

    2. Start a2 first (it is the avro server that a1's sink connects to)
        $> flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a2 -Dflume.root.logger=INFO,console
    3. Verify a2
        $> netstat -anop | grep 9999
    4. Start a1
        $> flume-ng agent -f /soft/flume/conf/avro_hop.conf -n a1
    5. Verify a1
        $> netstat -anop | grep 8888

VIII. Channels
----------------------------------------------------------------
    1. MemoryChannel

    2. FileChannel
        a) Create the configuration file [file.conf]
            a1.sources = r1
            a1.sinks = k1
            a1.channels = c1

            a1.sources.r1.type=netcat
            a1.sources.r1.bind=localhost
            a1.sources.r1.port=8888

            a1.sinks.k1.type=logger

            a1.channels.c1.type = file
            a1.channels.c1.checkpointDir = /home/ubuntu/flume/fc_check
            a1.channels.c1.dataDirs = /home/ubuntu/flume/fc_data

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1
        b) [Run]
            $> flume-ng agent -f /soft/flume/conf/file.conf -n a1 -Dflume.root.logger=INFO,console

    3. Spillable memory channel
        a) Create the configuration file [spill.conf]
            a1.sources = r1
            a1.sinks = k1
            a1.channels = c1

            a1.sources.r1.type=netcat
            a1.sources.r1.bind=localhost
            a1.sources.r1.port=8888

            a1.sinks.k1.type=logger

            a1.channels.c1.type = SPILLABLEMEMORY
            # memoryCapacity = 0 disables the in-memory queue, making this
            # equivalent to a pure file channel
            a1.channels.c1.memoryCapacity = 0
            # overflowCapacity = 0 would disable the disk overflow (equivalent
            # to a pure memory channel); here up to 2000 events may spill
            a1.channels.c1.overflowCapacity = 2000
            a1.channels.c1.byteCapacity = 800000
            a1.channels.c1.checkpointDir = /home/ubuntu/flume/fc_check
            a1.channels.c1.dataDirs = /home/ubuntu/flume/fc_data

            a1.sources.r1.channels=c1
            a1.sinks.k1.channel=c1
        b) [Run]
            $> flume-ng agent -f /soft/flume/conf/spill.conf -n a1 -Dflume.root.logger=INFO,console
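The spillable memory channel's capacity logic can be sketched in Python. This is a toy model under stated assumptions, not Flume's implementation: events go to a bounded in-memory queue first and spill to a secondary store (standing in for the on-disk file channel) once memoryCapacity is reached.

```python
from collections import deque

class SpillableChannel:
    """Toy model of a spillable memory channel (illustration only, not Flume code)."""
    def __init__(self, memory_capacity=2, overflow_capacity=2000):
        self.memory = deque()                    # fast in-memory queue
        self.overflow = deque()                  # stand-in for the on-disk overflow
        self.memory_capacity = memory_capacity
        self.overflow_capacity = overflow_capacity

    def put(self, event):
        if len(self.memory) < self.memory_capacity:
            self.memory.append(event)            # fits in memory
        elif len(self.overflow) < self.overflow_capacity:
            self.overflow.append(event)          # spill to "disk"
        else:
            raise RuntimeError("channel full")   # both stores exhausted

    def take(self):
        # Memory filled first, so draining it first preserves FIFO order here.
        if self.memory:
            return self.memory.popleft()
        return self.overflow.popleft()

ch = SpillableChannel(memory_capacity=2)
for e in ["e1", "e2", "e3"]:
    ch.put(e)
print(len(ch.memory), len(ch.overflow))  # 2 1
print([ch.take() for _ in range(3)])     # ['e1', 'e2', 'e3']
```

Setting memory_capacity to 0 makes every put spill, mirroring how memoryCapacity = 0 in the config above degrades the channel to a pure file channel; the real channel additionally tracks ordering across interleaved puts and takes, which this sketch does not.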
Big Data with Flume (Part 1) --- Introduction to Flume; Source, Channel, Sink; Installing Flume; Configuring and Using Flume