Flume Learning (Part 1)

This post walks through the basics of using Flume, starting with single-agent flow configurations: monitoring a directory and printing new file contents to the console, receiving data over TCP, and handling Avro data. It goes on to cover streaming data from a web server into HDFS in real time and multi-agent flows, and closes with Flume's fan-out flows (replicating and multiplexing) and how to configure them.

Flow Configuration
Single-Agent Flow Configuration

Case 1: Use Flume to monitor a directory; whenever a new file appears in it, print the file's contents to the console.

#File name: sample1.properties

#Configuration:

Create two directories on the Linux system: one to hold configuration files (flumetest) and one to hold the files to be read (flume), as sketched below.
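A minimal sketch, run from the Flume install root /home/hadoop/apps/apache-flume-1.8.0-bin used throughout this post:

[hadoop@hadoop02 apache-flume-1.8.0-bin]$ mkdir flumetest flume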

#Monitor the specified directory; when a new file appears there, print its contents to the console
#Configure an agent; the agent name can be anything you like
#Declare the agent's sources, sinks, and channels
#Give each of the agent's sources, sinks, and channels a name; the names can be anything you like
a1.sources=s1
a1.channels=c1
a1.sinks=k1

#Configure the source, keyed by the source name declared for the agent above
#Source parameters differ by source type---look them up in the Flume user guide
#Configure a spooling directory source; the flume directory holds the files to be read
a1.sources.s1.type=spooldir
a1.sources.s1.spoolDir=/home/hadoop/apps/apache-flume-1.8.0-bin/flume

#Configure the channel, keyed by the channel name declared for the agent above
#Configure a memory channel
a1.channels.c1.type=memory
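#Optional tuning (example values, not from the original post): a memory channel
#also accepts capacity (max events it can hold) and transactionCapacity
#(max events per transaction), e.g.:
#a1.channels.c1.capacity=1000
#a1.channels.c1.transactionCapacity=100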

#Configure the sink, keyed by the sink name declared for the agent above
#Configure a logger sink
a1.sinks.k1.type=logger

#Bind the components. Note: a source's channels property is plural (a source can write to several channels), while a sink's channel property is singular (a sink drains exactly one channel)
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1

Upload the sample1.properties configuration file to the flumetest directory on the Linux system.

Start Flume with this command:

bin/flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/flumetest/sample1.properties --name a1 -Dflume.root.logger=INFO,console
--conf specifies Flume's own configuration directory
--conf-file specifies the log-collection configuration file
--name specifies the agent name
-Dflume.root.logger=INFO,console prints the collected events to the console
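Note that bin/flume-ng and --conf conf are relative paths, so the command assumes your current directory is the Flume install root:

cd /home/hadoop/apps/apache-flume-1.8.0-bin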

Partial startup output:

18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Creating channels
18/05/05 20:28:16 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Created channel c1
18/05/05 20:28:16 INFO source.DefaultSourceFactory: Creating instance of source s1, type spooldir
18/05/05 20:28:16 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Channel c1 connected to [s1, k1]
18/05/05 20:28:16 INFO node.Application: Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:Spool Directory source s1: { spoolDir: /home/hadoop/apps/apache-flume-1.8.0-bin/flume } }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@101f0f3a counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
18/05/05 20:28:16 INFO node.Application: Starting Channel c1
18/05/05 20:28:16 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
18/05/05 20:28:16 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
18/05/05 20:28:16 INFO node.Application: Starting Sink k1
18/05/05 20:28:16 INFO node.Application: Starting Source s1
18/05/05 20:28:16 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /home/hadoop/apps/apache-flume-1.8.0-bin/flume
18/05/05 20:28:17 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: s1: Successfully registered new MBean.
18/05/05 20:28:17 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: s1 started

Create a new file hello.txt on the Linux system:

[hadoop@hadoop02 ~]$ vi hello.txt 
hello
world

Copy this file into the spooling directory, i.e. the directory set in the configuration file:

a1.sources.s1.spoolDir=/home/hadoop/apps/apache-flume-1.8.0-bin/flume

Copy it with:

[hadoop@hadoop02 ~]$ cp hello.txt ~/apps/apache-flume-1.8.0-bin/flume

Output after the file is read:

18/05/05 20:30:10 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
18/05/05 20:30:10 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/apps/apache-flume-1.8.0-bin/flume/hello.txt to /home/hadoop/apps/apache-flume-1.8.0-bin/flume/hello.txt.COMPLETED
18/05/05 20:30:14 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F                                  hello }
18/05/05 20:30:14 INFO sink.LoggerSink: Event: { headers:{} body: 77 6F 72 6C 64                                  world }
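Two things are visible in this output: the spooling directory source renames a fully ingested file by appending a .COMPLETED suffix so it will not be read again, and the logger sink prints each event body both as raw hex bytes and as decoded text.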


Case 2: TCP

#File name: case_tcp.properties

#Configuration (all steps in this case run on the same node):

Upload the file to the hadoop02 node, reusing the same two directories as in Case 1: flumetest for configuration files and flume for data files.

#Read input data from the specified port with a netcat source and print it to the console.
a1.sources=s1
a1.channels=c1
a1.sinks=k1

a1.sources.s1.type=netcat
a1.sources.s1.bind=192.168.123.102
a1.sources.s1.port=55555

a1.channels.c1.type=memory
a1.sinks.k1.type=logger


a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
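For comparison, the Avro case mentioned in the introduction differs only in the source type; a minimal sketch, assuming the same host and port (not from the original post):

#hypothetical variant: receive Avro-serialized events instead of raw text lines
a1.sources.s1.type=avro
a1.sources.s1.bind=192.168.123.102
a1.sources.s1.port=55555

Events could then be sent with the bundled Avro client, e.g. bin/flume-ng avro-client --host 192.168.123.102 --port 55555 --filename hello.txt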

Upload the case_tcp.properties configuration file to the flumetest directory on the Linux system.

Start Flume with this command:

bin/flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/flumetest/case_tcp.properties --name a1 -Dflume.root.logger=INFO,console
(the flags have the same meaning as in Case 1)

Partial output after startup:

op/apps/apache-flume-1.8.0-bin/flumetest/case_tcp.properties
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Processing:k1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Processing:k1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Creating channels
18/05/06 10:41:34 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Created channel c1
18/05/06 10:41:34 INFO source.DefaultSourceFactory: Creating instance of source s1, type netcat
18/05/06 10:41:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Channel c1 connected to [s1, k1]
18/05/06 10:41:34 INFO node.Application: Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:s1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@738ed94d counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
18/05/06 10:41:34 INFO node.Application: Starting Channel c1
18/05/06 10:41:34 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
18/05/06 10:41:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
18/05/06 10:41:34 INFO node.Application: Starting Sink k1
18/05/06 10:41:34 INFO node.Application: Starting Source s1
18/05/06 10:41:34 INFO source.NetcatSource: Source starting
18/05/06 10:41:34 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.123.102:55555]

Open another terminal window on the same node:

[hadoop@hadoop02 apache-flume-1.8.0-bin]$ telnet 192.168.123.102 55555     
-bash: telnet: command not found

The telnet command is not found, so install it with yum (switch to the root user first):

[hadoop@hadoop02 apache-flume-1.8.0-bin]$ su
Password: 
[root@hadoop02 apache-flume-1.8.0-bin]# yum install telnet
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
Determining fastest mirrors
epel/metalink                                                                           | 6.2 kB     00:00     
 * base: mirrors.sohu.com
 * epel: mirrors.tongji.edu.cn
 * extras: mirror.bit.edu.cn
 * updates: mirror.bit.edu.cn
base                                                                                    | 3.7 kB     00:00     
epel                                                                                    | 4.7 kB     00:00     
epel/primary_db                                                                         | 6.0 MB     00:12     
extras                                                                            
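Once the install finishes, connect to the netcat source again and type a few lines; each line becomes one Flume event. A sketch of the expected session (the OK replies are the netcat source's default per-line acknowledgement):

[hadoop@hadoop02 ~]$ telnet 192.168.123.102 55555
Trying 192.168.123.102...
Connected to 192.168.123.102.
Escape character is '^]'.
hello
OK
flume
OK

The agent window should then log the two events through the logger sink, just as in Case 1.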