Flow configuration
Single-agent flow configuration
Case 1: use Flume to monitor a directory; whenever a new file appears in that directory, print the file's contents to the console.
#File name: sample1.properties
#Configuration:
First create two directories on the Linux system: one to hold configuration files (flumetest) and one to hold the files to be read (flume).
#Monitor the specified directory; when a new file appears, print its contents to the console
#Configure an agent; the agent name can be anything you choose
#Declare the agent's sources, sinks, and channels
#Name the agent's sources, sinks, and channels; the names can be anything you choose
a1.sources=s1
a1.channels=c1
a1.sinks=k1
#Configure the source, referenced by the name declared under the agent's sources
#Source parameters differ by data source type --- look them up in the Flume documentation
#Configure a spooling-directory source; the flume directory holds the files to be read
a1.sources.s1.type=spooldir
a1.sources.s1.spoolDir=/home/hadoop/apps/apache-flume-1.8.0-bin/flume
#Configure the channel, referenced by the name declared under the agent's channels
#Configure a memory channel
a1.channels.c1.type=memory
#Configure the sink, referenced by the name declared under the agent's sinks
#Configure a logger sink
a1.sinks.k1.type=logger
#Bind the components. Note: the source property is "channels" (plural); the sink property is "channel" (singular)
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
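The memory channel above runs with Flume's default sizing. If the defaults prove too small, it can be tuned with the capacity properties documented for the memory channel; the values below are illustrative, not part of the original setup:

```properties
#Maximum number of events the channel can hold (illustrative value)
a1.channels.c1.capacity=1000
#Maximum number of events per transaction between a source/sink and the channel (illustrative value)
a1.channels.c1.transactionCapacity=100
```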
Upload the sample1.properties configuration file to the flumetest directory on the Linux system:
Start Flume with this command:
bin/flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/flumetest/sample1.properties --name a1 -Dflume.root.logger=INFO,console
--conf specifies the location of Flume's configuration directory
--conf-file specifies the collection (agent) configuration file
--name specifies the agent name
-Dflume.root.logger=INFO,console prints the collected events to the console
Partial startup output:
18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Creating channels
18/05/05 20:28:16 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Created channel c1
18/05/05 20:28:16 INFO source.DefaultSourceFactory: Creating instance of source s1, type spooldir
18/05/05 20:28:16 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
18/05/05 20:28:16 INFO node.AbstractConfigurationProvider: Channel c1 connected to [s1, k1]
18/05/05 20:28:16 INFO node.Application: Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:Spool Directory source s1: { spoolDir: /home/hadoop/apps/apache-flume-1.8.0-bin/flume } }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@101f0f3a counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
18/05/05 20:28:16 INFO node.Application: Starting Channel c1
18/05/05 20:28:16 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
18/05/05 20:28:16 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
18/05/05 20:28:16 INFO node.Application: Starting Sink k1
18/05/05 20:28:16 INFO node.Application: Starting Source s1
18/05/05 20:28:16 INFO source.SpoolDirectorySource: SpoolDirectorySource source starting with directory: /home/hadoop/apps/apache-flume-1.8.0-bin/flume
18/05/05 20:28:17 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: SOURCE, name: s1: Successfully registered new MBean.
18/05/05 20:28:17 INFO instrumentation.MonitoredCounterGroup: Component type: SOURCE, name: s1 started
Create a new file hello.txt on the Linux system:
[hadoop@hadoop02 ~]$ vi hello.txt
hello
world
Copy the file into the directory that holds the files to be read (the directory set in the configuration file):
a1.sources.s1.spoolDir=/home/hadoop/apps/apache-flume-1.8.0-bin/flume
Using this command:
[hadoop@hadoop02 ~]$ cp hello.txt ~/apps/apache-flume-1.8.0-bin/flume
Output:
18/05/05 20:30:10 INFO avro.ReliableSpoolingFileEventReader: Last read took us just up to a file boundary. Rolling to the next file, if there is one.
18/05/05 20:30:10 INFO avro.ReliableSpoolingFileEventReader: Preparing to move file /home/hadoop/apps/apache-flume-1.8.0-bin/flume/hello.txt to /home/hadoop/apps/apache-flume-1.8.0-bin/flume/hello.txt.COMPLETED
18/05/05 20:30:14 INFO sink.LoggerSink: Event: { headers:{} body: 68 65 6C 6C 6F hello }
18/05/05 20:30:14 INFO sink.LoggerSink: Event: { headers:{} body: 77 6F 72 6C 64 world }
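Two things are visible in the output above: the logger sink prints each event body as hex bytes plus a text preview, and the spooling-directory source marks a fully ingested file by renaming it with a .COMPLETED suffix. A minimal Python sketch of both behaviors (this is an illustration, not Flume's actual code):

```python
import os
import tempfile

def ingest_spool_dir(spool_dir):
    """Read each unprocessed file, emit its lines as (hex, text)
    events, then rename the file with a .COMPLETED suffix, the way
    the spooling-directory source does."""
    events = []
    for name in sorted(os.listdir(spool_dir)):
        if name.endswith(".COMPLETED"):
            continue  # already ingested
        path = os.path.join(spool_dir, name)
        with open(path, "rb") as f:
            for line in f.read().splitlines():
                # Mimic the logger sink: hex body plus a text preview
                hex_body = " ".join(f"{b:02X}" for b in line)
                events.append((hex_body, line.decode("utf-8", "replace")))
        os.rename(path, path + ".COMPLETED")
    return events

spool = tempfile.mkdtemp()
with open(os.path.join(spool, "hello.txt"), "w") as f:
    f.write("hello\nworld\n")
for hex_body, preview in ingest_spool_dir(spool):
    print(hex_body, preview)
# 68 65 6C 6C 6F hello
# 77 6F 72 6C 64 world
```

The printed hex matches the event bodies in the log above, and hello.txt is left renamed to hello.txt.COMPLETED.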
Case 2: TCP
#File name: case_tcp.properties
#Configuration: (all operations are performed on the same node)
Upload to the hadoop02 node; as in case 1, the configuration file is stored in the flumetest directory.
#Read data from the specified port through a netcat source and display it on the console.
a1.sources=s1
a1.channels=c1
a1.sinks=k1
a1.sources.s1.type=netcat
a1.sources.s1.bind=192.168.123.102
a1.sources.s1.port=55555
a1.channels.c1.type=memory
a1.sinks.k1.type=logger
a1.sources.s1.channels=c1
a1.sinks.k1.channel=c1
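The netcat source listens on the configured bind address and port and turns each newline-terminated line it receives into an event, replying "OK" to the client by default. The interaction can be sketched in Python with a stand-in listener in place of Flume (addresses here are local and the port is ephemeral, not the 192.168.123.102:55555 pair from the config):

```python
import socket
import threading

def netcat_like_server(sock, received):
    """Stand-in for the netcat source: accept one client, record one
    newline-terminated line, and reply "OK" as the source does by default."""
    conn, _ = sock.accept()
    with conn:
        buf = b""
        while not buf.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            buf += chunk
        received.append(buf.rstrip(b"\n").decode())
        conn.sendall(b"OK\n")

server = socket.socket()
server.bind(("127.0.0.1", 0))   # ephemeral port instead of 55555
server.listen(1)
received = []
t = threading.Thread(target=netcat_like_server, args=(server, received))
t.start()

# Client side: the equivalent of typing "hello" into a telnet session
client = socket.create_connection(server.getsockname())
client.sendall(b"hello\n")
reply = client.recv(1024)
client.close()
t.join()
server.close()
print(received, reply)  # ['hello'] b'OK\n'
```

This mirrors what the telnet session further below does against the real agent: one line in, one "OK" back, one event delivered to the channel.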
Upload the case_tcp.properties configuration file to the flumetest directory on the Linux system:
Start Flume with this command:
bin/flume-ng agent --conf conf --conf-file /home/hadoop/apps/apache-flume-1.8.0-bin/flumetest/case_tcp.properties --name a1 -Dflume.root.logger=INFO,console
--conf specifies the location of Flume's configuration directory
--conf-file specifies the collection (agent) configuration file
--name specifies the agent name
-Dflume.root.logger=INFO,console prints the collected events to the console
Partial output after startup:
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Added sinks: k1 Agent: a1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Processing:k1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Processing:k1
18/05/06 10:41:34 INFO conf.FlumeConfiguration: Post-validation flume configuration contains configuration for agents: [a1]
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Creating channels
18/05/06 10:41:34 INFO channel.DefaultChannelFactory: Creating instance of channel c1 type memory
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Created channel c1
18/05/06 10:41:34 INFO source.DefaultSourceFactory: Creating instance of source s1, type netcat
18/05/06 10:41:34 INFO sink.DefaultSinkFactory: Creating instance of sink: k1, type: logger
18/05/06 10:41:34 INFO node.AbstractConfigurationProvider: Channel c1 connected to [s1, k1]
18/05/06 10:41:34 INFO node.Application: Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:s1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@738ed94d counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
18/05/06 10:41:34 INFO node.Application: Starting Channel c1
18/05/06 10:41:34 INFO instrumentation.MonitoredCounterGroup: Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
18/05/06 10:41:34 INFO instrumentation.MonitoredCounterGroup: Component type: CHANNEL, name: c1 started
18/05/06 10:41:34 INFO node.Application: Starting Sink k1
18/05/06 10:41:34 INFO node.Application: Starting Source s1
18/05/06 10:41:34 INFO source.NetcatSource: Source starting
18/05/06 10:41:34 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/192.168.123.102:55555]
Open another terminal window on the same node:
[hadoop@hadoop02 apache-flume-1.8.0-bin]$ telnet 192.168.123.102 55555
-bash: telnet: command not found
The telnet command is not found, so install it from yum (switch to the root user first):
[hadoop@hadoop02 apache-flume-1.8.0-bin]$ su
Password:
[root@hadoop02 apache-flume-1.8.0-bin]# yum install telnet
Loaded plugins: fastestmirror, refresh-packagekit, security
Setting up Install Process
Determining fastest mirrors
epel/metalink | 6.2 kB 00:00
* base: mirrors.sohu.com
* epel: mirrors.tongji.edu.cn
* extras: mirror.bit.edu.cn
* updates: mirror.bit.edu.cn
base | 3.7 kB 00:00
epel | 4.7 kB 00:00
epel/primary_db | 6.0 MB 00:12
extras