Flume.apache.org Official Documentation Study Notes, Part Two

Called_Kingsley

Category: BigData. Tags: flume, hadoop

Article link: https://blog.csdn.net/qq_40309183/article/details/83211536

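Configuring individual components:

After defining the flow, you need to set the properties of each source, sink, and channel. This is done in the same hierarchical namespace in which you set the component type and the other property values specific to each component.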

# properties for sources
<Agent>.sources.<Source>.<someProperty> = <someValue>

# properties for channels
<Agent>.channels.<Channel>.<someProperty> = <someValue>

# properties for sinks
<Agent>.sinks.<Sink>.<someProperty> = <someValue>

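The property "type" must be set for each component so that Flume knows what kind of object it needs to be. Each source, sink, and channel type has its own set of properties that it requires in order to function as intended, and all of those need to be set as needed. In the previous example, we have a flow from avro-AppSrv-source to hdfs-Cluster1-sink through the memory channel mem-channel-1. Here is an example that shows the configuration of each of those components: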

agent_foo.sources = avro-AppSrv-source
agent_foo.sinks = hdfs-Cluster1-sink
agent_foo.channels = mem-channel-1

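# set channel for sources, sinks
agent_foo.sources.avro-AppSrv-source.channels = mem-channel-1
agent_foo.sinks.hdfs-Cluster1-sink.channel = mem-channel-1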

# properties of avro-AppSrv-source
agent_foo.sources.avro-AppSrv-source.type = avro
agent_foo.sources.avro-AppSrv-source.bind = localhost
agent_foo.sources.avro-AppSrv-source.port = 10000

# properties of mem-channel-1
agent_foo.channels.mem-channel-1.type = memory
agent_foo.channels.mem-channel-1.capacity = 1000
agent_foo.channels.mem-channel-1.transactionCapacity = 100

# properties of hdfs-Cluster1-sink
agent_foo.sinks.hdfs-Cluster1-sink.type = hdfs
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.path = hdfs://namenode/flume/webdata
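
Beyond the required `type` and `hdfs.path`, a sink usually carries a few type-specific settings in the same namespace. The fragment below is only a sketch of what that might look like for the HDFS sink. The property names are real HDFS sink properties, but the values are illustrative assumptions and are not part of the original example.

```properties
# optional HDFS sink tuning (values are illustrative assumptions)
# write plain event bodies instead of the default SequenceFile format
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.fileType = DataStream
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.writeFormat = Text
# roll to a new file every 300 seconds; 0 disables size- and count-based rolling
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.rollInterval = 300
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.rollSize = 0
agent_foo.sinks.hdfs-Cluster1-sink.hdfs.rollCount = 0
```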

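Adding multiple flows in an agent:
A single Flume agent can contain several independent flows. You can list multiple sources, sinks, and channels in the config file, and these components can be linked to form multiple flows: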
   # list the sources, sinks and channels for the agent
   <Agent>.sources = <Source1> <Source2>
   <Agent>.sinks = <Sink1> <Sink2>
   <Agent>.channels = <Channel1> <Channel2>

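Then you can link the sources and sinks to their corresponding channels to set up two different flows. For example, if you need to set up two flows in one agent, one going from an external avro client to external HDFS and the other from the output of a tail to an avro sink, here is a config that does so:
   # list the sources, sinks and channels in the agent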
   agent_foo.sources = avro-AppSrv-source1 exec-tail-source2
   agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
   agent_foo.channels = mem-channel-1 file-channel-2

   # flow #1 configuration
   agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1
   agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1

   # flow #2 configuration
   agent_foo.sources.exec-tail-source2.channels = file-channel-2
   agent_foo.sinks.avro-forward-sink2.channel = file-channel-2
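
The snippet above only wires the two flows together; each component still needs its `type` and type-specific properties. As a rough sketch for flow #2, the fragment below fills these in using real exec source, file channel, and avro sink properties. The log path, directories, host, and port are hypothetical placeholders, not values from the original.

```properties
# exec source that tails a log file (the path is a hypothetical placeholder)
agent_foo.sources.exec-tail-source2.type = exec
agent_foo.sources.exec-tail-source2.command = tail -F /var/log/app.log

# file channel backed by local disk (directories are hypothetical placeholders)
agent_foo.channels.file-channel-2.type = file
agent_foo.channels.file-channel-2.checkpointDir = /var/flume/checkpoint
agent_foo.channels.file-channel-2.dataDirs = /var/flume/data

# avro sink forwarding to another agent (host and port are hypothetical)
agent_foo.sinks.avro-forward-sink2.type = avro
agent_foo.sinks.avro-forward-sink2.hostname = 10.1.1.100
agent_foo.sinks.avro-forward-sink2.port = 10000
```

Flow #1 would be configured in the same way as the avro source, memory channel, and HDFS sink in the earlier single-flow example.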

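Configuring a multi-agent flow:
To set up a multi-agent flow, you need an avro/thrift sink in the first hop pointing to the avro/thrift source of the next hop. This causes the first Flume agent to forward events to the next Flume agent. For example, if you periodically send files to a local Flume agent using the avro client, that local agent can forward them to another agent that has the storage mounted.

Weblog agent config: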
       # list sources, sinks and channels in the agent
       agent_foo.sources = avro-AppSrv-source
       agent_foo.sinks = avro-forward-sink
       agent_foo.channels = file-channel

       # define the flow
       agent_foo.sources.avro-AppSrv-source.channels = file-channel
       agent_foo.sinks.avro-forward-sink.channel = file-channel

       # avro sink properties
       agent_foo.sinks.avro-forward-sink.type = avro
       agent_foo.sinks.avro-forward-sink.hostname = 10.1.1.100
       agent_foo.sinks.avro-forward-sink.port = 10000

       # configure other pieces
       #...

HDFS agent config:
       # list sources, sinks and channels in the agent
       agent_foo.sources = avro-collection-source
       agent_foo.sinks = hdfs-sink
       agent_foo.channels = mem-channel

       # define the flow
       agent_foo.sources.avro-collection-source.channels = mem-channel
       agent_foo.sinks.hdfs-sink.channel = mem-channel

       # avro source properties
       agent_foo.sources.avro-collection-source.type = avro
       agent_foo.sources.avro-collection-source.bind = 10.1.1.100
       agent_foo.sources.avro-collection-source.port = 10000

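       # configure other pieces
       #...

Here we link the avro-forward-sink of the weblog agent to the avro-collection-source of the hdfs agent. As a result, the events coming from the external appserver source are eventually stored in HDFS.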
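
The text mentions that the hop can use thrift instead of avro. A minimal sketch of that variant is shown below, assuming both ends are switched together, since a thrift sink needs a thrift source on the receiving side. The component names `thrift-forward-sink` and `thrift-collection-source` are hypothetical; they would also have to appear in each agent's `sinks` or `sources` list and be bound to a channel as in the configs above.

```properties
# weblog agent: thrift sink pointing at the next hop (component name is hypothetical)
agent_foo.sinks.thrift-forward-sink.type = thrift
agent_foo.sinks.thrift-forward-sink.hostname = 10.1.1.100
agent_foo.sinks.thrift-forward-sink.port = 10000

# hdfs agent: thrift source listening on the same address and port
agent_foo.sources.thrift-collection-source.type = thrift
agent_foo.sources.thrift-collection-source.bind = 10.1.1.100
agent_foo.sources.thrift-collection-source.port = 10000
```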

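Fan out flow:
As discussed in the previous section, Flume supports fanning out the flow from one source to multiple channels. There are two modes of fan out: replicating and multiplexing. In a replicating flow, the event is sent to all of the configured channels. With multiplexing, the event is sent only to a qualifying subset of the channels. To fan out the flow, you need to specify a list of channels for the source and the policy for fanning it out. This is done by adding a channel "selector" that can be replicating or multiplexing, and, if it is a multiplexing selector, by further specifying the selection rules. If you do not specify a selector, it defaults to replicating: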

       # List the sources, sinks and channels for the agent
       <Agent>.sources = <Source1>
       <Agent>.sinks = <Sink1> <Sink2>
       <Agent>.channels = <Channel1> <Channel2>

       # set list of channels for source (separated by space)
       <Agent>.sources.<Source1>.channels = <Channel1> <Channel2>

       # set channel for sinks
       <Agent>.sinks.<Sink1>.channel = <Channel1>
       <Agent>.sinks.<Sink2>.channel = <Channel2>

       <Agent>.sources.<Source1>.selector.type = replicating
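
To make the template concrete, here is a sketch of a replicating fan out that reuses the agent_foo component names from the earlier examples. Reusing those names here is an assumption made for illustration; the original only gives the generic template. Every event received by the source is copied to both channels.

```properties
# one source, two channels, two sinks
agent_foo.sources = avro-AppSrv-source1
agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
agent_foo.channels = mem-channel-1 file-channel-2

# the source writes to both channels
agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

# each sink drains exactly one channel
agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

# replicating is the default, so this line could be omitted
agent_foo.sources.avro-AppSrv-source1.selector.type = replicating
```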

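The multiplexing selector has a further set of properties to bifurcate the flow. It requires a mapping from an event attribute to a set of channels. The selector checks each configured attribute in the event header. If the attribute matches a specified value, the event is sent to all of the channels mapped to that value. If there is no match, the event is sent to the set of channels configured as default: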
       # Mapping for multiplexing selector
       <Agent>.sources.<Source1>.selector.type = multiplexing
       <Agent>.sources.<Source1>.selector.header = <someHeader>
       <Agent>.sources.<Source1>.selector.mapping.<Value1> = <Channel1>
       <Agent>.sources.<Source1>.selector.mapping.<Value2> = <Channel1> <Channel2>
       <Agent>.sources.<Source1>.selector.mapping.<Value3> = <Channel2>
       #...

       <Agent>.sources.<Source1>.selector.default = <Channel2>

The mapping allows the channels for different values to overlap.

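The following example has a single flow that is multiplexed onto two paths. The agent named agent_foo has a single avro source and two channels linked to two sinks: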

       # list the sources, sinks and channels in the agent
       agent_foo.sources = avro-AppSrv-source1
       agent_foo.sinks = hdfs-Cluster1-sink1 avro-forward-sink2
       agent_foo.channels = mem-channel-1 file-channel-2

       # set channels for source
       agent_foo.sources.avro-AppSrv-source1.channels = mem-channel-1 file-channel-2

       # set channel for sinks
       agent_foo.sinks.hdfs-Cluster1-sink1.channel = mem-channel-1
       agent_foo.sinks.avro-forward-sink2.channel = file-channel-2

       # channel selector configuration
       agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
       agent_foo.sources.avro-AppSrv-source1.selector.header = State
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
       agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

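The selector checks for a header called "State". If the value is "CA", the event is sent to mem-channel-1; if it is "AZ", the event goes to file-channel-2; and if it is "NY", the event is sent to both. If the "State" header is not set or does not match any of these three values, the event goes to mem-channel-1, which is designated as the default.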
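
The selector can only route on `State` if something has put that header on the event. One way this might be done, sketched here under the assumption that every event arriving at this source belongs to a single state, is Flume's static interceptor, which stamps a fixed key/value header on each event. The interceptor name `state-tagger` is hypothetical, and the original notes do not say how the header is populated.

```properties
# attach a fixed State header to every event from this source
# (interceptor name is hypothetical; key and value are illustrative)
agent_foo.sources.avro-AppSrv-source1.interceptors = state-tagger
agent_foo.sources.avro-AppSrv-source1.interceptors.state-tagger.type = static
agent_foo.sources.avro-AppSrv-source1.interceptors.state-tagger.key = State
agent_foo.sources.avro-AppSrv-source1.interceptors.state-tagger.value = CA
```

With this in place, events from the source would carry `State = CA` (unless they already had a `State` header, which the static interceptor preserves by default) and would be routed to mem-channel-1 by the mapping above.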

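The selector also supports optional channels. To specify optional channels for a header value, the config parameter 'optional' is used in the following way: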

       # channel selector configuration
       agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
       agent_foo.sources.avro-AppSrv-source1.selector.header = State
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.AZ = file-channel-2
       agent_foo.sources.avro-AppSrv-source1.selector.mapping.NY = mem-channel-1 file-channel-2
       agent_foo.sources.avro-AppSrv-source1.selector.optional.CA = mem-channel-1 file-channel-2
       agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1

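The selector first attempts to write to the required channels, and it fails the transaction if even one of these channels fails to consume the events. The transaction is then reattempted on all of the channels. Once all required channels have consumed the events, the selector attempts to write to the optional channels. A failure by any optional channel to consume the event is simply ignored and not retried.
If the optional channels and the required channels for a specific header value overlap, the channel is considered required, and a failure in that channel causes the entire set of required channels to be retried. For instance, in the example above, mem-channel-1 is considered a required channel for the header value "CA" even though it is marked as both required and optional, and a failure to write to this channel causes the event to be retried on all channels configured for the selector.

Note that if a header value does not have any required channels, the event is written to the default channels and is also attempted on the optional channels for that header value. In other words, when no required channels are specified, specifying optional channels still causes the event to be written to the default channels. If no channels are designated as default and there are no required channels, the selector attempts to write the event to the optional channels only, and any failures are simply ignored in that case.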
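
The last case can be sketched as a config in which one header value has only an optional channel. The value `TX` is a hypothetical addition used purely for illustration. Following the rules described above, an event with `State = TX` has no required mapping, so it would be written to the default mem-channel-1 and additionally attempted on file-channel-2, with any failure on file-channel-2 ignored.

```properties
# channel selector configuration
agent_foo.sources.avro-AppSrv-source1.selector.type = multiplexing
agent_foo.sources.avro-AppSrv-source1.selector.header = State
agent_foo.sources.avro-AppSrv-source1.selector.mapping.CA = mem-channel-1

# TX (hypothetical value) has no required mapping, only an optional channel
agent_foo.sources.avro-AppSrv-source1.selector.optional.TX = file-channel-2

# events without a required mapping fall back to this channel
agent_foo.sources.avro-AppSrv-source1.selector.default = mem-channel-1
```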