Flume Sources: Introduction and Official Examples

独孤雨鸿

Last modified 2024-05-19 19:11:55

Category: flume. Tags: flume, big data

First published 2024-05-19 18:05:06

Article link: https://blog.csdn.net/v15220/article/details/139046072

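1. NetCat TCP Source

A netcat-like source that listens on a given port and turns each line of text into an event. It behaves like nc -k -l [host] [port]. In other words, it opens the specified port and listens for data. The supplied data is expected to be newline-separated text. Each line of text becomes one Flume event and is sent through the connected channel.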

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be netcat
bind	–	Host name or IP address to bind to
port	–	Port # to bind to

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
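To exercise a configuration like the one above, a client only has to open a TCP connection and write newline-terminated text. The following is a minimal Python sketch, not taken from the Flume documentation: `frame_lines` and `send_lines` are hypothetical helper names, and the default host and port assume the example agent is running locally on port 6666.

```python
import socket


def frame_lines(lines):
    """Encode each string as one newline-terminated UTF-8 line.

    The NetCat source turns every newline-separated line into one event,
    so embedded newlines are rejected instead of silently splitting an event.
    """
    for line in lines:
        if "\n" in line:
            raise ValueError("a line must not contain a newline character")
    return "".join(line + "\n" for line in lines).encode("utf-8")


def send_lines(lines, host="127.0.0.1", port=6666):
    """Send the lines to a listening NetCat source (assumed host and port)."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(frame_lines(lines))
```

With the agent running, `send_lines(["hello flume", "second event"])` should produce two events, one per line.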

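2. HTTP Source

A source that accepts Flume events via HTTP POST and GET. GET should be used for experimentation only. HTTP requests are converted into Flume events by a pluggable "handler", which must implement the HTTPSourceHandler interface. The handler takes an HttpServletRequest and returns a list of Flume events. All events handled from one HTTP request are committed to the channel in one transaction, which improves efficiency on channels such as the file channel. If the handler throws an exception, the source returns HTTP status 400. If the channel is full, or the source cannot append events to the channel, the source returns HTTP 503 - Temporarily Unavailable.

All events sent in one POST request are treated as one batch and inserted into the channel in one transaction.

This source is based on Jetty 9.4 and allows additional Jetty-specific parameters to be set; these are passed directly to the Jetty components.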

Property Name	Default	Description
type		The component type name, needs to be http
port	–	The port the source should bind to.

An example http source for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sources.r1.handler = org.example.rest.RestHandler
a1.sources.r1.handler.nickname = random props
a1.sources.r1.HttpConfiguration.sendServerVersion = false
a1.sources.r1.ServerConnector.idleTimeout = 300
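A client for this source can be sketched in Python as follows. The sketch assumes the `handler` property is left unset so that Flume's default JSON handler is used, which expects a JSON array of objects with `headers` and `body` fields; the custom `org.example.rest.RestHandler` in the example above may expect a different format. `build_payload` and `post_events` are hypothetical helper names, and the URL assumes a local agent on port 5140.

```python
import json
import urllib.request


def build_payload(events):
    """Serialize (headers, body) pairs as a JSON array of event objects."""
    return json.dumps(
        [{"headers": dict(headers), "body": body} for headers, body in events]
    )


def post_events(events, url="http://127.0.0.1:5140"):
    """POST one batch; all events in the request share one transaction.

    urllib raises urllib.error.HTTPError for error statuses such as
    400 (handler failure) or 503 (channel full).
    """
    request = urllib.request.Request(
        url,
        data=build_payload(events).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Because one request maps to one transaction, grouping several events into a single `post_events` call is usually preferable to posting them one at a time.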

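3. Avro Source

Listens on an Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink of another (previous hop) Flume agent, it can be used to build tiered collection topologies.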

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be avro
bind	–	hostname or IP address to listen on
port	–	Port # to bind to

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

4. Thrift Source

Listens on a Thrift port and receives events from external Thrift client streams.

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be thrift
bind	–	hostname or IP address to listen on
port	–	Port # to bind to

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

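5. Exec Source

The Exec source runs a given Unix command at start-up and expects that process to continuously produce data on standard output (stderr is discarded unless the property logStdErr is set to true).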

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be exec
command	–	The command to execute
shell	–	A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc.

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1

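The "shell" setting is used to invoke the "command" through a command shell (such as Bash or Powershell). The "command" is passed to the "shell" as an argument for execution. This lets the "command" use shell features such as wildcards, back ticks, pipes, loops and conditionals. Without the "shell" setting, the "command" is invoked directly. Common values for "shell" include "/bin/sh -c", "/bin/ksh -c", "cmd /c" and "powershell -Command".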

a1.sources.tailsource-1.type = exec
a1.sources.tailsource-1.shell = /bin/bash -c
a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
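The difference between running with and without "shell" comes down to how the argument vector of the child process is assembled. The Python sketch below illustrates the behaviour described above and is not Flume's actual code: `build_argv` is a hypothetical name, and tokenizing on whitespace is an assumption.

```python
def build_argv(command, shell=None):
    """Return the argument vector used to start the child process.

    Without a shell, the command itself is split into program and arguments,
    so shell syntax such as pipes or loops is not interpreted.
    With a shell, the whole command is passed to it as one single argument.
    """
    if shell is None:
        return command.split()
    return shell.split() + [command]
```

For the loop example above this yields `["/bin/bash", "-c", "for i in /path/*.txt; do cat $i; done"]`, so the loop and the wildcard are interpreted by Bash rather than passed literally to a program.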

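6. Taildir Source (not available on Windows)

Watches the files under the specified directory and picks up newly appended lines in near real time. If new lines are still being written, the source retries reading them until the write completes. It periodically writes the last read position of each file, in JSON format, to the given position file. If Flume stops or goes down for any reason, it can resume from the recorded positions. When there is no position file at the specified path, the source by default starts reading each file from its first line.

This source does not rename, delete or otherwise modify the files it tails. It currently does not support tailing binary files; it reads text files line by line.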

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be TAILDIR.
filegroups	–	Space-separated list of file groups. Each file group indicates a set of files to be tailed.
filegroups.<filegroupName>	–	Absolute path of the file group. Regular expression (and not file system patterns) can be used for filename only.

Example for agent named a1:

a1.sources = r1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/test1/example.log
a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /var/log/test2/.*log.*
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.headers.f2.headerKey2 = value2-2
a1.sources.r1.fileHeader = true
a1.sources.r1.maxBatchCount = 1000
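The table above states that a file group is an absolute path in which only the file name may be a regular expression, not a file system glob. The Python sketch below illustrates that rule under two assumptions: the pattern must match the whole file name, and Python's `re` module stands in for Java's regular expressions. `match_filegroup` is a hypothetical helper, not part of Flume.

```python
import posixpath
import re


def match_filegroup(filegroup, candidate_paths):
    """Return the candidate paths selected by a filegroup path.

    The directory part is compared literally; only the file name part is
    treated as a regular expression and must match the whole file name.
    """
    directory, name_regex = posixpath.split(filegroup)
    pattern = re.compile(name_regex)
    selected = []
    for path in candidate_paths:
        parent, name = posixpath.split(path)
        if parent == directory and pattern.fullmatch(name):
            selected.append(path)
    return selected
```

Note that `.` is a regex wildcard here, so a file group such as `/var/log/test1/example.log` would also select a file named `exampleXlog`; escaping the dot (`example\.log`) restricts it to the literal name.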

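7. Kafka Source

Kafka Source is an Apache Kafka consumer that reads messages from Kafka topics. If multiple Kafka sources are running, they can be configured with the same consumer group so that each source reads a unique set of partitions of the topics. Kafka server releases 0.10.1.0 and higher are currently supported. Testing was done up to 2.0.1, the highest version available at the time of release.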

Property Name	Default	Description
channels	–
type	–	The component type name, needs to be org.apache.flume.source.kafka.KafkaSource
kafka.bootstrap.servers	–	List of brokers in the Kafka cluster used by the source
kafka.consumer.group.id	flume	Unique identifier of the consumer group. Setting the same id in multiple sources or agents indicates that they are part of the same consumer group
kafka.topics	–	Comma-separated list of topics the Kafka consumer will read messages from.
kafka.topics.regex	–	Regex that defines set of topics the source is subscribed on. This property has higher priority than kafka.topics and overrides kafka.topics if exists.

Example for topic subscription by comma-separated topic list.

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.batchSize = 5000
tier1.sources.source1.batchDurationMillis = 2000
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics = test1, test2
tier1.sources.source1.kafka.consumer.group.id = custom.g.id

Example for topic subscription by regex

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$
# the default kafka.consumer.group.id=flume is used
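To check which topics a pattern such as `^topic[0-9]$` would select before deploying it, the expression can be tried against a list of topic names. The Python sketch below is only an approximation: `subscribed_topics` is a hypothetical helper, the topic names are made up, and Python's `re` module stands in for the Java regular expressions that Kafka actually evaluates.

```python
import re


def subscribed_topics(topics_regex, available_topics):
    """Return the topics whose full name matches the subscription regex."""
    pattern = re.compile(topics_regex)
    return [topic for topic in available_topics if pattern.fullmatch(topic)]
```

With this pattern, `topic1` and `topic9` are selected, while `topic10`, `mytopic1` and `topic` are not, because exactly one digit must follow `topic` and the anchors exclude any prefix or suffix.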
