1、NetCat TCP Source
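A netcat-like source that listens on a given port and turns each line of text into an event. It behaves like nc -k -l [host] [port]: it opens the specified port and listens for data. The supplied data is expected to be newline-separated text. Each line of text becomes a Flume event and is sent via the connected channel.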
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be netcat |
bind | – | Host name or IP address to bind to |
port | – | Port # to bind to |
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
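To try the NetCat source from the client side, something has to open a TCP connection and write newline-terminated text. The following is a minimal sketch, assuming an agent is already listening with the configuration above; the function names frame_lines and send_lines are hypothetical, and the host 127.0.0.1 is an assumption (the port 6666 comes from the example).

```python
import socket


def frame_lines(lines):
    """Join text lines into newline-delimited UTF-8 bytes (one line per event)."""
    return "".join(line.rstrip("\n") + "\n" for line in lines).encode("utf-8")


def send_lines(lines, host="127.0.0.1", port=6666, timeout=5.0):
    """Write the framed lines to a listening NetCat source.

    host and port are assumptions; port 6666 mirrors the example config.
    """
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(frame_lines(lines))


# Usage (needs a running agent):
#   send_lines(["hello flume", "second event"])
```

Each element of the list should arrive as a separate Flume event, because the source splits its input on newlines.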
2、HTTP Source
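A source that accepts Flume events by HTTP POST and GET. GET should be used for experimentation only. HTTP requests are converted into Flume events by a pluggable "handler", which must implement the HTTPSourceHandler interface. The handler takes an HttpServletRequest and returns a list of Flume events. All events handled from one HTTP request are committed to the channel in one transaction, which can improve efficiency on channels such as the file channel. If the handler throws an exception, this source returns HTTP status 400. If the channel is full, or the source is unable to append events to the channel, the source returns HTTP status 503 (Temporarily Unavailable).
All events sent in one POST request are considered one batch and are inserted into the channel in one transaction.
This source is based on Jetty 9.4 and allows additional Jetty-specific parameters to be set, which are passed directly to the Jetty components.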
Property Name | Default | Description |
type | – | The component type name, needs to be http |
port | – | The port the source should bind to. |
An example http source for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 5140
a1.sources.r1.channels = c1
a1.sources.r1.handler = org.example.rest.RestHandler
a1.sources.r1.handler.nickname = random props
a1.sources.r1.HttpConfiguration.sendServerVersion = false
a1.sources.r1.ServerConnector.idleTimeout = 300
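The request body a client must send depends entirely on the configured handler. The example above uses a custom handler (org.example.rest.RestHandler) whose format is not described here, so the sketch below instead assumes the JSON handler that ships with Flume, which takes a JSON array of objects with "headers" and "body" fields. The function names and the URL are hypothetical; only port 5140 comes from the example.

```python
import json
import urllib.error
import urllib.request


def build_events_payload(bodies, headers=None):
    """Encode event bodies as a JSON array of {"headers", "body"} objects.

    This layout is an assumption that matches a JSON-style handler,
    not the custom RestHandler named in the example config.
    """
    events = [{"headers": dict(headers or {}), "body": body} for body in bodies]
    return json.dumps(events).encode("utf-8")


def post_events(bodies, url="http://127.0.0.1:5140", headers=None, timeout=5.0):
    """POST one batch of events and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=build_events_payload(bodies, headers),
        headers={"Content-Type": "application/json; charset=UTF-8"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 400 signals a handler error, 503 a full or unavailable channel.
        return err.code


# Usage (needs a running agent):
#   post_events(["first event", "second event"], headers={"host": "web01"})
```

Because the whole array travels in one POST, all of its events land in the channel in a single transaction, which is why batching several events per request is usually preferable to one request per event.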
3、Avro Source
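Listens on an Avro port and receives events from external Avro client streams. When paired with the built-in Avro Sink on another (previous hop) Flume agent, it can create tiered collection topologies.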
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be avro |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
4、Thrift Source
Listens on a Thrift port and receives events from external Thrift client streams.
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be thrift |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
5、Exec Source
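The Exec source runs a given Unix command on start-up and expects that process to continuously produce data on standard out (stderr is discarded unless the property logStdErr is set to true).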
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be exec |
command | – | The command to execute |
shell | – | A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc. |
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1
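The "shell" config is used to invoke the "command" through a command shell (such as Bash or PowerShell). The "command" is passed as an argument to the "shell" for execution. This allows the "command" to use shell features such as wildcards, back ticks, pipes, loops, and conditionals. Without the "shell" config, the "command" is invoked directly. Common values for "shell": "/bin/sh -c", "/bin/ksh -c", "cmd /c", "powershell -Command", etc.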
a1.sources.tailsource-1.type = exec
a1.sources.tailsource-1.shell = /bin/bash -c
a1.sources.tailsource-1.command = for i in /path/*.txt; do cat $i; done
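The difference between direct invocation and invocation through a shell is easy to see outside Flume. The sketch below illustrates the idea with Python's subprocess module; it is not Flume's implementation, the helper names run_direct and run_with_shell are hypothetical, and it assumes a Unix-like system with /bin/sh, echo, and tr available.

```python
import shlex
import subprocess


def run_direct(command):
    """Run the command without a shell: split it into argv and execute it.

    Shell syntax such as '|' is passed to the program as a literal argument.
    """
    return subprocess.run(shlex.split(command), capture_output=True, text=True)


def run_with_shell(shell, command):
    """Pass the whole command string as one argument to the shell invocation."""
    argv = shlex.split(shell) + [command]
    return subprocess.run(argv, capture_output=True, text=True)


# Usage:
#   run_direct("echo a | tr a b").stdout               # the pipe is just text
#   run_with_shell("/bin/sh -c", "echo a | tr a b").stdout  # the pipe is executed
```

In the direct case echo simply prints its arguments, pipe character included; only the shell variant actually builds the pipeline. The same reasoning explains why the for loop in the config above needs a "shell" setting.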
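6、Taildir Source (not available on Windows)
Watches the specified files and tails them in near real time once new lines are detected appended to each file. If new lines are still being written, this source retries reading them while waiting for the write to complete. It periodically writes the last read position of each file to the given position file in JSON format. If Flume is stopped or goes down for some reason, it can resume tailing from the position recorded in the existing position file. When there is no position file at the specified path, it starts tailing from the first line of each file by default.
This source does not rename, delete, or otherwise modify the files being tailed. Currently it does not support tailing binary files; it reads text files line by line.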
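The resume behavior can be pictured as "remember a byte offset per file, and only consume complete lines". The sketch below is a simplified illustration of that technique, not Flume's code: the position file layout (a JSON list of objects with "file" and "pos" keys) and all function names are assumptions made for the example.

```python
import json
import os


def load_positions(position_file):
    """Return {path: offset}; an absent position file means start from byte 0."""
    if not os.path.exists(position_file):
        return {}
    with open(position_file, encoding="utf-8") as fh:
        return {entry["file"]: entry["pos"] for entry in json.load(fh)}


def save_positions(position_file, positions):
    """Persist the offsets as a JSON list (layout assumed for illustration)."""
    entries = [{"file": path, "pos": pos} for path, pos in sorted(positions.items())]
    with open(position_file, "w", encoding="utf-8") as fh:
        json.dump(entries, fh)


def read_new_lines(path, positions):
    """Return complete lines appended since the saved offset and advance it."""
    offset = positions.get(path, 0)
    lines = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        while True:
            raw = fh.readline()
            if not raw.endswith(b"\n"):
                break  # end of file or a half-written line: retry on the next poll
            lines.append(raw[:-1].decode("utf-8"))
            offset = fh.tell()
    positions[path] = offset
    return lines
```

A polling loop would call read_new_lines for every tailed file and then save_positions, so that a restart picks up at the last saved offset rather than re-sending the whole file.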
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be TAILDIR. |
filegroups | – | Space-separated list of file groups. Each file group indicates a set of files to be tailed. |
filegroups.<filegroupName> | – | Absolute path of the file group. Regular expression (and not file system patterns) can be used for filename only. |
Example for agent named a1:
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = TAILDIR
a1.sources.r1.channels = c1
a1.sources.r1.positionFile = /var/log/flume/taildir_position.json
a1.sources.r1.filegroups = f1 f2
a1.sources.r1.filegroups.f1 = /var/log/test1/example.log
a1.sources.r1.headers.f1.headerKey1 = value1
a1.sources.r1.filegroups.f2 = /var/log/test2/.*log.*
a1.sources.r1.headers.f2.headerKey1 = value2
a1.sources.r1.headers.f2.headerKey2 = value2-2
a1.sources.r1.fileHeader = true
a1.sources.r1.maxBatchCount = 1000
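The filegroup f2 above uses a regular expression, not a shell glob, and the expression applies to the file name only while the directory part stays literal. The sketch below illustrates that rule with Python's re module; it assumes the pattern must match the whole file name, the helper name match_filegroup is hypothetical, and Python's regex dialect differs from Java's in some details.

```python
import os
import re


def match_filegroup(filegroup_path):
    """List files whose name fully matches the regex in the last path segment.

    The directory part is taken literally; only the final segment is a pattern.
    """
    parent, name_pattern = os.path.split(filegroup_path)
    pattern = re.compile(name_pattern)
    return sorted(
        os.path.join(parent, name)
        for name in os.listdir(parent)
        if pattern.fullmatch(name) and os.path.isfile(os.path.join(parent, name))
    )


# Usage:
#   match_filegroup("/var/log/test2/.*log.*")
```

Note that .*log.* selects any name containing "log", so a file such as catalog.txt would be picked up as well; a tighter pattern like .*\.log avoids such surprises.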
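7、Kafka Source
Kafka Source is an Apache Kafka consumer that reads messages from Kafka topics. If multiple Kafka sources are running, they can be configured with the same consumer group so that each one reads a unique set of partitions for the topics. Kafka server releases 0.10.1.0 and higher are currently supported. Testing was done up to 2.0.1, the highest available version at the time of the release.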
Property Name | Default | Description |
channels | – | |
type | – | The component type name, needs to be org.apache.flume.source.kafka.KafkaSource |
kafka.bootstrap.servers | – | List of brokers in the Kafka cluster used by the source |
kafka.consumer.group.id | flume | Unique identifier of the consumer group. Setting the same id in multiple sources or agents indicates that they are part of the same consumer group |
kafka.topics | – | Comma-separated list of topics the Kafka consumer will read messages from. |
kafka.topics.regex | – | Regex that defines set of topics the source is subscribed on. This property has higher priority than kafka.topics and overrides kafka.topics if exists. |
Example for topic subscription by comma-separated topic list.
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.batchSize = 5000
tier1.sources.source1.batchDurationMillis = 2000
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics = test1, test2
tier1.sources.source1.kafka.consumer.group.id = custom.g.id
Example for topic subscription by regex
tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.channels = channel1
tier1.sources.source1.kafka.bootstrap.servers = localhost:9092
tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$
# the default kafka.consumer.group.id=flume is used
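The two subscription styles, and the rule that kafka.topics.regex takes priority over kafka.topics, can be checked offline before deploying a config. The sketch below is only an illustration of that selection logic under the assumption that the regex must match the whole topic name; subscribed_topics and the topic names passed to it are hypothetical, and no Kafka client is involved.

```python
import re


def subscribed_topics(available_topics, topics=None, topics_regex=None):
    """Select topics from a list; the regex, when set, overrides the topic list."""
    if topics_regex is not None:
        pattern = re.compile(topics_regex)
        return [t for t in available_topics if pattern.fullmatch(t)]
    # Comma-separated list; surrounding whitespace is ignored in this sketch.
    wanted = {t.strip() for t in (topics or "").split(",") if t.strip()}
    return [t for t in available_topics if t in wanted]


# Usage:
#   subscribed_topics(["topic1", "topic10"], topics_regex="^topic[0-9]$")
```

With the pattern from the example, topic1 qualifies but topic10 does not, because [0-9] matches exactly one digit; use ^topic[0-9]+$ if multi-digit suffixes should be included.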