Flume Source
Avro Source
Listens on an Avro port and receives events from an external Avro client. When the previous hop agent's sink is of type avro, multi-hop agent pipelines can be built. Required properties are in bold.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be avro |
bind | – | hostname or IP address to listen on |
port | – | Port # to bind to |
threads | – | Maximum number of worker threads to spawn |
selector.type | ||
selector.* | ||
interceptors | – | Space-separated list of interceptors |
interceptors.* | ||
compression-type | none | This can be “none” or “deflate”. The compression-type must match the compression-type of matching AvroSource |
ssl | false | Set this to true to enable SSL encryption. You must also specify a “keystore” and a “keystore-password”. |
keystore | – | This is the path to a Java keystore file. Required for SSL. |
keystore-password | – | The password for the Java keystore. Required for SSL. |
keystore-type | JKS | The type of the Java keystore. This can be “JKS” or “PKCS12”. |
exclude-protocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 will always be excluded in addition to the protocols specified. |
ipFilter | false | Set this to true to enable ipFiltering for netty |
ipFilterRules | – | Define N netty ipFilter pattern rules with this config. |
N netty-style ipFilter pattern rules can be configured, comma-separated. Each rule must follow the format below.
<'allow' or 'deny'>:<'ip' or 'name' for computer name>:<pattern> or allow/deny:ip/name:pattern
Example: ipFilterRules=allow:ip:127.*,allow:name:localhost,deny:ip:*
Explanation:
"allow:name:localhost,deny:ip:*" : allows the localhost client and denies clients from all other IPs
"deny:name:localhost,allow:ip:*" : denies the localhost client and allows clients from all other IPs
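Putting the table above together, a minimal Avro source configuration might look like the sketch below (the agent name a1, channel c1, and port 4141 are illustrative):

```properties
# Avro source listening on port 4141 of all interfaces
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = avro
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
```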
Thrift Source
Same notes as for the Avro source. A Thrift source can be started in secure mode by enabling Kerberos authentication; the agent-principal and agent-keytab properties configure this. Required properties are in bold.
Simple example
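A minimal Thrift source sketch, mirroring the Avro example (agent name a1, channel c1, and port 4141 are illustrative):

```properties
# Thrift source listening on port 4141 of all interfaces
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = thrift
a1.sources.r1.channels = c1
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
```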
Exec Source
An exec source runs the given Unix command on startup and expects the command to continuously produce data on standard output (stderr is discarded unless logStdErr is set to true). If the process exits for any reason, the source exits as well and stops producing data. So commands like cat [named pipe] or tail -F [file] work because they produce a continuous stream, while date does not: the first two keep emitting data, whereas the latter emits a single record and then exits.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be exec |
command | – | The command to execute |
shell | – | A shell invocation used to run the command. e.g. /bin/sh -c. Required only for commands relying on shell features like wildcards, back ticks, pipes etc. |
restartThrottle | 10000 | Amount of time (in millis) to wait before attempting a restart |
restart | false | Whether the executed cmd should be restarted if it dies |
logStdErr | false | Whether the command’s stderr should be logged |
batchSize | 20 | The max number of lines to read and send to the channel at a time |
batchTimeout | 3000 | Amount of time (in milliseconds) to wait, if the buffer size was not reached, before data is pushed downstream |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* |
Note: with tail -F [file], the -F flag is preferable because it keeps working after the file rolls (e.g. when a new log file is created each day, tail picks up the new file).
Simple example
The shell property selects the shell used to invoke the command (bash, PowerShell, etc.), so shell-specific features can be used just as when writing a script.
Common values are '/bin/sh -c', '/bin/ksh -c', 'cmd /c', 'powershell -Command', etc.
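As a sketch, an exec source tailing a log file, plus a second source showing why the shell property is needed for wildcards (agent name, channel name, and file paths are illustrative):

```properties
a1.sources = r1 r2
a1.channels = c1

# Plain command, no shell needed
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/secure
a1.sources.r1.channels = c1

# Shell features such as wildcards and loops require the shell property
a1.sources.r2.type = exec
a1.sources.r2.shell = /bin/bash -c
a1.sources.r2.command = for i in /path/*.txt; do cat $i; done
a1.sources.r2.channels = c1
```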
JMS Source
A JMS source reads messages from a queue or topic. Being a JMS application, it should work with any JMS provider, but so far it has only been tested with ActiveMQ. The source offers a number of configurable properties, listed below. The vendor-provided JMS jars can be added to the Flume classpath in three ways:
the plugins.d directory (preferred), -classpath on the command line, or the FLUME_CLASSPATH variable in flume-env.sh. Required properties are in bold.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be jms |
initialContextFactory | – | Initial Context Factory, e.g: org.apache.activemq.jndi.ActiveMQInitialContextFactory |
connectionFactory | – | The JNDI name the connection factory should appear as |
providerURL | – | The JMS provider URL |
destinationName | – | Destination name |
destinationType | – | Destination type (queue or topic) |
messageSelector | – | Message selector to use when creating the consumer |
userName | – | Username for the destination/provider |
passwordFile | – | File containing the password for the destination/provider |
batchSize | 100 | Number of messages to consume in one batch |
converter.type | DEFAULT | Class to use to convert messages to flume events. See below. |
converter.* | – | Converter properties. |
converter.charset | UTF-8 | Default converter only. Charset to use when converting JMS TextMessages to byte arrays. |
The JMS source supports pluggable converters, though the default converter suffices in most cases. The JMS message properties are added to the headers of the resulting Flume event.
ByteMessage
The message bytes are copied into the event body; messages are limited to 2GB.
TextMessage
The text is converted to a byte array and copied into the event body; UTF-8 is the default charset and is configurable.
ObjectMessage
The object is serialized to a ByteArrayOutputStream and copied into the event body.
Simple example
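A JMS source sketch reading from an ActiveMQ queue (the broker host, connection factory name, and queue name are illustrative):

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = jms
a1.sources.r1.channels = c1
a1.sources.r1.initialContextFactory = org.apache.activemq.jndi.ActiveMQInitialContextFactory
a1.sources.r1.connectionFactory = GenericConnectionFactory
a1.sources.r1.providerURL = tcp://mqserver:61616
a1.sources.r1.destinationName = BUSINESS_DATA
a1.sources.r1.destinationType = QUEUE
```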
Spooling Source
This source ingests data by placing files into a "spooling" directory on disk. The source watches the directory for new files and emits events as new files arrive. Post-processing is configurable: once a file has been fully read into the channel, it can be renamed or deleted.
Other details omitted.
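A spooling directory source sketch (the agent name, channel name, and spool directory are illustrative):

```properties
a1.sources = src-1
a1.channels = ch-1
a1.sources.src-1.type = spooldir
a1.sources.src-1.channels = ch-1
a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool
a1.sources.src-1.fileHeader = true
```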
Twitter 1% firehose Source (experimental)
Omitted.
Kafka Source
A Kafka source is a Kafka consumer that reads messages from a topic. If you run multiple Kafka sources, configure them with the same consumer group so that they share the topic between them.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be org.apache.flume.source.kafka.KafkaSource |
zookeeperConnect | – | URI of ZooKeeper used by Kafka cluster |
groupId | flume | Unique identifier of consumer group. Setting the same id in multiple sources or agents indicates that they are part of the same consumer group |
topic | – | Kafka topic we’ll read messages from. At this time, this is a single topic only. |
batchSize | 1000 | Maximum number of messages written to Channel in one batch |
batchDurationMillis | 1000 | Maximum time (in ms) before a batch will be written to Channel. The batch will be written whenever the first of size and time is reached. |
backoffSleepIncrement | 1000 | Initial and incremental wait time that is triggered when a Kafka Topic appears to be empty. Wait period will reduce aggressive pinging of an empty Kafka Topic. One second is ideal for ingestion use cases but a lower value may be required for low latency operations with interceptors. |
maxBackoffSleep | 5000 | Maximum wait time that is triggered when a Kafka Topic appears to be empty. Five seconds is ideal for ingestion use cases but a lower value may be required for low latency operations with interceptors. |
Other Kafka Consumer Properties | – | These properties are used to configure the Kafka Consumer. Any consumer property supported by Kafka can be used. The only requirement is to prepend the property name with the prefix kafka.. For example: kafka.consumer.timeout.ms Check Kafka documentation <https://kafka.apache.org/08/configuration.html#consumerconfigs> for details |
Simple example
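A Kafka source sketch (the agent name, channel name, ZooKeeper address, and topic are illustrative; the kafka. prefix passes the timeout through to the underlying consumer):

```properties
a1.sources = source1
a1.channels = channel1
a1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.source1.channels = channel1
a1.sources.source1.zookeeperConnect = localhost:2181
a1.sources.source1.topic = test1
a1.sources.source1.groupId = flume
a1.sources.source1.kafka.consumer.timeout.ms = 100
```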
Netcat Source
A netcat-like source that listens on a port and turns each arriving line of data into an event, similar to nc -k -l [host] [port]. In other words, it opens a specified port and listens for data. The expectation is that the supplied data is newline-separated text; each line of text is turned into a Flume event and sent via the connected channel.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be netcat |
bind | – | Host name or IP address to bind to |
port | – | Port # to bind to |
max-line-length | 512 | Max line length per event body (in bytes) |
ack-every-event | true | Respond with an “OK” for every event received |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* |
Simple example
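A netcat source sketch (the agent name, channel name, and port 6666 are illustrative); once the agent is running, lines can be sent with e.g. nc localhost 6666:

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = netcat
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 6666
a1.sources.r1.channels = c1
```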
Sequence Generator Source
Continuously generates events from a counter that starts at 0 and increments by 1. Mainly used for testing.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be seq |
selector.type | replicating | replicating or multiplexing |
selector.* | | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* | ||
batchSize | 1 |
Simple example
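A sequence generator source sketch (agent name a1 and channel c1 are illustrative):

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = seq
a1.sources.r1.channels = c1
```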
Syslog Sources
The UDP source treats an entire message as a single event, while the TCP source creates a new event for each newline-terminated ("\n") string of characters.
Omitted.
HTTP Source
Accepts Flume events via HTTP POST or GET; GET should only be used for experimentation. HTTP requests are converted into Flume events by a pluggable "handler" that implements the HTTPSourceHandler interface: the handler takes an HttpServletRequest and returns a list of Flume events. All events produced from one HTTP request are committed to the channel in a single transaction, which improves efficiency on channels such as the file channel. If the handler throws an exception, the source returns HTTP 400. If the channel is full, or the source is otherwise unable to append events to the channel, the source returns HTTP 503.
All events in one POST request are considered one batch and are inserted into the channel in one transaction.
Property Name | Default | Description |
---|---|---|
type | The component type name, needs to be http | |
port | – | The port the source should bind to. |
bind | 0.0.0.0 | The hostname or IP address to listen on |
handler | org.apache.flume.source.http.JSONHandler | The FQCN of the handler class. |
handler.* | – | Config parameters for the handler |
selector.type | replicating | replicating or multiplexing |
selector.* | Depends on the selector.type value | |
interceptors | – | Space-separated list of interceptors |
interceptors.* | ||
enableSSL | false | Set the property true, to enable SSL. HTTP Source does not support SSLv3. |
excludeProtocols | SSLv3 | Space-separated list of SSL/TLS protocols to exclude. SSLv3 is always excluded. |
keystore | – | Location of the keystore including keystore file name |
keystorePassword | – | Keystore password |
Simple example
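An HTTP source sketch using the default JSONHandler (agent name a1, channel c1, and port 44444 are illustrative):

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = http
a1.sources.r1.port = 44444
a1.sources.r1.channels = c1
# handler defaults to org.apache.flume.source.http.JSONHandler
```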
The posted data must be a JSON array, even when it contains only a single record. Sample data in this format:
Test:
curl -X POST -H 'Content-Type: application/json; charset=UTF-8' -d '[{ "headers" : { "timestamp" : "434324343", "host" : "random_host.example.com" }, "body" : "random_body123456789" }, { "headers" : { "namenode" : "namenode.example.com", "datanode" : "random_datanode.example.com" }, "body" : "really_random_body" }]' http://localhost:44444/
The agent's console prints:
2016-05-26 17:29:12,494 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{timestamp=434324343, host=random_host.example.com} body: 72 61 6E 64 6F 6D 5F 62 6F 64 79 31 32 33 34 35 random_body12345 }
2016-05-26 17:29:12,494 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.LoggerSink.process(LoggerSink.java:94)] Event: { headers:{namenode=namenode.example.com, datanode=random_datanode.example.com} body: 72 65 61 6C 6C 79 5F 72 61 6E 64 6F 6D 5F 62 6F really_random_bo }
The printed body is not the full value that was sent, because LoggerSink limits how many bytes of the body it prints (default 16, configurable).
BlobHandler
Handles binary payloads such as PDF or JPG files, but it is limited by available memory because the entire blob is buffered in RAM.
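As a sketch, the BlobHandler is plugged into an HTTP source via the handler property (agent and source names are illustrative):

```properties
a1.sources.r1.type = http
a1.sources.r1.handler = org.apache.flume.sink.solr.morphline.BlobHandler
```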
Stress Source
Used for stress/load testing; details deferred.
Legacy Source
Provided for compatibility with older Flume versions; omitted.
Custom Source
Implement the Source interface yourself. When the agent starts, the custom source's jar and its dependency jars must be included on the agent's classpath.
Property Name | Default | Description |
---|---|---|
channels | – | |
type | – | The component type name, needs to be your FQCN |
selector.type | replicating | replicating or multiplexing |
selector.* | | Depends on the selector.type value |
interceptors | – | Space-separated list of interceptors |
interceptors.* |
Example for agent named a1:
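A custom source sketch; org.example.MySource is a placeholder FQCN for your own Source implementation:

```properties
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = org.example.MySource
a1.sources.r1.channels = c1
```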
Scribe Source
Omitted.