Flume

Author: kismetG

Published: 2019-12-06 11:09:26


Article link: https://blog.csdn.net/weixin_44036154/article/details/103417537


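What is Flume

A distributed, reliable, and highly available system for collecting, aggregating, and transporting massive volumes of log data.

Internal components of Flume

1. Source: connects to the data source and collects data

2. Channel: transports data from the Source to the Sink (inside the Flume agent)

3. Sink: sends the data on, or lands it in its destination (inside the Flume agent)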

Installing and deploying Flume

1. Upload the installation package and extract it

2. cp flume-env.sh.template flume-env.sh

3. Edit flume-env.sh and set JAVA_HOME

Writing a Flume configuration file

1. Declare the components

a1.sources = r1

a1.channels = c1

a1.sinks = k1

2. Configure the three components

a1.sources.r1.type = netcat

a1.sources.r1.bind = 192.168.100.211

a1.sources.r1.port = 44444

a1.channels.c1.type = memory

a1.channels.c1.capacity = 1000

a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

3. Wire the components together

a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

Starting the agent with the configuration file

bin/flume-ng agent -c conf -f conf/netcat-logger.conf -n a1 -Dflume.root.logger=INFO,console

-c conf: the directory that holds Flume's own configuration files

-f conf/netcat-logger.conf: the file that describes our collection pipeline

-n a1: the name of this agent

Examples:

Receiving data over the network

1. Declare the components

2. Configure the three components

a1.sources.r1.type = netcat

a1.sources.r1.bind = 192.168.100.211

a1.sources.r1.port = 44444

a1.sinks.k1.type = logger

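3. Wire the components together

Install telnet on another node (yum install -y telnet), then use it to send data to port 44444 on 192.168.100.211.

Monitoring a directory

1. Declare the components

2. Configure the three components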

a1.sources.r1.type=spooldir

a1.sources.r1.spoolDir=/export/dir

a1.sources.r1.fileHeader = true

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/

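3. Wire the components together

Files in the monitored directory that have already been processed get a suffix appended (.COMPLETED by default); files that have not been processed yet have no suffix.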
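The example above lists only the source and sink settings. A minimal sketch of a complete agent file might look like the following. The channel settings are borrowed from the netcat example, and the hdfs.fileType and hdfs.roll* lines are assumptions added for illustration, not part of the original configuration.

```properties
# Sketch only: spooldir source -> memory channel -> HDFS sink.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# 2. Configure the three components
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /export/dir
a1.sources.r1.fileHeader = true

# Channel values reused from the netcat example (assumption)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/spooldir/
# Assumption: write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream
# Assumption: roll a new file every 60 seconds; 0 disables size- and count-based rolling
a1.sinks.k1.hdfs.rollInterval = 60
a1.sinks.k1.hdfs.rollSize = 0
a1.sinks.k1.hdfs.rollCount = 0

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Rolling by time alone keeps the number of small files on HDFS predictable; whether 60 seconds is a sensible interval depends on the data volume.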

Collecting new data appended to a file

1. Declare the components

2. Configure the three components

a1.sources.r1.type=exec

a1.sources.r1.command =tail -F /export/taillogs/access_log

a1.sinks.k1.type=hdfs

a1.sinks.k1.hdfs.path=hdfs://node01:8020/spooldir/

3. Wire the components together
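Put together, the exec-to-HDFS case could be sketched as below. The declarations, channel, and wiring follow the pattern of the netcat example; the hdfs.fileType line is an assumption. Note that the exec source offers no delivery guarantee: if the agent stops, lines that tail produced in the meantime are not replayed.

```properties
# Sketch only: exec (tail -F) source -> memory channel -> HDFS sink.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# 2. Configure the three components
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/taillogs/access_log

# Channel values reused from the netcat example (assumption)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/spooldir/
# Assumption: write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```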

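Chaining two agents

Collect data on the first node and send it to the second node, which writes it to HDFS.

Node 1

1. Declare the components

2. Configure the three components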

a1.sources.r1.type=exec

a1.sources.r1.command =tail -F /export/taillogs/access_log

a1.sinks.k1.type = avro

a1.sinks.k1.hostname = 192.168.100.212

a1.sinks.k1.port = 4141

3. Wire the components together
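A possible complete file for node 1, assuming the same declarations, memory channel, and wiring as in the netcat example (those parts are not shown in the original):

```properties
# Sketch only, node 1: exec source -> memory channel -> Avro sink.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# 2. Configure the three components
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/taillogs/access_log

# Channel values reused from the netcat example (assumption)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# The Avro sink sends events to the Avro source listening on node 2
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = 192.168.100.212
a1.sinks.k1.port = 4141

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

The hostname and port of the Avro sink must match the bind address and port of the Avro source on the receiving node.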

Node 2

1. Declare the components

2. Configure the three components

a1.sources.r1.type = avro

a1.sources.r1.bind = 192.168.100.212

a1.sources.r1.port = 4141

3. Wire the components together

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://node01:8020/avro
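For node 2, the pieces above might be assembled as follows. The declarations, channel, and wiring are assumed to follow the netcat example, and hdfs.fileType is an added assumption.

```properties
# Sketch only, node 2: Avro source -> memory channel -> HDFS sink.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# 2. Configure the three components
# The Avro source receives the events sent by node 1's Avro sink
a1.sources.r1.type = avro
a1.sources.r1.bind = 192.168.100.212
a1.sources.r1.port = 4141

# Channel values reused from the netcat example (assumption)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://node01:8020/avro
# Assumption: write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Starting node 2 before node 1 avoids connection errors from the Avro sink while the receiving source is not yet listening.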

Failover

On node01, configure as follows

1. Declare the components

agent1.channels = c1

agent1.sources = r1

agent1.sinks = k1 k2

agent1.sinkgroups = g1

agent1.sinkgroups.g1.sinks = k1 k2

2. Configure the three components

Set the sink priorities

agent1.sinkgroups.g1.processor.type = failover

agent1.sinkgroups.g1.processor.priority.k1 = 2

agent1.sinkgroups.g1.processor.priority.k2 = 1

agent1.sinkgroups.g1.processor.maxpenalty = 10000

3. Wire the components together
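The original does not show node01's source, channel, sink, or wiring lines. One way to fill them in is sketched below; the exec source, the memory channel, the Avro sinks pointing at node02 and node03, and port 4141 are all assumptions chosen to match the earlier examples.

```properties
# Sketch only, node01: one source, one channel, two Avro sinks in a failover group.

# 1. Declare the components
agent1.channels = c1
agent1.sources = r1
agent1.sinks = k1 k2
agent1.sinkgroups = g1
agent1.sinkgroups.g1.sinks = k1 k2

# 2. Configure the components
# Assumption: tail a log file, as in the earlier examples
agent1.sources.r1.type = exec
agent1.sources.r1.command = tail -F /export/taillogs/access_log

agent1.channels.c1.type = memory
agent1.channels.c1.capacity = 1000
agent1.channels.c1.transactionCapacity = 100

# Assumption: k1 sends to node02 and k2 to node03, both on port 4141
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = node02
agent1.sinks.k1.port = 4141
agent1.sinks.k2.type = avro
agent1.sinks.k2.hostname = node03
agent1.sinks.k2.port = 4141

# The sink with the higher priority (k1) is used while it is healthy
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.k1 = 2
agent1.sinkgroups.g1.processor.priority.k2 = 1
# Upper limit, in milliseconds, on the backoff applied to a failed sink
agent1.sinkgroups.g1.processor.maxpenalty = 10000

# 3. Wire the components together: both sinks drain the same channel
agent1.sources.r1.channels = c1
agent1.sinks.k1.channel = c1
agent1.sinks.k2.channel = c1
```

With these priorities, events go to node02 while it is reachable and switch to node03 only when k1 fails.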

On node02 and node03, configure as follows

1. Declare the components

2. Configure the three components

3. Wire the components together
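The receiving side is not spelled out in the original. A minimal sketch for each of node02 and node03, assuming an Avro source on port 4141 and a logger sink so that it is easy to see which node is receiving events:

```properties
# Sketch only, node02 / node03: Avro source -> memory channel -> logger sink.
# The agent name a1, port 4141, and the logger sink are assumptions.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# 2. Configure the three components
# Listen on all interfaces so the same file works on both nodes
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = logger

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Stopping the agent on the higher-priority node and watching events appear in the other node's log is a simple way to check that failover works.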

Load balancing (load balancer)

On node01, configure as follows

a1.sinkgroups.g1.processor.type = load_balance

a1.sinkgroups.g1.processor.backoff = true

a1.sinkgroups.g1.processor.selector = round_robin

a1.sinkgroups.g1.processor.selector.maxTimeOut=10000
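Only the sink-processor lines are given above. A fuller sketch of node01's file could look like this; the exec source, memory channel, Avro sink targets (node02 and node03), and port 4141 are assumptions for illustration.

```properties
# Sketch only, node01: two Avro sinks in a load-balancing group.

# 1. Declare the components
a1.sources = r1
a1.channels = c1
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2

# 2. Configure the components
# Assumption: tail a log file, as in the earlier examples
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/taillogs/access_log

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Assumption: k1 sends to node02 and k2 to node03, both on port 4141
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node02
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.hostname = node03
a1.sinks.k2.port = 4141

# Alternate between the sinks; a failed sink is backed off for up to 10 s
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.backoff = true
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinkgroups.g1.processor.selector.maxTimeOut = 10000

# 3. Wire the components together: both sinks drain the same channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c1
```

The selector can also be set to random if strict alternation between the sinks is not needed.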

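On node02 and node03, configure as follows

1. Declare the components

2. Configure the three components

3. Wire the components together

Interceptors

node01 and node02 do the same job; configure them as follows

1. Declare the components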

a1.sources = r1 r2 r3

a1.sinks = k1

a1.channels = c1

2. Configure the three components

# Describe/configure the source

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /export/taillogs/access.log

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = static

## The static interceptor inserts a user-defined key-value pair into the header of each collected event

a1.sources.r1.interceptors.i1.key = type

a1.sources.r1.interceptors.i1.value = access

a1.sources.r2.type = exec

a1.sources.r2.command = tail -F /export/taillogs/nginx.log

a1.sources.r2.interceptors = i2

a1.sources.r2.interceptors.i2.type = static

a1.sources.r2.interceptors.i2.key = type

a1.sources.r2.interceptors.i2.value = nginx

a1.sources.r3.type = exec

a1.sources.r3.command = tail -F /export/taillogs/web.log

a1.sources.r3.interceptors = i3

a1.sources.r3.interceptors.i3.type = static

a1.sources.r3.interceptors.i3.key = type

a1.sources.r3.interceptors.i3.value = web

3. Wire the components together

a1.sources.r1.channels = c1

a1.sources.r2.channels = c1

a1.sources.r3.channels = c1

a1.sinks.k1.channel = c1
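The configuration above leaves out the channel and sink settings. A trimmed-down but complete sketch, reduced to the single source r1 for brevity (r2 and r3 follow the same pattern), might be the following. The memory channel and the Avro sink pointing at node03 on port 4141 are assumptions.

```properties
# Sketch only, node01 / node02: exec source with a static interceptor
# -> memory channel -> Avro sink.

# 1. Declare the components (only r1 shown; add r2 and r3 the same way)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 2. Configure the components
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /export/taillogs/access.log
# Tag every event from this source with the header type=access
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = static
a1.sources.r1.interceptors.i1.key = type
a1.sources.r1.interceptors.i1.value = access

# Assumption: memory channel with the values from the netcat example
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

# Assumption: forward events, headers included, to an Avro source on node03
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = node03
a1.sinks.k1.port = 4141

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

Event headers travel with the event over Avro, which is what lets the receiving agent route on the type header later.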

====

Notes:

a1.sources.r2.interceptors = i2 (adds an interceptor named i2 to r2)

a1.sources.r2.interceptors.i2.type = static (sets the interceptor type to static)

a1.sources.r2.interceptors.i2.key = type (sets the header key)

a1.sources.r2.interceptors.i2.value = nginx (sets the header value)

====

Configuration for node03

1. Declare the components

2. Configure the three components

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path=hdfs://192.168.52.100:8020/source/logs/%{type}/%Y%m%d

3. Wire the components together

===

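Note: the source needs a timestamp interceptor, of type

org.apache.flume.interceptor.TimestampInterceptor$Builder, so that each event carries the timestamp header that the %Y%m%d escapes in the HDFS path rely on. To read the key set earlier when writing the data, use %{type}.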

===
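A complete node03 file built around the lines above could be sketched as follows. The Avro source, its port 4141, the memory channel, and hdfs.fileType are assumptions; the interceptor and HDFS path are taken from the original.

```properties
# Sketch only, node03: Avro source with a timestamp interceptor
# -> memory channel -> HDFS sink partitioned by header and date.

# 1. Declare the components
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# 2. Configure the components
# Assumption: receive events from the Avro sinks on node01 and node02
a1.sources.r1.type = avro
a1.sources.r1.bind = 0.0.0.0
a1.sources.r1.port = 4141
# Adds the timestamp header that the %Y%m%d escapes below depend on
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100

a1.sinks.k1.type = hdfs
# %{type} is replaced by the value of the type header set on node01 / node02
a1.sinks.k1.hdfs.path = hdfs://192.168.52.100:8020/source/logs/%{type}/%Y%m%d
# Assumption: write plain text instead of the default SequenceFile
a1.sinks.k1.hdfs.fileType = DataStream

# 3. Wire the components together
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

With this layout, events tagged access, nginx, and web end up in separate HDFS directories, each split by day.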

Flume Sources in detail

1. Avro Source properties

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = avro

a1.sources.r1.channels = c1

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

2. Spooling Directory Source

Watches the configured directory for newly added files and reads the data out of them.

a1.channels = ch-1

a1.sources = src-1

a1.sources.src-1.type = spooldir

a1.sources.src-1.channels = ch-1

a1.sources.src-1.spoolDir = /var/log/apache/flumeSpool

a1.sources.src-1.fileHeader = true

3. NetCat Source

A NetCat Source listens on a given port and turns each line of data it receives into an event.

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = netcat

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 6666

a1.sources.r1.channels = c1

4. HTTP Source

HTTP Source accepts HTTP GET and POST requests as Flume events; GET should be used only for experimentation.

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = http

a1.sources.r1.port = 5140

a1.sources.r1.channels = c1

a1.sources.r1.handler = org.example.rest.RestHandler

a1.sources.r1.handler.nickname = random props

5. Kafka Source

Example for topic subscription by comma-separated topic list.

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource

tier1.sources.source1.channels = channel1

tier1.sources.source1.batchSize = 5000

tier1.sources.source1.batchDurationMillis = 2000

tier1.sources.source1.kafka.bootstrap.servers = localhost:9092

tier1.sources.source1.kafka.topics = test1, test2

tier1.sources.source1.kafka.consumer.group.id = custom.g.id

Example for topic subscription by regex

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource

tier1.sources.source1.channels = channel1

tier1.sources.source1.kafka.bootstrap.servers = localhost:9092

tier1.sources.source1.kafka.topics.regex = ^topic[0-9]$

# the default kafka.consumer.group.id=flume is used

6. Thrift Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = thrift

a1.sources.r1.channels = c1

a1.sources.r1.bind = 0.0.0.0

a1.sources.r1.port = 4141

7. Exec Source

Monitors a file with the tail -F command

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = exec

a1.sources.r1.command = tail -F /var/log/secure

a1.sources.r1.channels = c1

8. Taildir Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = TAILDIR

a1.sources.r1.channels = c1

a1.sources.r1.positionFile = /var/log/flume/taildir_position.json

a1.sources.r1.filegroups = f1 f2

a1.sources.r1.filegroups.f1 = /var/log/test1/example.log

a1.sources.r1.headers.f1.headerKey1 = value1

a1.sources.r1.filegroups.f2 = /var/log/test2/.*log.*

a1.sources.r1.headers.f2.headerKey1 = value2

a1.sources.r1.headers.f2.headerKey2 = value2-2

a1.sources.r1.fileHeader = true

9. Syslog TCP Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = syslogtcp

a1.sources.r1.port = 5140

a1.sources.r1.host = localhost

a1.sources.r1.channels = c1

10. Multiport Syslog TCP Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = multiport_syslogtcp

a1.sources.r1.channels = c1

a1.sources.r1.host = 0.0.0.0

a1.sources.r1.ports = 10001 10002 10003

a1.sources.r1.portHeader = port

11. Syslog UDP Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = syslogudp

a1.sources.r1.port = 5140

a1.sources.r1.host = localhost

a1.sources.r1.channels = c1

12. Custom Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = org.example.MySource

a1.sources.r1.channels = c1

13. Scribe Source

a1.sources = r1

a1.channels = c1

a1.sources.r1.type = org.apache.flume.source.scribe.ScribeSource

a1.sources.r1.port = 1463

a1.sources.r1.workerThreads = 5

a1.sources.r1.channels = c1
