Simple Example
Collect the access data arriving on a network port and print it to the service console window.
Prepare the configuration file
[root@HadoopNode00 flume]# cd apache-flume-1.7.0-bin/
[root@HadoopNode00 apache-flume-1.7.0-bin]# vi conf/simple.properties
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = HadoopNode00
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Start a Flume Agent service instance
[root@HadoopNode00 apache-flume-1.7.0-bin]# bin/flume-ng agent --conf conf --conf-file conf/simple.properties --name a1 -Dflume.root.logger=INFO,console
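To verify the pipeline (assuming telnet or nc is available on the host), open a second terminal, connect to the bound port, and type a line of text; the logger sink should print the corresponding Event in the agent console:
[root@HadoopNode00 ~]# telnet HadoopNode00 44444
hello flume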
Channel
A queue-like data structure for events (Event); it provides temporary buffering for the collected data.
Memory
Events are buffered temporarily in memory.
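The settings are the same ones used in the simple example above; for reference:
a1.channels = c1
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100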
JDBC
Events are stored in a persistent database (Derby); no other database products are currently supported.
a1.channels = c1
a1.channels.c1.type = jdbc
Kafka
Events are stored in a Kafka cluster; Kafka is a message queue system with high availability and data replication.
a1.channels.channel1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.channel1.kafka.bootstrap.servers = kafka-1:9092,kafka-2:9092,kafka-3:9092
a1.channels.channel1.kafka.topic = channel1
a1.channels.channel1.kafka.consumer.group.id = flume-consumer
File
Events are stored on the local file system.
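A minimal sketch of a file channel (the checkpoint and data directories below are placeholder paths; choose directories appropriate for the host):
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /root/flume/checkpoint
a1.channels.c1.dataDirs = /root/flume/data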
Spillable Memory
A memory-overflow channel: events that exceed the in-memory capacity are spilled to disk for storage.
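A minimal sketch (the capacities and directories are illustrative values: memoryCapacity bounds the in-memory queue, overflowCapacity bounds the on-disk overflow):
a1.channels = c1
a1.channels.c1.type = SPILLABLEMEMORY
a1.channels.c1.memoryCapacity = 10000
a1.channels.c1.overflowCapacity = 1000000
a1.channels.c1.checkpointDir = /root/flume/spill/checkpoint
a1.channels.c1.dataDirs = /root/flume/spill/data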
Sink
Main role: delivers the collected data to its final destination in a central storage system.
Logger
Prints the collected data to the service console window.
HDFS
Stores the collected data in the HDFS distributed file system; two file formats are supported: text files and sequence files.
Note: make sure the HDFS service is running properly.
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
Solution:
The HDFS Sink requires an additional interceptor (Interceptor) to be configured, which automatically adds a timestamp to the Event headers.
#--------------------------------------------
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
Notes:
- The HDFS Sink writes the Sequence File format by default; if the raw content of the data is needed, add the config item a1.sinks.k1.hdfs.fileType = DataStream
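Putting the pieces together, a complete HDFS sink section might look like the following (a sketch combining the settings already shown above with the timestamp interceptor and the DataStream file type):
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /flume/events/%y-%m-%d/%H%M/%S
a1.sinks.k1.hdfs.filePrefix = events-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.fileType = DataStream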
Avro
Sends the collected data to a specified Avro Source.
a1.sinks.k1.type = avro
# IP address and port of another Flume Agent service instance
a1.sinks.k1.hostname = 10.10.10.10
a1.sinks.k1.port = 4545
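On the receiving side, the other Flume Agent needs a matching Avro Source listening on that port (a sketch; the agent name a2 and the bind address are assumptions):
a2.sources = r1
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4545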
File Roll
Saves the collected data to the local file system.
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /root/data
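The target directory must already exist. By default the file_roll sink rolls over to a new file every 30 seconds; sink.rollInterval controls this (0 disables time-based rolling and writes everything to a single file):
a1.sinks.k1.sink.rollInterval = 30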
HBase Sink
Saves the collected data to HBase, a distributed non-relational database.
# Describe the sink
a1.sinks.k1.type = hbase
a1.sinks.k1.table = baizhi:t_data
a1.sinks.k1.columnFamily = cf1
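The target namespace, table, and column family must exist before the agent starts; they can be created in the HBase shell (the names baizhi:t_data and cf1 follow the config above):
hbase(main):001:0> create_namespace 'baizhi'
hbase(main):002:0> create 'baizhi:t_data', 'cf1'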
Kafka Sink
Publishes the collected data to a Kafka cluster.
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.flumeBatchSize = 20
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.compression.type = snappy
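To verify delivery (assuming a standard Kafka installation on the broker host), read the topic back with the console consumer:
[root@HadoopNode00 kafka]# bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic mytopic --from-beginning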