Uploading a CSV data file to Kafka with Flume on Windows

Miriam_Taylor

Last modified 2024-07-17 03:35:04

Tags: flume, kafka, big data, windows

First published 2024-07-17 03:20:51

Article link: https://blog.csdn.net/m0_64359381/article/details/140481735

The following is for personal note-taking only.

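Preparation

First, get the following set up on your machine:

JDK 1.8 download and installation (see the related post by the blogger 猿月亮)

Installing and configuring ZooKeeper and Kafka on Windows (CSDN blog)

Installing and configuring Flume on Windows (CSDN blog)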

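Write the Flume configuration file

Open the conf directory under the Flume installation and create a configuration file named example.conf that defines the source, channel, and sink.

A simple example: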

# Define the source, sink, and channel
a1.sources = src
a1.sinks = k1
a1.channels = c1

# Describe the source
a1.sources.src.type = spooldir
a1.sources.src.spoolDir = E:\\apache-flume-1.11.0-bin
a1.sources.src.fileHeader = false
a1.sources.src.includePattern = work12.csv
a1.sources.src.deserializer = LINE
a1.sources.src.deserializer.maxLineLength = 10000

# Define the interceptor with the correct type
a1.sources.src.interceptors = head_filter
a1.sources.src.interceptors.head_filter.type = regex_filter
a1.sources.src.interceptors.head_filter.regex = ^user_id
a1.sources.src.interceptors.head_filter.excludeEvents = true

# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.bootstrap.servers = localhost:9092
a1.sinks.k1.kafka.topic = mytopic
a1.sinks.k1.kafka.flumeBatchSize = 500
a1.sinks.k1.kafka.producer.acks = 1
a1.sinks.k1.kafka.producer.linger.ms = 1
a1.sinks.k1.kafka.producer.batch.size = 1048576
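# Note: kafka.consumer.* properties belong to the Kafka source; the Kafka sink passes only kafka.producer.* to its producer, so the next two lines likely have no effect here
a1.sinks.k1.kafka.consumer.session.timeout.ms = 30000
a1.sinks.k1.kafka.consumer.heartbeat.interval.ms = 10000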

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 5000

# Bind the source and sink to the channel
a1.sources.src.channels = c1
a1.sinks.k1.channel = c1
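
A typo in a channel name is an easy mistake to make in this kind of file, so a quick check of the wiring before starting the agent can help. The following is a minimal sketch, not part of Flume: parse_properties and check_wiring are hypothetical helper names, and the parser only handles plain key = value lines and # comments (it does not process escapes such as the doubled backslash in spoolDir).

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def check_wiring(props, agent="a1"):
    """Return a list of problems in the source/channel/sink bindings."""
    problems = []
    channels = set(props.get(f"{agent}.channels", "").split())
    for source in props.get(f"{agent}.sources", "").split():
        bound = set(props.get(f"{agent}.sources.{source}.channels", "").split())
        if not bound:
            problems.append(f"source {source} has no channel")
        for name in sorted(bound - channels):
            problems.append(f"source {source} uses undeclared channel {name}")
    for sink in props.get(f"{agent}.sinks", "").split():
        name = props.get(f"{agent}.sinks.{sink}.channel", "")
        if not name:
            problems.append(f"sink {sink} has no channel")
        elif name not in channels:
            problems.append(f"sink {sink} uses undeclared channel {name}")
    return problems


# Only the wiring lines of example.conf are needed for this check.
EXAMPLE = """
a1.sources = src
a1.sinks = k1
a1.channels = c1
a1.sources.src.channels = c1
a1.sinks.k1.channel = c1
"""

print(check_wiring(parse_properties(EXAMPLE)) or "wiring looks consistent")
```

Note that a source is bound with the plural key channels while a sink uses the singular key channel, which is why the two loops read different property names.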

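Start ZooKeeper and Kafka

1. Open a first cmd window (leave it open) and run zkServer to start the ZooKeeper service.

2. Open a second cmd window (leave it open), run cd E:\kafka_2.12-3.4.0 to enter the Kafka root directory, then run
bin\windows\kafka-server-start.bat config\server.properties
to start the Kafka service.

3. Open a third cmd window (leave it open) and, from the Kafka root directory, run
.\bin\windows\kafka-topics.bat --create --bootstrap-server localhost:9092 --topic mytopic --partitions 1 --replication-factor 1
Adjust the partition count as needed. This creates a new topic that serves as the destination for the messages sent in this run.

4. In the same third cmd window (leave it open), run
bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic mytopic --from-beginning
to start a console consumer that listens on the topic.

5. Open a fourth cmd window (leave it open) and start Flume by running
cd E:\apache-flume-1.11.0-bin\bin

flume-ng agent -n a1 -f E:\apache-flume-1.11.0-bin\conf\example.conf
This starts Flume with the custom configuration defined in example.conf.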
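
The spooling directory source expects a file to be complete and unchanging once it appears in the watched directory, so it is safer to write the CSV elsewhere and move it in as a single step. Below is a minimal sketch under stated assumptions: stage_csv is a hypothetical helper, the amount column is invented sample data, and the staging directory is assumed to sit on the same drive as the spool directory so that the move is a rename rather than a copy.

```python
import csv
import os
import tempfile


def stage_csv(rows, spool_dir, staging_dir, name="work12.csv"):
    """Write rows to a temp file in staging_dir, then move it into spool_dir."""
    final_path = os.path.join(spool_dir, name)
    if os.path.exists(final_path):
        # File names in the spool directory should not be reused.
        raise FileExistsError(final_path)
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=staging_dir)
    with os.fdopen(fd, "w", newline="", encoding="utf-8") as handle:
        csv.writer(handle).writerows(rows)
    # The file only becomes visible to the source once it is fully written.
    os.replace(tmp_path, final_path)
    return final_path


# Demo in throwaway directories; on a real setup spool_dir would be the
# directory configured as spoolDir in example.conf.
with tempfile.TemporaryDirectory() as root:
    spool = os.path.join(root, "spool")
    staging = os.path.join(root, "staging")
    os.mkdir(spool)
    os.mkdir(staging)
    sample = [["user_id", "amount"], [1, 9.5], [2, 3.0]]
    print(os.path.basename(stage_csv(sample, spool, staging)))
```

The default file name matches the includePattern in the configuration, so only this file would be picked up by the source.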

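Here, example.conf defines a simple data flow: events are read from a source (spooldir), pass through a channel (memory channel), and are sent to a Kafka topic. For a detailed explanation, see "Some notes on the semantics of Flume configuration files" (CSDN blog).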
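
To make the flow concrete, here is a toy model of it in Python. It is only an illustration, not how Flume is implemented: run_flow is a hypothetical function, the source and sink run one after the other here rather than concurrently, and Python's re.search stands in for the interceptor's regex match. The default capacity and batch size mirror the values in example.conf.

```python
import re
from collections import deque


def run_flow(lines, header_regex=r"^user_id", capacity=10000, batch_size=500):
    """Filter header lines, buffer the rest, and drain the buffer in batches."""
    pattern = re.compile(header_regex)
    channel = deque()

    # Source side: one event per line, with the header filter applied.
    for line in lines:
        if pattern.search(line):  # excludeEvents = true drops matching events
            continue
        if len(channel) >= capacity:
            raise OverflowError("channel is full")
        channel.append(line)

    # Sink side: take up to batch_size events at a time from the channel.
    batches = []
    while channel:
        take = min(batch_size, len(channel))
        batches.append([channel.popleft() for _ in range(take)])
    return batches


sample_lines = ["user_id,amount", "1,9.5", "2,3.0", "3,7.25"]
for batch in run_flow(sample_lines, batch_size=2):
    print(batch)
```

With the sample lines, the header row never reaches the channel, and the three data rows leave in one batch of two followed by one batch of one.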

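Once Flume has run, switch back to the third cmd window: the console consumer has received the messages. A comparison with the original data:

This completes the transfer of data from Flume to Kafka.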
