Integrating Kafka and Flume

Kafka and Flume are integrated together very widely.
Flume is essentially a data-collection and transport tool. Once a source is configured, it continuously pulls data in; Flume does not persist the data itself, it only buffers it temporarily in a channel, and in the end a sink has to deliver the data to an external storage system such as HDFS or Kafka.
In practice, HDFS and Kafka are two separate paths: Flume-to-HDFS integration is generally used for offline batch processing, while Flume-to-Kafka integration is generally used for real-time stream processing.
[Figure: Flume collects data at the source and sinks it to HDFS (offline batch) or Kafka (real-time streaming)]

Example

Requirement: use a shell script to append one line containing the current time to a file every second, simulating real-time log generation. Flume monitors this file in real time, collects the newly appended content, and sends it to Kafka; in other words, Flume acts as the Kafka producer. A consumer written with the Kafka API then reads the messages out in real time.

1. Write the shell script

#!/bin/bash
# Append the current time to time.txt once per second to simulate a live log
while true
do
    echo $(date) >> /home/hadoop/apps/kafka_2.11-1.1.1/time.txt
    sleep 1
done

2. Write the Flume configuration file

# Name the core components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/apps/kafka_2.11-1.1.1/time.txt

# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic=hadoop
a1.sinks.k1.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
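
The sink above writes to the topic hadoop, so that topic has to exist before the agent starts (unless the brokers auto-create topics). It can be created with the kafka-topics.sh tool; the snippet below is only a minimal sketch of doing the same thing with Kafka's AdminClient, reusing the broker list from the configuration above. The partition and replication counts are illustrative choices, not values from this post.

package com.chang;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreateTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "hadoop01:9092,hadoop02:9092,hadoop03:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // Topic name matches a1.sinks.k1.kafka.topic; 3 partitions and a
            // replication factor of 2 are illustrative values only
            NewTopic topic = new NewTopic("hadoop", 3, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}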

3. Consumer code

package com.chang;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class Myconsumer {
    public static void main(String[] args) {
        Consumer<Integer, String> consumer = null;
        Properties properties = new Properties();
        try {
            // Load the consumer configuration from the classpath
            properties.load(Myconsumer.class.getClassLoader().getResourceAsStream("consumer.properties"));
            consumer = new KafkaConsumer<>(properties);
            /* Subscribe to the topic. subscribe(Collection<String> topics) takes a collection,
             * so the topic name has to be wrapped in one.
             */
            List<String> list = new ArrayList<>();
            list.add("hadoop");
            consumer.subscribe(list);
            // Poll in a loop
            while (true) {
                // Fetch records, blocking for at most 1000 ms
                ConsumerRecords<Integer, String> records = consumer.poll(1000);
                for (ConsumerRecord<Integer, String> record : records) {
                    // Read the record's metadata and payload, then print them
                    String topic = record.topic();
                    int partition = record.partition();
                    Integer key = record.key();
                    String value = record.value();
                    long offset = record.offset();
                    System.out.println(String.format("topic %s\t,partition %d\t,key:%d\t,value:%s\t,offset:%d\t",
                            topic, partition, key, value, offset));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // Never reached while the poll loop is running
            // consumer.close();
        }
    }
}
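
The consumer above never leaves its poll loop, which is why consumer.close() stays commented out. A common way to shut such a consumer down cleanly (a sketch that is not part of the original post) is to register a JVM shutdown hook that calls consumer.wakeup(); the blocked poll() then throws a WakeupException and the consumer can be closed in the finally block:

package com.chang;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;

import java.util.Collections;
import java.util.Properties;

public class MyConsumerWithShutdown {
    public static void main(String[] args) throws Exception {
        Properties properties = new Properties();
        properties.load(MyConsumerWithShutdown.class.getClassLoader()
                .getResourceAsStream("consumer.properties"));
        final KafkaConsumer<Integer, String> consumer = new KafkaConsumer<>(properties);
        final Thread mainThread = Thread.currentThread();

        // On Ctrl+C / kill, wakeup() makes the blocked poll() throw WakeupException
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try {
                mainThread.join();
            } catch (InterruptedException ignored) {
            }
        }));

        try {
            consumer.subscribe(Collections.singletonList("hadoop"));
            while (true) {
                ConsumerRecords<Integer, String> records = consumer.poll(1000);
                for (ConsumerRecord<Integer, String> record : records) {
                    System.out.printf("topic %s, partition %d, value: %s, offset: %d%n",
                            record.topic(), record.partition(), record.value(), record.offset());
                }
            }
        } catch (WakeupException e) {
            // Expected on shutdown; fall through to close()
        } finally {
            consumer.close();
        }
    }
}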

The consumer configuration file (consumer.properties)

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details

# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092

# consumer group id
group.id=myconsumer

# Where to start consuming when there is no committed offset:
# latest (start from the newest offset), earliest (start from the earliest offset)
# default: latest
auto.offset.reset=earliest

# deserializers
# deserializer for the message key
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer

# deserializer for the message value; the consumer fails to start if these two settings are missing
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
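
Alternatively, if you prefer not to ship a separate consumer.properties, the same settings can be assembled in code. The sketch below uses the ConsumerConfig constants with the broker list, group id, and deserializers from the file above; the class name is made up for illustration.

package com.chang;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.IntegerDeserializer;
import org.apache.kafka.common.serialization.StringDeserializer;

import java.util.Properties;

public class ConsumerConfigInCode {
    // Build the same configuration as consumer.properties, but programmatically
    public static KafkaConsumer<Integer, String> buildConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop01:9092,hadoop02:9092,hadoop03:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "myconsumer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, IntegerDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        return new KafkaConsumer<>(props);
    }
}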

4. Start the consumer (run the Myconsumer class from step 3; it sits blocked in poll() waiting for records)

5. Run the script

sh time_seconds.sh

6. Start Flume

bin/flume-ng agent -n a1 -c conf -f agentconf/exec_source_kafka_sink.properties

7. Observe the messages read by the consumer

[Screenshot: consumer console output, printing one timestamped record per second]
