Integrating Kafka and Flume
The Kafka/Flume combination is used very widely.
Flume is a data-collection workhorse: once a source is configured, it keeps pulling data in continuously. Flume does not persist the data itself; it only buffers it temporarily in a channel, and a sink is still needed to land the data in an external storage system such as HDFS or Kafka.
In practice HDFS and Kafka are two separate paths: Flume + HDFS is generally used for offline batch processing, while Flume + Kafka generally follows the real-time stream-processing route.

Case study
Requirement: use a shell script to append a line containing the current time to a file every second, simulating real-time log generation. Flume monitors this file in real time, collects the newly appended content, and pushes it into Kafka, i.e. Flume acts as the Kafka producer. A consumer written against the Kafka API then reads the messages out in real time.
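Both the Flume sink and the consumer below use a topic named hadoop. If the brokers do not auto-create topics, create it up front; a minimal sketch, assuming the Kafka 2.11-1.1.1 bin directory and a ZooKeeper quorum reachable at hadoop01:2181 (the ZooKeeper address and the partition/replication counts are assumptions, not taken from the original setup):
bin/kafka-topics.sh --create --zookeeper hadoop01:2181 --replication-factor 1 --partitions 3 --topic hadoop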
1. Write the shell script, time_seconds.sh
#!/bin/bash
while true
do
    echo "$(date)" >> /home/hadoop/apps/kafka_2.11-1.1.1/time.txt
    sleep 1
done
2. Write the Flume configuration file (agentconf/exec_source_kafka_sink.properties)
# Name the agent's components
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /home/hadoop/apps/kafka_2.11-1.1.1/time.txt
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.kafka.topic=hadoop
a1.sinks.k1.kafka.bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 20000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
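The Kafka sink accepts further kafka.* properties if tuning is needed. A minimal sketch of two commonly adjusted settings (the values shown are illustrative defaults, not taken from the original configuration):
# number of events written to Kafka in one batch
a1.sinks.k1.kafka.flumeBatchSize = 100
# producer acknowledgement level passed through to the underlying Kafka producer
a1.sinks.k1.kafka.producer.acks = 1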
3. Consumer code
package com.chang;

import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class Myconsumer {
    public static void main(String[] args) {
        Consumer<Integer, String> consumer = null;
        Properties properties = new Properties();
        try {
            // load the configuration file from the classpath
            properties.load(Myconsumer.class.getClassLoader().getResourceAsStream("consumer.properties"));
            consumer = new KafkaConsumer<>(properties);
            /* subscribe(Collection<String> topics) takes a collection,
             * so the topic name has to be wrapped in a list */
            List<String> list = new ArrayList<>();
            list.add("hadoop");
            consumer.subscribe(list);
            // poll in a loop
            while (true) {
                // poll, blocking for at most 1000 ms when no data is available
                ConsumerRecords<Integer, String> records = consumer.poll(1000);
                for (ConsumerRecord<Integer, String> record : records) {
                    // read the message attributes and print them
                    String topic = record.topic();
                    int partition = record.partition();
                    Integer key = record.key();
                    String value = record.value();
                    long offset = record.offset();
                    System.out.println(String.format("topic %s\t,partition %d\t,key:%d\t,value:%s\t,offset:%d\t",
                            topic, partition, key, value, offset));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (consumer != null) {
                consumer.close();
            }
        }
    }
}
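If shipping a consumer.properties file on the classpath is inconvenient, the same configuration can also be built in code. Below is a minimal sketch; ConsumerConfigFactory is a hypothetical helper, not part of the original code, and simply reuses the broker list, group id and deserializers shown above:
package com.chang;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

public class ConsumerConfigFactory {
    // builds the same settings as consumer.properties, purely for illustration
    public static Properties build() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "hadoop01:9092,hadoop02:9092,hadoop03:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "myconsumer");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.IntegerDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        return props;
    }
}
In Myconsumer the properties.load(...) step could then be dropped and the consumer created with new KafkaConsumer<>(ConsumerConfigFactory.build()).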
The configuration file (consumer.properties)
# see org.apache.kafka.clients.consumer.ConsumerConfig for more details
# list of brokers used for bootstrapping knowledge about the rest of the cluster
# format: host1:port1,host2:port2 ...
bootstrap.servers=hadoop01:9092,hadoop02:9092,hadoop03:9092
# consumer group id
group.id=myconsumer
# where to start consuming when there is no committed offset:
# latest (start from the newest offset) or earliest (start from the earliest offset); the default is latest
auto.offset.reset=earliest
# deserializers
# deserializer for the key
key.deserializer=org.apache.kafka.common.serialization.IntegerDeserializer
# deserializer for the value; if these two deserializers are not specified the consumer will throw an error
value.deserializer=org.apache.kafka.common.serialization.StringDeserializer
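Offsets are committed automatically by this client unless configured otherwise. For completeness, the two related settings and their defaults in the 1.x Java client are shown below; they are not part of the original file and do not need to be added for this example:
# enable.auto.commit=true
# auto.commit.interval.ms=5000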
4. Start the consumer (run Myconsumer)
5. Run the script
sh time_seconds.sh
6. Run Flume
bin/flume-ng agent -n a1 -c conf -f agentconf/exec_source_kafka_sink.properties
7. Observe the messages read by the consumer: a new record containing the current date should be printed roughly once per second.


