Message Queue: Kafka

Evan-^_^

Column: Linux | Tags: kafka

Article link: https://blog.csdn.net/qq_38524532/article/details/86683933

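I. Introduction to Kafka

Kafka is a distributed message queue system with high throughput and low latency: it can handle hundreds of thousands of messages per second, with latency as low as a few milliseconds.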

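A Kafka cluster consists of multiple broker servers, and each category of message is defined as a topic.

Messages within the same topic are split into partitions according to the message key and a partitioning algorithm, and the partitions are stored on different brokers.

Producers and consumers can produce to and consume from a topic across multiple brokers.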
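The key-based placement described above can be sketched in a few lines of Java. This is a minimal illustration, assuming a hypothetical `KeyPartitioner` class: it hashes the key bytes and takes the result modulo the partition count, so all messages with the same key land in the same partition. Kafka's own Java producer client hashes keys with murmur2 rather than `Arrays.hashCode`, so real partition numbers will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper (not Kafka's partitioner): maps a message key to a partition.
class KeyPartitioner {

    // Returns a partition number in [0, numPartitions) for the given key.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        // Clear the sign bit so the modulo result is never negative.
        // Kafka's Java client uses a murmur2 hash here instead of Arrays.hashCode.
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 3;
        for (String key : new String[] {"user-1", "user-2", "user-1"}) {
            // "user-1" is printed with the same partition both times.
            System.out.println(key + " -> partition " + partitionFor(key, numPartitions));
        }
    }
}
```

Because the partition depends only on the key, ordering is preserved per key: all messages for `user-1` go to one partition and are read back in the order they were written.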

Topics and logs

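A topic is a category of records published to the Kafka cluster; a topic can be subscribed to and consumed by multiple consumers.

Each topic contains one or more partitions. The number of partitions can be specified when the topic is created, and each partition stores its own data together with index information.

Kafka only guarantees the order of messages within a single partition; it does not guarantee order across different partitions of a topic. If all messages must be ordered, give the topic a single partition.

Each record in a partition is assigned a sequential ID, called the offset, which uniquely identifies the record within that partition. The Kafka cluster retains all published records, whether or not they have been consumed, and configurable retention policies determine how old data is handled.

The only metadata each consumer keeps is its current position (offset) in the log. This offset is controlled by the consumer, which can change it to read data from any position.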
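To illustrate offsets and consumer-controlled positions, here is a minimal in-memory sketch. `PartitionLog` and `ConsumerPosition` are hypothetical classes, not Kafka APIs: the partition is an append-only list whose indices serve as offsets, reading never deletes anything, and the consumer's only state is the next offset it will read, which it can move freely with `seek`. Retention policies and on-disk storage are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of one partition: a record's index is its offset.
class PartitionLog {
    private final List<String> records = new ArrayList<>();

    // Appends a record and returns the offset assigned to it.
    long append(String record) {
        records.add(record);
        return records.size() - 1;
    }

    // Reads up to maxRecords starting at offset. Nothing is removed:
    // records stay in the log whether or not anyone has consumed them.
    List<String> read(long offset, int maxRecords) {
        int from = (int) Math.max(0L, Math.min(offset, (long) records.size()));
        int to = Math.min(from + maxRecords, records.size());
        return new ArrayList<>(records.subList(from, to));
    }

    public static void main(String[] args) {
        PartitionLog log = new PartitionLog();
        for (String r : new String[] {"a", "b", "c"}) {
            System.out.println(r + " stored at offset " + log.append(r));
        }
        ConsumerPosition consumer = new ConsumerPosition();
        System.out.println(consumer.poll(log, 2));  // [a, b]
        consumer.seek(0);                           // rewind to re-read old data
        System.out.println(consumer.poll(log, 10)); // [a, b, c]
    }
}

// Hypothetical consumer state: only the next offset to read.
class ConsumerPosition {
    long offset = 0;

    List<String> poll(PartitionLog log, int maxRecords) {
        List<String> batch = log.read(offset, maxRecords);
        offset += batch.size(); // advance past the records just returned
        return batch;
    }

    // The consumer may move its position anywhere in the log.
    void seek(long newOffset) {
        offset = newOffset;
    }
}
```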

producer

The producer sends messages to the Kafka brokers under a specified topic.

consumer

The consumer consumes the messages of the topics it subscribes to.

topic

A message topic (category). A topic can have multiple partitions, distributed across different broker servers.

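Notes:

Each consumer keeps track of the offset it has consumed up to.

Every consumer belongs to a consumer group.

Within a group, consumption follows the queue model:

each consumer consumes different partitions, so a message is consumed only once within the group.

Across groups, consumption follows the publish-subscribe model:

groups consume independently without affecting each other, so a message can be consumed once by each group that subscribes to it.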
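The queue model within a group and the publish-subscribe model across groups can be sketched with a simple round-robin assignment. This assumes a hypothetical `GroupAssignment.assign` helper and is not one of Kafka's built-in assignors; it only shows the rule that each partition belongs to exactly one consumer inside a group, while every group receives all partitions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical round-robin assignment for ONE consumer group (not Kafka's assignor code).
class GroupAssignment {

    // Maps each consumer in the group to the partitions it owns.
    static Map<String, List<Integer>> assign(List<String> consumers, int numPartitions) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String consumer : consumers) {
            result.put(consumer, new ArrayList<>());
        }
        for (int p = 0; p < numPartitions; p++) {
            // Partition p goes to exactly one consumer, so it is never shared inside the group.
            result.get(consumers.get(p % consumers.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Each group is assigned all 3 partitions independently (publish-subscribe across groups),
        // while inside a group the partitions are split up (queue model).
        System.out.println("group A: " + assign(Arrays.asList("a1", "a2"), 3)); // {a1=[0, 2], a2=[1]}
        System.out.println("group B: " + assign(Arrays.asList("b1"), 3));       // {b1=[0, 1, 2]}
    }
}
```

With more consumers than partitions, the extra consumers receive an empty list and sit idle, so a group's useful parallelism is capped by the partition count.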

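Kafka use cases

a. Log collection

Kafka can collect the logs of many services and expose them through a unified interface to various consumers, such as Hadoop, HBase, and Solr (a search engine similar to Elasticsearch).

b. Messaging system

Decouples producers from consumers and buffers messages between them.

c. User activity tracking

Kafka is often used to record the activities of web or app users, such as page views, searches, and clicks. Each server publishes these activity events to Kafka topics; subscribers consume the topics for real-time monitoring and analysis, or load the data into Hadoop or a data warehouse for offline analysis and data mining.

d. Operational metrics

Records operational monitoring data: collects data from distributed applications and produces centralized feedback on operations, such as alerts and reports.

e. Stream processing

Feeding stream processing frameworks such as Spark Streaming and Storm.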

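II. Cluster Deployment

Environment preparation

The ZooKeeper cluster consists of three servers: node1, node2, and node3.
The Kafka cluster consists of three servers: node1, node2, and node3.

1. Prepare the ZooKeeper cluster

Kafka depends on ZooKeeper, so install a ZooKeeper cluster first.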

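2. Install Kafka

Download the archive (official site: http://kafka.apache.org/downloads.html) and extract it:

tar zxvf kafka_2.10-0.9.0.1.tgz -C /opt/

Edit the configuration file config/server.properties:

broker.id: the unique ID of each broker in the cluster, numbered 0, 1, 2, 3, ... (a broker is one server in the Kafka cluster).

Note: this Kafka cluster has three nodes, node1, node2, and node3, with broker.id 0, 1, and 2 respectively.

Copy the Kafka directory from node1 to node2 and node3, then change broker.id on each node to its own unique ID.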
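Since only `broker.id` differs between the three nodes, the per-node settings can be generated rather than hand-edited. The sketch below is a hypothetical `BrokerConfigGenerator` built on `java.util.Properties`; besides `broker.id` it sets `zookeeper.connect` (the ZooKeeper ensemble Kafka depends on) and `log.dirs`, whose path `/opt/kafka-logs` is an assumed example value. It only prints the settings; copying them into each node's config/server.properties is left to you.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical generator for per-node server.properties settings.
class BrokerConfigGenerator {

    static Properties configFor(int brokerId) {
        Properties p = new Properties();
        p.setProperty("broker.id", Integer.toString(brokerId)); // must be unique per broker
        p.setProperty("zookeeper.connect", "node1:2181,node2:2181,node3:2181");
        p.setProperty("log.dirs", "/opt/kafka-logs"); // assumed example path
        return p;
    }

    public static void main(String[] args) throws IOException {
        String[] nodes = {"node1", "node2", "node3"};
        for (int i = 0; i < nodes.length; i++) {
            StringWriter out = new StringWriter();
            // broker.id = 0, 1, 2 for node1, node2, node3
            configFor(i).store(out, "server.properties for " + nodes[i]);
            System.out.println(out);
        }
    }
}
```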

3. Start the cluster

A. Start the ZooKeeper cluster.
B. Start the Kafka cluster.
Run the following command on each of the three servers:

bin/kafka-server-start.sh config/server.properties

4. Test

bin/kafka-topics.sh --help            # show the help manual
bin/kafka-console-consumer.sh --help  # show the help manual

Create a topic:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --replication-factor 2 --partitions 3 --topic test

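Parameters:

replication-factor: number of replicas of each partition; it cannot exceed the number of brokers (the broker-side default for automatically created topics is 1)

partitions: number of partitions (the broker-side default for automatically created topics is 1)

topic: name of the new topic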
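To see how `--partitions 3 --replication-factor 2` spreads data over three brokers, here is a simplified placement sketch. The `ReplicaPlacement` class and its rule (replica r of partition p goes to broker (p + r) mod brokerCount) are assumptions for illustration; Kafka's real assignment also randomizes the starting broker, so `--describe` may show a different layout. The check at the top mirrors the constraint that the replication factor cannot exceed the number of brokers.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified replica placement (not Kafka's assignment algorithm).
class ReplicaPlacement {

    // Returns, for each partition, the list of broker IDs holding its replicas.
    static List<List<Integer>> place(int partitions, int replicationFactor, int numBrokers) {
        if (replicationFactor > numBrokers) {
            throw new IllegalArgumentException(
                    "replication factor " + replicationFactor + " exceeds broker count " + numBrokers);
        }
        List<List<Integer>> assignment = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            List<Integer> replicas = new ArrayList<>();
            for (int r = 0; r < replicationFactor; r++) {
                // Consecutive brokers, so two replicas of a partition never share a broker.
                replicas.add((p + r) % numBrokers);
            }
            assignment.add(replicas);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // --partitions 3 --replication-factor 2 on brokers 0, 1, 2
        List<List<Integer>> a = place(3, 2, 3);
        for (int p = 0; p < a.size(); p++) {
            System.out.println("partition " + p + " -> brokers " + a.get(p));
        }
        // partition 0 -> brokers [0, 1]
        // partition 1 -> brokers [1, 2]
        // partition 2 -> brokers [2, 0]
    }
}
```

Each broker ends up holding two of the six replicas, so losing any single broker still leaves one copy of every partition.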

List the topics:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --list

Describe the "test" topic:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --describe --topic test

Start a console producer:

bin/kafka-console-producer.sh --broker-list node1:9092,node2:9092,node3:9092 --topic test

Start a console consumer:

bin/kafka-console-consumer.sh --zookeeper node1:2181,node2:2181,node3:2181 --from-beginning --topic test

III. Flume & Kafka

1. Install Flume

2. Flume + Kafka

A. Start the Kafka cluster.

bin/kafka-server-start.sh config/server.properties

B. Configure and start the Flume cluster.

bin/flume-ng agent -n a1 -c conf -f conf/fk.conf -Dflume.root.logger=DEBUG,console

The Flume configuration file (conf/fk.conf) is as follows:

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = node1
a1.sources.r1.port = 41414
# Describe the sink
a1.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a1.sinks.k1.topic = testflume
a1.sinks.k1.brokerList = node1:9092,node2:9092,node3:9092
a1.sinks.k1.requiredAcks = 1
a1.sinks.k1.batchSize = 20
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
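The memory channel and Kafka sink settings above can be pictured as a bounded buffer drained in batches. The sketch below is a rough analogy only, assuming a hypothetical `MemoryChannelSketch` class built on `ArrayBlockingQueue`; it is not how Flume's memory channel is implemented and it ignores transactions. `capacity` bounds how many events can wait in the channel, and the sink removes at most `batchSize` events per batch before sending them to Kafka.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical analogy of a memory channel feeding a batching sink (not Flume code).
class MemoryChannelSketch {

    private final BlockingQueue<String> channel;

    MemoryChannelSketch(int capacity) {
        this.channel = new ArrayBlockingQueue<>(capacity); // like a1.channels.c1.capacity
    }

    // Source side: returns false when the channel is full
    // (Flume's real channel would reject the source's transaction instead).
    boolean put(String event) {
        return channel.offer(event);
    }

    // Sink side: removes up to batchSize events (like a1.sinks.k1.batchSize).
    List<String> takeBatch(int batchSize) {
        List<String> batch = new ArrayList<>(batchSize);
        channel.drainTo(batch, batchSize);
        return batch;
    }

    public static void main(String[] args) {
        MemoryChannelSketch channel = new MemoryChannelSketch(5); // tiny capacity for the demo
        for (int i = 0; i < 7; i++) {
            // event-5 and event-6 are rejected because the channel is full
            System.out.println("put event-" + i + " accepted=" + channel.put("event-" + i));
        }
        List<String> batch;
        while (!(batch = channel.takeBatch(2)).isEmpty()) {
            System.out.println("send batch to Kafka: " + batch);
        }
    }
}
```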

3. Test

Start the ZooKeeper, Kafka, and Flume clusters.

Create the topic:

bin/kafka-topics.sh --zookeeper node1:2181,node2:2181,node3:2181 --create --replication-factor 2 --partitions 3 --topic testflume

Start a consumer:

bin/kafka-console-consumer.sh --zookeeper node1:2181,node2:2181,node3:2181 --from-beginning --topic testflume

Run the "RpcClientDemo" code below, which sends data to the Flume cluster through RPC requests.

import org.apache.flume.Event;
import org.apache.flume.EventDeliveryException;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

import java.nio.charset.Charset;

/**
 * Example from the official Flume developer guide:
 * http://flume.apache.org/FlumeDeveloperGuide.html
 * @author root
 */
public class RpcClientDemo {
    public static void main(String[] args) {
        MyRpcClientFacade client = new MyRpcClientFacade();
        // Initialize client with the remote Flume agent's host and port
        client.init("node1", 41414);
        // Send 10 events to the remote Flume agent. That agent should be
        // configured to listen with an AvroSource.
        String sampleData = "Hello Flume!";
        for (int i = 0; i < 10; i++) {
            client.sendDataToFlume(sampleData);
            System.out.println("Sent data: " + sampleData);
        }
        client.cleanUp();
    }
}

class MyRpcClientFacade {
    private RpcClient client;
    private String hostname;
    private int port;

    public void init(String hostname, int port) {
        // Setup the RPC connection
        this.hostname = hostname;
        this.port = port;
        this.client = RpcClientFactory.getDefaultInstance(hostname, port);
        // Use the following method to create a thrift client (instead of the
        // above line):
        // this.client = RpcClientFactory.getThriftInstance(hostname, port);
    }

    public void sendDataToFlume(String data) {
        // Create a Flume Event object that encapsulates the sample data
        Event event = EventBuilder.withBody(data, Charset.forName("UTF-8"));
        // Send the event
        try {
            client.append(event);
        } catch (EventDeliveryException e) {
            // clean up and recreate the client
            client.close();
            client = null;
            client = RpcClientFactory.getDefaultInstance(hostname, port);
            // Use the following method to create a thrift client (instead of
            // the above line):
            // this.client = RpcClientFactory.getThriftInstance(hostname, port);
        }
    }

    public void cleanUp() {
        // Close the RPC connection
        client.close();
    }
}

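IV. Data Loss and Duplicate Consumption

Cause of duplicate consumption:

ZooKeeper stores the consumer's offsets. By default, the consumer does not commit after every message; it commits its offset periodically. Suppose the auto-commit interval is 6000 ms: if the consumer processes some messages within that window and then crashes before the next commit, those messages have been consumed but their offsets were never committed to ZooKeeper. ZooKeeper still holds the older offset, so after a restart the consumer resumes reading from that offset and the messages are consumed again.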
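The duplicate-consumption scenario can be reproduced with a small simulation. `AutoCommitDuplicates` is hypothetical, not Kafka code, and it simplifies time by assuming each message takes one time unit, so "commit every 6 messages" stands in for a 6000 ms auto-commit interval; the `committed` value plays the role of the offset stored in ZooKeeper.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of periodic auto-commit (not Kafka code).
class AutoCommitDuplicates {

    // Processes messages starting at the committed offset and commits after every
    // commitEvery messages. If crashAfter >= 0, the consumer dies after that many
    // messages. Returns the committed offset a restarted consumer would resume from.
    static int consume(List<String> log, int committed, int commitEvery, int crashAfter,
                       List<String> processed) {
        int position = committed;
        int handled = 0;
        while (position < log.size()) {
            if (handled == crashAfter) {
                return committed; // crash: progress since the last commit is not saved
            }
            processed.add(log.get(position)); // message fully processed
            position++;
            handled++;
            if (handled % commitEvery == 0) {
                committed = position; // periodic commit
            }
        }
        return position; // clean shutdown: final commit
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4", "m5", "m6", "m7", "m8", "m9");
        List<String> processed = new ArrayList<>();
        // Commit every 6 messages; crash after processing 8 (m0..m7).
        int resumeFrom = consume(log, 0, 6, 8, processed);
        // Restart from the last committed offset (6), so m6 and m7 are processed again.
        consume(log, resumeFrom, 6, -1, processed);
        System.out.println(processed);
    }
}
```

Running `main` prints `[m0, m1, m2, m3, m4, m5, m6, m7, m6, m7, m8, m9]`: m6 and m7 were processed before the crash but never committed, so they are processed a second time after the restart.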

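Cause of data loss:

The mechanism is similar to duplicate consumption, but data loss mainly occurs when the auto-commit interval is too short compared with the processing time. For example, with an auto-commit interval of 20 ms and a processing time of 1000 ms: the offset is already committed to ZooKeeper at 20 ms, marking the message as consumed and advancing the offset to the next message. If the server crashes at 500 ms, the message was never successfully processed, yet after a restart the consumer resumes from the committed offset, so that message is lost.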
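The data-loss case differs only in when the commit happens relative to processing. In the hypothetical `CommitOrdering` simulation below (not Kafka code), `commitBeforeProcessing = true` models an auto-commit that fires while a message is still being processed; `false` models committing only after processing succeeds, which is what the commented-out `consumer.commitOffsets()` call in the consumer example later in this article does when auto-commit is disabled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation comparing commit-before vs commit-after processing (not Kafka code).
class CommitOrdering {

    // Consumes from the committed offset; the consumer crashes while processing the
    // message at crashAtOffset (-1 = no crash). Returns the committed offset afterwards.
    static int consumeWithCrash(List<String> log, int committed, int crashAtOffset,
                                boolean commitBeforeProcessing, List<String> processed) {
        for (int offset = committed; offset < log.size(); offset++) {
            if (commitBeforeProcessing) {
                committed = offset + 1; // offset already marked as consumed
            }
            if (offset == crashAtOffset) {
                return committed; // crash mid-processing: message never finished
            }
            processed.add(log.get(offset));
            if (!commitBeforeProcessing) {
                committed = offset + 1; // commit only after successful processing
            }
        }
        return committed;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("m0", "m1", "m2", "m3", "m4");
        for (boolean early : new boolean[] {true, false}) {
            List<String> processed = new ArrayList<>();
            int resume = consumeWithCrash(log, 0, 2, early, processed); // crash while on m2
            consumeWithCrash(log, resume, -1, early, processed);         // restart, no crash
            System.out.println((early ? "commit before processing: " : "commit after processing: ")
                    + processed);
        }
    }
}
```

The first line of output is `commit before processing: [m0, m1, m3, m4]` (m2 is lost); the second is `commit after processing: [m0, m1, m2, m3, m4]`. Committing after processing avoids loss, at the cost of the possible duplicates described in the previous case.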

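V. Related Parameters

acks: the number of acknowledgments the producer requires the leader to have received before considering a request complete

acks=0: the producer sends the message immediately and does not wait for any acknowledgment from the server; the message is considered sent right away.

acks=1: the leader writes the message to its own local log and responds without waiting for the followers (replicas) to acknowledge it.

acks=all: the leader waits until all in-sync replicas have acknowledged the message before confirming receipt.

Source: http://kafka.apache.org/090/documentation.html (official documentation)

Original text: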

The number of acknowledgments the producer requires the leader to have received before considering a request complete. This controls the durability of records that are sent. The following settings are common:
acks=0 If set to zero then the producer will not wait for any acknowledgment from the server at all. The record will be immediately added to the socket buffer and considered sent. No guarantee can be made that the server has received the record in this case, and the retries configuration will not take effect (as the client won't generally know of any failures). The offset given back for each record will always be set to -1.
acks=1 This will mean the leader will write the record to its local log but will respond without awaiting full acknowledgement from all followers. In this case should the leader fail immediately after acknowledging the record but before the followers have replicated it then the record will be lost.
acks=all This means the leader will wait for the full set of in-sync replicas to acknowledge the record. This guarantees that the record will not be lost as long as at least one in-sync replica remains alive. This is the strongest available guarantee.
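The three `acks` levels can be summarized as a small decision function. `AcksModel` is a simplified, hypothetical model rather than producer or broker code: it answers whether the producer would treat a request as complete, given whether the leader has written the record and how many in-sync followers have replicated it.

```java
// Hypothetical, simplified model of the acks settings (not Kafka code).
class AcksModel {

    enum Acks { NONE, LEADER, ALL } // acks=0, acks=1, acks=all

    // Would the producer consider the request complete?
    // isrFollowers = number of followers currently in the in-sync replica set.
    static boolean requestComplete(Acks acks, boolean leaderWritten, int followersAcked, int isrFollowers) {
        switch (acks) {
            case NONE:
                return true; // considered sent once handed to the socket, no server confirmation
            case LEADER:
                return leaderWritten; // followers are not waited for
            case ALL:
                return leaderWritten && followersAcked >= isrFollowers;
            default:
                throw new IllegalArgumentException("unknown acks: " + acks);
        }
    }

    public static void main(String[] args) {
        // The leader has written the record, but neither of its 2 in-sync followers has it yet.
        for (Acks acks : Acks.values()) {
            System.out.println(acks + " -> complete: " + requestComplete(acks, true, 0, 2));
        }
        // NONE -> complete: true
        // LEADER -> complete: true
        // ALL -> complete: false
    }
}
```

With the leader written but no follower caught up, `NONE` and `LEADER` both report completion, which is exactly the window in which a leader crash loses the record; `ALL` keeps waiting until the in-sync followers have the record.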

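auto.offset.reset

Decides where a consumer starts when its group has no committed offset in ZooKeeper (or the committed offset is out of range): "smallest" starts from the smallest available offset, "largest" (the default) from the largest.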
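The decision `auto.offset.reset` makes can be sketched as follows. `StartOffsetResolver` is a hypothetical helper, not a Kafka API; `logStart` and `logEnd` stand for the smallest retained offset and the offset of the next message to be written. The values `smallest` and `largest` are those of the ZooKeeper-based consumer used in this article; the newer Java consumer names them `earliest` and `latest`.

```java
import java.util.OptionalLong;

// Hypothetical sketch of choosing a consumer's starting offset (not Kafka code).
class StartOffsetResolver {

    // committed: the group's committed offset, if any.
    static long resolve(OptionalLong committed, long logStart, long logEnd, String autoOffsetReset) {
        if (committed.isPresent()) {
            long c = committed.getAsLong();
            // A committed offset still inside the retained log is always used.
            if (c >= logStart && c <= logEnd) {
                return c;
            }
        }
        // No usable committed offset: fall back to auto.offset.reset.
        switch (autoOffsetReset) {
            case "smallest":
                return logStart; // re-read everything still retained
            case "largest":
                return logEnd;   // only messages produced from now on
            default:
                throw new IllegalArgumentException("unsupported auto.offset.reset: " + autoOffsetReset);
        }
    }

    public static void main(String[] args) {
        // A partition currently holding offsets 100..199.
        System.out.println(resolve(OptionalLong.empty(), 100, 200, "smallest")); // 100
        System.out.println(resolve(OptionalLong.empty(), 100, 200, "largest"));  // 200
        System.out.println(resolve(OptionalLong.of(150), 100, 200, "largest"));  // 150
    }
}
```

This is why the consumer example below sets `auto.offset.reset` to `smallest`: a brand-new group "hq" has no committed offset, so it reads the topic from the beginning instead of waiting for new messages.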

Kafka producer and consumer code

producer

package com.shsxt.test;

import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
import kafka.serializer.StringEncoder;

import java.util.Properties;

// Producer built on the old Scala client API (kafka.javaapi.producer) shipped with Kafka 0.9
public class ProducerTest extends Thread {
    private final String topic;
    private final Producer<Integer, String> producer;

    public static void main(String[] args) {
        new ProducerTest("topicTest").start();
    }

    public ProducerTest(String topic) {
        this.topic = topic;

        Properties conf = new Properties();
        // Locations of the Kafka brokers, used to fetch cluster metadata
        conf.put("metadata.broker.list", "node1:9092,node2:9092,node3:9092");
        // Encode message values as strings
        conf.put("serializer.class", StringEncoder.class.getName());
        // The old producer reads "request.required.acks" (not "acks"), and the value
        // must be a String: "1" = wait for the leader's acknowledgment only
        conf.put("request.required.acks", "1");

        producer = new Producer<Integer, String>(new ProducerConfig(conf));
    }

    @Override
    public void run() {
        int counter = 0;
        while (true) {
            counter++;
            String value = "number:" + counter;

            // No key is given, so the producer chooses the partition
            KeyedMessage<Integer, String> message = new KeyedMessage<>(topic, value);
            producer.send(message);

            System.out.println(value + "------------------");

            // Pause for 2 seconds after every second message
            if (counter % 2 == 0) {
                try {
                    Thread.sleep(2000);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        producer.close();
    }
}

consumer

package com.shsxt.test;

import kafka.consumer.Consumer;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.ConsumerIterator;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;

import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

// Consumer built on the old ZooKeeper-based high-level consumer API (Kafka 0.9)
public class ConsumerTest extends Thread {
    private final ConsumerConnector consumer;
    private final String topic;

    public ConsumerTest(String topic) {
        this.consumer = Consumer.createJavaConsumerConnector(createConsumerConfig());
        this.topic = topic;
    }

    private ConsumerConfig createConsumerConfig() {
        Properties properties = new Properties();
        // The old consumer keeps group membership and offsets in ZooKeeper
        properties.put("zookeeper.connect", "node1:2181,node2:2181,node3:2181");
        properties.put("group.id", "hq");
        properties.put("zookeeper.session.timeout.ms", "400");
        // Offsets are committed to ZooKeeper automatically every 6000 ms
        // (see the section on data loss and duplicate consumption)
        properties.put("auto.commit.interval.ms", "6000");
        // With no committed offset for this group, start from the smallest offset
        properties.put("auto.offset.reset", "smallest");
        return new ConsumerConfig(properties);
    }

    @Override
    public void run() {
        // Which topic to read, and how many streams (threads) to read it with
        Map<String, Integer> topicCountMap = new HashMap<>();
        topicCountMap.put(topic, 1);
        Map<String, List<KafkaStream<byte[], byte[]>>> consumerMap =
                consumer.createMessageStreams(topicCountMap);

        KafkaStream<byte[], byte[]> stream = consumerMap.get(topic).get(0);
        ConsumerIterator<byte[], byte[]> iterator = stream.iterator();
        System.out.println("Waiting for messages...");

        // hasNext() blocks until the next message arrives
        while (iterator.hasNext()) {
            String data = new String(iterator.next().message(), StandardCharsets.UTF_8);

            System.out.println("Start processing data: " + data);
            try {
                Thread.sleep(500); // simulate processing time
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }

            // To commit only after processing finishes, set "auto.commit.enable"
            // to "false" above and commit manually here:
//            System.out.println("Processing data..." + data);
//            try {
//                Thread.sleep(2000);
//            } catch (InterruptedException e) {
//                e.printStackTrace();
//            }
//            System.out.println("Finished processing data..." + data);
//            consumer.commitOffsets();
        }
        consumer.shutdown();
    }

    public static void main(String[] args) {
        new ConsumerTest("topicTest").start();
    }
}
