Kafka系列(五)Kafka体系架构英文版

数据缓冲

Kafka:

high-throughputdistributedpublish-subscribe messaging system

  • fast
  • scalable
  • durable
  • real-time
kafka architecture

[外链图片转存失败,源站可能有防盗链机制,建议将图片保存下来直接上传(img-GyVUWRHy-1600834798558)(C:%5CUsers%5Clenovo%5CAppData%5CRoaming%5CTypora%5Ctypora-user-images%5Cimage-20200921090124773.png)]

  • kafka maintains feeds of messages in called Topics
  • Producers publish messages to a kafka topic
  • consumers subscribe to topics and process the feed of published messages
  • servers in a kafka cluster are called brokers
kafka topic
  • Topics:
  • Partitions:
  • Logs:
  • Retention Period:
  • Consumers maintain and track their locations/offsets in each log
#修改脚本
vi .bashrc


# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi
export SPARK_MAJOR_VERSION=2
sed -i s/'history -cw'//g .bash_logout
export PATH=/usr/hdp/current/kafka-broker/bin:$PATH
#启动kafka
kafka-server-start.sh /usr/hdp/2.6.4.0-91/kafka/config/server.properties
#创建topic
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --create --topic test --partition 3 --replication-factor 1


# 查看topic
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181  --list

# 查看topic分区
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --describe --topic test

# 修改主题数据保留时间
kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --alter --topic test --config retention.ms=10000

kafka-topics.sh --zookeeper sandbox-hdp.hortonworks.com:2181 --alter --topic test -delete-config retention.ms

在这里插入图片描述

Kafka Message Flow

在这里插入图片描述

producer可以控制指定发送到哪个分区(robin)

consumer可以指定我消费哪个分区

Kafka High-Throughput & Low-Latency
  • Batching of individual messages
  • Zero copy I/O using sendfile
kafka Broker
  • Each partition has one server acting as a leader, and zero or more servers acting as followers / ISRs

    broker数要比分区数大得多, 每个分区的备份最好放在不在leader的其他机器上

  • Each server acts as a leader for some partitions and a followers for others, so load is well balanced

    每个broker最好只要有一个leader

kafka Producer
  • Producers publish data to the topics of their choice

    通过分区分配策略来控制我发送哪个分区

  • Async Publishing (less durable)

    把ack调成0,数据可靠性最差的情况

  • All nodes can answer metadata requests about:

    所有机器都可以提供位置信息给你作为传送消息的节点

kafka Consumer

可以定义细粒度:可以去拿topic,也可以去拿partition

  • Consumers consume messages through subscriptions

  • Multiple Consumers can read from the same topics

  • Consumers are organized into Consumer Groups

    一个partition只能被一个group的一个consumer消费,

    partition数量应该大于consumer数量

  • Kafka offers messages to Consumer Groups, not Consumer (instance) directly

    kafka的offset消息是在消费组层面上的,在组里每个consumer都知道partition读到什么地方,如果有consumer挂掉,下一个consumer会继续从offset之后拿

  • Messages remain on Kafka, which are not removed after they are consumed

  • Messaging models

    Queue: a message goes to one of the consumers.

    ​ ⚫ All consumers are in the same Consumer Group

    Publish-Subscribe: a message goes to all consumers.

    ​ ⚫ All consumers are assigned to different Consumer Groups;

    发布订阅,我多个消费组中的一个消费者去消费一个message时,这样可以并行去消费。

kafka ZooKeeper
  • Starting from 0.10, Kafka has its own internal topic for the offset storage
kafka API

The Producer API allows an application to publish a stream of records to one or more Kafka topics.

The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.

The Streams API allows an application to act as a stream processor , consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.

The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems.

Message Ordering

To ensure global ordering for a topic:

◼ If all message must be ordered within one topic, use one partition

全局有序得话,最好把分区数设为一

◼ If messages can be ordered by certain properties

​ ⚫ Group messages in a partition by Key (defined upon the properties in producer)

​ ⚫ Configure exactly one consumer instance per partition within a consumer group

Message Replication

◼ 0 – the producer never waits for an ack

◼ 1 – the producer gets an ack after the leader replica has received the data

◼ -1 / all – the producer gets an ack after all ISRs (in-sync replication) receives the data

Data Loss at the Producer

◆ Kafka Producer API

​ ◼ Messages are accumulated in buffer in batches

批量发送

​ ◼ Messages are batched by partition, retried at batch level

​ ◼ Expired batches dropped after retries

在重试之后,批处理的消息到期了

◆ Data Loss at Producer

​ ◼ Failed to close / flush producer on termination

没有成功刷新最后得消息

​ ◼ Dropped batches due to communication or other errors when acks = 0 or retry exhaustion

Data Delivered but Loss in the Cluster

在这里插入图片描述

Data Duplication

consumer和producer都会发生数据重复消费

在这里插入图片描述

Kafka Use Cases

◆ Real-Time Streaming Processing (combined with Spark Streaming)

◆ General purpose Message Bus

◆ Collecting User Activity Data

用户行为数据

◆ Collecting Operational Metrics from Applications, Servers or Devices

◆ Log Aggregation (combined with ELK)

◆ Change Data Capture

◆ Commit Log for distributed System

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值