Kafka

  1. Introduction to Kafka

Kafka is a distributed messaging system based on the publish/subscribe model. Its main design goals are as follows:

- Message persistence in O(1) time, with constant-time access performance even for TB-scale data and beyond.
- High throughput: a single machine can handle 100K+ messages per second even on very cheap commodity hardware.
- Message partitioning across Kafka servers and distributed consumption, while preserving message order within each Partition.
- Support for both offline and real-time data processing.
- Scale out: support for online horizontal scaling.

Comparison of common message queues

RabbitMQ

RabbitMQ is an open-source message queue written in Erlang. It supports many protocols out of the box: AMQP, XMPP, SMTP, and STOMP. For that very reason it is quite heavyweight and better suited to enterprise development. It also implements the Broker architecture, which means messages are queued on a central queue before being delivered to clients. It has good support for routing, load balancing, and data persistence.

Redis

Redis is a key-value NoSQL database whose development and maintenance are very active. Although it is a key-value storage system, it supports MQ functionality natively, so it can be used as a lightweight queue service. One benchmark ran one million enqueue and one million dequeue operations against both RabbitMQ and Redis, recording the elapsed time every 100,000 operations, with payloads of 128 bytes, 512 bytes, 1 KB, and 10 KB. The results: for enqueueing, Redis outperforms RabbitMQ when the payload is small, but once the payload reaches 10 KB Redis becomes unbearably slow; for dequeueing, Redis performs very well regardless of payload size, while RabbitMQ's dequeue performance is far below Redis's.
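Redis's list commands are enough for a basic queue. A minimal sketch (the list name "jobs" is hypothetical, and a Redis server reachable by redis-cli is assumed):

redis-cli LPUSH jobs "task-1"   # enqueue: push onto the head of the list
redis-cli BRPOP jobs 5          # dequeue: block for up to 5 seconds waiting for an item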

ZeroMQ

ZeroMQ claims to be the fastest message queue system, especially for high-throughput scenarios. It can implement the advanced/complex queues that RabbitMQ is not good at, but developers have to assemble multiple technical pieces themselves, and that technical complexity is the main challenge to applying this MQ successfully. ZeroMQ has a distinctive brokerless model: there is no message server or middleware to install and run, because your application itself plays that role. You simply reference the ZeroMQ library (installable via NuGet, for example) and you can happily send messages between applications. However, ZeroMQ provides only non-persistent queues, which means data is lost on a crash. Notably, Twitter's Storm used ZeroMQ by default as its data-stream transport in versions before 0.9.0 (starting with 0.9, Storm supports both ZeroMQ and Netty as transport modules).

ActiveMQ

ActiveMQ is a subproject of Apache. Like ZeroMQ, it can implement queues in both broker and peer-to-peer styles; like RabbitMQ, it can implement advanced scenarios efficiently with only a small amount of code.

Kafka/Jafka

Kafka is a subproject of Apache: a high-performance, cross-language, distributed publish/subscribe message queue system. Jafka was incubated on top of Kafka, i.e., an upgraded version of it. Kafka has the following characteristics:

- Fast persistence: messages can be persisted with O(1) system overhead.
- High throughput: a rate of 100K messages/s can be reached on an ordinary server.
- A fully distributed system: Broker, Producer, and Consumer all natively support distribution and automatic load balancing.
- Support for parallel loading of data into Hadoop.

For log data and offline analysis systems in the style of Hadoop that nevertheless need real-time processing, Kafka is a viable solution: it unifies online and offline message processing through Hadoop's parallel loading mechanism. Compared with ActiveMQ, Apache Kafka is a very lightweight messaging system; besides very good performance, it is also a well-behaved distributed system.

  2. Kafka Architecture

Terminology

Broker
A Kafka cluster contains one or more servers; each such server is called a broker.

Topic
Every message published to a Kafka cluster has a category, called its Topic. (Physically, messages of different Topics are stored separately. Logically, although a Topic's messages are kept on one or more brokers, users only have to specify a message's Topic to produce or consume data, without caring where the data is stored.)

Partition
A Partition is a physical concept; each Topic contains one or more Partitions.

Producer
Responsible for publishing messages to the Kafka brokers.

Consumer
The message consumer: a client that reads messages from Kafka brokers.

Consumer Group
Each Consumer belongs to a specific Consumer Group (a group name can be specified for each Consumer; if none is specified, the Consumer belongs to the default group).

A typical Kafka cluster contains some number of Producers (which may be page views generated by web front ends, server logs, or system metrics such as CPU and memory), some number of brokers (Kafka scales horizontally; generally, the more brokers, the higher the cluster throughput), some number of Consumer Groups, and a ZooKeeper cluster. Kafka uses ZooKeeper to manage the cluster configuration, elect the leader, and rebalance when a Consumer Group changes. Producers publish messages to brokers in push mode; Consumers subscribe to brokers and consume messages in pull mode.
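As a hedged illustration of Consumer Groups using the console tools that appear later in this article (the group name "g1" is made up; "test5" is the topic created in section 3 below): two consumers started with the same group.id divide the topic's partitions between them, and Kafka rebalances if either one leaves.

echo "group.id=g1" > /tmp/g1.properties
# Run each of these in its own terminal; together they form one Consumer Group.
./kafka-console-consumer.sh --zookeeper 192.168.225.128:2181 --topic test5 --consumer.config /tmp/g1.properties
./kafka-console-consumer.sh --zookeeper 192.168.225.128:2181 --topic test5 --consumer.config /tmp/g1.properties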

  3. Building the Kafka Image and the Container Cluster

(1). Write the Dockerfile

root@ubuntu:/home/zhl/kafka# vi Dockerfile 
FROM index.tenxcloud.com/docker_library/java
MAINTAINER HaHa
# Unpack the Kafka distribution into /opt/kafka, then remove the tarball
COPY kafka_2.10-0.9.0.1.tgz /tmp/
RUN tar -xzf /tmp/kafka_2.10-0.9.0.1.tgz -C /opt
RUN mv /opt/kafka_2.10-0.9.0.1 /opt/kafka
RUN rm -f /tmp/kafka_2.10-0.9.0.1.tgz

ENV KAFKA_HOME /opt/kafka
# Install the startup script and make it executable
ADD start-kafka.sh /usr/bin/start-kafka.sh
RUN chmod 777 /usr/bin/start-kafka.sh

CMD /usr/bin/start-kafka.sh

(2). Write the container startup script

root@ubuntu:/home/zhl/kafka# vi start-kafka.sh 
#!/bin/bash
# Back up the original config, then substitute the values passed in via
# environment variables (ZK, BROKER_ID, HOST_IP, PORT) into server.properties.
cp $KAFKA_HOME/config/server.properties $KAFKA_HOME/config/server.properties.bk
sed -r -i "s/(zookeeper.connect)=(.*)/\1=${ZK}/g" $KAFKA_HOME/config/server.properties
sed -r -i "s/(broker.id)=(.*)/\1=${BROKER_ID}/g" $KAFKA_HOME/config/server.properties
sed -r -i "s/(log.dirs)=(.*)/\1=\/tmp\/kafka-logs-${BROKER_ID}/g" $KAFKA_HOME/config/server.properties
sed -r -i "s/#(advertised.host.name)=(.*)/\1=${HOST_IP}/g" $KAFKA_HOME/config/server.properties
sed -r -i "s/#(port)=(.*)/\1=${PORT}/g" $KAFKA_HOME/config/server.properties
sed -r -i "s/(listeners)=(.*)/\1=PLAINTEXT:\/\/:${PORT}/g" $KAFKA_HOME/config/server.properties
# Optionally override the JVM heap settings in kafka-server-start.sh
if [ "$KAFKA_HEAP_OPTS" != "" ]; then
    sed -r -i "s/^(export KAFKA_HEAP_OPTS)=\"(.*)\"/\1=\"$KAFKA_HEAP_OPTS\"/g" $KAFKA_HOME/bin/kafka-server-start.sh
fi
# Start the broker in the foreground so the container keeps running
$KAFKA_HOME/bin/kafka-server-start.sh $KAFKA_HOME/config/server.properties
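To confirm that the sed substitutions took effect, one option (once a container built from this image is running, as in step (4) below) is to grep the rewritten config inside it:

docker exec k1 grep -E '^(broker.id|listeners|advertised.host.name|zookeeper.connect)=' /opt/kafka/config/server.properties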

(3). Build the image

docker build -t kafkatest .
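If the build succeeds, the new image should appear in the local image list:

docker images kafkatest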

(4). Using the kafkatest image, start three Kafka container instances
# HOST_IP is the address of the Docker host (192.168.225.128 throughout this walkthrough).
HOST_IP=192.168.225.128
K_PORT=9091
docker run --name=k1 -p ${K_PORT}:${K_PORT} -e BROKER_ID=1 -e HOST_IP=${HOST_IP} -e PORT=${K_PORT} -e ZK='192.168.225.128:2181,192.168.225.128:2182,192.168.225.128:2183' -d kafkatest
K_PORT=9092
docker run --name=k2 -p ${K_PORT}:${K_PORT} -e BROKER_ID=2 -e HOST_IP=${HOST_IP} -e PORT=${K_PORT} -e ZK='192.168.225.128:2181,192.168.225.128:2182,192.168.225.128:2183' -d kafkatest
K_PORT=9093
docker run --name=k3 -p ${K_PORT}:${K_PORT} -e BROKER_ID=3 -e HOST_IP=${HOST_IP} -e PORT=${K_PORT} -e ZK='192.168.225.128:2181,192.168.225.128:2182,192.168.225.128:2183' -d kafkatest   

Check the startup result:

root@ubuntu:/home/zhl/kafka# docker ps
CONTAINER ID        IMAGE               COMMAND                     CREATED             STATUS              PORTS                    NAMES
637beebec29e        kafkatest           "/bin/sh -c /usr/b..."   9 minutes ago       Up 9 minutes        0.0.0.0:9093->9093/tcp   k3
4bf4925e6f40        kafkatest           "/bin/sh -c /usr/b..."   9 minutes ago       Up 9 minutes        0.0.0.0:9092->9092/tcp   k2
988c32940785        kafkatest           "/bin/sh -c /usr/b..."   11 minutes ago      Up 11 minutes       0.0.0.0:9091->9091/tcp   k1
c1ab885b2770        zk                  "/opt/entrypoint.sh"     4 hours ago         Up 4 hours                                   zk3
b939bfa60ea2        zk                  "/opt/entrypoint.sh"     4 hours ago         Up 4 hours                                   zk2
bd161a246c28        zk                  "/opt/entrypoint.sh"     4 hours ago         Up 4 hours                                   zk1

root@ubuntu:~# ps -ef | grep config/server
root       4755   4742  0 13:24 ?        00:01:51 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/opt/kafka/bin/../logs -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp :/opt/kafka/bin/../libs/* kafka.Kafka /opt/kafka/config/server.properties
root       4954   4943  0 13:25 ?        00:01:55 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/opt/kafka/bin/../logs -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp :/opt/kafka/bin/../libs/* kafka.Kafka /opt/kafka/config/server.properties
root       5107   5095  0 13:25 ?        00:01:52 /usr/lib/jvm/java-8-openjdk-amd64/bin/java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/opt/kafka/bin/../logs/kafkaServer-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dkafka.logs.dir=/opt/kafka/bin/../logs -Dlog4j.configuration=file:/opt/kafka/bin/../config/log4j.properties -cp :/opt/kafka/bin/../libs/* kafka.Kafka /opt/kafka/config/server.properties
root      28057   3295  0 23:57 pts/2    00:00:00 grep --color=auto config/server
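As an extra sanity check, the three brokers should be listening on their mapped ports on the host (this assumes the iproute2 ss tool is available):

ss -ltn | grep -E ':909[123]'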

(5). Test from the command line by starting a producer and a consumer (http://www.jianshu.com/p/dc4770fc34b6)

//--Go into the kafka bin directory and create the topic "test5" with 3 partitions and a replication factor of 3
root@ubuntu:/var/lib/docker/aufs/diff/c097ed7784e8ef038c4d564ca566de3c554ff52813f642220fed7cad0705810e/opt/kafka/bin# ./kafka-topics.sh --create --zookeeper 192.168.225.128:2181,192.168.225.128:2182,192.168.225.128:2183 --replication-factor 3 --partitions 3 --topic test5
Created topic "test5".
//--View the details of the "test5" topic
root@ubuntu:/var/lib/docker/aufs/diff/c097ed7784e8ef038c4d564ca566de3c554ff52813f642220fed7cad0705810e/opt/kafka/bin# ./kafka-topics.sh --describe --zookeeper  192.168.225.128:2181 --topic test5
Topic:test5    PartitionCount:3    ReplicationFactor:3    Configs:
Topic: test5    Partition: 0    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
Topic: test5    Partition: 1    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
Topic: test5    Partition: 2    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2
root@ubuntu:/var/lib/docker/aufs/diff/c097ed7784e8ef038c4d564ca566de3c554ff52813f642220fed7cad0705810e/opt/kafka/bin# ./kafka-console-producer.sh --broker-list 192.168.225.128:9092 --topic test5  //--broker-list: the value can be one or more nodes of the broker cluster
this is test

//--Open a new window and start a consumer
root@ubuntu:/var/lib/docker/aufs/diff/c097ed7784e8ef038c4d564ca566de3c554ff52813f642220fed7cad0705810e/opt/kafka/bin# ./kafka-console-consumer.sh --zookeeper 192.168.225.128 --topic test5 --from-beginning
this is test
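Kafka 0.9 also ships a new consumer that fetches offsets from the brokers instead of ZooKeeper; as a hedged alternative to the command above:

./kafka-console-consumer.sh --new-consumer --bootstrap-server 192.168.225.128:9092 --topic test5 --from-beginning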

Contents of server.properties:

root@ubuntu:/# cat ./var/lib/docker/aufs/diff/6befb484b490a6f34d4a0e5ca00f2eda7c92bce972cabf510518ed4ce12b1fef/opt/kafka/config/server.properties
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
# The unique identifier of each broker in the cluster
broker.id=1

############################# Socket Server Settings #############################

listeners=PLAINTEXT://:9091

# The port the socket server listens on
port=9091

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured.  Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
advertised.host.name=192.168.225.128

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs-1

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# Address of the ZooKeeper cluster; multiple addresses may be given, separated by commas
zookeeper.connect=192.168.225.128:2181,192.168.225.128:2182,192.168.225.128:2183

# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
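A final hedged aside: as the comments above note, several of these settings can be overridden per topic instead of globally. In 0.9, kafka-topics.sh accepts --config key=value at creation time; the topic name "test6" and the one-day retention value below are made up for illustration:

./kafka-topics.sh --create --zookeeper 192.168.225.128:2181 --replication-factor 3 --partitions 3 --topic test6 --config retention.ms=86400000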