Kafka 0.8: Introduction, Installation, and Common Commands

Hello everyone, I'm W.

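Today I'll cover an introduction to Kafka, its installation, and its common commands. I only started learning Kafka recently, so my understanding is not deep and I can't offer anything more advanced yet, but I hope this post is still useful. I picked up Kafka while studying Spark Streaming. The order below is: introduction to Kafka, installing Kafka, and Kafka commands.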

1. Introduction to Kafka

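Kafka is a distributed messaging system developed at LinkedIn. It relies on ZooKeeper for coordination, supports partitioning and multiple replicas, and works as a high-throughput publish-subscribe system. It is written in Scala and Java. LinkedIn open-sourced it and contributed it to the Apache Software Foundation in 2011, and it later became a top-level Apache project.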

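Kafka's defining feature is that it can process large-scale data streams in real time, which suits many scenarios. Its main characteristics:

  • High throughput: in commonly quoted figures, Kafka can produce about 250,000 messages per second (50 MB) and process about 550,000 messages per second (110 MB). Actual numbers depend on hardware and message size.
  • Persistent storage: messages are persisted to disk, so they can also be consumed in batches.
  • Easy to scale as a distributed system: there can be multiple producers, brokers, and consumers, and the cluster can be expanded without downtime.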

Kafka official website: https://kafka.apache.org/

2. Installing Kafka (CentOS 6.x)

2.1 Environment

  • ZooKeeper 3.4.6 cluster (node-02, node-03, node-04)
  • Scala version: 2.11
  • kafka_2.11-0.8.2.2: built for Scala 2.11, Kafka version 0.8.2.2
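The package name encodes both versions as kafka_<Scala version>-<Kafka version>. As a small illustration of that convention (not something the installation requires), the name can be split with plain shell parameter expansion:

```shell
# Split a Kafka binary package name into its Scala and Kafka versions.
# Naming convention: kafka_<scala version>-<kafka version>
pkg="kafka_2.11-0.8.2.2"

rest="${pkg#kafka_}"          # strip the "kafka_" prefix -> 2.11-0.8.2.2
scala_version="${rest%%-*}"   # text before the first "-" -> 2.11
kafka_version="${rest#*-}"    # text after the first "-"  -> 0.8.2.2

echo "Scala $scala_version, Kafka $kafka_version"
```

This matters when picking a download: the Scala version only needs to match if you use Scala yourself, while the Kafka version determines which commands and options are available.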

2.2 Installing Kafka

Below I walk through how I installed Kafka 0.8 on CentOS 6.10. Before installing Kafka, make sure a ZooKeeper cluster is already set up on your machines.

2.2.1 Upload the package

Upload the kafka_2.11-0.8.2.2.tgz package to the Linux machine with Xftp. I put it in /root.

2.2.2 Extract the archive

Run the following command to extract it into the target directory:

tar -zxvf kafka_2.11-0.8.2.2.tgz -C /root/apps/
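One thing to watch: tar -C fails if the target directory does not exist yet. A minimal sketch of a helper that creates the directory first (the function name extract_to is my own, not part of Kafka):

```shell
# extract_to ARCHIVE DIR: create DIR if needed, then unpack ARCHIVE into it.
# extract_to is a hypothetical helper name used only for this illustration.
extract_to() {
    mkdir -p "$2" && tar -zxf "$1" -C "$2"
}

# Usage with the paths from this post:
#   extract_to /root/kafka_2.11-0.8.2.2.tgz /root/apps/
```

If /root/apps already exists on your machine, the plain tar command is enough.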

2.2.3 Edit the configuration file

First, go into the Kafka directory and look at its layout. Running ls shows:

ls
bin  config  libs  LICENSE  NOTICE

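Kafka's directory layout is simple: LICENSE and NOTICE can be ignored, libs holds the dependency jars, config holds the configuration files, and bin holds the scripts.

Go into config and edit server.properties: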

cd config
vi server.properties

You will see a configuration file like this (no need to read it closely; the key settings are explained below):

# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0

############################# Socket Server Settings #############################

# The port the socket server listens on
port=9092

# Hostname the broker will bind to. If not set, the server will bind to all interfaces
#host.name=localhost

# Hostname the broker will advertise to producers and consumers. If not set, it uses the
# value for "host.name" if configured.  Otherwise, it will use the value returned from
# java.net.InetAddress.getCanonicalHostName().
#advertised.host.name=<hostname routable by clients>

# The port to publish to ZooKeeper for clients to use. If this is not set,
# it will publish the same port that the broker binds to.
#advertised.port=<port accessible by clients>

# The number of threads handling network requests
num.network.threads=3

# The number of threads doing disk I/O
num.io.threads=8

# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400

# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400

# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600


############################# Log Basics #############################

# A comma seperated list of directories under which to store log files
log.dirs=/tmp/kafka-logs

# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=1

# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1

############################# Log Flush Policy #############################

# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
#    1. Durability: Unflushed data may be lost if you are not using replication.
#    2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
#    3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to exceessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.

# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000

# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000

############################# Log Retention Policy #############################

# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.

# The minimum age of a log file to be eligible for deletion
log.retention.hours=168

# A size-based retention policy for logs. Segments are pruned from the log as long as the remaining
# segments don't drop below log.retention.bytes.
#log.retention.bytes=1073741824

# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824

# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000

# By default the log cleaner is disabled and the log retention policy will default to just delete segments after their retention expires.
# If log.cleaner.enable=true is set the cleaner will be enabled and individual logs can then be marked for log compaction.
log.cleaner.enable=false

############################# Zookeeper #############################

# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
zookeeper.connect=localhost:2181

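Most of this file is comments (the lines starting with #), so I will only explain a few of the parameters:

  • broker.id=0 # The broker's ID. In a cluster, every machine must have a different broker.id.
  • port=9092 # The port on which Kafka serves clients.
  • host.name=localhost # The hostname or IP address this broker binds to.
  • log.dirs=/tmp/kafka-logs # Kafka persists messages to disk in log form. This is the directory for that message data, not the directory for Kafka's own application logs.
  • num.partitions=1 # The default number of partitions per topic. Data is persisted as files per partition, so more partitions allow more parallel consumption but also mean more files.
  • log.retention.hours=168 # How long log data is kept, in hours. Segments older than this become eligible for deletion.
  • zookeeper.connect=localhost:2181 # The ZooKeeper connection string. Separate multiple addresses with commas.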
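To make the link between num.partitions and the files under log.dirs more concrete: Kafka keeps each partition in its own directory named <topic>-<partition number>. The sketch below only prints the directory names one would expect. The topic name my-topic and the count 3 are illustrative values, and partition_dirs is a helper name I made up.

```shell
# partition_dirs LOG_DIR TOPIC COUNT: print the partition directories that
# a topic with COUNT partitions would use under LOG_DIR.
# (Hypothetical helper; it does not talk to Kafka at all.)
partition_dirs() {
    i=0
    while [ "$i" -lt "$3" ]; do
        echo "$1/$2-$i"
        i=$((i + 1))
    done
}

partition_dirs /tmp/kafka-logs my-topic 3
```

In a multi-broker cluster the partitions are spread across brokers, so a single machine holds only the directories for the partitions assigned to it.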

Of the parameters explained above, the ones we need to set are:

  • broker.id=0
  • host.name=localhost
  • log.dirs=/tmp/kafka-logs
  • zookeeper.connect=localhost:2181

I have explained what each parameter does and how to set it. Here is my configuration; adjust it to your own environment:

  • broker.id=0
  • host.name=192.168.120.21
  • log.dirs=/root/apps/kafka-logs
  • zookeeper.connect=192.168.120.21:2181,192.168.120.22:2181,192.168.120.23:2181
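If you would rather not edit the file by hand in vi, the same changes can be scripted. The following is only a sketch under my own assumptions: set_prop is a hypothetical helper, and it expects values that contain neither | nor &, because both are special in the sed expression it builds.

```shell
# set_prop FILE KEY VALUE
# Replace an existing "KEY=..." line (commented out with a leading # or not)
# with KEY=VALUE; append the line if the key is not present at all.
# Hypothetical helper: VALUE must not contain "|" or "&".
set_prop() {
    if grep -q "^#\{0,1\}$2=" "$1"; then
        sed "s|^#\{0,1\}$2=.*|$2=$3|" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    else
        echo "$2=$3" >> "$1"
    fi
}

# Usage (adjust the path and values to your environment):
#   conf=/root/apps/kafka_2.11-0.8.2.2/config/server.properties
#   set_prop "$conf" broker.id 0
#   set_prop "$conf" host.name 192.168.120.21
```

The pattern is anchored at the start of the line, so setting host.name does not touch the commented-out advertised.host.name line. log.dirs and zookeeper.connect can be set the same way.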
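
2.2.4 Cluster configuration

Next, copy the installation directory, including the modified configuration file, to the other machines. If you need a guide to passwordless SSH, see the post "Network and passwordless login configuration on Linux (CentOS 6.10)".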

scp -r /root/apps/kafka_2.11-0.8.2.2/ node-02:/root/apps/

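Repeat the command for each remaining machine, changing the hostname each time.

Then, on each machine, change the following parameters in its configuration file:

  • broker.id (a different integer on every machine)
  • host.name (that machine's own IP address or hostname)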
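To keep the per-machine values straight, it can help to write them down before touching any file. The sketch below is a dry run that only prints a plan and changes nothing. plan_brokers is a name I made up, numbering the IDs 0, 1, 2 in argument order is an arbitrary choice, and using hostnames for host.name assumes they resolve on every machine (IP addresses work just as well).

```shell
# plan_brokers HOST...: print one line per host with the broker.id and
# host.name it would get. IDs are assigned 0, 1, 2, ... in argument order.
# Hypothetical helper: prints only, does not copy or edit anything.
plan_brokers() {
    id=0
    for host in "$@"; do
        echo "$host broker.id=$id host.name=$host"
        id=$((id + 1))
    done
}

plan_brokers node-02 node-03 node-04
```

Any set of distinct integers is fine for broker.id; what matters is that no two brokers share one.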

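2.3 Starting Kafka

Start the ZooKeeper cluster before starting Kafka. Then run the following command on every machine (from Kafka's bin directory, or using the script's full path):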

sh kafka-server-start.sh -daemon /root/apps/kafka_2.11-0.8.2.2/config/server.properties

The Kafka process is now running; you can confirm it with jps.
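jps prints one line per Java process in the form "<pid> <main class>", and a broker shows up with the class name Kafka. For a scripted check, a small filter along these lines could be used (has_kafka is a hypothetical name of my own):

```shell
# has_kafka: read a jps-style listing on stdin and succeed (exit status 0)
# only if one of the lines has "Kafka" as its main class.
has_kafka() {
    awk '$2 == "Kafka" { found = 1 } END { exit !found }'
}

# Usage on a broker machine:
#   jps | has_kafka && echo "broker is running"
```

On a machine that also runs ZooKeeper, jps will list QuorumPeerMain alongside Kafka.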

3. Kafka commands and operations

Note: the examples below use the paths, hostnames, and topic names of a different installation (kafka_2.11-0.10.2.1 under /bigdata). Substitute your own.

Start Kafka:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-server-start.sh -daemon /bigdata/kafka_2.11-0.10.2.1/config/server.properties

Stop Kafka:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-server-stop.sh

Create a topic:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-topics.sh --create --zookeeper node-1.xiaoniu.com:2181,node-2.xiaoniu.com:2181,node-3.xiaoniu.com:2181 --replication-factor 3 --partitions 3 --topic my-topic

List all topics:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-topics.sh --list --zookeeper node-1.xiaoniu.com:2181,node-2.xiaoniu.com:2181,node-3.xiaoniu.com:2181

Show the details of one topic:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-topics.sh --describe --zookeeper node-1.xiaoniu.com:2181,node-2.xiaoniu.com:2181,node-3.xiaoniu.com:2181 --topic my-topic

Start a console producer:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-console-producer.sh --broker-list spark-02:9092,spark-03:9092,spark-04:9092 --topic djm2

Start a console consumer:
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-console-consumer.sh --zookeeper node-1.xiaoniu.com:2181,node-2.xiaoniu.com:2181,node-3.xiaoniu.com:2181 --topic my-topic --from-beginning

Start a console consumer that connects to the brokers directly (--bootstrap-server belongs to the newer consumer in later releases; the 0.8.x console consumer only accepts --zookeeper):
sh /bigdata/kafka_2.11-0.10.2.1/bin/kafka-console-consumer.sh --bootstrap-server node-1.xiaoniu.com:9092,node-2.xiaoniu.com:9092,node-3.xiaoniu.com:9092 --topic xiaoniu --from-beginning
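The long host:port lists are easy to mistype, so it may be worth building them once and reusing them. Here is a minimal sketch for the cluster from section 2. join_hosts is a helper name I made up, and I am assuming the brokers run on the same three nodes as ZooKeeper, with the default ports 2181 and 9092.

```shell
# join_hosts PORT HOST...: print "host1:PORT,host2:PORT,..."
# Hypothetical helper for building connection strings.
join_hosts() {
    port=$1
    shift
    out=""
    for h in "$@"; do
        out="${out:+$out,}$h:$port"   # add a comma only between entries
    done
    echo "$out"
}

# Assumption: ZooKeeper and the brokers share node-02..node-04.
ZK=$(join_hosts 2181 node-02 node-03 node-04)
BROKERS=$(join_hosts 9092 node-02 node-03 node-04)
echo "$ZK"
echo "$BROKERS"
```

With these variables set, a command such as listing topics becomes sh /root/apps/kafka_2.11-0.8.2.2/bin/kafka-topics.sh --list --zookeeper "$ZK".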

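Summary

Kafka is fairly simple to use; if you follow the steps above one by one, you should not run into problems. I will keep studying Kafka and share more in-depth material later. I wish you all success in your studies and careers!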
