How to deploy Kafka Connect on Kubernetes using Helm charts
By Amit Yadav, Sr. Engineer, DevOps at Ignite Solutions
Here’s our step-by-step how-to guide to deploying Kafka Connect on Kubernetes for connecting Kafka to external systems.
Kubernetes (K8s) is one of the most popular open-source projects and continues to see wide adoption. Kafka is an open-source stream-processing platform used by a lot of companies; LinkedIn, for example, runs customized Apache Kafka at a scale of 7 trillion messages per day.
So, what is Kafka Connect? Kafka Connect is an open-source component of Kafka: a framework for connecting Kafka with external systems such as databases, key-value stores, search indexes, and file systems. It makes it simple to quickly define connectors that move large data sets into and out of Kafka.
As Kafka Connect is a component of Kafka, the setup will need Kafka broker(s) and Zookeepers (at least until the Zookeeper dependency is removed). Our setup will look something like the diagram below:
We will start by setting up a Kafka cluster. In this blog, we are going to use Confluent's open-source Helm charts to do that. I am one of the contributors to these Helm charts, and they are really good if you want to learn "Kafka on Kubernetes".
To get started, make sure a Kubernetes cluster is running (e.g. GKE by Google, EKS by AWS, AKS by Azure, Minikube, etc.) and the following tools are installed on your local system:
helm (version used for this blog: v2.16.1)
kubectl (version used for this blog: v1.15.3)
docker (version used for this blog: 19.03.1)
MySQL (version used for this blog: 5.7)
Let’s start by cloning the repository and updating the dependencies.
git clone git@github.com:confluentinc/cp-helm-charts.git
cd cp-helm-charts
helm dependency update charts/cp-kafka/
The last command updates the dependencies in the cp-kafka chart, which has a dependency on the cp-zookeeper chart. Installation of cp-kafka fails without running the update command.
Now let's move ahead and deploy the Kafka brokers with Zookeepers under a release name (e.g. confluent) using the below command:
helm install --name confluent ./charts/cp-kafka
It will take a few minutes for all the pods to start running. Let's verify that the resources created with our release are working fine, using kubectl.
$ kubectl get pods
NAME                       READY   STATUS    RESTARTS   AGE
confluent-cp-kafka-0       2/2     Running   0          5m16s
confluent-cp-kafka-1       2/2     Running   0          4m47s
confluent-cp-kafka-2       2/2     Running   0          4m29s
confluent-cp-zookeeper-0   2/2     Running   0          5m16s
confluent-cp-zookeeper-1   2/2     Running   0          4m47s
confluent-cp-zookeeper-2   2/2     Running   0          4m21s

$ kubectl get services
NAME                              TYPE        CLUSTER-IP    PORT(S)             AGE
confluent-cp-kafka                ClusterIP   xx.xx.xxx.x   9092/TCP            5m16s
confluent-cp-kafka-headless       ClusterIP   None          9092/TCP            5m16s
confluent-cp-zookeeper            ClusterIP   xx.xx.xxx.x   2181/TCP            5m16s
confluent-cp-zookeeper-headless   ClusterIP   None          2888/TCP,3888/TCP   5m16s
If you notice, all brokers and Zookeepers have 2 containers per pod; one of these is the prometheus container. You can disable prometheus by editing the values files, or simply by setting values from the Helm command line while installing (e.g. helm install --set prometheus.jmx.enabled=false ...).
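For the values-file route, a minimal override might look like the sketch below (the filename values-override.yaml is just an illustrative choice; the key mirrors the prometheus.jmx.enabled flag shown above):

```yaml
# values-override.yaml (hypothetical name): disable the JMX/Prometheus
# exporter sidecar so each broker pod runs a single container
prometheus:
  jmx:
    enabled: false
```

You would then pass it at install time with helm install --name confluent -f values-override.yaml ./charts/cp-kafka.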
Since we have the Kafka Connect dependencies in place, we can go ahead and deploy the Kafka Connect chart too. However, to read from a MySQL database, we will need the JDBC Source Connector installed in our container. To do so, let's use the confluentinc/cp-kafka-connect image provided by Confluent and add a line to install the JDBC Source Connector. Put the below content in a file named Dockerfile:
FROM confluentinc/cp-kafka-connect:5.4.0

RUN echo "===> Installing MySQL connector" \
    && curl https://repo1.maven.org/maven2/mysql/mysql-connector-java/8.0.19/mysql-connector-java-8.0.19.jar \
       --output /usr/share/java/kafka-connect-jdbc/mysql-connector-java-8.0.19.jar
NOTE: Since my Kubernetes cluster runs on Google Cloud Platform, I will use Google Container Registry to keep the built docker image. You can simply use Dockerhub or any other preferred platform.
The below commands build and push the docker image to Google Container Registry:
docker build -t gcr.io/project123/cp-kafka-connect:5.4.0-jdbc .
docker push gcr.io/project123/cp-kafka-connect:5.4.0-jdbc
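As an optional sanity check, you can list the connector directory inside the freshly built image to confirm the MySQL driver landed where we expect (this assumes the image lets you override its default command; if it defines an entrypoint instead, add --entrypoint ls):

```shell
# List the JDBC connector jars baked into the image; the
# mysql-connector-java jar should appear alongside kafka-connect-jdbc
$ docker run --rm gcr.io/project123/cp-kafka-connect:5.4.0-jdbc \
    ls /usr/share/java/kafka-connect-jdbc/
```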
The next step is to use the docker image we just built and deploy Kafka Connect on Kubernetes:
helm install --name confluent-2 \
--set image="gcr\.io/project123/cp-kafka-connect" \
--set imageTag="5.4.0-jdbc" \
--set kafka.bootstrapServers="PLAINTEXT://confluent-cp-kafka-headless:9092" \
./charts/cp-kafka-connect
Replace name, image, and imageTag with appropriate values in the above command. Here, kafka.bootstrapServers is the service and port at which the Kafka brokers are running.
After running the kubectl get all command again, we should see the pod, service, deployment, etc. running for Kafka Connect as well. Make sure the Connect worker is healthy:
$ kubectl logs confluent-2-cp-kafka-connect-mvt5d \
    --container cp-kafka-connect-server

[datetime] INFO Kafka Connect started (org.apache.kafka.connect.runtime.Connect)
[datetime] INFO Herder started (org.apache.kafka.connect.runtime.distributed.DistributedHerder)
Here, confluent-2-cp-kafka-connect-mvt5d is the name of the pod created for me; it should be something similar for you too, based on the release name you choose (for me, the release name is confluent-2).
Now we have our Kafka Connect server running, but to read from a database (e.g. MySQL) we will need to create connectors. Let's do that now.
Presuming we have a MySQL server running somewhere and the MySQL client installed on our local system, let's connect to the MySQL server using appropriate credentials and execute the following SQL statements:
# Replace xx.xxx.xxx.xx and root with appropriate values
$ mysql -u root -h xx.xxx.xxx.xx -p

CREATE DATABASE IF NOT EXISTS test_db;
USE test_db;

DROP TABLE IF EXISTS test_table;
CREATE TABLE IF NOT EXISTS test_table (
    id serial NOT NULL PRIMARY KEY,
    name varchar(100),
    emailId varchar(200),
    branch varchar(200),
    updated timestamp default CURRENT_TIMESTAMP NOT NULL,
    INDEX `updated_index` (`updated`)
);

INSERT INTO test_table (name, emailId, branch) VALUES ('Chandler', 'muriel@venus.com', 'Transponster');
INSERT INTO test_table (name, emailId, branch) VALUES ('Joey', 'joseph@tribbiani.com', 'DOOL');
exit;
While deploying the Kafka brokers and Zookeepers above, a sample Kafka client is shown in the output for testing. Let's save that in a file called sample-pod.yaml and deploy it.
apiVersion: v1
kind: Pod
metadata:
  name: kafka-client
  namespace: default
spec:
  containers:
  - name: kafka-client
    image: confluentinc/cp-enterprise-kafka:5.4.1
    command:
    - sh
    - -c
    - "exec tail -f /dev/null"
Deploy this sample pod using the below command:
kubectl apply -f sample-pod.yaml
We can verify that the Connect server is working by sending a simple GET request to the Kafka Connect REST endpoint. Read more about the REST API here.
$ kubectl exec -it kafka-client -- curl confluent-2-cp-kafka-connect:8083/connectors
# Output
[]
As there are no connectors yet, we get a SUCCESS response with an empty list [ ]. Let's exec into the container and create a connector:
$ kubectl exec -ti confluent-2-cp-kafka-connect-mvt5d \
    --container cp-kafka-connect-server -- /bin/bash

$ curl -X POST \
  -H "Content-Type: application/json" \
  --data '{ "name": "k8s-connect-source",
    "config": {
      "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
      "key.converter": "org.apache.kafka.connect.json.JsonConverter",
      "value.converter": "org.apache.kafka.connect.json.JsonConverter",
      "key.converter.schemas.enable": "false",
      "value.converter.schemas.enable": "false",
      "tasks.max": 1,
      "connection.url": "jdbc:mysql://xx.xxx.xxx.xx/test_db?user=root&password=ayadav",
      "mode": "incrementing",
      "incrementing.column.name": "id",
      "timestamp.column.name": "updated",
      "topic.prefix": "k8s-connect-",
      "poll.interval.ms": 1000 } }' \
  http://localhost:8083/connectors
Note:

Make sure to replace the value of connection.url with an appropriate value, and verify the other configurations too. We are using JsonConverter in this connector to avoid using Schema Registry (which is recommended), for the simplicity of this article.

These SQL statements and the connector are inspired by this tutorial.
We can verify the status of the connector by running the following command (still from inside the Kafka Connect container):
$ curl -s -X GET http://localhost:8083/connectors/k8s-connect-source/status

{"name":"k8s-connect-source","connector":{"state":"RUNNING","worker_id":"10.8.4.2:8083"},"tasks":[{"id":0,"state":"RUNNING","worker_id":"10.8.4.2:8083"}],"type":"source"}
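While we are here, the same REST API exposes a few more endpoints that are handy for managing the connector. These are standard Kafka Connect endpoints, run the same way from inside the Connect container:

```shell
# Installed connector plugins (the JDBC source should be listed)
$ curl -s http://localhost:8083/connector-plugins

# Current configuration of our connector
$ curl -s http://localhost:8083/connectors/k8s-connect-source/config

# Pause, resume, or delete the connector
$ curl -s -X PUT    http://localhost:8083/connectors/k8s-connect-source/pause
$ curl -s -X PUT    http://localhost:8083/connectors/k8s-connect-source/resume
$ curl -s -X DELETE http://localhost:8083/connectors/k8s-connect-source
```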
This should have created a set of topics: internal ones on which Kafka Connect stores the connector configurations, offsets, and status, plus the data topic to which it pushes a message every time a new row gets added to the table. Exit the container and run the below command on your machine:
$ kubectl -n default exec kafka-client -- /usr/bin/kafka-topics \
    --zookeeper confluent-cp-zookeeper:2181 --list

# Output: List of topics
__confluent.support.metrics
__consumer_offsets
_confluent-metrics
confluent-2-cp-kafka-connect-config
confluent-2-cp-kafka-connect-offset
confluent-2-cp-kafka-connect-status
k8s-connect-test
k8s-connect-test_table

# Listen for the messages on the Kafka topic
$ kubectl -n default exec -ti \
    kafka-client -- /usr/bin/kafka-console-consumer \
    --bootstrap-server confluent-cp-kafka:9092 \
    --topic k8s-connect-test_table --from-beginning

# Output
{"id":1,"name":"Chandler","emailId":"muriel@venus.com","branch":"Transponster","updated":1585514796000}
{"id":2,"name":"Joey","emailId":"joseph@tribbiani.com","branch":"DOOL","updated":1585514796000}
Furthermore, you can keep the listener shell alive, connect to MySQL again, and add a new row. You should see a new message for it on this topic in the kubectl output.
To automate the process of creating connectors on the fly while deploying Kafka Connect, have a look at this Pull Request I had submitted, which is now merged into the master branch, and at the values.yaml file.
Please let me know in the comments if you get stuck somewhere, or if you have any suggestions for improvement. You may also be interested in my article on Setting up TCP load balancers in a multi-regional cluster using GKE. Thank you for reading.
Originally published at https://www.ignitesol.com on August 14, 2020.