Kafka -> Flink -> Elasticsearch (demo)

青春不流名

Last modified: 2022-07-24 13:21:34

Category: Problem Records; Tags: Big Data

First published: 2022-07-21 20:31:30

Article link: https://blog.csdn.net/TT1024167802/article/details/125920323

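System environment

CentOS 7.9, IP address: 10.10.10.99

Workspace directory: /home/demo. All operations are performed in the workspace directory.

1. Deploy Flink

1.1 Download flink-1.15.1-bin-scala_2.12 and extract it.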

wget https://dlcdn.apache.org/flink/flink-1.15.1/flink-1.15.1-bin-scala_2.12.tgz
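The download and extraction step can be scripted. The following is a minimal sketch, not the author's script: the function names and the DEMO_HOME variable are made up for illustration, and the archive.apache.org fallback is an assumption added because dlcdn.apache.org generally carries only current releases, so the link above may stop serving 1.15.1.

```shell
# Workspace from the article; override DEMO_HOME to extract elsewhere.
DEMO_HOME=${DEMO_HOME:-/home/demo}
FLINK_VERSION=1.15.1
SCALA_VERSION=2.12

# Name of the binary tarball for the chosen Flink/Scala versions.
flink_tarball() {
  echo "flink-${FLINK_VERSION}-bin-scala_${SCALA_VERSION}.tgz"
}

# Candidate URLs, tried in order (the second one is an assumed fallback).
flink_urls() {
  echo "https://dlcdn.apache.org/flink/flink-${FLINK_VERSION}/$(flink_tarball)"
  echo "https://archive.apache.org/dist/flink/flink-${FLINK_VERSION}/$(flink_tarball)"
}

# Download into DEMO_HOME and unpack. The body runs in a subshell so the
# caller's working directory is left untouched.
download_and_extract_flink() (
  cd "$DEMO_HOME" || exit 1
  for url in $(flink_urls); do
    if wget -c "$url"; then
      tar -xzf "$(flink_tarball)"
      exit $?
    fi
  done
  exit 1
)
```

Calling download_and_extract_flink should leave the distribution in /home/demo/flink-1.15.1, which is the path the configuration in the next step refers to.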

1.2 Edit the flink-conf.yaml file

jobmanager.rpc.address: 10.10.10.99
jobmanager.rpc.port: 6123
jobmanager.bind-host: 0.0.0.0
jobmanager.memory.process.size: 1600m
taskmanager.bind-host: 0.0.0.0
taskmanager.host: 10.10.10.99
taskmanager.memory.process.size: 1728m
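taskmanager.numberOfTaskSlots: 1
# Note: with one TaskManager offering a single slot, a job submitted at the
# default parallelism of 2 cannot be scheduled; raise the slot count or run
# the job with parallelism 1.
parallelism.default: 2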

# The default file system scheme and authority.
#
# By default file paths without scheme are interpreted relative to the local
# root file system 'file:///'. Use this to override the default and interpret
# relative paths relative to a different file system,
# for example 'hdfs://mynamenode:12345'
#
# fs.default-scheme

#==============================================================================
# High Availability
#==============================================================================

# The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
#
# high-availability: zookeeper

# The path where metadata for master recovery is persisted. While ZooKeeper stores
# the small ground truth for checkpoint and leader election, this location stores
# the larger objects, like persisted dataflow graphs.
#
# Must be a durable file system that is accessible from all nodes
# (like HDFS, S3, Ceph, nfs, ...)
#
# high-availability.storageDir: hdfs:///flink/ha/

# The list of ZooKeeper quorum peers that coordinate the high-availability
# setup. This must be a list of the form:
# "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
#
# high-availability.zookeeper.quorum: localhost:2181

# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
# It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE)
# The default value is "open" and it can be changed to "creator" if ZK security is enabled
#
# high-availability.zookeeper.client.acl: open

#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================

# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled. Checkpointing is enabled when execution.checkpointing.interval > 0.
#
# Execution checkpointing related parameters. Please refer to CheckpointConfig and ExecutionCheckpointingOptions for more details.
#
# execution.checkpointing.interval: 3min
# execution.checkpointing.externalized-checkpoint-retention: [DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION]
# execution.checkpointing.max-concurrent-checkpoints: 1
# execution.checkpointing.min-pause: 0
# execution.checkpointing.mode: [EXACTLY_ONCE, AT_LEAST_ONCE]
# execution.checkpointing.timeout: 10min
# execution.checkpointing.tolerable-failed-checkpoints: 0
# execution.checkpointing.unaligned: false
#
# Supported backends are 'hashmap', 'rocksdb', or the
# <class-name-of-factory>.
#
# state.backend: hashmap

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints

# Default target directory for savepoints, optional.
#
# state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints

# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend).
#
# state.backend.incremental: false

# The failover strategy, i.e., how the job computation recovers from task failures.
# Only restart tasks that may have been affected by the task failure, which typically includes
# downstream tasks and potentially upstream tasks if their produced data is no longer available for consumption.

jobmanager.execution.failover-strategy: region

#==============================================================================
# Rest & web frontend
#==============================================================================

# The port to which the REST client connects to. If rest.bind-port has
# not been specified, then the server will bind to this port as well.
#
rest.port: 8081

# The address to which the REST client will connect to
#
rest.address: 10.10.10.99

# Port range for the REST and web server to bind to.
#
#rest.bind-port: 8080-8090

# The address that the REST & web server binds to
# By default, this is localhost, which prevents the REST & web server from
# being able to communicate outside of the machine/container it is running on.
#
# To enable this, set the bind address to one that has access to outside-facing
# network interface, such as 0.0.0.0.
#
rest.bind-address: 0.0.0.0

# Flag to specify whether job submission is enabled from the web-based
# runtime monitor. Uncomment to disable.

web.submit.enable: true

# Flag to specify whether job cancellation is enabled from the web-based
# runtime monitor. Uncomment to disable.

web.cancel.enable: true

#==============================================================================
# Advanced
#==============================================================================

# Override the directories for temporary files. If not specified, the
# system-specific Java temporary directory (java.io.tmpdir property) is taken.
#
# For framework setups on Yarn, Flink will automatically pick up the
# containers' temp directories without any need for configuration.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
# /data1/tmp:/data2/tmp:/data3/tmp
#
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
#
io.tmp.dirs: /home/demo/flink-1.15.1/tmp

# The classloading resolve order. Possible values are 'child-first' (Flink's default)
# and 'parent-first' (Java's default).
#
# Child first classloading allows users to use different dependency/library
# versions in their application than those in the classpath. Switching back
# to 'parent-first' may help with debugging dependency issues.
#
# classloader.resolve-order: child-first

# The amount of memory going to the network stack. These numbers usually need
# no tuning. Adjusting them may be necessary in case of an "Insufficient number
# of network buffers" error. The default min is 64MB, the default max is 1GB.
#
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 64mb
taskmanager.memory.network.max: 1gb

#==============================================================================
# Flink Cluster Security Configuration
#==============================================================================

# Kerberos authentication for various components - Hadoop, ZooKeeper, and connectors -
# may be enabled in four steps:
# 1. configure the local krb5.conf file
# 2. provide Kerberos credentials (either a keytab or a ticket cache w/ kinit)
# 3. make the credentials available to various JAAS login contexts
# 4. configure the connector to use JAAS/SASL

# The below configure how Kerberos credentials are provided. A keytab will be used instead of
# a ticket cache if the keytab path and principal are set.

# security.kerberos.login.use-ticket-cache: true
# security.kerberos.login.keytab: /path/to/kerberos/keytab
# security.kerberos.login.principal: flink-user

# The configuration below defines which JAAS login contexts

# security.kerberos.login.contexts: Client,KafkaClient

#==============================================================================
# ZK Security Configuration
#==============================================================================

# Below configurations are applicable if ZK ensemble is configured for security

# Override below configuration to provide custom ZK service name if configured
# zookeeper.sasl.service-name: zookeeper

# The configuration below must match one of the values set in "security.kerberos.login.contexts"
# zookeeper.sasl.login-context-name: Client

#==============================================================================
# HistoryServer
#==============================================================================

# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop)

# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
jobmanager.archive.fs.dir: /home/demo/flink-1.15.1/completed-jobs

# The address under which the web-based HistoryServer listens.
historyserver.web.address: 0.0.0.0

# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
historyserver.archive.fs.dir: /home/demo/flink-1.15.1/history-jobs

# Interval in milliseconds for refreshing the monitored directories.
historyserver.archive.fs.refresh-interval: 10000
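The HistoryServer settings above are easy to get out of step: the comment on jobmanager.archive.fs.dir asks for that directory to also be listed among the HistoryServer's monitored directories, yet the two values shown differ (completed-jobs versus history-jobs). The sketch below is one way to check for this. conf_value and check_history_dirs are hypothetical helper names, and the parsing only handles the flat "key: value" lines used in this file.

```shell
# Print the value of a "key: value" line in a Flink config file.
# Commented-out lines are skipped because they start with '#', not the key.
# (The dots in the key are treated as regex wildcards, which is harmless here.)
conf_value() {
  sed -n "s/^$1:[[:space:]]*//p" "$2" | head -n 1
}

# Succeed only if the JobManager's archive directory is one of the
# directories the HistoryServer monitors (a comma-separated list).
check_history_dirs() {
  uploaded=$(conf_value jobmanager.archive.fs.dir "$1")
  monitored=$(conf_value historyserver.archive.fs.dir "$1")
  case ",$monitored," in
    *",$uploaded,"*)
      echo "ok: $uploaded is monitored" ;;
    *)
      echo "mismatch: jobs are archived to $uploaded, HistoryServer watches $monitored"
      return 1 ;;
  esac
}
```

A typical call would be check_history_dirs /home/demo/flink-1.15.1/conf/flink-conf.yaml; with the values listed above it reports a mismatch.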

1.3 Start the Flink cluster

/home/demo/flink-1.15.1/bin/start-cluster.sh

1.4 Run jps to check the running Java processes
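On a standalone cluster started with start-cluster.sh, jps would normally list a StandaloneSessionClusterEntrypoint process (the JobManager) and a TaskManagerRunner process. A small sketch that checks for both follows; check_flink_jps is a hypothetical helper, and it reads the jps output from standard input so it can be tried without a running cluster.

```shell
# Read `jps` output on stdin and report whether the two standalone-cluster
# processes are present. Returns non-zero if either one is missing.
# The body runs in a subshell so its variables do not leak to the caller.
check_flink_jps() (
  jps_output=$(cat)
  missing=0
  for name in StandaloneSessionClusterEntrypoint TaskManagerRunner; do
    if printf '%s\n' "$jps_output" | grep -q "$name"; then
      echo "running: $name"
    else
      echo "missing: $name"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)
```

Usage: jps | check_flink_jps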

1.5 Open the web UI

Apache Flink Web Dashboard: http://10.10.10.99:8081
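The same port also serves Flink's REST API, so the cluster can be checked without a browser. The sketch below assumes the /overview endpoint at the rest.address and rest.port configured earlier and pulls one numeric field out of its JSON response with sed. overview_field is a hypothetical helper; a real JSON parser such as jq would be more robust if it is installed.

```shell
# Extract a numeric field (for example slots-total or taskmanagers) from the
# JSON document read on stdin. Prints nothing if the field is absent.
overview_field() {
  sed -n "s/.*\"$1\"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p"
}
```

Usage: curl -s http://10.10.10.99:8081/overview | overview_field slots-total

Comparing slots-total with the parallelism a job requests is a quick way to see whether the job can be scheduled at all.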

2. Deploy Kafka

2.1 Download kafka_2.12-3.2.0 and extract it

wget https://dlcdn.apache.org/kafka/3.2.0/kafka_2.12-3.2.0.tgz

2.2 Edit the server.properties configuration file

vim /home/demo/kafka_2.12-3.2.0/config/server.properties

Contents:

broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.10.10.99:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/home/demo/kafka_2.12-3.2.0/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=24
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0

2.3 Edit the zookeeper.properties configuration file

vim /home/demo/kafka_2.12-3.2.0/config/zookeeper.properties

Contents:

dataDir=/home/demo/kafka_2.12-3.2.0/logs/zookeeper
clientPort=2181
maxClientCnxns=0
admin.enableServer=true
admin.serverPort=18080

2.4 Start Kafka

First start the ZooKeeper instance bundled with Kafka:

/home/demo/kafka_2.12-3.2.0/bin/zookeeper-server-start.sh -daemon /home/demo/kafka_2.12-3.2.0/config/zookeeper.properties

Then start the Kafka server:

/home/demo/kafka_2.12-3.2.0/bin/kafka-server-start.sh -daemon /home/demo/kafka_2.12-3.2.0/config/server.properties

2.5 Verify the listening ports

netstat -ntulp | grep 2181   # ZooKeeper

netstat -ntulp | grep 9092   # Kafka
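A plain grep for a port number also matches PIDs or longer port numbers that happen to contain the same digits. Kafka's listener in server.properties is on port 9092 and ZooKeeper's clientPort is 2181, so a stricter check only needs those two numbers. The sketch below is one way to do it; has_listener is a hypothetical helper that reads netstat or ss output on standard input.

```shell
# Succeed if the socket listing on stdin contains a LISTEN entry whose local
# address ends in ":<port>". Works with `netstat -ntl` and `ss -ltn` output.
has_listener() {
  awk -v port="$1" '
    /LISTEN/ {
      for (i = 1; i <= NF; i++)
        if ($i ~ (":" port "$")) found = 1
    }
    END { exit (found ? 0 : 1) }
  '
}
```

Usage: netstat -ntulp | has_listener 9092 && echo "Kafka is listening"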

3. Deploy Elasticsearch

3.1 Deploy ES with Docker using the following commands:

docker network create es

docker run -d --restart=unless-stopped --privileged --name elasticsearch --net es -v /home/demo/elasticsearch/data:/usr/share/elasticsearch/data -v /home/demo/elasticsearch/logs:/usr/share/elasticsearch/logs -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.17.5

3.2 Access the HTTP endpoint on port 9200

http://10.10.10.99:9200
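Elasticsearch usually needs some time after docker run before port 9200 answers, so a first request may fail even though the container is fine. The following is a minimal polling sketch. retry is a hypothetical generic helper (it reruns any command until it succeeds), and the attempt count and delay in the usage line are arbitrary choices.

```shell
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or ATTEMPTS runs have failed, sleeping
# DELAY_SECONDS between tries. Returns 0 on success, 1 otherwise.
retry() {
  retry_attempts=$1
  retry_delay=$2
  shift 2
  retry_n=1
  while [ "$retry_n" -le "$retry_attempts" ]; do
    if "$@"; then
      return 0
    fi
    retry_n=$((retry_n + 1))
    if [ "$retry_n" -le "$retry_attempts" ]; then
      sleep "$retry_delay"
    fi
  done
  return 1
}
```

Usage: retry 30 2 curl -fsS http://10.10.10.99:9200/_cluster/health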

4. Write the program

https://download.csdn.net/download/TT1024167802/86248701

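5. Package an executable jar in IDEA

5.1 Configure the build in IDEA

The demo is built with Maven. After the jar had been exported several times, running it with java -jar failed with a "main class not found" error. Once the jar conflicts were resolved with the Maven Helper plugin, it ran normally.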

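6. Deploy the demo jar and check the results

6.1 Kafka commands

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --list

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --describe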

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --create --partitions 1 --replication-factor 1 --topic demo

/home/demo/kafka_2.12-3.2.0/bin/kafka-console-consumer.sh --bootstrap-server 10.10.10.99:9092 --topic demo --from-beginning

/home/demo/kafka_2.12-3.2.0/bin/kafka-console-producer.sh --bootstrap-server 10.10.10.99:9092 --topic demo
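The console producer and consumer above are interactive. For a scripted smoke test they can be driven non-interactively, as in the sketch below. send_test_message and read_messages are hypothetical wrappers, the 10-second timeout is an arbitrary choice, and the payload the demo program expects is not documented here, so any message text used with them is only a placeholder.

```shell
KAFKA_HOME=${KAFKA_HOME:-/home/demo/kafka_2.12-3.2.0}
BOOTSTRAP=${BOOTSTRAP:-10.10.10.99:9092}
TOPIC=${TOPIC:-demo}

# Publish one line to the topic without opening the interactive prompt.
send_test_message() {
  printf '%s\n' "$1" |
    "$KAFKA_HOME/bin/kafka-console-producer.sh" \
      --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC"
}

# Read up to $1 messages from the start of the topic, giving up after
# 10 seconds without new data.
read_messages() {
  "$KAFKA_HOME/bin/kafka-console-consumer.sh" \
    --bootstrap-server "$BOOTSTRAP" --topic "$TOPIC" \
    --from-beginning --max-messages "$1" --timeout-ms 10000
}
```

Usage: send_test_message 'hello' followed by read_messages 1 should print the message back if the broker and topic are set up as described.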

6.2 Deploy a Kafka UI

docker run -p 8061:8080 \
	-e KAFKA_CLUSTERS_0_NAME=local \
	-e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=10.10.10.99:9092 \
	-d provectuslabs/kafka-ui:latest

docker run -d --rm -v /home/demo/protobuf_desc:/var/protobuf_desc -p 9000:9000 -e KAFKA_BROKERCONNECT=10.10.10.99:9092 -e JVM_OPTS="-Xms32M -Xmx64M" -e SERVER_SERVLET_CONTEXTPATH="/" -e CMD_ARGS="--message.format=PROTOBUF --protobufdesc.directory=/var/protobuf_desc" obsidiandynamics/kafdrop

Run kafdrop from a packaged jar

java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
-jar kafdrop.jar \
--kafka.brokerConnect=10.10.10.99:9092 --server.port=9000 --management.server.port=9002