Kafka -> Flink -> Elasticsearch (demo)

Contents

System environment

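1.3 Start the Flink cluster

1.4 Run jps

1.5 Open the web UI

2. Deploy Kafka

2.1 Download kafka_2.12-3.2.0 and extract it

2.2 Edit the configuration file server.properties

2.3 Edit the configuration file zookeeper.properties

2.4 Start Kafka

2.5 Verify the listening ports

3. Deploy Elasticsearch

3.1 Deploy ES with Docker

3.2 Open the HTTP endpoint on port 9200

4. Write the program

5. Package an executable jar in IDEA

5.1 Configure the build in IDEA

6. Deploy the demo jar and check the results

6.1 Kafka commands

6.2 Deploy a Kafka UI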



System environment

CentOS 7.9, IP address: 10.10.10.99

Workspace directory: /home/demo. All operations are performed inside this directory.

1. Deploy Flink

1.1 Download flink-1.15.1 and extract it

wget https://dlcdn.apache.org/flink/flink-1.15.1/flink-1.15.1-bin-scala_2.12.tgz
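The later steps use paths under /home/demo/flink-1.15.1, so the downloaded archive has to be unpacked into the workspace first. A minimal sketch of that step, assuming GNU tar; the helper name extract_into_workspace is made up for illustration. If dlcdn.apache.org no longer serves this release, older Apache releases are normally kept under https://archive.apache.org/dist/.

```shell
# Hypothetical helper: unpack a downloaded .tgz into the workspace directory
# and print the name of the top-level directory found inside the archive.
extract_into_workspace() {
    local archive="$1"
    local workspace="${2:-/home/demo}"   # workspace directory from "System environment"
    mkdir -p "$workspace" || return 1
    tar -xzf "$archive" -C "$workspace" || return 1
    # List the archive and keep the first path component of the first entry
    tar -tzf "$archive" | head -n 1 | cut -d/ -f1
}
```

For example, `extract_into_workspace flink-1.15.1-bin-scala_2.12.tgz` should leave the distribution in /home/demo/flink-1.15.1, the path used in the rest of this guide.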

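1.2 Edit the configuration file flink-conf.yaml

vim /home/demo/flink-1.15.1/conf/flink-conf.yaml

Contents:

jobmanager.rpc.address: 10.10.10.99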
jobmanager.rpc.port: 6123
jobmanager.bind-host: 0.0.0.0
jobmanager.memory.process.size: 1600m
taskmanager.bind-host: 0.0.0.0
taskmanager.host: 10.10.10.99
taskmanager.memory.process.size: 1728m
taskmanager.numberOfTaskSlots: 1
parallelism.default: 2

# The default file system scheme and authority.

# By default file paths without scheme are interpreted relative to the local
# root file system 'file:///'. Use this to override the default and interpret
# relative paths relative to a different file system,
# for example 'hdfs://mynamenode:12345'
#
# fs.default-scheme

#==============================================================================
# High Availability
#==============================================================================

# The high-availability mode. Possible options are 'NONE' or 'zookeeper'.
#
# high-availability: zookeeper

# The path where metadata for master recovery is persisted. While ZooKeeper stores
# the small ground truth for checkpoint and leader election, this location stores
# the larger objects, like persisted dataflow graphs.

# Must be a durable file system that is accessible from all nodes
# (like HDFS, S3, Ceph, nfs, ...) 
#
# high-availability.storageDir: hdfs:///flink/ha/

# The list of ZooKeeper quorum peers that coordinate the high-availability
# setup. This must be a list of the form:
# "host1:clientPort,host2:clientPort,..." (default clientPort: 2181)
#
# high-availability.zookeeper.quorum: localhost:2181


# ACL options are based on https://zookeeper.apache.org/doc/r3.1.2/zookeeperProgrammers.html#sc_BuiltinACLSchemes
# It can be either "creator" (ZOO_CREATE_ALL_ACL) or "open" (ZOO_OPEN_ACL_UNSAFE)
# The default value is "open" and it can be changed to "creator" if ZK security is enabled
#
# high-availability.zookeeper.client.acl: open

#==============================================================================
# Fault tolerance and checkpointing
#==============================================================================

# The backend that will be used to store operator state checkpoints if
# checkpointing is enabled. Checkpointing is enabled when execution.checkpointing.interval > 0.
#
# Execution checkpointing related parameters. Please refer to CheckpointConfig and ExecutionCheckpointingOptions for more details.
#
# execution.checkpointing.interval: 3min
# execution.checkpointing.externalized-checkpoint-retention: [DELETE_ON_CANCELLATION, RETAIN_ON_CANCELLATION]
# execution.checkpointing.max-concurrent-checkpoints: 1
# execution.checkpointing.min-pause: 0
# execution.checkpointing.mode: [EXACTLY_ONCE, AT_LEAST_ONCE]
# execution.checkpointing.timeout: 10min
# execution.checkpointing.tolerable-failed-checkpoints: 0
# execution.checkpointing.unaligned: false
#
# Supported backends are 'hashmap', 'rocksdb', or the
# <class-name-of-factory>.
#
# state.backend: hashmap

# Directory for checkpoints filesystem, when using any of the default bundled
# state backends.
#
# state.checkpoints.dir: hdfs://namenode-host:port/flink-checkpoints

# Default target directory for savepoints, optional.
#
# state.savepoints.dir: hdfs://namenode-host:port/flink-savepoints

# Flag to enable/disable incremental checkpoints for backends that
# support incremental checkpoints (like the RocksDB state backend). 
#
# state.backend.incremental: false

# The failover strategy, i.e., how the job computation recovers from task failures.
# Only restart tasks that may have been affected by the task failure, which typically includes
# downstream tasks and potentially upstream tasks if their produced data is no longer available for consumption.

jobmanager.execution.failover-strategy: region

#==============================================================================
# Rest & web frontend
#==============================================================================

# The port to which the REST client connects to. If rest.bind-port has
# not been specified, then the server will bind to this port as well.
#
rest.port: 8081

# The address to which the REST client will connect to
#
rest.address: 10.10.10.99

# Port range for the REST and web server to bind to.
#
#rest.bind-port: 8080-8090

# The address that the REST & web server binds to
# By default, this is localhost, which prevents the REST & web server from
# being able to communicate outside of the machine/container it is running on.
#
# To enable this, set the bind address to one that has access to outside-facing
# network interface, such as 0.0.0.0.
#
rest.bind-address: 0.0.0.0

# Flag to specify whether job submission is enabled from the web-based
# runtime monitor. Uncomment to disable.

web.submit.enable: true

# Flag to specify whether job cancellation is enabled from the web-based
# runtime monitor. Uncomment to disable.

web.cancel.enable: true

#==============================================================================
# Advanced
#==============================================================================

# Override the directories for temporary files. If not specified, the
# system-specific Java temporary directory (java.io.tmpdir property) is taken.
#
# For framework setups on Yarn, Flink will automatically pick up the
# containers' temp directories without any need for configuration.
#
# Add a delimited list for multiple directories, using the system directory
# delimiter (colon ':' on unix) or a comma, e.g.:
#     /data1/tmp:/data2/tmp:/data3/tmp
#
# Note: Each directory entry is read from and written to by a different I/O
# thread. You can include the same directory multiple times in order to create
# multiple I/O threads against that directory. This is for example relevant for
# high-throughput RAIDs.
#
io.tmp.dirs: /home/demo/flink-1.15.1/tmp

# The classloading resolve order. Possible values are 'child-first' (Flink's default)
# and 'parent-first' (Java's default).
#
# Child first classloading allows users to use different dependency/library
# versions in their application than those in the classpath. Switching back
# to 'parent-first' may help with debugging dependency issues.
#
# classloader.resolve-order: child-first

# The amount of memory going to the network stack. These numbers usually need 
# no tuning. Adjusting them may be necessary in case of an "Insufficient number
# of network buffers" error. The default min is 64MB, the default max is 1GB.

taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 64mb
taskmanager.memory.network.max: 1gb

#==============================================================================
# Flink Cluster Security Configuration
#==============================================================================

# Kerberos authentication for various components - Hadoop, ZooKeeper, and connectors -
# may be enabled in four steps:
# 1. configure the local krb5.conf file
# 2. provide Kerberos credentials (either a keytab or a ticket cache w/ kinit)
# 3. make the credentials available to various JAAS login contexts
# 4. configure the connector to use JAAS/SASL

# The below configure how Kerberos credentials are provided. A keytab will be used instead of
# a ticket cache if the keytab path and principal are set.

# security.kerberos.login.use-ticket-cache: true
# security.kerberos.login.keytab: /path/to/kerberos/keytab
# security.kerberos.login.principal: flink-user

# The configuration below defines which JAAS login contexts

# security.kerberos.login.contexts: Client,KafkaClient

#==============================================================================
# ZK Security Configuration
#==============================================================================

# Below configurations are applicable if ZK ensemble is configured for security

# Override below configuration to provide custom ZK service name if configured
# zookeeper.sasl.service-name: zookeeper

# The configuration below must match one of the values set in "security.kerberos.login.contexts"
# zookeeper.sasl.login-context-name: Client

#==============================================================================
# HistoryServer
#==============================================================================

# The HistoryServer is started and stopped via bin/historyserver.sh (start|stop)

# Directory to upload completed jobs to. Add this directory to the list of
# monitored directories of the HistoryServer as well (see below).
jobmanager.archive.fs.dir: /home/demo/flink-1.15.1/completed-jobs

# The address under which the web-based HistoryServer listens.
historyserver.web.address: 0.0.0.0

# The port under which the web-based HistoryServer listens.
historyserver.web.port: 8082

# Comma separated list of directories to monitor for completed jobs.
historyserver.archive.fs.dir: /home/demo/flink-1.15.1/history-jobs

# Interval in milliseconds for refreshing the monitored directories.
historyserver.archive.fs.refresh-interval: 10000
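
In the configuration above, one TaskManager offers taskmanager.numberOfTaskSlots = 1 slot while parallelism.default is 2. A job that runs at the default parallelism needs two slots, so with a single TaskManager it cannot be scheduled unless the slot count is raised or the job sets a lower parallelism itself. The sketch below is one way to check a config file for this mismatch; the function names are hypothetical, and the default of one TaskManager is an assumption about this single-host setup.

```shell
# Hypothetical helpers, not part of Flink: read "key: value" pairs from a
# flink-conf.yaml-style file and compare total slots with default parallelism.

# Print the value of the first uncommented "key: value" line for the given key.
flink_conf_get() {
    awk -v k="$1" 'index($0, k ":") == 1 { sub(/^[^:]*:[ \t]*/, ""); print; exit }' "$2"
}

# Usage: check_slots <conf-file> [number-of-taskmanagers, default 1]
check_slots() {
    local file="$1"
    local taskmanagers="${2:-1}"
    local slots parallelism total
    slots=$(flink_conf_get taskmanager.numberOfTaskSlots "$file")
    parallelism=$(flink_conf_get parallelism.default "$file")
    total=$((slots * taskmanagers))
    if [ "$total" -lt "$parallelism" ]; then
        echo "insufficient: $total slot(s) for parallelism $parallelism"
        return 1
    fi
    echo "ok: $total slot(s) for parallelism $parallelism"
}
```

Running `check_slots /home/demo/flink-1.15.1/conf/flink-conf.yaml` against the values shown here reports an insufficient slot count. Setting taskmanager.numberOfTaskSlots to 2, or submitting the job with `flink run -p 1`, are two ways to resolve it.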

1.3 Start the Flink cluster

/home/demo/flink-1.15.1/bin/start-cluster.sh
 

1.4 Run jps
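
For a standalone session cluster started by start-cluster.sh, jps normally lists a StandaloneSessionClusterEntrypoint (the JobManager) and a TaskManagerRunner (the TaskManager). A small sketch that checks for both; the helper name is hypothetical and it simply reads the jps listing from standard input.

```shell
# Hypothetical helper: succeed only if the jps listing on stdin contains both
# Flink standalone processes (JobManager and TaskManager).
flink_processes_ok() {
    local listing
    listing=$(cat)
    printf '%s\n' "$listing" | grep -q 'StandaloneSessionClusterEntrypoint' &&
        printf '%s\n' "$listing" | grep -q 'TaskManagerRunner'
}
```

Typical use: `jps | flink_processes_ok && echo "Flink is running"`.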

1.5 Open the web UI

Apache Flink Web Dashboard: http://10.10.10.99:8081 (the rest.address and rest.port values configured above)
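
The dashboard is served by the REST endpoint set through rest.address and rest.port, so the cluster can also be checked from the command line. A sketch assuming curl is installed; /overview is part of Flink's monitoring REST API, while the function name and the grep-based JSON extraction are only illustrative.

```shell
# Hypothetical helper: print the number of free task slots reported by the
# Flink REST API. Uses grep instead of a JSON parser to avoid extra tools.
flink_slots_available() {
    local base_url="${1:-http://10.10.10.99:8081}"
    curl -fsS "$base_url/overview" |
        grep -oE '"slots-available" *: *[0-9]+' |
        grep -oE '[0-9]+$'
}
```

Calling `flink_slots_available` with no argument queries the address used in this guide; an empty result means the endpoint was unreachable or returned something unexpected.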

2. Deploy Kafka

2.1 Download kafka_2.12-3.2.0 and extract it

wget https://dlcdn.apache.org/kafka/3.2.0/kafka_2.12-3.2.0.tgz

2.2 Edit the configuration file server.properties

vim /home/demo/kafka_2.12-3.2.0/config/server.properties

Contents:

broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://10.10.10.99:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/home/demo/kafka_2.12-3.2.0/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.hours=24
log.retention.bytes=1073741824
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=18000
group.initial.rebalance.delay.ms=0

2.3 Edit the configuration file zookeeper.properties

vim /home/demo/kafka_2.12-3.2.0/config/zookeeper.properties

Contents:

dataDir=/home/demo/kafka_2.12-3.2.0/logs/zookeeper
clientPort=2181
maxClientCnxns=0
admin.enableServer=true
admin.serverPort=18080

2.4 Start Kafka

First start the ZooKeeper instance bundled with Kafka:

/home/demo/kafka_2.12-3.2.0/bin/zookeeper-server-start.sh -daemon /home/demo/kafka_2.12-3.2.0/config/zookeeper.properties

Then start the Kafka server:

/home/demo/kafka_2.12-3.2.0/bin/kafka-server-start.sh -daemon /home/demo/kafka_2.12-3.2.0/config/server.properties

2.5 Verify the listening ports

netstat -ntulp | grep 2181    # ZooKeeper

netstat -ntulp | grep 9092    # Kafka
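
A plain grep on the port number also matches process IDs or longer port numbers that merely contain the same digits. A slightly stricter sketch that accepts only a LISTEN socket whose local port is exactly the one given; listens_on is a hypothetical helper that reads netstat -ntlp (or ss -ltn) output from standard input.

```shell
# Hypothetical helper: succeed if the socket listing on stdin has a LISTEN
# entry whose address column ends in ":<port>".
listens_on() {
    awk -v p="$1" '
        /LISTEN/ { for (i = 1; i <= NF; i++) if ($i ~ (":" p "$")) found = 1 }
        END { exit !found }
    '
}
```

For this setup, `netstat -ntlp | listens_on 2181` checks ZooKeeper (clientPort in zookeeper.properties) and `netstat -ntlp | listens_on 9092` checks the Kafka broker (the listeners port in server.properties).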

3. Deploy Elasticsearch

3.1 Deploy ES with Docker, using the following commands:

docker network create es

docker run -d --restart=unless-stopped --privileged --name elasticsearch --net es -v /home/demo/elasticsearch/data:/usr/share/elasticsearch/data -v /home/demo/elasticsearch/logs:/usr/share/elasticsearch/logs -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" elasticsearch:7.17.5

3.2 Open the HTTP endpoint on port 9200

http://10.10.10.99:9200
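
Opening that address in a browser returns a small JSON document with node and version details. From the shell, the cluster health endpoint gives a quicker answer. A sketch assuming curl and an Elasticsearch 7.x node without security enabled, as started by the docker command in 3.1; es_status is a hypothetical helper.

```shell
# Hypothetical helper: print the cluster health status (green, yellow or red)
# from the Elasticsearch _cluster/health API.
es_status() {
    local base_url="${1:-http://10.10.10.99:9200}"
    curl -fsS "$base_url/_cluster/health" |
        grep -oE '"status" *: *"[a-z]+"' |
        cut -d'"' -f4
}
```

On a single-node cluster, yellow is expected once an index with the default one replica exists, because the replica shard has no second node to be assigned to.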

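4. Write the program

Source code download: https://download.csdn.net/download/TT1024167802/86248701

5. Package an executable jar in IDEA

5.1 Configure the build in IDEA

The demo is built with Maven. After several exports, running the jar with java -jar failed with a "main class not found" error. Once the jar conflicts were resolved with the Maven Helper plugin, it ran normally.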

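6. Deploy the demo jar and check the results

6.1 Kafka commands

List all topics:

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --list

Describe the topics (kafka-topics.sh accepts exactly one action per invocation, so --list and --describe cannot be combined):

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --describe

Create the demo topic with one partition and one replica:

/home/demo/kafka_2.12-3.2.0/bin/kafka-topics.sh --bootstrap-server 10.10.10.99:9092 --create --partitions 1 --replication-factor 1 --topic demo

Consume the demo topic from the beginning:

/home/demo/kafka_2.12-3.2.0/bin/kafka-console-consumer.sh --bootstrap-server 10.10.10.99:9092 --topic demo --from-beginning

Produce messages to the demo topic (one message per input line):

/home/demo/kafka_2.12-3.2.0/bin/kafka-console-producer.sh --bootstrap-server 10.10.10.99:9092 --topic demo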
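
To watch data move through the pipeline, something has to write to the demo topic. kafka-console-producer.sh reads one message per line from standard input, so test data can be piped into it. The JSON shape below is invented for illustration; the format the demo job actually expects is defined in its source code.

```shell
# Hypothetical helper: print <count> sample messages, one JSON object per line.
# The "id" and "message" fields are placeholders, not the demo's real schema.
gen_messages() {
    local count="${1:-5}"
    local i=1
    while [ "$i" -le "$count" ]; do
        printf '{"id":%d,"message":"demo-%d"}\n' "$i" "$i"
        i=$((i + 1))
    done
}
```

For example, `gen_messages 10 | /home/demo/kafka_2.12-3.2.0/bin/kafka-console-producer.sh --bootstrap-server 10.10.10.99:9092 --topic demo` sends ten lines, which should then appear in the console consumer started with --from-beginning.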

6.2 Deploy a Kafka UI

docker run -p 8061:8080 \
	-e KAFKA_CLUSTERS_0_NAME=local \
	-e KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS=10.10.10.99:9092 \
	-d provectuslabs/kafka-ui:latest 

docker run -d --rm -v /home/demo/protobuf_desc:/var/protobuf_desc -p 9000:9000 -e KAFKA_BROKERCONNECT=10.10.10.99:9092 -e JVM_OPTS="-Xms32M -Xmx64M" -e SERVER_SERVLET_CONTEXTPATH="/" -e CMD_ARGS="--message.format=PROTOBUF --protobufdesc.directory=/var/protobuf_desc" obsidiandynamics/kafdrop

Kafdrop can also be run from its packaged jar:

java --add-opens=java.base/sun.nio.ch=ALL-UNNAMED \
    -jar kafdrop.jar \
    --kafka.brokerConnect=10.10.10.99:9092 --server.port=9000 --management.server.port=9002

 

 


 
