Syncing MySQL data to Elasticsearch


1. Installing Elasticsearch

# Elasticsearch runs on the JVM, so Java has to be installed first
[root@localhost ~]# java -version  # check whether Java is already installed
[root@localhost ~]# yum search java  # list available Java packages
[root@localhost ~]# yum install java-17-openjdk-headless.x86_64  # install java-17-openjdk-headless
[root@localhost ~]# java -version  
openjdk version "17.0.6" 2023-01-17 LTS
OpenJDK Runtime Environment (Red_Hat-17.0.6.0.10-3.el9) (build 17.0.6+10-LTS)
OpenJDK 64-Bit Server VM (Red_Hat-17.0.6.0.10-3.el9) (build 17.0.6+10-LTS, mixed mode, sharing)

# Install Elasticsearch
[root@localhost home]# wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-8.12.2-linux-x86_64.tar.gz  # download Elasticsearch; the package can also be fetched from https://www.elastic.co/cn/elasticsearch
[root@localhost home]# tar -zxvf elasticsearch-8.12.2-linux-x86_64.tar.gz  # extract

# Start Elasticsearch
[root@localhost home]# cd /home/elasticsearch-8.12.2/
[root@localhost elasticsearch-8.12.2]# ./bin/elasticsearch
CompileCommand: exclude org/apache/lucene/util/MSBRadixSorter.computeCommonPrefixLengthAndBuildHistogram bool exclude = true
CompileCommand: exclude org/apache/lucene/util/RadixSelector.computeCommonPrefixLengthAndBuildHistogram bool exclude = true
Feb 23, 2024 11:25:43 AM sun.util.locale.provider.LocaleProviderAdapter <clinit>
WARNING: COMPAT locale provider will be removed in a future release
[2024-02-23T11:25:44,075][ERROR][o.e.b.Elasticsearch      ] [localhost] fatal exception while booting Elasticsearch
java.lang.RuntimeException: can not run elasticsearch as root
	at org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Elasticsearch.initializeNatives(Elasticsearch.java:282)
	at org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Elasticsearch.initPhase2(Elasticsearch.java:167)
	at org.elasticsearch.server@8.12.2/org.elasticsearch.bootstrap.Elasticsearch.main(Elasticsearch.java:72)

See logs for more details.

ERROR: Elasticsearch did not exit normally - check the logs at /home/elasticsearch-8.12.2/logs/elasticsearch.log

ERROR: Elasticsearch exited unexpectedly, with exit code 1
# Check the Elasticsearch log at /home/elasticsearch-8.12.2/logs/elasticsearch.log; in my case it says Elasticsearch must not be run as root
[root@localhost home]# useradd es  # create a dedicated user
[root@localhost home]# passwd es  # set its password
Changing password for user es.
New password: 
BAD PASSWORD: The password is shorter than 8 characters
Retype new password: 
passwd: all authentication tokens updated successfully.
[root@localhost home]# chown -R es:es /home/elasticsearch-8.12.2  # change the owner and group of the installation directory
[root@localhost home]# chmod -R 777 /home/elasticsearch-8.12.2  # loosen permissions (777 is broader than strictly needed; the chown above is usually enough)
[root@localhost home]# su es  # switch to the es user
[es@localhost elasticsearch-8.12.2]$ ./bin/elasticsearch  # start Elasticsearch (-d starts it in the background)
[2024-02-23T14:17:51,625][WARN ][o.e.h.n.Netty4HttpServerTransport] [localhost] received plaintext http traffic on an https channel, closing connection Netty4HttpChannel{localAddress=/192.168.243.134:9200, remoteAddress=/192.168.243.1:62306}  # visiting http://ip:9200 in a browser triggers this warning because Elasticsearch 8 enables security (TLS) by default; set xpack.security.enabled to false in config/elasticsearch.yml
[es@localhost elasticsearch-8.12.2]$ ./bin/elasticsearch  # restart

Elasticsearch now starts successfully.
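
For reference, a minimal sketch of the relevant config/elasticsearch.yml settings. Treat it as an assumption about your own auto-generated file: depending on what the installer wrote, the xpack.security.http.ssl block may also need disabling before plain HTTP works, and network.host only matters if you access the node from another machine.

# config/elasticsearch.yml (relevant lines only, sketch)
xpack.security.enabled: false
xpack.security.http.ssl:
  enabled: false
network.host: 0.0.0.0

A quick way to confirm the node answers over plain HTTP:

[es@localhost elasticsearch-8.12.2]$ curl http://192.168.243.134:9200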

2. Installing MySQL

The installation of MySQL 8 is covered in another post:
https://editor.csdn.net/md/?articleId=135905811

3. Installing Logstash

1. Download and extract Logstash

[root@localhost home]# wget https://artifacts.elastic.co/downloads/logstash/logstash-8.12.2-linux-x86_64.tar.gz  # use the same version as Elasticsearch
[root@localhost home]# tar -zxvf logstash-8.12.2-linux-x86_64.tar.gz  # extract

2. Logstash configuration
logstash.yml

# Settings file in YAML
#
# Settings can be specified either in hierarchical form, e.g.:
#   pipeline:
#     batch:
#       size: 125   # pipeline batch size
#       delay: 5    # pipeline batch delay
#
# Or as flat keys:
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
#
# ------------  Node identity ------------
#
# Use a descriptive name for the node:
#
# node.name: test  # node name; must be unique within a cluster, defaults to the machine's host name
#
# If omitted the node name will default to the machine's host name
#
# ------------ Data path ------------------
#
# Which directory should be used by logstash and its plugins
# for any persistent needs. Defaults to LOGSTASH_HOME/data
#
# path.data:  # directory used by Logstash and its plugins for persistent data
#
# ------------ Pipeline Settings --------------
#
# The ID of the pipeline.
#
# pipeline.id: main  # pipeline id, defaults to main
#
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
#
# This defaults to the number of the host's CPU cores.
#
# pipeline.workers: 2  # number of workers that run the filter+output stages in parallel, defaults to the number of CPU cores
#
# How many events to retrieve from inputs before sending to filters+workers
#
# pipeline.batch.size: 125  # maximum number of events a single worker collects from inputs before running filters+outputs
#
# How long to wait in milliseconds while polling for the next event
# before dispatching an undersized batch to filters+outputs
#
# pipeline.batch.delay: 50  # how many milliseconds to wait for the next event before dispatching an undersized batch to the workers
#
# Force Logstash to exit during shutdown even if there are still inflight
# events in memory. By default, logstash will refuse to quit until all
# received events have been pushed to the outputs.
#
# WARNING: Enabling this can lead to data loss during shutdown
#
# pipeline.unsafe_shutdown: false  # when true, force Logstash to exit during shutdown even if in-flight events remain in memory; by default Logstash refuses to quit until all received events have been pushed to the outputs, so enabling this can lose data
#
# Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
# "auto" automatically enables ordering if the 'pipeline.workers' setting
# is also set to '1', and disables otherwise.
# "true" enforces ordering on the pipeline and prevent logstash from starting
# if there are multiple workers.
# "false" disables any extra processing necessary for preserving ordering.
#
# pipeline.ordered: auto  # auto: ordering is enabled only when pipeline.workers is 1; true: enforce ordering (and refuse to start with multiple workers); false: disable the extra processing needed to preserve order
#
# Sets the pipeline's default value for `ecs_compatibility`, a setting that is
# available to plugins that implement an ECS Compatibility mode for use with
# the Elastic Common Schema.
# Possible values are:
# - disabled
# - v1
# - v8 (default)
# Pipelines defined before Logstash 8 operated without ECS in mind. To ensure a
# migrated pipeline continues to operate as it did before your upgrade, opt-OUT
# of ECS for the individual pipeline in its `pipelines.yml` definition. Setting
# it here will set the default for _all_ pipelines, including new ones.
#
# pipeline.ecs_compatibility: v8
#
# ------------ Pipeline Configuration Settings --------------
#
# Where to fetch the pipeline configuration for the main pipeline
#
# path.config:  # path to the Logstash configuration for the main pipeline
#
# Pipeline configuration string for the main pipeline
#
# config.string:  # a string containing the pipeline configuration to use for the main pipeline
#
# At startup, test if the configuration is valid and exit (dry run)
#
# config.test_and_exit: false  # when true, test whether the configuration is valid at startup and then exit (dry run), default false
#
# Periodically check if the configuration has changed and reload the pipeline
# This can also be triggered manually through the SIGHUP signal
#
# config.reload.automatic: false  # periodically check whether the configuration has changed and reload the pipeline, default false
#
# How often to check if the pipeline configuration has changed (in seconds)
# Note that the unit value (s) is required. Values without a qualifier (e.g. 60)
# are treated as nanoseconds.
# Setting the interval this way is not recommended and might change in later versions.
#
# config.reload.interval: 3s  # how often to check whether the pipeline configuration has changed
#
# Show fully compiled configuration as debug log message
# NOTE: --log.level must be 'debug'
#
# config.debug: false  # when true, show the fully compiled configuration as a debug log message (log.level must also be set to debug)
#
# When enabled, process escaped characters such as \n and \" in strings in the
# pipeline configuration files.
#
# config.support_escapes: false  # whether to process escape sequences such as \n and \" in strings in the pipeline configuration
#
# ------------ API Settings -------------
# Define settings related to the HTTP API here.
#
# The HTTP API is enabled by default. It can be disabled, but features that rely
# on it will not work as intended.
#
# api.enabled: true   # whether the HTTP API is enabled
#
# By default, the HTTP API is not secured and is therefore bound to only the
# host's loopback interface, ensuring that it is not accessible to the rest of
# the network.
# When secured with SSL and Basic Auth, the API is bound to _all_ interfaces
# unless configured otherwise.
#
# api.http.host: 127.0.0.1  # address the HTTP API binds to
#
# The HTTP API web server will listen on an available port from the given range.
# Values can be specified as a single port (e.g., `9600`), or an inclusive range
# of ports (e.g., `9600-9700`).
#
# api.http.port: 9600-9700  # port (or port range) for the HTTP API
#
# The HTTP API includes a customizable "environment" value in its response,
# which can be configured here.
#
# api.environment: "production"  # "environment" value included in HTTP API responses
#
# The HTTP API can be secured with SSL (TLS). To do so, you will need to provide
# the path to a password-protected keystore in p12 or jks format, along with credentials.
#
# api.ssl.enabled: false  # whether SSL is enabled for the API
# api.ssl.keystore.path: /path/to/keystore.jks  # path to the keystore
# api.ssl.keystore.password: "y0uRp4$$w0rD"    # keystore password
#
# The availability of SSL/TLS protocols depends on the JVM version. Certain protocols are
# disabled by default and need to be enabled manually by changing `jdk.tls.disabledAlgorithms`
# in the $JDK_HOME/conf/security/java.security configuration file.
#
# api.ssl.supported_protocols: [TLSv1.2,TLSv1.3]
#
# The HTTP API can be configured to require authentication. Acceptable values are
#  - `none`:  no auth is required (default)
#  - `basic`: clients must authenticate with HTTP Basic auth, as configured
#             with `api.auth.basic.*` options below
# api.auth.type: none
#
# When configured with `api.auth.type` `basic`, you must provide the credentials
# that requests will be validated against. Usage of Environment or Keystore
# variable replacements is encouraged (such as the value `"${HTTP_PASS}"`, which
# resolves to the value stored in the keystore's `HTTP_PASS` variable if present
# or the same variable from the environment)
#
# api.auth.basic.username: "logstash-user"
# api.auth.basic.password: "s3cUreP4$$w0rD"
#
# When setting `api.auth.basic.password`, the password should meet
# the default password policy requirements.
# The default password policy requires non-empty minimum 8 char string that
# includes a digit, upper case letter and lower case letter.
# Policy mode sets Logstash to WARN or ERROR when HTTP authentication password doesn't
# meet the password policy requirements.
# The default is WARN. Setting to ERROR enforces stronger passwords (recommended).
#
# api.auth.basic.password_policy.mode: WARN
#
# ------------ Module Settings ---------------
# Define modules here.  Modules definitions must be defined as an array.
# The simple way to see this is to prepend each `name` with a `-`, and keep
# all associated variables under the `name` they are associated with, and
# above the next, like this:
# Module definitions must be an array, and each variable name must use the
# format var.PLUGIN_TYPE.PLUGIN_NAME.KEY:
#
# modules:
#   - name: MODULE_NAME
#     var.PLUGINTYPE1.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE1.PLUGINNAME1.KEY2: VALUE
#     var.PLUGINTYPE2.PLUGINNAME1.KEY1: VALUE
#     var.PLUGINTYPE3.PLUGINNAME3.KEY1: VALUE
#
# Module variable names must be in the format of
#
# var.PLUGIN_TYPE.PLUGIN_NAME.KEY
#
# modules:
#
# ------------ Cloud Settings ---------------
# Define Elastic Cloud settings here.
# Format of cloud.id is a base64 value e.g. dXMtZWFzdC0xLmF3cy5mb3VuZC5pbyRub3RhcmVhbCRpZGVudGlmaWVy
# and it may have an label prefix e.g. staging:dXMtZ...
# This will overwrite 'var.elasticsearch.hosts' and 'var.kibana.host'
# cloud.id: <identifier>
#
# Format of cloud.auth is: <user>:<pass>
# This is optional
# If supplied this will overwrite 'var.elasticsearch.username' and 'var.elasticsearch.password'
# If supplied this will overwrite 'var.kibana.username' and 'var.kibana.password'
# cloud.auth: elastic:<password>
#
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Defaults is memory
# internal queuing model for buffering events: memory (default) or persisted (disk-based)
# queue.type: memory
#
# If `queue.type: persisted`, the directory path where the pipeline data files will be stored.
# Each pipeline will group its PQ files in a subdirectory matching its `pipeline.id`.
# Default is path.data/queue.
# (persistent queue only) directory where pipeline data files are stored, default path.data/queue
# path.queue:
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 64mb
# (persistent queue only) size of the page data files, default 64mb
# queue.page_capacity: 64mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
# (persistent queue only) maximum number of unread events in the queue, default 0 (unlimited)
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will pick
# whichever criteria is reached first
# Default is 1024mb or 1gb
# (persistent queue only) total capacity of the queue in bytes, default 1024mb
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
# Default is 1024, 0 for unlimited
# (persistent queue only) maximum number of acked events before a checkpoint is forced, default 1024, 0 for unlimited
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
# Default is 1024, 0 for unlimited
# (persistent queue only) maximum number of written events before a checkpoint is forced, default 1024, 0 for unlimited
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
# Default is 1000, 0 for no periodic checkpoint.
# (persistent queue only) interval in milliseconds at which a checkpoint is forced on the head page, default 1000, 0 for no periodic checkpoint
# queue.checkpoint.interval: 1000
#
# ------------ Dead-Letter Queue Settings --------------
# Flag to turn on dead-letter queue.
# whether the dead-letter queue is enabled, default false
# dead_letter_queue.enable: false

# If using dead_letter_queue.enable: true, the maximum size of each dead letter queue. Entries
# will be dropped if they would increase the size of the dead letter queue beyond this setting.
# Default is 1024mb
# maximum size of each dead-letter queue; entries that would exceed it are dropped, default 1024mb
# dead_letter_queue.max_bytes: 1024mb

# If using dead_letter_queue.enable: true, the interval in milliseconds where if no further events eligible for the DLQ
# have been created, a dead letter queue file will be written. A low value here will mean that more, smaller, queue files
# may be written, while a larger value will introduce more latency between items being "written" to the dead letter queue, and
# being available to be read by the dead_letter_queue input when items are written infrequently.
# Default is 5000.
# flush interval for dead-letter queue files when no further eligible events arrive, default 5000 ms
# dead_letter_queue.flush_interval: 5000

# If using dead_letter_queue.enable: true, controls which entries should be dropped to avoid exceeding the size limit.
# Set the value to `drop_newer` (default) to stop accepting new events that would push the DLQ size over the limit.
# Set the value to `drop_older` to remove queue pages containing the oldest events to make space for new ones.
# which entries to drop when the size limit would be exceeded: drop_newer (default) stops accepting new events, drop_older removes the pages holding the oldest events to make room for new ones
# dead_letter_queue.storage_policy: drop_newer

# If using dead_letter_queue.enable: true, the interval that events have to be considered valid. After the interval has
# expired the events could be automatically deleted from the DLQ.
# The interval could be expressed in days, hours, minutes or seconds, using as postfix notation like 5d,
# to represent a five days interval.
# The available units are respectively d, h, m, s for day, hours, minutes and seconds.
# If not specified then the DLQ doesn't use any age policy for cleaning events.
# how long dead-letter queue events remain valid; events older than this may be deleted automatically
# dead_letter_queue.retain.age: 1d

# If using dead_letter_queue.enable: true, the directory path where the data files will be stored.
# Default is path.data/dead_letter_queue
# directory where dead-letter queue data files are stored
# path.dead_letter_queue:
#
# ------------ Debugging Settings --------------
#
# Options for log.level:
#   * fatal
#   * error
#   * warn
#   * info (default)
#   * debug
#   * trace
# log level
# log.level: info
# log directory
# path.logs:
#
# ------------ Other Settings --------------
#
# Allow or block running Logstash as superuser (default: true)
# whether running Logstash as the superuser is allowed
# allow_superuser: false
#
# Where to find custom plugins
# path to custom plugins
# path.plugins: []
#
# Flag to output log lines of each pipeline in its separate log file. Each log filename contains the pipeline.name
# Default is false
# whether each pipeline writes its log lines to its own log file
# pipeline.separate_logs: false
#
# ------------ X-Pack Settings (not applicable for OSS build)--------------
#
# X-Pack Monitoring
# https://www.elastic.co/guide/en/logstash/current/monitoring-logstash.html
#xpack.monitoring.enabled: false
#xpack.monitoring.elasticsearch.username: logstash_system
#xpack.monitoring.elasticsearch.password: password
#xpack.monitoring.elasticsearch.proxy: ["http://proxy:port"]
#xpack.monitoring.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.monitoring.elasticsearch.cloud_id: monitoring_cluster_id:xxxxxxxxxx
#xpack.monitoring.elasticsearch.cloud_auth: logstash_system:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.monitoring.elasticsearch.api_key: "id:api_key"
#xpack.monitoring.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.monitoring.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.monitoring.elasticsearch.ssl.truststore.path: path/to/file
#xpack.monitoring.elasticsearch.ssl.truststore.password: password
# use either keystore.path/keystore.password or certificate/key configurations
#xpack.monitoring.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.monitoring.elasticsearch.ssl.keystore.password: password
#xpack.monitoring.elasticsearch.ssl.certificate: /path/to/file
#xpack.monitoring.elasticsearch.ssl.key: /path/to/key
#xpack.monitoring.elasticsearch.ssl.verification_mode: full
#xpack.monitoring.elasticsearch.ssl.cipher_suites: []
#xpack.monitoring.elasticsearch.sniffing: false
#xpack.monitoring.collection.interval: 10s
#xpack.monitoring.collection.pipeline.details.enabled: true
#
# X-Pack Management
# https://www.elastic.co/guide/en/logstash/current/logstash-centralized-pipeline-management.html
#xpack.management.enabled: false
#xpack.management.pipeline.id: ["main", "apache_logs"]
#xpack.management.elasticsearch.username: logstash_admin_user
#xpack.management.elasticsearch.password: password
#xpack.management.elasticsearch.proxy: ["http://proxy:port"]
#xpack.management.elasticsearch.hosts: ["https://es1:9200", "https://es2:9200"]
# an alternative to hosts + username/password settings is to use cloud_id/cloud_auth
#xpack.management.elasticsearch.cloud_id: management_cluster_id:xxxxxxxxxx
#xpack.management.elasticsearch.cloud_auth: logstash_admin_user:password
# another authentication alternative is to use an Elasticsearch API key
#xpack.management.elasticsearch.api_key: "id:api_key"
#xpack.management.elasticsearch.ssl.ca_trusted_fingerprint: xxxxxxxxxx
#xpack.management.elasticsearch.ssl.certificate_authority: "/path/to/ca.crt"
#xpack.management.elasticsearch.ssl.truststore.path: /path/to/file
#xpack.management.elasticsearch.ssl.truststore.password: password
# use either keystore.path/keystore.password or certificate/key configurations
#xpack.management.elasticsearch.ssl.keystore.path: /path/to/file
#xpack.management.elasticsearch.ssl.keystore.password: password
#xpack.management.elasticsearch.ssl.certificate: /path/to/file
#xpack.management.elasticsearch.ssl.key: /path/to/certificate_key_file
#xpack.management.elasticsearch.ssl.cipher_suites: []
#xpack.management.elasticsearch.ssl.verification_mode: full
#xpack.management.elasticsearch.sniffing: false
#xpack.management.logstash.poll_interval: 5s

# X-Pack GeoIP Database Management
# https://www.elastic.co/guide/en/logstash/current/plugins-filters-geoip.html#plugins-filters-geoip-manage_update
#xpack.geoip.downloader.enabled: true
#xpack.geoip.downloader.endpoint: "https://geoip.elastic.co/v1/database"

3. pipelines.yml configuration

# List of pipelines to be loaded by Logstash
#
# This document must be a list of dictionaries/hashes, where the keys/values are pipeline settings.
# Default values for omitted settings are read from the `logstash.yml` file.
# When declaring multiple pipelines, each MUST have its own `pipeline.id`.
#
# Example of two pipelines:
#
# - pipeline.id: test  
#   pipeline.workers: 1 
#   pipeline.batch.size: 1
#   config.string: "input { generator {} } filter { sleep { time => 1 } } output { stdout { codec => dots } }"
# - pipeline.id: another_test
#   queue.type: persisted
#   path.config: "/tmp/logstash/*.config"
#
# Available options:
#
#   # name of the pipeline
#   pipeline.id: mylogs
#
#   # The configuration string to be used by this pipeline
#   config.string: "input { generator {} } filter { sleep { time => 1 } } output { stdout { codec => dots } }"
#
#   # The path from where to read the configuration text
#   path.config: "/etc/conf.d/logstash/myconfig.cfg"
#
#   # How many worker threads execute the Filters+Outputs stage of the pipeline
#   pipeline.workers: 1 (actually defaults to number of CPUs)
#
#   # How many events to retrieve from inputs before sending to filters+workers
#   pipeline.batch.size: 125
#
#   # How long to wait in milliseconds while polling for the next event
#   # before dispatching an undersized batch to filters+outputs
#   pipeline.batch.delay: 50
#
#   Set the pipeline event ordering. Options are "auto" (the default), "true" or "false".
#   "auto" automatically enables ordering if the 'pipeline.workers' setting
#   is also set to '1', and disables otherwise.
#   "true" enforces ordering on a pipeline and prevents logstash from starting
#   a pipeline with multiple workers allocated.
#   "false" disable any extra processing necessary for preserving ordering.
#
#   pipeline.ordered: auto
#
#   # Internal queuing model, "memory" for legacy in-memory based queuing and
#   # "persisted" for disk-based acked queueing. Defaults is memory
#   queue.type: memory
#
#   # If using queue.type: persisted, the page data files size. The queue data consists of
#   # append-only data files separated into pages. Default is 64mb
#   queue.page_capacity: 64mb
#
#   # If using queue.type: persisted, the maximum number of unread events in the queue.
#   # Default is 0 (unlimited)
#   queue.max_events: 0
#
#   # If using queue.type: persisted, the total capacity of the queue in number of bytes.
#   # Default is 1024mb or 1gb
#   queue.max_bytes: 1024mb
#
#   # If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint
#   # Default is 1024, 0 for unlimited
#   queue.checkpoint.acks: 1024
#
#   # If using queue.type: persisted, the maximum number of written events before forcing a checkpoint
#   # Default is 1024, 0 for unlimited
#   queue.checkpoint.writes: 1024
#
#   # If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page
#   # Default is 1000, 0 for no periodic checkpoint.
#   queue.checkpoint.interval: 1000
#
#   # Enable Dead Letter Queueing for this pipeline.
#   dead_letter_queue.enable: false
#
#   If using dead_letter_queue.enable: true, the maximum size of dead letter queue for this pipeline. Entries
#   will be dropped if they would increase the size of the dead letter queue beyond this setting.
#   Default is 1024mb
#   dead_letter_queue.max_bytes: 1024mb
#
#   If using dead_letter_queue.enable: true, the interval in milliseconds where if no further events eligible for the DLQ
#   have been created, a dead letter queue file will be written. A low value here will mean that more, smaller, queue files
#   may be written, while a larger value will introduce more latency between items being "written" to the dead letter queue, and
#   being available to be read by the dead_letter_queue input when items are written infrequently.
#   Default is 5000.
#
#   dead_letter_queue.flush_interval: 5000

#   If using dead_letter_queue.enable: true, controls which entries should be dropped to avoid exceeding the size limit.
#   Set the value to `drop_newer` (default) to stop accepting new events that would push the DLQ size over the limit.
#   Set the value to `drop_older` to remove queue pages containing the oldest events to make space for new ones.
#
#   dead_letter_queue.storage_policy: drop_newer

#   If using dead_letter_queue.enable: true, the interval that events have to be considered valid. After the interval has
#   expired the events could be automatically deleted from the DLQ.
#   The interval could be expressed in days, hours, minutes or seconds, using as postfix notation like 5d,
#   to represent a five days interval.
#   The available units are respectively d, h, m, s for day, hours, minutes and seconds.
#   If not specified then the DLQ doesn't use any age policy for cleaning events.
#
#   dead_letter_queue.retain.age: 1d

#
#   If using dead_letter_queue.enable: true, the directory path where the data files will be stored.
#   Default is path.data/dead_letter_queue
#
#   path.dead_letter_queue:
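
In this walkthrough Logstash is started with -f pointing directly at a config file, so pipelines.yml can stay untouched. If you prefer to start Logstash without -f, a single entry like the sketch below is enough (the pipeline id is arbitrary; the path is an assumption matching where the file is created in the next section):

- pipeline.id: mysql_sync
  path.config: "/home/logstash-8.12.2/config/logstash.conf"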

4. Syncing MySQL data to ES

Sync options:
1. Logstash
2. go-mysql-elasticsearch
3. Canal (Alibaba)

Method 1: Logstash
1. Install the MySQL JDBC driver, mysql-connector-j (its major version should match your MySQL server)


[root@localhost home]# wget https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-j-8.0.33-1.el9.noarch.rpm
--2024-02-23 16:01:15--  https://downloads.mysql.com/archives/get/p/3/file/mysql-connector-j-8.0.33-1.el9.noarch.rpm
Resolving downloads.mysql.com (downloads.mysql.com)... 23.66.135.36, 2600:1406:3c00:18b::2e31, 2600:1406:3c00:189::2e31
Connecting to downloads.mysql.com (downloads.mysql.com)|23.66.135.36|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://cdn.mysql.com/archives/mysql-connector-java-8.0/mysql-connector-j-8.0.33-1.el9.noarch.rpm [following]
--2024-02-23 16:01:16--  https://cdn.mysql.com/archives/mysql-connector-java-8.0/mysql-connector-j-8.0.33-1.el9.noarch.rpm
Resolving cdn.mysql.com (cdn.mysql.com)... 23.42.93.135, 2600:1406:3a00:293::1d68, 2600:1406:3a00:282::1d68
Connecting to cdn.mysql.com (cdn.mysql.com)|23.42.93.135|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2425346 (2.3M) [application/x-redhat-package-manager]
Saving to: 'mysql-connector-j-8.0.33-1.el9.noarch.rpm'

mysql-connector-j-8.0.33-1.el9.noarch.rpm    100%[==========================================>]   2.31M   736KB/s    in 3.2s    

2024-02-23 16:01:21 (736 KB/s) - 'mysql-connector-j-8.0.33-1.el9.noarch.rpm' saved [2425346/2425346]

[root@localhost home]# rpm -ivh mysql-connector-j-8.0.33-1.el9.noarch.rpm  # install
warning: mysql-connector-j-8.0.33-1.el9.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY
error: Failed dependencies:
	java-headless >= 1:1.8.0 is needed by mysql-connector-j-1:8.0.33-1.el9.noarch
[root@localhost home]# yum -y install java-headless  # install java-headless to satisfy the dependency
[root@localhost home]# rpm -ivh mysql-connector-j-8.0.33-1.el9.noarch.rpm 
warning: mysql-connector-j-8.0.33-1.el9.noarch.rpm: Header V4 RSA/SHA256 Signature, key ID 3a79bd29: NOKEY
Verifying...                          ################################# [100%]
Preparing...                          ################################# [100%]
Updating / installing...
   1:mysql-connector-j-1:8.0.33-1.el9 ################################# [100%]
# Installation complete; the driver jar is placed under /usr/share/java/
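
Before pointing Logstash at it, it is worth confirming the exact jar name that was installed (the config below references /usr/share/java/mysql-connector-j.jar; some packages ship it as mysql-connector-java.jar instead):

[root@localhost home]# ls -l /usr/share/java/ | grep -i mysql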

2. Configure logstash.conf

# single table
input {
    jdbc {
        # MySQL connection string
        jdbc_connection_string => "jdbc:mysql://192.168.243.134:3306/wine"
        # user name and password
        jdbc_user => "root"
        jdbc_password => "root"
        # driver class name
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        # driver jar
        jdbc_driver_library => "/usr/share/java/mysql-connector-j.jar"
        # enable paging
        jdbc_paging_enabled => true
        jdbc_page_size => "1000"
        # whether to wipe the last_run_metadata_path record; when true, every run re-reads all rows from scratch
        clean_run => false
        # track a specific column instead of the last run timestamp; when true, tracking_column must be set,
        # otherwise the timestamp of the last run is tracked
        use_column_value => true
        # column to track when use_column_value is true; it must be monotonically increasing (e.g. an auto-increment id or a modification timestamp)
        tracking_column => "unix_ts_in_secs"
        # whether to remember the last value of the tracking column in the file given by last_run_metadata_path
        record_last_run => "true"
        # type of the tracking column
        tracking_column_type => "numeric"
        # polling schedule (cron-like; here every 5 seconds)
        schedule => "*/5 * * * * *"
        # SQL to execute
        statement => "SELECT *  FROM wine_address"
        # event type tag, used below to route to the right index
        type => "es_table"
    }
}
filter {
    ruby {
        code => "event.set('@timestamp',event.get('timestamp'))"
    }
    mutate {
        copy => { "id" => "[@metadata][_id]"}
        remove_field => ["id", "@version", "unix_ts_in_secs","timestamp"]
    }
}
output {
    if [type]=="es_table" {
        elasticsearch {
        hosts => ["192.168.243.134:9200"]     # ES host and port
        index => "es_table_idx"               # index name
        document_id => "%{[@metadata][_id]}"
        }
    }
}
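
As written, the statement above re-reads the whole table every 5 seconds, and the tracked column unix_ts_in_secs never actually appears in the result set. If the table has a modification-time column, an incremental query in the style of the official Logstash JDBC example looks roughly like the sketch below (the column name modification_time is an assumption about your schema; :sql_last_value is filled in by the jdbc input from last_run_metadata_path):

        statement => "SELECT *, UNIX_TIMESTAMP(modification_time) AS unix_ts_in_secs
                      FROM wine_address
                      WHERE (UNIX_TIMESTAMP(modification_time) > :sql_last_value AND modification_time < NOW())
                      ORDER BY modification_time ASC"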



# multiple tables
input {
    jdbc {
        jdbc_connection_string => "jdbc:mysql://192.168.243.134:3306/wine"
        jdbc_user => "root"
        jdbc_password => "123456"
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        jdbc_driver_library => "/usr/share/java/mysql-connector-j.jar"
        jdbc_paging_enabled => true
        jdbc_page_size => "1000"
        clean_run    =>false
        use_column_value => true
        tracking_column => "unix_ts_in_secs"
        record_last_run => "true"
        tracking_column_type => "numeric"
        schedule => "*/5 * * * * *"
        statement => "SELECT *  FROM wine_address"
        type => "es_table"
    }

     jdbc {
        jdbc_connection_string => "jdbc:mysql://192.168.243.134:3306/wine"
        jdbc_user => "root"
        jdbc_password => "123456"
        jdbc_driver_class => "com.mysql.cj.jdbc.Driver"
        jdbc_driver_library => "/usr/share/java/mysql-connector-j.jar"
        jdbc_paging_enabled => true
        jdbc_page_size => "1000"
        clean_run    =>false
        use_column_value => true
        tracking_column => "unix_ts_in_secs"
        record_last_run => "true"
        tracking_column_type => "numeric"
        schedule => "*/5 * * * * *"
        statement => "SELECT *  FROM wine_area"
        type => "es_table1"
    }
}
filter {
    mutate {
        copy => { "id" => "[@metadata][_id]"}
        remove_field => ["id", "@version", "unix_ts_in_secs","timestamp"]
    }
}
output {
    if [type]=="es_table" {
        elasticsearch {
        hosts => ["192.168.243.134:9200"]   
        index => "es_table_idx"                
        document_id => "%{address_id}"
        }
    }
        if [type]=="es_table1" {
        elasticsearch {
        hosts => ["192.168.243.134:9200"]   
        index => "es_table_idx1"                
        document_id => "%{area_id}"
        }
    }
}


3. Start Logstash

[root@localhost logstash-8.12.2]# ./bin/logstash -f /home/logstash-8.12.2/config/logstash.conf  &
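
Before starting it in the background, the pipeline file can be syntax-checked first (--config.test_and_exit is the flag documented in the logstash.yml dump above):

[root@localhost logstash-8.12.2]# ./bin/logstash -f /home/logstash-8.12.2/config/logstash.conf --config.test_and_exit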

4. Insert test data

INSERT INTO `wine`.`wine_address` ( `address_id`, `member_id`, `area_id`, `city_id`, `province_id`, `area_info`, `address`, `mob_phone`, `reciver_name`, `is_default`, `dis_mode`, `add_time` )
VALUES( 5, 5, 5, 5, 5, '测试', '测试', 14512456789, '10', 0, '1', 0 );

5. Check the data
(screenshot of the synced documents omitted)
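
Without a UI, the index can also be queried directly from the shell (index name as configured in the output section above):

[root@localhost ~]# curl "http://192.168.243.134:9200/es_table_idx/_search?pretty"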

Method 2: go-mysql-elasticsearch

1. Install the Go environment

[root@localhost /]# wget https://golang.google.cn/dl/go1.15.4.linux-amd64.tar.gz  # download
[root@localhost /]# tar -zxvf  go1.15.4.linux-amd64.tar.gz  # extract
[root@localhost /]# mv go /usr/local/
[root@localhost river]# vim /etc/profile  # add environment variables
export GOROOT=/usr/local/go
export GOPATH=/root/go
export PATH=$PATH:$GOROOT/bin:$GOPATH/bin
[root@localhost /]# source /etc/profile  # reload the profile



2. Download go-mysql-elasticsearch

[root@localhost /]# go get github.com/siddontang/go-mysql-elasticsearch  # download
[root@localhost river]# cd $GOPATH/src/github.com/siddontang/go-mysql-elasticsearch
[root@localhost go-mysql-elasticsearch]# make  # build; on success an executable named go-mysql-elasticsearch appears under go-mysql-elasticsearch/bin

3. Configuration ($GOPATH/src/github.com/siddontang/go-mysql-elasticsearch/etc/river.toml)

my_addr = "192.168.243.134:3306"   # MySQL address
my_user = "root"    # MySQL user
my_pass = "123456"   # MySQL password
es_addr = "192.168.243.134:9200" # Elasticsearch address
es_user = ""  # Elasticsearch user
es_pass = ""  # Elasticsearch password
data_dir = "/root/go/src/github.com/siddontang/go-mysql-elasticsearch/data"  # directory for storing sync position data
stat_addr = "192.168.243.134:12800"    # address and port for internal stats
stat_path = "/metrics"
server_id = 1001  
flavor = "mysql"
mysqldump = "mysqldump "  
bulk_size = 128
flush_bulk_time = "200ms"
skip_no_pk_table = false
[[source]]
schema = "wine"  # database name
tables = ["wine_role"] # table names
[[rule]]
schema = "wine"   # database name
table = "wine_role"  # table name
index = ""  # ES index to write the rows to
type = ""  # ES type to write the rows to
filter = ["id", "name"]  # only these columns are synced

4. Start

# Note: the project states that it supports MySQL < 8.0 and Elasticsearch < 6.0
[root@localhost go-mysql-elasticsearch]# ./bin/go-mysql-elasticsearch -config=./etc/river.toml

5. Check the Elasticsearch data (for example with the elasticsearch-head browser extension)
(screenshot omitted)
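
From the shell, the indices that were created can be listed with:

[root@localhost ~]# curl "http://192.168.243.134:9200/_cat/indices?v"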
Method 3: Canal (https://github.com/alibaba/canal/releases)

1. Download and install Canal

# Download version 1.1.7, which supports MySQL 8.0 (my MySQL is 8.0.36)
[root@localhost home]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.7/canal.adapter-1.1.7.tar.gz   # the client-side adapter; think of it as the canal client
[root@localhost home]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.7/canal.admin-1.1.7.tar.gz   # the canal web management UI
[root@localhost home]# wget https://github.com/alibaba/canal/releases/download/canal-1.1.7/canal.deployer-1.1.7.tar.gz   # the canal server
# create directories
[root@localhost home]# mkdir canal
[root@localhost home]# cd canal
[root@localhost canal]# mkdir canal-adapter
[root@localhost canal]# mkdir canal-admin
[root@localhost canal]# mkdir canal-server
# extract
[root@localhost home]# tar -zxvf canal.adapter-1.1.7.tar.gz -C ./canal/canal-adapter/
[root@localhost home]# tar -zxvf canal.admin-1.1.7.tar.gz -C ./canal/canal-admin/
[root@localhost home]# tar -zxvf canal.deployer-1.1.7.tar.gz -C ./canal/canal-server/
# adjust permissions
[root@localhost home]# chmod -R 777 canal  

2. Configuration

#canal-server/conf/canal.properties
#################################################
######### 		common argument		#############
#################################################
# tcp bind ip
canal.ip = 127.0.0.1
# register ip to zookeeper
canal.register.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458

# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =

canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ, pulsarMQ
canal.serverMode = tcp
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024 
## memory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true

## detecting config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false

# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size =  1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60

# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30

# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false

# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED 
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB

# binlog ddl isolation
canal.instance.get.ddl.isolation = false

# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256

# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360

#################################################
######### 		destinations		#############
#################################################
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false

canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml

canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml

##################################################
######### 	      MQ Properties      #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=

canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local

canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8

##################################################
######### 		     Kafka 		     #############
##################################################
kafka.bootstrap.servers = 127.0.0.1:9092
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0

kafka.kerberos.enable = false
kafka.kerberos.krb5.file = ../conf/kerberos/krb5.conf
kafka.kerberos.jaas.file = ../conf/kerberos/jaas.conf

# sasl demo
# kafka.sasl.jaas.config = org.apache.kafka.common.security.scram.ScramLoginModule required \\n username=\"alice\" \\npassword="alice-secret\";
# kafka.sasl.mechanism = SCRAM-SHA-512
# kafka.security.protocol = SASL_PLAINTEXT

##################################################
######### 		    RocketMQ	     #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag = 

##################################################
######### 		    RabbitMQ	     #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =


##################################################
######### 		      Pulsar         #############
##################################################
pulsarmq.serverUrl =
pulsarmq.roleToken =
pulsarmq.topicTenantPrefix =



/home/canal/canal-server/conf/example/instance.properties
#canal-server/conf/example
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=200

# enable gtid use true/false
canal.instance.gtidon=false

# position info
# MySQL address and port
canal.instance.master.address=127.0.0.1:3306
# binlog file to start syncing from; use the file name reported by your own server (see the SQL sketch after this file)
canal.instance.master.journal.name=mysql-bin.000045
# binlog position to start from
canal.instance.master.position=237
# timestamp (in milliseconds) to start syncing from
canal.instance.master.timestamp=1709112558000
# GTID to start syncing from
canal.instance.master.gtid=

# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=

# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal

#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=

# username/password
canal.instance.dbUsername=root  # database user
canal.instance.dbPassword=123456 # database password
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==

# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
# tables excluded from syncing (here the mysql replication tables)
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch

# mq config
canal.mq.topic=example
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,topic2:mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.enableDynamicQueuePartition=false
#canal.mq.partitionsNum=3
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#
# multi stream for polardbx
canal.instance.multi.stream.on=false
#################################################
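
The journal name and position above come from the source MySQL server. A quick way to read them, and to confirm that row-based binlogging is enabled (canal requires ROW format), is:

mysql> SHOW VARIABLES LIKE 'binlog_format';   -- should report ROW
mysql> SHOW MASTER STATUS;                    -- File / Position map to journal.name / position above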



#/home/canal/canal-adapter/conf/application.yml
server:
  port: 8081 # adapter port
spring:
  jackson:
    date-format: yyyy-MM-dd HH:mm:ss
    time-zone: GMT+8
    default-property-inclusion: non_null

canal.conf:
  mode: tcp # how the adapter consumes from canal: tcp, kafka, rocketMQ or rabbitMQ
  flatMessage: true  # flat-message switch: deliver data as JSON strings; only effective in kafka/rocketMQ mode
  zookeeperHosts: # ZooKeeper address in cluster mode; if canalServerHost is configured, canalServerHost takes precedence
  syncBatchSize: 1000 # batch size
  retries: -1 # retry count; -1 means block and retry forever
  timeout: # timeout for fetching data
  accessKey:
  secretKey:
  consumerProperties:
    # canal tcp consumer
    canal.tcp.server.host: 192.168.243.134:11111
    canal.tcp.zookeeper.hosts:
    canal.tcp.batch.size: 500
    canal.tcp.username:
    canal.tcp.password:
    # kafka consumer
    kafka.bootstrap.servers: 127.0.0.1:9092
    kafka.enable.auto.commit: false
    kafka.auto.commit.interval.ms: 1000
    kafka.auto.offset.reset: latest
    kafka.request.timeout.ms: 40000
    kafka.session.timeout.ms: 30000
    kafka.isolation.level: read_committed
    kafka.max.poll.records: 1000
    # rocketMQ consumer
    rocketmq.namespace:
    rocketmq.namesrv.addr: 127.0.0.1:9876
    rocketmq.batch.size: 1000
    rocketmq.enable.message.trace: false
    rocketmq.customized.trace.topic:
    rocketmq.access.channel:
    rocketmq.subscribe.filter:
    # rabbitMQ consumer
    rabbitmq.host:
    rabbitmq.virtual.host:
    rabbitmq.username:
    rabbitmq.password:
    rabbitmq.resource.ownerId:

  srcDataSources: # source databases
    defaultDS:
      url: jdbc:mysql://192.168.243.134:3306/wine?useUnicode=true
      username: root
      password: 123456
  canalAdapters: # sync targets
    - instance: example # canal instance name or MQ topic name; matches the canal destination or MQ topic
      groups: # adapter groups
        - groupId: g1  # group id
          outerAdapters: # adapter list
            - name: logger
            #      - name: rdb
            #        key: mysql1
            #        properties:
            #          jdbc.driverClassName: com.mysql.jdbc.Driver
            #          jdbc.url: jdbc:mysql://127.0.0.1:3306/mytest2?useUnicode=true
            #          jdbc.username: root
            #          jdbc.password: 121212
            #          druid.stat.enable: false
            #          druid.stat.slowSqlMillis: 1000
            #      - name: rdb
            #        key: oracle1
            #        properties:
            #          jdbc.driverClassName: oracle.jdbc.OracleDriver
            #          jdbc.url: jdbc:oracle:thin:@localhost:49161:XE
            #          jdbc.username: mytest
            #          jdbc.password: m121212
            #      - name: rdb
            #        key: postgres1
            #        properties:
            #          jdbc.driverClassName: org.postgresql.Driver
            #          jdbc.url: jdbc:postgresql://localhost:5432/postgres
            #          jdbc.username: postgres
            #          jdbc.password: 121212
            #          threads: 1
            #          commitSize: 3000
            #      - name: hbase
            #        properties:
            #          hbase.zookeeper.quorum: 127.0.0.1
            #          hbase.zookeeper.property.clientPort: 2181
            #          zookeeper.znode.parent: /hbase
            - name: es8  # I am running Elasticsearch 8
              hosts: http://192.168.243.134:9200 # 127.0.0.1:9200 for rest mode; in rest mode the host needs the http:// prefix, transport mode uses port 9300
              properties:
                mode: rest # or transport
                # security.auth: test:123456 #  only used for rest mode
                cluster.name: elasticsearch  # cluster name
#      - name: kudu
#        key: kudu
#        properties:
#          kudu.master.address: 127.0.0.1 # ',' split multi address
#      - name: phoenix
#        key: phoenix
#        properties:
#          jdbc.driverClassName: org.apache.phoenix.jdbc.PhoenixDriver
#          jdbc.url: jdbc:phoenix:127.0.0.1:2181:/hbase/db
#          jdbc.username:
#          jdbc.password:



#/home/canal/canal-adapter/conf/es8/mytest_user.yml
dataSourceKey: defaultDS  # key of the source data source, matching srcDataSources in application.yml
destination: example
groupId: g1  # group id
esMapping:
  _index: mytest_user  # target index
  _id: _id # document id; if this is not set, pk below must be configured, otherwise ES assigns ids automatically
  _type: _doc  # type
  upsert: true # insert the document when it does not exist yet
  #  pk: id
  sql:
    "select a.album_id as _id, a.album_name, a.class_name, a.mark,a.goods_sku,a.add_time,a.merchant_id
    from wine_album a"
  #  objFields:
  #    _labels: array:;
  etlCondition: "where a.add_time>={}"
  commitBatch: 3000  # batch commit size
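
The adapter writes into mytest_user but does not create the index by itself, so the index (and a mapping covering the columns selected above) should exist in Elasticsearch before the adapter starts. A minimal sketch, with field types guessed from the column names:

curl -X PUT "http://192.168.243.134:9200/mytest_user" -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "properties": {
      "album_name":  { "type": "text" },
      "class_name":  { "type": "keyword" },
      "mark":        { "type": "text" },
      "goods_sku":   { "type": "keyword" },
      "add_time":    { "type": "long" },
      "merchant_id": { "type": "long" }
    }
  }
}'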




3. Start

[root@localhost canal-server]# ./bin/startup.sh  
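
That starts the canal server (deployer). The adapter is started separately; assuming the same bin/startup.sh script layout, that is simply:

[root@localhost canal-adapter]# ./bin/startup.sh

canal-admin is optional and only needed if you want the web management UI.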

4. Check the logs

# canal server log (/home/canal/canal-server/logs)
[root@localhost canal]# tail -5f canal.log 
2024-02-28 14:47:50.369 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## set default uncaught exception handler
2024-02-28 14:47:50.375 [main] INFO  com.alibaba.otter.canal.deployer.CanalLauncher - ## load canal configurations
2024-02-28 14:47:50.383 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## start the canal server.
2024-02-28 14:47:50.414 [main] INFO  com.alibaba.otter.canal.deployer.CanalController - ## start the canal server[127.0.0.1(127.0.0.1):11111]
2024-02-28 14:47:51.318 [main] INFO  com.alibaba.otter.canal.deployer.CanalStarter - ## the canal server is running now ......


# instance (example) log
[root@localhost example]# tail -5f example.log                       
2024-02-28 14:47:51.284 [main] WARN  c.a.o.canal.parse.inbound.mysql.dbsync.LogEventConvert - --> init table black filter : ^mysql\.slave_.*$
2024-02-28 14:47:51.289 [main] INFO  c.a.otter.canal.instance.core.AbstractCanalInstance - start successful....
2024-02-28 14:47:51.342 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> begin to find start position, it will be long time for reset or first position
2024-02-28 14:47:51.342 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - prepare to find start position just show master status
2024-02-28 14:47:51.749 [destination = example , address = /127.0.0.1:3306 , EventParser] WARN  c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - ---> find start position successfully, EntryPosition[included=false,journalName=mysql-bin.000044,position=4,serverId=1,gtid=<null>,timestamp=1709092749000] cost : 403ms , the next step is binlog dump

# adapter log (/home/canal/canal-adapter/logs/adapter)
[root@localhost logs]# tail -10f ./adapter/adapter.log 
2024-02-28 18:56:58.798 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: logger succeed
2024-02-28 18:56:59.017 [main] INFO  c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## Start loading es mapping config ... 
2024-02-28 18:56:59.028 [main] INFO  c.a.o.c.client.adapter.es.core.config.ESSyncConfigLoader - ## ES mapping config loaded
2024-02-28 18:56:59.242 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Load canal adapter: es8 succeed
2024-02-28 18:56:59.248 [main] INFO  c.alibaba.otter.canal.connector.core.spi.ExtensionLoader - extension classpath dir: /home/canal/canal-adapter/plugin
2024-02-28 18:56:59.267 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterLoader - Start adapter for canal-client mq topic: example-g1 succeed
2024-02-28 18:56:59.267 [main] INFO  c.a.o.canal.adapter.launcher.loader.CanalAdapterService - ## the canal client adapters are running now ......
2024-02-28 18:56:59.267 [Thread-3] INFO  c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - =============> Start to connect destination: example <=============
2024-02-28 18:56:59.273 [main] INFO  c.a.otter.canal.adapter.launcher.CanalAdapterApplication - Started CanalAdapterApplication in 2.381 seconds (JVM running for 2.951)
2024-02-28 18:56:59.356 [Thread-3] INFO  c.a.otter.canal.adapter.launcher.loader.AdapterProcessor - =============> Subscribe destination: example succeed <=============
2024-02-28 18:58:32.556 [pool-3-thread-1] INFO  c.a.o.canal.client.adapter.logger.LoggerAdapterExample - DML: {"data":[{"album_id":4,"album_name":"2","class_name":6,"mark":"2","goods_sku":"2","add_time":2,"merchant_id":2}],"database":"wine","destination":"example","es":1709117912000,"groupId":"g1","isDdl":false,"old":null,"pkNames":["album_id"],"sql":"","table":"wine_album","ts":1709117912427,"type":"INSERT"}
2024-02-28 18:58:32.805 [pool-3-thread-1] DEBUG c.a.o.canal.client.adapter.es.core.service.ESSyncService - DML: {"data":[{"album_id":4,"album_name":"2","class_name":6,"mark":"2","goods_sku":"2","add_time":2,"merchant_id":2}],"database":"wine","destination":"example","es":1709117912000,"groupId":"g1","isDdl":false,"old":null,"pkNames":["album_id"],"sql":"","table":"wine_album","ts":1709117912427,"type":"INSERT"} 
Affected indexes: mytest_user 
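
The DML entry in the adapter log above was produced by inserting a test row into wine_album; reconstructed from the logged data, the statement looked roughly like this:

INSERT INTO `wine`.`wine_album` ( `album_id`, `album_name`, `class_name`, `mark`, `goods_sku`, `add_time`, `merchant_id` )
VALUES ( 4, '2', 6, '2', '2', 2, 2 );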

5. Check the index data in ES
(screenshot omitted)
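
Or query the target index from the shell:

[root@localhost ~]# curl "http://192.168.243.134:9200/mytest_user/_search?pretty"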
