11. MongoDB Sharded Cluster Migration

段帅星

Last modified: 2024-06-05 15:35:44


First published: 2023-03-04 09:56:14

Article link: https://blog.csdn.net/weixin_47003048/article/details/129330700

Scenario 1: migrating from one sharded cluster to another sharded cluster
Migration tool: MongoShake

Machine planning:
1. Planning:
Source cluster cluster01:

Role	IP address	Port
cluster01-router	192.168.86.21	27017
cluster01-configsvr	192.168.86.22	27017
shard01-shardsvr01	192.168.86.23	27017
shard02-shardsvr01	192.168.86.24	27017
shard03-shardsvr01	192.168.86.25	27017

Target cluster cluster02:

Role	IP address	Port
cluster02-router	192.168.86.21	27018
cluster02-configsvr	192.168.86.22	27018
shard01-shardsvr01	192.168.86.23	27018
shard02-shardsvr01	192.168.86.24	27018
shard03-shardsvr01	192.168.86.25	27018

MongoShake: 192.168.86.26

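2. Configuration file

This setup synchronizes between MongoDB sharded clusters: both the source and the destination are sharded clusters.
When synchronizing between clusters, pay particular attention to the following settings in collector.conf:
mongo_urls: replica-set addresses of the source cluster, one per shard
mongo_cs_url: config server address of the source cluster
mongo_s_url: mongos address of the source cluster
tunnel: tunnel mode; use direct for cluster-to-cluster synchronization
tunnel.address: mongos address of the destination cluster
filter.namespace.white: the databases to synchronize
checkpoint.storage.collection: name of the collection that stores the checkpoint; if several MongoShake instances pull from the same source, change this name to avoid conflicts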
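
As a rough illustration of how these settings map onto the machine plan, the following Python sketch assembles the source-side URLs and the target address from plain host lists. The helper names (`mongo_url`, `build_collector_settings`) are invented for this example and are not part of MongoShake; the hosts and credentials are the ones used in this article.

```python
from typing import Dict, List


def mongo_url(user: str, password: str, hosts: List[str], db: str = "") -> str:
    """Build one mongodb:// URL; nodes of the same replica set are joined by commas."""
    if "@" in password:
        # collector.conf notes that the password must not contain '@'.
        raise ValueError("password must not contain '@'")
    url = f"mongodb://{user}:{password}@{','.join(hosts)}"
    return f"{url}/{db}" if db else url


def build_collector_settings(
    user: str,
    password: str,
    shards: List[List[str]],
    configsvr: List[str],
    src_mongos: List[str],
    dst_mongos: List[str],
) -> Dict[str, str]:
    """Hypothetical helper: derive the sharding-related collector.conf values."""
    return {
        # One URL per shard; shards are separated by semicolons.
        "mongo_urls": ";".join(mongo_url(user, password, s) for s in shards),
        "mongo_cs_url": mongo_url(user, password, configsvr, "admin"),
        "mongo_s_url": mongo_url(user, password, src_mongos, "admin"),
        "tunnel": "direct",
        # For a sharded target this is the mongos address of the target cluster.
        "tunnel.address": mongo_url(user, password, dst_mongos, "admin"),
    }


settings = build_collector_settings(
    "root",
    "rootPassw0rd",
    shards=[["192.168.86.23:27017"], ["192.168.86.24:27017"], ["192.168.86.25:27017"]],
    configsvr=["192.168.86.22:27017"],
    src_mongos=["192.168.86.21:27017"],
    dst_mongos=["192.168.86.21:27018"],
)
for key, value in settings.items():
    print(f"{key} = {value}")
```

The printed lines can be compared against the corresponding entries in the collector.conf listing.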

collector.conf

# If you have any problem, please check the FAQ and the wiki first: https://github.com/alibaba/MongoShake/wiki/FAQ
# For a detailed explanation of each parameter, please visit xxxx

# Current configuration file version. Do not modify this value.
conf.version = 10

# --------------------------- global configuration ---------------------------
# collector name
# The id is used when writing the pid file and similar output.
id = mongoshake

# high availability option.
# enable master election if set true. only one mongoshake can become master
# and do sync, the others will wait and at most one of them become master once
# previous master die. The master information stores in the `mongoshake` db in the source
# database by default.
# This option is useless when there is only one mongoshake running.
# Enable this option if primary and standby mongoshake instances pull from the same source.
master_quorum = false

# http api interface. Users can use this api to monitor mongoshake.
# `curl 127.0.0.1:9100`.
# We also provide a restful tool named "mongoshake-stat" to
# print ack, lsn, checkpoint and qps information based on this api.
# usage: `./mongoshake-stat --port=9100`
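# RESTful monitoring ports for full and incremental sync; the internal metrics can be viewed with curl. See the wiki for details.
full_sync.http_port = 9101
incr_sync.http_port = 9100
# profiling on net/http/profile
# Profiling port, used to inspect the internal Go stacks.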
system_profile_port = 9200

# global log level: debug, info, warning, error. Lower-level messages are filtered out.
log.level = info
# Log directory. The log and pid files are stored in this directory.
# If not set, the default is "./logs/" under the current path.
log.dir =
# log file name.
log.file = collector.log
# Log flush switch. If set to false, logs are buffered and may not all be printed on exit.
# If set to true, every log entry is flushed immediately, which degrades performance badly;
# true is recommended when debugging.
log.flush = false

# sync mode: all/full/incr. default is incr.
# all means full synchronization + incremental synchronization.
# full means full synchronization only.
# incr means incremental synchronization only.
sync_mode = all

# Source MongoDB connection string. Set the username and password if authentication is enabled;
# in passwordless mode "username:password@" can be omitted. Note: the password must not contain '@'.
# Separate the nodes of one replica set with commas (,). E.g., mongodb://username1:password1@primaryA,secondaryB,secondaryC
# Separate shards with semicolons (;) if the source is sharded. E.g., mongodb://username1:password1@primaryA,secondaryB,secondaryC;mongodb://username2:password2@primaryX,secondaryY,secondaryZ
#mongo_urls = mongodb://username:password@127.0.0.1:20040,127.0.0.1:20041
mongo_urls = mongodb://root:rootPassw0rd@192.168.86.23:27017;mongodb://root:rootPassw0rd@192.168.86.24:27017;mongodb://root:rootPassw0rd@192.168.86.25:27017
# please fill the source config server url if source mongodb is sharding.
#mongo_cs_url = mongodb://platform-exporter:A2m3EC8oRRaA0skZ@10.127.25.143:21207,10.127.25.144:21207,10.127.25.147:21207/admin
mongo_cs_url = mongodb://root:rootPassw0rd@192.168.86.22:27017/admin

# Please give at least one mongos address if the source is sharded.
# If the source is fetched via change stream, at least one mongos address must also be configured here; separate multiple mongos addresses with commas (,).
#mongo_s_url =
mongo_s_url = mongodb://root:rootPassw0rd@192.168.86.21:27017/admin

# enable source ssl
mongo_ssl_root_ca_file =

# Tunnel (pipeline) type. Currently supported: rpc, file, kafka, mock, direct.
tunnel = direct
# tunnel target resource url
# for rpc. this is remote receiver socket address
# for tcp. this is remote receiver socket address
# for file. this is the file path, for instance "data"
# for kafka. this is the topic and brokers address which split by comma, for
# instance: topic@brokers1,brokers2, default topic is "mongoshake"
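# for mock. this is useless
# for direct. this is target mongodb address which format is the same as `mongo_urls`. If
# the target is sharding, this should be the mongos address.
# direct mode writes straight into MongoDB; the other modes are meant for analysis or
# long-distance transfer scenarios. Note: in non-direct modes the data must be parsed by
# the receiver; see the FAQ for details.
# Configure the tunnel address here; the format is the same as mongo_urls.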
#tunnel.address = mongodb://127.0.0.1:20080
#tunnel.address =
tunnel.address = mongodb://root:rootPassw0rd@192.168.86.21:27018/admin

# The message format in the tunnel; only used for the kafka and file tunnel types.
# "raw": the default. Batched raw data format with good performance, but it carries control
# information, so it must be parsed by the receiver.
# "json": single oplog in JSON format, convenient for users to read directly.
# "bson": single oplog in binary BSON format.
tunnel.message = raw
# How many partitions will be written; the hash function is chosen by "incr_sync.shard_key".
# If the target is kafka, this is the maximum number of partitions used; it must not exceed
# "incr_sync.worker". Default is 1.
tunnel.kafka.partition_number = 1
# tunnel json format, it'll only take effect in the case of tunnel.message = json
# and tunnel == kafka. Set canonical_extended_json if you want to use "Canonical
# Extended JSON Format", #559.
tunnel.json.format =
# if tunnel == direct and ssl is enabled
tunnel.mongo_ssl_root_ca_file =

# connect mode:
# primary: fetch data from primary.
# secondaryPreferred: fetch data from a secondary if there is one, otherwise from the primary. (default, recommended)
# standalone: fetch data from the single given node, whether it is primary, secondary or hidden. This is only
# supported when the tunnel type is direct.
mongo_connect_mode = secondaryPreferred

# filter db or collection namespace. at most one of these two parameters can be given.
# if the filter.namespace.black is not empty, the given namespace will be
# filtered while others namespace passed.
# if the filter.namespace.white is not empty, the given namespace will be
# passed while others filtered.
# all the namespace will be passed if no condition given.
# db and collection connected by the dot(.).
# different namespaces are split by the semicolon(;).
# filter: filterDbName1.filterCollectionName1;filterDbName2
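# Black/white list filtering; regular expressions are not supported for now. The white list names the
# namespaces that pass, the black list the namespaces that are filtered out; the two cannot be set
# together. Namespaces are separated by semicolons; each can be a db or a db.collection.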
filter.namespace.black =
#filter.namespace.white =
filter.namespace.white = duanshuaixing01
# Some databases like "admin", "local", "mongoshake", "config", "system.views" are
# filtered by default; users can enable them for special needs.
# Different databases are separated by semicolons (;).
# e.g., admin;mongoshake.
# Pay attention: collections such as "admin.xxx" are not supported, except "system.views".
# Normally this parameter should not be set; it is only meant for very special scenarios
# that need to sync databases such as admin or mongoshake.
filter.pass.special.db =
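# Whether to sync DDL. If false, only the oplog commands whose oplog.op is "i", "d" or "u"
# are transferred. If true, DDL such as create index, drop database and MongoDB 4.0
# transactions is transferred as well. Enabling it is not supported yet when the source is sharded.
# If the target is sharded, the applyOps command (including transactions) is not supported yet.
filter.ddl_enable = false
# Filter the oplog gid if enabled.
# If the source MongoDB has gid enabled but the target does not support it, the sync fails;
# enabling this filter strips the gid field.
# Enable with caution: it hurts mongoshake's own performance badly.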
filter.oplog.gids = false

# checkpoint info, used for resuming from a break point.
# checkpoint.storage.url is the MongoDB address the checkpoint is written to. E.g., mongodb://127.0.0.1:20070
# If not set, the checkpoint is written into the source MongoDB (db=mongoshake), for both
# replica sets and sharded clusters.
# From version 2.4 on, this no longer needs to be set to the source config server address.
checkpoint.storage.url =
# checkpoint db's name.
checkpoint.storage.db = mongoshake
# checkpoint collection's name. If several mongoshake instances pull from the same source,
# change this name to avoid conflicts.
#checkpoint.storage.collection = ckpt_default
checkpoint.storage.collection = duanshuaixing-sync-20230304-1638
# set if enable ssl
checkpoint.storage.url.mongo_ssl_root_ca_file =
# real checkpoint: the position from which oplog fetching starts.
# Pay attention: this is UTC time, which is 8 hours behind CST. This
# variable is only used when the checkpoint does not exist.
# If a checkpoint already exists (in the storage location above), this parameter has no effect;
# to force fetching from this position, delete the existing checkpoint first. See the FAQ.
# If no checkpoint exists and the value is 1970-01-01T00:00:00Z, all oplog currently on the source is fetched.
# If no checkpoint exists and the value is not 1970-01-01T00:00:00Z, mongoshake first checks whether the
# oldest oplog on the source is newer than the given time; if so, it reports an error and exits.
checkpoint.start_position = 1970-01-01T00:00:00Z

# transform from source db or collection namespace to dest db or collection namespace.
# at most one of these two parameters can be given.
# transform: fromDbName1.fromCollectionName1:toDbName1.toCollectionName1;fromDbName2:toDbName2
# Namespace transformation, e.g. a.b becomes c.d after the sync. Enable with caution: it is fairly expensive.
transform.namespace =

# --------------------------- full sync configuration ---------------------------
# The maximum number of collections fetched concurrently.
# E.g., 6 means mongoshake fetches at most 6 collections at the same time.
full_sync.reader.collection_parallel = 6
# The number of document writer threads per collection.
# E.g., 8 means 8 threads write concurrently for the same collection.
full_sync.reader.write_document_parallel = 8
# The batch size of writes to the target.
# E.g., 128 means one thread aggregates 128 documents before writing them.
full_sync.reader.document_batch_size = 128
# Max number of fetching threads per collection. Default is 1 (single-threaded). Requires the splitVector privilege.
# Note: for a single collection, this is only supported when the values of the index are all of the
# same type. Do not enable this option if the types differ!
full_sync.reader.parallel_thread = 1
# The index scanned for parallel fetching; it must also be set if full_sync.reader.parallel_thread is set.
# The index must have exactly 1 field and its values must all be of the same type.
# _id is recommended for replica sets; the shard key is recommended for sharded clusters.
full_sync.reader.parallel_index = _id

# Whether to drop a same-name collection on the target before full synchronization.
# true: drop first and then sync; false: do not drop.
full_sync.collection_exist_drop = true

# create index option: whether to create indexes in the full sync stage.
# none: do not create indexes.
# foreground: create foreground indexes once the data sync of the full sync stage finishes.
# background: create background indexes when starting.
full_sync.create_index = none

# Convert insert to update when a duplicate key is found,
# i.e. when the _id already exists on the target.
full_sync.executor.insert_on_dup_update = false
# Whether to filter orphan documents when the source is a sharded cluster.
full_sync.executor.filter.orphan_document = false
# Enable majority write on the target in full sync.
# Performance degrades if enabled.
full_sync.executor.majority_enable = false

# --------------------------- incremental sync configuration ---------------------------
# fetch method:
# oplog: fetch oplog from source mongodb (default)
# change_stream: use change streams to receive change events from source mongodb, supports MongoDB >= 4.0.
# we recommend using change_stream if possible.
#incr_sync.mongo_fetch_method = oplog
incr_sync.mongo_fetch_method = change_stream

# How update events are delivered. false: only the updated fields are fetched;
# true: the full content of the updated document is fetched.
# Only takes effect when mongo_fetch_method = change_stream, and it lowers performance somewhat.
incr_sync.change_stream.watch_full_document = false

# global id. used in active-active replication.
# this parameter is not supported on current open-source version.
# The gid prevents circular replication in active-active setups and is currently only used by
# Alibaba Cloud MongoDB. To enable gid for sync between Alibaba Cloud instances, contact
# Alibaba Cloud after-sales support. For sharding with multiple gids, separate them with semicolons (;).
incr_sync.oplog.gids =

# distribute data to different worker by hash key to run in parallel.
# [auto] 		decide by if there has unique index in collections.
# 		 		use `collection` if has unique index otherwise use `id`.
# [id] 			shard by ObjectId. handle oplogs in sequence by unique _id
# [collection] 	shard by ns. handle oplogs in sequence by unique ns
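# Hash method: id hashes by document, collection hashes by collection, auto picks the hash type automatically.
# If there is no unique index, id is recommended for very high sync performance; otherwise choose collection.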
incr_sync.shard_key = collection
# If shard_key is collection and some collections do not have a unique index, list those
# collections here to hash them by _id and raise concurrency.
# Users must make sure no unique index will ever be created on these collections; mongoshake
# crashes and exits immediately once it detects one.
# E.g., db1.collection1;db2.collection2. Specifying only a db is not supported.
incr_sync.shard_by_object_id_whitelist =

# oplog transmit worker concurrency.
# if the source is sharding, worker number must equal to shard numbers.
# The number of internal transmit workers; it can be raised if the machine is powerful enough.
incr_sync.worker = 8

# How many writing threads are used in one worker.
# For non-direct tunnels such as kafka, this is the number of serialization threads; it must be
# a multiple of "incr_sync.worker". Defaults to the value of "incr_sync.worker".
incr_sync.tunnel.write_thread = 0

# Set the sync delay, just like the slaveDelay parameter of a MongoDB secondary. Unit: seconds.
# E.g., keep the target 20 minutes behind the source. 0 disables the delay.
incr_sync.target_delay = 0

# Memory queue configuration; please see the FAQ document for more details.
# Do not modify these variables if the performance and resource usage
# meet your needs.
incr_sync.worker.batch_queue_size = 64
incr_sync.adaptive.batching_max_size = 1024
incr_sync.fetcher.buffer_capacity = 256

# --- direct tunnel only begin ---
# if tunnel type is direct, all the variables below should be set

# Change the oplog to Insert when an Update finds no matching document (_id or unique-index),
# i.e. when the _id does not exist on the target.
incr_sync.executor.upsert = false
# Change the oplog to Update when an Insert hits a duplicated key (_id or unique-index),
# i.e. when the _id already exists on the target.
incr_sync.executor.insert_on_dup_update = false
# db. write duplicated logs to mongoshake_conflict
# If a write conflicts, record the conflicting documents.
incr_sync.conflict_write_to = none

# Enable majority write on the target in incremental sync.
# Performance degrades if enabled.
incr_sync.executor.majority_enable = false

# --- direct tunnel only end ---

# Special field that identifies the source type; empty by default. For Alibaba Cloud MongoDB serverless clusters, set it to aliyun_serverless.
special.source.db.flag =
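
Several rules stated in the comments above (a sharded source needs a config server URL, change stream fetching needs a mongos address, the black and white lists are mutually exclusive, direct mode needs a target address) can be checked before starting MongoShake. The Python sketch below is only an illustration of such a pre-flight check; `parse_conf` and `check_sharded_source` are hypothetical helpers, not MongoShake tooling, and `SAMPLE` is a shortened copy of the settings in this article.

```python
from typing import Dict, List

SAMPLE = """\
# comment lines and blank lines are ignored
sync_mode = all
mongo_urls = mongodb://root:rootPassw0rd@192.168.86.23:27017;mongodb://root:rootPassw0rd@192.168.86.24:27017;mongodb://root:rootPassw0rd@192.168.86.25:27017
mongo_cs_url = mongodb://root:rootPassw0rd@192.168.86.22:27017/admin
mongo_s_url = mongodb://root:rootPassw0rd@192.168.86.21:27017/admin
tunnel = direct
tunnel.address = mongodb://root:rootPassw0rd@192.168.86.21:27018/admin
filter.namespace.black =
filter.namespace.white = duanshuaixing01
incr_sync.mongo_fetch_method = change_stream
"""


def parse_conf(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines, skipping comments and blank lines."""
    conf: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        conf[key.strip()] = value.strip()
    return conf


def check_sharded_source(conf: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    shards = [u for u in conf.get("mongo_urls", "").split(";") if u]
    if len(shards) > 1:  # more than one shard URL: the source is sharded
        if not conf.get("mongo_cs_url"):
            problems.append("mongo_cs_url is required for a sharded source")
        uses_change_stream = conf.get("incr_sync.mongo_fetch_method") == "change_stream"
        if uses_change_stream and not conf.get("mongo_s_url"):
            problems.append("mongo_s_url is required when fetching via change_stream")
    if conf.get("filter.namespace.black") and conf.get("filter.namespace.white"):
        problems.append("set at most one of filter.namespace.black / .white")
    if conf.get("tunnel") == "direct" and not conf.get("tunnel.address"):
        problems.append("tunnel.address is required for tunnel = direct")
    return problems


conf = parse_conf(SAMPLE)
print(check_sharded_source(conf) or "no problems found")
```

To check a real file, the same functions could be fed the text of collector.conf instead of `SAMPLE`.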

3. Deploy MongoShake and run it

kubectl create deployment mongo-shake --image=registry.baidubce.com/tools/mongo-shake-v2.6.4_2:latest

kubectl exec -it mongo-shake-xxx -- bash
Edit the configuration file, then start the synchronization:
bash start.sh ./collector.conf

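4. Issues with MongoShake on sharded clusters

1. Index properties are not synchronized; once the sync has finished, stop it and rebuild the indexes.
In testing, dropping an index without stopping the sync did not affect incremental data, which kept synchronizing normally.
db.user01.dropIndex( { userid: 1 } )
db.user01.createIndex( { userid: 1 }, { unique: true } )

2. User information is not synchronized; users have to be created manually.
After accounts are created on the new cluster, stopping and restarting the sync does not delete the existing accounts.

3. After the full sync completes, stopping and restarting the sync does not overwrite or delete the incremental data.

4. Resuming from a breakpoint is not supported. Make sure the cluster being synchronized is not restarted during the sync; otherwise the data is overwritten and synchronized again.

5. count returns inaccurate totals (data appears to be missing) on a MongoDB sharded cluster. This applies only when the collection is sharded.
1> A shard is migrating chunks, so duplicate documents appear.
Orphaned documents exist (caused by unclean shutdowns, failed chunk migrations and similar; starting with MongoDB 4.4, orphaned documents can be cleaned up automatically).
2> Difference between count and aggregate: in MongoDB, count and aggregate are implemented in two separate code paths. The aggregate implementation takes the sharded environment into account, so the official documentation recommends using aggregate to count documents in a sharded environment.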

db.user01.aggregate( [ { $group: { _id: null, count: { $sum: 1 } } } ] )

3> On the PRIMARY node of each shard, run cleanupOrphaned in the admin database to clean up orphaned documents
db.runCommand( { cleanupOrphaned: "duanshuaixing01.user01" } )
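
To see why the two counting paths can disagree, here is a toy Python model (not MongoDB code). It assumes, in line with the MongoDB documentation's warning about count on sharded collections, that an unfiltered count adds up whatever each shard physically stores, while the aggregation path only counts documents that a shard actually owns. The shard ranges and `userid` values are invented for the illustration.

```python
from typing import List, NamedTuple


class Shard(NamedTuple):
    lo: int            # inclusive lower bound of the owned userid range
    hi: int            # exclusive upper bound of the owned userid range
    stored: List[int]  # userid values physically present on this shard


def unfiltered_count(shards: List[Shard]) -> int:
    """Sum of everything stored on each shard, orphans included."""
    return sum(len(s.stored) for s in shards)


def owned_count(shards: List[Shard]) -> int:
    """Count only documents that fall inside the range the shard owns."""
    return sum(1 for s in shards for userid in s.stored if s.lo <= userid < s.hi)


shards = [
    Shard(0, 100, [1, 42, 99]),
    # userid 42 belongs to the first shard; the copy here is an orphan,
    # e.g. left behind by a failed chunk migration.
    Shard(100, 200, [100, 150, 42]),
    Shard(200, 300, [250]),
]

print(unfiltered_count(shards))  # 7: the orphan is counted twice
print(owned_count(shards))       # 6: the real number of documents
```

Once the orphan is removed (which is what cleanupOrphaned does on a real shard), both numbers agree.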

6. Full + incremental synchronization fails
Error message: [CRIT] run replication failed: incr sync ts[7206215134182637577[1677827708, 9]] is less than current oldest ts[7206839828585906182[1677973156, 6]], this error means user's oplog collection size is too small or full sync continues too long

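Cause: this error usually occurs right after MongoShake finishes the full sync and starts the incremental sync, when the oplog it needs has already been lost on the source MongoDB. For example, if the full sync starts at time A and ends at time B, the incremental sync then tries to fetch the oplog starting from A. So once A has been purged from the source MongoDB's oplog collection (local.oplog.rs), this error occurs. Users can check the oplog window with rs.printReplicationInfo(). The fix is to increase the size of the oplog collection. Since v2.4, users can also enable full_sync.reader.oplog_store_disk to store the oplog on local disk during the full sync stage.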
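
The two large numbers in the error message are 64-bit oplog timestamps: the high 32 bits hold the Unix time in seconds and the low 32 bits hold a counter, which is why the log prints them next to a `[seconds, increment]` pair. The short Python sketch below decodes the values from the message above to show how far the requested position lies behind the oldest oplog entry; `decode_ts` is a name chosen for this example.

```python
from datetime import datetime, timezone
from typing import Tuple


def decode_ts(ts: int) -> Tuple[int, int]:
    """Split a 64-bit oplog timestamp into (unix_seconds, increment)."""
    return ts >> 32, ts & 0xFFFFFFFF


incr_ts = 7206215134182637577    # position MongoShake wants to resume from
oldest_ts = 7206839828585906182  # oldest entry still present in the source oplog

for label, ts in (("incr sync ts", incr_ts), ("oldest oplog ts", oldest_ts)):
    seconds, increment = decode_ts(ts)
    when = datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat()
    print(f"{label}: seconds={seconds} increment={increment} utc={when}")

gap_seconds = decode_ts(oldest_ts)[0] - decode_ts(incr_ts)[0]
print(f"requested position is {gap_seconds} s ({gap_seconds / 3600:.1f} h) older than the oplog")
```

Here the gap is 145448 seconds, about 40 hours, so the oplog window on the source would have had to cover at least that much history for the incremental stage to start.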



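Oplog persistence parameters for incremental sync
full_sync.reader.oplog_store_disk: deprecated in version 2.4. Once enabled, the source oplog is fetched and persisted to local disk during the full sync, so that the oplog needed by the incremental sync is not deleted on the source, which would make the incremental sync fail. Default: false.
full_sync.reader.oplog_store_disk_max_size: upper limit of the local oplog persistence during the full sync, in MB. If the limit is exceeded before the full sync finishes, nothing more is written to disk. Default: 256000.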

full_sync.reader.oplog_store_disk = true
full_sync.reader.oplog_store_disk_max_size = 25600000