How Scheduling Is Generated During TiDB Scale-In, and Common Problems | TiDB Scaling Guide (Part 2)

每天读点书学堂

Published on 2025-04-06 10:47:13

Tags: tidb, database, database architecture, distributed

Article link: https://blog.csdn.net/weixin_42241611/article/details/147022645

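Introduction

Scaling out and in is one of the most common operations on a TiDB cluster, as is natural for a distributed database.

This is the second article in the TiDB scaling series and is based on TiDB v7.5. The first article covered how PD generates scheduling during scale-out and the common problems there; this one focuses on how scheduling is generated during scale-in and the problems you may run into.

It describes in detail how PD (Placement Driver) generates scheduling commands during scale-in and how TiKV executes them. It first explains what scale-in is and why it matters, then examines how scheduling is generated, including resource-balance considerations and the process of generating scheduling commands. It also lists common problems encountered during scale-in, with ways to diagnose and resolve them.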

Overview

TiKV-scale-in

For TiKV, scale-in and scale-out are essentially the same thing: after a TiKV node is removed or added, data is migrated until all TiKV nodes are balanced again.

scale in with 1 tikv node

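As shown in the figure, to take store-4 offline, PD migrates the data on store-4 to the other nodes. Each region replica is migrated in the same way as during scale-out; the only difference is that PD names this kind of operator replace-rule-offline-peer.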
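The PD log shown later in this article lists four steps for a replace-rule-offline-peer operator: add a learner on the target store, enter joint consensus, leave the joint state, and remove the peer on the offline store. The following is a minimal sketch of how a region's peer roles change across those steps. It is a toy model, not PD code: the function name, the role strings, and stores 2 and 3 are assumptions made for illustration.

```python
def apply_replace_offline_peer(peers, offline_store, target_store):
    """Toy model: return the store -> role map after each of the four logged steps."""
    peers = dict(peers)  # work on a copy so the caller's map is untouched
    history = []

    # Step 0: add a learner peer on the target store.
    peers[target_store] = "learner"
    history.append(dict(peers))

    # Step 1: enter joint consensus (promote the learner, demote the offline voter).
    peers[target_store] = "incoming_voter"   # illustrative role name
    peers[offline_store] = "demoting_voter"  # illustrative role name
    history.append(dict(peers))

    # Step 2: leave the joint state; the role changes take full effect.
    peers[target_store] = "voter"
    peers[offline_store] = "learner"
    history.append(dict(peers))

    # Step 3: remove the peer on the offline store.
    del peers[offline_store]
    history.append(dict(peers))
    return history


# Store IDs 166543141 and 1 come from the log; stores 2 and 3 are hypothetical.
history = apply_replace_offline_peer(
    {166543141: "voter", 2: "voter", 3: "voter"},
    offline_store=166543141,
    target_store=1,
)
for step, roles in enumerate(history):
    print(step, roles)
```

Note that the region keeps three voting members throughout, which is the reason the replacement is added before the offline peer is removed.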

scale in with 1 tikv node-2

TiKV status on PD

status of the tikv on pd

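TiKV periodically reports its status to PD through heartbeats. Based on that status, PD generates the scheduling needed to keep resources balanced across the cluster. You can use pd-ctl to inspect the detailed status of each TiKV as seen by PD. The TiKV states currently fall into the following categories: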

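Up: the normal state.
Offline: a scale-in has been initiated manually.
Disconnect: PD marks a TiKV as Disconnect once it has received no heartbeat from it for 20s. At this point the data on the TiKV is not yet moved away.
- As soon as the TiKV resumes heartbeats, it returns to Up.
- It becomes Offline only if it is taken offline manually.
Down: PD marks a TiKV as Down once it has received no heartbeat from it for more than half an hour.
- In this state, the data on the TiKV is gradually moved away.
- It becomes Offline only if it is taken offline manually.
- As soon as the TiKV resumes heartbeats, it immediately returns to Up.
Tombstone: once all data on an Offline TiKV has been moved away, the TiKV can be removed safely and PD marks it Tombstone. A Tombstone store can never be recovered.
- A TiKV that has been Down for a long time and holds no data must be taken offline manually before it becomes Tombstone; otherwise it stays Down.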
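The rules above can be summarized as a small classifier. This is a simplified sketch of the state rules as this article describes them, not PD's actual implementation: `classify_store` and its parameters are hypothetical names, and the thresholds are the 20s and half-hour values quoted in the list.

```python
from enum import Enum

DISCONNECT_AFTER_S = 20     # from the text: no heartbeat for 20s
DOWN_AFTER_S = 30 * 60      # from the text: no heartbeat for more than half an hour


class StoreState(Enum):
    UP = "Up"
    DISCONNECT = "Disconnect"
    DOWN = "Down"
    OFFLINE = "Offline"
    TOMBSTONE = "Tombstone"


def classify_store(heartbeat_age_s, offline_requested, region_count):
    """Hypothetical helper mapping the inputs described in the list to a state."""
    if offline_requested:
        # A store taken offline becomes Tombstone only once all its data is gone.
        return StoreState.TOMBSTONE if region_count == 0 else StoreState.OFFLINE
    if heartbeat_age_s > DOWN_AFTER_S:
        # Without a manual offline request, the store stays Down even when empty.
        return StoreState.DOWN
    if heartbeat_age_s >= DISCONNECT_AFTER_S:
        return StoreState.DISCONNECT
    return StoreState.UP


# An empty store that has been silent for an hour is still Down, not Tombstone,
# until someone takes it offline manually.
print(classify_store(3600, offline_requested=False, region_count=0).value)
print(classify_store(3600, offline_requested=True, region_count=0).value)
```

The ordering of the checks encodes the point made in the last bullet: a manual offline request is the only path to Tombstone.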

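Common problems

Determining why a TiKV went offline

As described above, once a TiKV enters the Offline state, the resources on that TiKV are released quickly. The reduced capacity, combined with data migration, causes some performance jitter during this process. From the state transitions above, a TiKV is typically taken out of service for one of two reasons:

TiKV crashes and becomes Down: in production, machine failure is a common, frequent, and expected problem.
Manual scale-in, which puts the store in Offline: a routine operations task.

In general, the PD log shows the specific reason a store went offline. For a TiKV that was taken offline normally, the following sample logs show exactly when it went offline: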

Manually scaling in a TiKV node

// Scale in with tiup manually: 
tiup cluster scale-in shirlyv7.5.2 --node "127.0.0.1:20165"

PD receives DeleteStore through the API and sets the store status to offline

// PD log:
// Step 1: PD receives DeleteStore through the API and sets the store status to `offline`
[2024/07/30 22:31:00.444 +08:00] [INFO] [audit.go:126] ["audit log"] [service-info="{ServiceLabel:DeleteStore, Method:HTTP/1.1/DELETE:/pd/api/v1/store/166543141, Component:anonymous, IP:127.0.0.1, Port:60818, StartTime:2024-07-30 22:31:00 +0800 CST, URLParam:{}, BodyParam:}"]
[2024/07/30 22:31:00.444 +08:00] [WARN] [cluster.go:1516] ["store has been offline"] [store-id=166543141] [store-address=127.0.0.1:20164] [physically-destroyed=false]
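To find when a store was taken offline in a long PD log, you can scan for the "store has been offline" message. The following is a minimal sketch that assumes the line layout shown in the sample above; `find_offline_events` is a hypothetical helper, and the regular expression may need adjusting for other PD versions.

```python
import re
from datetime import datetime

# Assumes the exact field layout of the "store has been offline" line above.
OFFLINE_RE = re.compile(
    r'^\[(?P<ts>[^\]]+)\] \[WARN\] \[[^\]]+\] \["store has been offline"\] '
    r'\[store-id=(?P<store_id>\d+)\] \[store-address=(?P<addr>[^\]]+)\]'
)

SAMPLE_LINE = (
    '[2024/07/30 22:31:00.444 +08:00] [WARN] [cluster.go:1516] '
    '["store has been offline"] [store-id=166543141] '
    '[store-address=127.0.0.1:20164] [physically-destroyed=false]'
)


def find_offline_events(lines):
    """Return (timestamp, store_id, address) for each matching log line."""
    events = []
    for line in lines:
        m = OFFLINE_RE.match(line)
        if m is None:
            continue
        # PD timestamps look like "2024/07/30 22:31:00.444 +08:00".
        ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S.%f %z")
        events.append((ts, int(m.group("store_id")), m.group("addr")))
    return events


for ts, store_id, addr in find_offline_events([SAMPLE_LINE]):
    print(ts.isoformat(), store_id, addr)
```

In practice you would pass the lines of the PD log file instead of the single sample line.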

PD sets the remove-peer store limit to unlimited to speed up operator generation

// Step 2: PD sets the remove-peer store limit to unlimited to speed up operator generation.
[2024/07/30 22:31:00.447 +08:00] [INFO] [cluster.go:2627] ["store limit changed"] [store-id=166543141] [type=remove-peer] [rate-per-min=100000000]

The PatrolRegion goroutine notices the offline store

// Step 3: The PatrolRegion goroutine notices the offline store and creates replace-rule-offline-peer operators to move data to other stores.
[2024/07/30 22:31:00.447 +08:00] [INFO] [operator_controller.go:488] ["add operator"] [region-id=2848] [operator="\"replace-rule-offline-peer {mv peer: store [166543141] to [1]} (kind:replica,region, region:2848(181, 23), createAt:2024-07-30 22:31:00.447901052 +0800 CST m=+3045380.050049300, startAt:0001-01-01 00:00:00 +0000 UTC, currentStep:0, size:93, steps:[0:{add learner peer 215218780 on store 1}, 1:{use joint consensus, promote learner peer 215218780 on store 1 to voter, demote voter peer 166548760 on store 166543141 to learner}, 2:{leave joint state, promote learner peer 215218780 on store 1 to voter, demote voter peer 166548760 on store 166543141 to learner}, 3:{remove peer on store 166543141}], timeout:[17m0s])\""] [additional-info=]
….
[2024/07/31 00:15:45.497 +08:00] [INFO] [operator_controller.go:635] ["operator finish"] [region-id=144173] [takes=1.88436143s] [operator="\"replace-rule-offline-leader-peer {mv peer: store [166543141] to [1]} (ki