Notes from an incident: a sudden power loss corrupted a TiDB cluster's PD db files, and the cluster failed to start
Error output in pd.log:
["run server failed"] [error="[PD:leveldb:ErrLevelDBOpen]leveldb: manifest corrupted (field 'comparer'): missing [file=MANIFEST-000030]"] [stack="main.main\n\t/home/jenkins/agent/workspace/build-common/go/src/github.com/pingcap/pd/cmd/pd-server/main.go:122\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:225"]
Repairing PD with pd-recover
Full documentation: PD Recover User Guide | PingCAP Docs
1. Get the Cluster ID
Get the Cluster ID from the PD log:
cd /TiDB/tidb-deploy/pd-2379/log/
cat pd.log | grep "init cluster id"
[2022/04/20 12:23:07.079 +08:00] [INFO] [server.go:358] ["init cluster id"] [cluster-id=7088536805883498676]
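If you want just the numeric ID rather than the whole log line (for example, to feed it to pd-recover later), the bracketed field can be stripped with awk. This is a sketch run on a copy of the log line above, not live output:

```shell
# Extract the numeric cluster-id from a PD log line.
# The sample line is copied from the log output above.
line='[2022/04/20 12:23:07.079 +08:00] [INFO] [server.go:358] ["init cluster id"] [cluster-id=7088536805883498676]'
echo "$line" | awk -F'cluster-id=' '{print $2}' | tr -d ']'
# prints 7088536805883498676
```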
2. Get the largest allocated ID
cat pd*.log | grep "idAllocator allocates a new id" | awk -F'=' '{print $2}' | awk -F']' '{print $1}' | sort -r -n | head -n 1
3500
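To see how the pipeline above takes an allocator log line apart: the first awk splits on `=` to isolate the allocated ID, the second strips the trailing `]`, and `sort -r -n | head -n 1` then keeps the largest value across all matching lines. Here is the field extraction run on a single hypothetical sample line (the `id.go:123` source location is made up for illustration):

```shell
# A made-up sample line in the shape of PD's id allocator log output;
# only the message text and the [alloc-id=N] field matter here.
line='[2022/04/20 12:30:00.000 +08:00] [INFO] [id.go:123] ["idAllocator allocates a new id"] [alloc-id=3500]'
echo "$line" \
  | grep "idAllocator allocates a new id" \
  | awk -F'=' '{print $2}' \
  | awk -F']' '{print $1}'
# prints 3500
```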
3. Deploy a new PD cluster
Before deploying the new PD cluster, stop the current PD cluster, then move the old data directory out of the way.
1. Stop the old cluster
tiup cluster stop tidb
2. Move the old cluster's PD data directory aside (renaming it keeps a backup instead of deleting it)
mv /TiDB/tidb-data/pd-2379 /TiDB/tidb-data/pd-2379.bak
3. Edit the old cluster's meta.yaml and comment out the PD-related configuration
/root/.tiup/storage/cluster/clusters/tidb/meta.yaml
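The exact layout of meta.yaml depends on your deployment, but in a typical tiup meta file the PD instances live under `topology.pd_servers`. Commenting that block out might look like the sketch below; the host and paths mirror this cluster, but treat it as an illustration, not the literal file contents:

```yaml
# Sketch only: comment out every line of the pd_servers block in meta.yaml.
topology:
  # pd_servers:
  # - host: 10.66.0.135
  #   deploy_dir: /TiDB/tidb-deploy/pd-2379
  #   data_dir: /TiDB/tidb-data/pd-2379
  #   log_dir: /TiDB/tidb-deploy/pd-2379/log
```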
4. Deploy the new PD
Create a new topology.yaml with the following configuration:
# # Global variables are applied to all deployments and used as the default value of
# # the deployments if a specific deployment value is missing.
global:
  # # The user who runs the tidb cluster.
  user: "tidb"
  # # SSH port of servers in the managed cluster.
  ssh_port: 22
  # # Storage directory for cluster deployment files, startup scripts, and configuration files.
  deploy_dir: "/TiDB/tidb-deploy"
  # # TiDB cluster data storage directory.
  data_dir: "/TiDB/tidb-data"
  # # Supported values: "amd64", "arm64" (default: "amd64")
  arch: "amd64"

# # Monitoring ports and directories for each node in the TiDB cluster.
monitored:
  # # The communication port for reporting system information of each node in the TiDB cluster.
  node_exporter_port: 9120
  # # Blackbox_exporter communication port, used for TiDB cluster port monitoring.
  blackbox_exporter_port: 9125
  # # Storage directory for deployment files, startup scripts, and configuration files of monitoring components.
  deploy_dir: "/TiDB/tidb-deploy/monitored-9120"
  # # Data storage directory of monitoring components.
  data_dir: "/TiDB/tidb-data/monitored-9120"
  # # Log storage directory of the monitoring component.
  log_dir: "/TiDB/tidb-deploy/monitored-9120/log"

# # Server configs are used to specify the configuration of PD Servers.
pd_servers:
  # # The ip address of the PD Server.
  - host: 10.66.0.135
    deploy_dir: "/TiDB/tidb-deploy/pd-2379"
    data_dir: "/TiDB/tidb-data/pd-2379"
    log_dir: "/TiDB/tidb-deploy/pd-2379/log"
Deploy the new PD. The version must be the same as the old cluster's version:
tiup cluster deploy tidb-test v5.4.0 ./topology.yaml --user root -p
Start the new PD:
tiup cluster start tidb-test
4. Repair with the pd-recover tool
Toolkit download: https://download.pingcap.org/tidb-community-toolkit-v5.4.1-linux-amd64.tar.gz
1. Install pd-recover
tar -xf tidb-community-toolkit-v5.4.1-linux-amd64.tar.gz
cd tidb-community-toolkit-v5.4.1-linux-amd64/bin
cp pd-recover /TiDB/tidb-deploy/pd-2379/bin
2. Run pd-recover, passing the Cluster ID obtained in step 1 and an -alloc-id safely larger than the largest allocated ID found in step 2 (here 35000 > 3500), so that IDs handed out by the recovered PD cannot collide with IDs already in use:
cd /TiDB/tidb-deploy/pd-2379/bin
./pd-recover -endpoints http://10.66.0.141:2379 -cluster-id 7088536805883498676 -alloc-id 35000
5. Restart the old cluster
1. Stop the new PD cluster
tiup cluster stop tidb-test
2. Delete or rename the new PD cluster's meta.yaml file
mv /root/.tiup/storage/cluster/clusters/tidb-test/meta.yaml /root/.tiup/storage/cluster/clusters/tidb-test/meta.yaml.bak
3. Uncomment the PD configuration in the old cluster's meta.yaml file
/root/.tiup/storage/cluster/clusters/tidb/meta.yaml
4. Restart the old cluster
tiup cluster restart tidb
If some TiKV instances fail to come up, run tiup cluster start tidb again.
PD recovery is complete.