After modifying the configuration with tiup cluster edit-config tidb-dev and restarting the cluster, the pd component failed to start:
Error: failed to start pd: failed to start: pd 10.37.62.111:2379, please check the instance's log(/data01/MPP/tidb/tidb-deploy/pd-2379/log) for more detail.: timed out waiting for port 2379 to be started after 2m0s
Verbose debug logs has been written to /home/xx/.tiup/logs/tiup-cluster-debug-2022-07-06-10-33-32.log.
Error: run `/home/xx/.tiup/components/cluster/v1.3.2/tiup-cluster` (wd:/home/tydic/.tiup/data/TAloSQM) failed: exit status 1
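The debug log named in the error output shows that tiup started the pd service remotely over ssh, then kept failing to detect pd's port until it timed out with an error after 2 minutes.
The command the log shows being run after ssh: systemctl start pd-2379.service
Contents of pd-2379.service: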
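To reproduce the port check by hand on the pd host, a minimal sketch follows. It is a stand-in, not TiUP's actual implementation: the function name wait_for_port is hypothetical, and the probe relies on bash's /dev/tcp redirection, so it assumes a bash built with network redirections enabled.

```shell
# wait_for_port HOST PORT TIMEOUT_SECONDS
# Hypothetical helper: returns 0 as soon as a TCP connection to HOST:PORT
# succeeds, or 1 once roughly TIMEOUT_SECONDS one-second retries have failed.
wait_for_port() {
    host="$1"
    port="$2"
    timeout="$3"
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        # In bash, opening /dev/tcp/HOST/PORT attempts a TCP connect.
        # The subshell keeps file descriptor 3 from leaking into the caller.
        if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

For example, wait_for_port 192.168.129.110 2379 120 || journalctl -u pd-2379.service -n 50 --no-pager prints the last lines of the unit's journal if the port never opens. A refused connection fails immediately, but an unreachable host can block each attempt for longer than one second, so the timeout is only approximate.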
exec bin/pd-server \
--name="pd-192.168.129.110-2379" \
--client-urls="http://0.0.0.0:2379" \
--advertise-client-urls="http://192.168.129.110:2379" \
--peer-urls="http://0.0.0.0:2380" \
--advertise-peer-urls="http://192.168.129.110:2380" \
--data-dir="/data01/MPP/tidb/tidb-data/pd-2379" \
--initial-cluster="pd-192.168.129.111-2379=http://192.168.129.111:2380,pd-192.168.129.112-2379=http://192.168.129.112:2380,pd-192.168.129.113-2379=http://192.168.129.113:2380,pd-192.168.129.114-2379=http://192.168.129.114:2380,pd-192.168.129.110-2379=http://192.168.129.110:2380" \
--config=conf/pd.toml
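Copying this command out and running it by hand showed the actual error: pd.toml contained a configuration item that pd did not recognize. I edited pd.toml directly, then ran sudo systemctl start pd-2379.service, and pd started successfully; I started pd on the other nodes the same way.
After that, tiup cluster start cluster-name succeeded.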
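The manual check of pd.toml can also be done without starting the server. The sketch below is an assumption-laden helper, not part of TiUP: the function name check_pd_config is hypothetical, and it assumes the deployed pd-server build accepts the --config-check flag. If your version does not, run the full command copied from the service in the foreground instead, as described above.

```shell
# check_pd_config DEPLOY_DIR
# Hypothetical helper: validates conf/pd.toml under a pd deploy directory
# and returns non-zero if the binary is missing or the check fails.
check_pd_config() {
    deploy_dir="$1"
    if [ ! -x "$deploy_dir/bin/pd-server" ]; then
        echo "pd-server not found under $deploy_dir/bin" >&2
        return 1
    fi
    # Run from the deploy directory so the relative paths match the
    # ones used in the service's start command.
    # Assumption: this pd-server build supports --config-check.
    (cd "$deploy_dir" && bin/pd-server --config=conf/pd.toml --config-check)
}
```

With the paths from this cluster, the call would be check_pd_config /data01/MPP/tidb/tidb-deploy/pd-2379, run on each pd node before starting pd-2379.service.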