[My openGauss Story] Handling openGauss GAUSS-51400/53600 errors when the other nodes report state Unknown
1. Check the status
[omm@Euler1 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Unavailable
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------
1 Euler1 172.16.220.151 26000 6001 /gauss/data/db1 P Down Manually stopped
2 Euler2 172.16.220.152 26000 6002 /gauss/data/db1 S Unknown Unknown
3 Euler3 172.16.220.153 26000 6003 /gauss/data/db1 C Unknown Unknown
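To pull the affected nodes out of output like the above, a small filter can help. This is a minimal sketch that assumes the column layout shown here (row number first, node name second, state last); `unknown_nodes` is a hypothetical helper name, not part of the openGauss tools.

```shell
# Hypothetical helper: print the names of nodes whose state is Unknown.
# Reads the text of `gs_om -t status --detail` on stdin and assumes the
# column layout shown above (row number first, node name second).
unknown_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $NF == "Unknown" { print $2 }'
}

# Usage: gs_om -t status --detail | unknown_nodes
```

For the status output above, this should list Euler2 and Euler3, the two nodes to look at first.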
2. GAUSS-51400
[omm@Euler1 ~]$ gs_om -t start
Starting cluster.
=========================================
omm@euler2's password:
[GAUSS-51400] : Failed to execute the command: scp Euler3:/gauss/app_5b3e5810/bin/cluster_dynamic_config /gauss/app_5b3e5810/bin/cluster_dynamic_config_Euler3. Error:
ssh: connect to host euler3 port 22: No route to host
The Euler3 host is the problem: on inspection it had not booted properly, so the host was restarted.
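Because gs_om copies files between nodes over SSH, a "No route to host" on port 22 can be caught before starting the cluster. The following is a rough sketch, assuming bash and the coreutils `timeout` command are available; `port_open` and `check_nodes` are hypothetical helper names.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port opens
# within 3 seconds. Relies on bash's /dev/tcp and coreutils `timeout`.
port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "${2:-22}" 2>/dev/null
}

# Probe the SSH port (22) on every host given as an argument.
check_nodes() {
    for host in "$@"; do
        if port_open "$host" 22; then
            echo "$host: ssh reachable"
        else
            echo "$host: ssh NOT reachable"
        fi
    done
}

# Usage: check_nodes Euler1 Euler2 Euler3
```

A host that reports "NOT reachable" here is worth checking at the console or hypervisor level before retrying gs_om.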
3. GAUSS-53600/51400
Starting the cluster again now fails with GAUSS-53600/51400:
[omm@Euler1 ~]$ gs_om -t start
Starting cluster.
=========================================
omm@euler2's password:
omm@euler3's password:
[SUCCESS] Euler1
2023-07-11 16:33:56.783 64ad13f4.1 [unknown] 140702557879360 [unknown] 0 dn_6001_6002_6003 01000 0 [BACKEND] WARNING: could not create any HA TCP/IP sockets
2023-07-11 16:33:56.785 64ad13f4.1 [unknown] 140702557879360 [unknown] 0 dn_6001_6002_6003 01000 0 [BACKEND] WARNING: Failed to initialize the memory protect for g_instance.attr.attr_storage.cstore_buffers (16 Mbytes) or shared memory (1000 Mbytes) is larger.
=========================================
[GAUSS-53600]: Can not start the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off, Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] Euler2:
.[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StartInstance.py' -U omm -R /gauss/app -t 300 --security-mode=off. Error:
[FAILURE] Euler3:
The start script is failing to execute and Python looks like the unreliable part, so stop the node and investigate.
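The `Error:` field in the messages above is empty, so one option is to re-run the exact failing command on an affected node and read its output directly. The sketch below only extracts that command from the log text; `failed_commands` is a hypothetical helper, and the message format is assumed to match the lines shown above.

```shell
# Hypothetical helper: extract the command text from GAUSS-51400 lines of
# the form "[GAUSS-51400] : Failed to execute the command: <cmd>. Error:".
# Reads log text on stdin and prints each distinct command once.
failed_commands() {
    sed -n 's/.*\[GAUSS-51400] : Failed to execute the command: \(.*\)\. Error:.*/\1/p' | sort -u
}

# Usage (assumes the gs_om output was saved to start.log, and that ssh to
# the failing node works for the omm user):
#   cmd=$(failed_commands < start.log | head -n 1)
#   ssh Euler2 "$cmd"
```

Running the command by hand on Euler2 or Euler3 may surface the Python traceback or shell error that gs_om does not relay.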
[omm@Euler1 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------
1 Euler1 172.16.220.151 26000 6001 /gauss/data/db1 P Primary Normal
2 Euler2 172.16.220.152 26000 6002 /gauss/data/db1 S Unknown Unknown
3 Euler3 172.16.220.153 26000 6003 /gauss/data/db1 C Unknown Unknown
[omm@Euler1 ~]$ gs_ctl stop -D /gauss/data/db1
[2023-07-11 16:35:58.075][39021][][gs_ctl]: gs_ctl stopped ,datadir is /gauss/data/db1
waiting for server to shut down......... done
[omm@Euler1 ~]$ python
Python 3.7.4 (default, Mar 3 2022, 14:19:16)
[GCC 7.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
[omm@Euler1 ~]$
[omm@Euler1 ~]$
[omm@Euler1 ~]$ which python
/usr/bin/python
[omm@Euler1 ~]$ cd /usr/bin/
[root@Euler1 bin]# ls -lsa python
0 lrwxrwxrwx 1 root root 7 Jul 4 16:33 python -> python3
[root@Euler1 bin]# rm python
rm: remove symbolic link 'python'? y
[root@Euler1 bin]# ln -s python2.7 python
Delete the symlink and point python at python2 instead.
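Note that the failing commands in the log invoke `python3` by name, so the `python` symlink may not be what the OM scripts actually use. A quick way to see what each name resolves to is sketched below; `resolve_interp` is a hypothetical helper, and `readlink -f` assumes GNU coreutils.

```shell
# Hypothetical helper: show which file a command name finally resolves to,
# following any chain of symlinks (readlink -f is GNU coreutils).
resolve_interp() {
    target=$(command -v "$1") || { echo "$1: not found"; return 1; }
    echo "$1 -> $(readlink -f "$target")"
}

# Usage:
#   resolve_interp python
#   resolve_interp python3
```

Running this on all three nodes makes it easy to compare interpreters between the node that starts (Euler1) and the ones that fail.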
[omm@Euler1 ~]$ gs_om -t status --detail
[ Cluster State ]
cluster_state : Degraded
redistributing : No
current_az : AZ_ALL
[ Datanode State ]
node node_ip port instance state
---------------------------------------------------------------------------------
1 Euler1 172.16.220.151 26000 6001 /gauss/data/db1 P Primary Normal
2 Euler2 172.16.220.152 26000 6002 /gauss/data/db1 S Unknown Unknown
3 Euler3 172.16.220.153 26000 6003 /gauss/data/db1 C Unknown Unknown
[omm@Euler1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
[GAUSS-53606]: Can not stop the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast, Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast. Error:
[FAILURE] Euler1:
[FAILURE] Euler2:
[FAILURE] Euler3:
..
[omm@Euler1 ~]$ ls -lsa /gauss/om/script/local/StopInstance.py
8 -rwx------ 1 omm dbgrp 4719 Nov 12 2022 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$ chmod 777 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$ gs_om -t stop
Stopping cluster.
=========================================
[GAUSS-53606]: Can not stop the database, the cmd is source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast, Error:
[GAUSS-51400] : Failed to execute the command: source /home/omm/.bashrc; python3 '/gauss/om/script/local/StopInstance.py' -U omm -R /gauss/app -t 300 -m fast. Error:
[FAILURE] Euler1:
[FAILURE] Euler2:
[FAILURE] Euler3:
Stopping the cluster again still fails, and it keeps failing even after the permissions were changed.
[omm@Euler1 ~]$ ls -lsa /gauss/om/script/local/StopInstance.py
8 -rwxrwxrwx 1 omm dbgrp 4719 Nov 12 2022 /gauss/om/script/local/StopInstance.py
[omm@Euler1 ~]$
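The original mode `-rwx------` with owner omm already allows omm to run the script, which would be consistent with `chmod 777` making no difference. A small check like the following can confirm access without loosening permissions; it is a sketch only, `check_script_access` is a hypothetical helper, and `stat -c` assumes GNU coreutils.

```shell
# Hypothetical helper: print owner, group and mode of a script, then
# succeed only if the current user can both read and execute it.
check_script_access() {
    f=$1
    [ -e "$f" ] || { echo "$f: missing"; return 1; }
    stat -c '%U:%G %a %n' "$f"
    [ -r "$f" ] && [ -x "$f" ]
}

# Usage (run as omm on each node):
#   check_script_access /gauss/om/script/local/StopInstance.py && echo OK
```

If this reports OK on every node, the permissions are probably not the cause and the script can be returned to its original mode.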