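A power module failure in the data center cut power to some of the servers, and unfortunately our rack was among them. Once the servers came back up, we had to start the Codis services manually.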
Codis components have to be started in this order:
1. Start the dashboard
2. Start Redis
3. Start the proxy
4. Bring the proxy online
bash /usr/local/codis/scripts/start_dashboard.sh
The command reported that the dashboard had started, but nothing was listening on its port, 18087. The log showed the following:
tail -f /data/codis/codis_dashboard/logs/dashboard.log
2017/06/30 21:37:44 dashboard.go:170: [info] dashboard listening on addr: :18087
2017/06/30 21:37:44 dashboard.go:244: [fatal] dashboard already exists: {"addr": "10.10.3.124:18087", "pid": 20430}
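Because the start script reported success even though nothing was listening on 18087, it is safer to check the port directly after starting the dashboard. The following is a minimal bash sketch, not part of Codis. It assumes bash was built with /dev/tcp support and that the coreutils `timeout` command is installed, as on most Linux servers. The names port_open and wait_for_port are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Sketch: confirm a port is really accepting TCP connections instead of
# trusting a start script's message. Assumes bash /dev/tcp support and
# the coreutils `timeout` command (hypothetical helper names below).

# port_open HOST PORT -> exit 0 if a TCP connect succeeds within 1 second
port_open() {
  timeout 1 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# wait_for_port HOST PORT [TRIES] -> poll once per second, up to TRIES times
wait_for_port() {
  local host="$1" port="$2" tries="${3:-10}" i
  for ((i = 1; i <= tries; i++)); do
    port_open "$host" "$port" && return 0
    sleep 1
  done
  return 1
}

# Example: give the dashboard a few seconds to open 18087.
if wait_for_port 127.0.0.1 18087 3; then
  echo "dashboard port 18087 is listening"
else
  echo "port 18087 not listening; check dashboard.log"
fi
```

A failed check is the cue to look at dashboard.log, which is how the error above was found.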
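The root cause: the dashboard has to be stopped with kill {pid} rather than kill -9 {pid}. With kill -9, or with a power loss, the dashboard exits abnormally without removing the node it registered in ZooKeeper, so the next start fails with the error above.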
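To see why the signal matters, here is a small bash sketch that only illustrates the mechanism and is not Codis code. A stand-in process "registers" itself by creating a temp file (playing the role of the dashboard's ZooKeeper node) and removes it in a SIGTERM trap. A plain kill delivers SIGTERM, which the trap can handle. kill -9 delivers SIGKILL, which cannot be caught, so the marker is left behind. A power cut is the same situation: no cleanup code gets a chance to run. The run_demo function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: why `kill {pid}` lets a process clean up but `kill -9 {pid}` does not.
# A temp file stands in for the dashboard's ZooKeeper node (illustration only).

# run_demo SIGNAL -> prints "cleaned up" or "left behind"
run_demo() {
  local sig="$1" marker pid
  marker="$(mktemp)"                        # "register" on startup
  # Stand-in process: on SIGTERM it deletes its marker, then exits.
  bash -c 'trap "rm -f \"$1\"; exit 0" TERM; while :; do sleep 0.1; done' _ "$marker" &
  pid=$!
  sleep 0.5                                 # let the child install its trap
  kill -s "$sig" "$pid"                     # TERM = plain kill, KILL = kill -9
  wait "$pid" 2>/dev/null
  if [ -e "$marker" ]; then
    echo "left behind"
    rm -f "$marker"                         # tidy up the leftover marker
  else
    echo "cleaned up"
  fi
}

term_result="$(run_demo TERM)"
kill_result="$(run_demo KILL)"
echo "SIGTERM: $term_result"
echo "SIGKILL: $kill_result"
```

On a typical Linux host this should print "SIGTERM: cleaned up" followed by "SIGKILL: left behind", which mirrors the stale dashboard node after kill -9 or a power failure.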
In fact, the dashboard warns about this in its log every time it starts:
2015/09/26 11:18:41 dashboard.go:160: [INFO] dashboard listening on addr: :18087
2015/09/26 11:18:41 dashboard.go:143: [INFO] dashboard node created: /zk/codis/db_test/dashboard, {"addr": "localhost:18087", "pid": 1701}
2015/09/26 11:18:41 dashboard.go:144: [WARN] ********** Attention **********
2015/09/26 11:18:41 dashboard.go:145: [WARN] You should use `kill {pid}` rather than `kill -9 {pid}` to stop me,
2015/09/26 11:18:41 dashboard.go:146: [WARN] or the node resisted on zk will not be cleaned when I'm quiting and you must remove it manually
2015/09/26 11:18:41 dashboard.go:147: [WARN] *******************************
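So, as the warning says, we need to delete the dashboard node from ZooKeeper by hand. In our deployment the node is /zk/codis/db_codis/dashboard (the sample log above comes from a different product, db_test):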
/usr/local/zookeeper-3.4.6/bin/zkCli.sh -server 127.0.0.1:2181
[zk: 127.0.0.1:2181(CONNECTED) 2] ls /zk/codis/db_codis
[fence, slots, servers, proxy, migrate_tasks, dashboard, LOCK, actions, ActionResponse]
[zk: 127.0.0.1:2181(CONNECTED) 3] rmr /zk/codis/db_codis/dashboard
After the node is deleted, the dashboard starts normally.
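One precaution before deleting the node: it is the node that makes Codis refuse a second dashboard ("dashboard already exists"), so it is worth confirming that the recorded process is really gone. Below is a minimal bash sketch. It assumes you run it on the host named in the node's addr field and paste in the node's JSON, for example as shown by get /zk/codis/db_codis/dashboard in zkCli.sh. is_stale is a hypothetical helper, not a Codis or ZooKeeper command.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a dashboard record in ZooKeeper looks stale before rmr.
# Input is the node's JSON data; run on the host named in "addr" (assumption).

# is_stale JSON -> exit 0 if the recorded pid is not a live process on this host
is_stale() {
  local json="$1" pid
  # Extract the digits after "pid": from the JSON (avoids a jq dependency).
  pid="$(printf '%s\n' "$json" | sed -n 's/.*"pid":[[:space:]]*\([0-9][0-9]*\).*/\1/p')"
  [ -n "$pid" ] || return 1                 # no pid found: don't guess, treat as not stale
  ! ps -p "$pid" >/dev/null 2>&1            # ps -p succeeds if any process has that pid
}

node='{"addr": "10.10.3.124:18087", "pid": 20430}'
if is_stale "$node"; then
  echo "no process with that pid here; removing the dashboard node looks safe"
else
  echo "pid is alive (or unparsable); inspect it with ps before removing the node"
fi
```

After a reboot the old pid may have been reused by an unrelated process, so an "alive" result is only a prompt to check what that process is, not proof that a dashboard is still running.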