KingbaseES R6 cluster: running sys_backup.sh backups after changing the ssh port (case study)

Database environment:

test=# select version();
                                                       version                                                    
------------------------------------------------------------------------------------------------------------------
 KingbaseES V008R006C003B0010 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46), 64-bit
(1 row)

Operating system:

[kingbase@node1 bin]$ cat /etc/centos-release
CentOS Linux release 7.2.1511 (Core)

Cluster architecture:

Case description:

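1) This case was run in a general-purpose server environment. sys_backup.sh calls sys_rman to take physical backups; in a cluster environment it connects to the remote nodes over ssh, so changing the ssh port prevents sys_backup.sh from running normally.
2) For the cluster itself to keep running after the ssh port change, only the ssh_options variable in repmgr.conf needs to be modified.
3) To take physical backups with sys_backup.sh after the ssh port change, the connection port must be changed in every ssh statement in the sys_backup.sh script, and there are many such places.
4) Recommendation: if many deployments need sys_backup.sh backups after an ssh port change, use a variable in the sys_backup.sh script to specify the ssh port number.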
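The recommendation in point 4 could look roughly like the sketch below. It is only an illustration, not the actual sys_backup.sh source: the names `SSH_PORT` and `build_ssh_cmd` are hypothetical, and the ssh options are copied from the ssh statements quoted later in this case.

```shell
#!/bin/bash
# Hypothetical sketch: keep the ssh port in one variable instead of
# hard-coding "-p 2222" in every ssh statement of the script.
# SSH_PORT and build_ssh_cmd are illustrative names, not part of sys_backup.sh.

# Default to 22; override with e.g. "SSH_PORT=2222 ./script.sh".
SSH_PORT="${SSH_PORT:-22}"

# Print the ssh command prefix to use for every remote call.
build_ssh_cmd() {
    echo "ssh -p ${SSH_PORT} -n -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey --"
}

# Example: show the command that would be run against a remote node.
echo "$(build_ssh_cmd) kingbase@192.168.7.248 date"
```

With this layout a later port change touches one assignment rather than every ssh line in the script.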

I. Check the current cluster status

[kingbase@node2 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                                                                                                
----+---------+---------+-----------+----------+----------+----------+----------+----------------
 1  | node248 | standby |   running | node249  | default  | 100      | 6        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 6        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count

II. Change the ssh port in the operating system and the cluster configuration file (all nodes)

1) Check the system's original ssh port (default 22)

[kingbase@node2 bin]$ netstat -antlp |grep 22
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp        0      0 192.168.122.1:53        0.0.0.0:*               LISTEN      -                   
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      -                   
tcp        0      0 192.168.7.249:22        192.168.7.116:55883     ESTABLISHED -                   
tcp6       0      0 :::22                   :::*                    LISTEN      -

2) Check the ssh port used in the cluster's repmgr.conf

[kingbase@node2 bin]$ cat ../etc/repmgr.conf|grep ssh
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 22'

=== By default, -p 22 specifies the cluster's ssh communication port ===

3) Change the operating system's ssh port

[root@node1 ~]# cat /etc/ssh/sshd_config|grep -i Port
# If you want to change the port on a SELinux system, you have to tell
# semanage port -a -t ssh_port_t -p tcp #PORTNUMBER
Port 2222
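The edit behind this step can be scripted. The helper below is a minimal sketch under stated assumptions: `set_sshd_port` is a hypothetical name, it treats any line of the form `Port <n>` or `#Port <n>` as the directive to replace, and it is demonstrated on a temporary copy rather than on /etc/ssh/sshd_config. As the comment in sshd_config itself notes, SELinux systems also need the new port registered with semanage.

```shell
#!/bin/bash
# Hypothetical helper: set the "Port" directive in an sshd_config-style file.
# Try it on a copy first; editing the real file requires root.
set_sshd_port() {
    local file="$1" port="$2" tmp
    tmp="$(mktemp)"
    if grep -qE '^[[:space:]]*#?[[:space:]]*Port[[:space:]]+[0-9]+' "$file"; then
        # Rewrite existing (possibly commented-out) Port lines.
        sed -E "s/^[[:space:]]*#?[[:space:]]*Port[[:space:]]+[0-9]+.*/Port ${port}/" "$file" > "$tmp" \
            && cat "$tmp" > "$file"
    else
        # No Port line at all: append one.
        echo "Port ${port}" >> "$file"
    fi
    rm -f "$tmp"
}

# Demonstration on a throwaway file.
demo="$(mktemp)"
printf '%s\n' '# semanage port -a -t ssh_port_t -p tcp #PORTNUMBER' '#Port 22' > "$demo"
set_sshd_port "$demo" 2222
cat "$demo"
rm -f "$demo"
```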

4) Change the cluster's ssh communication port (to 2222)

[kingbase@node1 bin]$ cat ../etc/repmgr.conf |grep ssh
ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ServerAliveInterval=2 -o ServerAliveCountMax=5 -p 2222'
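Since this change has to be repeated on every node, it may help to script it. The following is a sketch only: `set_repmgr_ssh_port` is a hypothetical helper, and it assumes the `ssh_options` line already contains a `-p <port>` argument, as it does in this cluster. If no `-p` is present, the file is left unchanged.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the "-p <port>" argument inside the
# ssh_options=... line of a repmgr.conf-style file.
set_repmgr_ssh_port() {
    local file="$1" port="$2" tmp
    tmp="$(mktemp)"
    # Only touch the ssh_options line; every other setting is copied as-is.
    sed -E "/^ssh_options=/ s/-p[[:space:]]+[0-9]+/-p ${port}/" "$file" > "$tmp" \
        && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Demonstration on a throwaway file.
demo="$(mktemp)"
echo "ssh_options='-q -o ConnectTimeout=10 -o StrictHostKeyChecking=no -p 22'" > "$demo"
set_repmgr_ssh_port "$demo" 2222
cat "$demo"
rm -f "$demo"
```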

5) Restart the sshd service

[root@node1 ~]# systemctl restart sshd
[root@node1 ~]# netstat -an |grep 22
tcp        0      0 0.0.0.0:2222            0.0.0.0:*               LISTEN
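Note that `grep 22` matches any line containing "22", including port 2222 and addresses such as 192.168.122.1. A stricter check could look like the sketch below. `listening_on_port` is a hypothetical helper, and it assumes the `netstat -an` column layout shown in this case (local address in column 4, state in column 6).

```shell
#!/bin/bash
# Hypothetical helper: read "netstat -an"-style lines on stdin and print only
# LISTEN sockets whose local address ends in exactly ":<port>".
listening_on_port() {
    awk -v p="$1" '$6 == "LISTEN" {
        n = split($4, a, ":")      # the port is the last ":"-separated field
        if (a[n] == p) print
    }'
}

# Demonstration with sample lines; on a real host use:
#   netstat -an | listening_on_port 2222
printf '%s\n' \
    'tcp 0 0 192.168.122.1:53 0.0.0.0:* LISTEN' \
    'tcp 0 0 0.0.0.0:2222 0.0.0.0:* LISTEN' \
    | listening_on_port 2222
```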

6) Test the ssh connection over the non-default port

[root@node1 ~]# ssh -p 2222 node2
Last failed login: Mon Mar  1 17:06:07 CST 2021 from 192.168.7.116 on ssh:notty
There were 2 failed login attempts since the last successful login.
Last login: Mon Mar  1 16:43:29 2021 from 192.168.7.249

=== As shown above, the ssh trust relationship still works after the port change ===

7) Test restarting the cluster with sys_monitor.sh

[kingbase@node1 bin]$ ./sys_monitor.sh restart
2021-03-01 17:29:55 Ready to stop all DB ...
Service process "node_export" was killed at process 11833
Service process "postgres_ex" was killed at process 11834
Service process "node_export" was killed at process 9343
Service process "postgres_ex" was killed at process 9344
2021-03-01 17:30:00 begin to stop repmgrd on "[192.168.7.248]".
2021-03-01 17:30:01 repmgrd on "[192.168.7.248]" stop success.
2021-03-01 17:30:01 begin to stop repmgrd on "[192.168.7.249]".
2021-03-01 17:30:02 repmgrd on "[192.168.7.249]" stop success.
2021-03-01 17:30:02 begin to stop DB on "[192.168.7.249]".
waiting for server to shut down..... done
server stopped
2021-03-01 17:30:04 DB on "[192.168.7.249]" stop success.
2021-03-01 17:30:04 begin to stop DB on "[192.168.7.248]".
waiting for server to shut down......... done
server stopped
2021-03-01 17:30:11 DB on "[192.168.7.248]" stop success.
2021-03-01 17:30:11 Done.
2021-03-01 17:30:11 Ready to start all DB ...
2021-03-01 17:30:11 begin to start DB on "[192.168.7.248]".
waiting for server to start.... done
server started
2021-03-01 17:30:12 execute to start DB on "[192.168.7.248]" success, connect to check it.
2021-03-01 17:30:13 DB on "[192.168.7.248]" start success.
2021-03-01 17:30:13 Try to ping trusted_servers on host 192.168.7.248 ...
2021-03-01 17:30:16 Try to ping trusted_servers on host 192.168.7.249 ...
2021-03-01 17:30:18 begin to start DB on "[192.168.7.249]".
waiting for server to start.... done
server started
2021-03-01 17:30:20 execute to start DB on "[192.168.7.249]" success, connect to check it.
2021-03-01 17:30:21 DB on "[192.168.7.249]" start success. 
 ID | Name    | Role    | Status    | Upstream  | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+-----------+----------+----------+----------+---------------------------------------------------------------------------------------------------------------------------------------------------
 1  | node248 | standby |   running | ! node249 | default  | 100      | 6        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |           | default  | 100      | 6        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3

WARNING: following issues were detected
  - node "node248" (ID: 1) is not attached to its upstream node "node249" (ID: 2)
2021-03-01 17:30:21 The primary DB is started.
2021-03-01 17:30:25 Success to load virtual ip [192.168.7.240/24] on primary host [192.168.7.249].
2021-03-01 17:30:25 Try to ping vip on host 192.168.7.248 ...
2021-03-01 17:30:28 Try to ping vip on host 192.168.7.249 ...
2021-03-01 17:30:30 begin to start repmgrd on "[192.168.7.248]".
[2021-03-01 17:30:31] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 17:30:31] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 17:30:31 repmgrd on "[192.168.7.248]" start success.
2021-03-01 17:30:31 begin to start repmgrd on "[192.168.7.249]".
[2021-03-01 17:29:25] [NOTICE] using provided configuration file "/home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf"
[2021-03-01 17:29:25] [NOTICE] redirecting logging output to "/home/kingbase/cluster/R6HA/KHA/kingbase/hamgr.log"
2021-03-01 17:30:32 repmgrd on "[192.168.7.249]" start success.
 ID | Name    | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
----+---------+---------+-----------+----------+---------+-------+---------+--------------------
 1  | node248 | standby |   running | node249  | running | 16767 | no      | 0 second(s) ago
 2  | node249 | primary | * running |          | running | 17865 | no      | n/a
2021-03-01 17:30:38 Done.

8) Check the cluster node status

[kingbase@node1 bin]$ ./repmgr cluster show
 ID | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string
----+---------+---------+-----------+----------+----------+----------+----------+----------------
 1  | node248 | standby |   running | node249  | default  | 100      | 6        | host=192.168.7.248 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count=3
 2  | node249 | primary | * running |          | default  | 100      | 6        | host=192.168.7.249 user=esrep dbname=esrep port=54321 connect_timeout=10 keepalives=1 keepalives_idle=10 keepalives_interval=1 keepalives_count

=== As shown above, cluster communication is normal after the ssh port change ===

III. Run sys_backup.sh backups after changing the ssh port (all nodes)

1) Test stopping the backup that was set up before the ssh port change

[kingbase@node1 bin]$ ./sys_backup.sh stop
Disable all sys_rman in crontab-daemon
ssh: connect to host 192.168.7.248 port 22: Connection refused
ssh: connect to host 192.168.7.248 port 22: Connection refused
ssh: connect to host 192.168.7.248 port 22: Connection refused

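=== As shown above, when sys_backup.sh takes backups in a cluster environment it connects to the remote nodes over ssh; after the port change it still tries port 22, so the ssh connection fails ===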

2) Change the ssh port in the sys_backup.sh script

=== Modify the "_ssh_cmd_" variable ===

# local function
_ssh_cmd_="ssh -p 2222 -n -o ConnectTimeout=30 -o StrictHostKeyChecking=no -o PreferredAuthentications=publickey -- "
function _log () {
        echo "$*" >> /tmp/sys_backup.sh.log
} # end of _log

=== Modify the ssh communication port in "_gene_ssh_pwd_less" ===

function _gene_ssh_pwd_less() {
        _ip="${1}"
        _user="${2}"
        # 1. check whether pwd-less work
        ssh -p 2222 -t -o ConnectTimeout=30 -o PreferredAuthentications=publickey ${_user}@${_ip} date 1>/dev/null 2>/dev/null
        _local2remote_rt=$?
        ssh -p 2222 -t -o ConnectTimeout=30 -o PreferredAuthentications=publickey ${_user}@${_ip} "ssh -p 2222 ${_user}@${_repo_ip} date>/dev/null 2>/dev/null" 2>/dev/null

=== Set the ssh port used when configuring passwordless ssh ===

        # set local.pub to remote, get remote.pub to local
        _remote_pub_buf=` ssh -p 2222 -q -o StrictHostKeyChecking=no -o ConnectTimeout=30 -o PreferredAuthentications=password -- ${_user}@${_ip} \
                "if [ ! -f \\${HOME}/.ssh/id_rsa.pub ] ; then echo -e '\ny' | ssh-keygen -t rsa -N '' >/dev/null 2>/dev/null ; fi;echo ${_t_buf_pub} >> \\${HOME}/.ssh/authorized_keys;chmod 600 \\${HOME}/.ssh/authorized_keys;cat  \\${HOME}/.ssh/id_rsa.pub;" `
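Because the ssh statements are scattered through the script, it is easy to miss one. The sketch below lists lines that invoke ssh without a `-p` directly after it. It is a rough heuristic, not part of sys_backup.sh: `find_ssh_without_port` is a hypothetical name, it only recognises `-p` as the first option after `ssh`, and it will also flag comments that contain the word "ssh" followed by a space.

```shell
#!/bin/bash
# Hypothetical helper: print "<line number>: <line>" for every line of a script
# that contains an ssh invocation not immediately followed by "-p".
find_ssh_without_port() {
    awk '
    BEGIN {
        # "ssh" as a command word: not part of names like _ssh_cmd_, .ssh/ or ssh-keygen.
        base = "(^|[^A-Za-z0-9_./-])ssh[ \t]+"
    }
    {
        all = $0; withp = $0
        n_all = gsub(base, "&", all)         # all ssh invocations on the line
        n_p   = gsub(base "-p", "&", withp)  # those that start with -p
        if (n_all > n_p) print FNR ": " $0
    }' "$1"
}

# Usage (path is an example): find_ssh_without_port ./sys_backup.sh
```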

IV. Run the sys_backup.sh backup

1) Run init to initialize the backup

[kingbase@node2 bin]$ ./sys_backup.sh init
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
ERROR: check stanza failed, check log file /tmp/sys_rman_check.log

=== The script reports an error: check stanza failed ===

Check the log:

[kingbase@node2 bin]$ cat /tmp/sys_rman_check.log
2021-03-01 12:38:56.011 P00   INFO: check command begin 2.27: --config=/home/kingbase/kbbr_repo/sys_rman.conf --log-level-console=info --log-level-file=info --log-path=/tmp --log-subprocess --kb2-host=192.168.7.248 --kb2-host-user=kingbase --kb1-path=/home/kingbase/cluster/R6HA/KHA/kingbase/data --kb2-path=/home/kingbase/cluster/R6HA/KHA/kingbase/data --kb1-port=54321 --kb2-port=54321 --kb1-user=esrep --kb2-user=esrep --repo1-path=/home/kingbase/kbbr_repo --stanza=kingbase
WARN: unable to check kb-2: [UnknownError] remote-0 process on '192.168.7.248' terminated unexpectedly [255]: ssh: connect to host 192.168.7.248 port 22: Connection refused
ERROR: [125]: remote-0 process on '192.168.7.248' terminated unexpectedly [255]: ssh: connect to host 192.168.7.248 port 22: Connection refused
2021-03-01 12:38:56.529 P00   INFO: check command end: aborted with exception [125]

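=== The log shows that check stanza has to connect to the standby over ssh, but the connection still uses the old port 22 instead of the new port 2222, so the connection to the standby fails and check stanza fails ===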
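When the log is long, the refused host and port can be pulled out directly. The helper below is a sketch for that: `refused_ssh_targets` is a hypothetical name, and it relies only on the OpenSSH message text "ssh: connect to host ... port ...: Connection refused" seen in the log above.

```shell
#!/bin/bash
# Hypothetical helper: print unique "host port" pairs taken from
# "ssh: connect to host H port P: Connection refused" messages in a log file.
refused_ssh_targets() {
    grep -oE 'ssh: connect to host [^ ]+ port [0-9]+: Connection refused' "$1" \
        | awk '{ sub(/:$/, "", $7); print $5, $7 }' \
        | sort -u
}

# Usage: refused_ssh_targets /tmp/sys_rman_check.log
```

For the log above this would report host 192.168.7.248 with port 22, which confirms that the check is not picking up the new port.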

3) Comment out the stanza check in the sys_backup.sh script (skip check stanza)

#${_rman_bin} --config=${_rman_conf_file} --stanza=${_stanza_name} --log-level-console=info check >>/tmp/sys_rman_check.log 2>&1
#if [ "X0" != "X$?" ] ; then
#        echo "ERROR: check stanza failed, check log file /tmp/sys_rman_check.log"
#        exit 3
#fi
echo "# create stanza and check...DONE"

4) Run the sys_backup.sh backup again

# init: initialize

[kingbase@node2 bin]$ ./sys_backup.sh init
# generate local sys_rman.conf...DONE
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...
# update all node: sys_rman.conf and archive_command with sys_rman.archive-push...DONE
# create stanza and check...(maybe 60+ seconds)
# create stanza and check...DONE
# initial first full backup...(maybe several minutes)
# initial first full backup...DONE
# Initial sys_rman OK.
'sys_backup.sh start' should be executed when need back-rest feature.

# start: begin scheduled backups

[kingbase@node2 bin]$ ./sys_backup.sh start
Enable some sys_rman in crontab-daemon
Set full-backup in 7 days
Set incr-backup in 1 days
0 2 */7 * * kingbase /home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase --archive-copy --type=full backup >>/tmp/sys_rman_backup_full.log 2>&1
0 4 */1 * * kingbase /home/kingbase/cluster/R6HA/KHA/kingbase/bin/sys_rman --config=/home/kingbase/kbbr_repo/sys_rman.conf --stanza=kingbase --archive-copy --type=incr backup >>/tmp/sys_rman_backup_incr.log 2>&1

# pause: pause backups

[kingbase@node2 bin]$ ./sys_backup.sh pause
Puase the sys_rman...DONE

# unpause: resume backups

[kingbase@node2 bin]$ ./sys_backup.sh unpause
Un-Puase the sys_rman...DONE

# stop: stop backups

[kingbase@node2 bin]$ ./sys_backup.sh stop
Disable all sys_rman in crontab-daemon
[kingbase@node2 bin]$ cat /etc/cron.d/KINGBASECRON 
*/1 * * * * kingbase . /etc/profile;/home/kingbase/cluster/R6HA/KHA/kingbase/bin/kbha -A daemon -f /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../etc/repmgr.conf >> /home/kingbase/cluster/R6HA/KHA/kingbase/bin/../kbha.log 2>&1
#*/1 * * * * kingbase  /home/kingbase/cluster/kha/db/bin/network_rewind.sh
#*/1 * * * * root  /home/kingbase/cluster/kha/kingbasecluster/bin/restartcluster.sh

=== As shown above, after changing the system ssh port, backups through sys_backup.sh succeed ===
