High-Availability Deployment (lightdb)

觅含

Last modified: 2022-12-02 10:36:22

Category: lightdb. Tags: server, network, linux

First published: 2022-12-02 10:25:30

Article link: https://blog.csdn.net/weixin_44031114/article/details/128142240

High availability

lightdb uses ltcluster to build a high-availability environment.

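1 Preparation

Apply the following settings on both the primary and the standby.

1.1 Hostname setup

Edit the hostname configuration file: vim /etc/hostname.
The examples below refer to the nodes as node1 and node2, so each machine must be able to resolve the other's hostname (for example through /etc/hosts).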

1.2 Set up the .pgpass file

If .pgpass does not exist in the home directory (~), create it.
Each line of .pgpass has the following format:
hostname:port:database:username:password
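
As a minimal sketch of creating such an entry from the shell, the helper below appends one line in the format above and restricts the file to its owner. The function name add_pgpass_entry and the sample values are hypothetical. The 0600 permission is what PostgreSQL's client library requires for .pgpass; it is assumed here that the lightdb client tools apply the same check.

```shell
# Append one entry to a .pgpass-style file and make it owner-only.
# add_pgpass_entry is a hypothetical helper, not part of lightdb.
add_pgpass_entry() {
    # $1 = file, $2 = host, $3 = port, $4 = database, $5 = user, $6 = password
    printf '%s:%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" "$6" >> "$1"
    # Assumption: as in PostgreSQL, a password file readable by group
    # or others is ignored by the client.
    chmod 600 "$1"
}

# Example (placeholder values):
# add_pgpass_entry ~/.pgpass node1 5432 ltcluster ltcluster 'secret'
```

The entry for the ltcluster user is the one that matters for the steps that follow, since ltcluster connects with that user and database.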

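1.3 Firewall

Configure the firewall as follows.
If you use firewalld, run the commands below, replacing 5432 in the first command with the port actually in use (123/udp is the NTP port). Rules added with --permanent only take effect after a reload.
firewall-cmd --permanent --add-port=5432/tcp
firewall-cmd --permanent --add-port=123/udp
firewall-cmd --reload
If you use iptables, run the commands below, replacing 5432 in the first command with the port actually in use.
iptables -A INPUT -p tcp --dport 5432 -j ACCEPT
iptables -A INPUT -p udp --dport 123 -j ACCEPT
If you use another firewall, follow its documentation to open the ports correctly.
If your environment allows the firewall to be turned off, you can stop and disable it with the commands below.
systemctl stop firewalld.service
systemctl disable firewalld.service
systemctl stop NetworkManager.service
systemctl disable NetworkManager.service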

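1.4 Passwordless SSH setup

Take a one-primary, one-standby setup as an example, with the primary at 192.168.10.110 and the standby at 192.168.10.128. Switch to the lightdb user on both machines and configure them as follows.

# Generate a key pair for passwordless authentication; run on every server
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa

# Distribute the public key; run on every server
ssh-copy-id lightdb@192.168.10.128
ssh-copy-id lightdb@192.168.10.110

# Verify: SSH from the primary to the standby; no password prompt should appear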
[lightdb@localhost install]$ ssh lightdb@192.168.10.128
Last login: Thu Aug 12 09:02:32 2021
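
The interactive login above can also be checked from a script. The sketch below relies on the standard OpenSSH options BatchMode and ConnectTimeout; the helper name check_passwordless_ssh is hypothetical.

```shell
# Return 0 only if key-based login to user@host works without any prompt.
# check_passwordless_ssh is a hypothetical helper name.
check_passwordless_ssh() {
    # $1 = user, $2 = host
    # BatchMode=yes makes ssh fail instead of asking for a password.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true
}

# Example, run on the primary:
# check_passwordless_ssh lightdb 192.168.10.128 && echo "passwordless SSH OK"
```

Running the check in both directions is worthwhile, because a later switchover is started from the standby and needs SSH access to the primary.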

2 Primary configuration

2.1 lightdb.conf settings

    # Enable replication connections; set this value to at least one more
    # than the number of standbys which will connect to this server
    # (note that ltcluster will execute "lt_basebackup" in WAL streaming mode,
    # which requires two free WAL senders).
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-replication.html#GUC-MAX-WAL-SENDERS

    max_wal_senders = 10

    # If using replication slots, set this value to at least one more
    # than the number of standbys which will connect to this server.
    # Note that ltcluster will only make use of replication slots if
    # "use_replication_slots" is set to "true" in "ltcluster.conf".
    # (If you are not intending to use replication slots, this value
    # can be set to "0").
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS

    max_replication_slots = 10

    # Ensure WAL files contain enough information to enable read-only queries
    # on the standby.
    #
    #  LightDB 21 and later: one of 'replica' or 'logical'
    #    ('hot_standby' will still be accepted as an alias for 'replica')
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-wal.html#GUC-WAL-LEVEL

    wal_level = 'hot_standby'

    # Enable read-only queries on a standby
    # (Note: this will be ignored on a primary but we recommend including
    # it anyway, in case the primary later becomes a standby)
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-replication.html#GUC-HOT-STANDBY

    hot_standby = on

    # Enable WAL file archiving
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-wal.html#GUC-ARCHIVE-MODE

    archive_mode = on

    # Set archive command to a dummy command; this can later be changed without
    # needing to restart the LightDB instance.
    #
    # See: https://www.hs.net/lightdb/docs/html/runtime-config-wal.html#GUC-ARCHIVE-COMMAND

    archive_command = '/bin/true'
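
After editing, it can help to read the values back from the file. The following is a rough sketch rather than a full parser: conf_value is a hypothetical helper that handles simple "name = value" lines with optional single quotes and a trailing comment, and it would mis-handle a quoted value that itself contains a # character.

```shell
# Print the value assigned to a parameter in a lightdb.conf-style file.
# conf_value is a hypothetical helper; commented-out settings are ignored.
conf_value() {
    # $1 = config file, $2 = parameter name
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" \
        | sed "s/[[:space:]]*#.*\$//; s/[[:space:]]*\$//; s/^'//; s/'\$//" \
        | tail -n 1   # if a parameter is set more than once, keep the last one
}

# Example (the file location under the data directory is an assumption):
# for p in max_wal_senders max_replication_slots wal_level hot_standby archive_mode; do
#     printf '%s = %s\n' "$p" "$(conf_value "$LTDATA/lightdb.conf" "$p")"
# done
```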

2.2 Create the ltcluster user and database

    createuser -s ltcluster
    createdb ltcluster -O ltcluster

2.3 lt_hba.conf settings

    local   replication   ltcluster                              trust
    host    replication   ltcluster      127.0.0.1/32            trust
    host    replication   ltcluster      192.168.1.0/24          trust

    local   ltcluster        ltcluster                              trust
    host    ltcluster        ltcluster      127.0.0.1/32            trust
    host    ltcluster        ltcluster      192.168.1.0/24          trust

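Note: change the ADDRESS column to match the network of your own machines. For example, if a machine's IP is 192.168.105.150, change 192.168.1.0/24 to 192.168.105.0/24. For the example nodes in section 1.4 (192.168.10.110 and 192.168.10.128) this would be 192.168.10.0/24.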
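
The substitution described in the note can be sketched in shell. Both helper names below are hypothetical, and the sketch assumes an IPv4 address on a /24 network, as in the examples of this article.

```shell
# Derive the /24 network for an IPv4 address,
# for example 192.168.105.150 -> 192.168.105.0/24.
subnet24() {
    # ${1%.*} drops the last octet of the address.
    printf '%s.0/24\n' "${1%.*}"
}

# Print the two network "host" rules from section 2.3 for a given node IP.
hba_host_rules() {
    printf 'host    replication   ltcluster      %s          trust\n' "$(subnet24 "$1")"
    printf 'host    ltcluster     ltcluster      %s          trust\n' "$(subnet24 "$1")"
}

# Example:
# hba_host_rules 192.168.10.110
```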

2.4 Standby preparation

From the standby, connect to the database on the primary.

ltsql 'host=node1 user=ltcluster dbname=ltcluster connect_timeout=2'

Continue only after this connection succeeds. If it fails, check the firewall status, the SSH port, the configuration files, and so on.

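2.5 Create the ltcluster.conf file (required on both primary and standby)

The file can be placed in any directory, for example the current bin directory. The commands in sections 2 and 3 assume /etc/ltcluster.conf; pass the actual path with -f.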

    node_id=1
    node_name='node1'
    conninfo='host=node1 user=ltcluster dbname=ltcluster connect_timeout=2'
    data_directory='/var/lib/lightdb/data'
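
Because the file differs between nodes only in the node id and the node name, it can be generated. The sketch below assumes, as in the examples of this article, that the node name is also the hostname used in conninfo; render_ltcluster_conf is a hypothetical helper.

```shell
# Print an ltcluster.conf with the four settings shown above.
# render_ltcluster_conf is a hypothetical helper.
render_ltcluster_conf() {
    # $1 = node id, $2 = node name (also used as host in conninfo), $3 = data directory
    cat <<EOF
node_id=$1
node_name='$2'
conninfo='host=$2 user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='$3'
EOF
}

# Example for the primary (writing to /etc usually needs root):
# render_ltcluster_conf 1 node1 /var/lib/lightdb/data > /etc/ltcluster.conf
```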

2.6 Register the primary

Register the primary:
ltcluster -f /etc/ltcluster.conf primary register
Check the primary's status:
ltcluster -f /etc/ltcluster.conf cluster show
Log in with ltsql and view the node information:
SELECT * FROM ltcluster.nodes;

2.7 Clone the primary instance to the standby

(1) Configure ltcluster.conf on the standby
node_id=2
node_name='node2'
conninfo='host=node2 user=ltcluster dbname=ltcluster connect_timeout=2'
data_directory='/var/lib/lightdb/data'
(2) Use --dry-run to check whether the standby can be cloned
ltcluster -h node1 -U ltcluster -d ltcluster -f /etc/ltcluster.conf standby clone --dry-run

(3) Run the clone
ltcluster -h node1 -U ltcluster -d ltcluster -f /etc/ltcluster.conf standby clone

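2.8 Verify streaming replication

If the standby instance is not yet running after the clone, start it on the standby with lt_ctl -D $LTDATA start. Then log in on the primary and run the query below; each connected standby appears as one row.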
SELECT * FROM pg_stat_replication;

3 Standby

3.1 Register the standby

On the standby, run:
ltcluster -f /etc/ltcluster.conf standby register

3.2 View node information

ltcluster -f /etc/ltcluster.conf cluster show

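4 Restarting the primary

# 1. Pause ltclusterd to prevent automatic failover
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service pause

# 2. Check the cluster status and confirm that the primary's Paused? column shows yes
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status

# 3. First disconnect all clients and applications connected to the database (otherwise the stop will fail), then stop the primary
lt_ctl -D $LTDATA stop  # by default, work on connections that are still open is rolled back

# If remaining connections cause the stop to fail, you can try smart mode
# (in PostgreSQL's pg_ctl, smart mode waits for clients to disconnect on their own; confirm how your lt_ctl version behaves)
lt_ctl -D $LTDATA stop -m smart

# If the stop still fails and you cannot or do not want to disconnect every client, use -m immediate to force the database to stop. In this mode nothing is rolled back: connections are cut, the server halts without a clean shutdown, and it runs recovery at the next start
lt_ctl -D $LTDATA stop -m immediate

# 4. Wait for the database to stop; confirm that the output of step 3 contains "server stopped"

# 5. Change database parameters or do any other maintenance

# 6. Start the primary
lt_ctl -D $LTDATA start

# 7. Wait for the database to start; confirm that the output of step 6 contains "server started"

# 8. Resume ltclusterd
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service unpause

# 9. Check the cluster status and confirm that the primary's Paused? column shows no
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status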
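
The command sequence above can be wrapped in a small function. This is only a sketch under stated assumptions: restart_paused is a hypothetical name, LTHOME and LTDATA must already be set, and the manual checks (steps 2, 4, 7 and 9) and any parameter changes (step 5) are left out. If a step fails, the function stops there and leaves the cluster paused, so that an operator can decide what to do before automatic failover is re-enabled.

```shell
# Hypothetical wrapper: pause ltclusterd, stop and start the instance, unpause.
# Assumes LTHOME and LTDATA are set as in the commands above.
restart_paused() {
    conf="$LTHOME/etc/ltcluster/ltcluster.conf"
    # Each step runs only if the previous one succeeded.
    ltcluster -f "$conf" service pause &&
    lt_ctl -D "$LTDATA" stop &&
    lt_ctl -D "$LTDATA" start &&
    ltcluster -f "$conf" service unpause
}

# Example:
# restart_paused || echo "restart incomplete; cluster may still be paused" >&2
```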

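5 Restarting the standby

# 1. Pause ltclusterd to prevent automatic failover
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service pause

# 2. Check the cluster status and confirm that the standby's Paused? column shows yes
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status

# 3. First disconnect all clients and applications connected to the database (otherwise the stop will fail), then stop the standby
lt_ctl -D $LTDATA stop  # by default, work on connections that are still open is rolled back

# If remaining connections cause the stop to fail, you can try smart mode
# (in PostgreSQL's pg_ctl, smart mode waits for clients to disconnect on their own; confirm how your lt_ctl version behaves)
lt_ctl -D $LTDATA stop -m smart

# If the stop still fails and you cannot or do not want to disconnect every client, use -m immediate to force the database to stop. In this mode nothing is rolled back: connections are cut, the server halts without a clean shutdown, and it runs recovery at the next start
lt_ctl -D $LTDATA stop -m immediate

# 4. Wait for the database to stop; confirm that the output of step 3 contains "server stopped"

# 5. Change database parameters or do any other maintenance

# 6. Start the standby
lt_ctl -D $LTDATA start

# 7. Wait for the database to start; confirm that the output of step 6 contains "server started"

# 8. Resume ltclusterd
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service unpause

# 9. Confirm that the standby's Paused? column shows no
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status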

6 Primary/standby switchover

# switchover pauses and unpauses the cluster automatically; no manual pause/unpause is needed

# 1. Do a dry run on the standby
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf standby switchover --siblings-follow --dry-run

# 2. If the last line of the dry-run output reads: prerequisites for executing STANDBY SWITCHOVER are met, the check passed and you can go on to the next step

# 3. Run the actual switchover on the standby
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf standby switchover --siblings-follow

# 4. Check the cluster status on each node and confirm that the primary and standby roles have indeed been swapped
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf service status
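
The check in step 2 can be automated by inspecting the last non-empty line of the dry-run output. In this sketch switchover_dry_run_ok is a hypothetical helper, and the pattern uses the spelling "prerequisites"; adjust it if your version prints the message differently.

```shell
# Succeed only if the last non-empty line read from stdin contains the
# success message from step 2. switchover_dry_run_ok is a hypothetical helper.
switchover_dry_run_ok() {
    grep -v '^[[:space:]]*$' \
        | tail -n 1 \
        | grep -q 'prerequisites for executing STANDBY SWITCHOVER are met'
}

# Example (2>&1 is included in case the messages are written to stderr):
# ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf standby switchover --siblings-follow --dry-run 2>&1 \
#     | switchover_dry_run_ok && echo "ready for switchover"
```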

7 Recovering the primary

# 1. Confirm that LightDB is stopped

# 2. Check whether ltclusterd is running; if it is not, start it
ps aux | grep ltcluster
ltclusterd -d -f $LTHOME/etc/ltcluster/ltcluster.conf -p $LTHOME/etc/ltcluster/ltclusterd.pid

# 3. Dry-run the rejoin. new_primary_host is the host of the original standby, which is now the new primary; new_primary_port is the new primary's port
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf node rejoin -d 'host=new_primary_host port=new_primary_port dbname=ltcluster user=ltcluster' --verbose --force-rewind --dry-run

# 4. Confirm that the dry run succeeded, then go on to the next step

# 5. Run the actual rejoin; new_primary_host and new_primary_port are the same as above
ltcluster -f $LTHOME/etc/ltcluster/ltcluster.conf node rejoin -d 'host=new_primary_host port=new_primary_port dbname=ltcluster user=ltcluster' --verbose --force-rewind

# 6. Run a switchover on the new standby as described in section 6 of this document to restore the original primary/standby roles

# 7. Check whether keepalived is running
ps aux | grep keepalived

8 Recovering the standby

# 1. Start lightdb
lt_ctl -D $LTDATA start

# 2. Check whether ltclusterd is running; if it is not, start it
ps aux | grep ltcluster
ltclusterd -d -f $LTHOME/etc/ltcluster/ltcluster.conf -p $LTHOME/etc/ltcluster/ltclusterd.pid

# 3. Check whether keepalived is running
ps aux | grep keepalived
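
The "check, and start if missing" step used in sections 7 and 8 can be sketched with pgrep, which avoids the grep process itself showing up in the ps output. ensure_running is a hypothetical helper; pgrep -x matches the process name exactly.

```shell
# Run the given start command only if no process with the exact name exists.
# ensure_running is a hypothetical helper.
ensure_running() {
    # $1 = process name, remaining arguments = command used to start it
    name=$1
    shift
    if pgrep -x "$name" >/dev/null; then
        echo "$name is already running"
    else
        echo "starting $name"
        "$@"
    fi
}

# Example:
# ensure_running ltclusterd ltclusterd -d -f "$LTHOME/etc/ltcluster/ltcluster.conf" -p "$LTHOME/etc/ltcluster/ltclusterd.pid"
```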