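[openGauss 5.0.0] Deploying One Primary and Two Standbys, and Viewing the Primary/Standby/Cascade Architecture with CM
1. Environment
- Operating system: openEuler 20.03 LTS
- Database version: openGauss-5.0.0 LTS
- Lab environment: personal PC + VirtualBox; each VM is x64 with 2 cores and 4 GB of memory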
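- Host plan (host names and IPs as used in the configuration file below):
  - primary: 192.168.56.26
  - standby: 192.168.56.27
  - standby02: 192.168.56.30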
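2. Deploying One Primary and Two Standbys
- Log in to the server as root and complete the following steps:
- Download the software package
- Create a directory for it, for example /opt/software/openGauss, and upload the package to that directory
- Unpack the package, create a cluster_config.xml file in the same directory, and add the following content: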
<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
    <!-- Overall openGauss cluster information -->
    <CLUSTER>
        <PARAM name="clusterName" value="Cluster_template" />
        <PARAM name="nodeNames" value="primary,standby,standby02" />
        <PARAM name="gaussdbAppPath" value="/opt/huawei/install/app" />
        <PARAM name="gaussdbLogPath" value="/var/log/omm" />
        <PARAM name="tmpMppdbPath" value="/opt/huawei/tmp"/>
        <PARAM name="gaussdbToolPath" value="/opt/huawei/install/om" />
        <PARAM name="corePath" value="/opt/huawei/corefile"/>
        <PARAM name="backIp1s" value="192.168.56.26,192.168.56.27,192.168.56.30"/>
    </CLUSTER>
    <!-- Deployment information for the nodes on each server -->
    <DEVICELIST>
        <!-- Deployment information for node1 -->
        <DEVICE sn="primary">
            <PARAM name="name" value="primary"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIP1 and sshIP1 to the same IP -->
            <PARAM name="backIp1" value="192.168.56.26"/>
            <PARAM name="sshIp1" value="192.168.56.26"/>
            <!-- CM node deployment information -->
            <PARAM name="cmsNum" value="1"/>
            <PARAM name="cmServerPortBase" value="15000"/>
            <PARAM name="cmServerListenIp1" value="192.168.56.26,192.168.56.27,192.168.56.30"/>
            <PARAM name="cmServerHaIp1" value="192.168.56.26,192.168.56.27,192.168.56.30"/>
            <PARAM name="cmServerlevel" value="1"/>
            <PARAM name="cmServerRelation" value="primary,standby,standby02"/>
            <PARAM name="cmDir" value="/opt/huawei/data/cmserver"/>
            <!-- dn -->
            <PARAM name="dataNum" value="1"/>
            <PARAM name="dataPortBase" value="26000"/>
            <PARAM name="dataNode1" value="/opt/huawei/install/data/dn,standby,/opt/huawei/install/data/dn,standby02,/opt/huawei/install/data/dn"/>
            <PARAM name="dataNode1_syncNum" value="0"/>
        </DEVICE>
        <!-- Deployment information for node2; set "name" to the host name -->
        <DEVICE sn="standby">
            <PARAM name="name" value="standby"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIP1 and sshIP1 to the same IP -->
            <PARAM name="backIp1" value="192.168.56.27"/>
            <PARAM name="sshIp1" value="192.168.56.27"/>
            <!-- cm -->
            <PARAM name="cmServerPortStandby" value="15000"/>
            <PARAM name="cmDir" value="/opt/huawei/data/cmserver"/>
        </DEVICE>
        <!-- Deployment information for node3; set "name" to the host name -->
        <DEVICE sn="standby02">
            <PARAM name="name" value="standby02"/>
            <PARAM name="azName" value="AZ1"/>
            <PARAM name="azPriority" value="1"/>
            <!-- If the server has only one usable NIC, set backIP1 and sshIP1 to the same IP -->
            <PARAM name="backIp1" value="192.168.56.30"/>
            <PARAM name="sshIp1" value="192.168.56.30"/>
            <!-- cm -->
            <PARAM name="cmServerPortStandby" value="15000"/>
            <PARAM name="cmDir" value="/opt/huawei/data/cmserver"/>
        </DEVICE>
    </DEVICELIST>
</ROOT>
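Before running the pre-installation, it can help to confirm that the number of host names in nodeNames matches the number of addresses in backIp1s. The following is a minimal sketch, not part of the openGauss tooling: the function names param_value, count_csv, and check_node_counts are hypothetical, and the script assumes that the name and value attributes of each PARAM element sit on the same line.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check cluster_config.xml before running gs_preinstall.
# Assumption: name="..." and value="..." of a PARAM are on the same line.
# All function names here are hypothetical helpers, not openGauss commands.

# Print the value attribute of the first PARAM named $1 in file $2.
param_value() {
  grep -o "<PARAM name=\"$1\" value=\"[^\"]*\"" "$2" \
    | head -n 1 \
    | sed 's/.*value="\([^"]*\)"/\1/'
}

# Print how many comma-separated items the string $1 holds.
count_csv() {
  local IFS=','
  local -a items
  read -r -a items <<< "$1"
  echo "${#items[@]}"
}

# Compare the number of host names with the number of backup IPs.
check_node_counts() {
  local names ips n_names n_ips
  names=$(param_value nodeNames "$1")
  ips=$(param_value backIp1s "$1")
  n_names=$(count_csv "$names")
  n_ips=$(count_csv "$ips")
  if [ "$n_names" -eq "$n_ips" ]; then
    echo "OK: $n_names nodes, $n_ips IPs"
  else
    echo "MISMATCH: $n_names nodes, $n_ips IPs"
    return 1
  fi
}
```

After sourcing the script, check_node_counts /opt/software/openGauss/cluster_config.xml prints one summary line and returns a non-zero status on a mismatch.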
- Run the pre-installation
[root@primary script]# ./gs_preinstall -U omm -G dbgrp -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Successfully parsed the configuration file.
Installing the tools on the local node.
Successfully installed the tools on the local node.
Are you sure you want to create trust for root (yes/no)?yes
Please enter password for root
Password:
Successfully created SSH trust for the root permission user.
Setting host ip env
Successfully set host ip env.
Distributing package.
Begin to distribute package to tool path.
Successfully distribute package to tool path.
Begin to distribute package to package path.
Successfully distribute package to package path.
Successfully distributed package.
Are you sure you want to create the user[omm] and create trust for it (yes/no)? yes
Preparing SSH service.
Successfully prepared SSH service.
Installing the tools in the cluster.
Successfully installed the tools in the cluster.
Checking hostname mapping.
Successfully checked hostname mapping.
Creating SSH trust for [omm] user.
Please enter password for current user[omm].
Password:
Checking network information.
All nodes in the network are Normal.
Successfully checked network information.
Creating SSH trust.
Creating the local key file.
Successfully created the local key files.
Appending local ID to authorized_keys.
Successfully appended local ID to authorized_keys.
Updating the known_hosts file.
Successfully updated the known_hosts file.
Appending authorized_key on the remote node.
Successfully appended authorized_key on all remote node.
Checking common authentication file content.
Successfully checked common authentication content.
Distributing SSH trust file to all node.
Distributing trust keys file to all node successfully.
Successfully distributed SSH trust file to all node.
Verifying SSH trust on all hosts.
Successfully verified SSH trust on all hosts.
Successfully created SSH trust.
Successfully created SSH trust for [omm] user.
Checking OS software.
Successfully check os software.
Checking OS version.
Successfully checked OS version.
Creating cluster's path.
Successfully created cluster's path.
Set and check OS parameter.
Setting OS parameters.
Successfully set OS parameters.
Warning: Installation environment contains some warning messages.
Please get more details by "/opt/software/openGauss/script/gs_checkos -i A -h primary,standby,standby02 --detail".
Set and check OS parameter completed.
Preparing CRON service.
Successfully prepared CRON service.
Setting user environmental variables.
Successfully set user environmental variables.
Setting the dynamic link library.
Successfully set the dynamic link library.
Setting Core file
Successfully set core path.
Setting pssh path
Successfully set pssh path.
Setting Cgroup.
Successfully set Cgroup.
Set ARM Optimization.
No need to set ARM Optimization.
Fixing server package owner.
Setting finish flag.
Successfully set finish flag.
Preinstallation succeeded.
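The pre-installation output is long, and the single warning about OS parameters is easy to miss. The sketch below pulls out the warning lines and confirms the final success marker from a saved copy of the output. The function name summarize_preinstall is hypothetical, and the script assumes the log has one message per line.

```shell
#!/usr/bin/env bash
# Sketch: summarize a saved gs_preinstall log read from stdin.
# Assumption: one message per line, as the tool prints them.
# summarize_preinstall is a hypothetical helper, not an openGauss command.
summarize_preinstall() {
  local log
  log=$(cat)
  # Show warning lines so they are not lost in the long output.
  printf '%s\n' "$log" | grep '^Warning:' || true
  # The last line of a successful run is "Preinstallation succeeded."
  if printf '%s\n' "$log" | grep -q '^Preinstallation succeeded\.'; then
    echo "RESULT: preinstallation succeeded"
  else
    echo "RESULT: success marker not found"
    return 1
  fi
}
```

If a warning shows up, the log itself names the follow-up command: /opt/software/openGauss/script/gs_checkos -i A -h primary,standby,standby02 --detail.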
- Switch to the omm user, run the installation, and wait for it to finish
[omm@primary ~]$ gs_install -X /opt/software/openGauss/cluster_config.xml
Parsing the configuration file.
Check preinstall on every node.
Successfully checked preinstall on every node.
Creating the backup directory.
Successfully created the backup directory.
begin deploy..
Installing the cluster.
begin prepare Install Cluster..
Checking the installation environment on all nodes.
begin install Cluster..
Installing applications on all nodes.
Successfully installed APP.
begin init Instance..
encrypt cipher and rand files for database.
Please enter password for database:
Please repeat for database:
begin to create CA cert files
The sslcert will be generated in /opt/huawei/install/app/share/sslcert/om
Create CA files for cm beginning.
Create CA files on directory [/opt/huawei/install/app_a07d57c3/share/sslcert/cm]. file list: ['server.key.cipher', 'client.key', 'server.crt', 'server.key.rand', 'client.crt', 'server.key', 'client.key.rand', 'client.key.cipher', 'cacert.pem']
Non-dss_ssl_enable, no need to create CA for DSS
Cluster installation is completed.
Configuring.
Deleting instances from all nodes.
Successfully deleted instances from all nodes.
Checking node configuration on all nodes.
Initializing instances on all nodes.
Updating instance configuration on all nodes.
Check consistence of memCheck and coresCheck on database nodes.
Successful check consistence of memCheck and coresCheck on all nodes.
Configuring pg_hba on all nodes.
Configuration is completed.
Starting cluster.
======================================================================
Successfully started primary instance. Wait for standby instance.
======================================================================
.
Successfully started cluster.
======================================================================
cluster_state   : Normal
redistributing  : No
node_count      : 3
Datanode State
    primary         : 1
    standby         : 2
    secondary       : 0
    cascade_standby : 0
    building        : 0
    abnormal        : 0
    down            : 0

Successfully installed application.
end deploy..
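The summary at the end of the installation output can be checked mechanically: the cluster should be Normal, the primary, standby, and cascade standby counts should add up to node_count, and abnormal and down should both be 0. The following is a rough sketch under the assumption that the summary has been saved with one "key : value" pair per line. The function name check_install_summary is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: validate the summary printed at the end of gs_install.
# Reads the saved output from stdin.
# Assumption: one "key : value" pair per line.
# check_install_summary is a hypothetical helper, not an openGauss command.
check_install_summary() {
  awk -F':' '
    {
      key = $1; val = $2
      gsub(/[ \t]/, "", key)   # drop indentation and padding
      gsub(/[ \t]/, "", val)
      if (key != "") seen[key] = val
    }
    END {
      healthy = seen["primary"] + seen["standby"] + seen["cascade_standby"]
      broken  = seen["abnormal"] + seen["down"]
      if (seen["cluster_state"] == "Normal" &&
          healthy == seen["node_count"] + 0 && broken == 0) {
        print "OK: " healthy " of " seen["node_count"] " datanodes healthy"
      } else {
        print "CHECK: cluster_state=" seen["cluster_state"] ", healthy=" healthy ", node_count=" seen["node_count"]
        exit 1
      }
    }'
}
```

For the run shown above, the function would report 3 of 3 datanodes healthy (1 primary plus 2 standbys).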
- Check the cluster status
Use gs_om to check the cluster running status:
[omm@primary ~]$ gs_om -t status --detail
[ CMServer State ]

node         node_ip        instance                                state
-----------------------------------------------------------------------------
1 primary    192.168.56.26  1 /opt/huawei/data/cmserver/cm_server   Primary
2 standby    192.168.56.27  2 /opt/huawei/data/cmserver/cm_server   Standby
3 standby02  192.168.56.30  3 /opt/huawei/data/cmserver/cm_server   Standby

[ Cluster State ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[ Datanode State ]

node         node_ip        instance                          state
-------------------------------------------------------------------------------
1 primary    192.168.56.26  6001 /opt/huawei/install/data/dn  P Primary Normal
2 standby    192.168.56.27  6002 /opt/huawei/install/data/dn  S Standby Normal
3 standby02  192.168.56.30  6003 /opt/huawei/install/data/dn  S Standby Normal
Use the cm_ctl command to check the status:
[omm@primary ~]$ cm_ctl query -v -C
[ CMServer State ]

node         instance state
-----------------------------
1 primary    1        Primary
2 standby    2        Standby
3 standby02  3        Standby

[ Cluster State ]

cluster_state   : Normal
redistributing  : No
balanced        : Yes
current_az      : AZ_ALL

[ Datanode State ]

node         instance state            | node         instance state            | node         instance state
---------------------------------------------------------------------------------------------------------------------------
1 primary    6001     P Primary Normal | 2 standby    6002     S Standby Normal | 3 standby02  6003     S Standby Normal
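For scripted health checks, the two fields that matter most in this output are cluster_state and the number of standbys reported as Normal. The following is a minimal sketch that reads saved status output from stdin. The function names are hypothetical, and the patterns are based only on the output layout shown above.

```shell
#!/usr/bin/env bash
# Sketch: extract health information from gs_om or cm_ctl status output.
# Each function reads the status text from stdin.
# Function names are hypothetical helpers, not openGauss commands.

# Print the value of the cluster_state field, e.g. "Normal".
cluster_state() {
  awk -F':' '/^[ \t]*cluster_state[ \t]*:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

# Succeed only when cluster_state is Normal.
is_cluster_normal() {
  [ "$(cluster_state)" = "Normal" ]
}

# Count datanodes shown as "S Standby Normal".
# Counts occurrences rather than lines, because cm_ctl query -v -C
# prints several datanodes on a single line.
count_normal_standbys() {
  grep -Eo 'S[[:space:]]+Standby[[:space:]]+Normal' | wc -l | tr -d ' '
}
```

A typical use would be to capture the output once, for example status=$(cm_ctl query -v -C), and then pipe "$status" into each function in turn, since every function consumes its stdin.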
3. Using the cm_ctl Command
See the cm_ctl help output for detailed usage. A few examples follow:
- View the service configuration of each CM server node:
[omm@primary ~]$ cm_ctl list --param --server
[conf of node(1)]
log_dir = /var/log/omm/omm/cm/cm_server
log_file_size = 16MB
log_min_messages = WARNING
thread_count = 1000
instance_heartbeat_timeout = 6
instance_failover_delay_timeout = 0
cmserver_ha_connect_timeout = 2
cmserver_ha_heartbeat_timeout = 6
cmserver_ha_status_interval = 1
cmserver_self_vote_timeout = 6
phony_dead_effective_time = 5
cm_server_arbitrate_delay_base_time_out = 10
cm_server_arbitrate_delay_incrememtal_time_out = 3
alarm_component = '/opt/huawei/snas/bin/snas_cm_cmd'
alarm_report_interval = 3
alarm_report_max_count = 1
instance_keep_heartbeat_timeout = 40
az_switchover_threshold = 100
az_check_and_arbitrate_interval = 2
az_connect_check_interval = 60
az_connect_check_delay_time = 150
cmserver_demote_delay_on_etcd_fault = 8
instance_phony_dead_restart_interval = 21600
enable_transaction_read_only = on
datastorage_threshold_check_interval = 10
datastorage_threshold_value_check = 85
max_datastorage_threshold_check = 43200
enable_az_auto_switchover = 1
cm_auth_method = trust
cm_krb_server_keyfile = '${GAUSSHOME}/kerberos/{UserName}.keytab'
switch_rto = 600
force_promote = 0
backup_open = 0
enable_dcf = off
ddb_type = 0
enable_ssl = on
ssl_cert_expire_alert_threshold = 90
ssl_cert_expire_check_interval = 86400
delay_arbitrate_timeout = 0
delay_arbitrate_max_cluster_timeout = 300
ddb_log_level = RUN_ERR|RUN_WAR|DEBUG_ERR|OPER|RUN_INF|PROFILE
ddb_log_backup_file_count = 10
ddb_max_log_file_size = 10M
ddb_log_suppress_enable = 1
ddb_election_timeout = 3
enable_e2e_rto = 0
share_disk_path = ''
voting_disk_path = ''
disk_timeout = 200
agent_network_timeout = 6
dn_arbitrate_mode = quorum
agent_fault_timeout = 60
third_party_gateway_ip = ''
cms_enable_failover_on2nodes = false
cms_enable_db_crash_recovery = false
cms_network_isolation_timeout = 20
[conf of node(2)]
log_dir = /var/log/omm/omm/cm/cm_server
log_file_size = 16MB
log_min_messages = WARNING
thread_count = 1000
instance_heartbeat_timeout = 6
instance_failover_delay_timeout = 0
cmserver_ha_connect_timeout = 2
cmserver_ha_heartbeat_timeout = 6
cmserver_ha_status_interval = 1
cmserver_self_vote_timeout = 6
phony_dead_effective_time = 5
cm_server_arbitrate_delay_base_time_out = 10
cm_server_arbitrate_delay_incrememtal_time_out = 3
alarm_component = '/opt/huawei/snas/bin/snas_cm_cmd'
alarm_report_interval = 3
alarm_report_max_count = 1
instance_keep_heartbeat_timeout = 40
az_switchover_threshold = 100
az_check_and_arbitrate_interval = 2
az_connect_check_interval = 60
az_connect_check_delay_time = 150
cmserver_demote_delay_on_etcd_fault = 8
instance_phony_dead_restart_interval = 21600
enable_transaction_read_only = on
datastorage_threshold_check_interval = 10
datastorage_threshold_value_check = 85
max_datastorage_threshold_check = 43200
enable_az_auto_switchover = 1
cm_auth_method = trust
cm_krb_server_keyfile = '${GAUSSHOME}/kerberos/{UserName}.keytab'
switch_rto = 600
force_promote = 0
backup_open = 0
enable_dcf = off
ddb_type = 0
enable_ssl = on
ssl_cert_expire_alert_threshold = 90
ssl_cert_expire_check_interval = 86400
delay_arbitrate_timeout = 0
delay_arbitrate_max_cluster_timeout = 300
ddb_log_level = RUN_ERR|RUN_WAR|DEBUG_ERR|OPER|RUN_INF|PROFILE
ddb_log_backup_file_count = 10
ddb_max_log_file_size = 10M
ddb_log_suppress_enable = 1
ddb_election_timeout = 3
enable_e2e_rto = 0
share_disk_path = ''
voting_disk_path = ''
disk_timeout = 200
agent_network_timeout = 6
dn_arbitrate_mode = quorum
agent_fault_timeout = 60
third_party_gateway_ip = ''
cms_enable_failover_on2nodes = false
cms_enable_db_crash_recovery = false
cms_network_isolation_timeout = 20
[conf of node(3)]
log_dir = /var/log/omm/omm/cm/cm_server
log_file_size = 16MB
log_min_messages = WARNING
thread_count = 1000
instance_heartbeat_timeout = 6
instance_failover_delay_timeout = 0
cmserver_ha_connect_timeout = 2
cmserver_ha_heartbeat_timeout = 6
cmserver_ha_status_interval = 1
cmserver_self_vote_timeout = 6
phony_dead_effective_time = 5
cm_server_arbitrate_delay_base_time_out = 10
cm_server_arbitrate_delay_incrememtal_time_out = 3
alarm_component = '/opt/huawei/snas/bin/snas_cm_cmd'
alarm_report_interval = 3
alarm_report_max_count = 1
instance_keep_heartbeat_timeout = 40
az_switchover_threshold = 100
az_check_and_arbitrate_interval = 2
az_connect_check_interval = 60
az_connect_check_delay_time = 150
cmserver_demote_delay_on_etcd_fault = 8
instance_phony_dead_restart_interval = 21600
enable_transaction_read_only = on
datastorage_threshold_check_interval = 10
datastorage_threshold_value_check = 85
max_datastorage_threshold_check = 43200
enable_az_auto_switchover = 1
cm_auth_method = trust
cm_krb_server_keyfile = '${GAUSSHOME}/kerberos/{UserName}.keytab'
switch_rto = 600
force_promote = 0
backup_open = 0
enable_dcf = off
ddb_type = 0
enable_ssl = on
ssl_cert_expire_alert_threshold = 90
ssl_cert_expire_check_interval = 86400
delay_arbitrate_timeout = 0
delay_arbitrate_max_cluster_timeout = 300
ddb_log_level = RUN_ERR|RUN_WAR|DEBUG_ERR|OPER|RUN_INF|PROFILE
ddb_log_backup_file_count = 10
ddb_max_log_file_size = 10M
ddb_log_suppress_enable = 1
ddb_election_timeout = 3
enable_e2e_rto = 0
share_disk_path = ''
voting_disk_path = ''
disk_timeout = 200
agent_network_timeout = 6
dn_arbitrate_mode = quorum
agent_fault_timeout = 60
third_party_gateway_ip = ''
cms_enable_failover_on2nodes = false
cms_enable_db_crash_recovery = false
cms_network_isolation_timeout = 20
- View the log level:
[omm@primary ~]$ cm_ctl get --log_level
cm_ctl: send get msg to cm_server.
.
cm_ctl: cm server has been get.
cm_ctl: log_level=WARNING
cm_ctl: gets it successfully.
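Because cm_ctl list --param --server prints the full parameter set once per node, comparing a single parameter across nodes by eye is tedious. The sketch below filters saved output for one parameter and reports its value per node. The function names param_by_node and param_is_uniform are hypothetical, and the parsing assumes the "[conf of node(N)]" headers and "name = value" lines shown above, one per line.

```shell
#!/usr/bin/env bash
# Sketch: compare one CM server parameter across nodes.
# Reads the output of "cm_ctl list --param --server" from stdin.
# Assumption: "[conf of node(N)]" headers and "name = value" lines,
# one per line. Function names are hypothetical helpers.

# Print "<node> <value>" for parameter $1 on every node.
param_by_node() {
  awk -v key="$1" '
    /^\[conf of node\(/ {
      node = $0
      sub(/^\[conf of node\(/, "", node)   # keep only the node number
      sub(/\)\].*$/, "", node)
      next
    }
    {
      line = $0
      sub(/^[ \t]+/, "", line)
      pos = index(line, " = ")
      if (pos > 0 && substr(line, 1, pos - 1) == key)
        print node, substr(line, pos + 3)
    }'
}

# Succeed only when parameter $1 has exactly one distinct value.
param_is_uniform() {
  [ "$(param_by_node "$1" | awk '{ $1 = ""; print }' | sort -u | wc -l | tr -d ' ')" = "1" ]
}
```

With the output above saved to a file, param_by_node log_min_messages would print one line per node, each ending in WARNING, which matches the log level reported by cm_ctl get --log_level.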
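4. xlog Synchronization Between openGauss Primary, Standby, and Cascade Standby
- What primary, standby, and cascade standby mean according to the cm_ctl help output: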
……
Instance state including:
  Primary          database system run as a primary server, send xlog to standby server
  Standby          database system run as a standby server, receive xlog from primary server
  Cascade Standby  database system run as a cascade standby server, receive xlog from standby server
  Pending          database system run as a pending server, wait for promoting to primary or demoting to standby
  Down             database system not running
  Unknown          database system not connected
……
Primary: database system run as a primary server, send xlog to standby server
Standby: database system run as a standby server, receive xlog from primary server
Cascade Standby: database system run as a cascade standby server, receive xlog from standby server
The xlog is synchronized among the three roles in this order:
Primary -- sends xlog --> Standby -- sends xlog --> Cascade Standby
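The chain above can be written as a small lookup that answers, for a given role, where that role receives its xlog from. This is only an illustration of the definitions in the cm_ctl help text. The function name xlog_upstream is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: map an instance role to the role it receives xlog from,
# following the cm_ctl help text quoted above.
# xlog_upstream is a hypothetical helper, not an openGauss command.
xlog_upstream() {
  case "$1" in
    "Primary")         echo "none (sends xlog to Standby)" ;;
    "Standby")         echo "Primary" ;;
    "Cascade Standby") echo "Standby" ;;
    *)                 echo "unknown role: $1" >&2; return 1 ;;
  esac
}
```

In the cluster deployed here, both standbys receive xlog directly from the primary, since the installation summary reports cascade_standby : 0.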