Author: Zhang Hua Published: 2020-02-27
Copyright notice: this article may be freely reproduced, provided the reproduction marks the original source, the author information, and this copyright notice in hyperlink form
LAN: pacemaker active/passive cluster, or native active/active cluster
WAN: federation or shovel
This blog covers the pacemaker active/passive cluster; for federation, please see:
https://blog.csdn.net/qq_34939489/article/details/88786266
https://pdf.us/2018/06/07/1260.html
https://blog.csdn.net/shudaqi2010/article/details/53760423
test steps
juju add-model ha
juju deploy ubuntu rabbit1 --series=bionic --config hostname=rabbit1 --constraints "mem=1G cores=1 root-disk=6G"
juju deploy ubuntu rabbit2 --series=bionic --config hostname=rabbit2 --constraints "mem=1G cores=1 root-disk=6G"
juju deploy ubuntu rabbit3 --series=bionic --config hostname=rabbit3 --constraints "mem=1G cores=1 root-disk=6G"
ubuntu@zhhuabj-bastion:~$ juju status |grep ready
rabbit1/0* active idle 0 10.5.0.7 ready
rabbit2/0* active idle 1 10.5.0.15 ready
rabbit3/0* active idle 2 10.5.0.12 ready
#run the following commands on all 3 units
echo -e "10.5.0.7 rabbit1\n10.5.0.15 rabbit2\n10.5.0.12 rabbit3" |sudo tee -a /etc/hosts
echo -e "net.ipv4.ip_nonlocal_bind = 1\nnet.ipv4.ip_forward = 1" |sudo tee -a /etc/sysctl.conf
sudo sysctl -p
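The /etc/hosts entries matter because the corosync nodelist later refers to the nodes by name. A small sketch for double-checking that every expected hostname is present (the helper function and demo file are illustrative, not part of the original steps):

```shell
# Illustrative helper: verify each cluster hostname appears in a hosts file.
check_hosts() {   # $1 = hosts file; remaining args = hostnames to check
    f=$1; shift
    for n in "$@"; do
        grep -qw "$n" "$f" && echo "$n ok" || echo "$n missing"
    done
}

# demo against a throwaway file; on the real units pass /etc/hosts
printf '10.5.0.7 rabbit1\n10.5.0.15 rabbit2\n10.5.0.12 rabbit3\n' > /tmp/hosts-demo
check_hosts /tmp/hosts-demo rabbit1 rabbit2 rabbit3
```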
sudo apt install chrony -y ##https://www.server-world.info/en/note?os=Ubuntu_18.04&p=ntp
sudo systemctl restart chrony
sudo apt install pacemaker corosync fence-agents resource-agents crmsh pcs -y
sudo apt install rabbitmq-server -y
HOSTNAME=$(hostname)
cat << EOF | sudo tee -a /etc/rabbitmq/rabbitmq-env.conf
RABBITMQ_NODENAME=rabbit@$HOSTNAME
EOF
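RABBITMQ_NODENAME should come out as rabbit@&lt;short hostname&gt;, matching the names used in the corosync nodelist below; a mismatch keeps the resource agent from joining the nodes into one cluster. A hedged sanity-check sketch (the helper and demo file are illustrative):

```shell
# Illustrative check: RABBITMQ_NODENAME in rabbitmq-env.conf should equal
# rabbit@<short hostname>, matching the ring0_addr names used by corosync.
check_nodename() {   # $1 = rabbitmq-env.conf path, $2 = expected short hostname
    configured=$(grep '^RABBITMQ_NODENAME=' "$1" | cut -d= -f2)
    if [ "$configured" = "rabbit@$2" ]; then
        echo "OK: $configured"
    else
        echo "MISMATCH: got '$configured', want 'rabbit@$2'"
    fi
}

# demo with a throwaway file; on a real unit pass
# /etc/rabbitmq/rabbitmq-env.conf and $(hostname -s)
printf 'RABBITMQ_NODENAME=rabbit@rabbit1\n' > /tmp/env-demo
check_nodename /tmp/env-demo rabbit1
```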
cat << EOF | sudo tee /etc/corosync/corosync.conf
totem {
version: 2
secauth: off
cluster_name: quqicluster
transport: udpu
token: 55000
rrp_problem_count_timeout: 110000
rrp_problem_count_threshold: 30
token_retransmits_before_loss_const: 30
}
nodelist {
node {
ring0_addr: rabbit1
nodeid: 1
}
node {
ring0_addr: rabbit2
nodeid: 2
}
node {
ring0_addr: rabbit3
nodeid: 3
}
}
quorum {
provider: corosync_votequorum
}
logging {
fileline: off
to_stderr: no
to_logfile: yes
logfile: /var/log/corosync.log
to_syslog: yes
syslog_facility: daemon
debug: on
timestamp: on
logger_subsys {
subsys: QUORUM
debug: on
}
}
EOF
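The ring0_addr names in the nodelist must resolve via /etc/hosts (or DNS). A small illustrative extractor for cross-checking them, demoed on a throwaway fragment rather than the live file:

```shell
# Illustrative: pull the ring0_addr names out of a corosync.conf-style file
# so they can be cross-checked against /etc/hosts.
nodelist_names() { awk '$1 == "ring0_addr:" {print $2}' "$1"; }

# demo fragment; on a real unit pass /etc/corosync/corosync.conf
cat > /tmp/corosync-demo.conf <<'DEMO'
nodelist {
    node {
        ring0_addr: rabbit1
        nodeid: 1
    }
    node {
        ring0_addr: rabbit2
        nodeid: 2
    }
}
DEMO
nodelist_names /tmp/corosync-demo.conf
```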
#sudo systemctl restart corosync #use pcs instead for management
sudo systemctl restart pcsd
sudo systemctl enable pcsd
#configure node authentication
sudo passwd hacluster #change the password to 'password' for the user hacluster
sudo pcs cluster auth -u hacluster -p password rabbit1 rabbit2 rabbit3
#create the cluster
sudo pcs cluster setup --force --name quqicluster rabbit1 rabbit2 rabbit3
sudo pcs cluster enable --all
sudo pcs cluster start --all
sudo pcs status
sudo pcs status corosync
sudo pcs cluster status
sudo corosync-cfgtool -s
sudo corosync-cmapctl | grep members
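The members keys printed by corosync-cmapctl can be counted to confirm all three nodes joined the totem ring. A sketch over a captured sample (the sample mimics the corosync 2.x key format and is only for demonstration; run the real command on a cluster node):

```shell
# Illustrative: count joined members in `corosync-cmapctl | grep members`
# output by counting the per-member .ip keys.
member_count() { grep -c 'members\.[0-9][0-9]*\.ip' "$1"; }

cat > /tmp/cmap-demo.txt <<'DEMO'
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(10.5.0.7)
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(10.5.0.15)
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
runtime.totem.pg.mrp.srp.members.3.ip (str) = r(0) ip(10.5.0.12)
runtime.totem.pg.mrp.srp.members.3.status (str) = joined
DEMO
member_count /tmp/cmap-demo.txt   # one .ip key per member => 3
```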
sudo pcs property set stonith-enabled=false #Resource start-up disabled since no STONITH resources have been defined
sudo pcs property set no-quorum-policy=ignore
sudo crm_verify -L -V
sudo pcs property set pe-warn-series-max=1000 pe-input-series-max=1000 pe-error-series-max=1000
sudo pcs property set cluster-recheck-interval=5
sudo pcs resource delete rabbitmq-clone --force
sudo pcs resource create rabbitmq-clone ocf:heartbeat:rabbitmq-cluster --clone ordered=true interleave=true
sudo pcs resource update rabbitmq-clone op monitor interval=30 timeout=120
sudo pcs resource update rabbitmq-clone op start interval=0 timeout=100
sudo pcs resource update rabbitmq-clone op stop interval=0 timeout=90
#sudo pcs resource update rabbitmq-clone meta ordered=true interleave=true
sudo pcs resource show rabbitmq-clone
sudo pcs resource
sudo pcs config show
sudo pcs status
sudo rabbitmqctl cluster_status
sudo crm resource restart rabbitmq-clone
sudo crm resource cleanup rabbitmq-clone
sudo rabbitmqctl list_users
sudo rabbitmqctl cluster_status
#or use crm command instead of pcs
wget https://raw.githubusercontent.com/ClusterLabs/crmsh/master/contrib/bash_completion.sh
sudo cp bash_completion.sh /etc/bash_completion.d/crmsh
source /etc/bash_completion.d/crmsh
sudo crm configure show
sudo crm ra list ocf |grep rabbit
sudo crm ra meta rabbitmq-cluster
sudo crm configure property stonith-enabled=false
sudo crm configure primitive rabbitmq ocf:heartbeat:rabbitmq-cluster \
op monitor interval=30 timeout=120 \
op start interval=0 timeout=100 \
op stop interval=0 timeout=90
sudo crm configure clone rabbitmq-clone rabbitmq \
meta ordered=true interleave=true
#configure a VIP 10.5.0.122 by using pcs/corosync
sudo pcs resource create rabbitvip ocf:heartbeat:IPaddr2 ip=10.5.0.122 cidr_netmask=24 nic=ens3 op monitor interval=30s
#or configure a VIP 10.5.0.122 by using haproxy (not tested)
sudo apt install -y haproxy
cat << EOF | sudo tee -a /etc/haproxy/haproxy.cfg
listen rabbitmq
bind 10.5.0.122:5672
balance source
option tcpka
option tcplog
server juju-3f992e-ha-0 10.5.0.12:5672 check inter 2000 rise 2 fall 5
server juju-3f992e-ha-1 10.5.0.7:5672 check inter 2000 rise 2 fall 5
server juju-3f992e-ha-2 10.5.0.15:5672 check inter 2000 rise 2 fall 5
EOF
sudo pcs resource create lb-haproxy systemd:haproxy --clone
sudo pcs constraint colocation add lb-haproxy-clone with rabbitvip #keep haproxy and the VIP on the same node
sudo pcs constraint order start rabbitvip then lb-haproxy-clone kind=Optional #start the VIP before starting haproxy
sudo pcs resource
standby test
ubuntu@rabbit1:~$ sudo crm node standby rabbit1
ubuntu@rabbit1:~$ sudo crm status
Stack: corosync
Current DC: rabbit1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Fri Feb 28 03:00:32 2020
Last change: Fri Feb 28 02:58:53 2020 by root via crm_attribute on rabbit1
3 nodes configured
4 resources configured
Node rabbit1: standby
Online: [ rabbit2 rabbit3 ]
Full list of resources:
Clone Set: rabbitmq-clone-clone [rabbitmq-clone]
Started: [ rabbit2 rabbit3 ]
Stopped: [ rabbit1 ]
rabbitvip (ocf::heartbeat:IPaddr2): Started rabbit2
ubuntu@rabbit2:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2
[{nodes,[{disc,[rabbit@rabbit2]}]},
{running_nodes,[rabbit@rabbit2]},
{cluster_name,<<"rabbit@rabbit2">>},
{partitions,[]},
{alarms,[{rabbit@rabbit2,[]}]}]
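After a failover like the one above, a monitoring script can verify the survivor mechanically by parsing the cluster_status output instead of eyeballing it. An illustrative parser, demoed on the output captured from rabbit2 (the helper is a sketch, not part of the original steps):

```shell
# Illustrative: check whether a node is listed in running_nodes of
# `rabbitmqctl cluster_status` (Erlang-term output, RabbitMQ 3.x).
is_running() {   # $1 = captured status file, $2 = full node name
    tr -d ' \n' < "$1" | grep -q "running_nodes,\[[^]]*$2" \
        && echo "$2 running" || echo "$2 NOT running"
}

# demo on the status captured from rabbit2 above
cat > /tmp/status-demo.txt <<'DEMO'
[{nodes,[{disc,[rabbit@rabbit2]}]},
 {running_nodes,[rabbit@rabbit2]},
 {cluster_name,<<"rabbit@rabbit2">>},
 {partitions,[]},
 {alarms,[{rabbit@rabbit2,[]}]}]
DEMO
is_running /tmp/status-demo.txt rabbit@rabbit2
is_running /tmp/status-demo.txt rabbit@rabbit1
```

To bring the standby node back afterwards, run `sudo crm node online rabbit1`.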
debug
read the related resource-agent code and patches, guided by the rabbitmq-cluster entries in the corosync log
grep -r 'rabbitmq-cluster' var/log/cluster/corosync.log
#https://github.com/ClusterLabs/resource-agents/blob/v3.9.7/heartbeat/rabbitmq-cluster
git log v3.9.7..master --oneline --no-merges heartbeat/rabbitmq-cluster
some logs
ubuntu@rabbit1:~$ sudo rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit1
[{nodes,[{disc,[rabbit@rabbit1]}]},
{running_nodes,[rabbit@rabbit1]},
{cluster_name,<<"rabbit@rabbit1">>},
{partitions,[]},
{alarms,[{rabbit@rabbit1,[]}]}]
ubuntu@rabbit1:~$ sudo pcs resource show rabbitmq-clone
Resource: rabbitmq-clone (class=ocf provider=heartbeat type=rabbitmq-cluster)
Operations: monitor interval=10 timeout=40 (rabbitmq-clone-monitor-interval-10)
start interval=0s timeout=100 (rabbitmq-clone-start-interval-0s)
stop interval=0s timeout=90 (rabbitmq-clone-stop-interval-0s)
ubuntu@rabbit1:~$
ubuntu@rabbit1:~$ sudo crm status
Stack: corosync
Current DC: rabbit1 (version 1.1.18-2b07d5c5a9) - partition with quorum
Last updated: Fri Feb 28 02:52:34 2020
Last change: Fri Feb 28 02:49:43 2020 by root via cibadmin on rabbit1
3 nodes configured
4 resources configured
Online: [ rabbit1 rabbit2 rabbit3 ]
Full list of resources:
Clone Set: rabbitmq-clone-clone [rabbitmq-clone]
Started: [ rabbit1 rabbit2 rabbit3 ]
rabbitvip (ocf::heartbeat:IPaddr2): Started rabbit1
ubuntu@rabbit3:~$ sudo pcs cluster auth -u hacluster -p password rabbit1 rabbit2 rabbit3
rabbit3: Authorized
rabbit2: Authorized
rabbit1: Authorized
ubuntu@rabbit3:~$ sudo pcs cluster setup --force --name quqicluster rabbit1 rabbit2 rabbit3
Destroying cluster on nodes: rabbit1, rabbit2, rabbit3...
rabbit1: Stopping Cluster (pacemaker)...
rabbit3: Stopping Cluster (pacemaker)...
rabbit2: Stopping Cluster (pacemaker)...
rabbit3: Successfully destroyed cluster
rabbit1: Successfully destroyed cluster
rabbit2: Successfully destroyed cluster
Sending 'pacemaker_remote authkey' to 'rabbit1', 'rabbit2', 'rabbit3'
rabbit3: successful distribution of the file 'pacemaker_remote authkey'
rabbit1: successful distribution of the file 'pacemaker_remote authkey'
rabbit2: successful distribution of the file 'pacemaker_remote authkey'
Sending cluster config files to the nodes...
rabbit1: Succeeded
rabbit2: Succeeded
rabbit3: Succeeded
Synchronizing pcsd certificates on nodes rabbit1, rabbit2, rabbit3...
rabbit3: Success
rabbit2: Success
rabbit1: Success
Restarting pcsd on the nodes in order to reload the certificates...
rabbit3: Success
rabbit1: Success
rabbit2: Success
ubuntu@rabbit3:~$ sudo pcs cluster enable --all
rabbit1: Cluster Enabled
rabbit2: Cluster Enabled
rabbit3: Cluster Enabled
ubuntu@rabbit3:~$ sudo pcs cluster start --all
rabbit1: Starting Cluster...
rabbit2: Starting Cluster...
rabbit3: Starting Cluster...
ubuntu@juju-3f992e-ha-2:~$ sudo pcs resource show rabbitmq-clone
Clone: rabbitmq-clone
Meta Attrs: ordered=true interleave=true
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Operations: monitor interval=30 timeout=120 (rabbitmq-monitor-30)
start interval=0 timeout=100 (rabbitmq-start-0)
stop interval=0 timeout=90 (rabbitmq-stop-0)
Clone: rabbitmq-clone
Meta Attrs: interleave=true ordered=true
Resource: rabbitmq (class=ocf provider=heartbeat type=rabbitmq-cluster)
Attributes: set_policy="ha-all ^(?!amq\.).* {"ha-mode":"all"}"
Operations: start interval=0s timeout=100 (rabbitmq-start-interval-0s)
stop interval=0s timeout=90 (rabbitmq-stop-interval-0s)
monitor interval=30 timeout=120 (rabbitmq-monitor-interval-30)