How to Configure NGINX High Availability Cluster Using Pacemaker on Ubuntu 16.04


Introduction

High availability describes a website or application that is durable and likely to operate continuously, without failure, for a long time. A highly available setup provides a number of fail-safes and typically aims for 99% uptime or better. Highly available systems are built from several components and can be scaled horizontally when needed, which improves their ability to serve content.

Pacemaker is an advanced, scalable high-availability cluster resource manager. It maximizes the availability of cluster resources by failing them over between cluster nodes. Pacemaker uses Corosync for heartbeat and internal communication among cluster components, and relies on Corosync's messaging and membership capabilities to detect and recover from node-level and resource-level failures.

In this tutorial, we will explain how to install and configure a two-node NGINX web server cluster using Pacemaker on Ubuntu 16.04 server.

Requirements

Two fresh Alibaba Cloud instances with Ubuntu 16.04 server installed.
A static IP address on each instance. In this tutorial, 192.168.0.102 is configured on the first instance and 192.168.0.103 on the second. The floating IP address is 192.168.0.104.
A root password set up on both instances.

Launch Alibaba Cloud ECS Instance

First, log in to your Alibaba Cloud ECS Console and create a new ECS instance, choosing Ubuntu 16.04 as the operating system with at least 2GB RAM. Connect to your ECS instance and log in as the root user.

Once you are logged in to your Ubuntu 16.04 instance, run the following command to refresh the package index, so that the installs in the following steps pull the latest available packages.

apt-get update -y

Getting Started

Before starting, you will need to configure the hosts file on each server so that the servers can reach each other by hostname.

You can do this by editing the /etc/hosts file on both servers.

nano /etc/hosts

Add the following lines:

192.168.0.102 node1
192.168.0.103 node2

Save and close the file when you are finished. Next, test hostname resolution by pinging each server by its hostname:

ping node1
ping node2
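The hosts entries can also be checked directly, without waiting on ping. The following is a minimal sketch under the assumption that the names and addresses are the ones added above; hosts_ip_for is a hypothetical helper name, not part of any package, and it only reads a hosts-format file.

```shell
#!/bin/sh
# hosts_ip_for NAME [FILE]
# Print the address that a hosts-format file maps NAME to.
# FILE defaults to /etc/hosts. Hypothetical helper, for illustration only.
hosts_ip_for() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }                 # skip comment lines
        {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break        # ignore a trailing comment
                if ($i == name) { print $1; exit }
            }
        }' "${2:-/etc/hosts}"
}

# Example (run on each server):
#   hosts_ip_for node1    # should print the address entered for node1
#   hosts_ip_for node2    # should print the address entered for node2
```

If either call prints nothing, the corresponding line is missing from /etc/hosts on that server.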

Install and Configure NGINX

Before setting up the High Availability web server, you will need to install and configure NGINX on each of the nodes. You can install NGINX by running the following command:

apt-get install nginx -y

Once NGINX is installed, start the NGINX service and enable it to start at boot by running the following commands on each of the nodes:

systemctl start nginx
systemctl enable nginx

Next, create the default NGINX index.html page on each node:

On Node1, open the index.html page:

nano /var/www/html/index.html

Remove all the lines and add the following lines:

<h1>
Nginx Cluster ::: Node1
</h1>

Save and close the file when you are finished.

On Node2, open the index.html page:

nano /var/www/html/index.html

Remove all the lines and add the following lines:

<h1>
Nginx Cluster ::: Node2
</h1>

Save and close the file when you are finished.
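If you prefer not to edit the page by hand, the same file can be written from the shell. This is only a sketch: write_node_page is a hypothetical helper name, and the default path is assumed to be the /var/www/html/index.html file opened above.

```shell
#!/bin/sh
# write_node_page LABEL [FILE]
# Overwrite FILE (default: /var/www/html/index.html) with the three-line
# test page used in this tutorial. Hypothetical helper, for illustration only.
write_node_page() {
    printf '<h1>\nNginx Cluster ::: %s\n</h1>\n' "$1" > "${2:-/var/www/html/index.html}"
}

# Example:
#   write_node_page Node1    # run on Node1
#   write_node_page Node2    # run on Node2
```

Giving each node a different label is what makes the failover visible in the browser later on.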

Now, stop the NGINX service on each node, because Pacemaker will start and stop NGINX from this point on:

systemctl stop nginx

Install Pacemaker, Corosync, and Crmsh

Next, you will need to install Pacemaker, Corosync, and Crmsh on each node. All of these packages are available in the default Ubuntu 16.04 repository, so you can install them with the following command:

apt-get install pacemaker corosync crmsh -y

Once the installation is complete, stop the Corosync and Pacemaker services with the following commands:

systemctl stop corosync
systemctl stop pacemaker

Configure Corosync

Next, you will need to configure Corosync on Node1 and generate the Corosync key for the cluster authentication.

Before starting, install haveged. It feeds the kernel entropy pool so that corosync-keygen, which reads from /dev/random, can gather enough random data for the key. You can install it with the following command:

apt-get install haveged -y

Next, generate the Corosync key by running the following command:

corosync-keygen

You should see the following output:

Corosync Cluster Engine Authentication key generator.
Gathering 1024 bits for key from /dev/random.
Press keys on your keyboard to generate entropy.
Press keys on your keyboard to generate entropy (bits = 920).
Writing corosync key to /etc/corosync/authkey.

You can confirm that the key file was written by listing the directory with the following command:

ls -l /etc/corosync/

Output:

-r-------- 1 root root  128 Feb 28 20:39 authkey
-rw-r--r-- 1 root root 3929 Oct 21  2015 corosync.conf
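The listing above can be turned into a quick automated check. The sketch below assumes the key should look exactly as in that listing (128 bytes, readable only by its owner); check_authkey is a hypothetical helper name, and other Corosync versions may produce a key of a different size.

```shell
#!/bin/sh
# check_authkey [FILE]
# Succeed only if FILE (default: /etc/corosync/authkey) is 128 bytes long
# and has mode -r--------, matching the listing in this tutorial.
# Hypothetical helper, for illustration only.
check_authkey() {
    key="${1:-/etc/corosync/authkey}"
    [ -f "$key" ] || { echo "missing: $key" >&2; return 1; }
    size=$(wc -c < "$key" | tr -d ' ')        # file size in bytes
    mode=$(ls -ld "$key" | cut -c1-10)        # permission string only
    [ "$size" = "128" ] || { echo "unexpected size: $size bytes" >&2; return 1; }
    [ "$mode" = "-r--------" ] || { echo "unexpected mode: $mode" >&2; return 1; }
    echo "authkey looks sane: $key"
}

# Example:
#   check_authkey
```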

Next, change to the /etc/corosync directory and remove the default configuration file:

cd /etc/corosync/
rm -rf corosync.conf

Next, create a new corosync.conf file as shown below:

nano corosync.conf

Add the following lines:

    totem {
      version: 2
      cluster_name: lbcluster
      transport: udpu
      interface {
        ringnumber: 0
        bindnetaddr: 192.168.0.102
        broadcast: yes
        mcastport: 5405
      }
    }

    quorum {
      provider: corosync_votequorum
      two_node: 1
    }

    nodelist {
      node {
        ring0_addr: 192.168.0.102
        name: primary
        nodeid: 1
      }
      node {
        ring0_addr: 192.168.0.103
        name: secondary
        nodeid: 2
      }
    }

    logging {
      to_logfile: yes
      logfile: /var/log/corosync/corosync.log
      to_syslog: yes
      timestamp: on
    }

    service {
      name: pacemaker
      ver: 1
    }

Save and close the file when you are finished.
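Before copying the file to the other node, it can help to confirm that the nodelist holds the addresses you expect. A minimal sketch, assuming the file uses the "ring0_addr: ADDRESS" layout shown above; corosync_node_addrs is a hypothetical helper name.

```shell
#!/bin/sh
# corosync_node_addrs [FILE]
# Print every ring0_addr value found in a corosync.conf-style file
# (default: /etc/corosync/corosync.conf), one address per line.
# Hypothetical helper, for illustration only.
corosync_node_addrs() {
    awk '$1 == "ring0_addr:" { print $2 }' "${1:-/etc/corosync/corosync.conf}"
}

# Example:
#   corosync_node_addrs
```

For the configuration in this tutorial, the helper should list the addresses of both instances.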

Next, copy the Corosync authentication key and the configuration file from Node1 to Node2 with the following command:

scp /etc/corosync/* root@192.168.0.103:/etc/corosync/

Start Cluster Service

Now, start the Corosync and Pacemaker services on each node and enable them to start at boot with the following commands:

systemctl start corosync
systemctl enable corosync
systemctl start pacemaker
systemctl enable pacemaker

Once both services have been started, check the status of the service on both nodes with the following command:

crm status

If everything is fine, you should see the following output:

Last updated: Wed Feb 28 21:13:27 2018          Last change: Wed Feb 28 21:12:44 2018 by hacluster via crmd on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 0 resources configured

Online: [ primary secondary ]

Full list of resources:

You can also check the Corosync members with the following command:

corosync-cmapctl | grep members

You should see the IP address of both nodes in the following output:

runtime.totem.pg.mrp.srp.members.1.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.1.ip (str) = r(0) ip(192.168.0.102) 
runtime.totem.pg.mrp.srp.members.1.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.1.status (str) = joined
runtime.totem.pg.mrp.srp.members.2.config_version (u64) = 0
runtime.totem.pg.mrp.srp.members.2.ip (str) = r(0) ip(192.168.0.103) 
runtime.totem.pg.mrp.srp.members.2.join_count (u32) = 1
runtime.totem.pg.mrp.srp.members.2.status (str) = joined
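The membership output is verbose, so a small filter can reduce it to the member addresses and a count of joined members. This is a sketch that assumes the output format shown above; member_ips and joined_count are hypothetical helper names.

```shell
#!/bin/sh
# member_ips
# Read `corosync-cmapctl | grep members` output on stdin and print the
# IP address of each member, one per line.
# Hypothetical helper, for illustration only.
member_ips() {
    sed -n 's/.*\.members\.[0-9][0-9]*\.ip .*ip(\([^)]*\)).*/\1/p'
}

# joined_count
# Read the same output on stdin and print how many members report
# the status "joined".
joined_count() {
    grep -c '\.status (str) = joined'
}

# Example:
#   corosync-cmapctl | grep members | member_ips
#   corosync-cmapctl | grep members | joined_count
```

In a healthy two-node cluster the count should match the number of nodes in the nodelist.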

Configure Cluster

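Now, we are ready to configure the cluster in Pacemaker. Run all Pacemaker commands on the primary node (Node1); Pacemaker automatically synchronizes cluster configuration changes to all member nodes.

Next, disable STONITH. STONITH (Shoot The Other Node In The Head) is Pacemaker's fencing mechanism: it powers off or isolates a failed node so that it cannot interfere with the rest of the cluster. This tutorial does not configure a fencing device, and Pacemaker will not start resources while STONITH is enabled without one, so it has to be turned off here. Because this is a two-node cluster, also set the quorum policy to ignore so that the surviving node keeps managing resources even if quorum is lost.

You can apply both settings with the following commands: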

crm configure property stonith-enabled=false
crm configure property no-quorum-policy=ignore

Now, verify your STONITH status and the quorum policy with the following command:

crm configure show

You should see the following output:

node 1: primary
node 2: secondary
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore
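To check a single setting without reading the whole configuration, the output can be filtered for one key. A minimal sketch, assuming the key=value layout shown above; config_value is a hypothetical helper name.

```shell
#!/bin/sh
# config_value KEY
# Read `crm configure show` output on stdin and print the value of the
# first KEY=value pair found. Hypothetical helper, for illustration only.
config_value() {
    awk -v key="$1" '
        {
            for (i = 1; i <= NF; i++) {
                n = index($i, "=")            # position of the first "="
                if (n > 0 && substr($i, 1, n - 1) == key) {
                    print substr($i, n + 1)
                    exit
                }
            }
        }'
}

# Example:
#   crm configure show | config_value stonith-enabled
#   crm configure show | config_value no-quorum-policy
```

After the two property commands above, these calls should report false and ignore respectively.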

Pacemaker is now running and configured. Next, you will need to create two resources for the cluster: a virtual IP resource for the floating IP and a web server resource for the NGINX service.

You can create a new Virtual IP resource for the floating IP using the CRM command as shown below:

crm configure primitive virtual_ip ocf:heartbeat:IPaddr2 params ip="192.168.0.104" cidr_netmask="32" op monitor interval="10s" meta migration-threshold="10"

Next, create a web server resource using the following command:

crm configure primitive webserver ocf:heartbeat:nginx params configfile=/etc/nginx/nginx.conf op start timeout="40s" interval="0" op stop timeout="60s" interval="0" op monitor interval="10s" timeout="60s" meta migration-threshold="10"

Next, check the status of the new resource with the following command:

crm resource status

You should see the following output:

 virtual_ip (ocf::heartbeat:IPaddr2):       Started
 webserver  (ocf::heartbeat:nginx): Started

Next, group the two resources so that the floating IP and the web server always run on the same node and fail over together. Add the virtual_ip and webserver resources to a new group named hakase_balancing by running the following command:

crm configure group hakase_balancing virtual_ip webserver

Next, check the status of the new group with the following command:

crm resource show

You should see the following output:

Resource Group: hakase_balancing
     virtual_ip (ocf::heartbeat:IPaddr2):       Started
     webserver  (ocf::heartbeat:nginx): Started

Test High Availability

The cluster configuration is now completed, and it's time to check the status of node and cluster.

You can do this with the following command:

crm status

You should see the following output:

Last updated: Wed Feb 28 21:35:21 2018          Last change: Wed Feb 28 21:34:50 2018 by root via cibadmin on primary
Stack: corosync
Current DC: primary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ primary secondary ]

Full list of resources:

 Resource Group: hakase_balancing
     virtual_ip (ocf::heartbeat:IPaddr2):       Started primary
     webserver  (ocf::heartbeat:nginx): Started primary

You now have two nodes, primary and secondary, both with status online.
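On the node that runs the virtual_ip resource, the floating IP should appear among the node's addresses. The sketch below filters the one-line-per-address output of ip -o -4 addr show; holds_ip is a hypothetical helper name, and the address in the example is the floating IP from the Requirements section.

```shell
#!/bin/sh
# holds_ip ADDRESS
# Read `ip -o -4 addr show` output on stdin and succeed (exit 0) only if
# ADDRESS is assigned to one of the listed interfaces.
# Hypothetical helper, for illustration only.
holds_ip() {
    awk -v want="$1" '
        $3 == "inet" {
            split($4, a, "/")                 # strip the /prefix length
            if (a[1] == want) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

# Example (run on either node):
#   ip -o -4 addr show | holds_ip 192.168.0.104 && echo "floating IP is on this node"
```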

Now, from the remote machine, open your web browser and type the URL http://192.168.0.104 (Floating IP). You should see the Node1 page:

[Image: Node1 page]

Next, stop the cluster service on Node1 with the following command:

crm cluster stop

Now, check the cluster status on Node2 with the following command:

crm status

You should see that the primary node is offline and the secondary node is online, as shown below:

Last updated: Wed Feb 28 22:00:59 2018          Last change: Wed Feb 28 21:46:57 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition with quorum
2 nodes and 2 resources configured

Online: [ secondary ]
OFFLINE: [ primary ]

Full list of resources:

 Resource Group: hakase_balancing
     virtual_ip (ocf::heartbeat:IPaddr2):       Started secondary
     webserver  (ocf::heartbeat:nginx): Started secondary
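To confirm the failover from a script rather than by eye, the status output can be reduced to the node that currently runs a given resource. This is a sketch that assumes the resource lines look like the ones above, with or without spacing between the columns; resource_node is a hypothetical helper name.

```shell
#!/bin/sh
# resource_node RESOURCE
# Read `crm status` output on stdin and print the node on which RESOURCE
# is reported as Started. Prints nothing if the resource is not started.
# Hypothetical helper, for illustration only.
resource_node() {
    awk -v rsc="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)                   # drop indentation
            rest = substr(line, length(rsc) + 1)       # text after the name
            if (index(line, rsc) == 1 && rest ~ /^[ \t]*\(/ &&
                line ~ /Started[ \t]+[^ \t]+[ \t]*$/) {
                print $NF                              # last field is the node
                exit
            }
        }'
}

# Example:
#   crm status | resource_node virtual_ip
#   crm status | resource_node webserver
```

After stopping the cluster service on Node1, both calls should name the secondary node.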

Now, from the remote machine, open your web browser and type the URL http://192.168.0.104 (Floating IP). You should see the Node2 page:

[Image: Node2 page]
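The browser check can also be done from a terminal on the remote machine. The sketch below assumes the index.html pages created earlier in this tutorial and that curl is installed; serving_node is a hypothetical helper name.

```shell
#!/bin/sh
# serving_node
# Read the tutorial's test page on stdin and print the node label that
# follows "Nginx Cluster ::: ". Hypothetical helper, for illustration only.
serving_node() {
    sed -n 's/.*Nginx Cluster ::: \([A-Za-z0-9]*\).*/\1/p'
}

# Example (from the remote machine):
#   curl -s http://192.168.0.104 | serving_node
```

Running the example before and after stopping the cluster service on Node1 should show the label change from Node1 to Node2.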

Troubleshoot Cluster

If your High Availability setup is not working as expected, a few troubleshooting commands can help you find the cause.

crm_mon is a very useful tool for viewing the real-time status of your nodes and resources:

crm_mon

You should see the following output:

Last updated: Wed Feb 28 23:46:46 2018          Last change: Wed Feb 28 22:00:43 2018 by root via cibadmin on primary
Stack: corosync
Current DC: secondary (version 1.1.14-70404b0) - partition WITHOUT quorum
2 nodes and 2 resources configured

Online: [ secondary ]
OFFLINE: [ primary ]

 Resource Group: hakase_balancing
     virtual_ip (ocf::heartbeat:IPaddr2):       Started secondary
     webserver  (ocf::heartbeat:nginx): Started secondary

You can see your cluster configuration using the following command:

crm configure show

Output:

node 1: primary
node 2: secondary
primitive virtual_ip IPaddr2 \
params ip=192.168.0.104 cidr_netmask=32 \
op monitor interval=10s \
meta migration-threshold=10
primitive webserver nginx \
params configfile="/etc/nginx/nginx.conf" \
op start timeout=40s interval=0 \
op stop timeout=60s interval=0 \
op monitor interval=10s timeout=60s \
meta migration-threshold=10
group hakase_balancing virtual_ip webserver
property cib-bootstrap-options: \
have-watchdog=false \
dc-version=1.1.14-70404b0 \
cluster-infrastructure=corosync \
cluster-name=debian \
stonith-enabled=false \
no-quorum-policy=ignore

You can also troubleshoot the cluster by following the Corosync log with the following command:

tail -f /var/log/corosync/corosync.log
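When the log is long, it can be quicker to look only at recent lines that mention a problem. A minimal sketch, assuming the log path configured in corosync.conf above; recent_log_problems is a hypothetical helper name, and the keyword list is only a starting point.

```shell
#!/bin/sh
# recent_log_problems [FILE] [COUNT]
# Print the last COUNT (default: 20) lines of FILE
# (default: /var/log/corosync/corosync.log) that mention an error,
# a warning, or a critical condition, ignoring case.
# Hypothetical helper, for illustration only.
recent_log_problems() {
    grep -iE 'error|warning|crit' "${1:-/var/log/corosync/corosync.log}" |
        tail -n "${2:-20}"
}

# Example:
#   recent_log_problems
#   recent_log_problems /var/log/corosync/corosync.log 50
```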

Congratulations! You now have a basic NGINX high-availability setup using Corosync and Pacemaker on an Ubuntu 16.04 server. For more information, refer to the official Pacemaker documentation.