References:

http://my.oschina.net/guol/blog/182491

http://18567.blog.51cto.com/8567/655043

http://www.qixing318.com/article/by-keepalived-redis-double-machine.html

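Background

At the time of writing, the official Redis clustering solution is still in development and testing and has not been merged into a stable release. The Redis Cluster under development is also not yet feature-complete (see the official website or http://www.redisdoc.com/en/latest/topic/cluster-spec.html), so it is not recommended for production use. Our survey found that, to expose redis through a single IP address, most deployments use keepalived to build a two-node hot standby for redis as an interim solution.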

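Environment deployment

Environment overview:
Master: 192.168.1.218 redis,keepalived
Slave: 192.168.1.219 redis,keepalived
Virtual IP Address (VIP): 192.168.1.220

Below, Master refers to the host 192.168.1.218 and Slave refers to the host 192.168.1.219; lowercase master/slave refers to the keepalived/redis role. (The only difference is the capitalization of the first letter.)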

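Design approach:

The custom-script feature of keepalived monitors the state of the local redis service. When the monitoring script detects that redis has failed, it shuts down the local keepalived, which in turn changes the keepalived master/backup roles. keepalived runs notify scripts whenever its role changes, and that gives us the hook to switch the redis master/slave roles. The goal is to keep the data on the redis master and slave consistent.

There are four cases when running keepalived+redis:

1 keepalived is down and redis is down as well. The VIP simply floats away and no redis data synchronization is needed: because redis is down, there is nothing on the master to sync from. Any data already written to the master but not yet replicated to the slave is lost.

2 keepalived is down but redis is still running. After the VIP floats away, the redis master/slave relationship is still the old one. If nothing changes, writes land on the redis slave and are never replicated to the master, so the monitoring scripts must reverse the redis master/slave relationship. Some time has to be reserved for data synchronization before the roles are reversed.

3 keepalived is running but redis is down. The monitoring script detects that redis is down, shuts down the local keepalived, and lets the VIP float to the other server, which then takes over the redis workload.

4 The last case: keepalived and redis are both running, and nothing needs to be done.

The setup in this article handles all four cases. The first needs no data synchronization; the scripts still attempt one by default, but it cannot succeed. The scripts mainly exist to handle the second and third cases.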
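
The four cases above can be summarized as a small lookup. The sketch below is only an illustration: the function name classify_case and the "up"/"down" arguments are hypothetical and are not part of the deployed scripts.

```shell
#!/bin/bash
# Hypothetical helper: map the state of keepalived and redis on the node
# that currently holds the VIP to the case number used in this article.
# Arguments: $1 = keepalived state, $2 = redis state, each "up" or "down".
classify_case() {
    local keepalived_state=$1 redis_state=$2
    case "$keepalived_state/$redis_state" in
        down/down) echo "1: VIP moves, no sync possible, unreplicated writes are lost" ;;
        down/up)   echo "2: VIP moves, allow time to sync, then reverse redis roles" ;;
        up/down)   echo "3: check script stops keepalived, VIP moves to the peer" ;;
        up/up)     echo "4: healthy, nothing to do" ;;
        *)         echo "unknown state: $keepalived_state/$redis_state" >&2; return 1 ;;
    esac
}

# Example: keepalived is running but redis has died.
classify_case up down
```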

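Implementation steps:

-------------------Create a dedicated user-------------------

useradd -g develop redisadmin
echo ******|passwd --stdin redisadmin

Note: all of the following deployment steps are performed as root (or with an account that has sudo privileges).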

-------------------Install and configure redis-------------------

Perform the following on both Master and Slave:

1. Download the redis source

cd

wget http://download.redis.io/releases/redis-2.8.4.tar.gz

2. Install redis

tar -zxvf redis-2.8.4.tar.gz

cd redis-2.8.4

#redis does not need a configure step before building

make

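#Run the test suite

make test

####On slower machines, make test may report the following error; it can be ignored

#*** [err]: Test replication partial resync: no backlog in tests/integration/replication-psync.tcl

3. Configure redis

#Create the redis home directory

mkdir -p /usr/local/redis/{bin,conf,logs}

#Copy the executables to the corresponding directory

find src/ \( -perm -0001 \) -type f -exec cp -a -R -p {} /usr/local/redis/bin \;

#Create the redis start script

vi /usr/local/redis/redis-start.sh

####The following is the configuration on the master; on the slave, only the IP addresses need to be changed.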

#!/bin/bash
RPATH=/usr/local/redis
KPATH=/usr/local/keepalived
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP=192.168.1.218
REMOTEIP=192.168.1.219

$RPATH/bin/redis-server $RPATH/conf/redis.conf
if [ "$?" == "0" ];then
    echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis start successful." >> $LOGFILE
else
    echo "[ERROR]`date +%F/%H:%M:%S` :$LOCALIP redis start error." >> $LOGFILE
fi
#Create the redis stop script
vi /usr/local/redis/redis-stop.sh
####The following is the configuration on the master; on the slave, only the IP addresses need to be changed.
#!/bin/bash
RPATH=/usr/local/redis
KPATH=/usr/local/keepalived
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP=192.168.1.218
REMOTEIP=192.168.1.219
kill -9 `ps -ef|grep '/bin/redis-server'|grep -v grep|awk  '{print $2}'`
if [ "$?" == "0" ];then
        echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis shutdown completed!" >> $LOGFILE
else
        echo "[INFO]`date +%F/%H:%M:%S` :$LOCALIP redis is not started." >> $LOGFILE
fi
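#Create the redis configuration file
cp -a -R -p redis.conf /usr/local/redis/conf/redis.conf
#Edit the corresponding settings in redis.conf:
vi /usr/local/redis/conf/redis.conf
#The lines below are the changed settings; adjust the rest to match the actual production environment
daemonize yes
pidfile /usr/local/redis/redis.pid
#bind 192.168.1.218    #commented out for now to simplify testing
timeout 300
#notice is sufficient in production; verbose is used here to inspect the output in detail
loglevel verbose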
logfile "/usr/local/redis/logs/redis.log"
dir /usr/local/redis/
appendonly yes
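
After editing, it can help to confirm which value a directive actually ends up with. The following is a minimal sketch, assuming the last occurrence of a directive in redis.conf is the one that takes effect; conf_get is a hypothetical helper name, not a redis tool.

```shell
#!/bin/bash
# Hypothetical helper: print the value of a directive in a redis.conf-style
# file. Commented-out lines such as "#bind ..." do not match because their
# first field starts with "#". Assumes the last occurrence wins.
# Arguments: $1 = path to the config file, $2 = directive name.
conf_get() {
    awk -v key="$2" '$1 == key { value = $2 } END { if (value != "") print value }' "$1"
}

# Usage on a deployed node (not run here):
#   conf_get /usr/local/redis/conf/redis.conf appendonly
```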

#Change the owner and permissions of the redis directory

chmod -R 750 /usr/local/redis/

chown -R redisadmin:develop /usr/local/redis/

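-------------------Install and configure keepalived-------------------

1. Download the keepalived source package, version 1.2.10 (the latest at the time of writing)

wget http://www.keepalived.org/software/keepalived-1.2.10.tar.gz
2. Install keepalived

Install the following dependencies first: make gcc libpopt-dev libnl-dev libcurl4-openssl-dev popt openssl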
cd

tar zxvf keepalived-1.2.10.tar.gz

cd keepalived-1.2.10

Note: apply the libipvs.c fix described below before running the following steps:

./configure --prefix=/usr/local/keepalived

make && make install

Build error (does not occur with version 1.2.9):

image_thumb8

Fix:

vi ~/keepalived-1.2.10/keepalived/libipvs-2.6/libipvs.c

Apply the change shown in the figure below (add line 57, comment out line 82):

image_thumb5

3. Configure keepalived

cd /usr/local/keepalived

#Back up keepalived.conf:
mv /usr/local/keepalived/etc/keepalived/keepalived.conf /usr/local/keepalived/etc/keepalived/keepalived.conf-bak

#On Master (192.168.1.218), create the following configuration file (adjust as needed):

vi /usr/local/keepalived/etc/keepalived/keepalived.conf

! Configuration File for keepalived

#global_defs {
#   notification_email {
#   }
#   notification_email_from Alexandre.Cassen@firewall.loc
#    router_id  node3
#}

vrrp_script chk_redis {
    script "/usr/local/keepalived/etc/keepalived/scripts/redis_check.sh"
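#If the script exits non-zero and weight is negative, the priority is reduced accordingly; if the script exits 0 and weight is positive, the priority is increased accordingly; otherwise the original priority is kept.
#    weight -20
    interval 10                 #how often the script runs: once every 10 seconds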
}

vrrp_instance VI_1 {
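    state BACKUP                #For non-preemptive recovery, set this to BACKUP on both the master and the slave server; nopreempt only takes effect then.
    #state MASTER
    interface eth3
    virtual_router_id 51
    priority 100
    nopreempt                   #Do not preempt. Setting it on the server with the higher priority is enough: when the lower-priority server starts and finds the higher-priority server already acting as master, it does not preempt anyway.
    #advert_int is the VRRP advertisement interval in seconds. A BACKUP node moves to state MASTER after missing about 3 advertisements, roughly 3 seconds by default; with a value of 2, the switch starts after about 2*3=6 seconds.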
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass redis
    }
    virtual_ipaddress {
        192.168.1.220
    }
    track_script {
        chk_redis
    }
    notify_master /usr/local/keepalived/etc/keepalived/scripts/master.sh
    notify_backup /usr/local/keepalived/etc/keepalived/scripts/backup.sh
    notify_fault  /usr/local/keepalived/etc/keepalived/scripts/fault.sh
    notify_stop   /usr/local/keepalived/etc/keepalived/scripts/stop.sh
}
#On Slave (192.168.1.219), create the following configuration file (adjust as needed):
vi /usr/local/keepalived/etc/keepalived/keepalived.conf
! Configuration File for keepalived

#global_defs {
#   notification_email {
#   }
#   notification_email_from Alexandre.Cassen@firewall.loc
#    router_id  node3
#}

vrrp_script chk_redis {
    script "/usr/local/keepalived/etc/keepalived/scripts/redis_check.sh"
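#If the script exits non-zero and weight is negative, the priority is reduced accordingly; if the script exits 0 and weight is positive, the priority is increased accordingly; otherwise the original priority is kept.
#    weight -20
    interval 10                 #how often the script runs: once every 10 seconds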
}

vrrp_instance VI_1 {
    state BACKUP
    interface eth5
    garp_master_delay 10
    virtual_router_id 51
    priority 90
    #nopreempt 
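    #advert_int is the VRRP advertisement interval in seconds. A BACKUP node moves to state MASTER after missing about 3 advertisements, roughly 3 seconds by default; with a value of 2, the switch starts after about 2*3=6 seconds.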
    advert_int 1
    authentication {
        auth_type PASS
        auth_pass redis
    }
    virtual_ipaddress {
        192.168.1.220
    }
    track_script {
        chk_redis
    }
    notify_master /usr/local/keepalived/etc/keepalived/scripts/master.sh      #runs master.sh when keepalived switches to master
    notify_backup /usr/local/keepalived/etc/keepalived/scripts/backup.sh      #runs backup.sh when keepalived switches to backup
    notify_fault  /usr/local/keepalived/etc/keepalived/scripts/fault.sh       #runs fault.sh when keepalived enters the fault state
    notify_stop   /usr/local/keepalived/etc/keepalived/scripts/stop.sh        #runs stop.sh when keepalived stops
}
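
The weight rule described in the vrrp_script comment can be worked through with a few numbers. This sketch only restates that rule; effective_priority is a hypothetical function name. In the configuration above weight is commented out, so the arithmetic applies only if weight is enabled.

```shell
#!/bin/bash
# Hypothetical illustration of the weight rule from the vrrp_script comment.
# Arguments: $1 = configured priority, $2 = weight, $3 = check script exit status.
effective_priority() {
    local base=$1 weight=$2 status=$3
    if [ "$status" -ne 0 ] && [ "$weight" -lt 0 ]; then
        echo $((base + weight))      # failed check, negative weight: lower
    elif [ "$status" -eq 0 ] && [ "$weight" -gt 0 ]; then
        echo $((base + weight))      # passing check, positive weight: raise
    else
        echo "$base"                 # every other combination: unchanged
    fi
}

effective_priority 100 -20 1   # Master with a failing check
effective_priority 100 -20 0   # Master with a passing check
```

With the priorities used here (100 on Master, 90 on Slave), a failing check with weight -20 would drop Master to 80, below Slave's 90.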
#Configure the keepalived log file
vi /usr/local/keepalived/etc/sysconfig/keepalived
#KEEPALIVED_OPTIONS="-D"
KEEPALIVED_OPTIONS="-D -d -S 0"
#On servers older than Red Hat 6.0, edit /etc/syslog.conf; on Red Hat 6.0 and later, edit /etc/rsyslog.conf. Add the following:
#Save keepalived message to keepalived.log
local0.*                                                /usr/local/keepalived/logs/keepalived.log
#Restart the logging service
service rsyslog restart
#Copy the files to their target locations

cp -r * /

#On both Master and Slave, create the scripts that monitor redis. The scripts below are the master's versions; on the slave, only the IP addresses need to be changed.
mkdir /usr/local/keepalived/etc/keepalived/scripts

vi /usr/local/keepalived/etc/keepalived/scripts/redis_check.sh

#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$

ALIVE=`$REDISCLI PING`
if [ "$ALIVE" == "PONG" ]; then
  echo "[INFO]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is healthy." >> $LOGFILE
  exit 0
else
  echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is not healthy." >> $LOGFILE
  #If the local redis cannot be reached, wait one second and check again. If it has recovered, log a notice; if it is still unreachable, stop the local keepalived so that the VIP floats to the other server.
  sleep 1
  ALIVE1=`$REDISCLI PING`
  if [ "$ALIVE1" == "PONG" ];then 
    echo "[NOTICE]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis became healthy." >> $LOGFILE
    exit 0
  else
    echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP local redis is error." >> $LOGFILE
    echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP shutdown local keepalived." >> $LOGFILE
    /etc/init.d/keepalived stop
    if [ "$?" != "0" ];then
      echo "[ERROR]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP keepalived shutdown error." >> $LOGFILE
    else
      echo "[INFO]`date +'%Y-%m-%d:%H:%M:%S'` :$LOCALIP keepalived shutdown completed." >> $LOGFILE
    fi
  exit 1 
  fi
fi
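
The check-then-retry logic in redis_check.sh can also be written as a reusable function. This is a sketch under stated assumptions, not a replacement for the script above: redis_alive and the PING_CMD override are hypothetical names added for illustration, and the default command is the redis-cli path used in this article.

```shell
#!/bin/bash
# PING_CMD is a hypothetical override used for illustration; by default it
# is the redis-cli path used throughout this article.
PING_CMD=${PING_CMD:-/usr/local/redis/bin/redis-cli PING}

# Return 0 if redis answers PONG within $1 attempts, sleeping $2 seconds
# between attempts; return 1 otherwise.
redis_alive() {
    local attempts=$1 delay=$2 i
    for ((i = 1; i <= attempts; i++)); do
        # $PING_CMD is intentionally unquoted so it splits into command and arguments.
        if [ "$($PING_CMD 2>/dev/null)" = "PONG" ]; then
            return 0
        fi
        [ "$i" -lt "$attempts" ] && sleep "$delay"
    done
    return 1
}

# Usage on a deployed node (not run here), mirroring redis_check.sh:
#   redis_alive 2 1 || /etc/init.d/keepalived stop
```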
#######################################################################

vi /usr/local/keepalived/etc/keepalived/scripts/master.sh

#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$


#When keepalived on this server becomes master, i.e. the VIP moves to this host, switch the local redis to role:master

echo "[WARN]-----------keepalived change to master,change local redis to master---------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave]" >> $LOGFILE
#First switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE 2>&1
#Synchronize data
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
#Wait 10 seconds (tune this to the actual workload) for the data to finish syncing, then switch to role:master
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] data sync from old master ok..." >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Run slaveof no one,close master/slave" >> $LOGFILE
$REDISCLI SLAVEOF NO ONE >> $LOGFILE 2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait other slave connect...." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
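
master.sh waits a fixed 10 seconds for the sync. One possible alternative, shown here only as a sketch, is to poll the replication state reported by INFO replication and stop waiting as soon as the link is up. The function names link_is_up and wait_for_sync are hypothetical; the fields master_link_status and master_sync_in_progress are the ones redis reports on a slave.

```shell
#!/bin/bash
# Succeeds if the INFO replication output on stdin reports an established,
# fully synced link to the master.
link_is_up() {
    local info
    info=$(tr -d '\r')    # INFO output lines end in CRLF
    grep -qx 'master_link_status:up' <<< "$info" &&
        grep -qx 'master_sync_in_progress:0' <<< "$info"
}

# Poll the local redis once per second until the link is up or $1 seconds
# have passed. REDISCLI defaults to the path used in this article.
wait_for_sync() {
    local timeout=$1 waited=0
    local cli=${REDISCLI:-/usr/local/redis/bin/redis-cli}
    while [ "$waited" -lt "$timeout" ]; do
        if "$cli" INFO replication 2>/dev/null | link_is_up; then
            return 0
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 1
}
```

If the old master is down (cases 1 and 3), the link never comes up and the function returns after the timeout, which matches the fixed-sleep behavior in those cases.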

#######################################################################

vi /usr/local/keepalived/etc/keepalived/scripts/backup.sh

#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$


#When keepalived on this server becomes backup, i.e. the VIP moves to the other server, switch the local redis to role:slave

echo "[WARN]------------keepalived change to slave,change local redis to slave----------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF  $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE  2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE  

#######################################################################

vi /usr/local/keepalived/etc/keepalived/scripts/stop.sh

#!/bin/sh
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$


#When keepalived on the master server stops, switch the local redis to role:slave

echo "[ERROR]-----------------keepalived stop,change local redis to slave---------------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF  $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE  2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE

#######################################################################

vi /usr/local/keepalived/etc/keepalived/scripts/fault.sh

#!/bin/bash
KPATH=/usr/local/keepalived
RPATH=/usr/local/redis
REDISCLI=$RPATH/bin/redis-cli
LOGFILE=$KPATH/logs/redis-state.log
LOCALIP="192.168.1.218"
REMOTEIP="192.168.1.219"
PORT="6379"
PID=$$


#When keepalived on this server enters the fault state, switch the local redis to role:slave
echo "[ERROR]---------------keepalived is fault,change local redis to slave-------------------" >> $LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master]" >>$LOGFILE
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] Being slave state..." >> $LOGFILE 2>&1
#During the switch, wait 10 seconds so the peer can sync data (tune this to the actual workload)
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] wait 10 sec for data sync from old master" >> $LOGFILE
sleep 10
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[master] data sync from old master ok..." >> $LOGFILE
#Once the data has synced, switch to role:slave
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] Run 'SLAVEOF  $REMOTEIP $PORT'" >> $LOGFILE
$REDISCLI SLAVEOF $REMOTEIP $PORT >> $LOGFILE  2>&1
echo "`date +'%Y-%m-%d:%H:%M:%S'`|$PID|state:[slave] slave connect to $REMOTEIP $PORT ok..." >> $LOGFILE
echo "-------------------------------------complete!------------------------------------------" >> $LOGFILE
#######################################################################
Set the permissions of the monitoring scripts:
chmod -R 750 /usr/local/keepalived/etc/keepalived/scripts/

 

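System testing

Notes:

(1). In keepalived.conf, both keepalived nodes are set to BACKUP, and nopreempt is set on 218 so that it does not preempt on recovery. Since 218 is planned as the master, follow this startup order: start keepalived on 218 first, wait for data synchronization to finish, then start keepalived on 219.

(2). The state-change notify scripts work alongside the keepalived check script redis_check.sh. master.sh is written so that when keepalived becomes master, redis is first switched to slave to sync data and then switched back to master. Therefore, before starting keepalived, make sure that the redis data on Master and Slave is identical. Then, when keepalived is started first on the host running the redis master, the redis master does connect to the redis slave to sync data, but because both sides start out with the same data, this causes no problems.

(3). In a real production environment, adjust the firewall policy to open the required ports. Here the firewall is simply turned off: service iptables stop.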
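
For note (3), the sketch below prints, without executing, the kind of iptables rules a production setup might need instead of stopping the firewall. It assumes redis listens on TCP port 6379 and that keepalived exchanges VRRP advertisements (IP protocol 112) with the peer; print_firewall_rules is a hypothetical helper, and the actual policy depends on the site.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print iptables rules that admit redis client
# traffic and VRRP advertisements from the peer node. Nothing is applied.
# Arguments: $1 = peer IP address, $2 = redis TCP port.
print_firewall_rules() {
    local peer=$1 port=$2
    echo "iptables -I INPUT -p tcp --dport $port -j ACCEPT"
    echo "iptables -I INPUT -s $peer -p 112 -j ACCEPT"
}

# Rules for Master (192.168.1.218), whose peer is 192.168.1.219.
print_firewall_rules 192.168.1.219 6379
```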

The test scenarios and their output follow:

-----------------------------------------Initial environment--------------------------------------------------

Set up the initial environment:

----Start redis on 218 and 219: /usr/local/redis/redis-start.sh

image_thumb30

----Start keepalived on 218: service keepalived start; do not start keepalived on 219 yet.

Run tail -f /usr/local/keepalived/logs/keepalived.log on 218. keepalived switches to the master state (the configuration file sets state:backup) and binds the VIP.

image_thumb35

Check the redis log on Master (218); the redis role switch proceeds as follows:

image_thumb[3]

----Start keepalived on Slave (219) and check the redis log; the redis role has become slave:

image_thumb[6]

-----------------------------------------Initial environment--------------------------------------------------

-----------------------------------------Design case 3-------------------------------------------------

----Simulate design case 3 by killing the redis process on Master (218):

keepalived on 218 is then stopped, as shown below:

image_thumb[25]

keepalived on 219 correctly switches to State:Master and the VIP floats over, as shown below:

image_thumb[14]

The redis monitoring log on 218:

image_thumb[20]

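The redis monitoring log on 219 shows that 219 has switched to master, which keeps the service available (of course, any data on 218 that was in memory and not yet written to disk is lost):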

image_thumb[23]

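----Simulate 218 recovering from the failure:

Because keepalived on 218 is shut down when the failure is detected, recovery requires starting redis on 218 first and then keepalived on 218:

Check the keepalived log on 218: keepalived goes straight into state:backup, so the service does not switch back and forth: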

image_thumb[28]

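Check the redis log on 218: after starting, redis on 218 becomes a slave of the redis server that is already running.

image_thumb[31]

In summary, design case 3 tested successfully.

-----------------------------------------Design case 3-------------------------------------------------

-----------------------------------------Design case 2-------------------------------------------------

----Restore the initial environment, then simulate design case 2 by stopping keepalived on 218 (service keepalived stop):

Check the redis monitoring log on 218: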

image_thumb[38]

Check the keepalived log on 219; it shows that keepalived switched over correctly:

image_thumb[37]

Check the redis monitoring log on 219; redis has completed the master/slave switch:

image_thumb[41]

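----Simulate 218 recovering from the keepalived failure (just kill all keepalived processes and then start normally) by running service keepalived start:

Check the keepalived log on 219: the keepalived state is backup, so the VIP does not float:

image_thumb[50]

Check the redis monitoring log on 218: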

image_thumb[44]

Check the redis runtime log on 218: redis comes back as a slave, so no service switch occurs:

image_thumb[47]

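In summary, design case 2 tested successfully.

-----------------------------------------Design case 2-------------------------------------------------

Design case 4 is simply the initial environment, and design case 1 is a combination of design cases 2 and 3, so neither needs a separate test.

This completes the Redis two-node hot standby solution.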