MongoDB Replica Set High Availability

azhao_dn


Overview

This article does not repeat a general introduction to MongoDB; for background, see:

http://www.mongodb.org/display/DOCS/Home

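This article covers the following:

1. Building a Replica Set in a test environment

2. Testing failover on the server side and the client side

3. Testing built-in read/write splitting to reduce server load

4. Adding authentication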

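Deploying the Replica Set

A Replica Set uses n mongod nodes to build a high-availability setup with automatic failover (auto-failover) and automatic recovery (auto-recovery). In theory it needs three MongoDB instances; in this test environment I use 2 mongod nodes plus 1 arbiter.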
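The three-instance figure follows from majority voting: a primary can only be elected while a strict majority of the voting members can reach each other. The sketch below works through that arithmetic in Python 3 for two and three voting members. The helper names `majority` and `can_elect_primary` are made up for this illustration and are not part of any MongoDB API.

```python
def majority(voting_members):
    """Smallest vote count that is strictly more than half of the set."""
    return voting_members // 2 + 1


def can_elect_primary(voting_members, reachable_members):
    """True if the members that can still reach each other form a majority."""
    return reachable_members >= majority(voting_members)


if __name__ == "__main__":
    for total in (2, 3):
        survivors = total - 1  # one member has gone down
        print(total, "voting members, one down -> primary possible:",
              can_elect_primary(total, survivors))
```

With only two voting members, losing either one leaves a single vote out of two, which is not a majority. Adding an arbiter as a third voter is what lets the surviving mongod win the election.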

(0) Base environment

mongod1 : deploy 
ip : 10.12.7.107 
port : 27031 
$ mkdir -p /data/db/0 
./mongod  --dbpath /data/db/0 --port 27031 --replSet myset


mongod2 : doc 
ip : 10.12.7.108 
port : 27032 
$ mkdir -p /data/db/1 
./mongod  --dbpath /data/db/1 --port 27032 --replSet myset


mongod3 : deploy 
ip : 10.12.7.107 
port : 27033 
$ mkdir -p /data/db/2 
./mongod  --dbpath /data/db/2 --port 27033 --replSet myset


(1) Initializing the Replica Set

Pick any one of the mongod nodes, log in with the mongo shell, and run the following:

> config = {_id: 'myset', members: [ 
        {_id: 0, host: '10.12.7.107:27031'}, 
        {_id: 1, host: '10.12.7.108:27032'}, 
        {_id: 2, host: '10.12.7.107:27033', arbiterOnly: true}]} 
> rs.initiate(config) 
> rs.conf()   // view the configuration
> rs.status()


At this point the Replica Set configuration is complete. The configuration itself is stored in the local database.
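When the member list changes often, the configuration document passed to rs.initiate() can be generated rather than typed by hand. The following Python 3 sketch builds the same document as above from a list of hosts. `build_replset_config` is a hypothetical helper written for this example; only the field names (`_id`, `members`, `host`, `arbiterOnly`) come from the configuration shown in this article.

```python
import json


def build_replset_config(set_name, data_hosts, arbiter_hosts=()):
    """Build a replica set config document; member _id values are assigned in order."""
    members = []
    for host in data_hosts:
        members.append({"_id": len(members), "host": host})
    for host in arbiter_hosts:
        members.append({"_id": len(members), "host": host, "arbiterOnly": True})

    hosts = [member["host"] for member in members]
    if len(set(hosts)) != len(hosts):
        raise ValueError("each member needs a distinct host:port")
    return {"_id": set_name, "members": members}


if __name__ == "__main__":
    config = build_replset_config(
        "myset",
        data_hosts=["10.12.7.107:27031", "10.12.7.108:27032"],
        arbiter_hosts=["10.12.7.107:27033"],
    )
    # The JSON output is also a valid JavaScript object literal for the mongo shell.
    print(json.dumps(config, indent=2))
```

The set name must match the value given to --replSet when the mongod processes were started.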

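(2) How it works

At any given moment, each Replica Set has exactly one primary, which accepts write operations. Writes are then replicated asynchronously to the other members. If the primary dies, the remaining members automatically vote to elect a new primary, and the original server rejoins as an ordinary member once it recovers. If some data had not yet been replicated from the previous primary to the other members, that data may be lost.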
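The data-loss window can be pictured with a toy model. The Python 3 sketch below is not MongoDB's replication protocol; it only assumes that the old primary accepted writes in order and that the member which takes over had copied the first `replicated_count` of them. The function name `split_on_failover` is invented for this illustration.

```python
def split_on_failover(primary_writes, replicated_count):
    """Split the old primary's writes into (replicated, not yet replicated)."""
    return primary_writes[:replicated_count], primary_writes[replicated_count:]


if __name__ == "__main__":
    writes = ["w1", "w2", "w3", "w4", "w5"]  # accepted by the old primary, in order
    kept, at_risk = split_on_failover(writes, replicated_count=3)
    print("on the new primary:", kept)
    print("never replicated:  ", at_risk)
```

Asking the server to acknowledge a write only after a second member has it, as the w=2 option in the client test later in this article does, is one way to narrow that window.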

Testing failover on the server side and the client side

(1) Server side

In the mongo shell, insert data into the primary (27031):

> use test 
switched to db test 
> db.users.insert({name:"terrylc"})


The following messages appear in the mongod output:

#27031 
Tue Oct 18 08:44:07 [FileAllocator] allocating new datafile /data/db/0/test.ns, filling with zeroes... 
Tue Oct 18 08:44:07 [FileAllocator] done allocating datafile /data/db/0/test.ns, size: 16MB,  took 0.054 secs 
Tue Oct 18 08:44:07 [FileAllocator] allocating new datafile /data/db/0/test.0, filling with zeroes... 
Tue Oct 18 08:44:07 [FileAllocator] done allocating datafile /data/db/0/test.0, size: 64MB,  took 0.153 secs 
Tue Oct 18 08:44:07 [FileAllocator] allocating new datafile /data/db/0/test.1, filling with zeroes... 
Tue Oct 18 08:44:07 [conn3] building new index on { _id: 1 } for test.users 
Tue Oct 18 08:44:07 [conn3] done for 0 records 0secs 
Tue Oct 18 08:44:07 [conn3] insert test.users 213ms 
Tue Oct 18 08:44:09 [FileAllocator] done allocating datafile /data/db/0/test.1, size: 128MB,  took 1.527 secs 
 
#27032 
Tue Oct 18 23:43:02 [FileAllocator] allocating new datafile /data/db/1/test.ns, filling with zeroes... 
Tue Oct 18 23:43:02 [FileAllocator] done allocating datafile /data/db/1/test.ns, size: 16MB,  took 0.054 secs 
Tue Oct 18 23:43:02 [FileAllocator] allocating new datafile /data/db/1/test.0, filling with zeroes... 
Tue Oct 18 23:43:03 [FileAllocator] done allocating datafile /data/db/1/test.0, size: 64MB,  took 0.556 secs 
Tue Oct 18 23:43:03 [FileAllocator] allocating new datafile /data/db/1/test.1, filling with zeroes... 
Tue Oct 18 23:43:03 [replica set sync] building new index on { _id: 1 } for test.users 
Tue Oct 18 23:43:03 [replica set sync] done for 0 records 0secs 
Tue Oct 18 23:43:03 [FileAllocator] done allocating datafile /data/db/1/test.1, size: 128MB,  took 0.166 secs 
 
#27033 
Tue Oct 18 08:42:22 [ReplSetHealthPollTask] replSet info 10.12.7.107:27031 is up 
Tue Oct 18 08:42:22 [ReplSetHealthPollTask] replSet member 10.12.7.107:27031 PRIMARY 
Tue Oct 18 08:42:22 [ReplSetHealthPollTask] replSet info 10.12.7.108:27032 is up 
Tue Oct 18 08:42:22 [ReplSetHealthPollTask] replSet member 10.12.7.108:27032 RECOVERING 
Tue Oct 18 08:42:34 [ReplSetHealthPollTask] replSet member 10.12.7.108:27032 SECONDARY


Stop the 27031 service and observe what happens:

#27032 
Tue Oct 18 23:47:47 [conn2] end connection 10.12.7.107:58587 
Tue Oct 18 23:47:47 [replica set sync] replSet syncThread: 10278 dbclient error communicating with server: 10.12.7.107:27031 
Tue Oct 18 23:47:48 [ReplSetHealthPollTask] DBClientCursor::init call() failed 
Tue Oct 18 23:47:48 [ReplSetHealthPollTask] replSet info 10.12.7.107:27031 is down (or slow to respond): DBClientBase::findOne: transport error: 10.12.7.107:27031 query: { replSetHeartbeat: "myset", v: 1, pv: 1, checkEmpty: false, from: "10.12.7.108:27032" } 
Tue Oct 18 23:47:48 [rs Manager] replSet info electSelf 1 
Tue Oct 18 23:47:48 [rs Manager] replSet couldn't elect self, only received -9999 votes 
Tue Oct 18 23:47:54 [rs Manager] replSet info electSelf 1 
Tue Oct 18 23:47:54 [rs Manager] replSet PRIMARY 
 
#27033 
Tue Oct 18 08:48:52 [conn2] end connection 10.12.7.107:43768 
Tue Oct 18 08:48:53 [conn3] 10.12.7.108:27032 is trying to elect itself but 10.12.7.107:27031 is already primary and more up-to-date 
Tue Oct 18 08:48:54 [ReplSetHealthPollTask] DBClientCursor::init call() failed 
Tue Oct 18 08:48:54 [ReplSetHealthPollTask] replSet info 10.12.7.107:27031 is down (or slow to respond): DBClientBase::findOne: transport error: 10.12.7.107:27031 query: { replSetHeartbeat: "myset", v: 1, pv: 1, checkEmpty: false, from: "10.12.7.107:27033" } 
Tue Oct 18 08:48:59 [conn3] replSet info voting yea for 1 
Tue Oct 18 08:49:00 [ReplSetHealthPollTask] replSet member 10.12.7.108:27032 PRIMARY 

Mongod 27032 has been elected primary. In the corresponding terminal, open the mongo shell:

$ ./mongo localhost:27032 
> rs.status() 
{ 
    "set" : "myset", 
    "date" : ISODate("2011-10-18T15:53:01Z"), 
    "myState" : 1, 
    "members" : [ 
        { 
            "_id" : 0, 
            "name" : "10.12.7.107:27031", 
            "health" : 0, 
            "state" : 1, 
            "stateStr" : "(not reachable/healthy)", 
            "uptime" : 0, 
            "optime" : { 
                "t" : 1318952647000, 
                "i" : 1 
            }, 
            "optimeDate" : ISODate("2011-10-18T15:44:07Z"), 
            "lastHeartbeat" : ISODate("2011-10-18T15:47:46Z"), 
            "errmsg" : "socket exception" 
        }, 
        { 
            "_id" : 1, 
            "name" : "10.12.7.108:27032", 
            "health" : 1, 
            "state" : 1, 
            "stateStr" : "PRIMARY", 
            "optime" : { 
                "t" : 1318952647000, 
                "i" : 1 
            }, 
            "optimeDate" : ISODate("2011-10-18T15:44:07Z"), 
            "self" : true 
        }, 
        { 
            "_id" : 2, 
            "name" : "10.12.7.107:27033", 
            "health" : 1, 
            "state" : 7, 
            "stateStr" : "ARBITER", 
            "uptime" : 705, 
            "optime" : { 
                "t" : 0, 
                "i" : 0 
            }, 
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"), 
            "lastHeartbeat" : ISODate("2011-10-18T15:53:00Z") 
        } 
    ], 
    "ok" : 1 
} 
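A status document like this one can also be checked by a script. The Python 3 sketch below works on a plain dict trimmed from the output above (only `name`, `health`, and `stateStr` are kept); how the dict is obtained from the server is left out. The helper `find_primary` is a hypothetical name. It checks `health` as well as `stateStr` so that an unreachable member is never reported as primary.

```python
def find_primary(status):
    """Return the name of the healthy member reporting PRIMARY, or None."""
    for member in status["members"]:
        if member.get("health") == 1 and member.get("stateStr") == "PRIMARY":
            return member["name"]
    return None


def unreachable_members(status):
    """Names of members whose health is reported as 0."""
    return [m["name"] for m in status["members"] if m.get("health") == 0]


# Trimmed copy of the rs.status() output shown above.
sample_status = {
    "set": "myset",
    "members": [
        {"name": "10.12.7.107:27031", "health": 0,
         "stateStr": "(not reachable/healthy)"},
        {"name": "10.12.7.108:27032", "health": 1, "stateStr": "PRIMARY"},
        {"name": "10.12.7.107:27033", "health": 1, "stateStr": "ARBITER"},
    ],
}

if __name__ == "__main__":
    print("primary:", find_primary(sample_status))
    print("down:", unreachable_members(sample_status))
```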
 

Querying the previously inserted record works as expected:

myset:PRIMARY> db.users.findOne() 
{ "_id" : ObjectId("4e9d9ec76df8f2f72bd9c45a"), "name" : "terrylc" }


If the mongod on 27031 is restarted, that instance rejoins as a secondary:

Tue Oct 18 23:56:46 [ReplSetHealthPollTask] replSet info 10.12.7.107:27031 is up 
Tue Oct 18 23:56:46 [ReplSetHealthPollTask] replSet member 10.12.7.107:27031 SECONDARY 
Tue Oct 18 23:56:47 [conn6] I am already primary, 10.12.7.107:27031 can try again once I've stepped down 
Tue Oct 18 23:56:48 [initandlisten] connection accepted from 10.12.7.107:44777 #7 
Tue Oct 18 23:56:49 [slaveTracking] building new index on { _id: 1 } for local.slaves 
Tue Oct 18 23:56:49 [slaveTracking] done for 0 records 0secs 
Tue Oct 18 23:57:03 [conn5] end connection 127.0.0.1:58555


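(2) Client-side test

Connecting to a Replica Set from a client requires driver support. This example uses pymongo 2.0.1.

* Testing availability

Goal: check whether a standby node takes over as primary when the primary fails.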

import sys

import pymongo


def testInsert():
    """Insert one document, retrying up to four times on failure."""
    attempts = 0
    while True:
        if attempts == 4:
            print("tried to connect to the server 4 times but failed, giving up")
            break
        try:
            # Seed list containing all three members of the set.
            conn = pymongo.Connection(host=["10.12.7.107:27031", "10.12.7.108:27032", "10.12.7.107:27033"])
            db = conn.test
            post = {"name1": "terrylc"}
            db.users.insert(post)
            conn.disconnect()
            sys.exit(0)
        except Exception as e:
            # Connection or write failure: report it and try again.
            print(e)
            attempts += 1


if __name__ == '__main__':
    testInsert()


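Conclusion:

After one of the server-side instances goes down, the client continues to work normally.

* Testing data integrity under heavy writes

Requirement: during a large batch of writes, if several nodes go down in turn at particular moments, is data integrity still guaranteed?

Goal: use the Python client to insert 5000 records into the db one after another, and expect the db to contain 5000 more records once the inserts finish: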

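import time

import pymongo


def testInsert2(index):
    """Insert documents numbered index..5000, resuming after a failure."""
    try:
        conn = pymongo.Connection(host=["10.12.7.107:27031", "10.12.7.108:27032", "10.12.7.107:27033"])
        db = conn.test
        while index <= 5000:
            post = {"testTAG": index}
            time.sleep(1)
            # safe=True, w=2: wait until two members have acknowledged the write.
            objectId = db.users.save(post, safe=True, w=2)
            if objectId is not None:
                index += 1
        conn.disconnect()
    except Exception as e:
        print(e)
        print("your index is: " + str(index))
        time.sleep(2)
        # Reconnect and resume from the record that failed.
        testInsert2(index)


if __name__ == '__main__':
    testInsert2(1)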


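Conclusion:

During the test I simulated several mongodb nodes going down in turn. After reconnecting, the client was able to continue writing, and in the end the data was consistent and every record could be queried.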
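The resume-from-index pattern used in this test can also be written as a loop instead of recursion, which avoids growing the call stack when failures are frequent. The Python 3 sketch below is driver-independent: `insert_one` stands for any callable that writes a single document, and `FlakyStore` is a stand-in invented here to simulate intermittent failures. Neither is part of pymongo.

```python
import time


def insert_with_retry(insert_one, total, start=1, max_failures=10, delay=0.0):
    """Insert documents start..total, retrying the same index after a failure."""
    index = start
    consecutive_failures = 0
    while index <= total:
        try:
            insert_one({"testTAG": index})
        except Exception:
            consecutive_failures += 1
            if consecutive_failures > max_failures:
                raise  # give up after too many failures in a row
            time.sleep(delay)  # back off before retrying the same index
            continue
        consecutive_failures = 0
        index += 1
    return index - 1  # last index written


class FlakyStore:
    """In-memory stand-in for a collection that fails on every Nth call."""

    def __init__(self, fail_every):
        self.docs = []
        self.calls = 0
        self.fail_every = fail_every

    def insert(self, doc):
        self.calls += 1
        if self.calls % self.fail_every == 0:
            raise IOError("simulated failover")
        self.docs.append(doc)


if __name__ == "__main__":
    store = FlakyStore(fail_every=7)
    last = insert_with_retry(store.insert, total=50)
    print("last index:", last, "documents stored:", len(store.docs))
```

One caveat with a real server: a write may already have been applied when the error reaches the client, so a blind retry can store the same record twice. Giving each document a deterministic `_id` is one way to make the retry safe.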

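Testing built-in read/write splitting to reduce server load

A simple way to do this is to add the following parameter when the client creates its connection: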

# slave_okay=True is the key setting: it lets reads be served by secondaries.
conn = pymongo.Connection(host=["10.12.7.107:27031", "10.12.7.108:27032", "10.12.7.107:27033"],
                          slave_okay=True)
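In later PyMongo releases, `Connection` and `slave_okay` were replaced by `MongoClient` and read preferences. As a rough equivalent, the Python 3 sketch below builds a connection string with the standard `replicaSet` and `readPreference` URI options. The helper names `replica_set_uri` and `connect` are assumptions made for this example, and `connect` needs PyMongo 3 or later installed, so it is defined but not called here.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="primary"):
    """Build a mongodb:// URI from a seed list and two connection options."""
    options = urlencode({"replicaSet": replica_set,
                         "readPreference": read_preference})
    return "mongodb://" + ",".join(hosts) + "/?" + options


def connect(uri):
    """Open a client for the URI (assumes PyMongo 3+ is installed)."""
    import pymongo  # imported lazily so the URI helper works without the driver
    return pymongo.MongoClient(uri)


if __name__ == "__main__":
    uri = replica_set_uri(
        ["10.12.7.107:27031", "10.12.7.108:27032", "10.12.7.107:27033"],
        "myset",
        read_preference="secondaryPreferred",
    )
    print(uri)
```

With `secondaryPreferred`, reads go to a secondary when one is available and fall back to the primary otherwise. Because replication is asynchronous, such reads may return slightly stale data.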


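Adding authentication

Authentication was added in MongoDB 1.7.5.

Authentication for a Replica Set differs from authentication on a single standalone server. The most visible difference is how the instances are started when using a Replica Set: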

# Run the same commands on each of the three members.
# The key file contents must be identical on every member.
$ echo "this is my super secret key" > mykey
$ chmod 600 mykey
$ mongod --keyFile mykey # other options...
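A hand-typed phrase works for a test, but a random key is harder to guess. The Python 3 sketch below writes one with owner-only permissions, mirroring the `chmod 600` step above. The function name `write_keyfile` and the 756-byte default are choices made for this example; 756 random bytes encode to 1008 base64 characters, which stays within the 1024-character limit MongoDB places on key files.

```python
import base64
import os
import stat
import tempfile


def write_keyfile(path, num_bytes=756):
    """Write a random base64 key to path with mode 0600; return the key length."""
    key = base64.b64encode(os.urandom(num_bytes))
    # Create the file with owner-only permissions from the start.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as handle:
        handle.write(key)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return len(key)


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "mykey")
    length = write_keyfile(demo_path)
    mode = stat.S_IMODE(os.stat(demo_path).st_mode)
    print("wrote", length, "characters, mode", oct(mode))
```

The resulting file still has to be copied unchanged to every member and passed to each mongod with --keyFile.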