A First Look at MongoDB Replica Sets for High Availability
In recent years NoSQL has been on a tear, mounting a steady challenge to the traditional database vendors. Among the many NoSQL solutions, the best known is MongoDB. Although the company behind it reportedly earns modest revenue (said to be only around ten million US dollars a year), it closed a $150 million funding round last October, putting its valuation at $1.2 billion. On January 7, 2014, the well-known DB-Engines Ranking published its list of the most popular databases of 2013 and crowned MongoDB database of the year.
With MongoDB this hot, it is worth keeping up and studying it. The official documentation is rich and thorough: its features, CRUD operations, Aggregation framework, and more all come with detailed docs and examples. Here I will start with the configuration and management side of horizontal scaling (scale out).
MongoDB supports three deployment modes: standalone, master-slave, and replica set. Mechanically, the replica set is clearly the best of the three: it makes read/write splitting easy while giving the database cluster automatic failover and high availability.
Let's demonstrate with some hands-on operations.
OS: Mac OS
MongoDB: 2.4.9
Installation: download the tarball from the official site and unpack it. Done.
Starting the cluster and the replica set
The cluster contains three nodes: Node1, Node2, and Node3.
Since this experiment runs on a single machine, we actually start three MongoDB instances, each with its own data directory and log file. The commands:
```
$ bin/mongod --dbpath node1 --port 10001 --replSet myMongo --nojournal --fork --logpath node1.log
about to fork child process, waiting until server is ready for connections.
forked process: 410
all output going to: /Users/yuanlinwu/mongodb/node1.log
child process started successfully, parent exiting
```
With the first instance up, start the other two:
```
$ bin/mongod --dbpath node2 --port 10002 --replSet myMongo --nojournal --fork --logpath node2.log
$ bin/mongod --dbpath node3 --port 10003 --replSet myMongo --nojournal --fork --logpath node3.log
```
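The three startup commands differ only in the data directory, port, and log file. As a sketch, they can be generated from the node number (ports, directory names, and the set name myMongo are taken from this walkthrough):

```python
def mongod_command(node: int, base_port: int = 10000, repl_set: str = "myMongo") -> str:
    """Build the mongod startup command for replica set member `node`.

    Follows this walkthrough's convention: data dir nodeN, port 10000+N,
    log file nodeN.log.
    """
    return (
        f"bin/mongod --dbpath node{node} --port {base_port + node} "
        f"--replSet {repl_set} --nojournal --fork --logpath node{node}.log"
    )

for n in (1, 2, 3):
    print(mongod_command(n))
```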
With the replica set members started, connect to one of the nodes:
```
$ bin/mongo localhost:10001
MongoDB shell version: 2.4.9
connecting to: localhost:10001/test
Server has startup warnings:
Tue Jan 14 16:44:20.966 [initandlisten]
Tue Jan 14 16:44:20.966 [initandlisten] ** WARNING: soft rlimits too low. Number of files is 256, should be at least 1000
>
```
Note the warning: soft rlimits too low. According to the MongoDB documentation this comes from the operating system's conservative default limit on open file descriptors, not from MongoDB itself (raising the limit with ulimit -n before starting mongod would silence it), so we will ignore it here.
At this point you can also check MongoDB's status through the built-in web console at host:11001 (the mongod port plus 1000).
Checking replica set status
Running rs.status() returns:
```
{
    "startupStatus" : 3,
    "info" : "run rs.initiate(...) if not yet done for the set",
    "ok" : 0,
    "errmsg" : "can't get local.system.replset config from self or any seed (EMPTYCONFIG)"
}
```
The message tells us the replica set has not been initialized yet. To initialize it, define the set configuration and pass it to rs.initiate():
```
> replSet = { "_id" : "myMongo",
              "members" : [ { "_id" : 1, "host" : "localhost:10001" },
                            { "_id" : 2, "host" : "localhost:10002" },
                            { "_id" : 3, "host" : "localhost:10003" } ] }
> rs.initiate(replSet)
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
```
This confirms the replica set configuration was saved; the log files also record the details of the initialization. Open a new client connected to node1 and you will see the prompt has changed: it is now prefixed with PRIMARY or SECONDARY.
```
$ bin/mongo localhost:10001
myMongo:PRIMARY> rs.status()
{
    "set" : "myMongo",
    "date" : ISODate("2014-01-14T09:03:46Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 1,
            "name" : "localhost:10001",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 1166,
            "optime" : Timestamp(1389690019, 1),
            "optimeDate" : ISODate("2014-01-14T09:00:19Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "name" : "localhost:10002",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 203,
            "optime" : Timestamp(1389690019, 1),
            "optimeDate" : ISODate("2014-01-14T09:00:19Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:03:45Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:03:46Z"),
            "pingMs" : 0,
            "syncingTo" : "localhost:10001"
        },
        {
            "_id" : 3,
            "name" : "localhost:10003",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 203,
            "optime" : Timestamp(1389690019, 1),
            "optimeDate" : ISODate("2014-01-14T09:00:19Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:03:45Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:03:45Z"),
            "pingMs" : 0,
            "syncingTo" : "localhost:10001"
        }
    ],
    "ok" : 1
}
```
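The fields worth watching in that document are name, health, and stateStr for each member. As an illustrative sketch (not MongoDB code), a few lines of Python can condense an rs.status()-style document into one line per member; here it is fed a hand-trimmed subset of the output above:

```python
def summarize(status: dict) -> list[str]:
    """One 'name: stateStr (healthy|down)' line per replica set member."""
    return [
        f"{m['name']}: {m['stateStr']} ({'healthy' if m.get('health') == 1 else 'down'})"
        for m in status.get("members", [])
    ]

# Trimmed from the rs.status() output shown above.
status = {
    "set": "myMongo",
    "members": [
        {"name": "localhost:10001", "health": 1, "stateStr": "PRIMARY"},
        {"name": "localhost:10002", "health": 1, "stateStr": "SECONDARY"},
        {"name": "localhost:10003", "health": 1, "stateStr": "SECONDARY"},
    ],
}
for line in summarize(status):
    print(line)
```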
Importing data
Now let's insert some data into the database and see whether it is automatically replicated to every node. To get a feel for MongoDB's write throughput, I use the Enron email dataset from *Mining the Social Web*, available at: https://github.com/ptwobrussell/Mining-the-Social-Web-2nd-Edition/tree/master/ipynb/resources/ch06-mailboxes/data
Open a new terminal and run the import:
```
$ bin/mongoimport --host localhost --port 10001 --db enron --collection mails --file enron.mbox.json
Tue Jan 14 17:15:40.009     Progress: 35267686/151226171    23%
Tue Jan 14 17:15:40.009         9600    3200/second
Tue Jan 14 17:15:43.014     Progress: 86409448/151226171    57%
Tue Jan 14 17:15:43.014         23500   3916/second
Tue Jan 14 17:15:46.014     Progress: 143605879/151226171   94%
Tue Jan 14 17:15:46.014         38900   4322/second
Tue Jan 14 17:15:46.449 check 9 41299
Tue Jan 14 17:15:46.449 imported 41299 objects
```
The JSON file holds over 40,000 objects and is about 151 MB; the import finished in roughly six seconds.
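Each progress line in the mongoimport output reports bytes processed over total bytes, and the percentage is just their ratio. A quick Python check of the first line from the log above (a sketch for reading the log, not part of the mongoimport tool):

```python
import re

def progress_percent(line: str) -> int:
    """Extract the done/total byte counts from a mongoimport progress line
    and recompute the integer percentage it reports."""
    m = re.search(r"Progress:\s*(\d+)\s*/\s*(\d+)", line)
    done, total = int(m.group(1)), int(m.group(2))
    return done * 100 // total

line = "Tue Jan 14 17:15:40.009  Progress: 35267686/151226171  23%"
print(progress_percent(line))  # 23, matching the percentage the tool printed
```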
Open another client on node2 to check whether the data inserted on node1 has been replicated:
```
$ bin/mongo localhost:10002
myMongo:SECONDARY> show dbs
enron   0.453125GB
local   0.328125GB
```
The enron database is there, the same size as on node1. Earlier I mentioned that a replica set lets MongoDB split reads from writes, so let's try reading here.
```
myMongo:SECONDARY> use enron
switched to db enron
myMongo:SECONDARY> show collections
Tue Jan 14 17:26:43.721 error: { "$err" : "not master and slaveOk=false", "code" : 13435 } at src/mongo/shell/query.js:128
```
The read fails with slaveOk=false. By default MongoDB does not allow reads on a SECONDARY node; you have to enable them manually or in the configuration.
```
myMongo:SECONDARY> rs.slaveOk();
myMongo:SECONDARY> show collections
mails
system.indexes
myMongo:SECONDARY> db.mails.findOne()
{
    "_id" : ObjectId("52d50039242bb597037cb808"),
    ……
```
After running rs.slaveOk(), reads work on the secondary.
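The gate the server applies can be pictured as a two-flag check: a read passes if the node is primary, or if slaveOk has been set on the connection; otherwise you get error 13435. A simulation in Python (the error text is copied from the shell output above; the function itself is illustrative, not MongoDB code):

```python
def check_read(is_primary: bool, slave_ok: bool) -> str:
    """Mimic the server-side read gate seen above: secondaries refuse
    reads unless slaveOk has been set on the connection."""
    if is_primary or slave_ok:
        return "ok"
    raise RuntimeError('{ "$err" : "not master and slaveOk=false", "code" : 13435 }')

print(check_read(is_primary=False, slave_ok=True))  # passes after rs.slaveOk()
```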
Replica set high-availability experiment
Now for the main part of this article.
The plan:
1. Deliberately kill the Primary node and see whether the MongoDB cluster keeps running.
2. Repair the failed node, restart it, and return it to the cluster.
In a terminal, list the mongod processes, find the primary node's process, and kill it.
```
$ ps -A | grep mongod
  410 ??         0:30.61 bin/mongod --dbpath node1 --port 10001 --replSet myMongo --nojournal --fork --logpath node1.log
  424 ??         0:28.20 bin/mongod --dbpath node2 --port 10002 --replSet myMongo --nojournal --fork --logpath node2.log
  427 ??         0:28.40 bin/mongod --dbpath node3 --port 10003 --replSet myMongo --nojournal --fork --logpath node3.log
 1197 ttys002    0:00.00 grep mongod
```
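Rather than eyeballing the ps listing, the pid of a given node can be picked out programmatically. A sketch fed the listing above (the parsing convention is mine, not part of any MongoDB tooling):

```python
def pid_of(ps_output: str, needle: str) -> int:
    """Return the pid of the first ps line containing `needle`,
    skipping the grep process itself."""
    for line in ps_output.splitlines():
        if needle in line and "grep" not in line:
            return int(line.split()[0])
    raise LookupError(needle)

# Hand-copied from the ps listing above.
ps_output = """\
  410 ??  0:30.61 bin/mongod --dbpath node1 --port 10001 --replSet myMongo --nojournal --fork --logpath node1.log
  424 ??  0:28.20 bin/mongod --dbpath node2 --port 10002 --replSet myMongo --nojournal --fork --logpath node2.log
  427 ??  0:28.40 bin/mongod --dbpath node3 --port 10003 --replSet myMongo --nojournal --fork --logpath node3.log
 1197 ttys002  0:00.00 grep mongod"""
print(pid_of(ps_output, "node1"))  # 410
```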
Kill process 410, the primary (node1):

```
$ kill -9 410
```
In the shell that was connected to the old Primary, run a command:
```
myMongo:PRIMARY> rs.status()
Tue Jan 14 17:36:43.572 DBClientCursor::init call() failed
Tue Jan 14 17:36:43.572 Error: error doing query: failed at src/mongo/shell/query.js:78
Tue Jan 14 17:36:43.573 trying reconnect to localhost:10001
Tue Jan 14 17:36:43.573 reconnect localhost:10001 failed couldn't connect to server localhost:10001
```
We can no longer connect to localhost:10001. So let's run rs.status() on a Secondary:
```
myMongo:SECONDARY> rs.status()
{
    "set" : "myMongo",
    "date" : ISODate("2014-01-14T09:38:03Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 1,
            "name" : "localhost:10001",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:38:02Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:36:28Z"),
            "pingMs" : 0
        },
        {
            "_id" : 2,
            "name" : "localhost:10002",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 3161,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "self" : true
        },
        {
            "_id" : 3,
            "name" : "localhost:10003",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2249,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:38:02Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:38:01Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "syncing to: localhost:10002",
            "syncingTo" : "localhost:10002"
        }
    ],
    "ok" : 1
}
```
The member localhost:10001 now shows (not reachable/healthy): the old Primary is indeed down. Notice, though, that we are operating on a Secondary; does that mean the replica set no longer has a Primary at all? Since our environment has three nodes, let's connect to node3 and check:
```
$ bin/mongo localhost:10003
MongoDB shell version: 2.4.9
connecting to: localhost:10003/test
Server has startup warnings:
myMongo:SECONDARY> quit()
```
node3 is still a SECONDARY, so where did the PRIMARY go? Reconnect to node2:
```
$ bin/mongo localhost:10002
MongoDB shell version: 2.4.9
connecting to: localhost:10002/test
myMongo:PRIMARY>
```
Ha, node2 has become the replica set's PRIMARY!
In fact, in a MongoDB replica set with an odd number of nodes, when the Primary fails the Secondaries automatically elect one of their own to take its place. With an even number of nodes, an additional member is needed to act as arbiter. See the official replication documentation for the details.
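The election rule is majority-based: a primary can be elected only while a strict majority of voting members is reachable. A quick Python illustration of why our three-node set survives losing one node while a two-node set cannot (an arbiter contributes a vote without holding data):

```python
def majority(voting_members: int) -> int:
    """Minimum number of reachable votes needed to elect a primary."""
    return voting_members // 2 + 1

def can_elect(voting_members: int, reachable: int) -> bool:
    """Can the set still elect a primary with this many members reachable?"""
    return reachable >= majority(voting_members)

print(can_elect(3, 2))      # True: our 3-node set elects node2 after node1 dies
print(can_elect(2, 1))      # False: a 2-node set is stuck after losing one
print(can_elect(2 + 1, 2))  # True: 2 data nodes + 1 arbiter restores a majority
```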
Next, let's try to restore node1.
Recovering the failed node
First, delete the lock file mongod.lock from node1's data directory.
Then run the repair command:
```
$ bin/mongod --dbpath node1 --port 10001 --replSet myMongo --nojournal --fork --logpath node1.log --repair
about to fork child process, waiting until server is ready for connections.
forked process: 1236
all output going to: /Users/yuanlinwu/mongodb/node1.log
log file [/Users/yuanlinwu/mongodb/node1.log] exists; copied to temporary file [/Users/yuanlinwu/mongodb/node1.log.2014-01-14T09-47-45]
child process started successfully, parent exiting
```
Start the failed node again:
```
$ bin/mongod --dbpath node1 --port 10001 --replSet myMongo --nojournal --fork --logpath node1.log
about to fork child process, waiting until server is ready for connections.
forked process: 1240
all output going to: /Users/yuanlinwu/mongodb/node1.log
log file [/Users/yuanlinwu/mongodb/node1.log] exists; copied to temporary file [/Users/yuanlinwu/mongodb/node1.log.2014-01-14T09-48-11]
child process started successfully, parent exiting
```
The output shows the failed node1 has been started successfully.
Check the replica set status from a mongo shell:
```
myMongo:PRIMARY> rs.status()
{
    "set" : "myMongo",
    "date" : ISODate("2014-01-14T09:49:07Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 1,
            "name" : "localhost:10001",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 56,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:49:07Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:49:07Z"),
            "pingMs" : 0,
            "lastHeartbeatMessage" : "syncing to: localhost:10002",
            "syncingTo" : "localhost:10002"
        },
        {
            "_id" : 2,
            "name" : "localhost:10002",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 3825,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "self" : true
        },
        {
            "_id" : 3,
            "name" : "localhost:10003",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 2913,
            "optime" : Timestamp(1389690946, 2490),
            "optimeDate" : ISODate("2014-01-14T09:15:46Z"),
            "lastHeartbeat" : ISODate("2014-01-14T09:49:06Z"),
            "lastHeartbeatRecv" : ISODate("2014-01-14T09:49:07Z"),
            "pingMs" : 0,
            "syncingTo" : "localhost:10002"
        }
    ],
    "ok" : 1
}
```
The status shows that member localhost:10001 has successfully rejoined the myMongo replica set and is syncing from the Primary.
Shutting down
For a clean shutdown, run the shutdown command in each node's shell:
```
> use admin
> db.shutdownServer()
```
Wrap-up
From the walkthrough above, MongoDB really is easy and convenient when it comes to cluster management and high availability. Its other features will have to wait until I have time to dig further.