Xiaowanzi Learns MongoDB Series: Replica Set Auto-Failover

ckml77559


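MongoDB replica sets provide high availability through automatic failover: when the primary becomes unavailable, a secondary is automatically elected as the new primary and continues to serve clients. Below, automatic failover is demonstrated under two replica set architectures.
I. Auto-Failover (one primary, two secondaries)
1. Check the replica set status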
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
rstl:PRIMARY> rs.status()
{
"set" : "rstl",
"date" : ISODate("2015-12-07T07:47:14.958Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.157.128:47017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 258664,
"optime" : Timestamp(1449459199, 2),
"optimeDate" : ISODate("2015-12-07T03:33:19Z"),
"electionTime" : Timestamp(1449458787, 1),
"electionDate" : ISODate("2015-12-07T03:26:27Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.157.128:47027",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 15651,
"optime" : Timestamp(1449459199, 2),
"optimeDate" : ISODate("2015-12-07T03:33:19Z"),
"lastHeartbeat" : ISODate("2015-12-07T07:47:14.804Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T07:47:14.212Z"),
"pingMs" : 0,
"syncingTo" : "192.168.157.128:47017",
"configVersion" : 1
},
{
"_id" : 2,
"name" : "192.168.157.128:47037",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 15651,
"optime" : Timestamp(1449459199, 2),
"optimeDate" : ISODate("2015-12-07T03:33:19Z"),
"lastHeartbeat" : ISODate("2015-12-07T07:47:14.696Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T07:47:14.509Z"),
"pingMs" : 0,
"syncingTo" : "192.168.157.128:47017",
"configVersion" : 1
}
],
"ok" : 1
}
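
For failover, the fields worth watching in this output are each member's name, health and stateStr. Below is a minimal sketch of extracting them, assuming the rs.status() result is available as a plain JavaScript object. The status literal is abridged from the output above, and summarizeMembers and findPrimary are hypothetical helpers, not part of MongoDB.

```javascript
// Abridged copy of the rs.status() output above (only the fields used here).
const status = {
  set: "rstl",
  members: [
    { _id: 0, name: "192.168.157.128:47017", health: 1, stateStr: "PRIMARY" },
    { _id: 1, name: "192.168.157.128:47027", health: 1, stateStr: "SECONDARY" },
    { _id: 2, name: "192.168.157.128:47037", health: 1, stateStr: "SECONDARY" },
  ],
};

// Hypothetical helper: one summary line per member.
function summarizeMembers(st) {
  return st.members.map((m) => `${m.name}  ${m.stateStr}  health=${m.health}`);
}

// Hypothetical helper: name of the current primary, or null if there is none.
function findPrimary(st) {
  const primary = st.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

console.log(summarizeMembers(status).join("\n"));
console.log("primary:", findPrimary(status));
```

The syntax is modern JavaScript as run by Node.js. In a shell as old as the 3.0.3 one used here, the arrow functions and template strings would probably need to be rewritten before the helpers could be fed rs.status() directly.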

2. Simulate a failure of the primary member
[mgousr01@vm1 ~]$ netstat -ntpl|grep 47017
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.157.128:47017 0.0.0.0:* LISTEN 26892/mongod

[mgousr01@vm1 ~]$ kill -9 26892
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
2015-12-04T14:44:59.102+0800 W NETWORK Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:59.103+0800 E QUERY Error: couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
at connect (src/mongo/shell/mongo.js:181:14)
at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> rs.status();
{
"set" : "rstl",
"date" : ISODate("2015-12-04T06:45:46.896Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.157.128:47017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2015-12-04T06:45:46.244Z"),
"lastHeartbeatRecv" : ISODate("2015-12-04T06:44:17.958Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 1,
"name" : "192.168.157.128:47027",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 653,
"optime" : Timestamp(1449210941, 1),
"optimeDate" : ISODate("2015-12-04T06:35:41Z"),
"electionTime" : Timestamp(1449211460, 1),
"electionDate" : ISODate("2015-12-04T06:44:20Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 2,
"name" : "192.168.157.128:47037",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 603,
"optime" : Timestamp(1449210941, 1),
"optimeDate" : ISODate("2015-12-04T06:35:41Z"),
"lastHeartbeat" : ISODate("2015-12-04T06:45:46.031Z"),
"lastHeartbeatRecv" : ISODate("2015-12-04T06:45:46.031Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}
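
The failover can be read off by comparing the two rs.status() outputs: the old primary is now reported with health 0, and a different member reports PRIMARY. A small sketch of that comparison, assuming both outputs have been reduced to arrays of members; the data is abridged from the outputs above, and describeFailover is a hypothetical helper.

```javascript
// Abridged member lists from rs.status() before and after kill -9.
const before = [
  { name: "192.168.157.128:47017", health: 1, stateStr: "PRIMARY" },
  { name: "192.168.157.128:47027", health: 1, stateStr: "SECONDARY" },
  { name: "192.168.157.128:47037", health: 1, stateStr: "SECONDARY" },
];
const after = [
  { name: "192.168.157.128:47017", health: 0, stateStr: "(not reachable/healthy)" },
  { name: "192.168.157.128:47027", health: 1, stateStr: "PRIMARY" },
  { name: "192.168.157.128:47037", health: 1, stateStr: "SECONDARY" },
];

// Name of the member reporting PRIMARY, or null if none does.
function primaryOf(members) {
  const p = members.find((m) => m.stateStr === "PRIMARY");
  return p ? p.name : null;
}

// Hypothetical helper: summarize what changed between two snapshots.
function describeFailover(prev, curr) {
  const oldPrimary = primaryOf(prev);
  const newPrimary = primaryOf(curr);
  return {
    failedOver: newPrimary !== null && newPrimary !== oldPrimary,
    oldPrimary,
    newPrimary,
    unreachable: curr.filter((m) => m.health === 0).map((m) => m.name),
  };
}

console.log(describeFailover(before, after));
```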

3. Observe Auto-Failover in the one-primary, two-secondary architecture
Log of member mg27:
2015-12-04T14:44:19.979+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47027", fromId: 1, checkEmpty: false }
2015-12-04T14:44:19.985+0800 I REPL [ReplicationExecutor] Standing for election
2015-12-04T14:44:19.985+0800 I REPL [ReplicationExecutor] replSet possible election tie; sleeping 665ms until 2015-12-04T14:44:20.650+0800
2015-12-04T14:44:19.986+0800 W NETWORK [ReplExecNetThread-3] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:19.987+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.987+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:20.651+0800 I REPL [ReplicationExecutor] Standing for election
2015-12-04T14:44:20.651+0800 I REPL [ReplicationExecutor] replSet info electSelf
2015-12-04T14:44:20.653+0800 I REPL [ReplicationExecutor] replSet election succeeded, assuming primary role
2015-12-04T14:44:20.653+0800 I REPL [ReplicationExecutor] transition to PRIMARY
2015-12-04T14:44:21.094+0800 I REPL [rsSync] transition to primary complete; database writes are now permitted

Log of member mg37:
2015-12-04T14:44:19.980+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47037", fromId: 2, checkEmpty: false }
2015-12-04T14:44:19.985+0800 I REPL [ReplicationExecutor] Standing for election
2015-12-04T14:44:19.985+0800 W NETWORK [ReplExecNetThread-1] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:19.986+0800 I REPL [ReplicationExecutor] replSet possible election tie; sleeping 666ms until 2015-12-04T14:44:20.652+0800
2015-12-04T14:44:19.986+0800 W NETWORK [ReplExecNetThread-1] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-04T14:44:19.986+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-04T14:44:20.652+0800 I REPL [ReplicationExecutor] replSetElect voting yea for 192.168.157.128:47027 (1)
2015-12-04T14:44:21.963+0800 I REPL [ReplicationExecutor] Member 192.168.157.128:47027 is now in state PRIMARY
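Note: The logs above show that the heartbeat requests from both secondaries to the primary failed, so both stood for election. Because either candidate could become primary, a tie was possible: each member logged "possible election tie" and slept (665 ms and 666 ms) before trying again. mg27 woke first and stood again, mg37 voted yea for it, and mg27 became the new primary.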
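
The arithmetic behind the election can be sketched as follows. A candidate needs a strict majority of the voting members, so with three members two votes are enough. This is only an illustration of the vote count, assuming each of the three members has one vote; it is not MongoDB's election code, and winsElection is a hypothetical name.

```javascript
// Votes needed to win: a strict majority of the voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// A candidate wins only if its yea votes (its own included) reach a majority.
function winsElection(votingMembers, yeaVotes) {
  return yeaVotes >= majority(votingMembers);
}

const votingMembers = 3; // ports 47017, 47027 and 47037

// mg27 votes for itself and mg37 votes yea for it: 2 of 3 votes.
console.log("mg27 with 2 votes wins:", winsElection(votingMembers, 2));

// Illustrative tie: if each secondary held only its own vote,
// neither would reach the majority and nobody would be elected.
console.log("candidate with 1 vote wins:", winsElection(votingMembers, 1));
```

Staggering the retries, as the sleep lines in the logs do, gives one candidate the chance to collect the second vote before the other stands again.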

4. Verify that data replication works in the new replica set
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> use soho
switched to db soho
rstl:PRIMARY> db.food.find()
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
rstl:PRIMARY> db.food.insert({name:"cake",price:100});
WriteResult({ "nInserted" : 1 })
rstl:PRIMARY> db.food.find()
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
{ "_id" : ObjectId("56653f83e9fccdbf504e8548"), "name" : "cake", "price" : 100 }

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47037
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47037/test
rstl:SECONDARY> rs.slaveOk()
rstl:SECONDARY> show dbs;
local 0.203GB
soho 0.078GB
rstl:SECONDARY> use soho
switched to db soho
rstl:SECONDARY> show tables;
food
system.indexes
rstl:SECONDARY> db.food.find();
{ "_id" : ObjectId("5664fdfe1830846a3331ce02"), "name" : "egg", "price" : 38 }
{ "_id" : ObjectId("56653f83e9fccdbf504e8548"), "name" : "cake", "price" : 100 }
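Note: The output above shows that the document inserted on the new primary (47027) was replicated to the secondary (47037), so the new replica set is working normally.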
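
The same check can be expressed as a comparison of the two result sets. A minimal sketch, assuming the find() results from both members have been copied into arrays with the ObjectId values written as plain strings; missingOnSecondary is a hypothetical helper.

```javascript
// Documents returned by db.food.find() on the primary (47027).
const primaryDocs = [
  { _id: "5664fdfe1830846a3331ce02", name: "egg", price: 38 },
  { _id: "56653f83e9fccdbf504e8548", name: "cake", price: 100 },
];

// Documents returned by db.food.find() on the secondary (47037).
const secondaryDocs = [
  { _id: "5664fdfe1830846a3331ce02", name: "egg", price: 38 },
  { _id: "56653f83e9fccdbf504e8548", name: "cake", price: 100 },
];

// Hypothetical helper: _id values present on the primary but not on the secondary.
function missingOnSecondary(primary, secondary) {
  const seen = new Set(secondary.map((d) => d._id));
  return primary.filter((d) => !seen.has(d._id)).map((d) => d._id);
}

const missing = missingOnSecondary(primaryDocs, secondaryDocs);
console.log(missing.length === 0 ? "secondary is in sync" : `missing: ${missing.join(", ")}`);
```

Comparing _id values only shows that each document arrived; it does not compare field values.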

II. Auto-Failover (one primary, one secondary, one arbiter)
1. Initialize the replica set and check its status
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
> cfg=
{
"_id" : "rstl",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "192.168.157.128:47017"
},
{
"_id" : 1,
"host" : "192.168.157.128:47027"
},
{
"_id" : 2,
"host" : "192.168.157.128:47037",
"arbiterOnly":true
}
]
}

> rs.initiate(cfg)
{ "ok" : 1 }
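
Before passing a configuration like this to rs.initiate(), it can help to count how many members hold data and how many are arbiters. A minimal sketch over the same cfg document; describeConfig is a hypothetical helper, and the code only inspects the object without contacting a server.

```javascript
// The configuration document from above.
const cfg = {
  _id: "rstl",
  version: 1,
  members: [
    { _id: 0, host: "192.168.157.128:47017" },
    { _id: 1, host: "192.168.157.128:47027" },
    { _id: 2, host: "192.168.157.128:47037", arbiterOnly: true },
  ],
};

// Hypothetical helper: count members by role.
function describeConfig(config) {
  const total = config.members.length;
  const arbiters = config.members.filter((m) => m.arbiterOnly === true).length;
  return { total, arbiters, dataBearing: total - arbiters };
}

console.log(describeConfig(cfg));
```

Here the result shows three members in total, of which only two hold a copy of the data.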

rstl:OTHER> rs.status()
{
"set" : "rstl",
"date" : ISODate("2015-12-07T08:30:04.224Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.157.128:47017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 657,
"optime" : Timestamp(1449476903, 1),
"optimeDate" : ISODate("2015-12-07T08:28:23Z"),
"electionTime" : Timestamp(1449476907, 1),
"electionDate" : ISODate("2015-12-07T08:28:27Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 1,
"name" : "192.168.157.128:47027",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 100,
"optime" : Timestamp(1449476903, 1),
"optimeDate" : ISODate("2015-12-07T08:28:23Z"),
"lastHeartbeat" : ISODate("2015-12-07T08:30:03.484Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T08:30:03.518Z"),
"pingMs" : 0,
"configVersion" : 1
},
{
"_id" : 2,
"name" : "192.168.157.128:47037",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 100,
"lastHeartbeat" : ISODate("2015-12-07T08:30:03.523Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T08:30:03.523Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}

2. Simulate a failure of the primary member
[mgousr01@vm1 ~]$ netstat -ntpl|grep 47017
(Not all processes could be identified, non-owned process info
will not be shown, you would have to be root to see it all.)
tcp 0 0 192.168.157.128:47017 0.0.0.0:* LISTEN 36268/mongod

[mgousr01@vm1 ~]$ kill -9 36268
[mgousr01@vm1 ~]$ mongo 192.168.157.128:47017
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47017/test
2015-12-07T16:42:46.486+0800 W NETWORK Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:46.488+0800 E QUERY Error: couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
at connect (src/mongo/shell/mongo.js:181:14)
at (connect):1:6 at src/mongo/shell/mongo.js:181
exception: connect failed

[mgousr01@vm1 ~]$ mongo 192.168.157.128:47027
MongoDB shell version: 3.0.3
connecting to: 192.168.157.128:47027/test
rstl:PRIMARY> rs.status()
{
"set" : "rstl",
"date" : ISODate("2015-12-07T08:43:02.047Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "192.168.157.128:47017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"lastHeartbeat" : ISODate("2015-12-07T08:43:00.318Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T08:42:06.201Z"),
"pingMs" : 0,
"lastHeartbeatMessage" : "Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed",
"configVersion" : -1
},
{
"_id" : 1,
"name" : "192.168.157.128:47027",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1336,
"optime" : Timestamp(1449476903, 1),
"optimeDate" : ISODate("2015-12-07T08:28:23Z"),
"electionTime" : Timestamp(1449477728, 1),
"electionDate" : ISODate("2015-12-07T08:42:08Z"),
"configVersion" : 1,
"self" : true
},
{
"_id" : 2,
"name" : "192.168.157.128:47037",
"health" : 1,
"state" : 7,
"stateStr" : "ARBITER",
"uptime" : 878,
"lastHeartbeat" : ISODate("2015-12-07T08:43:00.267Z"),
"lastHeartbeatRecv" : ISODate("2015-12-07T08:43:00.264Z"),
"pingMs" : 0,
"configVersion" : 1
}
],
"ok" : 1
}

3. Observe Auto-Failover in the one-primary, one-secondary, one-arbiter architecture
Log of member mg27:
2015-12-07T16:42:08.218+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47027", fromId: 1, checkEmpty: false }
2015-12-07T16:42:08.218+0800 I REPL [ReplicationExecutor] Standing for election
2015-12-07T16:42:08.219+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] replSet info electSelf
2015-12-07T16:42:08.220+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] replSet election succeeded, assuming primary role
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] transition to PRIMARY
2015-12-07T16:42:08.525+0800 I REPL [rsSync] transition to primary complete; database writes are now permitted

Log of member mg37:
2015-12-07T16:42:08.217+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location10276 DBClientBase::findN: transport error: 192.168.157.128:47017 ns: admin.$cmd query: { replSetHeartbeat: "rstl", pv: 1, v: 1, from: "192.168.157.128:47037", fromId: 2, checkEmpty: false }
2015-12-07T16:42:08.218+0800 W NETWORK [ReplExecNetThread-4] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.218+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.219+0800 W NETWORK [ReplExecNetThread-4] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:08.219+0800 I NETWORK [initandlisten] connection accepted from 192.168.157.128:39452 #59 (2 connections now open)
2015-12-07T16:42:08.219+0800 I REPL [ReplicationExecutor] Error in heartbeat request to 192.168.157.128:47017; Location18915 Failed attempt to connect to 192.168.157.128:47017; couldn't connect to server 192.168.157.128:47017 (192.168.157.128), connection attempt failed
2015-12-07T16:42:08.220+0800 I REPL [ReplicationExecutor] replSetElect voting yea for 192.168.157.128:47027 (1)
2015-12-07T16:42:10.220+0800 W NETWORK [ReplExecNetThread-0] Failed to connect to 192.168.157.128:47017, reason: errno:111 Connection refused
2015-12-07T16:42:10.220+0800 I REPL [ReplicationExecutor] Member 192.168.157.128:47027 is now in state PRIMARY
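Note: The logs above show that the heartbeat requests from both remaining members (the secondary mg27 and the arbiter mg37) to the primary failed, so an election began. Unlike in the one-primary, two-secondary architecture, mg27's log contains no "possible election tie" message: an arbiter cannot become primary, so mg27 was the only candidate. The arbiter voted yea, mg27 became the primary, and at that point it was permitted to accept database writes.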
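
The two mg27 logs in this article also allow a rough comparison of how long each failover took, measured from the first heartbeat error to the "database writes are now permitted" line. A small sketch using the timestamps copied from those logs; parseLogTime and elapsedMs are hypothetical helpers, and the figures cover only the logged interval, not the time since the kill.

```javascript
// Log timestamps end in "+0800"; insert the colon so Date.parse sees ISO 8601.
function parseLogTime(ts) {
  return Date.parse(ts.replace(/([+-]\d{2})(\d{2})$/, "$1:$2"));
}

// Milliseconds between two log timestamps.
function elapsedMs(start, end) {
  return parseLogTime(end) - parseLogTime(start);
}

// mg27, one primary and two secondaries.
const twoSecondaries = elapsedMs(
  "2015-12-04T14:44:19.979+0800",
  "2015-12-04T14:44:21.094+0800"
);

// mg27, one primary, one secondary and one arbiter.
const withArbiter = elapsedMs(
  "2015-12-07T16:42:08.218+0800",
  "2015-12-07T16:42:08.525+0800"
);

console.log(`two secondaries: ${twoSecondaries} ms`); // 1115 ms
console.log(`with arbiter:    ${withArbiter} ms`); // 307 ms
```

Most of the difference is the 665 ms sleep that mg27 took to avoid a tie in the first architecture. These are single runs on one machine, so the numbers are indicative only.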

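Summary: For a three-member replica set, the official recommendation is one primary with two secondaries. An arbiter is generally suited to a replica set that would otherwise have an even number of members: adding it makes the number of voting members odd, which avoids tied elections.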
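
The trade-off can be worked out numerically. The sketch below assumes every member has exactly one vote and that arbiters hold no data; describeTopology is a hypothetical helper for the arithmetic, not a MongoDB function.

```javascript
// Hypothetical helper: election and redundancy figures for a topology.
function describeTopology(dataBearing, arbiters) {
  const voting = dataBearing + arbiters;
  const majority = Math.floor(voting / 2) + 1;
  return {
    voting,
    majority,
    // Members that can be lost while a primary can still be elected.
    toleratedFailures: voting - majority,
    // Copies of the data left after losing one data-bearing member.
    dataCopiesAfterOneLoss: dataBearing - 1,
  };
}

console.log("2 data members, no arbiter:", describeTopology(2, 0));
console.log("2 data members + arbiter:  ", describeTopology(2, 1));
console.log("3 data members, no arbiter:", describeTopology(3, 0));
```

With two members and no arbiter, losing either one leaves no majority. Adding an arbiter restores the ability to elect a primary, but only the three-data-member layout still has two copies of the data after a failure.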

Source: ITPUB Blog, http://blog.itpub.net/20801486/viewspace-1867663/. When reposting, please credit the source; otherwise legal liability will be pursued.