## Architecture

We do not have many machines, and we want something small and stable, so we build the replica set from three machines: one primary, one secondary, and one arbiter.
| Role | Machine |
|---|---|
| primary | 192.168.203.128 |
| secondary | 192.168.203.129 |
| arbiter | 192.168.203.130 |
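Once the set is running, applications can address it as a whole with a replica set connection string instead of a single host; a sketch, using the `wiki` user that is created later in this guide:

```
mongodb://wiki:111111@192.168.203.128:27017,192.168.203.129:27017/wiki?replicaSet=rs0
```

With this form the driver discovers the current primary on its own, so clients do not hard-code a single member.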
## Network environment

### hosts configuration

Replica set members can connect to each other either by ip:port or by domain:port. domain:port is recommended: when the replica set is created, the first machine becomes the primary by default and is identified by its hostname, so /etc/hosts must be edited on every machine so that each member's hostname resolves correctly.
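For example, /etc/hosts on each machine could contain entries like the following. The first hostname is the one that appears in this setup's rs.status() output; the other two are illustrative placeholders, not from the original environment:

```
192.168.203.128  VM_191_137_centos
192.168.203.129  mongo-rs0-2   # illustrative hostname
192.168.203.130  mongo-rs0-3   # illustrative hostname
```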
### Firewall

The firewall on each machine must allow the other members to access port 27017.

If the machines are cloud servers whose public IP is bound not to the machine's own NIC but to an external virtual router, the cloud provider's firewall must also allow access from that public IP.
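On CentOS with firewalld, for instance, opening the port could look like this (a sketch; adapt to whatever firewall the machines actually run, and restrict the source addresses to the other members where possible):

```shell
firewall-cmd --permanent --add-port=27017/tcp
firewall-cmd --reload
```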
## Setup steps

1. Create the data and log directories

```shell
mkdir -p /data/mongodb/{logs,data}
```
2. Edit /etc/mongodb.conf

```yaml
storage:
  dbPath: /data/mongodb/data
  journal:
    enabled: true
systemLog:
  path: /data/mongodb/logs/mongodb.log
  logAppend: true
  destination: file
net:
  port: 27017
  bindIp: 0.0.0.0
  maxIncomingConnections: 50000
processManagement:
  fork: true
security:
  authorization: disabled
```

Leave authorization disabled for now; we will enable it after the users and permissions are set up.
3. Start mongod on 192.168.203.128

```shell
/etc/init.d/mongodb start
```
4. Create users

Create an admin user and a root user. The root user must have the root role, which is required to run the replica set commands:

```
> use admin
switched to db admin
> db.createUser({user: "admin", pwd: "1111", roles: [ { role: "userAdminAnyDatabase", db: "admin" }, "readWriteAnyDatabase" ]})
> db.createUser({user: "root", pwd: "2222", roles: [ { role: "root", db: "admin" } ]})
> exit
```
5. Create the keyfile for internal authentication

On 192.168.203.128, create the keyfile used for authentication. Its file mode must be set to 600:

```shell
openssl rand -base64 102 > /data/mongodb/key
chmod 600 /data/mongodb/key
```

Copy the keyfile to the other two machines. Once authentication is enabled, replica set members authenticate connections between each other with this keyfile.
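As a sanity check, you can verify that the keyfile meets mongod's requirements: the content must be 6 to 1024 base64 characters, and the file mode must be 600. A minimal sketch, generating a throwaway keyfile in a temporary path:

```shell
# Generate a keyfile the same way as above, into a temporary path for checking.
keyfile=$(mktemp)
openssl rand -base64 102 > "$keyfile"
chmod 600 "$keyfile"

# mongod requires 6-1024 base64 characters and file mode 600.
chars=$(tr -d '\n' < "$keyfile" | wc -c)
mode=$(stat -c %a "$keyfile")
echo "keyfile: $chars characters, mode $mode"
```

102 random bytes encode to exactly 136 base64 characters, comfortably inside the allowed range.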
6. Edit the configuration to enable authentication

Update the configuration on every machine:

```yaml
storage:
  dbPath: /data/mongodb/data
  journal:
    enabled: true
systemLog:
  path: /data/mongodb/logs/mongodb.log
  logAppend: true
  destination: file
net:
  port: 27017
  bindIp: 0.0.0.0
  maxIncomingConnections: 50000
processManagement:
  fork: true
security:
  keyFile: /data/mongodb/key
  #authorization: disabled
replication:
  replSetName: rs0
  oplogSizeMB: 128
```

replSetName is the name of the replica set and must be identical on every machine. When keyFile is set, authorization is enabled implicitly, so the authorization field does not need to be configured.
7. Start mongod on every machine

```shell
/etc/init.d/mongodb start
```
8. Log in as the root user

```
> use admin
> db.auth('root','2222')
```
9. Initiate the replica set

```
> rs.initiate()
rs0:PRIMARY> rs.add('192.168.203.129:27017')
rs0:PRIMARY> rs.addArb("192.168.203.130:27017")
```

rs.initiate() generates the replica set configuration from defaults; you can also pass a configuration document as an argument, and you can inspect the resulting configuration with rs.conf().

By default the first machine becomes the primary. rs.add() adds a node to the set, which joins as a secondary; rs.addArb() adds an arbiter.

Check the result with rs.status(): if a member's stateStr is not PRIMARY/SECONDARY/ARBITER, look at the logs to find the cause, which is usually a network problem.
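If you prefer to define all members up front instead of adding them one by one, rs.initiate() accepts a full configuration document; a sketch using this setup's addresses:

```
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "192.168.203.128:27017" },
    { _id: 1, host: "192.168.203.129:27017" },
    { _id: 2, host: "192.168.203.130:27017", arbiterOnly: true }
  ]
})
```

The `_id` field must match the replSetName from the configuration file.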
10. Check the replica set status

```
rs0:PRIMARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2018-10-24T09:27:54.521Z"),
    "myState" : 1,
    "term" : NumberLong(1),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1540373269, 1),
            "t" : NumberLong(1)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1540373269, 1),
            "t" : NumberLong(1)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1540373269, 1),
            "t" : NumberLong(1)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1540373269, 1),
            "t" : NumberLong(1)
        }
    },
    "lastStableCheckpointTimestamp" : Timestamp(1540373237, 1),
    "members" : [
        {
            "_id" : 0,
            "name" : "VM_191_137_centos:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 349,
            "optime" : {
                "ts" : Timestamp(1540373269, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-10-24T09:27:49Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "could not find member to sync from",
            "electionTime" : Timestamp(1540373175, 2),
            "electionDate" : ISODate("2018-10-24T09:26:15Z"),
            "configVersion" : 3,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 1,
            "name" : "192.168.203.129:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 24,
            "optime" : {
                "ts" : Timestamp(1540373269, 1),
                "t" : NumberLong(1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1540373269, 1),
                "t" : NumberLong(1)
            },
            "optimeDate" : ISODate("2018-10-24T09:27:49Z"),
            "optimeDurableDate" : ISODate("2018-10-24T09:27:49Z"),
            "lastHeartbeat" : ISODate("2018-10-24T09:27:53.806Z"),
            "lastHeartbeatRecv" : ISODate("2018-10-24T09:27:54.451Z"),
            "pingMs" : NumberLong(4),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 3
        },
        {
            "_id" : 2,
            "name" : "192.168.203.130:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 4,
            "lastHeartbeat" : ISODate("2018-10-24T09:27:53.845Z"),
            "lastHeartbeatRecv" : ISODate("2018-10-24T09:27:54.084Z"),
            "pingMs" : NumberLong(6),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 3
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1540373269, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1540373269, 1),
        "signature" : {
            "hash" : BinData(0,"6FoGRVSEu2S7x3Sye30IfjPEATs="),
            "keyId" : NumberLong("6615852418850619393")
        }
    }
}
```
11. Create users for the application databases

```
rs0:PRIMARY> use wiki
rs0:PRIMARY> db.createUser({user: 'wiki', pwd: "111111", roles: [{role: "readWrite", db: "wiki"}]})
rs0:PRIMARY> use socketProject
rs0:PRIMARY> db.createUser({user: 'dev', pwd: "111111", roles: [{role: "readWrite", db: "socketProject"}]})
```
12. Import the wiki and socketProject data
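One way to import the data is to restore dumps taken earlier with mongodump; a sketch, assuming the dumps live under /backup (the paths are illustrative):

```shell
mongorestore --host 192.168.203.128 --port 27017 \
  -u root -p 2222 --authenticationDatabase admin \
  --db wiki /backup/wiki
mongorestore --host 192.168.203.128 --port 27017 \
  -u root -p 2222 --authenticationDatabase admin \
  --db socketProject /backup/socketProject
```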
13. Test the replica set

In the MongoDB shell on the secondary, running a command prompts for authentication, which confirms that access control is enabled:

```
rs0:SECONDARY> rs.status()
{
    "operationTime" : Timestamp(1540431843, 1),
    "ok" : 0,
    "errmsg" : "command replSetGetStatus requires authentication",
    "code" : 13,
    "codeName" : "Unauthorized",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1540431843, 1),
        "signature" : {
            "hash" : BinData(0,"5gkn6K6b3D6pq9sfq+EnJ38EHCM="),
            "keyId" : NumberLong("6615852418850619393")
        }
    }
}
```
After authenticating as root, commands work:

```
rs0:SECONDARY> use admin
switched to db admin
rs0:SECONDARY> db.auth('root','2222')
1
rs0:SECONDARY> rs.status()
rs0:SECONDARY> rs.slaveOk()
rs0:SECONDARY> show dbs
admin          0.000GB
config         0.000GB
local          0.006GB
socketProject  0.006GB
wiki           0.000GB
rs0:SECONDARY> use wiki
switched to db wiki
rs0:SECONDARY> show collections
bruteforces
entries
sessions
settings
uplfiles
uplfolders
users
```
As you can see, the users created on the primary can authenticate on the secondary as well, and the wiki database imported on the primary has been replicated to it. If you check on the arbiter instead, the data is not there: an arbiter does not replicate data, it only votes in elections.
14. Lower the secondary's priority

Our production environment has no high-availability setup on the application side: even if the primary fails and the replica set elects a new primary, there is no way to switch the application over to the new IP. So we simply set the secondary's priority to 0 so that it can never become primary. The arbiter's machine may also be reclaimed at any time, and this change avoids the election problems that losing the arbiter would otherwise cause.

```
rs0:PRIMARY> cfg = rs.conf()
rs0:PRIMARY> cfg['members'][1]
rs0:PRIMARY> cfg['members'][1].priority = 0
rs0:PRIMARY> rs.reconfig(cfg)
rs0:PRIMARY> rs.conf()
```
## Primary failover test

Next, we test whether the primary role moves to another node when the current primary fails. This test requires at least one secondary whose priority is not 0.
1. Shut down the current primary

```
rs0:PRIMARY> use admin
switched to db admin
rs0:PRIMARY> db.shutdownServer()
server should be down...
2018-10-24T12:04:31.867+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:04:32.367+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
2018-10-24T12:04:32.370+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:04:32.370+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
>
```
2. Check the status on the remaining secondary node

```
rs0:SECONDARY> rs.status()
{
    "set" : "rs0",
    "date" : ISODate("2018-10-24T04:05:41.645Z"),
    "myState" : 1,
    "term" : NumberLong(4),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1540353867, 1),
            "t" : NumberLong(3)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1540353867, 1),
            "t" : NumberLong(3)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1540353936, 1),
            "t" : NumberLong(4)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1540353936, 1),
            "t" : NumberLong(4)
        }
    },
    "lastStableCheckpointTimestamp" : Timestamp(1540353867, 1),
    "members" : [
        {
            "_id" : 0,
            "name" : "VM_191_137_centos:27017",
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDurable" : {
                "ts" : Timestamp(0, 0),
                "t" : NumberLong(-1)
            },
            "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
            "optimeDurableDate" : ISODate("1970-01-01T00:00:00Z"),
            "lastHeartbeat" : ISODate("2018-10-24T04:05:40.794Z"),
            "lastHeartbeatRecv" : ISODate("2018-10-24T04:04:32.787Z"),
            "pingMs" : NumberLong(4),
            "lastHeartbeatMessage" : "Error connecting to VM_191_137_centos:27017 (192.168.203.128:27017) :: caused by :: Connection refused",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : -1
        },
        {
            "_id" : 2,
            "name" : "192.168.203.130:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 576,
            "lastHeartbeat" : ISODate("2018-10-24T04:05:40.632Z"),
            "lastHeartbeatRecv" : ISODate("2018-10-24T04:05:40.925Z"),
            "pingMs" : NumberLong(9),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 5
        },
        {
            "_id" : 3,
            "name" : "192.168.203.129:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 577,
            "optime" : {
                "ts" : Timestamp(1540353936, 1),
                "t" : NumberLong(4)
            },
            "optimeDate" : ISODate("2018-10-24T04:05:36Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "could not find member to sync from",
            "electionTime" : Timestamp(1540353884, 1),
            "electionDate" : ISODate("2018-10-24T04:04:44Z"),
            "configVersion" : 5,
            "self" : true,
            "lastHeartbeatMessage" : ""
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1540353936, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1540353936, 1),
        "signature" : {
            "hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
            "keyId" : NumberLong(0)
        }
    }
}
```
As the output shows, the secondary has been promoted to primary.
## Arbiter test

1. Shut down the arbiter

```
rs0:ARBITER> use admin
switched to db admin
rs0:ARBITER> db.shutdownServer()
server should be down...
2018-10-24T12:24:34.173+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:24:34.174+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
2018-10-24T12:24:34.176+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:24:34.176+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
```
2. Then shut down the primary

```
rs0:PRIMARY> use admin
switched to db admin
rs0:PRIMARY> db.shutdownServer()
server should be down...
2018-10-24T12:25:08.338+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:25:09.313+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
2018-10-24T12:25:09.316+0800 I NETWORK [js] trying reconnect to 127.0.0.1:27017 failed
2018-10-24T12:25:09.316+0800 I NETWORK [js] reconnect 127.0.0.1:27017 failed failed
> ^C
```
3. Check the status on the remaining secondary

```
rs0:SECONDARY> rs.status()
```

You will find that the replica set now has no primary: the lone remaining secondary stays a secondary instead of being elected, because with only one of the three voting members reachable there is no majority, so no election can succeed.
4. Start the arbiter again

```shell
/etc/init.d/mongodb start
```

5. Check the status again

The remaining secondary is now elected primary and the replica set is healthy again: with the arbiter back, two of the three voting members are reachable, which is a majority.