## Cluster planning
IP | Port | Role |
---|---|---|
10.1.200.77 | 27017 | master |
10.1.200.78 | 27017 | slave |
10.1.200.87 | 27017 | slave |
10.1.200.118 | 27017 | slave |
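The workflow of the master/slave cluster is shown in the figure below:
Flow analysis:
1. Writes go to the master; reads go to the slaves.
2. There is no automatic failover between master and slave of the kind Redis offers (for example via Sentinel), where a slave takes over when the master dies.
3. Each slave uses the source setting in its own configuration to find the master it talks to.
4. The master and slaves synchronize through the oplog.$main operation log. The default sync interval is stated as 10 s, although the slave logs below show passes only 1 to 3 seconds apart.
5. Only the master and the slaves communicate; slaves do not communicate with each other.
6. When the master goes down, a slave cannot switch to accepting writes on its own; this has to be handled manually.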
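The flow above can be sketched as a toy in-memory model. This is not MongoDB code: the `Master` and `Slave` classes and their methods are hypothetical names chosen for illustration, and the list `oplog` merely stands in for oplog.$main.

```python
# Toy model of master/slave replication; all names here are illustrative.
class Master:
    def __init__(self):
        self.data = {}
        self.oplog = []  # stands in for oplog.$main

    def insert(self, key, value):
        # Every write is applied locally and recorded in the operation log.
        self.data[key] = value
        self.oplog.append(("insert", key, value))


class Slave:
    def __init__(self, source):
        self.source = source  # the master this slave pulls from
        self.data = {}
        self.applied = 0      # number of oplog entries already replayed

    def sync(self):
        # Replay only the entries that have not been applied yet.
        for op, key, value in self.source.oplog[self.applied:]:
            if op == "insert":
                self.data[key] = value
        self.applied = len(self.source.oplog)

    def insert(self, key, value):
        # A slave never accepts writes, even if the master is gone.
        raise PermissionError("not master")


master = Master()
slaves = [Slave(master) for _ in range(3)]
master.insert("dai", "daijiong")
for slave in slaves:
    slave.sync()
```

Each slave only ever reads from its `source`; there is no slave-to-slave path, which mirrors point 5.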
## Installation and configuration
See the single-instance setup.
## Editing the configuration file on each node
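1. Node 10.1.200.77:
[root@centos4 init.d]# vim /etc/mongod.conf
# Database file location
dbpath=/usr/local/mongodb/db
# Log file location
logpath=/usr/local/mongodb/logs/mongodb.log
# Append to the log file instead of overwriting it
logappend=true
# IP address to bind; binding 127.0.0.1 allows local access only; if unset, all local IPs are bound
bind_ip=10.1.200.77
# Port (default 27017)
port=27017
# Run this node as the master
master=true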
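2. Node 10.1.200.78:
[root@centos4 init.d]# vim /etc/mongod.conf
# Database file location
dbpath=/usr/local/mongodb/db
# Log file location
logpath=/usr/local/mongodb/logs/mongodb.log
# Append to the log file instead of overwriting it
logappend=true
# IP address to bind; binding 127.0.0.1 allows local access only; if unset, all local IPs are bound
bind_ip=10.1.200.78
# Port (default 27017)
port=27017
# Run this node as a slave
slave=true
# Master node to replicate from
source=10.1.200.77:27017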
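3. Node 10.1.200.87:
[root@centos4 init.d]# vim /etc/mongod.conf
# Database file location
dbpath=/usr/local/mongodb/db
# Log file location
logpath=/usr/local/mongodb/logs/mongodb.log
# Append to the log file instead of overwriting it
logappend=true
# IP address to bind; binding 127.0.0.1 allows local access only; if unset, all local IPs are bound
bind_ip=10.1.200.87
# Port (default 27017)
port=27017
# Run this node as a slave
slave=true
# Master node to replicate from
source=10.1.200.77:27017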
4. Node 10.1.200.118:
[root@centos4 init.d]# vim /etc/mongod.conf
# Database file location
dbpath=/usr/local/mongodb/db
# Log file location
logpath=/usr/local/mongodb/logs/mongodb.log
# Append to the log file instead of overwriting it
logappend=true
# IP address to bind; binding 127.0.0.1 allows local access only; if unset, all local IPs are bound
bind_ip=10.1.200.118
# Port (default 27017)
port=27017
# Run this node as a slave
slave=true
# Master node to replicate from
source=10.1.200.77:27017
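Because the four files differ only in bind_ip and in the role lines, they could be generated from the planning table rather than edited by hand. The following is a minimal sketch under that assumption; `NODES` and `render_conf` are hypothetical names, and the script only builds the text without writing to /etc/mongod.conf.

```python
# Planning table from the top of this document: (ip, port, role).
NODES = [
    ("10.1.200.77", 27017, "master"),
    ("10.1.200.78", 27017, "slave"),
    ("10.1.200.87", 27017, "slave"),
    ("10.1.200.118", 27017, "slave"),
]


def render_conf(ip, port, role, master_addr):
    """Build the mongod.conf text for one node (illustrative helper)."""
    lines = [
        "dbpath=/usr/local/mongodb/db",
        "logpath=/usr/local/mongodb/logs/mongodb.log",
        "logappend=true",
        f"bind_ip={ip}",
        f"port={port}",
    ]
    if role == "master":
        lines.append("master=true")
    else:
        # Slaves additionally name the master they replicate from.
        lines.append("slave=true")
        lines.append(f"source={master_addr}")
    return "\n".join(lines) + "\n"


master_ip, master_port, _ = next(n for n in NODES if n[2] == "master")
master_addr = f"{master_ip}:{master_port}"
confs = {ip: render_conf(ip, port, role, master_addr) for ip, port, role in NODES}
print(confs["10.1.200.78"])
```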
## Checking the status of each node
1. Restart the mongodb service on every node, then inspect the mongodb log file:
1) Node 10.1.200.77:
2016-08-11T16:29:33.899+0800 I CONTROL [main] ***** SERVER RESTARTED *****
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] MongoDB starting : pid=4785 port=27017 dbpath=/usr/local/mongodb/db master=1 64-bit host=centos4
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] db version v3.2.8
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] git version: ed70e33130c977bda0024c125b56d159573dbaf0
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] allocator: tcmalloc
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] modules: none
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] build environment:
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] distarch: x86_64
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] target_arch: x86_64
2016-08-11T16:29:33.906+0800 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", master: true, net: { port: 27017 }, storage: { dbPath: "/usr/local/mongodb/db" }, systemLog: { destination: "file", logAppend: true, path: "/usr/local/mongodb/logs/mongodb.log" } }
……
The log shows that this node is the master.
2) Nodes 10.1.200.78, 10.1.200.87, and 10.1.200.118:
2016-08-11T16:30:50.897+0800 I CONTROL [main] ***** SERVER RESTARTED *****
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] MongoDB starting : pid=31548 port=27017 dbpath=/usr/local/mongodb/db slave=1 64-bit host=centos4
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] db version v3.2.8
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] git version: ed70e33130c977bda0024c125b56d159573dbaf0
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] allocator: tcmalloc
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] modules: none
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] build environment:
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] distarch: x86_64
2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] target_arch: x86_64
2016-08-11T16:30:50.904+0800 I CONTROL [initandlisten] options: { config: "/etc/mongod.conf", net: { port: 27017 }, slave: true, source: "10.1.200.77:27017", storage: { dbPath: "/usr/local/mongodb/db" }, systemLog: { destination: "file", logAppend: true, path: "/usr/local/mongodb/logs/mongodb.log" } }
……
The following log entries show the slave replicating from the master:
2016-08-11T16:30:52.257+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:30:52.264+0800 I REPL [replslave] resync: dropping database flightswitch
2016-08-11T16:30:52.264+0800 I REPL [replslave] resync: cloning database flightswitch to get an initial copy
2016-08-11T16:30:52.319+0800 I INDEX [replslave] build index on: flightswitch.flight properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "flightswitch.flight" }
2016-08-11T16:30:52.319+0800 I INDEX [replslave] building index using bulk method
2016-08-11T16:30:52.323+0800 I INDEX [replslave] build index done. scanned 1 total records. 0 secs
2016-08-11T16:30:52.323+0800 I STORAGE [replslave] copying indexes for: { name: "flight", options: { autoIndexId: true } }
2016-08-11T16:30:52.323+0800 I REPL [replslave] resync: done with initial clone for db: flightswitch
2016-08-11T16:30:53.328+0800 I REPL [replslave] sleep 2 sec before next pass
2016-08-11T16:30:55.329+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:30:56.331+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:30:57.334+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:30:58.336+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:30:59.339+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:31:00.341+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T16:31:01.444+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
The logs show that all three nodes are slaves and that each one keeps syncing from the master.
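The same check can be scripted instead of read by eye. The sketch below assumes the startup line keeps the `master=1` / `slave=1` form seen in the 3.2.8 logs above; `role_from_startup_line` is a hypothetical helper name.

```python
import re


def role_from_startup_line(line):
    """Return 'master', 'slave', or None for a 'MongoDB starting' log line."""
    if "MongoDB starting" not in line:
        return None
    match = re.search(r"\b(master|slave)=1\b", line)
    return match.group(1) if match else None


# Startup line copied from the slave log above.
slave_line = (
    "2016-08-11T16:30:50.903+0800 I CONTROL [initandlisten] MongoDB starting : "
    "pid=31548 port=27017 dbpath=/usr/local/mongodb/db slave=1 64-bit host=centos4"
)
print(role_from_startup_line(slave_line))
```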
2. Alternatively, run db.isMaster() on each node to check whether it is the master or a slave:
1) Node 10.1.200.77:
> db.isMaster()
{
"ismaster" : true,
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-08-11T08:46:54.433Z"),
"maxWireVersion" : 4,
"minWireVersion" : 0,
"ok" : 1
}
>
2) Node 10.1.200.78:
> db.isMaster()
{
"ismaster" : false,
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-08-11T08:51:25.190Z"),
"maxWireVersion" : 4,
"minWireVersion" : 0,
"ok" : 1
}
>
3) Node 10.1.200.87:
> db.isMaster()
{
"ismaster" : false,
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-08-11T08:53:15.192Z"),
"maxWireVersion" : 4,
"minWireVersion" : 0,
"ok" : 1
}
>
4) Node 10.1.200.118:
> db.isMaster()
{
"ismaster" : false,
"maxBsonObjectSize" : 16777216,
"maxMessageSizeBytes" : 48000000,
"maxWriteBatchSize" : 1000,
"localTime" : ISODate("2016-08-11T08:54:05.810Z"),
"maxWireVersion" : 4,
"minWireVersion" : 0,
"ok" : 1
}
>
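Once the four replies are collected, a quick sanity check is that exactly one node reports `"ismaster" : true`. The sketch below hard-codes trimmed copies of the replies shown above rather than querying the servers; `find_masters` is a hypothetical helper, and how the replies are fetched is left open.

```python
def find_masters(replies):
    """Return the addresses whose isMaster reply has ismaster == True.

    `replies` maps 'host:port' to the reply document as a dict.
    """
    return sorted(addr for addr, reply in replies.items() if reply.get("ismaster") is True)


# Trimmed copies of the replies shown above (only the relevant fields).
replies = {
    "10.1.200.77:27017": {"ismaster": True, "ok": 1},
    "10.1.200.78:27017": {"ismaster": False, "ok": 1},
    "10.1.200.87:27017": {"ismaster": False, "ok": 1},
    "10.1.200.118:27017": {"ismaster": False, "ok": 1},
}
masters = find_masters(replies)
print(masters)
```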
## Cluster test
1. Write data on node 10.1.200.77:
> use flightswitch
switched to db flightswitch
> db.flight.insert({"dai":"daijiong"})
WriteResult({ "nInserted" : 1 })
>
2. Read the data on node 10.1.200.78:
> use flightswitch
switched to db flightswitch
> db.flight.find()
Error: error: { "ok" : 0, "errmsg" : "not master and slaveOk=false", "code" : 13435 }
The read fails, which is expected: a slave never accepts writes and rejects reads by default, so slaveOk must be set on the slave connection first:
> db.getMongo().setSlaveOk()
The query now succeeds:
> db.flight.find()
{ "_id" : ObjectId("57a9607cec764a2c6cecb687"), "originCity" : "SHA" }
{ "_id" : ObjectId("57ac3fa39ba331615f2474b7"), "dai" : "daijiong" }
{ "_id" : ObjectId("57ac43e3bc145f0d25ae7186"), "jiong" : "daijiong" }
{ "_id" : ObjectId("57ac489dbc145f0d25ae7187"), "originCity" : "SHA" }
>
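The slaveOk behaviour seen here amounts to a per-connection opt-in flag. The following toy class imitates it for illustration only; `SlaveConnection` and its methods are hypothetical and are not part of any MongoDB driver.

```python
class SlaveConnection:
    """Toy stand-in for a shell connection to a slave node."""

    def __init__(self, documents):
        self._documents = documents
        self.slave_ok = False  # reads are refused until the caller opts in

    def set_slave_ok(self):
        self.slave_ok = True

    def find(self):
        if not self.slave_ok:
            # Same message as the shell error shown above.
            raise RuntimeError("not master and slaveOk=false")
        return list(self._documents)


conn = SlaveConnection([{"dai": "daijiong"}])
try:
    conn.find()
except RuntimeError as err:
    first_error = str(err)

conn.set_slave_ok()
docs = conn.find()
```

The flag belongs to the connection, not to the server, so in this model every new connection to a slave starts with reads refused again.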
## Cluster failover test
Simulate a master failure and check whether a slave can take over as master.
1) Stop the mongodb service on node 10.1.200.77:
[root@centos4 logs]# service mongodb stop
/etc/init.d/mongodb: line 1: tartup: command not found
db path is: /usr/local/mongodb/db
/usr/local/mongodb/bin/mongod
Stopping mongod: [确定]
[root@centos4 logs]#
Then attempt a write on node 10.1.200.78:
> use flightswitch
switched to db flightswitch
> db.flight.insert({"dai1":"daijiong"})
WriteResult({ "writeError" : { "code" : 10107, "errmsg" : "not master" } })
>
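The write is rejected, so a slave does not take over as master when the master goes down.
2) The cluster now has no master: data can be read from it but not written to it, so one of the slaves has to be promoted to master by hand. For example, to promote the slave 10.1.200.78 to master, first stop the mongodb service on 10.1.200.78: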
[root@centos4 logs]# service mongodb stop
db path is: /usr/local/mongodb/db
/usr/local/mongodb/bin/mongod
Stopping mongod: [确定]
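Then edit the configuration file, changing slave=true to master=true (the source line, which only applies to slaves, is removed as well):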
[root@centos4 logs]# vim /etc/mongod.conf
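# Database file location
dbpath=/usr/local/mongodb/db
# Log file location
logpath=/usr/local/mongodb/logs/mongodb.log
# Append to the log file instead of overwriting it
logappend=true
# Port (default 27017)
port=27017
# Run this node as the master
master=true
Then start the mongodb service again: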
[root@centos4 logs]# service mongodb start
db path is: /usr/local/mongodb/db
/usr/local/mongodb/bin/mongod
Starting mongod: /usr/local/mongodb/bin/mongod --config /etc/mongod.conf
[root@centos4 logs]#
Then retry the write:
> use flightswitch
switched to db flightswitch
> db.flight.insert({"dai1":"daijiong"})
WriteResult({ "nInserted" : 1 })
>
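The write now succeeds. However, the source setting on the other two slaves must be changed to the new master's IP and port; otherwise they keep logging connection failures: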
2016-08-11T18:14:12.342+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T18:14:12.343+0800 W NETWORK [replslave] Failed to connect to 10.1.200.77:27017, reason: errno:111 Connection refused
2016-08-11T18:14:12.343+0800 E REPL [replslave] couldn't connect to server 10.1.200.77:27017, connection attempt failed
2016-08-11T18:14:12.343+0800 I REPL [replslave] sleep 3 sec before next pass
2016-08-11T18:14:15.345+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T18:14:15.345+0800 W NETWORK [replslave] Failed to connect to 10.1.200.77:27017, reason: errno:111 Connection refused
2016-08-11T18:14:15.345+0800 E REPL [replslave] couldn't connect to server 10.1.200.77:27017, connection attempt failed
2016-08-11T18:14:15.345+0800 I REPL [replslave] sleep 3 sec before next pass
2016-08-11T18:14:18.347+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T18:14:18.348+0800 W NETWORK [replslave] Failed to connect to 10.1.200.77:27017, reason: errno:111 Connection refused
2016-08-11T18:14:18.348+0800 E REPL [replslave] couldn't connect to server 10.1.200.77:27017, connection attempt failed
2016-08-11T18:14:18.348+0800 I REPL [replslave] sleep 3 sec before next pass
2016-08-11T18:14:21.350+0800 I REPL [replslave] syncing from host:10.1.200.77:27017
2016-08-11T18:14:21.350+0800 W NETWORK [replslave] Failed to connect to 10.1.200.77:27017, reason: errno:111 Connection refused
2016-08-11T18:14:21.351+0800 E REPL [replslave] couldn't connect to server 10.1.200.77:27017, connection attempt failed
2016-08-11T18:14:21.351+0800 I REPL [replslave] sleep 3 sec before next pass
As this shows, once the master fails in this kind of cluster, manual intervention is required to restore it.
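The two manual edits in this procedure (promoting one slave, repointing the others) are plain text rewrites of mongod.conf, so they could be scripted. This is a minimal sketch assuming the key=value format used above; `promote_to_master` and `repoint_slave` are hypothetical helpers, and stopping and starting the mongodb service is still done separately.

```python
def _key(line):
    # Text before the first '=', e.g. 'slave' for 'slave=true'.
    return line.split("=", 1)[0].strip()


def promote_to_master(conf_text):
    """Turn a slave config into a master config: swap the role, drop source."""
    out = []
    for line in conf_text.splitlines():
        if _key(line) == "slave":
            out.append("master=true")
        elif _key(line) == "source":
            continue  # a master has no replication source
        else:
            out.append(line)
    return "\n".join(out) + "\n"


def repoint_slave(conf_text, new_master):
    """Point a remaining slave at the new master's 'ip:port'."""
    out = [
        f"source={new_master}" if _key(line) == "source" else line
        for line in conf_text.splitlines()
    ]
    return "\n".join(out) + "\n"


slave_conf = "port=27017\nslave=true\nsource=10.1.200.77:27017\n"
print(promote_to_master(slave_conf))
print(repoint_slave(slave_conf, "10.1.200.78:27017"))
```

Comment lines pass through unchanged, so any comment describing the old role would need to be updated by hand.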