Preparation
Local environment: two 64-bit CentOS 7 virtual machines
Installation package: mongodb-linux-x86_64-rhel70-3.4.7.tgz
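According to the official documentation, a sharded cluster needs:
- a replica set of at least 3 config servers
- a replica set of at least 3 members for each shard
- at least 1 mongos acting as the client router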
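A replica set can provide HA with either of these minimal configurations:
- At least 3 data-bearing nodes. When the primary fails, the remaining nodes hold an election and one secondary is automatically promoted to primary.
- 2 data-bearing nodes plus 1 arbiter node. Arbiters will be covered in a later article.
See the official documentation for details.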
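For this test we deploy 2 config servers, 2 shards with 2 members each (a primary plus one replica, so 2*2=4 shard mongod instances), and 2 mongos, for a total of 8 MongoDB instances. Only two virtual machines are used, so the MongoDB instances on each machine listen on different ports.
Because each replica set here has only 2 members, a secondary cannot be promoted automatically when the primary fails: the lone survivor does not hold a majority of the votes. In production, always give every replica set at least 3 members.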
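The failover limit of a 2-member replica set follows from simple majority arithmetic. The sketch below assumes every member carries one vote (the default) and uses the hypothetical helper names `votesNeeded` and `faultTolerance`; they are illustrations, not part of MongoDB's API.

```javascript
// Votes required to elect a primary: a strict majority of the voting members.
function votesNeeded(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of members that may fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - votesNeeded(votingMembers);
}

[2, 3, 5].forEach(function (n) {
  console.log(n + " members: majority " + votesNeeded(n) +
              ", tolerates " + faultTolerance(n) + " failure(s)");
});
```

With 2 members the majority is 2, so losing either one leaves the survivor unable to win an election; with 3 members one failure is tolerated.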
config server | leo.zhi.1:10010, leo.zhi.2:10010 |
mongos server | leo.zhi.1:10020, leo.zhi.2:10020 |
shard1 server | leo.zhi.1:10001, leo.zhi.2:10001 |
shard2 server | leo.zhi.1:10002, leo.zhi.2:10002 |
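As a cross-check of the instance count, the following sketch enumerates every process in the test topology. The ports are the ones used in the configuration files in this article; the helper name `listInstances` is hypothetical.

```javascript
// Topology of the test cluster: 2 hosts, each running one member of every role.
var hosts = ["leo.zhi.1", "leo.zhi.2"];
var roles = [
  { name: "config server", port: 10010 },
  { name: "shard1", port: 10001 },
  { name: "shard2", port: 10002 },
  { name: "mongos", port: 10020 }
];

// Return one "role @ host:port" entry per process in the cluster.
function listInstances(hosts, roles) {
  var instances = [];
  roles.forEach(function (role) {
    hosts.forEach(function (host) {
      instances.push(role.name + " @ " + host + ":" + role.port);
    });
  });
  return instances;
}

var instances = listInstances(hosts, roles);
console.log(instances.join("\n"));
console.log("total: " + instances.length);
```

Four roles on two hosts give the 8 instances mentioned in the text.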
Create the following directories on both leo.zhi.1 and leo.zhi.2:
[root@leo mongodb]# mkdir -p config/data
[root@leo mongodb]# mkdir -p config/log
[root@leo mongodb]# mkdir -p mongos/log
[root@leo mongodb]# mkdir -p shard1/data
[root@leo mongodb]# mkdir -p shard1/log
[root@leo mongodb]# mkdir -p shard2/data
[root@leo mongodb]# mkdir -p shard2/log
The data directories hold the data files and the log directories hold the logs.
The mongos service stores no data, so it needs only a log directory and no data directory.
Deploying the config server
MongoDB configuration file (written in YAML):
systemLog:
  destination: file
  path: "/usr/local/mongodb/config/log/mongod.log"
  logAppend: true
storage:
  dbPath: "/usr/local/mongodb/config/data"
  journal:
    enabled: true
  directoryPerDB: true
net:
  port: 10010
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/config/mongod.pid"
sharding:
  clusterRole: configsvr
replication:
  replSetName: configReplSet
1. Start the MongoDB instance on each server
[root@leo mongodb]# mongod --config config/mongod.conf
about to fork child process, waiting until server is ready for connections.
forked process: 3293
child process started successfully, parent exiting
2. Pick any one node, connect with the mongo shell, and initialize the config server replica set
[root@leo mongodb]# mongo --port 10010
> rs.initiate( {
    _id: "configReplSet",
    configsvr: true,
    members: [
      { _id: 0, host: "leo.zhi.1:10010" },
      { _id: 1, host: "leo.zhi.2:10010" }
    ]
  } )
...
configReplSet:OTHER> exit
[root@leo mongodb]# mongo --port 10010
configReplSet:PRIMARY>
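The document passed to `rs.initiate()` has the same shape for the config servers and for each shard, so it can be generated instead of typed by hand. This is a minimal sketch; `buildReplSetConfig` is a hypothetical helper, not a MongoDB function. It runs under Node.js as written; in the mongo shell, `printjson` would take the place of `console.log`.

```javascript
// Build the document that rs.initiate() expects.
// name: replica set name; hosts: array of "host:port" strings;
// isConfigServer: true only for the config server replica set.
function buildReplSetConfig(name, hosts, isConfigServer) {
  var config = {
    _id: name,
    members: hosts.map(function (host, index) {
      return { _id: index, host: host };
    })
  };
  if (isConfigServer) {
    config.configsvr = true;
  }
  return config;
}

var cfg = buildReplSetConfig("configReplSet",
                             ["leo.zhi.1:10010", "leo.zhi.2:10010"], true);
console.log(JSON.stringify(cfg, null, 2));
// In the mongo shell the result would then be used as: rs.initiate(cfg)
```

For a shard replica set, the third argument is `false`, so the `configsvr` field is left out.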
Deploying shard1
MongoDB configuration file (written in YAML):
systemLog:
  destination: file
  path: "/usr/local/mongodb/shard1/log/mongod.log"
  logAppend: true
storage:
  dbPath: "/usr/local/mongodb/shard1/data"
  journal:
    enabled: true
  directoryPerDB: true
net:
  port: 10001
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/shard1/mongod.pid"
sharding:
  clusterRole: shardsvr
replication:
  replSetName: shard1ReplSet
1. Start shard1 on each node
[root@leo mongodb]# mongod --config shard1/mongod.conf
2. Pick any one node, connect with the mongo shell, and initialize the shard1 replica set. Both members listen on port 10001, the port set in the configuration file.
[root@leo mongodb]# mongo --port 10001
> rs.initiate( {
    _id: "shard1ReplSet",
    members: [
      { _id: 0, host: "leo.zhi.1:10001" },
      { _id: 1, host: "leo.zhi.2:10001" }
    ]
  } )
...
shard1ReplSet:OTHER> exit
[root@leo mongodb]# mongo --port 10001
shard1ReplSet:PRIMARY>
Deploying shard2
MongoDB configuration file (written in YAML):
systemLog:
  destination: file
  path: "/usr/local/mongodb/shard2/log/mongod.log"
  logAppend: true
storage:
  dbPath: "/usr/local/mongodb/shard2/data"
  journal:
    enabled: true
  directoryPerDB: true
net:
  port: 10002
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/shard2/mongod.pid"
sharding:
  clusterRole: shardsvr
replication:
  replSetName: shard2ReplSet
The remaining steps are the same as for shard1, substituting port 10002 and the replica set name shard2ReplSet.
Deploying mongos
MongoDB configuration file (written in YAML):
systemLog:
  destination: file
  path: "/usr/local/mongodb/mongos/log/mongos.log"
  logAppend: true
net:
  port: 10020
processManagement:
  fork: true
  pidFilePath: "/usr/local/mongodb/mongos/mongos.pid"
sharding:
  configDB: configReplSet/leo.zhi.1:10010,leo.zhi.2:10010
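1. Start mongos on each node
[root@leo mongodb]# mongos -f mongos/mongod.conf
2017-09-13T09:15:32.440+0800 W SHARDING [main] Running a sharded cluster with fewer than 3 config servers should only be done for testing purposes and is not recommended for production.
about to fork child process, waiting until server is ready for connections.
forked process: 2590
child process started successfully, parent exiting
mongos is only a router (proxy) between clients and the cluster. It works from the metadata held by the config servers and stores no data itself; it only writes logs.
To make mongos highly available, you can run any number of mongos instances. They are all equal peers, so if one of them goes down the others keep serving requests.
An application can read the connection information from shared configuration at run time, or connect through a load balancer and let it pick an available mongos instance.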
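One common way to give an application several mongos instances is to list them all in a standard MongoDB connection string, which accepts a comma-separated host list. The sketch below only assembles that string; `buildMongosUri` is a hypothetical helper, and how a given driver chooses among the listed hosts depends on that driver.

```javascript
// Assemble a standard "mongodb://" connection string from a list of
// mongos addresses ("host:port") and a default database name.
function buildMongosUri(mongosHosts, database) {
  if (mongosHosts.length === 0) {
    throw new Error("at least one mongos host is required");
  }
  return "mongodb://" + mongosHosts.join(",") + "/" + database;
}

var uri = buildMongosUri(["leo.zhi.1:10020", "leo.zhi.2:10020"], "test");
console.log(uri);
```

The host list could come from the shared configuration mentioned above, so adding or removing a mongos does not require a code change.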
Registering the shards
Connect to a mongos instance:
[root@leo mongodb]# mongo --port 10020
MongoDB shell version v3.4.7
connecting to: mongodb://127.0.0.1:10020/
MongoDB server version: 3.4.7
Server has startup warnings:
2017-09-13T09:15:32.466+0800 I CONTROL [main]
2017-09-13T09:15:32.466+0800 I CONTROL [main] ** WARNING: Access control is not enabled for the database.
2017-09-13T09:15:32.467+0800 I CONTROL [main] ** Read and write access to data and configuration is unrestricted.
2017-09-13T09:15:32.467+0800 I CONTROL [main] ** WARNING: You are running this process as the root user, which is not recommended.
2017-09-13T09:15:32.467+0800 I CONTROL [main]
mongos> show dbs
admin 0.000GB
config 0.001GB
test 0.006GB
Register the shards with the cluster:
# register the shard1 replica set
mongos> sh.addShard( "shard1ReplSet/leo.zhi.1:10001,leo.zhi.2:10001")
...
# register the shard2 replica set
mongos> sh.addShard( "shard2ReplSet/leo.zhi.1:10002,leo.zhi.2:10002")
...
Enabling sharding for a database
Sharding has to be enabled per database. Here we use the test database:
mongos> sh.enableSharding("test")
Sharding a collection
Here we shard the log collection in the test database, using the sn field as the shard key with the hashed strategy:
mongos> sh.shardCollection("test.log", { "sn" : "hashed" } )
Viewing the shard information
mongos> sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "minCompatibleVersion" : 5,
    "currentVersion" : 6,
    "clusterId" : ObjectId("59b7742fd43f3ce12e8ef885")
  }
  shards:
    { "_id" : "shard1ReplSet", "host" : "shard1ReplSet/leo.zhi.1:10001,leo.zhi.2:10001", "state" : 1 }
    { "_id" : "shard2ReplSet", "host" : "shard2ReplSet/leo.zhi.1:10002,leo.zhi.2:10002", "state" : 1 }
  active mongoses:
    "3.4.7" : 1
  autosplit:
    Currently enabled: yes
  balancer:
    Currently enabled: yes
    Currently running: no
      Balancer lock taken at Wed Sep 13 2017 09:13:43 GMT+0800 (CST) by ConfigServer:Balancer
    Failed balancer rounds in last 5 attempts: 2
    Last reported error: could not find host matching read preference { mode: "primary" } for set shard1ReplSet
    Time of Reported error: Wed Sep 13 2017 09:14:03 GMT+0800 (CST)
    Migration Results for the last 24 hours:
      2 : Success
  databases:
    { "_id" : "test", "primary" : "shard2ReplSet", "partitioned" : true }
      test.log
        shard key: { "sn" : "hashed" }
        unique: false
        balancing: true
        chunks:
          shard1ReplSet 2
          shard2ReplSet 2
        { "sn" : { "$minKey" : 1 } } -->> { "sn" : NumberLong("-4611686018427387902") } on : shard1ReplSet Timestamp(2, 2)
        { "sn" : NumberLong("-4611686018427387902") } -->> { "sn" : NumberLong(0) } on : shard1ReplSet Timestamp(2, 3)
        { "sn" : NumberLong(0) } -->> { "sn" : NumberLong("4611686018427387902") } on : shard2ReplSet Timestamp(2, 4)
        { "sn" : NumberLong("4611686018427387902") } -->> { "sn" : { "$maxKey" : 1 } } on : shard2ReplSet Timestamp(2, 5)
mongos>
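The hashed strategy is meant to make MongoDB spread the data evenly. With range sharding on { "sn" : 1 }, documents are placed in key order, so a steadily increasing sn sends new inserts to the same chunk, and one shard can end up holding far more data than the others.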
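The difference between the two strategies can be seen in a small simulation. This is a sketch under stated assumptions: `mixHash` is a stand-in integer hash (the MurmurHash3 finalizer), not the function MongoDB's hashed index actually uses; the chunk boundaries are invented; chunk splitting and balancing are not modeled.

```javascript
// Stand-in 32-bit integer hash (MurmurHash3 finalizer). MongoDB's hashed
// index uses a different function; this one only illustrates the idea.
function mixHash(x) {
  var h = x | 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0; // unsigned 32-bit result
}

// Hashed sharding: four equal chunks over the 32-bit hash space.
function hashedChunk(sn) {
  return mixHash(sn) >>> 30; // the top two bits select chunk 0..3
}

// Range sharding: chunks split at fixed boundaries on the raw key.
function rangeChunk(sn, boundaries) {
  var i = 0;
  while (i < boundaries.length && sn >= boundaries[i]) {
    i++;
  }
  return i;
}

// Count how many keys in [from, to] land in each of the four chunks.
function countPerChunk(from, to, chunkOf) {
  var counts = [0, 0, 0, 0];
  for (var sn = from; sn <= to; sn++) {
    counts[chunkOf(sn)]++;
  }
  return counts;
}

// Assumption: the range chunks were split while sn was still in 1..1000.
var boundaries = [250, 500, 750];
var rangeCounts = countPerChunk(1001, 101000, function (sn) {
  return rangeChunk(sn, boundaries);
});
var hashedCounts = countPerChunk(1001, 101000, hashedChunk);

console.log("range :", rangeCounts);
console.log("hashed:", hashedCounts);
```

Every new key is larger than the last range boundary, so all range inserts fall into the final chunk, while the hashed keys spread across the four chunks.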
Adding test data
mongos> use test
mongos> for(var i=1; i<=100000; i++){db.log.insert({sn:i, msg:'Message ' + i});}
After a while...
Check the shard statistics of the log collection:
mongos> db.log.stats()
{
  "sharded" : true,
  "capped" : false,
  "ns" : "test.log",
  "count" : 100000,
  "size" : 5688895,
  "storageSize" : 1867776,
  "totalIndexSize" : 4427776,
  "indexSizes" : {
    "_id_" : 1003520,
    "sn_hashed" : 3424256
  },
  "avgObjSize" : 56,
  "nindexes" : 2,
  "nchunks" : 4,
  "shards" : {
    "shard1ReplSet" : {
      "ns" : "test.log",
      "size" : 2866884,
      "count" : 50393,
      "avgObjSize" : 56,
      "storageSize" : 946176,
      "capped" : false,
      ...
      ...
      ...
    "shard2ReplSet" : {
      "ns" : "test.log",
      "size" : 2822011,
      "count" : 49607,
      "avgObjSize" : 56,
      "storageSize" : 921600,
      "capped" : false,
      ...
      ...
      ...
The result shows that the documents are spread almost evenly across the two shards.
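To put a number on "almost evenly", the per-shard counts from the stats output can be turned into percentages. The sketch below copies only the `count` fields shown above into a plain object; `shardShares` is a hypothetical helper.

```javascript
// Document counts copied from the db.log.stats() output above.
var stats = {
  count: 100000,
  shards: {
    shard1ReplSet: { count: 50393 },
    shard2ReplSet: { count: 49607 }
  }
};

// Return each shard's share of the documents as a percentage string.
function shardShares(stats) {
  var shares = {};
  Object.keys(stats.shards).forEach(function (name) {
    var pct = (stats.shards[name].count / stats.count) * 100;
    shares[name] = pct.toFixed(2) + "%";
  });
  return shares;
}

console.log(shardShares(stats));
```

The two shards hold about 50.39% and 49.61% of the documents, which is within half a percentage point of an even split.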