MongoDB and Redis are quite similar in this respect: a cluster can run in master/slave mode, as a replica set, as a replica set with an arbiter (comparable to Redis sentinel), or in sharded mode (comparable to Redis hash slots). A thorough understanding of MongoDB is therefore worthwhile, starting with master/slave mode.
1. Master/Slave Mode
1.1 Use cases for master/slave mode
In master/slave mode, when the master fails, a slave cannot be promoted to master and keep serving requests (i.e., there is no automatic failover), and the master and slave roles must be specified explicitly. Still, when the data volume is not particularly large and the data is not critical, a standalone node or master/slave mode is an option. For example, a previous project of ours used MongoDB to store user-behavior log data. That log data was not the full data set; the full data lived in Alibaba Cloud OSS, from which a scheduled job could fetch any given time range on demand. In such a scenario a single node or master/slave mode is sufficient.
Use cases for master/slave mode:
1) Replica sets have a member limit (12 members in older MongoDB versions; newer versions allow more), so master/slave mode was an option when more members were needed than a replica set allowed. That said, data or traffic at that scale really calls for sharding rather than a replica set.
2) Replicating a database's data.
1.2 Setting up master/slave mode
Start the master with the --master option, and start each slave with --slave --source master-ip:master-port, as shown below:
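A minimal sketch of the two startup commands (paths and ports here are example values; note that the help output below marks master/slave as legacy, and recent MongoDB versions have removed it entirely):

```shell
# master: accept writes and record them for slaves to replicate
mongod --master --port 27017 --dbpath /data/master --logpath /data/master.log --fork

# slave: pull operations from the master named by --source
mongod --slave --source 127.0.0.1:27017 --port 27018 --dbpath /data/slave --logpath /data/slave.log --fork
```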
2. Replica Sets (master/slave with read/write splitting)
Due to a recent shortage of server resources, I set up a three-node MongoDB replica set on a single machine. Each mongod instance needs its own mongo.conf configuration file; see my earlier post on a standalone MongoDB install for reference, and run ./mongod --help for the full list of configuration options. Each version may differ slightly, and later versions also support YAML-style configuration (personally I'm not fond of YAML: the settings are clear at a glance, but problems are harder to track down).
Options:
General options:
-h [ --help ] show this usage information
--version show version information
-f [ --config ] arg configuration file specifying
additional options
-v [ --verbose ] [=arg(=v)] be more verbose (include multiple times
for more verbosity e.g. -vvvvv)
--quiet quieter output
--port arg specify port number - 27017 by default
--bind_ip arg comma separated list of ip addresses to
listen on - localhost by default
--bind_ip_all bind to all ip addresses
--ipv6 enable IPv6 support (disabled by
default)
--listenBacklog arg (=128) set socket listen backlog size
--maxConns arg max number of simultaneous connections
- 1000000 by default
--logpath arg log file to send write to instead of
stdout - has to be a file, not
directory
--syslog log to system's syslog facility instead
of file or stdout
--syslogFacility arg syslog facility used for mongodb syslog
message
--logappend append to logpath instead of
over-writing
--logRotate arg set the log rotation behavior
(rename|reopen)
--timeStampFormat arg Desired format for timestamps in log
messages. One of ctime, iso8601-utc or
iso8601-local
--pidfilepath arg full path to pidfile (if not set, no
pidfile is created)
--timeZoneInfo arg full path to time zone info directory,
e.g. /usr/share/zoneinfo
--keyFile arg private key for cluster authentication
--noauth run without security
--setParameter arg Set a configurable parameter
--transitionToAuth For rolling access control upgrade.
Attempt to authenticate over outgoing
connections and proceed regardless of
success. Accept incoming connections
with or without authentication.
--clusterAuthMode arg Authentication mode used for cluster
authentication. Alternatives are
(keyFile|sendKeyFile|sendX509|x509)
--nounixsocket disable listening on unix sockets
--unixSocketPrefix arg alternative directory for UNIX domain
sockets (defaults to /tmp)
--filePermissions arg permissions to set on UNIX domain
socket file - 0700 by default
--fork fork server process
--networkMessageCompressors [=arg(=disabled)] (=snappy)
Comma-separated list of compressors to
use for network messages
--auth run with security
--clusterIpSourceWhitelist arg Network CIDR specification of permitted
origin for `__system` access.
--slowms arg (=100) value of slow for profile and console
log
--slowOpSampleRate arg (=1) fraction of slow ops to include in the
profile and console log
--profile arg 0=off 1=slow, 2=all
--cpu periodically show cpu and iowait
utilization
--sysinfo print some diagnostic system
information
--noIndexBuildRetry don't retry any index builds that were
interrupted by shutdown
--noscripting disable scripting engine
--notablescan do not allow table scans
--shutdown kill a running server (for init
scripts)
Replication options:
--oplogSize arg size to use (in MB) for replication op
log. default is 5% of disk space (i.e.
large is good)
Master/slave options (old; use replica sets instead):
--master master mode
--slave slave mode
--source arg when slave: specify master as
<server:port>
--only arg when slave: specify a single database
to replicate
--slavedelay arg specify delay (in seconds) to be used
when applying master ops to slave
--autoresync automatically resync if slave data is
stale
Replica set options:
--replSet arg arg is <setname>[/<optionalseedhostlist
>]
--replIndexPrefetch arg specify index prefetching behavior (if
secondary) [none|_id_only|all]
--enableMajorityReadConcern [=arg(=1)] (=1)
enables majority readConcern
Sharding options:
--configsvr declare this is a config db of a
cluster; default port 27019; default
dir /data/configdb
--shardsvr declare this is a shard db of a
cluster; default port 27018
SSL options:
--sslOnNormalPorts use ssl on configured ports
--sslMode arg set the SSL operation mode
(disabled|allowSSL|preferSSL|requireSSL
)
--sslPEMKeyFile arg PEM file for ssl
--sslPEMKeyPassword arg PEM file password
--sslClusterFile arg Key file for internal SSL
authentication
--sslClusterPassword arg Internal authentication key file
password
--sslCAFile arg Certificate Authority file for SSL
--sslCRLFile arg Certificate Revocation List file for
SSL
--sslDisabledProtocols arg Comma separated list of TLS protocols
to disable [TLS1_0,TLS1_1,TLS1_2]
--sslWeakCertificateValidation allow client to connect without
presenting a certificate
--sslAllowConnectionsWithoutCertificates
allow client to connect without
presenting a certificate
--sslAllowInvalidHostnames Allow server certificates to provide
non-matching hostnames
--sslAllowInvalidCertificates allow connections to servers with
invalid certificates
--sslFIPSMode activate FIPS 140-2 mode at startup
Storage options:
--storageEngine arg what storage engine to use - defaults
to wiredTiger if no data files present
--dbpath arg directory for datafiles - defaults to
/data/db
--directoryperdb each database will be stored in a
separate directory
--noprealloc disable data file preallocation - will
often hurt performance
--nssize arg (=16) .ns file size (in MB) for new databases
--quota limits each database to a certain
number of files (8 default)
--quotaFiles arg number of files allowed per db, implies
--quota
--smallfiles use a smaller default file size
--syncdelay arg (=60) seconds between disk syncs (0=never,
but not recommended)
--upgrade upgrade db if needed
--repair run repair on all dbs
--repairpath arg root directory for repair files -
defaults to dbpath
--journal enable journaling
--nojournal disable journaling (journaling is on by
default for 64 bit)
--journalOptions arg journal diagnostic options
--journalCommitInterval arg how often to group/batch commit (ms)
WiredTiger options:
--wiredTigerCacheSizeGB arg maximum amount of memory to allocate
for cache; defaults to 1/2 of physical
RAM
--wiredTigerJournalCompressor arg (=snappy)
use a compressor for log records
[none|snappy|zlib]
--wiredTigerDirectoryForIndexes Put indexes and data in different
directories
--wiredTigerCollectionBlockCompressor arg (=snappy)
block compression algorithm for
collection data [none|snappy|zlib]
--wiredTigerIndexPrefixCompression arg (=1)
use prefix compression on row-store
leaf pages
When I created the replica set, I used the following configuration:
# mongod.conf - replica set member 1
port=27017 # listening port
# bind_ip=0.0.0.0 # uncomment to listen on all interfaces (default is localhost only)
dbpath=/data/mongo-replSet/mongodb/data # data directory
logpath=/data/mongo-replSet/mongodb/logs/logs.log # log file
logappend=true # append to the log instead of overwriting it
# directoryperdb=true # each database will be stored in a separate directory
fork=true # run as a daemon (in the background)
replSet=kevinDemo # replica set name, optionally with seed nodes (one seed is enough; members discover the rest once the set forms)
oplogSize=100 # oplog size (in MB)
# replica set member 2 config file (plain format, not YAML)
port=27217 # listening port
# bind_ip=0.0.0.0 # uncomment to listen on all interfaces (default is localhost only)
dbpath=/data/mongo-replSet/mongodb2/data # data directory
logpath=/data/mongo-replSet/mongodb2/logs/logs.log # log file
logappend=true # append to the log instead of overwriting it
# directoryperdb=true # each database will be stored in a separate directory
fork=true # run as a daemon (in the background)
replSet=kevinDemo/127.0.0.1:27017 # replica set name plus a seed node (one is enough; members discover the rest once the set forms)
oplogSize=100 # oplog size (in MB)
# replica set member 3 config file (plain format, not YAML)
port=27317 # listening port
# bind_ip=0.0.0.0 # uncomment to listen on all interfaces (default is localhost only)
dbpath=/data/mongo-replSet/mongodb3/data # data directory
logpath=/data/mongo-replSet/mongodb3/logs/logs.log # log file
logappend=true # append to the log instead of overwriting it
# directoryperdb=true # each database will be stored in a separate directory
fork=true # run as a daemon (in the background)
replSet=kevinDemo/127.0.0.1:27017 # replica set name plus a seed node (one is enough; members discover the rest once the set forms)
oplogSize=100 # oplog size (in MB)
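With the three config files in place, each member can be started by pointing mongod at its file with -f (the config file paths below are assumptions; use wherever you saved them):

```shell
./mongod -f /data/mongo-replSet/mongodb/mongo.conf
./mongod -f /data/mongo-replSet/mongodb2/mongo.conf
./mongod -f /data/mongo-replSet/mongodb3/mongo.conf
```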
Start the three instances with mongod. Right after startup, the replica set members are unaware of each other, so the set must be configured on one node, using the global rs helper (analogous to the db global) or the underlying command, and that node then propagates the configuration to the others:
db.runCommand({"replSetInitiate": {
    "_id": "kevinDemo",
    "members": [
        {"_id": 1, "host": "127.0.0.1:27017", "priority": 1},
        {"_id": 2, "host": "127.0.0.1:27217", "priority": 2},
        {"_id": 3, "host": "127.0.0.1:27317", "priority": 3}
    ]
}});
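The rs helper offers an equivalent shorthand, rs.initiate, which wraps the replSetInitiate command (a sketch using the same member list):

```javascript
rs.initiate({
    "_id": "kevinDemo",
    "members": [
        {"_id": 1, "host": "127.0.0.1:27017", "priority": 1},
        {"_id": 2, "host": "127.0.0.1:27217", "priority": 2},
        {"_id": 3, "host": "127.0.0.1:27317", "priority": 3}
    ]
});
```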
Then connect to one of the instances with ./mongo ip:port. The command must be run against the admin database, otherwise it fails with the error replSetInitiate may only be run against the admin database., as shown:
So switch to the admin database first, as shown below:
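The full sequence in the mongo shell looks like this (a sketch; 127.0.0.1:27017 is one of the members configured above):

```shell
./mongo 127.0.0.1:27017    # connect to any one member
> use admin                # replSetInitiate may only be run against admin
> db.runCommand({"replSetInitiate": {"_id": "kevinDemo", "members": [ /* as above */ ]}})
```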
Because priorities were set, member 3 automatically became the primary (master) and the other members became secondaries, as shown above. The replica set can in fact also be configured through the rs helper, and each member supports many more options, which I'll cover in a later post on replica set configuration details. Once the members are configured, you can run rs.status() to inspect the state of the set:
kevinDemo:SECONDARY> rs.status()
{
"set" : "kevinDemo",
"date" : ISODate("2018-08-19T07:18:53.718Z"),
"myState" : 2,
"term" : NumberLong(2),
"syncingTo" : "127.0.0.1:27317",
"heartbeatIntervalMillis" : NumberLong(2000),
"optimes" : {
"lastCommittedOpTime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"readConcernMajorityOpTime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"appliedOpTime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"durableOpTime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
}
},
"members" : [
{
"_id" : 1,
"name" : "127.0.0.1:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 76089,
"optime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"optimeDate" : ISODate("2018-08-19T07:18:49Z"),
"syncingTo" : "127.0.0.1:27317",
"configVersion" : 1,
"self" : true
},
{
"_id" : 2,
"name" : "127.0.0.1:27217",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 1648,
"optime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"optimeDurable" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"optimeDate" : ISODate("2018-08-19T07:18:49Z"),
"optimeDurableDate" : ISODate("2018-08-19T07:18:49Z"),
"lastHeartbeat" : ISODate("2018-08-19T07:18:52.855Z"),
"lastHeartbeatRecv" : ISODate("2018-08-19T07:18:52.687Z"),
"pingMs" : NumberLong(0),
"syncingTo" : "127.0.0.1:27317",
"configVersion" : 1
},
{
"_id" : 3,
"name" : "127.0.0.1:27317",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 1648,
"optime" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"optimeDurable" : {
"ts" : Timestamp(1534663129, 1),
"t" : NumberLong(2)
},
"optimeDate" : ISODate("2018-08-19T07:18:49Z"),
"optimeDurableDate" : ISODate("2018-08-19T07:18:49Z"),
"lastHeartbeat" : ISODate("2018-08-19T07:18:52.856Z"),
"lastHeartbeatRecv" : ISODate("2018-08-19T07:18:51.945Z"),
"pingMs" : NumberLong(0),
"electionTime" : Timestamp(1534661507, 1),
"electionDate" : ISODate("2018-08-19T06:51:47Z"),
"configVersion" : 1
}
],
"ok" : 1,
"operationTime" : Timestamp(1534663129, 1),
"$clusterTime" : {
"clusterTime" : Timestamp(1534663129, 1),
"signature" : {
"hash" : BinData(0,"AAAAAAAAAAAAAAAAAAAAAAAAAAA="),
"keyId" : NumberLong(0)
}
}
}
3. Adjusting a Replica Set
A replica set often needs to be adjusted dynamically, e.g. adding members with rs.add, removing them with rs.remove, or reconfiguring the whole set with rs.reconfig. Note that adding, removing, and reconfiguring members can only be done on the primary.
3.1 rs.remove
Use rs.remove("127.0.0.1"); to remove a member by IP, or include the port: rs.remove("127.0.0.1:27017");. The remove command can only be run on the primary; elsewhere it fails with an error:
You must connect to the primary before performing the operation:
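A minimal sketch (the primary's address and the member being removed come from the example set above):

```javascript
// connect to the current primary first, e.g. ./mongo 127.0.0.1:27317
rs.remove("127.0.0.1:27017");   // drop the member by host:port
rs.status();                    // confirm it no longer appears in "members"
```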
3.2 rs.add
rs.add({"host":"127.0.0.1:27017","priority":1}); # add a regular member
rs.addArb("127.0.0.1:27017"); # add an arbiter (note: rs.addArb takes a host:port string, not a document)
Just note that adding members must also be done on the primary:
3.3 rs.reconfig
The member additions and removals above can also be accomplished via reconfig: fetch the current configuration, modify it, and submit it back.
var config = rs.config();
config.members[0].priority = 5;
rs.reconfig(config);
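For example, a member can also be removed by editing the fetched config (a sketch, run on the primary; removing index 0 is an arbitrary choice for illustration):

```javascript
var config = rs.config();     // current configuration document
config.members.splice(0, 1);  // delete the first entry in the member list
rs.reconfig(config);          // submit the modified configuration to the set
```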