29.2 Replica Set Operation and Management
Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.
See also:
• rs.status() and db.isMaster() (page 841)
• Replica Set Reconfiguration Process (page 985)
• rs.conf() (page 849) and rs.reconfig() (page 850)
• Replica Set Configuration
The following tutorials provide task-oriented instructions for specific administrative tasks related to replica set operation.
• Deploy a Replica Set
• Convert a Standalone to a Replica Set (page 322)
• Add Members to a Replica Set (page 324)
• Deploy a Geographically Distributed Replica Set (page 326)
• Change the Size of the Oplog (page 332)
• Force a Member to Become Primary (page 334)
• Change Hostnames in a Replica Set (page 336)
• Convert a Secondary to an Arbiter (page 340)
• Reconfigure a Replica Set with Unavailable Members (page 342)
• Recover MongoDB Data following Unexpected Shutdown (page 586)

29.2.1 Member Configurations
All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.
Note: A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members (page 286).
Warning: The rs.reconfig() (page 850) shell method can force the current primary to step down, which causes an election (page 278). When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods. To successfully reconfigure a replica set, a majority of the members must be accessible.
See also:
The Elections (page 278) section in the Replica Set Fundamental Concepts (page 277) document, and the Election Internals (page 311) section in the Replica Set Internals and Behaviors (page 309) document.
Secondary-only Members
The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set except the current primary.
For example, you may want to configure all members of a replica set located outside of the main data centers as secondary-only to prevent these members from ever becoming primary.
To configure a member as secondary-only, set its priority (page 983) value to 0. Any member with a priority (page 983) equal to 0 will never seek election (page 278) and cannot become primary in any situation. For more information on priority levels, see Member Priority (page 278).
Note: When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id (page 982) field in each document in the members array.
The _id (page 982) rarely corresponds to the array index.
As an example of modifying member priorities, assume a four-member replica set. Use the following sequence of operations in the mongo shell to modify member priorities:
cfg = rs.conf()
cfg.members[0].priority = 2
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
cfg.members[3].priority = 0
rs.reconfig(cfg)
This reconfigures the set, with the following priority settings:
• Member 0 to a priority of 2 so that it becomes primary, under most circumstances.
• Member 1 to a priority of 1, which is the default value. Member 1 becomes primary if no member with a higher priority is eligible.
• Member 2 to a priority of 0.5, which makes it less likely to become primary than other members but doesn’t prohibit the possibility.
• Member 3 to a priority of 0. Member 3 cannot become the primary member under any circumstances.
Note: If your replica set has an even number of members, add an arbiter (page 286) to ensure that members can quickly obtain a majority of votes in an election for primary.
Note: MongoDB does not permit the current primary to have a priority (page 983) of 0. If you want to prevent the current primary from becoming primary again, first use rs.stepDown() to step down the current primary, and then reconfigure the replica set (page 985) with rs.conf() (page 849) and rs.reconfig() (page 850).
See also:
priority (page 983) and Replica Set Reconfiguration (page 985).
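Because the array index and the _id of a member rarely match, one way to avoid addressing the wrong member is to look the member up by its host value. The following is a minimal sketch in plain JavaScript. The helper names indexOfHost and makeSecondaryOnly are hypothetical and are not part of the mongo shell, and the sample configuration document is made up for illustration.

```javascript
// Hypothetical helper: return the position in cfg.members of the member
// whose host matches, or -1 if there is none.
function indexOfHost(cfg, host) {
  for (var i = 0; i < cfg.members.length; i++) {
    if (cfg.members[i].host === host) {
      return i;
    }
  }
  return -1;
}

// Hypothetical helper: set priority 0 on the member with the given host.
function makeSecondaryOnly(cfg, host) {
  var i = indexOfHost(cfg, host);
  if (i === -1) {
    throw new Error("no member with host " + host);
  }
  cfg.members[i].priority = 0;
  return cfg;
}

// Illustrative configuration; note that _id 5 sits at array index 2.
var sample = {
  _id: "rs",
  members: [
    { _id: 0, host: "mongo1.example.net:27017" },
    { _id: 1, host: "mongo2.example.net:27017" },
    { _id: 5, host: "mongo3.example.net:27017" }
  ]
};
makeSecondaryOnly(sample, "mongo3.example.net:27017");
```

In the mongo shell, you would pass the document returned by rs.conf() instead of the sample document and then apply the change with rs.reconfig().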
Hidden Members
Hidden members are part of a replica set but cannot become primary and are invisible to client applications. However, hidden members do vote in elections (page 278).
Hidden members are ideal for instances that will have significantly different usage patterns than the other members and require separation from normal traffic. Typically, hidden members provide reporting, dedicated backups, and dedicated read-only testing and integration support.
Hidden members have priority (page 983) set to 0 and have hidden (page 982) set to true.
To configure a hidden member, use the following sequence of operations in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)
After reconfiguring the set, the first member of the set in the members array will have a priority of 0 so that it cannot become primary. The other members in the set will not advertise the hidden member in the isMaster (page 762) or db.isMaster() (page 841) output.
Note: You must send the rs.reconfig() (page 850) command to a set member that can become primary. In the above example, if you issue the rs.reconfig() (page 850) operation to a member with a priority (page 983) of 0, the operation will fail.
Note: Changed in version 2.0.
For sharded clusters running with replica sets before 2.0, if you reconfigured a member as hidden, you had to restart mongos to prevent queries from reaching the hidden member.
See also:
Replica Set Read Preference (page 303) and Replica Set Reconfiguration (page 985).
Delayed Members
Delayed members copy and apply operations from the primary’s oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member’s oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.
Example: If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.
Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
• Ensure that the length of the delay is equal to or greater than your maintenance windows.
• Ensure that the size of the oplog is sufficient to capture more than the number of operations that typically occur in that period of time. For more information on oplog size, see the Oplog (page 280) topic in the Replica Set Fundamental Concepts (page 277) document.
Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. Also, these members should be hidden (page 284) to prevent your application from seeing or querying them.
To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)
After the replica set reconfigures, the first member of the set in the members array will have a priority of 0 and cannot become primary. The slaveDelay (page 983) value delays both replication and the member’s oplog by 3600 seconds (1 hour). Setting slaveDelay (page 983) to a non-zero value also sets hidden (page 982) to true for this member so that it does not receive application queries in normal operations.
Warning: The length of the secondary slaveDelay (page 983) must fit within the window of the oplog. If the oplog is shorter than the slaveDelay (page 983) window, the delayed member cannot successfully replicate operations.
See also:
slaveDelay (page 983), Replica Set Reconfiguration (page 985), Oplog (page 280), Changing Oplog Size in this document, and the Change the Size of the Oplog (page 332) tutorial.
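The requirement that the delay fit within the oplog window amounts to a simple comparison: the time span covered by the oplog must exceed the configured delay. The following is a minimal sketch of that arithmetic, assuming you have already noted the times of the oldest and newest oplog entries (for example, from the output of db.printReplicationInfo()). The function name oplogCoversDelay and the sample dates are hypothetical.

```javascript
// Hypothetical helper: does an oplog spanning firstEntry..lastEntry
// cover a delay of slaveDelaySecs seconds?
function oplogCoversDelay(firstEntry, lastEntry, slaveDelaySecs) {
  var windowSecs = (lastEntry.getTime() - firstEntry.getTime()) / 1000;
  return windowSecs > slaveDelaySecs;
}

// Illustrative values: an oplog that spans two hours (7200 seconds).
var firstEntry = new Date(Date.UTC(2012, 9, 2, 8, 0, 0));
var lastEntry = new Date(Date.UTC(2012, 9, 2, 10, 0, 0));

// A one hour delay fits inside the two hour window.
var oneHourFits = oplogCoversDelay(firstEntry, lastEntry, 3600);
// A three hour delay does not.
var threeHoursFit = oplogCoversDelay(firstEntry, lastEntry, 10800);
```

The oplog window shrinks under heavy write load, so a check like this is only meaningful for the load observed when the times were taken.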
Arbiters
Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary.
Arbiters exist solely to participate in elections (page 278).
Note: Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload, such as an application server or monitoring member.
Warning: Do not run arbiter processes on a system that is an active primary or secondary of its replica set.
Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:
• Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.
• Exchanges of replica set configuration data and of votes. These are not encrypted.
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Use MongoDB with SSL Connections (page 45) for more information.
As with all MongoDB components, run arbiters on secure networks.
To add an arbiter, see Adding an Arbiter (page 289).
Non-Voting Members
You may choose to change the number of votes that each member has in elections (page 278) for primary. In general, all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities (page 278) to control which members are more likely to become primary.
To disable a member’s ability to vote in elections, use the following command sequence in the mongo shell.
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)
This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.
Note: In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks, or the wrong members from becoming primary. Use Replica Set Priorities (page 278) to control which members are more likely to become primary.
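The member rules described in this section (at most 12 members, at most 7 voting members, and a priority of 0 for hidden and delayed members) can be checked before you call rs.reconfig(). The following is a minimal sketch in plain JavaScript. The helper name checkMembers is hypothetical, the sample document is made up, and the defaults assumed for absent fields (priority 1, votes 1) are stated in the comments.

```javascript
// Hypothetical helper: list violations of the member rules described in
// this section. An empty array means no violation was found.
function checkMembers(cfg) {
  var problems = [];
  var votingMembers = 0;
  for (var i = 0; i < cfg.members.length; i++) {
    var m = cfg.members[i];
    // Assumed defaults for absent fields: priority 1 and votes 1.
    var priority = (m.priority === undefined) ? 1 : m.priority;
    var votes = (m.votes === undefined) ? 1 : m.votes;
    if (votes > 0) {
      votingMembers++;
    }
    if (m.hidden === true && priority !== 0) {
      problems.push(m.host + ": hidden member must have priority 0");
    }
    if (m.slaveDelay > 0 && priority !== 0) {
      problems.push(m.host + ": delayed member must have priority 0");
    }
  }
  if (cfg.members.length > 12) {
    problems.push("more than 12 members");
  }
  if (votingMembers > 7) {
    problems.push("more than 7 voting members");
  }
  return problems;
}

// Illustrative configuration: the delayed member keeps the default priority.
var sampleCfg = {
  _id: "rs",
  members: [
    { _id: 0, host: "mongo1.example.net:27017" },
    { _id: 1, host: "mongo2.example.net:27017", priority: 0, hidden: true },
    { _id: 2, host: "mongo3.example.net:27017", slaveDelay: 3600 }
  ]
};
var found = checkMembers(sampleCfg);
```

Here found holds a single message about the delayed member. This check only covers the rules named above and does not replace the validation that MongoDB itself performs during reconfiguration.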
Chained Replication
New in version 2.0.
Chained replication occurs when a secondary member replicates from another secondary member instead of from the primary. This might be the case, for example, if a secondary selects its replication target based on ping time and the closest member is another secondary.
Chained replication can reduce load on the primary. But chained replication can also result in increased replication lag, depending on the topology of the network.
Beginning with version 2.2.4, you can use the chainingAllowed setting in Replica Set Configuration to disable chained replication for situations where chained replication is causing lag. For details, see Chained Replication (page 287).

29.2.2 Procedures
This section gives overview information on a number of replica set administration procedures. You can find documentation of additional procedures in the replica set tutorials (page 319) section.
Adding Members
Before adding a new member to an existing replica set, do one of the following to prepare the new member’s data directory:
• Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member.
If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.
• Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.
Ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog (page 280). If the time between the most recent operation in the copied data and the most recent operation on the replica set exceeds the length of the oplog on the existing members, then the new instance will have to perform an initial sync, which completely resynchronizes the data, as described in Resyncing a Member of a Replica Set.
Use db.printReplicationInfo() (page 842) to check the current state of replica set members with regards to the oplog.
For the procedure to add a member to a replica set, see Add Members to a Replica Set (page 324).
Removing Members
You may remove a member of a replica set at any time; however, for best results always shut down the mongod instance before removing it from a replica set.
Changed in version 2.2: Before 2.2, you had to shut down the mongod instance before removing it. While 2.2 removes this requirement, it remains good practice.
To remove a member, use the rs.remove() method in the mongo shell while connected to the current primary. Issue the db.isMaster() command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:
rs.remove( "mongo2.example.net:27017" )
rs.remove( "mongo3.example.net" )
This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.
You can re-add a removed member to a replica set at any time using the procedure for adding replica set members. Additionally, consider using the replica set reconfiguration procedure to change the host value to rename a member in a replica set directly.
Replacing a Member
Use this procedure to replace a member of a replica set when the hostname has changed. This procedure preserves all existing configuration for a member, except its hostname/location.
You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.
Use rs.reconfig() to change the value of the host field to reflect the new hostname or port number. rs.reconfig() will not change the value of _id.
cfg = rs.conf()
cfg.members[0].host = "mongo2.example.net:27019"
rs.reconfig(cfg)
Warning: Any replica set configuration change can trigger the current primary to step down, which forces an election. This causes the current shell session, and clients connected to this replica set, to produce an error even when the operation succeeds.
To change the value of the priority (page 983) in the replica set configuration, use the following sequence of commands in the mongo shell:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)
The first operation uses rs.conf() to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the priority value in the cfg document for the first three members configured in the members array. The final operation calls rs.reconfig() with the argument of cfg to initialize the new configuration.
Note: When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.
The _id rarely corresponds to the array index.
If a member has priority set to 0, it is ineligible to become primary and will not seek election. Hidden members, delayed members, and arbiters all have priority set to 0.
All members have a priority equal to 1 by default.
The value of priority can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members.
With the exception of members with a priority of 0, the absolute value of the priority value is irrelevant.
Replica sets will preferentially elect and maintain the primary status of the member with the highest priority setting.
Warning: Replica set reconfiguration can force the current primary to step down, leading to an election for primary in the replica set. Elections cause the current primary to close all open client connections.
Perform routine replica set reconfiguration during scheduled maintenance windows.
See also:
The Replica Reconfiguration Usage (page 985) example revolves around changing the priorities of the members of a replica set.
Adding an Arbiter
For a description of arbiters and their purpose in replica sets, see Arbiters (page 286).
To prevent tied elections, do not add an arbiter to a set if the set already has an odd number of voting members.
Because arbiters do not hold a copy of collection data, they have minimal resource requirements and do not require dedicated hardware.
1. Create a data directory for the arbiter. The mongod uses this directory for configuration information. It will not hold database collection data. The following example creates the /data/arb directory:
mkdir /data/arb
2. Start the arbiter, making sure to specify the replica set name and the data directory. Consider the following example:
mongod --port 30000 --dbpath /data/arb --replSet rs
3. In a mongo shell connected to the primary, add the arbiter to the replica set by issuing the rs.addArb() method, which uses the following syntax:
rs.addArb("<hostname><:port>")
For example, if the arbiter runs on m1.example.net:30000, you would issue this command:
rs.addArb("m1.example.net:30000")
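The advice to add an arbiter only when the set has an even number of voting members can be expressed as a small check on the configuration document. The following is a minimal sketch in plain JavaScript. The helper names countVotingMembers and arbiterWouldHelp are hypothetical, the sample documents are made up, and the sketch assumes that a member without a votes field has 1 vote.

```javascript
// Hypothetical helper: count members that hold at least one vote.
// Assumption: a member without a votes field has 1 vote.
function countVotingMembers(cfg) {
  var count = 0;
  for (var i = 0; i < cfg.members.length; i++) {
    var votes = cfg.members[i].votes;
    if (votes === undefined || votes > 0) {
      count++;
    }
  }
  return count;
}

// An arbiter only helps break ties when the number of voting members is even.
function arbiterWouldHelp(cfg) {
  return countVotingMembers(cfg) % 2 === 0;
}

// Illustrative configurations.
var fourVoting = {
  _id: "rs",
  members: [
    { _id: 0, host: "m1.example.net:27017" },
    { _id: 1, host: "m2.example.net:27017" },
    { _id: 2, host: "m3.example.net:27017" },
    { _id: 3, host: "m4.example.net:27017" }
  ]
};
var threeVoting = {
  _id: "rs",
  members: [
    { _id: 0, host: "m1.example.net:27017" },
    { _id: 1, host: "m2.example.net:27017" },
    { _id: 2, host: "m3.example.net:27017" },
    { _id: 3, host: "m4.example.net:27017", votes: 0 }
  ]
};
var addToFour = arbiterWouldHelp(fourVoting);   // four voting members: even
var addToThree = arbiterWouldHelp(threeVoting); // three voting members: odd
```

In the mongo shell, you could pass the document returned by rs.conf() to arbiterWouldHelp before deciding whether to call rs.addArb().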
Manually Configure a Secondary’s Sync Target
To override the default sync target selection logic, you may manually configure a secondary member’s sync target for pulling oplog entries temporarily. The following operations provide access to this functionality:
• replSetSyncFrom command, or
• rs.syncFrom() helper in the mongo shell
Only modify the default sync logic as needed, and always exercise caution. rs.syncFrom() will not affect an in-progress initial sync operation. To affect the sync target for the initial sync, run the rs.syncFrom() operation before initial sync.
If you run rs.syncFrom() during initial sync, MongoDB produces no error messages, but the sync target will not change until after the initial sync operation.
Note: replSetSyncFrom and rs.syncFrom() provide a temporary override of default behavior. If:
• the mongod instance restarts or
• the connection to the sync target closes;
then, the mongod instance will revert to the default sync logic and target.
Manage Chained Replication
New in version 2.2.4.
MongoDB enables chained replication by default. This procedure describes how to disable it and how to re-enable it.
To disable chained replication, set the chainingAllowed field in Replica Set Configuration to false.
You can use the following sequence of commands to set chainingAllowed to false:
1. Copy the configuration settings into the cfg object:
cfg = rs.config()
2. Take note of whether the current configuration settings contain the settings sub-document. If they do, skip this step.
Warning: To avoid data loss, skip this step if the configuration settings contain the settings sub-document.
If the current configuration settings do not contain the settings sub-document, create the sub-document by issuing the following command:
cfg.settings = { }
3. Issue the following sequence of commands to set chainingAllowed to false:
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
To re-enable chained replication, set chainingAllowed to true. You can use the following sequence of commands:
cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
Note: If chained replication is disabled, you still can use replSetSyncFrom to specify that a secondary replicates from another secondary. But that configuration will last only until the secondary recalculates which member to sync from.
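The caution about the settings sub-document in the procedure above can be folded into a single helper that creates the sub-document only when it is missing. The following is a minimal sketch in plain JavaScript. The helper name setChainingAllowed is hypothetical and the sample documents are made up for illustration.

```javascript
// Hypothetical helper: set chainingAllowed without discarding an
// existing settings sub-document.
function setChainingAllowed(cfg, allowed) {
  if (cfg.settings === undefined) {
    // Create the sub-document only when the configuration has none.
    cfg.settings = {};
  }
  cfg.settings.chainingAllowed = allowed;
  return cfg;
}

// Illustrative configuration that already has a settings sub-document.
var withSettings = {
  _id: "rs",
  members: [],
  settings: { getLastErrorDefaults: { w: 2 } }
};
// Illustrative configuration without a settings sub-document.
var withoutSettings = { _id: "rs", members: [] };

setChainingAllowed(withSettings, false);
setChainingAllowed(withoutSettings, false);
```

After both calls, the existing getLastErrorDefaults value in withSettings is still present. In the mongo shell, you would pass the document returned by rs.config() and then apply the result with rs.reconfig().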
Changing Oplog Size
The following is an overview of the procedure for changing the size of the oplog. For a detailed procedure, see Change the Size of the Oplog (page 332).
1. Shut down the current primary instance in the replica set and then restart it on a different port and in “standalone” mode.
2. Create a backup of the old (current) oplog. This is optional.
3. Save the last entry from the old oplog.
4. Drop the old oplog.
5. Create a new oplog of a different size.
6. Insert the previously saved last entry from the old oplog into the new oplog.
7. Restart the server as a member of the replica set on its usual port.
8. Apply this procedure to any other member of the replica set that could become primary.
Resyncing a Member of a Replica Set
When a secondary’s replication process falls so far behind that the primary overwrites oplog entries that the secondary has not yet replicated, that secondary cannot catch up and becomes “stale.” When that occurs, you must completely resynchronize the member by removing its data and performing an initial sync.
To do so, use one of the following approaches:
• Restart the mongod with an empty data directory and let MongoDB’s normal initial syncing feature restore the data. This is the simpler option, but may take longer to replace the data.
See Automatically Resync a Stale Member.
• Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
See Resync by Copying All Datafiles from Another Member.
Automatically Resync a Stale Member
This procedure relies on MongoDB’s regular process for initial sync. This will restore the data on the stale member to reflect the current state of the set. For an overview of MongoDB’s initial sync process, see the Syncing section.
To resync the stale member:
1. Stop the stale member’s mongod instance. On Linux systems you can use mongod --shutdown. Set --dbpath to the member’s data directory, as in the following:
mongod --dbpath /data/db/ --shutdown
2. Delete all data and sub-directories from the member’s data directory. By removing the contents of the dbpath (page 939), you cause MongoDB to perform a complete resync. Consider making a backup first.
3. Restart the mongod instance on the member. For example:
mongod --dbpath /data/db/ --replSet rsProduction
At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and the network connection between members of the replica set.
Initial sync operations can impact the other members of the set and create additional traffic to the primary, and can only occur if another member of the set is accessible and up to date.
Resync by Copying All Datafiles from Another Member
This approach uses a copy of the data files from an existing member of the replica set, or a backup of the data files, to “seed” the stale member.
The copy or backup of the data files must be sufficiently recent to allow the new member to catch up with the oplog; otherwise the member would need to perform an initial sync.
Note: In most cases you cannot copy data files from a running mongod instance to another, because the data files will change during the file copy operation. Consider the Backup Strategies for MongoDB Systems documentation for several methods that you can use to capture a consistent snapshot of a running mongod instance.
After you have copied the data files from the “seed” source, start the mongod instance and allow it to apply all operations from the oplog until it reflects the current state of the replica set.

29.2.3 Security Considerations for Replica Sets
In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment’s firewall and network routing to ensure that traffic only from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs).
Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication and specify a shared key file that serves as a shared password.
New in version 1.8: Added support for authentication in replica set deployments.
Changed in version 1.9.1: Added support for authentication in sharded replica set deployments.
To enable authentication, add the following option to your configuration file:
keyFile = /srv/mongodb/keyfile
Note: You may choose to set these run-time configuration options using the --keyFile (or mongos --keyFile) options on the command line.
Setting keyFile (page 938) enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or “world” permissions on UNIX systems. Use the following command to use the OpenSSL package to generate “random” content for use in a key file:
openssl rand -base64 753
Note: Key file permissions are not checked on Windows systems.
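The value 753 in the openssl command can be related to the size limit with a little arithmetic: base64 encodes every 3 bytes as 4 characters, so 753 random bytes become 1004 characters, which stays under one kilobyte even with a few line breaks added. The following is a minimal sketch of that calculation and of the two content rules stated above. The function names are hypothetical, and the treatment of line breaks (ignored for the character check, counted toward the size) is an assumption of this sketch rather than documented MongoDB behavior.

```javascript
// Number of base64 characters needed to encode nBytes bytes, with padding.
function base64Length(nBytes) {
  return Math.ceil(nBytes / 3) * 4;
}

// Hypothetical check of the content rules: under one kilobyte and only
// base64 characters. Assumption: line breaks are ignored for the
// character check but still count toward the size.
function keyFileContentOk(content) {
  var withoutLineBreaks = content.replace(/\r?\n/g, "");
  var onlyBase64 = /^[A-Za-z0-9+\/=]+$/.test(withoutLineBreaks);
  return content.length < 1024 && onlyBase64;
}

var encodedChars = base64Length(753);
var shortKeyOk = keyFileContentOk("c2VjcmV0\nS2V5\n");
var badCharsOk = keyFileContentOk("bad key!");
var tooLongOk = keyFileContentOk(new Array(1025).join("A"));
```

Here encodedChars is 1004, shortKeyOk is true, and the other two results are false. This sketch does not check file permissions, which the text above also requires on UNIX systems.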

29.2.4 Troubleshooting Replica Sets
This section describes common strategies for troubleshooting replica sets.
See also:
Monitoring Database Systems (page 53).
Check Replica Set Status
To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see Replica Set Status Reference.
Note: The rs.status() method is a wrapper that runs the replSetGetStatus database command.
Check the Replication Lag
Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments.
Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.
To check the current length of replication lag:
• In a mongo shell connected to the primary, call the db.printSlaveReplicationInfo() method.
The returned document displays the syncedTo value for each member, which shows you when each member last read from the oplog, as shown in the following example:
source:   m1.example.net:30001
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)
source:   m2.example.net:30002
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
        = 7475 secs ago (2.08hrs)
• Monitor the rate of replication by watching the oplog time in the “replica” graph in the MongoDB Monitoring Service. For more information see the documentation for MMS.
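As a complement to the two methods above, lag can also be estimated from a status document of the kind returned by rs.status(), by comparing the optimeDate of each secondary with that of the primary. Note that this is a different measure from the "secs ago" figure shown earlier, which is relative to the current time. The following is a minimal sketch in plain JavaScript. The helper name replicationLag is hypothetical, and the sample status document is made up and contains only the fields the sketch reads (name, stateStr, and optimeDate).

```javascript
// Hypothetical helper: seconds that each secondary is behind the primary,
// computed from the optimeDate fields of a status document.
function replicationLag(status) {
  var primary = null;
  for (var i = 0; i < status.members.length; i++) {
    if (status.members[i].stateStr === "PRIMARY") {
      primary = status.members[i];
    }
  }
  if (primary === null) {
    throw new Error("no primary in status document");
  }
  var lag = {};
  for (var j = 0; j < status.members.length; j++) {
    var m = status.members[j];
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

// Illustrative status document with only the fields used above.
var sampleStatus = {
  set: "rs",
  members: [
    { name: "m0.example.net:30000", stateStr: "PRIMARY",
      optimeDate: new Date(Date.UTC(2012, 9, 2, 17, 38, 15)) },
    { name: "m1.example.net:30001", stateStr: "SECONDARY",
      optimeDate: new Date(Date.UTC(2012, 9, 2, 15, 33, 40)) }
  ]
};
var lagBySecondary = replicationLag(sampleStatus);
```

With the sample values, the secondary is 2 hours, 4 minutes, and 35 seconds behind the primary, so lagBySecondary["m1.example.net:30001"] is 7475. In the mongo shell, you would pass the document returned by rs.status().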
Possible causes of replication lag include:
• Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.
Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.
• Disk Throughput
If the file system and disk device on the secondary are unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping up. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon’s EBS system).
Use system-level tools to assess disk status, including iostat or vmstat.
• Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in Write Concern. This prevents write operations from returning if replication cannot keep up with the write load.
Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.
• Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.
To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or another interval of writes to give secondaries an opportunity to catch up with the primary.
For more information see:
– Write Concern
– Oplog
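The interval-based acknowledgment described under Appropriate Write Concern can be sketched as follows. This is an illustration only: bulkLoad is a hypothetical name, and insertFn stands in for whatever driver call performs the write, so no real driver API is implied. The write concern documents { w: 0 } (unacknowledged) and { w: 1, j: true } (acknowledged and journaled) are the only MongoDB-specific values used.

```javascript
// Hypothetical bulk loader: sends most writes unacknowledged, but requires an
// acknowledged, journaled write at every `interval`-th document and at the end.
// insertFn(doc, writeConcern) is an assumed callback supplied by the caller.
function bulkLoad(docs, insertFn, interval) {
  let checkpoints = 0;
  docs.forEach((doc, index) => {
    const isCheckpoint =
      (index + 1) % interval === 0 || index === docs.length - 1;
    const writeConcern = isCheckpoint ? { w: 1, j: true } : { w: 0 };
    insertFn(doc, writeConcern);
    if (isCheckpoint) {
      checkpoints += 1;
    }
  });
  return checkpoints; // number of writes that waited for acknowledgment
}
```

A smaller interval slows the load but gives secondaries more frequent opportunities to catch up.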
 Test Connections Between All Members
All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations can prevent normal and required connectivity, which can block replication.
Consider the following example of a bidirectional test of networking:
Given a replica set with three members running on three separate hosts:
1. Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:
mongo --host m2.example.net --port 27017
mongo --host m3.example.net --port 27017
2. Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net:
mongo --host m1.example.net --port 27017
mongo --host m3.example.net --port 27017
You have now tested the connection between m2.example.net and m1.example.net in both directions.
3. Test the connection from m3.example.net to the other two hosts with the following operation set from m3.example.net:
mongo --host m1.example.net --port 27017
mongo --host m2.example.net --port 27017
If any connection fails in any direction, check your networking and firewall configuration and reconfigure your environment to allow these connections.
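For sets with more members, the list of tests grows quickly, since every ordered pair of hosts needs one check. The sketch below generates that list. The function name connectionTests is hypothetical, and the code only builds the mongo command strings shown above without running them.

```javascript
// Hypothetical helper: lists every directed connection test for a set.
// Each entry says which host to run the command on and the command itself.
function connectionTests(hosts, port) {
  const tests = [];
  for (const from of hosts) {
    for (const to of hosts) {
      if (from === to) {
        continue; // a member does not need to test a connection to itself
      }
      tests.push({
        runOn: from,
        command: `mongo --host ${to} --port ${port}`,
      });
    }
  }
  return tests;
}

const hosts = ["m1.example.net", "m2.example.net", "m3.example.net"];
for (const test of connectionTests(hosts, 27017)) {
  console.log(`${test.runOn}: ${test.command}`);
}
```

A set of n members requires n × (n − 1) tests, which is six for the three-member example.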
 Check the Size of the Oplog
A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.
To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the db.printReplicationInfo()(page 842) method.
The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold 24 hours of operations; however, many users prefer to have 72 hours or even a week’s worth of operations.
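The comparison between the oplog window and the expected downtime is simple arithmetic, sketched below in plain JavaScript. Both function names are hypothetical. The inputs are assumed to be Date objects taken from the first and last event times that db.printReplicationInfo() reports.

```javascript
// Hypothetical helpers for reasoning about the oplog window.
// Returns the span between the first and last oplog events, in hours.
function oplogWindowHours(firstEventTime, lastEventTime) {
  const milliseconds = lastEventTime - firstEventTime;
  return milliseconds / (1000 * 60 * 60);
}

// True when the oplog spans at least the longest downtime you expect.
function coversDowntime(windowHours, expectedDowntimeHours) {
  return windowHours >= expectedDowntimeHours;
}
```

For example, a window of 94400 seconds is about 26.22 hours, which satisfies a 24-hour target but not a 72-hour one.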
For more information on how oplog size affects operations, see:
• The Oplog(page 280) topic in the Replica Set Fundamental Concepts(page 277) document.
• The Delayed Members(page 285) topic in this document.
• The Check the Replication Lag(page 293) topic in this document.
Note: You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.
To change oplog size, see Changing Oplog Size(page 291) in this document or see the Change the Size of the Oplog(page 332) tutorial.
 Failover and Recovery
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
While failover is automatic, replica set administrators should still understand exactly how this process works. This section describes failover in detail.
In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not fail over according to expectations, consider the following operational errors:
• No remaining member is able to form a majority. This can happen as a result of network partitions that render some members inaccessible. Design your deployment to ensure that a majority of set members can elect a primary in the same facility as core application systems.
• No member is eligible to become primary. Members must have a priority setting greater than 0, have a state that is less than ten seconds behind the last operation to the replica set, and generally be more up to date than the voting members.
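The eligibility conditions in the second item can be expressed as a filter. This is a simplified sketch, not the server's election logic: eligibleMembers is a hypothetical name, the member objects are assumed to carry a priority and an optimeDate, and the check that a candidate is generally more up to date than the voting members is left out.

```javascript
// Hypothetical filter covering two of the conditions above:
// priority greater than 0, and less than ten seconds behind the last operation.
// lastOpTime and each member's optimeDate are assumed to be Date objects.
function eligibleMembers(members, lastOpTime) {
  return members.filter((member) => {
    const secondsBehind = (lastOpTime - member.optimeDate) / 1000;
    return member.priority > 0 && secondsBehind < 10;
  });
}
```

If this filter returns an empty list for the reachable members, the set cannot elect a primary even when a majority is available.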
In many senses, rollbacks represent a graceful recovery from an impossible failover and recovery situation.
Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a “rollback.” Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump(page 912) and apply manually using mongorestore(page 909).
You can prevent rollbacks using a replica acknowledged write concern. These write operations require not only the primary to acknowledge the write operation, but sometimes even the majority of the set to confirm the write operation before returning.
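To make the rollback description concrete, the following sketch works on a deliberately simplified model and does not reproduce how mongod implements rollback. Each oplog is assumed to be an array of entries ordered from oldest to newest, entries are matched only by their h field, and the name findRolledBackOps is hypothetical.

```javascript
// Simplified model: returns the operations a former primary would roll back,
// that is, its newest entries that never reached the rest of the set.
function findRolledBackOps(formerPrimaryOplog, currentPrimaryOplog) {
  const replicated = new Set(currentPrimaryOplog.map((entry) => entry.h));
  // Walk back from the newest entry until one is found that the set also has.
  let index = formerPrimaryOplog.length - 1;
  while (index >= 0 && !replicated.has(formerPrimaryOplog[index].h)) {
    index -= 1;
  }
  // Everything after the common point was never replicated.
  return formerPrimaryOplog.slice(index + 1);
}
```

Writes that a majority of the set confirmed before returning are present in the new primary's oplog, so in this model they can never fall after the common point.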
See also:
The Elections(page 278) section in the Replica Set Fundamental Concepts(page 277) document, and the Election Internals(page 311) section in the Replica Set Internals and Behaviors(page 309) document.
 Oplog Entry Timestamp Error
Consider the following error in mongod output and logs:
replSet error fatal couldn’t query the local local.oplog.rs collection. Terminating mongod after 30 <timestamp> [rsStart] bad replSet oplog entry?
Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp.
Check the type of the ts value using the following two queries against the oplog collection:
db = db.getSiblingDB("local")
db.oplog.rs.find().sort({ $natural : -1 }).limit(1)
db.oplog.rs.find({ ts : { $type : 17 } }).sort({ $natural : -1 }).limit(1)
The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type(page 707) operator allows you to select BSON type 17, which is the Timestamp data type.
If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.
If the first query returns this as the last oplog entry:
{
    "ts" : { t : 1347982456000, i : 1 },
    "h" : NumberLong("8191276672478122996"),
    "op" : "n",
    "ns" : "",
    "o" : { "msg" : "Reconfig set", "version" : 4 }
}
And the second query returns this as the last entry where ts has the Timestamp type:
{
    "ts" : Timestamp(1347982454000, 1),
    "h" : NumberLong("6188469075153256465"),
    "op" : "n",
    "ns" : "",
    "o" : { "msg" : "Reconfig set", "version" : 3 }
}
Then the value for the ts field in the last oplog entry is of the wrong data type.
To set the proper type for this value and resolve this issue, use an update operation that resembles the following:
db.oplog.rs.update({ ts : { t : 1347982456000, i : 1 } }, { $set : { ts : new Timestamp(1347982456000, 1) } })
Modify the timestamp values as needed based on your oplog entry. This operation may take some time to complete because the update must scan and pull the entire oplog into memory.
 Duplicate Key Error on local.slaves
The duplicate key on local.slaves error occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following:
exception 11000 E11000 duplicate key error index : local.slaves.$_id_ dup key : { : ObjectId(’ <object
This is a benign error and does not affect replication operations on the secondary or slave.
To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:
use local
db.slaves.drop()
The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.
 Elections and Network Partitions
Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.
That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read-only. To avoid this situation, attempt to place a majority of instances in one data center with a minority of instances in a secondary facility.
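The majority rule can be checked for a planned deployment before a partition ever occurs. The sketch below is an illustration with a hypothetical function name, sidesWithMajority. It assumes every member counted has one vote and that the input lists how many voting members sit on each side of a possible partition.

```javascript
// Hypothetical check: for each side of a partition, reports whether that side
// alone holds a strict majority of the set's votes.
function sidesWithMajority(partitionSizes) {
  const total = partitionSizes.reduce((sum, size) => sum + size, 0);
  return partitionSizes.map((size) => size > total / 2);
}

// Three members in the main data center, two in a secondary facility.
console.log(sidesWithMajority([3, 2]));
// An even split across two facilities.
console.log(sidesWithMajority([2, 2]));
```

With a three and two split, the larger side can still elect a primary. With an even split, neither side can, which is the read-only situation described above.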
See Election Internals(page 311).
END 13/08/10 0:49:22




