官方文档译文 Release 2.2.5:Part VII Replication 29.2 Replica Set Operation and Management

29.2 Replica Set Operation and Management
Replica sets automate most administrative tasks associated with database replication. Nevertheless, several operations related to deployment and systems management still require administrator intervention. This document provides an overview of those tasks, in addition to a collection of troubleshooting suggestions for administrators of replica sets.
See also:
• rs.status() and db.isMaster()(page 841)
• Replica Set Reconfiguration Process(page 985)
• rs.conf()(page 849) and rs.reconfig()(page 850)
• Replica Set Configuration
副本集自动化管理大多数与数据库复制相关联的任务。然而,有关部署和系统管理的几个操作仍需要管理员干预。本文档提供这些任务的概述,并为副本集管理员提供一些故障排除建议。
• rs.status()和 db.isMaster()
• 副本集重构过程
• rs.conf() 和 rs.reconfig()
• 副本集配置
 
The following tutorials provide task-oriented instructions for specific administrative tasks related to replica set operation.
• Deploy a Replica Set 部署副本集
• Convert a Standalone to a Replica Set(page 322) 转换单机实例到副本集
• Add Members to a Replica Set(page 324) 向副本集添加节点
• Deploy a Geographically Distributed Replica Set(page 326) 部署一个地理上分布的副本集
• Change the Size of the Oplog(page 332) 变更Oplog的大小
• Force a Member to Become Primary(page 334) 强制指定一个节点为主节点
• Change Hostnames in a Replica Set(page 336) 变更副本集中节点的主机名
• Convert a Secondary to an Arbiter(page 340) 转换从节点为仲裁者节点
• Reconfigure a Replica Set with Unavailable Members(page 342) 重构包含隐藏节点的副本集
• Recover MongoDB Data following Unexpected Shutdown(page 586) 在意外宕机后恢复MongoDB数据
  

29.2.1 Member Configurations 节点配置
All replica sets have a single primary and one or more secondaries. Replica sets allow you to configure secondary members in a variety of ways. This section describes these configurations.
所有副本集都包含单个主节点和一个或多个从节点。副本集允许你使用多种方式配置从节点。下面将描述这些配置。
 
---------------------------------------------------------------------
Note: A replica set can have up to 12 members, but only 7 members can have votes. For configuration information regarding non-voting members, see Non-Voting Members(page 286).
注意:一个副本集最多能有12个节点,但只有7个节点可以参加投票。关于无投票权节点的配置信息,参见Non-Voting Members(page 286)。
---------------------------------------------------------------------
 
Warning: The rs.reconfig()(page 850) shell method can force the current primary to step down, which causes an election(page 278). When the primary steps down, the mongod closes all client connections. While this typically takes 10-20 seconds, attempt to make these changes during scheduled maintenance periods. To successfully reconfigure a replica set, a majority of the members must be accessible.
提醒:rs.reconfig() shell方法能强制当前主节点下台,这将导致一次选举。当主节点下台时,mongod会关闭所有客户端连接。虽然这通常只需要10-20秒,但仍应尽量在计划维护期间进行这些变更。要成功重新配置一个副本集,大多数节点必须可以访问。
See also:
The Elections(page 278) section in the Replica Set Fundamental Concepts (page 277) document, and the Election Internals (page 311) section in the Replica Set Internals and Behaviors(page 309) document.
 
Secondary-only Members
The secondary-only configuration prevents a secondary member in a replica set from ever becoming a primary in a failover. You can set secondary-only mode for any member of the set except the current primary.
secondary-only 配置阻止副本集中的从节点在故障转移时成为主节点。除当前主节点外,你可以将副本集中的任意节点设置为secondary-only模式。
 
For example, you may want to configure all members of a replica set located outside of the main data centers as secondary-only to prevent these members from ever becoming primary.
例如,你可能想要将副本集中位于主数据中心之外的所有节点配置为secondary-only,以防止这些节点成为主节点。
 
To configure a member as secondary-only, set its priority(page 983) value to 0. Any member with a priority (page 983) equal to 0 will never seek election(page 278) and cannot become primary in any situation. For more information on priority levels, see Member Priority(page 278).
配置一个节点为secondary-only,设置它的优先级为0。优先级为0的节点从不谋求被选举且任何情况下都不能成为主节点。
 
---------------------------------------------------------------------
Note: When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id(page 982) field in each document in the members array.
The _id(page 982) rarely corresponds to the array index.
注意:当更新复制配置对象时,使用数组下标值来定位副本集中的各个成员。数组下标从0开始。不要将这个下标值与数组中每个文档的_id域的值混淆。
_id很少与数组下标对应。
---------------------------------------------------------------------
 
As an example of modifying member priorities, assume a four-member replica set. Use the
following sequence of operations in the mongo shell to modify member priorities:
作为修改节点优先级的例子,假定一个有4个节点的副本集。在mongo shell中使用以下操作序列修改节点优先级:
cfg = rs.conf()
cfg.members[0].priority = 2
cfg.members[1].priority = 1
cfg.members[2].priority = 0.5
cfg.members[3].priority = 0
rs.reconfig(cfg)
 
This reconfigures the set, with the following priority settings:
• Member 0 to a priority of 2 so that it becomes primary, under most circumstances.
• Member 1 to a priority of 1, which is the default value. Member 1 becomes primary if no member with a higher priority is eligible.
• Member 2 to a priority of 0.5, which makes it less likely to become primary than other members but doesn’t prohibit the possibility.
• Member 3 to a priority of 0. Member 3 cannot become the primary member under any circumstances.
这将使用下列优先级设置重新配置副本集:
• 节点0优先级为2,所以它在大多数情况下成为主节点。
• 节点1优先级为1,这是默认值。节点1在没有更高优先级节点符合资格时成为主节点。
• 节点2优先级为0.5,使它比其他节点有更小的可能成为主节点,但不能排除这种可能性。
• 节点3优先级为0,节点3在任何情况下不能成为主节点。
 
---------------------------------------------------------------------
Note: If your replica set has an even number of members, add an arbiter(page 286) to ensure that members can quickly obtain a majority of votes in an election for primary.
注意:如果副本集有偶数个节点,增加一个仲裁者节点,以确保在主节点选举中节点能快速获得多数选票。
---------------------------------------------------------------------
Note: MongoDB does not permit the current primary to have a priority(page 983) of 0. If you want to prevent the current primary from becoming primary, first use rs.stepDown() to step down the current primary, and then reconfigure the replica set(page 985) with rs.conf()(page 849) and rs.reconfig()(page 850).
注意:MongoDB不允许当前主节点的优先级为0。如果你不想让当前主节点再成为主节点,先使用rs.stepDown()使当前主节点下台,然后使用rs.conf()和rs.reconfig()重新配置副本集。
---------------------------------------------------------------------
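For example, a minimal sketch of that sequence in the mongo shell; the member index 1 below is illustrative and should be adjusted to match your members array:
rs.stepDown(120)                  // ask the current primary to step down for up to 120 seconds
cfg = rs.conf()                   // reconnect to the new primary before running the following
cfg.members[1].priority = 0       // index 1 is assumed to be the former primary
rs.reconfig(cfg)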
See also:
priority(page 983) and Replica Set Reconfiguration(page 985).
 
Hidden Members 隐藏节点
Hidden members are part of a replica set but cannot become primary and are invisible to client applications. However, hidden members do vote in elections(page 278).
隐藏节点是副本集的一部分,但不能成为主节点,而且对客户端不可见。然而,隐藏节点参与选举投票。
 
Hidden members are ideal for instances that will have significantly different usage patterns than the other members and require separation from normal traffic. Typically, hidden members provide reporting, dedicated backups, and dedicated read-only testing and integration support.
Hidden members have priority(page 983) set to 0 and have hidden(page 982) set to true.
对于使用模式与其他节点明显不同、需要与正常流量隔离的实例,隐藏节点是理想的选择。通常情况下,隐藏节点用于提供报表、专用备份以及专用的只读测试与集成支持。隐藏节点的优先级设置为0,并且hidden设置为true。
 
To configure a hidden member, use the following sequence of operations in the mongo shell:
要配置隐藏节点,在mongo shell中使用以下操作序列:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].hidden = true
rs.reconfig(cfg)
 
After re-configuring the set, the first member of the set in the members array will have a priority of 0 so
that it cannot become primary. The other members in the set will not advertise the hidden member in the isMaster (page 762) or db.isMaster()(page 841) output.
重配置副本集后,副本集节点数组的第一个节点的优先级将为0,所以它不能成为主节点。副本集的其他节点将不会在isMaster和db.isMaster()输出中显示出隐藏节点。
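For instance, running db.isMaster() against another member might then return output like the following abridged, illustrative document (set name and hostnames are hypothetical), in which the hidden member's hostname is absent from the hosts array:
{
    "setName" : "rs0",
    "ismaster" : true,
    "hosts" : [
        "m1.example.net:27017",
        "m2.example.net:27017"
    ],
    "primary" : "m1.example.net:27017",
    "ok" : 1
}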
 
---------------------------------------------------------------------
Note: You must send the rs.reconfig()(page 850) command to a set member that can become primary. In the above example, if you issue the rs.reconfig()(page 850) operation to a member with a priority(page 983) of 0 the operation will fail.
注意:你必须将rs.reconfig()命令发送到副本集中能够成为主节点的节点。在上面的例子中,如果你向一个优先级为0的节点执行rs.reconfig()操作,该操作将失败。
---------------------------------------------------------------------
 
---------------------------------------------------------------------
Note:Changed in version 2.0.
For sharded clusters running with replica sets before 2.0, if you reconfigured a member as hidden, you had to restart
mongos to prevent queries from reaching the hidden member.
注意:在2.0版本中变更
对于2.0版本之前运行副本集的分片集群,如果将某个节点重新配置为隐藏节点,你必须重启mongos,以防止查询到达该隐藏节点。
---------------------------------------------------------------------
 
See also:
Replica Set Read Preference(page 303) and Replica Set Reconfiguration(page 985).
 
Delayed Members 延迟节点
Delayed members copy and apply operations from the primary's oplog with a specified delay. If a member has a delay of one hour, then the latest entry in this member's oplog will not be more recent than one hour old, and the state of data for the member will reflect the state of the set an hour earlier.
延迟节点以指定的延迟从主节点的oplog复制并应用操作。如果一个节点设置了1小时的延迟,那么该节点oplog中最新条目的时间不会比1小时前更新,该节点的数据状态反映的是副本集1小时前的状态。
 
Example: If the current time is 09:52 and the secondary is delayed by an hour, no operation will be more recent than 08:52.
例如:如果现在时间为09:52,从节点的延迟时间为1小时,那么该节点上不会有比08:52更新的操作。
 
Delayed members may help recover from various kinds of human error. Such errors may include inadvertently deleted databases or botched application upgrades. Consider the following factors when determining the amount of slave delay to apply:
• Ensure that the length of the delay is equal to or greater than your maintenance windows.
• The size of the oplog is sufficient to capture more than the number of operations that typically occur in that period of time. For more information on oplog size, see the Oplog(page 280) topic in the Replica Set Fundamental Concepts(page 277) document.
延迟节点有助于从各种人为错误中恢复。这些错误可能包括意外删除数据库或拙劣的应用程序升级。在决定应用多长的延迟时,考虑以下因素:
• 确保延迟的长度等于或大于你的维护时间窗。
• oplog的大小要足以容纳该时间段内通常发生的操作数量。
 
Delayed members must have a priority set to 0 to prevent them from becoming primary in their replica sets. Also these members should be hidden(page 284) to prevent your application from seeing or querying this member.
延迟节点必须将优先级设置为0,以防止它成为副本集中的主节点。同时这些节点也应设置为隐藏,以防止应用程序看到或查询该节点。
 
To configure a replica set member with a one hour delay, use the following sequence of operations in the mongo shell:
要将副本集节点配置为1小时延迟,在mongo shell中使用以下操作序列:
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[0].slaveDelay = 3600
rs.reconfig(cfg)
 
After the replica set reconfigures, the first member of the set in the members array will have a priority of 0
and cannot become primary. The slaveDelay(page 983) value delays both replication and the member’s oplog by 3600 seconds (1 hour). Setting slaveDelay(page 983) to a non-zero value also sets hidden(page 982) to true for this replica set so that it does not receive application queries in normal operations.
副本集重新配置后,节点数组中的第一个节点的优先级为0,不能成为主节点。slaveDelay值将复制和该节点的oplog都延迟3600秒(1小时)。将slaveDelay设置为非零值同时也会将该节点的hidden设置为true,使它在正常操作中不接收应用程序的查询。
 
Warning: The length of the secondary slaveDelay(page 983) must fit within the window of the oplog. If the oplog is shorter than the slaveDelay(page 983) window, the delayed member cannot successfully replicate operations.
提醒:slaveDelay 的长度必须适合oplog的窗口。如果oplog比slaveDelay窗口小,延迟节点就不能成功地复制操作。
 
See also:
slaveDelay(page 983), Replica Set Reconfiguration(page 985), Oplog(page 280), Changing Oplog Size in this document, and the Change the Size of the Oplog(page 332) tutorial.
 
Arbiters 仲裁者
Arbiters are special mongod instances that do not hold a copy of the data and thus cannot become primary.
Arbiters exist solely to participate in elections(page 278).
仲裁者是特殊的mongod实例,它不保持数据的副本,所以不能成为主节点。
仲裁者的存在仅仅是为了参与选举。
---------------------------------------------------------------------
Note: Because of their minimal system requirements, you may safely deploy an arbiter on a system with another workload, such as an application server or monitoring member.
注意:由于仲裁者对系统的要求极低,你可以安全地将仲裁者部署在承担其他工作负载的系统上,例如应用服务器或监控节点。
---------------------------------------------------------------------
 
Warning:Do not run arbiter processes on a system that is an active primary or secondary of its replica set.
提醒:不要在运行其所属副本集的活动主节点或从节点的系统上运行仲裁者进程。
 
Arbiters never receive the contents of any collection but do have the following interactions with the rest of the replica set:
• Credential exchanges that authenticate the arbiter with the replica set. All MongoDB processes within a replica set use keyfiles. These exchanges are encrypted.
MongoDB only transmits the authentication credentials in a cryptographically secure exchange, and encrypts no other exchange.
• Exchanges of replica set configuration data and of votes. These are not encrypted.
仲裁者永远不会接收任何集合的内容,但与副本集的其他节点有以下交互:
• 身份凭据的交换,用于向副本集验证仲裁者的身份。副本集中的所有MongoDB进程都使用密钥文件。这些交换是加密的。
MongoDB只在加密安全的交换中传输身份验证凭据,不对其他交换进行加密。
• 副本集配置数据和选票的交换。这些交换不加密。
 
If your MongoDB deployment uses SSL, then all communications between arbiters and the other members of the replica set are secure. See the documentation for Use MongoDB with SSL Connections(page 45) for more information.
As with all MongoDB components, run arbiters on secure networks.
如果你的MongoDB部署使用SSL,那么仲裁者和副本集其他节点之间的所有通信都是安全的。与所有MongoDB组件一样,在安全的网络中运行仲裁者。
 
To add an arbiter, see Adding an Arbiter(page 289).
 
Non-Voting Members 无投票权节点
You may choose to change the number of votes that each member has in elections(page 278) for primary. In general,all members should have only 1 vote to prevent intermittent ties, deadlock, or the wrong members from becoming primary. Use replica set priorities(page 278) to control which members are more likely to become primary.
你可以选择改变每个节点在主节点选举中拥有的票数。一般来说,所有节点都应该只有1个选票,以防止间歇性的平票、死锁或错误的节点成为主节点。使用副本集优先级控制哪些节点更有可能成为主节点。
 
To disable a member’s ability to vote in elections, use the following command sequence in the mongo shell.
要禁用节点在选举中投票的能力,在mongo shell中使用以下命令序列:
cfg = rs.conf()
cfg.members[3].votes = 0
cfg.members[4].votes = 0
cfg.members[5].votes = 0
rs.reconfig(cfg)
 
This sequence gives 0 votes to the fourth, fifth, and sixth members of the set according to the order of the members array in the output of rs.conf(). This setting allows the set to elect these members as primary but does not allow them to vote in elections. If you have three non-voting members, you can add three additional voting members to your set. Place voting members so that your designated primary or primaries can reach a majority of votes in the event of a network partition.
以上序列根据rs.conf()输出中members数组的顺序,将副本集中第4、5、6个节点的选票数设置为0。这些设置允许副本集选举这些节点为主节点,但不允许它们在选举中投票。如果你有3个无投票权节点,你可以在副本集中另外增加3个可投票节点。合理放置可投票节点,使你指定的主节点在发生网络分区时仍能获得多数选票。
 
---------------------------------------------------------------------
Note:In general and when possible, all members should have only 1 vote. This prevents intermittent ties, deadlocks,or the wrong members from becoming primary. Use Replica Set Priorities(page 278) to control which members are more likely to become primary.
注意:通常情况下,在可能的前提下,所有节点应该有且只有1个选票。这可以防止间歇性的平票、死锁或错误的节点成为主节点。使用副本集优先级控制哪些节点更有可能成为主节点。
---------------------------------------------------------------------
 
 Chained Replication 链式复制
New in version 2.0.
2.0版本更新
Chained replication occurs when a secondary member replicates from another secondary member instead of from the primary. This might be the case, for example, if a secondary selects its replication target based on ping time and the closest member is another secondary.
链式复制发生在一个从节点从另一个从节点而不是从主节点复制数据的时候。例如,当一个从节点基于ping时间选择其复制目标,而距离最近的节点恰好是另一个从节点时,就会出现这种情况。
 
Chained replication can reduce load on the primary. But chained replication can also result in increased replication lag, depending on the topology of the network.
链式复制可以减少主节点的负载。但链式复制也可能导致复制延迟增加,这取决于网络拓扑。
 
Beginning with version 2.2.4, you can use the chainingAllowed setting in Replica Set Configuration to disable chained replication for situations where chained replication is causing lag. For details, see Chained Replication(page 287).
从2.2.4版本开始,在链式复制导致延迟的情况下,你可以使用副本集配置中的chainingAllowed设置来禁用链式复制。详情参见Chained Replication(page 287)。
 

29.2.2 Procedures 程序
This section gives overview information on a number of replica set administration procedures. You can find documentation of additional procedures in the replica set tutorials(page 319) section.
 
 Adding Members 增加节点
Before adding a new member to an existing replica set, do one of the following to prepare the new member’s data directory:
• Make sure the new member’s data directory does not contain data. The new member will copy the data from an existing member.
If the new member is in a recovering state, it must exit and become a secondary before MongoDB can copy all data as part of the replication process. This process takes time but does not require administrator intervention.
• Manually copy the data directory from an existing member. The new member becomes a secondary member and will catch up to the current state of the replica set after a short interval. Copying the data over manually shortens the amount of time for the new member to become current.
Ensure that you can copy the data directory to the new member and begin replication within the window allowed by the oplog(page 280). If the difference in the amount of time between the most recent operation and the most recent operation to the database exceeds the length of the oplog on the existing members, then the new instance will have to perform an initial sync, which completely resynchronizes the data, as described in Resyncing a Member of a Replica Set.
Use db.printReplicationInfo()(page 842) to check the current state of replica set members with regards to the oplog.
 
向已存在的副本集中增加节点前,执行下列中的一个以准备新节点的数据目录:
• 确保新节点的数据目录不包含数据。新节点将从已存在的节点复制数据。
如果新节点处于恢复(recovering)状态,它必须先退出该状态并成为从节点,MongoDB才能在复制过程中复制所有数据。这个过程需要时间,但不需要管理员干预。
• 从已存在的节点手动复制数据目录。新节点成为一个从节点,并且会在短暂的间隔之后赶上副本集的当前状态。手动复制数据可以缩短新节点达到最新状态所需的时间。
确保你能将数据目录复制到新节点,并在oplog允许的时间窗口内开始复制。如果最近的操作与数据库最近的操作之间的时间差超出了已有节点上oplog的长度,那么新实例将不得不执行一次初始同步(完全重新同步数据,如Resyncing a Member of a Replica Set中所述)。
使用db.printReplicationInfo()检查副本集成员关于oplog的当前状态。
 
For the procedure to add a member to a replica set, see Add Members to a Replica Set(page 324).
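For reference, a prepared member is added from a mongo shell connected to the primary with a single rs.add() call; the hostname and _id below are illustrative:
rs.add("m4.example.net:27017")                                    // add as a normal, data-bearing member
rs.add({ _id: 4, host: "m4.example.net:27017", priority: 0 })     // or add with explicit member options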
 
 Removing Members 移除节点
You may remove a member of a replica set at any time; however, for best results always shut down the mongod instance before removing it from a replica set.
你可以在任何时候移除副本集中的节点;然而,为获得最佳效果,最好在将节点从副本集移除之前先关闭其mongod实例。
 
Changed in version 2.2: Before 2.2, you had to shut down the mongod instance before removing it. While 2.2 removes this requirement, it remains good practice.
2.2版本变更:在2.2之前,你必须在移除mongod实例前先关闭它。尽管2.2版本取消了这一要求,但这样做仍然是很好的实践。
 
To remove a member, use the rs.remove() method in the mongo shell while connected to the current primary. Issue the db.isMaster() command when connected to any member of the set to determine the current primary. Use a command in either of the following forms to remove the member:
要移除一个节点,连接到当前主节点,在mongo shell中使用rs.remove()方法。连接到副本集的任意节点后,可以执行db.isMaster()命令来确定当前的主节点。使用下列任一形式的命令移除节点:
rs.remove( "mongo2.example.net:27017" )
rs.remove( "mongo3.example.net" )
 
This operation disconnects the shell briefly and forces a re-connection as the replica set renegotiates which member will be primary. The shell displays an error even if this command succeeds.
这个操作短暂的断开与shell的连接,然后强制重新连接,副本集重新协商哪个节点是主节点。即使该命令成功,shell也会显示一个错误。
 
You can re-add a removed member to a replica set at any time using the procedure for adding replica set members. Additionally, consider using the replica set reconfiguration procedure to change the host value to rename a member in a replica set directly.
你可以随时使用添加副本集节点的步骤,重新把已移除的节点加入副本集。此外,也可以考虑使用副本集重新配置的步骤修改host值,直接重命名副本集中的某个成员。
 
 Replacing a Member 替换节点
Use this procedure to replace a member of a replica set when the hostname has changed. This procedure preserves all existing configuration for a member, except its hostname/location.
当副本集中节点的主机名发生改变时,使用这个步骤替换该节点。这个步骤会保留该节点现有的所有配置,除了它的主机名/位置。
 
You may need to replace a replica set member if you want to replace an existing system and only need to change the hostname rather than completely replace all configured options related to the previous member.
如果你想替换一个现有系统,并且只需改变主机名而不是完全替换与之前节点相关的所有配置选项,你可能就需要替换副本集节点。
 
Use rs.reconfig() to change the value of the host field to reflect the new hostname or
port number. rs.reconfig() will not change the value of _id .
使用rs.reconfig() 改变host域的值来反映新主机名或端口。rs.reconfig() 不会改变_id的值。
cfg = rs.conf()
cfg.members[ 0 ].host = "mongo2.example.net:27019"
rs.reconfig(cfg)
 
Warning: Any replica set configuration change can trigger the current primary to step down, which forces an election. This causes the current shell session, and clients connected to this replica set, to produce an error even when the operation succeeds.
提醒:任何副本集配置的改变都可能触发当前主节点下台,强制一个选举。这导致当前shell会话和连接到该副本集的客户端产生一个错误,就算该操作成功。
 
To change the value of the priority(page 983) in the replica set configuration, use the following sequence of commands in the mongo shell:
要改变副本集配置中优先级的值,在mongo shell中使用以下命令序列:
cfg = rs.conf()
cfg.members[0].priority = 0.5
cfg.members[1].priority = 2
cfg.members[2].priority = 2
rs.reconfig(cfg)
 
The first operation uses rs.conf() to set the local variable cfg to the contents of the current replica set configuration, which is a document. The next three operations change the priority value in the cfg document for the first three members configured in the members array. The final operation calls rs.reconfig() with the argument of cfg to initialize the new configuration.
第一步操作使用rs.conf()设置本地变量cfg(这是一个文档)为当前副本集的配置内容。接下来三步操作改变成员数组前三个成员在cfg文档中的优先级。最后一步操作调用带cfg参数的rs.reconfig()命令初始化新配置。
 
---------------------------------------------------------------------
Note: When updating the replica configuration object, address all members of the set using the index value in the array. The array index begins with 0. Do not confuse this index value with the value of the _id field in each document in the members array.
The _id rarely corresponds to the array index.
注意:当更新复制配置对象时,使用数组下标值来定位副本集中的各个成员。数组下标从0开始。不要将这个下标值与数组中每个文档的_id域的值混淆。
_id很少与数组下标对应。
---------------------------------------------------------------------
 
If a member has priority set to 0, it is ineligible to become primary and will not seek election. Hidden members, delayed members, and arbiters all have priority set to 0.
如果一个节点的优先级设置为0,它没有资格成为主节点并且不会谋求选举。隐藏节点、延迟节点、仲裁者的优先级都设置为0。
 
All members have a priority equal to 1 by default.
所有节点都有默认为1的优先级。
 
The value of priority can be any floating point (i.e. decimal) number between 0 and 1000. Priorities are only used to determine the preference in election. The priority value is used only in relation to other members.
With the exception of members with a priority of 0, the absolute value of the priority value is irrelevant.
优先级的值可以是0到1000之间的任意浮点数(即小数)。优先级只用于决定选举中的优先顺序,其值只在与其他成员比较时有意义。除了优先级为0的成员之外,优先级值的绝对大小无关紧要。
 
Replica sets will preferentially elect and maintain the primary status of the member with the highest priority setting.
副本集将优先选择高优先级的成员并维持它的主节点状态。
 
Warning: Replica set reconfiguration can force the current primary to step down, leading to an election for
primary in the replica set. Elections cause the current primary to close all open client connections.
Perform routine replica set reconfiguration during scheduled maintenance windows.
提醒:副本集重配置可能强制当前主节点下台,导致副本集进行主节点选举。选举会使当前主节点关闭所有打开的客户端连接。
在定期维护窗口进行常规副本集重配置。
 
See also:
The Replica Reconfiguration Usage(page 985) example revolves around changing the priorities of the members of a replica set.
 
 
 Adding an Arbiter 增加仲裁者
For a description of arbiters and their purpose in replica sets, see Arbiters(page 286).
 
To prevent tied elections, do not add an arbiter to a set if the set already has an odd number of voting members.
Because arbiters do not hold a copy of collection data, they have minimal resource requirements and do not require dedicated hardware.
1. Create a data directory for the arbiter. The mongod uses this directory for configuration information.
It will not hold database collection data. The following example creates the /data/arb directory:
mkdir /data/arb
2. Start the arbiter, making sure to specify the replica set name and the data directory. Consider the following
example:
mongod --port 30000 --dbpath /data/arb --replSet rs
3. In a mongo shell connected to the primary, add the arbiter to the replica set by issuing the rs.addArb() method, which uses the following syntax:
rs.addArb("<hostname><:port>")
For example, if the arbiter runs on m1.example.net:30000, you would issue this command:
rs.addArb("m1.example.net:30000")
为了防止平票选举,如果副本集已经有奇数个可投票节点,不要再加入仲裁者节点。
因为仲裁者不保存集合数据的副本,它对资源的需求极小,且不需要专用硬件。
1. 为仲裁者创建一个数据目录。mongod使用这个目录存放配置信息。
它不保存数据库集合数据。下面的例子创建 /data/arb 目录:
mkdir /data/arb
2. 启动仲裁者,确保指定副本集的名称和数据目录。参考下面的例子:
mongod --port 30000 --dbpath /data/arb --replSet rs
3. 在连接到主节点的mongo shell中,通过执行rs.addArb()方法将仲裁者加入副本集,语法如下:
rs.addArb("<hostname><:port>")
例如,如果仲裁者运行在m1.example.net:30000上,执行以下命令:
rs.addArb("m1.example.net:30000")
 
 
 Manually Configure a Secondary’s Sync Target 
 手动配置从节点同步目标
To override the default sync target selection logic, you may manually configure a secondary member’s sync target for pulling oplog entries temporarily. The following operations provide access to this functionality:
• replSetSyncFrom command, or
• rs.syncFrom() helper in the mongo shell
要覆盖默认的同步目标选择逻辑,你可以临时手动配置从节点的同步目标来拉取oplog条目。下面的操作提供了使用此功能的方法:
• replSetSyncFrom 命令
• mongo shell中的辅助方法rs.syncFrom()
 
Only modify the default sync logic as needed, and always exercise caution. rs.syncFrom() will not affect an in-progress initial sync operation. To affect the sync target for the initial sync, run rs.syncFrom() operation before initial sync.
只在有需要的时候修改默认同步逻辑,且总是审慎行事。rs.syncFrom()不会对正在进行中的初始化同步操作起作用。要使同步目标对初始化同步起作用,在初始化同步前执行rs.syncFrom()操作。
 
If you run rs.syncFrom() during initial sync, MongoDB produces no error messages, but the sync
target will not change until after the initial sync operation.
如果你在初始化同步时执行rs.syncFrom(),MongoDB不会产生错误信息,但同步目标在初始化同步操作完成之前不会改变。
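As a minimal sketch, either of the following equivalent forms sets a temporary sync target from the mongo shell (the hostname is illustrative):
rs.syncFrom("m3.example.net:27017")                              // shell helper
db.adminCommand({ replSetSyncFrom: "m3.example.net:27017" })    // underlying command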
 
---------------------------------------------------------------------
Note:replSetSyncFrom and rs.syncFrom() provide a temporary override of default behavior. If:
• the mongod instance restarts or
• the connection to the sync target closes;
then, the mongod instance will revert to the default sync logic and target.
注意:replSetSyncFrom 和 rs.syncFrom() 提供一种临时覆盖默认行为的方法。如果:
• mongod实例重启
• 同步目标的连接关闭
那么,mongod实例将还原到默认同步逻辑和同步目标。
---------------------------------------------------------------------
 
 Manage Chained Replication 管理链式复制
New in version 2.2.4. 2.2.4版本更新
MongoDB enables chained replication by default. This procedure describes how to disable it and how to re-enable it.
To disable chained replication, set the chainingAllowed field in Replica Set Configuration to false.
MongoDB默认允许链式复制。这个程序描述如何禁用和启用它。
要禁用链式复制,设置副本集配置中的 chainingAllowed 域为false。
 
You can use the following sequence of commands to set chainingAllowed to false:
1. Copy the configuration settings into the cfg object:
cfg = rs.config()
2. Take note of whether the current configuration settings contain the settings sub-document. If they do, skip this step.
Warning: To avoid data loss, skip this step if the configuration settings contain the settings sub-document.
If the current configuration settings do not contain the settings sub-document, create the sub-document by issuing the following command:
cfg.settings = { }
3. Issue the following sequence of commands to set chainingAllowed to false:
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
To re-enable chained replication, set chainingAllowed to true. You can use the following sequence of commands:
cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
你可以使用下列命令设置chainingAllowed 为false:
1. 拷贝配置设置到cfg对象
cfg = rs.config()
2. 注意当前配置设置是否包含settings 子文档。如果是,跳过这一步
提醒:为了避免数据丢失,如果配置设置包含settings子文档时跳过这一步
如果当前配置设置没有包含settings 子文档,通过下列命令创建这个子文档:
cfg.settings = { }
3. 执行下列命令设置chainingAllowed 为false:
cfg.settings.chainingAllowed = false
rs.reconfig(cfg)
要重新启用链式复制,设置chainingAllowed 为true。使用下列命令:
cfg = rs.config()
cfg.settings.chainingAllowed = true
rs.reconfig(cfg)
 
---------------------------------------------------------------------
Note:If chained replication is disabled, you still can use replSetSyncFrom to specify that a secondary replicates from another secondary. But that configuration will last only until the secondary recalculates which member to sync from.
注意:如果链式复制被禁用,你仍可以使用replSetSyncFrom指定一个从节点从另一个从节点复制。但该配置只会持续到该从节点重新计算应从哪个成员同步为止。
---------------------------------------------------------------------
 
 Changing Oplog Size 改变Oplog大小
The following is an overview of the procedure for changing the size of the oplog. For a detailed procedure, see Change the Size of the Oplog (page 332).
1. Shut down the current primary instance in the replica set and then restart it on a different port and in “standalone” mode.
2. Create a backup of the old (current) oplog. This is optional.
3. Save the last entry from the old oplog.
4. Drop the old oplog.
5. Create a new oplog of a different size.
6. Insert the previously saved last entry from the old oplog into the new oplog.
7. Restart the server as a member of the replica set on its usual port.
8. Apply this procedure to any other member of the replica set that could become primary.
下列是改变Oplog大小的程序的概览:
1. 停止副本集的当前主节点,然后在另一个端口使用"standalone"(单例)模式重启它。
2. 创建旧(当前)oplog的一个备份。这是可选的。
3. 保存旧oplog的最后一个条目。
4. 删除旧oplog。
5. 使用不同的大小创建一个新的oplog。
6. 插入刚才从旧oplog中保存的最后一个条目到新的oplog。
7. 在平常使用的端口以副本集的一个成员重启服务。
8. 对副本集中任何可能成为主节点的其他节点重复这个步骤。
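A condensed sketch of steps 3 through 6 in the mongo shell, assuming the member has already been restarted as a standalone instance and that a 2 GB oplog is desired; the full tutorial (page 332) remains the authoritative procedure:
use local
db.temp.drop()
// step 3: save the last entry from the old oplog
db.temp.save( db.oplog.rs.find( {}, { ts: 1, h: 1 } ).sort( { $natural: -1 } ).limit(1).next() )
// step 4: drop the old oplog
db.oplog.rs.drop()
// step 5: recreate the oplog as a capped collection of the new size (2 GB here)
db.runCommand( { create: "oplog.rs", capped: true, size: (2 * 1024 * 1024 * 1024) } )
// step 6: insert the saved entry into the new oplog
db.oplog.rs.save( db.temp.findOne() )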
 
 Resyncing a Member of a Replica Set
 重新同步副本集的节点 
When a secondary's replication process falls behind so far that the primary overwrites oplog entries that the secondary has not yet replicated, that secondary cannot catch up and becomes "stale." When that occurs, you must completely resynchronize the member by removing its data and performing an initial sync.
To do so, use one of the following approaches:
• Restart the mongod with an empty data directory and let MongoDB’s normal initial syncing feature restore the data. This is the more simple option, but may take longer to replace the data.
See Automatically Resync a Stale Member.
• Restart the machine with a copy of a recent data directory from another member in the replica set. This procedure can replace the data more quickly but requires more manual steps.
See Resync by Copying All Datafiles from Another Member.
当一个从节点的复制进度落后太多,以至于主节点覆盖了该从节点尚未复制的oplog条目时,该从节点就无法再赶上,数据变得"陈旧"。发生这种情况时,你必须删除其数据并执行初始化同步,以完全重新同步该节点。
使用下面的方法:
• 使用空数据目录重启mongod,让MongoDB的常规初始化同步功能恢复数据。这是更简单的方式,但替换数据可能需要更长时间。
参见"自动重新同步陈旧节点"。
• 使用副本集另一个节点的最新数据目录副本重启机器。这种方式可以更快地替换数据,但需要更多手工步骤。
参见"通过从其他节点复制数据文件重新同步"。
 
 Automatically Resync a Stale Member 
 自动重新同步陈旧节点
This procedure relies on MongoDB’s regular process for initial sync. This will restore the data on the stale member to reflect the current state of the set. For an overview of MongoDB initial sync process, see the Syncing section.
这个步骤依赖MongoDB常规的初始化同步过程。它将恢复陈旧节点上的数据,使其反映副本集的当前状态。关于MongoDB初始化同步过程的概述,参见Syncing一节。
 
To resync the stale member:
1. Stop the stale member's mongod instance. On Linux systems you can use mongod --shutdown
with --dbpath set to the member's data directory, as in the following:
mongod --dbpath /data/db/ --shutdown
2. Delete all data and sub-directories from the member’s data directory. By removing the data dbpath(page 939), MongoDB will perform a complete resync. Consider making a backup first.
3. Restart the mongod instance on the member. For example:
mongod --dbpath /data/db/ --replSet rsProduction
At this point, the mongod will perform an initial sync. The length of the initial sync process depends on the size of the database and the network connection between members of the replica set.
Initial sync operations can impact the other members of the set and create additional traffic to the primary, and can only occur if another member of the set is accessible and up to date.
重新同步陈旧节点:
1. 停止陈旧节点的mongod实例。Linux下使用:
mongod --dbpath /data/db/ --shutdown
2. 删除节点数据目录中的所有数据和子目录。删除dbpath中的数据后,MongoDB将执行一次完整的重新同步。考虑先做备份(参见本节末尾的示例)。
3. 重启mongod实例:
mongod --dbpath /data/db/ --replSet rsProduction
此时,mongod将执行一次初始化同步。初始化同步过程的时长取决于数据库的大小以及副本集节点之间的网络连接。
初始化同步操作会影响副本集的其他节点,并给主节点带来额外流量,而且只有在副本集的其他节点可访问且数据是最新的情况下才能进行。
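A hedged sketch of step 2 above on a Linux system, assuming the member's dbpath is /data/db/ as in the surrounding examples:
# keep a copy of the old files instead of deleting them outright
mv /data/db /data/db.bak
# recreate an empty data directory before restarting the member
mkdir -p /data/db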
 
 Resync by Copying All Datafiles from Another Member
 通过从其他节点复制数据文件重新同步
This approach uses a copy of the data files from an existing member of the replica set, or a backup of the data files, to "seed" the stale member.
The copy or backup of the data files must be sufficiently recent to allow the new member to catch up with the oplog, otherwise the member would need to perform an initial sync.
这个方法使用副本集现有节点的数据文件副本,或数据文件的备份,作为陈旧节点的"种子"。
这个数据文件的副本或备份必须足够新,才能让新节点赶上oplog,否则这个节点将需要执行一次初始化同步。
 
---------------------------------------------------------------------
Note:In most cases you cannot copy data files from a running mongod instance to another, because the data files will change during the file copy operation. Consider the Backup Strategies for MongoDB Systems documentation for several methods that you can use to capture a consistent snapshot of a running mongod instance.
注意:大多数情况下,你不能把数据文件从一个正在运行的mongod实例复制到另一个节点,因为数据文件会在复制过程中发生改变。参考MongoDB系统备份策略文档中提供的几种方法,它们可以用来捕获正在运行的mongod实例的一致性快照。
---------------------------------------------------------------------
 
After you have copied the data files from the “seed” source, start the mongod instance and allow it to apply all operations from the oplog until it reflects the current state of the replica set.
从"种子"源复制数据文件后,启动mongod实例,让它应用oplog中的所有操作,直到它反映出副本集的当前状态。
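A minimal sketch of seeding a stale member from another member's data files, assuming the source member has been cleanly shut down (or a consistent snapshot exists), that both members use /data/db/ as their dbpath, and that the hostnames are illustrative:
# on the stale member, with its mongod stopped and its old data directory emptied
rsync -avz m2.example.net:/data/db/ /data/db/
# restart the member so it replays the remaining oplog entries from the set
mongod --dbpath /data/db/ --replSet rsProduction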
 

 29.2.3 Security Considerations for Replica Sets 副本集安全注意事项
In most cases, the most effective ways to control access and to secure the connection between members of a replica set depend on network-level access control. Use your environment’s firewall and network routing to ensure that traffic only from clients and other replica set members can reach your mongod instances. If needed, use virtual private networks (VPNs) to ensure secure connections over wide area networks (WANs.)
大多数情况下,控制访问并保护副本集成员之间连接安全的最有效方法依赖于网络级的访问控制。使用你所在环境的防火墙和网络路由,确保只有来自客户端和其他副本集成员的流量能到达你的mongod实例。如果有必要,使用虚拟专用网络(VPN)来确保广域网(WAN)上的安全连接。
 
Additionally, MongoDB provides an authentication mechanism for mongod and mongos instances connecting to replica sets. These instances enable authentication but specify a shared key file that serves as a shared password.
另外,MongoDB为mongod和连接到副本集的mongos实例提供身份认证机制。这些实例启用身份验证,并指定一个作为共享密码的共享密钥文件。
 
New in version 1.8: Added support for authentication in replica set deployments.
Changed in version 1.9.1: Added support for authentication in sharded replica set deployments.
1.8版本更新:在副本集部署中增加身份认证。
1.9.1版本更新:在分片副本集部署中增加身份认证支持。
 
To enable authentication add the following option to your configuration file:
要开启身份认证,在你的配置文件中添加以下选项:
keyFile = /srv/mongodb/keyfile
 
---------------------------------------------------------------------
Note: You may choose to set these run-time configuration options using the --keyFile (or mongos --keyFile) options on the command line.
注意:你也可以选择在命令行使用--keyFile(或 mongos --keyFile)选项来设置这些运行时配置选项。
---------------------------------------------------------------------
 
Setting keyFile(page 938) enables authentication and specifies a key file for the replica set members to use when authenticating to each other. The content of the key file is arbitrary but must be the same on all members of the replica set and on all mongos instances that connect to the set.
设置keyFile会启用身份验证,并指定副本集成员彼此进行身份验证时使用的密钥文件。密钥文件的内容可以是任意的,但在副本集的所有成员以及连接到该副本集的所有mongos实例上必须相同。
 
The key file must be less than one kilobyte in size and may only contain characters in the base64 set. The key file must not have group or "world" permissions on UNIX systems. Use the following command to use the OpenSSL package to generate "random" content for use in a key file:
这个key文件必须小于1KB,且只能包含base64字符集中的字符。在UNIX系统上,key文件不能有组或"world"权限。使用下面的命令,通过OpenSSL包为key文件生成"随机"内容:
openssl rand -base64 753
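A hedged end-to-end sketch on a UNIX system: generate the key file, restrict its permissions, distribute it, and start each member with the shared key (the file path and the replica set name rs0 are illustrative):
openssl rand -base64 753 > /srv/mongodb/keyfile
chmod 600 /srv/mongodb/keyfile
# copy /srv/mongodb/keyfile to the same path on every member and every mongos, then start each mongod:
mongod --replSet rs0 --keyFile /srv/mongodb/keyfile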
 
---------------------------------------------------------------------
Note:Key file permissions are not checked on Windows systems.
注意:在Windows系统上不检查key文件的权限。
---------------------------------------------------------------------
 

 29.2.4 Troubleshooting Replica Sets 副本集故障排除
This section describes common strategies for troubleshootingreplica sets.
See also:
Monitoring Database Systems(page 53).
 
 Check Replica Set Status 检查副本集状态
To display the current state of the replica set and current state of each member, run the rs.status() method in a mongo shell connected to the replica set’s primary. For descriptions of the information displayed by rs.status(), see Replica Set Status Reference.
要显示副本集和各节点的当前状态,连接到副本集的主节点,在mongo shell中运行rs.status()方法。关于rs.status()所显示信息的说明,参见Replica Set Status Reference。
 
---------------------------------------------------------------------
Note:The rs.status() method is a wrapper that runs the replSetGetStatus database command.
注意:rs.status() 方法是一个包装器,它运行的是replSetGetStatus数据库命令。
---------------------------------------------------------------------
 
 Check the Replication Lag 检查复制延迟
Replication lag is a delay between an operation on the primary and the application of that operation from the oplog to the secondary. Replication lag can be a significant issue and can seriously affect MongoDB replica set deployments.
Excessive replication lag makes “lagged” members ineligible to quickly become primary and increases the possibility that distributed read operations will be inconsistent.
复制延迟是指主节点上的某个操作与该操作从oplog应用到从节点之间的延迟。复制延迟可能是一个重大问题,会严重影响MongoDB副本集的部署。过大的复制延迟会使"延迟"的节点无法快速成为主节点,并增加分布式读操作不一致的可能性。
 
To check the current length of replication lag:
检查当前复制延迟长度:
• In a mongo shell connected to the primary, call the db.printSlaveReplicationInfo() method.
• 在连接到主节点的mongo shell中,调用db.printSlaveReplicationInfo()方法。
The returned document displays the syncedTo value for each member, which shows you when each member last read from the oplog, as shown in the following example:
返回的文档显示每个节点的syncedTo 值,这是各节点最后一次读取oplog的时间:
source: m1.example.net:30001
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
         = 7475 secs ago (2.08hrs)
source: m2.example.net:30002
    syncedTo: Tue Oct 02 2012 11:33:40 GMT-0400 (EDT)
         = 7475 secs ago (2.08hrs)
 
• Monitor the rate of replication by watching the oplog time in the “replica” graph in the MongoDB Monitoring Service. For more information see the documentation for MMS.
• 通过查看MongoDB监控服务(MMS)中"replica"图表里的oplog时间来监视复制速度。更多信息参见MMS文档。
 
Possible causes of replication lag include:
• Network Latency
Check the network routes between the members of your set to ensure that there is no packet loss or network routing issue.
Use tools including ping to test latency between set members and traceroute to expose the routing of packets between network endpoints.
• Disk Throughput
If the file system and disk device on the secondary is unable to flush data to disk as quickly as the primary, then the secondary will have difficulty keeping state. Disk-related issues are incredibly prevalent on multi-tenant systems, including virtualized instances, and can be transient if the system accesses disk devices over an IP network (as is the case with Amazon's EBS system.)
Use system-level tools to assess disk status, including iostat or vmstat.
• Concurrency
In some cases, long-running operations on the primary can block replication on secondaries. For best results, configure write concern to require confirmation of replication to secondaries, as described in Write Concern. This prevents write operations from returning if replication cannot keep up with the write load.
Use the database profiler to see if there are slow queries or long-running operations that correspond to the incidences of lag.
• Appropriate Write Concern
If you are performing a large data ingestion or bulk load operation that requires a large number of writes to the primary, particularly with unacknowledged write concern, the secondaries will not be able to read the oplog fast enough to keep up with changes.
To prevent this, require write acknowledgment or journaled write concern after every 100, 1,000, or another interval to provide an opportunity for secondaries to catch up with the primary (see the sketch after this list).
 
For more information see:
– Write Concern
– Oplog
复制延迟可能的原因包括:
• 网络延迟
检查副本集的节点之间的网络路由来确保没有丢包或网络路由问题。
使用ping等工具测试副本集节点之间的延迟,使用traceroute查看数据包在网络端点之间的路由。
• 磁盘吞吐量
如果从节点的文件系统和磁盘设备不能像主节点一样快地将数据刷到磁盘,那么从节点将难以跟上状态。磁盘相关的问题在多租户系统(包括虚拟化实例)上非常普遍,而且如果系统通过IP网络访问磁盘设备(如亚马逊EBS系统的情况),这类问题可能是瞬态的。
使用系统级工具评估磁盘状态,包括iostat或vmstat。
• 并发
在一些情况下,主节点上长时间运行的操作可能阻塞从节点上的复制。为了获得最佳效果,按照Write Concern中的描述,配置写关注以要求确认复制到从节点。如果复制跟不上写入负载,这可以防止写操作返回。
使用database profiler查看是否存在与延迟发生时间相对应的慢查询或长时间运行的操作。
• 合适的写关注
如果你正在执行需要向主节点大量写入的大型数据摄取或批量加载操作,特别是使用未确认(unacknowledged)写关注时,从节点将无法足够快地读取oplog以跟上变化。
为了防止这种情况,每100次、1000次或其他间隔的写操作之后要求写确认或日志写关注,给从节点一个赶上主节点的机会(参见本列表后的示例)。
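For instance, a hedged sketch of pausing every 1000 inserts to wait for replication, using the 2.2-era getLastError command; the collection name, the docs array, and the batch size are illustrative:
for (var i = 0; i < docs.length; i++) {
    db.mycollection.insert(docs[i]);
    if (i % 1000 === 0) {
        // block until a majority of members have replicated the writes issued so far
        db.runCommand({ getLastError: 1, w: "majority", wtimeout: 60000 });
    }
}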
 
 Test Connections Between all Members 测试节点间连接
All members of a replica set must be able to connect to every other member of the set to support replication. Always verify connections in both “directions.” Networking topologies and firewall configurations prevent normal and required connectivity, which can block replication.
Consider the following example of a bidirectional test of networking:
为了支持复制,副本集中的所有节点都必须能连接到其他每一个节点。应该始终在两个"方向"上验证连接。网络拓扑结构和防火墙配置可能阻止正常和必需的连接,从而阻塞复制。
请看下面一个双向网络测试的例子:
 
----------------------------------------------------
Example
Given a replica set with three members running on three separate hosts:
给出运行在三个独立主机的包含三个节点的副本集:
m1.example.net
m2.example.net
m3.example.net
 
1. Test the connection from m1.example.net to the other hosts with the following operation set from m1.example.net:
1. 测试从m1.example.net 到其他主机的连接:
mongo --host m2.example.net --port 27017
mongo --host m3.example.net --port 27017
 
2. Test the connection from m2.example.net to the other two hosts with the following operation set from m2.example.net, as in:
2. 测试从m2.example.net到其他两个主机的连接:
mongo --host m1.example.net --port 27017
mongo --host m3.example.net --port 27017
You have now tested the connection between m2.example.net and m1.example.net in both directions.
你现在已经测试了 m2.example.net 和 m1.example.net 两个方向的连接。
 
3. Test the connection from m3.example.net to the other two hosts with the following operation set from the m3.example.net host, as in:
3. 测试从m3.example.net到其他两个主机的连接
mongo --host m1.example.net --port 27017
mongo --host m2.example.net --port 27017
 
If any connection, in any direction fails, check your networking and firewall configuration and reconfigure your environment to allow these connections.
如果任一方向的任何连接失败,检查你的网络和防火墙配置,并重新配置你的环境以允许这些连接。
----------------------------------------------------
 
 Check the Size of the Oplog 检查oplog大小
A larger oplog can give a replica set a greater tolerance for lag, and make the set more resilient.
更大的oplog可以让副本集对延迟有更大的容忍度,使副本集更有弹性。
 
To check the size of the oplog for a given replica set member, connect to the member in a mongo shell and run the db.printReplicationInfo()(page 842) method.
要检查指定的副本集节点的oplog大小,使用mongo连接到这个节点运行db.printReplicationInfo()方法。
 
The output displays the size of the oplog and the date ranges of the operations contained in the oplog. In the following example, the oplog is about 10MB and is able to fit about 26 hours (94400 seconds) of operations:
输出会显示oplog的大小以及oplog中所包含操作的时间范围。在下面的例子中,oplog大约10MB,能够容纳大约26小时(94400秒)的操作。
configured oplog size: 10.10546875MB
log length start to end: 94400 (26.22hrs)
oplog first event time: Mon Mar 19 2012 13:50:38 GMT-0400 (EDT)
oplog last event time: Wed Oct 03 2012 14:59:10 GMT-0400 (EDT)
now: Wed Oct 03 2012 15:00:21 GMT-0400 (EDT)
 
The oplog should be long enough to hold all transactions for the longest downtime you expect on a secondary. At a minimum, an oplog should be able to hold minimum 24 hours of operations; however, many users prefer to have 72 hours or even a week’s work of operations.
oplog应该足够长,能够容纳你预期的从节点最长停机时间内的所有事务。最低限度下,oplog应能容纳至少24小时的操作;不过,许多用户更倾向于容纳72小时甚至一周的操作。
 
For more information on how oplog size affects operations, see:
• The Oplog(page 280) topic in theReplica Set Fundamental Concepts(page 277) document.
• The Delayed Members(page 285) topic in this document.
• The Check the Replication Lag(page 293) topic in this document.
 
---------------------------------------------------------------------
Note: You normally want the oplog to be the same size on all members. If you resize the oplog, resize it on all members.
注意:你通常希望所有成员的oplog大小相同。如果你调整oplog的大小,应调整所有成员的oplog。
---------------------------------------------------------------------
 
To change oplog size, see Changing Oplog Size(page 291) in this document or see the Change the Size of the Oplog(page 332) tutorial.
 
 Failover and Recovery 故障切换和恢复
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
副本集具有自动故障切换功能。如果主节点离线或没有响应,而且原副本集中的多数成员仍然能互相连接,副本集将选举出一个新的主节点。
 
While failover is automatic, replica set administrators should still understand exactly how this process works. The sections below describe failover in detail.
虽然故障切换是自动的,副本集管理员还是需要准确理解这个过程是如何进行的。以下小节详细描述故障切换。
 
In most cases, failover occurs without administrator intervention seconds after the primary either steps down, becomes inaccessible, or becomes otherwise ineligible to act as primary. If your MongoDB deployment does not failover according to expectations, consider the following operational errors:
• No remaining member is able to form a majority. This can happen as a result of network partitions that render some members inaccessible. Design your deployment to ensure that a majority of set members can elect a primary in the same facility as core application systems.
• No member is eligible to become primary. Members must have a priority setting greater than 0, have a state that is less than ten seconds behind the last operation to the replica set, and generally be more up to date than the voting members.
在大多数情况下,当主节点下台、不可访问或不再有资格作为主节点后,故障切换会在几秒内自动发生,无需管理员干预。如果你的MongoDB部署没有按预期进行故障切换,考虑以下操作性错误:
• 没有剩余的节点能够形成多数。这可能是网络分区导致部分节点不可访问造成的。设计你的部署,确保多数节点能够在与核心应用系统相同的设施中选举出主节点。
• 没有节点符合成为主节点的资格。节点的优先级必须大于0,其状态落后于副本集最后一次操作不能超过10秒,并且一般要比参与投票的节点数据更新。
 
In many senses,rollbacks represent a graceful recovery from an impossible failover and recovery situation.
从很多意义上讲,回滚是在无法正常完成故障切换和恢复的情况下的一种优雅恢复方式。
 
Rollbacks occur when a primary accepts writes that other members of the set do not successfully replicate before the primary steps down. When the former primary begins replicating again it performs a "rollback." Rollbacks remove those operations from the instance that were never replicated to the set so that the data set is in a consistent state. The mongod program writes rolled back data to a BSON file that you can view using bsondump(page 912), applied manually using mongorestore(page 909).
回滚发生在主节点接受了写操作、而副本集其他节点在主节点下台前未能成功复制这些操作的情况下。当原来的主节点重新开始复制时,它会执行一次"回滚"。回滚从该实例中删除那些从未复制到副本集的操作,使数据集保持一致的状态。mongod程序会将回滚的数据写入一个BSON文件,你可以使用bsondump查看这些数据,并使用mongorestore手动应用。
 
You can prevent rollbacks using a replica acknowledged write concern. These write operations require not only the primary to acknowledge the write operation, sometimes even the majority of the set to confirm the write operation before returning.
你可以使用replica acknowledged(复制确认)写关注来防止回滚。这些写操作在返回之前,不仅要求主节点确认写操作,有时甚至要求副本集的多数节点确认。
 
See the Write Concern documentation for more information about enabling write concern. 关于启用写关注的更多信息,参见Write Concern文档。
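A minimal sketch of such a write from the mongo shell, using the 2.2-era getLastError helper with majority acknowledgment; the collection and document are illustrative:
db.orders.insert({ _id: 1, status: "new" })
// require acknowledgment from a majority of the set before treating the write as committed
db.getLastErrorObj("majority", 5000)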
See also:
The Elections(page 278) section in the Replica Set Fundamental Concepts(page 277) document, and the ElectionInternals (page 311) section in theReplica Set Internals and Behaviors(page 309) document.
 
 Oplog Entry Timestamp Error oplog条目时间戳错误
Consider the following error in mongod output and logs:
考虑mongod 输出的以下错误日志:
replSet error fatal couldn't query the local local.oplog.rs collection. Terminating mongod after 30 seconds.
<timestamp> [rsStart] bad replSet oplog entry?
 
Often, an incorrectly typed value in the ts field in the last oplog entry causes this error. The correct data type is Timestamp.
这个错误通常是由最后一条oplog条目的ts域中类型不正确的值引起的。正确的数据类型是Timestamp。
 
Check the type of the ts value using the following two queries against the oplog collection:
针对oplog集合使用下面的查询语句检查ts值的类型。
db = db.getSiblingDB("local")
db.oplog.rs.find().sort({$natural: -1}).limit(1)
db.oplog.rs.find({ts: {$type: 17}}).sort({$natural: -1}).limit(1)
 
The first query returns the last document in the oplog, while the second returns the last document in the oplog where the ts value is a Timestamp. The $type(page 707) operator allows you to select BSON type 17, which is the Timestamp data type.
If the queries don’t return the same document, then the last document in the oplog has the wrong data type in the ts field.
第一条查询语句返回oplog中的最后一个文档,而第二条返回oplog中最后一条ts值为Timestamp类型的文档。$type操作符允许你选择BSON类型17,即Timestamp数据类型。
如果这些查询没有返回同一个文档,那么oplog的最后一个文档的ts的数据类型是错误的。
 
----------------------------------------------------
Example
If the first query returns this as the last oplog entry:
第一个查询返回oplog的最后一个文档:
{
    "ts" : {t : 1347982456000, i : 1},
    "h" : NumberLong( "8191276672478122996"),
    "op" : "n",
    "ns" : "",
    "o" : { "msg" : "Reconfig set", "version" : 4 }
}
And the second query returns this as the last entry where ts has the Timestamp type:
第二个查询返回oplog中最后一条ts值是时间戳的文档
{
    "ts" : Timestamp( 1347982454000, 1),
    "h" : NumberLong( "6188469075153256465"),
    "op" : "n",
    "ns" : "",
    "o" : { "msg" : "Reconfig set", "version" : 3 }
}
Then the value for the ts field in the last oplog entry is of the wrong data type.
那么oplog的最后一个文档的ts的数据类型是错误的
----------------------------------------------------
To set the proper type for this value and resolve this issue, use an update operation that resembles the following:
设置正确的值类型同时解决这个问题,使用类似于下面的update语句:
db.oplog.rs.update( { ts : { t : 1347982456000, i : 1 } },{ $set : { ts : new Timestamp( 1347982456000, 1)}})
 
Modify the timestamp values as needed based on your oplog entry. This operation may take some period to complete because the update must scan and pull the entire oplog into memory.
根据你的oplog条目按需修改时间戳值。这个操作可能需要一些时间才能完成,因为该update必须扫描并将整个oplog读入内存。
 
 Duplicate Key Error on local.slaves
 local.slaves重复键错误 
The duplicate key on local.slaves error, occurs when a secondary or slave changes its hostname and the primary or master tries to update its local.slaves collection with the new name. The update fails because it contains the same _id value as the document containing the previous hostname. The error itself will resemble the following.
local.slaves上的重复键错误,发生在从节点(secondary或slave)改变主机名、而主节点(primary或master)尝试用新名称更新其local.slaves集合时。由于该文档与包含旧主机名的文档具有相同的_id值,更新失败。错误信息类似如下:
exception 11000 E11000 duplicate key error index : local.slaves.$_id_ dup key : { : ObjectId(’ <object
 
This is a benign error and does not affect replication operations on the secondary or slave.
To prevent the error from appearing, drop the local.slaves collection from the primary or master, with the following sequence of operations in the mongo shell:
这是一个良性错误,不会影响从节点上的复制操作。要防止错误出现,删除主节点的local.slaves集合:
use local
db.slaves.drop()
 
The next time a secondary or slave polls the primary or master, the primary or master recreates the local.slaves collection.
下一次从节点轮询主节点时,主节点会重新创建local.slaves集合。
 
 Elections and Network Partitions 选举和网络分区
Members on either side of a network partition cannot see each other when determining whether a majority is available to hold an election.
That means that if a primary steps down and neither side of the partition has a majority on its own, the set will not elect a new primary and the set will become read only. To avoid this situation, attempt to place a majority of instances in one data center with a minority of instances in a secondary facility.
在判断是否有多数节点可用于举行选举时,网络分区两侧的节点无法看到彼此。
这意味着,如果主节点下台,而分区的任何一侧都没有多数节点,副本集将无法选举出新的主节点,副本集将变成只读。要避免这种情况,尽量将多数实例放置在一个数据中心,将少数实例放置在辅助设施中。
 
See
Election Internals(page 311).
 
END 13/08/10 0:49:22