Official Documentation, Release 2.2.5: Part VII Replication, 29.1 Replica Set Fundamental Concepts

Replica Set Use and Operation

29.1 Replica Set Fundamental Concepts
A MongoDB replica set is a cluster of mongod (page 887) instances that replicate amongst one another and ensure automated failover. Most replica sets consist of two or more mongod (page 887) instances with at most one of these designated as the primary and the rest as secondary members. Clients direct all writes to the primary, while the secondary members replicate from the primary asynchronously.
 
Database replication with MongoDB adds redundancy, helps to ensure high availability, simplifies certain administrative tasks such as backups, and may increase read capacity. Most production deployments use replication.
 
If you're familiar with other database systems, you may think about replica sets as a more sophisticated form of traditional master-slave replication. In master-slave replication, a master node accepts writes while one or more slave nodes replicate those write operations and thus maintain data sets identical to the master. For MongoDB deployments, the member that accepts write operations is the primary, and the replicating members are secondaries.
 
MongoDB's replica sets provide automated failover. If a primary fails, the remaining members will automatically try to elect a new primary.
 
A replica set can have up to 12 members, but only 7 members can have votes. For information regarding non-voting members, see Non-Voting Members (page 286).
 
See also:
The Replication (page 275) index for a list of the documents in this manual that describe the operation and use of replica sets.
 

29.1.1 Member Configuration Properties
You can configure replica set members in a variety of ways, as listed here. In most cases, members of a replica set have the default properties.
 
• Secondary-Only: These members have data but cannot become primary under any circumstance. See Secondary-Only Members (page 283).
• Hidden: These members are invisible to client applications. See Hidden Members (page 284).
• Delayed: These members apply operations from the primary's oplog after a specified delay. You can think of a delayed member as a form of "rolling backup." See Delayed Members (page 285).
• Arbiters: These members have no data and exist solely to participate in elections (page 278). See Arbiters (page 286).
• Non-Voting: These members do not vote in elections. Non-voting members are only used for larger sets with more than 7 members. See Non-Voting Members (page 286). A configuration sketch combining these types follows this list.
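As a sketch, these options correspond to fields in the replica set configuration document; the following hypothetical rs.reconfig() call (hostnames invented for illustration) combines all five member types. Note that hidden and delayed members must also set priority: 0.
cfg = rs.conf()
cfg.members = [
  { _id: 0, host: "node0.example.net:27017" },                                 // regular member, can become primary
  { _id: 1, host: "node1.example.net:27017", priority: 0 },                    // secondary-only
  { _id: 2, host: "node2.example.net:27017", priority: 0, hidden: true },      // hidden from clients
  { _id: 3, host: "node3.example.net:27017", priority: 0, slaveDelay: 3600 },  // delayed by one hour
  { _id: 4, host: "node4.example.net:27017", arbiterOnly: true },              // arbiter, no data
  { _id: 5, host: "node5.example.net:27017", votes: 0 }                        // non-voting
]
rs.reconfig(cfg)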
 
For more information about each member configuration, see the Member Configurations (page 283) section in the Replica Set Operation and Management (page 282) document.
 

29.1.2 Failover
Replica sets feature automated failover. If the primary goes offline or becomes unresponsive and a majority of the original set members can still connect to each other, the set will elect a new primary.
 
For a detailed explanation of failover, see the Failover (page 278) section in the Replica Set Operation and Management (page 282) document.
 
Elections
When any failover occurs, an election takes place to decide which member should become primary.
Elections provide a mechanism for the members of a replica set to autonomously select a new primary without administrator intervention. The election allows replica sets to recover from failover situations very quickly and robustly.
 
Whenever the primary becomes unreachable, the secondary members trigger an election. The first member to receive votes from a majority of the set will become primary. The most important feature of replica set elections is that a majority of the original number of members in the replica set must be present for an election to succeed. If you have a three-member replica set, the set can elect a primary when two or three members can connect to each other. If two members in the set go offline, then the remaining member will remain a secondary.
 
Note: When the current primary steps down and triggers an election, the mongod (page 887) instances will close all client connections. This ensures that the clients maintain an accurate view of the replica set and helps prevent rollbacks.
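An administrator can force such an election from the mongo shell by asking the current primary to step down; a minimal example, where 60 is a hypothetical number of seconds during which the member will not seek re-election:
rs.stepDown(60)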
 
For more information on elections and failover, see:
• The Failover (page 278) section in the Replica Set Operation and Management (page 282) document.
• The Election Internals (page 311) section in the Replica Set Internals and Behaviors (page 309) document.
 
Member Priority
In a replica set, every member has a "priority" that helps determine eligibility for election (page 278) to primary. By default, all members have a priority of 1, unless you modify the priority (page 983) value. All members have a single vote in elections.
 
Warning: Always configure the priority (page 983) value to control which members will become primary.
 
Do not configure votes (page 983) except to permit more than 7 secondary members.
For more information on member priorities, see the Adjusting Priority (page 288) section in the Replica Set Operation and Management (page 282) document.
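A minimal sketch of adjusting a priority from the mongo shell (the member index and value are illustrative):
cfg = rs.conf()
cfg.members[1].priority = 2   // prefer this member in elections
rs.reconfig(cfg)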
 

29.1.3 Consistency
This section provides an overview of the concepts that underpin database consistency and the MongoDB mechanisms to ensure that users have access to consistent data.
 
In MongoDB, all read operations issued to the primary of a replica set are consistent with the last write operation.
 
If clients configure the read preference to permit secondary reads, read operations cannot return from secondary members that have not replicated more recent updates or operations. In these situations the query results may reflect a previous state.
 
This behavior is sometimes characterized as eventual consistency because the secondary member’s state will eventually reflect the primary’s state and MongoDB cannot guarantee strict consistency for read operations from secondary members.
 
There is no way to guarantee consistency for reads from secondary members, except by configuring the client and driver to ensure that write operations succeed on all members before completing successfully.
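In the 2.2-era mongo shell this is expressed with the getLastError command and a w value equal to the number of members; a sketch assuming a three-member set (collection and document are hypothetical):
db.inventory.insert({ sku: "abc123", qty: 100 })
db.runCommand({ getLastError: 1, w: 3, wtimeout: 5000 })   // block until all 3 members have the write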
 
Rollbacks
In some failover situations primaries will have accepted write operations that had not replicated to the secondaries when the failover occurred. This case is rare and typically occurs as a result of a network partition with replication lag. When this member (the former primary) rejoins the replica set and attempts to continue replication as a secondary, it must revert or "roll back" these operations to maintain database consistency across the replica set.
 
MongoDB writes the rollback data to a BSON file in the database's dbpath (page 939) directory. Use bsondump (page 912) to read the contents of these rollback files and then manually apply the changes to the new primary. There is no way for MongoDB to appropriately and fairly handle rollback situations automatically. Therefore you must intervene manually to apply rollback data. Even after the member completes the rollback and returns to secondary status, administrators will need to apply or decide to ignore the rollback data. MongoDB writes rollback data to a rollback/ folder within the dbpath (page 939) directory to files with filenames in the following form:
<database>.<collection>.<timestamp>.bson
For example:
records.accounts.2011-05-09T18-10-04.0.bson
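For instance, to inspect the contents of the rollback file above (bsondump prints each document as JSON so an administrator can decide which operations to re-apply):
bsondump records.accounts.2011-05-09T18-10-04.0.bson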
 
The best strategy for avoiding all rollbacks is to ensure write propagation (page 301) to all or some of the members in the set. Using these kinds of policies prevents situations that might create rollbacks.
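For example, a "majority" write concern causes every acknowledged write to wait until it has replicated to a majority of members, so a surviving partition always contains the write; a sketch using the getLastError command:
db.runCommand({ getLastError: 1, w: "majority", wtimeout: 5000 })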
 
Warning: A mongod (page 887) instance will not rollback more than 300 megabytes of data. If your system needs to rollback more than 300 MB, you will need to manually intervene to recover this data. If this is the case, you will find the following line in your mongod (page 887) log:
[replica set sync] replSet syncThread: 13410 replSet too much data to roll back
In these situations you will need to manually intervene to either save data or to force the member to perform an initial sync from a "current" member of the set by deleting the content of the existing dbpath (page 939) directory.
 
For more information on failover, see:
• The Failover and Recovery (page 295) section in this document.
• The Failover (page 278) section in the Replica Set Operation and Management (page 282) document.
 
Application Concerns
Client applications are indifferent to the configuration and operation of replica sets. While specific configuration depends to some extent on the client drivers (page 427), there is often minimal or no difference between applications using replica sets or standalone instances.
 
There are two major concepts that are important to consider when working with replica sets:
1. Write Concern (page 122).
Write concern sends a MongoDB client a response from the server to confirm successful write operations. In replica sets you can configure replica acknowledged (page 122) write concern to ensure that secondary members of the set have replicated operations before the write returns.
2. Read Preference (page 303)
By default, read operations issued against a replica set return results from the primary. Users may configure read preference on a per-connection basis to prefer that read operations return from the secondary members.
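In the mongo shell, permitting secondary reads on the current connection looks like the following sketch; drivers expose equivalent per-connection read preference modes:
rs.slaveOk()                          // allow this connection to read from secondaries
db.inventory.find({ sku: "abc123" })  // hypothetical query, may now return from a secondary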
 
Read preference and write concern have particular consistency (page 279) implications.
For a more detailed discussion of application concerns, see Replica Set Considerations and Behaviors for Applications and Development (page 301).
 

29.1.4 Administration and Operations
This section provides a brief overview of concerns relevant to administrators of replica set deployments.
For more information on replica set administration, operations, and architecture, see:
• Replica Set Operation and Management (page 282)
• Replica Set Architectures and Deployment Patterns (page 297)
 
Oplog
The oplog (operations log) is a special capped collection that keeps a rolling record of all operations that modify the data stored in your databases. MongoDB applies database operations on the primary and then records the operations on the primary's oplog. The secondary members then replicate this log and apply the operations to themselves in an asynchronous process. All replica set members contain a copy of the oplog, allowing them to maintain the current state of the database. Operations in the oplog are idempotent.
 
By default, the size of the oplog is as follows:
• For 64-bit Linux, Solaris, FreeBSD, and Windows systems, MongoDB will allocate 5% of the available free disk space to the oplog.
If this amount is smaller than a gigabyte, then MongoDB will allocate 1 gigabyte of space.
• For 64-bit OS X systems, MongoDB allocates 183 megabytes of space to the oplog.
• For 32-bit systems, MongoDB allocates about 48 megabytes of space to the oplog.
 
Before oplog creation, you can specify the size of your oplog with the oplogSize (page 944) option. After you start a replica set member for the first time, you can only change the size of the oplog by using the Change the Size of the Oplog (page 332) tutorial.
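For example, to start a member whose oplog is 2 gigabytes (the replica set name is illustrative; the value is in megabytes):
mongod --replSet rs0 --oplogSize 2048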
 
In most cases, the default oplog size is sufficient. For example, if an oplog that is 5% of free disk space fills up in 24 hours of operations, then secondaries can stop copying entries from the oplog for up to 24 hours without becoming stale. However, most replica sets have much lower operation volumes, and their oplogs can hold a much larger number of operations.
 
The following factors affect how MongoDB uses space in the oplog:
• Update operations that affect multiple documents at once.
The oplog must translate multi-updates into individual operations, in order to maintain idempotency. This can use a great deal of oplog space without a corresponding increase in disk utilization.
• If you delete roughly the same amount of data as you insert.
In this situation the database will not grow significantly in disk utilization, but the size of the operation log can be quite large.
• If a significant portion of your workload entails in-place updates.
In-place updates create a large number of operations but do not change the quantity of data on disk. A sketch of the first case follows this list.
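The multi-update case is easy to observe: a single multi-update like the following hypothetical one produces one oplog entry per matched document, not one entry total:
db.inventory.update({ qty: { $lt: 10 } }, { $set: { reorder: true } }, { multi: true })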
 
If you can predict your replica set's workload to resemble one of the above patterns, then you may want to consider creating an oplog that is larger than the default. Conversely, if the predominance of activity of your MongoDB-based application is reads and you are writing a small amount of data, you may find that you need a much smaller oplog.
 
To view oplog status, including the size and the time range of operations, issue the db.printReplicationInfo() (page 842) method. For more information on oplog status, see Check the Size of the Oplog (page 295).
For additional information about oplog behavior, see Oplog Internals (page 310) and Syncing (page 312).
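For instance, run the following from a mongo shell connected to any member; it reports the configured oplog size, the log length, and the times of the first and last oplog events:
db.printReplicationInfo()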
 
Replica Set Deployment
Without replication, a standalone MongoDB instance represents a single point of failure and any disruption of the MongoDB system will render the database unusable and potentially unrecoverable. Replication increases the reliability of the database instance, and replica sets are capable of distributing reads to secondary members depending on read preference. For database workloads dominated by read operations (i.e. "read heavy"), replica sets can greatly increase the capability of the database system.
 
The minimum requirements for a replica set include two members with data, for a primary and a secondary, and an arbiter (page 286). In most circumstances, however, you will want to deploy three data members.
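A minimal sketch of initiating such a set from the mongo shell (hostnames are hypothetical; each mongod is assumed to already be running with --replSet rs0):
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "db0.example.net:27017" },
    { _id: 1, host: "db1.example.net:27017" }
  ]
})
rs.addArb("arb.example.net:27017")   // data-less member that only votes in elections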
 
For those deployments that rely heavily on distributing reads to secondary instances, add additional members to the set as load increases. As your deployment grows, consider adding or moving replica set members to secondary data centers or to geographically distinct locations for additional redundancy. While many architectures are possible, always ensure that the quorum of members required to elect a primary remains in your main facility.
 
Depending on your operational requirements, you may consider adding members configured for a specific purpose, including: a delayed member to help provide protection against human errors and change control, a hidden member to provide an isolated member for reporting and monitoring, and/or a secondary-only member (page 283) for dedicated backups.
 
The process of establishing a new replica set member can be resource intensive on existing members. As a result, deploy new members to existing replica sets significantly before current demand saturates the existing members.
 
Note: Journaling provides single-instance write durability. Journaling greatly improves the reliability and durability of a database. Unless MongoDB runs with journaling, when a MongoDB instance terminates ungracefully, the database can end in a corrupt and unrecoverable state.
You should assume that a database, running without journaling, that suffers a crash or unclean shutdown is in a corrupt or inconsistent state.
Use journaling; however, do not forego proper replication because of journaling.
64-bit versions of MongoDB after version 2.0 have journaling enabled by default.
 
Security
In most cases, replica set administrators do not have to keep additional considerations in mind beyond the normal security precautions that all MongoDB administrators must take. However, ensure that:
• Your network configuration will allow every member of the replica set to contact every other member of the replica set.
• If you use MongoDB's authentication system to limit access to your infrastructure, ensure that you configure a keyFile (page 938) on all members to permit authentication.
For more information, see the Security Considerations for Replica Sets (page 292) section in the Replica Set Operation and Management (page 282) document.
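As a sketch, every member would then be started with the same shared key file (the path is hypothetical):
mongod --replSet rs0 --keyFile /srv/mongodb/keyfile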
 
Architectures
The architecture and design of the replica set deployment can have a great impact on the set's capacity and capability. This section provides a general overview of the architectural possibilities for replica set deployments. However, for most production deployments a conventional 3-member replica set with priority (page 983) values of 1 is sufficient.
 
While the additional flexibility discussed below is helpful for managing a variety of operational complexities, it always makes sense to let complex requirements dictate complex architectures, rather than add unnecessary complexity to your deployment.
 
Consider the following factors when developing an architecture for your replica set:
• Ensure that the members of the replica set will always be able to elect a primary. Run an odd number of members, or run an arbiter on one of your application servers if you have an even number of members.
• With geographically distributed members, know where the "quorum" of members will be in the case of any network partitions. Attempt to ensure that the set can elect a primary among the members in the primary data center.
• Consider including a hidden (page 284) or delayed member (page 285) in your replica set to support dedicated functionality, like backups, reporting, and testing.
• Consider keeping one or two members of the set in an off-site data center, but make sure to configure the priority (page 278) to prevent them from becoming primary.
• Create custom write concerns with replica set tags (page 986) to ensure that applications can control the threshold for a successful write operation. Use these write concerns to ensure that operations propagate to specific data centers or to machines of different functions before returning successfully; see the sketch after this list.
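A minimal sketch of such a tagged write concern (the tag values and the mode name MultiDC are hypothetical): tag the members, define a getLastErrorModes rule requiring two distinct dc values, then reference the mode as the w value:
cfg = rs.conf()
cfg.members[0].tags = { dc: "east" }
cfg.members[1].tags = { dc: "west" }
cfg.settings = { getLastErrorModes: { MultiDC: { dc: 2 } } }   // writes must reach 2 distinct dc tag values
rs.reconfig(cfg)
db.runCommand({ getLastError: 1, w: "MultiDC", wtimeout: 5000 })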
 
For more information regarding replica set configuration and deployments, see Replica Set Architectures and Deployment Patterns (page 297).
 
 
END 13/07/22 0:39:26        