MongoDB Read Concern: the Difference Between Majority and Linearizable

I read many articles, including the official documentation, without really understanding the concrete difference between these two Read Concern levels. In the end, the most convincing explanation turned out to be in Mongo's git repository.

The official documentation explains majority and linearizable as follows:

Performance Comparisons

Unlike “majority”, “linearizable” read concern confirms with secondary members that the read operation is reading from a primary that is capable of confirming writes with { w: “majority” } write concern. As such, reads with linearizable read concern may be significantly slower than reads with “majority” or “local” read concerns.

Majority guarantees that the data you read has been acknowledged by a majority of the nodes, so it prevents dirty reads. However, when a primary goes down and a new one is elected, there can briefly be two primaries, and a read served by the old primary during that window can return stale data. Linearizable avoids this by performing an extra write confirmation before returning.
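To make the difference concrete, here is a minimal mongo shell sketch of the same query under each level. The collection name test and the { id: 5 } filter simply follow the experiment later in this post:

// Majority read: served from the committed snapshot; usually as fast as a
// local read, but it may be stale.
db.test.find({ id: 5 }).readConcern("majority")

// Linearizable read: primary only; it blocks until a no-op write is
// majority-acknowledged, so bounding it with maxTimeMS is recommended.
db.test.find({ id: 5 }).readConcern("linearizable").maxTimeMS(10000)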

So what exactly is this write confirmation?

The replication README in the git repository describes it as follows:

Majority

Majority does a timestamped read at the stable timestamp (also called the last committed snapshot in the code, for legacy reasons). The data read only reflects the oplog entries that have been replicated to a majority of nodes in the replica set. Any data seen in majority reads cannot roll back in the future. Thus majority reads prevent dirty reads, though they often are stale reads.

Read concern majority reads usually return as fast as local reads, but sometimes will block. Read concern majority reads do not wait for anything to be committed; they just use different snapshots from local reads. They do block though when the node metadata (in the catalog cache) differs from the committed snapshot. For example, index builds or drops, collection creates or drops, database drops, or collmod’s could cause majority reads to block. If the primary receives a createIndex command, subsequent majority reads will block until that index build is finished on a majority of nodes. Majority reads also block right after startup or rollback when we do not yet have a committed snapshot.
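The catalog-change case is easy to picture in the shell. A sketch, assuming a collection named test (the index spec is only an illustration):

// Kick off an index build on the primary...
db.test.createIndex({ id: 1 })

// ...subsequent majority reads on the collection may block until the build
// finishes on a majority of nodes, so bound them with maxTimeMS.
db.test.find({ id: 5 }).readConcern("majority").maxTimeMS(10000)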

Linearizable

Linearizable read concern actually does block for some time. Linearizability guarantees that if one thread does a write that is acknowledged and tells another thread about that write, then that second thread should see the write. If you transiently have 2 primaries (one has yet to step down) and you read the data from the old primary, the new one may have newer data and you may get a stale read.

In other words, a linearizable read can block. The scenario to guard against: a transient double primary (one has not yet stepped down), where reading from the old primary returns stale data even though the new primary already holds newer writes.

To prevent reading from stale primaries, reads block to ensure that the current node remains the primary after the read is complete. Nodes just write a noop to the oplog and wait for it to be replicated to a majority of nodes. The node reads data from the most recent snapshot, and then the noop write occurs after the fact. Thus, since we wait for the noop write to be replicated to a majority of nodes, linearizable reads satisfy all of the same guarantees of read concern majority, and then some. Linearizable read concern reads are only done on the primary, and they only apply to single document reads, since linearizability is only defined as a property on single objects.

The key is the no-op: to prove it is still the real primary, the node writes a no-op to the oplog and blocks the read until that no-op has replicated to a majority; the data itself comes from the most recent snapshot, with the no-op written after the fact. Linearizable therefore satisfies every guarantee of majority and more, but it only runs on the primary and only applies to single-document reads, since linearizability is defined as a property on single objects.
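A sketch of the guarantee this buys you (the document and write are hypothetical): a write acknowledged with { w: "majority" }, followed by a linearizable read, will always be observed:

// A write acknowledged by a majority of nodes...
db.test.insertOne({ id: 5, v: 1 }, { writeConcern: { w: "majority" } })

// ...is guaranteed visible to a later linearizable read on the primary, even
// across a failover, because the read first confirms this node is still primary.
db.test.find({ id: 5 }).readConcern("linearizable").maxTimeMS(10000)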

So linearizable works by issuing a no-op write and waiting for a majority of nodes to replicate it. To verify this, I ran a simple experiment:

// Trigger a linearizable read on the replica set
db.test.find( { id: 5 } ).readConcern("linearizable").maxTimeMS(10000)

// Check oplog.rs: there is a no-op entry (op: "n") whose msg is "linearizable read"
use local
db.oplog.rs.find({ op: "n" }).sort({ ts: -1 }).limit(1)
{ "ts" : Timestamp(1632672934, 1), "t" : NumberLong(1), "h" : NumberLong("7843100492836678140"), "v" : 2, "op" : "n", "ns" : "", "wall" : ISODate("2021-09-26T16:15:34.402Z"), "o" : { "msg" : "linearizable read" } }

While looking through the oplog you will also notice that mongo writes a periodic no-op every ten seconds. The reason is likewise described in the git README (a query sketch follows the quote):

Primary must write periodic no-ops

Consider a scenario in which the primary does not:

  1. There are no writes for an hour.
  2. A client performs a heavy read-only workload with read preference mode “nearest” and maxStalenessSeconds of 90 seconds.
  3. The primary receives a write.
  4. In the brief time before any secondary replicates the write, the client re-checks all servers.
  5. Since the primary’s lastWriteDate is an hour ahead of all secondaries’, the client only queries the primary.
  6. After heartbeatFrequencyMS, the client re-checks all servers and finds that the secondaries aren’t lagging after all, and resumes querying them.

This apparent “replication lag spike” is just a measurement error, but it causes exactly the behavior the user wanted to avoid: a small replication lag makes the client route all queries from the secondaries to the primary.

Therefore an idle primary must execute a no-op every 10 seconds (idleWritePeriodMS) to keep secondaries’ lastWriteDate values close to the primary’s clock. The no-op also keeps opTimes close to the primary’s, which helps mongos choose an up-to-date secondary to read from in a CSRS.

Monitoring software like MongoDB Cloud Manager that charts replication lag will also benefit when spurious lag spikes are solved.
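These periodic no-ops are easy to spot in the oplog. A minimal query sketch; note that the message text "periodic noop" is what my server version writes and may vary across versions:

use local
// List the most recent idle-primary no-ops (one every idleWritePeriodMS = 10s)
db.oplog.rs.find({ op: "n", "o.msg": "periodic noop" }).sort({ ts: -1 }).limit(3)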

This is my first post, so corrections are welcome if anything is inaccurate.

Finally, here is the link to the git source:
Mongo repl README
