RAC GES/GCS Principles (5)

傻儿哥

Column: ORACLE | Tags: buffer cache, oracle, disk, network

Resource Roles

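A role is assigned to every resource held by an instance. This role can be either local or global.
When a block is first read into the buffer cache of an instance and no other instance has read the
same block, the block can be managed locally.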

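The GCS assigns a local role to the block. If the block is modified by one instance and then
transferred to another instance, it becomes globally managed, and the GCS assigns it a global role.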

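When the block is transferred, the resource mode may remain exclusive, or it may be converted from
exclusive to shared. The GCS tracks the location, resource mode, and resource role of every block in
the buffer caches of all instances. The GCS is used to ensure cache coherency when the current version
of a block is in the buffer cache of one instance and another instance requires the same block for update.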
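As a rough illustration of this bookkeeping, the following Python sketch models one block's entry with the three attributes the GCS tracks: where the block is cached, its mode, and its role. `BlockResource`, its fields, and `transfer()` are hypothetical names invented for this example, not Oracle internals, and the mode labels `X` (exclusive) and `S` (shared) are assumed shorthand.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BlockResource:
    """Hypothetical, simplified model of the GCS state for one data block."""
    block_id: int
    holders: Set[str] = field(default_factory=set)  # instances caching the block
    mode: str = "X"          # assumed labels: "X" = exclusive, "S" = shared
    role: str = "local"      # "local" or "global"
    modified: bool = False   # has the current holder changed the block?

    def transfer(self, src: str, dst: str, downgrade_to_shared: bool = False) -> None:
        """Ship the block from src to dst and update mode and role."""
        if src not in self.holders:
            raise ValueError(f"{src} does not hold block {self.block_id}")
        if self.modified:
            # A block modified by one instance and sent to another becomes global.
            self.role = "global"
        if downgrade_to_shared:
            # Exclusive converted to shared: both instances keep a copy.
            self.mode = "S"
            self.holders.add(dst)
        else:
            # Mode stays exclusive: only the receiver holds the current version now.
            self.holders = {dst}


r = BlockResource(block_id=42, holders={"inst1"}, modified=True)
r.transfer("inst1", "inst2")
print(r.role, r.mode, r.holders)  # global X {'inst2'}
```

An unmodified block transferred this way keeps its local role, matching the rule that only a modified, transferred block becomes globally managed.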

Cache Coherency

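Cache coherency is an important concept in many computing technologies. In an Oracle RAC database,
it is defined as the synchronization of data blocks across multiple caches: if any instance updates a
block, all other instances will see that change the next time they access the block.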

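The GCS ensures cache coherency by requiring instances to acquire resources at a global level before
modifying a block. The GCS synchronizes global cache access, allowing only one instance to modify a
block at a time.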

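Oracle uses a multiversioning architecture in which there can be only one current version of a block
across all instances in the cluster. Only the current version of a block may be updated. There can
also be any number of consistent read (CR) versions of the block. A CR version represents a snapshot
of the data in the block at a specific point in time, expressed as an SCN.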
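One way to picture a consistent read version is the current block with its newer changes rolled back until it matches a target SCN (the article later describes a remote instance applying undo in exactly this way). This toy Python sketch assumes a block is a dict of row values and that undo is a list of hypothetical `(scn, row, old_value)` records; real Oracle undo is organized very differently, so treat it only as an illustration of the idea.

```python
from typing import Dict, List, Tuple

# Hypothetical undo record: (SCN of the change, row key, row value before the change)
UndoRecord = Tuple[int, str, str]


def build_cr_version(current: Dict[str, str],
                     undo: List[UndoRecord],
                     target_scn: int) -> Dict[str, str]:
    """Return a snapshot of the block as of target_scn; the current version is untouched."""
    snapshot = dict(current)
    # Undo changes newest-first; every change made after target_scn is rolled back.
    for scn, row, old_value in sorted(undo, key=lambda rec: rec[0], reverse=True):
        if scn <= target_scn:
            break  # remaining records were made at or before target_scn and stay visible
        snapshot[row] = old_value
    return snapshot


current = {"r1": "C", "r2": "x"}
undo = [(100, "r1", "A"), (200, "r1", "B")]  # r1: A -> B at SCN 100, B -> C at SCN 200
print(build_cr_version(current, undo, 150))  # {'r1': 'B', 'r2': 'x'}
```

Because the function copies the block first, the current version is never modified, which mirrors the rule that CR versions are read-only snapshots.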

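Consistent read blocks cannot be modified, although they can be used as a starting point for
constructing earlier consistent versions. The GCS manages both current and CR versions of a block. If
a local instance has modified a block and a remote instance requests it, the local instance creates a
past image (PI) of the block and then sends the block to the remote instance.

If a node or instance fails, the PI can be used to reconstruct the current and CR versions of the block.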

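Cache Fusion

Cache Fusion addresses several types of concurrency between different nodes:
• Concurrent reads
• Concurrent reads and writes
• Concurrent writes

Concurrent Reads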

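Concurrent reads occur on multiple nodes when two instances need to read the same block. No
synchronization is required in this case, because multiple instances can share a block for read
access without any conflict.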
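The rule that readers never conflict, combined with the one-writer-at-a-time rule from the Cache Coherency section, can be summarized as a compatibility table. This sketch assumes just two modes, shared (`S`) and exclusive (`X`); the real GCS tracks more state (roles, past images), and a reader facing an exclusive holder may still be served a consistent read copy, which this table does not model. `COMPATIBLE` and `can_grant` are hypothetical names.

```python
# Hypothetical two-mode compatibility table: (mode already held, mode requested) -> compatible?
COMPATIBLE = {
    ("S", "S"): True,   # concurrent reads share the block without conflict
    ("S", "X"): False,  # a writer needs the shared holders to give the block up first
    ("X", "S"): False,  # no shared current copy while another instance holds it exclusively
    ("X", "X"): False,  # only one instance may modify the block at a time
}


def can_grant(held_by_others, requested):
    """Grant a mode only if it is compatible with every mode other instances hold."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)


print(can_grant(["S", "S"], "S"))  # True
print(can_grant(["S"], "X"))       # False
```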

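Concurrent Reads and Writes
If one instance needs to read a block that another instance has modified but not yet written to disk,
the block can be transferred across the interconnect from the holding instance to the requesting
instance. The transfer is performed by the GCS background processes (LMSn) on the participating instances.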

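Concurrent Writes
When an instance updates a block in the buffer cache, the resulting block is called a dirty buffer.
Only the current version of a block can be modified, so the instance must acquire the current version
before it can modify the block. If the current version is not available, the instance must wait.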

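Before an instance can modify a block in the buffer cache, it must construct a redo record containing
all the changes that will be applied to the block. Once the redo record has been copied to the redo
buffer, the changes it contains can be applied to the block in the buffer cache. The dirty block is
later written to disk by the DBWn background process. However, the dirty block cannot be written to
disk until the corresponding redo has been flushed from the redo log buffer to the redo log file.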
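The ordering rule in this paragraph (redo reaches the redo log file before the dirty block reaches disk) is the classic write-ahead logging constraint. The sketch below assumes that each buffer remembers the position of the last redo record generated for it, and that LGWR flushes the whole redo buffer at once. `RedoLog`, `Buffer`, `modify()`, and `dbwn_can_write()` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RedoLog:
    """Toy redo stream: records are numbered 1, 2, 3, ... as they enter the redo buffer."""
    last_generated: int = 0  # highest record copied into the redo buffer
    last_flushed: int = 0    # highest record written to the redo log file by LGWR

    def generate(self) -> int:
        self.last_generated += 1
        return self.last_generated

    def flush(self) -> None:
        # LGWR writes everything in the redo buffer to the redo log file.
        self.last_flushed = self.last_generated


@dataclass
class Buffer:
    dirty: bool = False
    redo_pos: int = 0  # last redo record describing a change to this block


def modify(buf: Buffer, log: RedoLog) -> None:
    # The redo record is built and copied to the redo buffer before the change is applied.
    buf.redo_pos = log.generate()
    buf.dirty = True


def dbwn_can_write(buf: Buffer, log: RedoLog) -> bool:
    """DBWn may write a dirty block only once its redo is in the redo log file."""
    return buf.dirty and buf.redo_pos <= log.last_flushed


log, buf = RedoLog(), Buffer()
modify(buf, log)
print(dbwn_can_write(buf, log))  # False: redo is still only in the redo buffer
log.flush()
print(dbwn_can_write(buf, log))  # True
```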

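If the local instance needs to update a block that it does not currently hold, it contacts the
resource master to find out whether any other instance is holding the block. If a remote instance
holds a dirty version of the block, the remote instance sends the dirty block across the interconnect,
so that the local instance can update the most recent version of the block. The remote instance keeps
a copy of the dirty block, called a past image (PI), in its buffer cache until it receives confirmation
that the block has been written to disk; the GCS manages PIs and uses them in failure recovery.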

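Note that an instance does not have to hold a block in exclusive mode until its transaction completes.
Once a local instance has modified a row in the current version of a block, the block can be passed to
a remote instance, where another transaction can modify a different row. However, the remote instance
cannot modify the row changed by the local instance until the local transaction commits or rolls back.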
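The separation between block ownership and row locks can be shown with a toy model in which the current version of a block moves between instances while each row remembers which transaction locked it. The model keeps row locks inside the block object, loosely echoing the fact that Oracle stores row-lock information in the block itself. `Block`, `update_row`, `end_transaction`, and the transaction IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Block:
    """Toy current version of a block: rows plus the row locks stored with them."""
    holder: str                                               # instance holding the current version
    rows: Dict[str, str] = field(default_factory=dict)
    row_locks: Dict[str, str] = field(default_factory=dict)  # row key -> locking transaction

    def update_row(self, instance: str, txn: str, row: str, value: str) -> bool:
        """Update a row; return False if another open transaction has locked it."""
        if instance != self.holder:
            raise RuntimeError("obtain the current version of the block first")
        owner = self.row_locks.get(row)
        if owner is not None and owner != txn:
            return False  # wait until that transaction commits or rolls back
        self.rows[row] = value
        self.row_locks[row] = txn
        return True

    def end_transaction(self, txn: str) -> None:
        """Commit or rollback releases txn's row locks (undoing values is not modeled)."""
        self.row_locks = {r: t for r, t in self.row_locks.items() if t != txn}


blk = Block(holder="inst1", rows={"r1": "a", "r2": "b"})
blk.update_row("inst1", "T1", "r1", "a2")         # T1 on inst1 locks r1
blk.holder = "inst2"                              # block shipped before T1 commits
print(blk.update_row("inst2", "T2", "r2", "b2"))  # True: different row
print(blk.update_row("inst2", "T2", "r1", "a3"))  # False: r1 still locked by T1
```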

In this respect, row-locking behavior is identical to that of a single-instance Oracle database.

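Before Oracle 8.1.5, if a local instance needed a block that was dirty in the buffer cache of another
instance, the remote instance wrote the block back to the datafile and then signaled the local instance.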

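The local instance then read the block from disk into its buffer cache. This process is known as a
disk ping. Disk pings are very resource-intensive, because they require both disk I/O and IPC between
the instances.

In Oracle 8.1.5 and later, if the local instance needs a consistent read version of a block that is
dirty in a remote instance's buffer cache, the remote instance first constructs a consistent read
image of the block at the required SCN and then sends it across the interconnect. This algorithm is
known as Cache Fusion Phase I.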

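Phase I was a significant step forward in cluster database technology. In Oracle 9.0.1 and later, both
consistent read blocks and current blocks can be sent across the interconnect. Transferring current
blocks is made possible by the existence of past images (PIs). This algorithm is known as Cache
Fusion Phase II.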
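The release history above can be condensed into a small decision function. It encodes only what this section states: disk pings before 8.1.5, consistent read blocks over the interconnect from 8.1.5 (Phase I), and current blocks as well from 9.0.1 (Phase II). Treating current-block requests between 8.1.5 and 9.0.1 as still going through disk is an inference from the statement that Phase I shipped only consistent read blocks. The function name and the returned strings are hypothetical.

```python
def dirty_block_transfer(version: str, need_current: bool) -> str:
    """How a block that is dirty in a remote cache reaches the requester (per the text above)."""
    v = tuple(int(part) for part in version.split("."))
    if v < (8, 1, 5):
        # Remote instance writes the block to the datafile; requester reads it from disk.
        return "disk ping"
    if not need_current:
        # Cache Fusion Phase I: remote builds a CR image at the required SCN and ships it.
        return "CR block over interconnect"
    if v >= (9, 0, 1):
        # Cache Fusion Phase II: current blocks can be shipped too, enabled by past images.
        return "current block over interconnect"
    # Inferred: between 8.1.5 and 9.0.1 only CR blocks were shipped over the interconnect.
    return "disk ping"


print(dirty_block_transfer("8.0.6", need_current=False))  # disk ping
print(dirty_block_transfer("9.2.0", need_current=True))   # current block over interconnect
```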

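Although Cache Fusion processing in a RAC database incurs overhead in the form of additional
messaging, it does not necessarily increase the amount of I/O performed against the storage.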

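When a local instance tries to read a block that is not currently in its local buffer cache, it first
contacts the resource master, which checks the current status of the block in the GRD. If a remote
instance is holding the block, the resource master asks that instance to send the block to the local
instance. For a consistent read, the remote instance applies whatever undo is necessary to restore
the block to the appropriate SCN.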
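The read path described here can be sketched as a single function: the resource master looks the block up in the GRD, a remote holder ships it over the interconnect, and only a block that no other instance caches is read from disk. The GRD is modeled as a plain dict from block ID to holding instance, and `read_block`, `ship_from`, and `read_from_disk` are hypothetical stand-ins. The undo step for consistent reads is only noted in a comment.

```python
from typing import Callable, Dict, Optional


def read_block(block_id: int,
               requester: str,
               grd: Dict[int, str],
               ship_from: Callable[[str, int], str],
               read_from_disk: Callable[[int], str]) -> str:
    """Toy model of a read that misses the requester's buffer cache."""
    # The resource master checks the block's status in the GRD.
    holder: Optional[str] = grd.get(block_id)
    if holder is not None and holder != requester:
        # The master asks the holder to send the block over the interconnect.
        # For a consistent read the holder would first apply undo up to the required SCN.
        return ship_from(holder, block_id)
    # No other instance caches the block, so it has to come from disk.
    return read_from_disk(block_id)


def via_interconnect(holder: str, blk: int) -> str:
    return f"block {blk} from {holder}"


def via_disk(blk: int) -> str:
    return f"block {blk} from disk"


grd = {7: "inst2"}
print(read_block(7, "inst1", grd, via_interconnect, via_disk))  # block 7 from inst2
print(read_block(8, "inst1", grd, via_interconnect, via_disk))  # block 8 from disk
```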

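Therefore, if the local instance tries to read a block that is in the cache of any other instance, it
receives a copy of the block over the interconnect, and it does not need to read the block from disk.
Although this mechanism requires the participation of two or three instances and consumes CPU and
network resources, that cost is generally lower than the cost of a single physical disk I/O.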

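When a local instance modifies a block, the changes are written immediately to the redo buffer and
are flushed to the redo log by the log writer (LGWR) background process when the transaction commits.
However, the modified block is not written to disk by the database writer (DBWn) background process
until a free buffer is needed in the buffer cache for another block, or a checkpoint occurs.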

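If a remote instance requests the block for modification, the local instance does not write the block
to disk. Instead, the block is passed over the interconnect to the remote instance for further
modification. A past image (PI) block, which is a copy of the block as it was when transferred, is
retained in the buffer cache of the local instance until it receives confirmation that the remote
instance has written the block to disk. Only then can the PI be discarded.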
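The past-image lifecycle can be sketched as a small per-instance store. Shipping the current version to a remote writer leaves a PI behind, and the PI is dropped only when the write confirmation arrives. `PastImageCache` and its methods are hypothetical names; real PI handling (for example, deciding which writes make which PIs obsolete) is more involved than this.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PastImageCache:
    """Toy per-instance store of current blocks and past images, keyed by block ID."""
    current: Dict[int, str] = field(default_factory=dict)      # blocks whose current version we hold
    past_images: Dict[int, str] = field(default_factory=dict)  # PIs kept for recovery

    def ship_current(self, block_id: int) -> str:
        """Hand the current version to a remote writer instead of writing it to disk."""
        data = self.current.pop(block_id)
        self.past_images[block_id] = data  # keep a PI of the block as it was when transferred
        return data

    def on_write_confirmed(self, block_id: int) -> None:
        """Remote instance reports the block is on disk, so the PI is no longer needed."""
        self.past_images.pop(block_id, None)


local = PastImageCache(current={5: "v1"})
sent = local.ship_current(5)   # block travels over the interconnect; PI stays behind
print(sent, local.past_images)  # v1 {5: 'v1'}
local.on_write_confirmed(5)
print(local.past_images)        # {}
```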

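As with reads, this mechanism requires the participation of two or three instances and is designed to
avoid disk I/O at the expense of additional CPU and network resources.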

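The number of instances involved in a read or write request that is satisfied by a block transfer
across the interconnect depends on the location of the resource master. If the resource master is the
same instance as the source instance for a read, or the destination instance for a write, then only two
instances participate in the operation. If the resource master is on a different instance from the
source or destination instance, three instances participate.

Obviously, this three-instance case can only arise if the cluster has at least three active instances.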
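The participant count depends only on which instances act as master, source (sender), and destination (receiver), so it can be computed directly. Counting the distinct instances among the three is our reading of the rule above. The text names only the read-source and write-destination cases explicitly, and the function name is hypothetical.

```python
def instances_involved(master: str, source: str, destination: str) -> int:
    """Count the distinct instances taking part in one interconnect block transfer."""
    if source == destination:
        raise ValueError("a block transfer needs two different instances")
    # The master adds a third participant only if it is neither the sender nor the receiver.
    return len({master, source, destination})


print(instances_involved("A", "A", "B"))  # 2: master is the source
print(instances_involved("B", "A", "B"))  # 2: master is the destination
print(instances_involved("C", "A", "B"))  # 3: three-way transfer
```

In a two-node cluster the master must be one of the two participants, so the result can never be 3.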

The original English text follows:

||||||||||||||||||||||||||||||||||||||||||||||||||||||||

Resource Roles
A role is assigned to every resource held by an instance. This role can be either local or global. When
a block is initially read into the buffer cache of an instance and no other instance has read the same
block, the block can be locally managed.

The GCS assigns a local role to the block. If the block has been modified by one instance and is transmitted to another instance, then it becomes globally managed, and the GCS assigns a global role to the block.

When the block is transferred, the resource mode may remain exclusive, or it may be converted from exclusive to shared.

The GCS tracks the location, resource mode, and resource role of each block in the buffer cache
of all instances. The GCS is used to ensure cache coherency when the current version of a data block
is in the buffer cache of one instance and another requires the same block for update.

Cache Coherency
Cache coherency is an important concept in many computing technologies. In an Oracle RAC
database, it is defined as the synchronization of data in multiple caches, so that reading a memory
location through any cache will return the most recent data written to that location through any
other cache. In other words, if a block is updated by any instance, then all other instances will be
able to see that change the next time they access the block.

The GCS ensures cache coherency by requiring instances to acquire resources at a global level
before modifying a database block. The GCS synchronizes global cache access, allowing only one
instance to modify a block at a time.

Oracle uses a multiversioning architecture, in which there can be one current version of a block
throughout all instances in the cluster. Only the current version of a block may be updated. There
can also be any number of consistent read (CR) versions of the block. A consistent read version of
a block represents a snapshot of the data in that block at a specific point in time. The time is represented by the SCN.

Consistent read blocks cannot be modified, though they can be used as a starting
point to construct earlier consistent blocks. The GCS manages both current and consistent read
blocks.

If a local instance has modified a block and a remote instance requests it, the local instance
creates a past image (PI) of the block before it transfers the block to the remote instance.

In the event of a node or instance failure, the PI can be used to reconstruct current and consistent read versions of the block.

Cache Fusion
Cache Fusion addresses several types of concurrency between different nodes:
• Concurrent reads
• Concurrent reads and writes
• Concurrent writes
Concurrent Reads
Concurrent reads on multiple nodes occur when two instances need to read the same block. In this
case, no synchronization is required, as multiple instances can share data blocks for read access
without any conflict.

Concurrent Reads and Writes
If one instance needs to read a block that was modified by another instance and has not yet been
written to disk, this block can be transferred across the interconnect from the holding instance to
the requesting instance. The block transfer is performed by the GCS background processes (LMSn)
on the participating instances.

Concurrent Writes
When an instance updates a block in the buffer cache, the resulting block is called a dirty buffer.
Only the current version of the block can be modified. The instance must acquire the current version
of the block before it can modify it. If the current version of the block is not currently available,
the instance must wait.

Before an instance can modify a block in the buffer cache, it must construct a redo record
containing all the changes that will be applied to the block. When the redo record has been copied to
the redo buffer, the changes it contains can be applied to the block(s) in the buffer cache. The dirty
block will subsequently be written to disk by the DBWn background process. However, the dirty block
cannot be written to disk until the change vector in the redo buffer has been flushed to the redo
log file.

If the local instance needs to update a block, and it does not currently hold that block, it contacts
the resource master to identify whether any other instance is currently holding the block. If
a remote instance is holding a dirty version of the block, the remote instance will send the dirty
block across the interconnect, so that the local instance can perform the updates on the most recent
version of the block.

The remote instance will retain a copy of the dirty block in its buffer cache until
it receives a message confirming that the block has subsequently been written to disk. This copy is
called a past image (PI). The GCS manages past images and uses them in failure recovery.

Note that a block does not have to be held by an instance in exclusive mode until the transaction
has completed. Once a local instance has modified a row in current version of the block, the block
can be passed to a remote instance where another transaction can modify a different row. However,
the remote instance will not be able to modify the row changed by the local instance until the transaction on the local instance either commits or rolls back.

In this respect, row locking behavior is identical to that on a single-instance Oracle database.

Prior to Oracle 8.1.5, if a local instance required a block that was currently dirty in the buffer
cache of another instance, the remote instance would write the block back to the datafile and signal
the local instance.

The local instance would then read the block from disk into its buffer cache. This
process is known as a disk ping. Disk pings are very resource intensive, as they require disk I/O and
IPC communication between the instances.

In Oracle 8.1.5 and above, if the local instance required a block that was currently dirty in the
buffer cache of another instance for a consistent read, the remote instance would construct a consistent image of the block at the required SCN and send the consistent block across the interconnect.

This algorithm was known as Cache Fusion Phase I and was a significant step forward in cluster
database technology. In Oracle 9.0.1 and above, both consistent blocks and current blocks can be sent across the interconnect. The transfer of current blocks is made possible by the existence of past images (PI). This algorithm is known as Cache Fusion Phase II.

Although in a RAC database, Cache Fusion processing incurs overheads in the form of additional
messaging, it does not necessarily increase the amount of I/O performed against the storage.

When a local instance attempts to read a block that is not currently in the local buffer cache, it first
contacts the resource master, which checks the current status of the block in the GRD. If a remote
instance is currently holding the block, the resource master requests that the remote instance send
the block to the local instance. For a consistent read, the remote instance will apply any undo necessary to restore the block to the appropriate SCN.

Therefore, if the local instance attempts to read a block that is in the cache of any other instance, it will receive a copy of the block over the interconnect network. In this case, it is not necessary for the local instance to read the block from disk.

While this mechanism requires the participation of two or three instances, consuming CPU and
networking resources, these are generally less expensive than the cost of performing a single physical
disk I/O.

When a local instance modifies a block, the changes are written immediately to the redo buffer
and are flushed to the redo log by the log writer (LGWR) background process when the transaction is
committed. However, the modified block is not written to disk by the database writer (DBWn) background
process until a free buffer is required in the buffer cache for another block or a checkpoint
occurs.

If a remote instance requests the block for modification, the block will not be written to disk
by the local instance. Instead, the block will be passed over the interconnect network to the remote
instance for further modification.

A past image (PI) block, which is a copy of the block at the time it was transferred, is retained in the buffer cache of the local instance until it receives confirmation that the remote instance has written the block to disk.

As with reads, this mechanism requires the participation of two or three instances and is designed to avoid disk I/Os at the expense of additional CPU and networking resources.

The number of nodes involved in a read or write request that is satisfied by a block transfer
across the interconnect depends on the location of the resource master.

If the resource master is the same instance as the source instance for a read or the destination instance for a write, then only two instances will participate in the operation.

If the resource master is on a different instance than the source or destination instance, three instances will participate in the operation. Obviously, there must be at least three active instances in the cluster for this situation to arise.
