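Oracle RAC Locking: The Oracle RAC Cache Fusion Mechanism Explained

Cache Fusion is the core mechanism of RAC. I have touched on it briefly in an earlier blog post:

Oracle RAC Concurrency and Architecture

I found a very detailed article on the subject online and am reposting it here. The link is as follows: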

1. Introduction

This post is about Oracle Cache Fusion technology as implemented in Oracle Database 10g RAC. We will discuss only Cache Fusion, so you should already know the RAC architecture; please check the Oracle documentation for that. You can also visit my previous post about Oracle RAC installation for some basic information and installation details.

Cache Fusion was partially implemented in Oracle 8i in OPS (Oracle Parallel Server). Before Oracle 8i the situation was different. In a multi-instance Oracle Parallel Server, if one instance asked for a data block that was currently modified by another instance of the same database, the holding instance had to write the block to disk so that the requesting instance could read it. This is called a “disk ping”, and it greatly degraded database performance. With Oracle 8i, partial Cache Fusion was implemented.

Oracle 8i OPS had a background process called the “Block Server Process”, which was responsible for Cache Fusion. The following table shows the scenarios in which Cache Fusion worked in Oracle 8i OPS and those in which it did not. These limitations are not present in Oracle 10g RAC.

[Table: Cache Fusion scenarios in Oracle 8i OPS]

So when the requesting instance asked for a block that the holding instance held in read or write mode, and the block was dirty, Cache Fusion worked and the block was copied from the cache of the holding instance to the requesting instance. But if the block was not dirty and was present in the holding instance, the requesting instance had to read the block from the datafile. Also, if the block was open for write in the holding instance and another instance wanted to update the same block, the holding instance had to write the block to disk so that the requesting instance could read it.

2. Concept of Cache Fusion

Cache Fusion is basically about fusing the in-memory buffer caches of multiple instances into one single cache. For example, if we have 3 instances in a RAC using the same datafiles, and each instance has its own buffer cache in its own SGA, then Cache Fusion makes the database behave as if it had a single instance whose total buffer cache is the sum of the buffer caches of all 3 instances. The figure below shows what I mean.

[Figure: the buffer caches of three instances fused into a single cache]

This behavior is possible because of the high-speed interconnect in the cluster. Each instance is connected to the other instances through it, which makes it possible to share memory between 2 or more servers. Previously only the datafiles could be shared; now, because of the interconnect, even the cache memory can be shared.

But how does this help? Suppose one instance holds a data block and is updating it, and another instance needs the same block. The block can then be copied from the buffer cache of the holding instance to the buffer cache of the requesting instance over the high-speed interconnect. This interconnect is a private connection that the instances use for sending data blocks and other messages; external users cannot use it. It is this interconnect that makes multiple servers behave like a cluster: the servers are bound together by it.

Moving further, we now know how the cluster is formed, what its backbone is, and what exactly we call “cache fusion”. Next we will see how Cache Fusion works, but before that we need to cover a few topics that are important to understand.

We will discuss the following topics before discussing Cache Fusion:

(1) Cache Coherency

(2) Multi-Version consistency model

(3) Resource Co-ordination – Synchronization

(4) Global Cache Service (GCS)

(5) Global Enqueue Service

(6) Global Resource Directory

(7) GCS resource modes and roles

(8) Past Images

(9) Block access modes and buffer states

I promise this won't be too heavy. Let's look at an overview of these concepts. I won't go into the details, just enough for you to understand Cache Fusion.

2.1 Cache Coherency

If we consider a single-instance database, whenever a user queries for data he gets a consistent view of it. For example, suppose one user has already read a block of data and changed some rows in the buffer cache. If another user wants to read data from the same block, Oracle makes a copy of that block in the buffer cache and applies the undo information from the undo tablespace to get a consistent view of the data. This consistent data is then presented to the user who wants to read it.

This is called maintaining consistency of data.

Now consider a multi-instance RAC system, where a data block might not be present in the local instance; a user might be updating the block in some other instance. If the data blocks are already available in the local instance, they are immediately available to the user. If they are present in some other instance within the cluster, they are transferred into the local buffer cache.

Maintaining the consistency of data blocks across the buffer caches of multiple instances is called “Cache Coherency”.

2.2 Multi-Version consistency model

The multi-version consistency model distinguishes between the current version of a data block and one or more read-consistent versions of it. The current block is the one that contains all the changes, committed as well as uncommitted. For example, a user fires a DML statement on a data block that is not present in any instance. The block is read from disk into the buffer cache, where the value gets changed. The user then commits and fires another DML statement on the same data block. Now the block is dirty and contains committed as well as uncommitted changes.

Suppose this data block is requested by another user for reading. Oracle then makes a copy, applies undo information to build a Consistent Read (“CR”) copy of the block, and ships it to the requesting instance. Thus we have multiple versions of the same data block, each of them consistent with respect to the user who requested it.

During the course of operation there can be many more versions of the same data block, each of them consistent with respect to some point in time.
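To make the idea of a CR copy concrete, here is a minimal Python sketch. It is a toy model, not Oracle's implementation: the `UndoRecord` type, the `build_cr_copy` function and the row key are hypothetical, and real undo processing also involves transaction tables and commit SCNs, which are ignored here. The sketch copies the current block and rolls back every change made after the reader's snapshot SCN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UndoRecord:
    """Hypothetical undo entry: the value a row had before a change."""
    scn: int           # SCN at which the change was made
    row: str           # row identifier inside the block
    old_value: object  # value before the change


def build_cr_copy(current_rows, undo_records, snapshot_scn):
    """Return a read-consistent copy of the block as of snapshot_scn.

    The current block is left untouched; changes newer than the
    snapshot are rolled back on the copy, newest first.
    """
    cr_rows = dict(current_rows)
    for rec in sorted(undo_records, key=lambda r: r.scn, reverse=True):
        if rec.scn > snapshot_scn:
            cr_rows[rec.row] = rec.old_value
    return cr_rows


# The row started at 30, became 24 at SCN 100 and 40 at SCN 200.
current = {"salesid=10": 40}
undo = [
    UndoRecord(scn=100, row="salesid=10", old_value=30),
    UndoRecord(scn=200, row="salesid=10", old_value=24),
]

if __name__ == "__main__":
    for scn in (50, 150, 250):
        print(scn, build_cr_copy(current, undo, scn))
```

Each snapshot SCN yields its own version of the same block, which is the sense in which several consistent versions can coexist.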

For more on CR, see my blog post:

CR (consistent read) blocks create explained

2.3 Resource Co-ordination – Synchronization

In a multi-instance system such as RAC, where the same resources (for example data blocks) are used concurrently, effective synchronization is required to maintain consistency. Within the shared cache, the co-ordination of concurrent tasks is called synchronization. The synchronization provided by Oracle RAC gives cluster-wide concurrency of resources and in turn ensures the integrity of shared data. Although there is synchronization within the cache, there is some cost involved in doing it. At a low level, synchronization is just a data copy or data transfer operation.

According to Oracle studies, accessing a block in the local cache is much faster than accessing a block from another instance's cache within the cluster. With the local cache it is an in-memory copy, while with another instance's cache the data has to be transferred over the high-speed interconnect, which is obviously slower than an in-memory copy. Worst is the copy from disk, which is much slower than the other two. Typical block access times for these 3 methods are, for example:

Block access in local cache ~ 0.01 msec

Block access in remote cache ~ 2.5 msec

Block access on disk ~ 14 msec+
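As a quick back-of-the-envelope check on these figures, the following Python sketch computes the relative cost of each access path and an average access time for a given workload mix. The three timings are the example values quoted above; the workload fractions and the function name are made up for illustration.

```python
# Example timings from the text, in milliseconds.
ACCESS_MS = {"local": 0.01, "remote": 2.5, "disk": 14.0}


def average_access_ms(local_frac, remote_frac, disk_frac):
    """Weighted average block access time for a hypothetical workload mix."""
    total = local_frac + remote_frac + disk_frac
    if abs(total - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    return (local_frac * ACCESS_MS["local"]
            + remote_frac * ACCESS_MS["remote"]
            + disk_frac * ACCESS_MS["disk"])


# How many times slower each path is than the next faster one.
remote_vs_local = ACCESS_MS["remote"] / ACCESS_MS["local"]
disk_vs_remote = ACCESS_MS["disk"] / ACCESS_MS["remote"]

if __name__ == "__main__":
    print(f"remote cache vs local cache: about {remote_vs_local:.0f}x slower")
    print(f"disk vs remote cache: about {disk_vs_remote:.1f}x slower")
    # Assumed mix: 90% local hits, 8% remote cache, 2% disk reads.
    print(f"average: {average_access_ms(0.90, 0.08, 0.02):.3f} ms")
```

With these example numbers a remote cache hit is a few times cheaper than a disk read but a couple of hundred times more expensive than a local hit, which is why even a small share of non-local accesses dominates the average.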

2.4 Global Cache Service

Global Cache Service (GCS) is the main component of Oracle Cache Fusion technology. It is represented by the background processes LMSn; there can be at most 10 LMS processes per instance. The main function of GCS is to track the status and location of data blocks. The status of a data block means its mode and role (I will explain mode and role below). GCS is the main mechanism by which cache coherency among the multiple caches is maintained. GCS is also responsible for block transfers between the instances.

2.5 Global Enqueue Service

Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This covers all non-Cache Fusion inter-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions. It performs this operation for resources that are accessed by more than one instance.

Enqueue services are also present in a single-instance database, where they are responsible for locking the rows of a table using different locking modes. To understand more about enqueues, check the Oracle documentation about locking.

2.6 Global Resource Directory

GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database that contains details about all the blocks present in cache. The GRD knows the location of the latest version of a block, the mode of the block, the role of the block (mode and role are discussed shortly), and so on. Whenever a user asks for a data block, GCS gets all the information from the GRD. The GRD is a distributed resource, meaning that each instance maintains some part of it. This distributed nature of the GRD is key to the fault tolerance of RAC. The GRD is stored in the SGA.

Typically the GRD contains the following information, among other things:

(1) Data Block Address – the address of the data block being modified

(2) Location of the most current version of the data block

(3) Modes of the data block

(4) Roles of the data block

(5) SCN number of the data block

(7) Image of the data block – could be the current image or a past image.
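The shape of such a directory can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GrdEntry` and `DistributedGrd` names are hypothetical, and the modulo rule used to pick which instance keeps an entry is a stand-in chosen for simplicity, not Oracle's actual resource mastering algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class GrdEntry:
    """One directory entry, mirroring the fields listed above."""
    data_block_address: int
    current_holder: int   # instance holding the most current version
    modes: dict           # instance number -> "N", "S" or "X"
    role: str             # "L" (local) or "G" (global)
    scn: int
    past_image_holders: list = field(default_factory=list)


class DistributedGrd:
    """Toy directory in which every instance keeps one partition."""

    def __init__(self, instance_count):
        self.instance_count = instance_count
        self.partitions = {i: {} for i in range(1, instance_count + 1)}

    def master_of(self, data_block_address):
        # Assumption: a simple modulo decides which instance keeps the entry.
        return data_block_address % self.instance_count + 1

    def register(self, entry):
        master = self.master_of(entry.data_block_address)
        self.partitions[master][entry.data_block_address] = entry

    def lookup(self, data_block_address):
        master = self.master_of(data_block_address)
        return self.partitions[master].get(data_block_address)


if __name__ == "__main__":
    grd = DistributedGrd(instance_count=3)
    grd.register(GrdEntry(1001, current_holder=2,
                          modes={1: "N", 2: "X", 3: "N"}, role="L", scn=5000))
    print(grd.master_of(1001), grd.lookup(1001))
```

Because every instance holds only its own partition, losing one instance loses only part of the directory, which the surviving instances then have to rebuild.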

2.7 GCS resource modes and roles

The mode of a data block is decided based on whether the resource holder intends to modify the data or read it. The modes are as follows:

(1) Null (N) Mode: Null mode is the least restrictive mode. It indicates no access rights and acts as a placeholder.

(2) Shared (S) Mode: Shared mode indicates that the data block is being read and not modified. Other sessions can also read the data block.

(3) Exclusive (X) Mode: Exclusive mode indicates exclusive access to the block. No other resource holder can write to this data block; however, others can still perform consistent reads on it.

GCS resources also have roles. The roles are as follows:

(1) Local: When a data block is first read into an instance from disk, it has a local role, meaning that only 1 copy of the data block exists in the cache. No other instance cache has a copy of this block.

(2) Global: The global role indicates that multiple copies of the data block exist in the clustered instances. For example, a user connected to one of the instances requests a data block. The block is read from disk into that instance, and the role granted is local. If another instance requests the same block, the block is copied to the requesting instance and the role becomes global.

This role and mode information is maintained in the GRD (Global Resource Directory) by GCS (Global Cache Service).
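The three modes can be read as a small compatibility rule, sketched below in Python. The function names are hypothetical and the rule is the usual null/shared/exclusive reading of the descriptions above (N conflicts with nothing, S coexists with S, X coexists only with N); it is an illustration, not GCS code.

```python
# Restrictiveness of each mode: a higher number means a stronger claim.
MODE_RANK = {"N": 0, "S": 1, "X": 2}


def compatible(requested, held):
    """True if 'requested' can coexist with 'held' on another instance."""
    if requested == "N" or held == "N":
        return True                      # null mode conflicts with nothing
    return requested == "S" and held == "S"  # only shared pairs coexist


def can_grant(requested, modes_held_by_others):
    """True if no other holder has to be downgraded first."""
    return all(compatible(requested, held) for held in modes_held_by_others)


if __name__ == "__main__":
    print(can_grant("S", ["S", "N"]))  # another reader is fine
    print(can_grant("X", ["S", "N"]))  # a writer must wait for a downgrade
    print(can_grant("S", ["X"]))       # a reader conflicts with a writer
```

When `can_grant` is false, the other holders have to be downgraded before the request can be granted; consistent reads need no S or X grant at all, which is why they remain possible while another instance holds X.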

2.8 Past Images

The Past Image concept was introduced in Oracle 9i to maintain data integrity. In an Oracle database, a block is typically not written to disk immediately after it is dirtied; this reduces excessive I/O. When the same dirty block is requested by some other instance for write or read, an image of the block is created in the owning instance and then the block is shipped to the requesting instance. This image copy of the block is called a Past Image (PI). In the event of a failure, Oracle can reconstruct the block by reading PIs. It is also possible to have more than 1 PI of a block, depending on how many times the block was requested while dirty.

A past image of a block is different from a CR (consistent read) image. A past image is required to create a CR image by applying undo data.

“Juggling” Data with Multiple Past Images

(1) Multiple Past Image versions of a data block may be kept by different instances

(2) Upon a checkpoint, only the current image is written to disk; Past Images are discarded

(3) In the event of a failure, the current version of the block can be reconstructed from PIs

(4) Since PIs are kept in memory, they help avoid frequent disk writes

(5) This avoids the “disk pinging” experienced with 8i OPS due to frequent writes to disk

(6) Data is “juggled” in memory, without touching down on the disk

Oracle RAC Past Image (PI) explained

2.9 Block access modes and buffer states

An additional concurrency control concept is the buffer state, which is the state of a buffer in the local cache of an instance. The buffer state of a block relates to the access mode of the block. For example, if a buffer state is exclusive current (XCUR), the instance owns the resource in exclusive mode.

To see a buffer’s state, query the “status” column of the V$BH dynamic performance view.

The V$BH view shows the following buffer state names for each block access mode:

(1) With a block access mode of NULL, the buffer state name is CR: an instance can perform a consistent read of the block, that is, if the instance holds an older version of the data.

(2) With a block access mode of S, the buffer state name is SCUR: an instance has shared access to the block and can only perform reads.

(3) With a block access mode of X, the buffer state name is XCUR: an instance has exclusive access to the block and can modify it.

(4) With a block access mode of NULL, the buffer state name is PI: an instance has made changes to the block but retains copies of it as past images to record its state before changes.

For more on the V$BH view, see the official documentation link:

Only the SCUR and PI buffer states are Real Application Clusters-specific. There can be only one copy of any one block buffered in the XCUR state in the cluster database at any time. To perform modifications on a block, a process must assign an XCUR buffer state to the buffer containing the data block.

For example, if another instance requests read access to the most current version of the same block, then Oracle changes the access mode from exclusive to shared, sends a current read version of the block to the requesting instance, and keeps a PI buffer if the buffer contained a dirty block.

At this point, the first instance has the current block and the requesting instance also has the current block in shared mode. Therefore, the role of the resource becomes global. There can be multiple shared current (SCUR) versions of this block cached throughout the cluster database at any time.
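The mapping between access modes and buffer states, and the read-request example just described, can be written down as a small Python sketch. The dictionary restates the four states listed above; the function name and its return convention are hypothetical, and the code only models the one transition the text describes.

```python
# Buffer state name (V$BH "status") -> block access mode, as listed above.
BUFFER_STATE_ACCESS_MODE = {
    "CR": "NULL",
    "SCUR": "S",
    "XCUR": "X",
    "PI": "NULL",
}

# States that exist only in a Real Application Clusters database.
RAC_SPECIFIC_STATES = {"SCUR", "PI"}


def holder_buffers_after_remote_read(state, dirty):
    """Buffers left on the holding instance after another instance
    asks to read the most current version of the block."""
    if state != "XCUR":
        return [state]            # nothing to downgrade
    if dirty:
        return ["SCUR", "PI"]     # exclusive -> shared, plus a past image
    return ["SCUR"]               # clean block: no past image needed


if __name__ == "__main__":
    print(holder_buffers_after_remote_read("XCUR", dirty=True))
    print(holder_buffers_after_remote_read("XCUR", dirty=False))
```

The requesting instance itself ends up with an SCUR buffer in both cases.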

3. Block transfer using Cache Fusion

Let's consider a very detailed example of how block transfer happens between different instances. For this example I am assuming a 3-node RAC system, and also that any DML statement is followed by a commit. So if I say that a user executed an update, that means the user executed update + commit. There is no checkpoint until the end.

Stage 1

In stage 1 a data block is requested by user C, who is connected to instance 3. So the data block is read into the buffer cache of instance 3.

SQL> select sales_rank from salesman where salesid = 10;

Assume this gives a value of 30. This block is read for the first time and is not present in any other instance. So the role of the block is LOCAL and the block is read in SHARED mode. Also there are NO PAST IMAGES. So we describe this stage as instance 3 having SL0 mode (SHARED, LOCAL, 0 PAST IMAGES).

These lock modes are explained in my blog post:

Oracle RAC Past Image (PI) explained


Stage 2

In stage 2 user B issues the same select statement against the salesman table. Instance 2 needs the same block; therefore, the block is shipped from instance 3 to instance 2 over the cache fusion interconnect. There is no disk read at this time. Both instances are in SHARED mode (S) and the role is LOCAL (L). Note that even though the block is present in more than one instance, we still say the role is local because the block is not yet dirty. Had the block been dirty when it was requested by the other instance, the role would have changed to global.

Stage 3

In stage 3 user B decides to update the row and commit at instance 2. The new sales rank is 24. At this stage, instance 2 acquires an EXCLUSIVE lock for updating the data, and the SHARED lock on instance 3 is downgraded to a NULL lock.

SQL> update salesman set sales_rank = 24 where salesid = 10;

SQL> commit;

So instance 2 has mode XL0 (Exclusive, Local with 0 past images) and instance 3 has a NULL lock, which is just a placeholder. The role of the block is still LOCAL because the block has been dirtied for the first time, only on instance 2, and no other instance has a dirty copy of it. If another instance now tries to update the same block, the role will change to global.

Stage 4

In stage 4 user A decides to update the same row, and hence the same block, in instance 1 with a sales rank of 40. It finds that the block is dirty in instance 2. Therefore the data block is shipped from instance 2 to instance 1; however, a PAST IMAGE of the data block is created on instance 2, and the lock mode on instance 2 is downgraded to NULL with a GLOBAL role. Instance 2 now has NG1 (NULL lock with GLOBAL role and 1 PAST IMAGE). At this time instance 1 has an EXCLUSIVE lock with GLOBAL role (XG0).

Stage 5

User C executes a select statement from instance 3 on the same row. The data block on instance 1 is the most recent copy (the GRD (Global Resource Directory) knows which instance has the latest copy of a data block), so it is shipped to instance 3. As a result the lock on instance 1 is converted to SHARED GLOBAL with 1 PAST IMAGE (SG1). The lock is changed to SHARED and not NULL because instance 3 asked for a shared lock (for reading data) and not an exclusive lock (for updating data). If instance 3 had asked for an exclusive lock, instance 1 would have been left with a NULL lock.

Instance 3 now holds SG0 (SHARED, GLOBAL with 0 PAST IMAGES).


Stage 6

User B issues the same select statement against the salesman table on instance 2. Instance 2 requests a consistent copy of the buffer from another instance, which happens to be the current master.

Therefore instance 1 ships the block to instance 2, where it is acquired with SG1 (SHARED, GLOBAL with 1 PAST IMAGE). So the mode of instance 2 becomes SG1.

Stage 7

User C on instance 3 updates the same row. Therefore instance 3 requires an exclusive lock, and instance 1 and instance 2 are downgraded to a NULL lock with GLOBAL role and 1 PAST IMAGE (NG1). Instance 3 has an EXCLUSIVE lock with GLOBAL role and no PAST IMAGES (XG0).

Stage 8

A checkpoint is initiated and a “Write to Disk” takes place at instance 3. As a result the previous past images are discarded (as they are no longer required for recovery), and instance 3 holds the block with an EXCLUSIVE lock, LOCAL role and no PAST IMAGES (XL0).

From here on, if any instance wants to read or write the same block, a copy will again be shipped from instance 3.
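The eight stages can be replayed with a small toy model in Python. Everything in it is a simplification made for illustration: the `BlockResource` class is hypothetical, the role is tracked once per block rather than per instance, a past image is created only when an exclusive holder ships a dirty block, and a checkpoint resets the role to local. States for instances the text does not mention at a given stage (for example instance 3 in stage 4) are simply what this model produces, not something the article states.

```python
from dataclasses import dataclass, field


@dataclass
class BlockResource:
    """Toy model of one data block's lock state across a cluster."""
    instances: tuple
    mode: dict = field(default_factory=dict)         # instance -> "N"/"S"/"X"
    past_images: dict = field(default_factory=dict)  # instance -> PI count
    role: str = "L"                                  # "L" local, "G" global
    dirty: bool = False                              # changed since last write

    def __post_init__(self):
        for inst in self.instances:
            self.mode[inst] = "N"
            self.past_images[inst] = 0

    def _ship_from_exclusive_holder(self, requester, downgrade_to):
        # An exclusive holder that ships a dirty block keeps a past image,
        # and the role of the resource becomes global.
        for inst in self.instances:
            if inst != requester and self.mode[inst] == "X":
                if self.dirty:
                    self.past_images[inst] += 1
                    self.role = "G"
                self.mode[inst] = downgrade_to

    def read(self, requester):
        if self.mode[requester] == "N":
            self._ship_from_exclusive_holder(requester, "S")
            self.mode[requester] = "S"

    def write(self, requester):
        self._ship_from_exclusive_holder(requester, "N")
        for inst in self.instances:
            if inst != requester:
                self.mode[inst] = "N"    # shared holders drop to null
        self.mode[requester] = "X"
        self.dirty = True

    def checkpoint(self):
        # Current image goes to disk; past images are no longer needed.
        for inst in self.instances:
            self.past_images[inst] = 0
        self.dirty = False
        self.role = "L"

    def state(self, inst):
        return f"{self.mode[inst]}{self.role}{self.past_images[inst]}"


def run_stages():
    block = BlockResource(instances=(1, 2, 3))
    steps = [
        ("stage 1: user C selects on instance 3", lambda: block.read(3)),
        ("stage 2: user B selects on instance 2", lambda: block.read(2)),
        ("stage 3: user B updates on instance 2", lambda: block.write(2)),
        ("stage 4: user A updates on instance 1", lambda: block.write(1)),
        ("stage 5: user C selects on instance 3", lambda: block.read(3)),
        ("stage 6: user B selects on instance 2", lambda: block.read(2)),
        ("stage 7: user C updates on instance 3", lambda: block.write(3)),
        ("stage 8: checkpoint", block.checkpoint),
    ]
    history = []
    for label, action in steps:
        action()
        history.append((label, {i: block.state(i) for i in block.instances}))
    return history


if __name__ == "__main__":
    for label, states in run_stages():
        print(label, states)
```

Running it prints one line per stage with the three-character state of every instance, so the SL0, XL0, NG1, XG0, SG1 and SG0 values named in the walkthrough can be compared against the model step by step.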

Diagrams:

[Figure: Read/Read Cache Fusion – GCS Processing]

[Figure: Write/Write Cache Fusion – GCS Processing]

[Figure: Blocks to Disk – GCS Processing]

4. Online Instance Recovery Steps

The steps are as follows:

(1) Instance failure is detected by Cluster Manager and GCS

(2) Reconfiguration of GES resources (enqueues); the global resource directory is frozen

(3) Reconfiguration of GCS resources; involves redistribution among surviving instances

(4) One of the surviving instances becomes the “recovering instance”

(5) The SMON process of the recovering instance starts the first pass of redo log read of the failed instance’s redo log thread

(6) SMON finds BWRs (block written records) in the redo and removes them, as their PIs are already written to disk

(7) SMON prepares the recovery set of the blocks modified by the failed instance but not written to disk

(8) Entries in the recovery list are sorted by first dirty SCN

(9) SMON informs each block’s master node to take ownership of the block for recovery

(10) Second pass of log read begins.

(11) Redo is applied to the data files.

(12) The Global Resource Directory is unfrozen
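Steps (5) to (8), the first pass over the redo, can be sketched in Python as follows. The redo record layout (a tuple of kind, block address and SCN), the function name and the sample data are all hypothetical; the sketch only shows the idea of collecting modified blocks, dropping those covered by a block written record, and sorting what remains by first dirty SCN.

```python
def build_recovery_set(redo_records):
    """First pass over the failed instance's redo thread.

    redo_records: iterable of (kind, block_address, scn) in redo order,
    where kind is "change" or "bwr" (block written record).
    Returns a list of (block_address, first_dirty_scn), sorted by SCN.
    """
    first_dirty = {}
    for kind, block, scn in redo_records:
        if kind == "change":
            # Remember only the first SCN at which the block became dirty.
            first_dirty.setdefault(block, scn)
        elif kind == "bwr":
            # The block reached disk, so earlier changes need no recovery.
            first_dirty.pop(block, None)
    return sorted(first_dirty.items(), key=lambda item: item[1])


# Made-up redo stream for illustration.
sample_redo = [
    ("change", 101, 10),
    ("change", 102, 12),
    ("change", 103, 14),
    ("bwr", 102, 16),
    ("change", 101, 18),
    ("change", 102, 20),
]

if __name__ == "__main__":
    print(build_recovery_set(sample_redo))
```

In this sample, block 102 was written to disk at SCN 16 and dirtied again at SCN 20, so it stays in the recovery set but with the later first dirty SCN.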
