The SQLite pager module

Latest recommended article published 2021-10-17 17:48:58

久许

Category: sqlite

The Pager is the page cache. It holds not only the cache itself but also the cache's other attributes.

struct Pager {
sqlite3_vfs *pVfs; /* OS functions to use for IO */
u8 exclusiveMode; /* Boolean. True if locking_mode==EXCLUSIVE */
u8 journalMode; /* One of the PAGER_JOURNALMODE_* values */
u8 useJournal; /* Use a rollback journal on this file */
u8 noSync; /* Do not sync the journal if true */
u8 fullSync; /* Do extra syncs of the journal for robustness */
u8 extraSync; /* sync directory after journal delete */
u8 syncFlags; /* SYNC_NORMAL or SYNC_FULL otherwise */
u8 walSyncFlags; /* See description above */
u8 tempFile; /* zFilename is a temporary or immutable file */
u8 noLock; /* Do not lock (except in WAL mode) */
u8 readOnly; /* True for a read-only database */
u8 memDb; /* True to inhibit all file I/O */

/************************************************************************
The following block contains those class members that change during
routine operation. Class members not in this block are either fixed
when the pager is first created or else only change when there is a
significant mode change (such as changing the page_size, locking_mode,
or the journal_mode). From another view, these class members describe
the "state" of the pager, while other class members describe the
"configuration" of the pager.
*/
u8 eState; /* Pager state (OPEN, READER, WRITER_LOCKED..) */
u8 eLock; /* Current lock held on database file */
u8 changeCountDone; /* Set after incrementing the change-counter */
u8 setMaster; /* True if a m-j name has been written to jrnl */
u8 doNotSpill; /* Do not spill the cache when non-zero */
u8 subjInMemory; /* True to use in-memory sub-journals */
u8 bUseFetch; /* True to use xFetch() */
u8 hasHeldSharedLock; /* True if a shared lock has ever been held */
Pgno dbSize; /* Number of pages in the database */
Pgno dbOrigSize; /* dbSize before the current transaction */
Pgno dbFileSize; /* Number of pages in the database file */
Pgno dbHintSize; /* Value passed to FCNTL_SIZE_HINT call */
int errCode; /* One of several kinds of errors */
int nRec; /* Pages journalled since last j-header written */
u32 cksumInit; /* Quasi-random value added to every checksum */
u32 nSubRec; /* Number of records written to sub-journal */
Bitvec *pInJournal; /* One bit for each page in the database file */ // bit vector tracking which pages have been journalled
sqlite3_file *fd; /* File descriptor for database */
sqlite3_file *jfd; /* File descriptor for main journal */ // the rollback journal
sqlite3_file *sjfd; /* File descriptor for sub-journal */
i64 journalOff; /* Current write offset in the journal file */
i64 journalHdr; /* Byte offset to previous journal header */
sqlite3_backup *pBackup; /* Pointer to list of ongoing backup processes */
PagerSavepoint *aSavepoint; /* Array of active savepoints */
int nSavepoint; /* Number of elements in aSavepoint[] */
u32 iDataVersion; /* Changes whenever database content changes */
char dbFileVers[16]; /* Changes whenever database file changes */

int nMmapOut; /* Number of mmap pages currently outstanding */
sqlite3_int64 szMmap; /* Desired maximum mmap size */
PgHdr *pMmapFreelist; /* List of free mmap page headers (pDirty) */
/*
End of the routinely-changing class members
***************************************************************************/

u16 nExtra; /* Add this many bytes to each in-memory page */
i16 nReserve; /* Number of unused bytes at end of each page */
u32 vfsFlags; /* Flags for sqlite3_vfs.xOpen() */
u32 sectorSize; /* Assumed sector size during rollback */
int pageSize; /* Number of bytes in a page */
Pgno mxPgno; /* Maximum allowed size of the database */ // measured in pages
i64 journalSizeLimit; /* Size limit for persistent journal files */
char *zFilename; /* Name of the database file */
char *zJournal; /* Name of the journal file */
int (*xBusyHandler)(void*); /* Function to call when busy */
void *pBusyHandlerArg; /* Context argument for xBusyHandler */
int aStat[4]; /* Total cache hits, misses, writes, spills */
#ifdef SQLITE_TEST
int nRead; /* Database pages read */
#endif
void (*xReiniter)(DbPage*); /* Call this routine when reloading pages */
int (*xGet)(Pager*,Pgno,DbPage**,int); /* Routine to fetch a page */
#ifdef SQLITE_HAS_CODEC
void *(*xCodec)(void*,void*,Pgno,int); /* Routine for en/decoding data */
void (*xCodecSizeChng)(void*,int,int); /* Notify of page size changes */
void (*xCodecFree)(void*); /* Destructor for the codec */
void *pCodec; /* First argument to xCodec... methods */
#endif
char *pTmpSpace; /* Pager.pageSize bytes of space for tmp use */
PCache *pPCache; /* Pointer to page cache object */
#ifndef SQLITE_OMIT_WAL
Wal *pWal; /* Write-ahead log used by "journal_mode=wal" */
char *zWal; /* File name for write-ahead log */
#endif
};

/*
A bitmap is an instance of the following structure.

This bitmap records the existence of zero or more bits
with values between 1 and iSize, inclusive.

There are three possible representations of the bitmap.
If iSize<=BITVEC_NBIT, then Bitvec.u.aBitmap[] is a straight
bitmap. The least significant bit is bit 1.

If iSize>BITVEC_NBIT and iDivisor==0 then Bitvec.u.aHash[] is
a hash table that will hold up to BITVEC_MXHASH distinct values.

Otherwise, the value i is redirected into one of BITVEC_NPTR
sub-bitmaps pointed to by Bitvec.u.apSub[]. Each subbitmap
handles up to iDivisor separate values of i. apSub[0] holds
values between 1 and iDivisor. apSub[1] holds values between
iDivisor+1 and 2*iDivisor. apSub[N] holds values between
N*iDivisor+1 and (N+1)*iDivisor. Each subbitmap is normalized
to deal with values between 1 and iDivisor.
*/
struct Bitvec {
u32 iSize; /* Maximum bit index. Max iSize is 4,294,967,296. */
u32 nSet; /* Number of bits that are set - only valid for aHash
element. Max is BITVEC_NINT. For BITVEC_SZ of 512,
this would be 125. */
u32 iDivisor; /* Number of bits handled by each apSub[] entry. */
/* Should >=0 for apSub element. */
/* Max iDivisor is max(u32) / BITVEC_NPTR + 1. */
/* For a BITVEC_SZ of 512, this would be 34,359,739. */
union {
BITVEC_TELEM aBitmap[BITVEC_NELEM]; /* Bitmap representation */
u32 aHash[BITVEC_NINT]; /* Hash table representation */
Bitvec *apSub[BITVEC_NPTR]; /* Recursive representation */
} u;
};

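In short: a Bitvec records a set of bit indexes between 1 and iSize, inclusive, and has three possible representations. If iSize <= BITVEC_NBIT, Bitvec.u.aBitmap[] is a straight bitmap whose least significant bit is bit 1. If iSize > BITVEC_NBIT and iDivisor == 0, Bitvec.u.aHash[] is a hash table that holds up to BITVEC_MXHASH distinct values. Otherwise, the value i is redirected to one of the BITVEC_NPTR sub-bitmaps pointed to by Bitvec.u.apSub[]. Each sub-bitmap handles up to iDivisor distinct values of i: apSub[0] holds values between 1 and iDivisor, apSub[1] holds values between iDivisor+1 and 2*iDivisor, and apSub[N] holds values between N*iDivisor+1 and (N+1)*iDivisor. Each sub-bitmap is normalized to hold values between 1 and iDivisor.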
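
To make the 1-based indexing and the sub-bitmap arithmetic concrete, here is a minimal sketch. ToyBitvec, TOY_NBIT and the toy* functions are hypothetical names invented for this note, not SQLite's API; only the straight-bitmap form and the apSub[] index arithmetic described in the comment above are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy type: SQLite's real Bitvec also has hash and recursive forms. */
#define TOY_NBIT 512u                 /* capacity chosen only for illustration */

typedef struct ToyBitvec {
  uint32_t iSize;                     /* largest valid bit index (1-based) */
  uint8_t aBitmap[TOY_NBIT / 8];      /* straight bitmap; bit 1 is the LSB of byte 0 */
} ToyBitvec;

static void toyBitvecInit(ToyBitvec *p, uint32_t iSize){
  assert(iSize >= 1 && iSize <= TOY_NBIT);
  p->iSize = iSize;
  memset(p->aBitmap, 0, sizeof(p->aBitmap));
}

/* Set bit i (1..iSize). Returns 0 on success, -1 if i is out of range. */
static int toyBitvecSet(ToyBitvec *p, uint32_t i){
  if( i < 1 || i > p->iSize ) return -1;
  i--;                                /* convert to a 0-based position */
  p->aBitmap[i / 8] |= (uint8_t)(1u << (i & 7));
  return 0;
}

/* Return 1 if bit i is set, 0 if it is clear or out of range. */
static int toyBitvecTest(const ToyBitvec *p, uint32_t i){
  if( i < 1 || i > p->iSize ) return 0;
  i--;
  return (p->aBitmap[i / 8] >> (i & 7)) & 1;
}

/* Recursive form: which apSub[] entry covers value i ... */
static uint32_t toySubIndex(uint32_t i, uint32_t iDivisor){
  return (i - 1) / iDivisor;
}

/* ... and the value normalized to the range 1..iDivisor inside that entry. */
static uint32_t toySubValue(uint32_t i, uint32_t iDivisor){
  return (i - 1) % iDivisor + 1;
}
```

With iDivisor = 10, value 11 lands in apSub[1] as normalized value 1, which matches the "apSub[N] holds N*iDivisor+1 through (N+1)*iDivisor" rule.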

/*
A complete page cache is an instance of this structure. Every
entry in the cache holds a single page of the database file. The
btree layer only operates on the cached copy of the database pages.

A page cache entry is "clean" if it exactly matches what is currently
on disk. A page is "dirty" if it has been modified and needs to be
persisted to disk.

pDirty, pDirtyTail, pSynced:
All dirty pages are linked into the doubly linked list using
PgHdr.pDirtyNext and pDirtyPrev. The list is maintained in LRU order
such that p was added to the list more recently than p->pDirtyNext.
PCache.pDirty points to the first (newest) element in the list and
pDirtyTail to the last (oldest).

The PCache.pSynced variable is used to optimize searching for a dirty
page to eject from the cache mid-transaction. It is better to eject
a page that does not require a journal sync than one that does.
Therefore, pSynced is maintained so that it *almost* always points
to either the oldest page in the pDirty/pDirtyTail list that has a
clear PGHDR_NEED_SYNC flag or to a page that is older than this one
(so that the right page to eject can be found by following pDirtyPrev
pointers).
*/
struct PCache {
PgHdr *pDirty, *pDirtyTail; /* List of dirty pages in LRU order */
PgHdr *pSynced; /* Last synced page in dirty page list */
int nRefSum; /* Sum of ref counts over all pages */
int szCache; /* Configured cache size */
int szSpill; /* Size before spilling occurs */
int szPage; /* Size of every page in this cache */
int szExtra; /* Size of extra space for each page */
u8 bPurgeable; /* True if pages are on backing store */
u8 eCreate; /* eCreate value for xFetch() */
int (*xStress)(void*,PgHdr*); /* Call to try to make a page clean */
void *pStress; /* Argument to xStress */
sqlite3_pcache *pCache; /* Pluggable cache module */
};

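The B-tree module works on database content only through the interface the pager provides; it never operates on a database or journal file directly.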

The page cache manager is called the pager in the SQLite world. Underneath, it sees ordinary native
files that are byte-oriented and randomly accessed, and it converts them into higher-level,
page-oriented, randomly accessed files, where pages are fixed-size objects crafted out of the native files.
Different higher-level files can have different page sizes. The pager defines an 'easy to use' interface
(independent of native file systems) for accessing pages from database files. The tree module that resides directly
on top of the pager module always uses the pager-provided interface to access databases, and
never directly accesses any database or journal file. The tree module sees the database
file as a logical array of uniform-size pages and references pages by their array index
numbers.
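
The "logical array of pages" view reduces to simple offset arithmetic. The sketch below assumes pages are numbered from 1, as SQLite page numbers are; pageOffset and pageCount are hypothetical helper names for this note, not pager functions.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of page pgno in a file made of fixed-size pages (numbered from 1). */
static int64_t pageOffset(uint32_t pgno, int pageSize){
  assert(pgno >= 1 && pageSize > 0);
  return (int64_t)(pgno - 1) * (int64_t)pageSize;
}

/* Number of whole pages in a file of fileBytes bytes. */
static uint32_t pageCount(int64_t fileBytes, int pageSize){
  assert(fileBytes >= 0 && pageSize > 0);
  return (uint32_t)(fileBytes / pageSize);
}
```

For example, with 4096-byte pages, page 3 starts at byte offset 8192.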

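Each open database file has its own cache.

SQLite maintains a separate page cache for each open database file.

SQLite creates a cache each time it opens a database file, so opening a file several times creates several caches. It also lets multiple opens of the same database file share a single cache, as the following passage describes.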

(SQLite supports an advanced feature in which all database connections to the same database file,
when the file is open multiple times via the same or different library connections,
can share the same page cache; see Section 10.13 on page 241.)

Data from the database is stored in the cache in binary form.

but, they are also treated like ordinary native files, and are stored entirely within
the cache.

The cache layer reads and writes the database file and the journal file directly.

It directly
reads and writes database files (and journal files).

It guarantees that whatever is stored in the database can be read back repeatedly without alteration.

It only guarantees whatever information is stored in a database file can be repeatedly
retrieved later without any alteration.

It defines a file-system-independent interface for working with database files.

It defines an easy-to-use, file-system-independent
interface for randomly accessing pages from database files.

Moving pages between the database file and memory (the cache) is the pager's basic function; the movement is transparent to the tree module and the layers above it.

For each database file, moving pages between the file and the (in-memory) cache is the basic
function of the pager as the cache manager. The page movement is transparent to the tree and
higher-up modules.

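Its main purpose is to make database pages addressable in main memory so that those modules can access the in-memory page contents directly.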

Its main purpose is to make database pages addressable in the main memory so that
those modules can access the in-memory page contents directly.

It provides an abstraction in which the entire database file appears to sit in memory as an array of pages.

It also coordinates the writing of
pages back to the database file. It creates an abstraction so that the entire database file appears to
reside in the main memory as an array of pages.

It provides transaction management, log management, data management, and lock management.

It provides the core services of a typical transaction processing system: transaction management, data management, log management, and lock
management.

Every module above the pager deals only in transactions and is unconcerned with how the pager implements them.

All modules above the pager are completely insulated from low-level lock and log management
mechanisms. In fact, they are not aware of locking and logging activities. The tree module sees everything in terms of transactions, and is not concerned with how the transactional ACID properties
are implemented by the pager module.

The pager module breaks a transaction's activity down into locking, logging, and reading and writing the database file.

The pager module splits the activities of a transaction into
locking, logging, and reading and writing of database files.

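Before changing a page, the B-tree module notifies the pager, which gives the pager the chance to journal the page and to take the appropriate lock on the database file.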

Before modifying a page, the tree module informs the pager so that it (the pager)
can save sufficient information (in a journal file) for possible use in future recovery, and can acquire
appropriate locks on the database file.

When the tree module has finished with a page it notifies the pager, which writes the page back to the database file if the page was modified.

The tree module eventually notifies the pager when it (the
tree module) has finished using a page; the pager handles writing the page back to the database
file if the page was modified.

Each database file is managed by a Pager object.

Each open database file is managed
through a separate Pager object and each Pager object is associated with one and
only one instance of an open database file.

The tree module, to use a database file, creates a new Pager object first and
then uses the object as a handle to apply all pager-level operations on the file.

For performance reasons, SQLite avoids copying pages back and forth between the B-tree and the pager; the B-tree module works directly on the content available in the cache.

(You may note that, for performance's sake, SQLite avoids
copying pages back and forth between the pager and tree modules, and the tree module
directly manipulates the contents of in-cache pages.)

For the rollback operation, it rolls back all changes made to the database since the savepoint was established, and all following savepoints are deleted.

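In other words, rolling back to a savepoint undoes every change made to the database since that savepoint was established and discards every savepoint created after it.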

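When an application reads data from a device, the operating system first makes its own copy of the data, and the application then makes another copy.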

When an application reads a piece of data
from any file (residing on a block special device), the operating system normally makes its own
copy of the data first, and then a copy in the application.

The cache that SQLite manages is independent of the cache the operating system manages.

We are not interested in knowing how
the operating system manages its own cache. SQLite's page cache organization and management
are independent of those of the native operating system.

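The figure shows the operating system and SQLite each managing its own cache: the operating system first reads the data from the block special device, and the SQLite application then reads the data from the operating system's copy.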

5.3.1 Cache state

Two member variables, namely eState and eLock, control the pager's behavior.

The value of Pager.eState is one of the following:

1. PAGER_OPEN: When a Pager object is created, this is the initial state. The pager is not
currently reading or writing the database file via this Pager object. There may not be any
database page held in memory, i.e., the cache is empty. The database file may or may not be
locked. There is no transaction open on the database.

PAGER_OPEN: the initial state of a newly created Pager object. Nothing is being read or written through it, the cache may be empty, the file may or may not be locked, and no transaction is open.

2. PAGER_READER: When a Pager object is in this state, at least one read-transaction is open
on the database connection, and the pager can read pages from the corresponding database
file. (But, in the exclusive locking_mode, read-transactions may not be open.)

PAGER_READER: at least one read-transaction is open on the database connection and the pager can read pages from the database file. (In exclusive locking mode a read-transaction is not necessarily open.)

3. PAGER_WRITER_LOCKED: When a Pager object is in this state, a write-transaction is
open on the database connection. The pager can read pages from the corresponding database
file, but it has not made any updates on cached pages or the database file.

PAGER_WRITER_LOCKED: a write-transaction is open, but nothing in the cache or in the database file has been changed yet.

4. PAGER_WRITER_CACHEMOD: When a Pager object is in this state, the pager has given
the tree module permission to update in-cache pages, and the tree module may have made
some updates.

PAGER_WRITER_CACHEMOD: the tree module is allowed to update cached pages and may already have done so.

5. PAGER_WRITER_DBMOD: When a Pager object is in this state, the pager has begun
writing the database file.

PAGER_WRITER_DBMOD: the pager has started writing to the database file.

6. PAGER_WRITER_FINISHED: When a Pager object is in this state, the pager has finished
writing all modified pages of the current write-transaction into the database file. The write-transaction cannot make any more updates, and is ready to commit.

PAGER_WRITER_FINISHED: every page modified by the current write-transaction has been written into the database file; no further updates are possible and the transaction is ready to commit.

7. PAGER_ERROR: When a Pager object is in this state, the pager has seen some errors such
as I/O could not be performed, no disk space available for the database or journal file, no
memory can be allocated, etc.

PAGER_ERROR: the pager has hit an error, such as a failed I/O operation, no disk space left for the database or journal file, or a failed memory allocation.
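
The seven states can be pictured as an ordered enumeration. The sketch below is a toy mirror of the list above for illustration: the TOY_ names, their numeric order, and the helper functions are assumptions of this note, not definitions copied from pager.c.

```c
#include <assert.h>

/* Toy mirror of the seven eState values listed above (names are hypothetical). */
typedef enum ToyPagerState {
  TOY_PAGER_OPEN,
  TOY_PAGER_READER,
  TOY_PAGER_WRITER_LOCKED,
  TOY_PAGER_WRITER_CACHEMOD,
  TOY_PAGER_WRITER_DBMOD,
  TOY_PAGER_WRITER_FINISHED,
  TOY_PAGER_ERROR
} ToyPagerState;

/* True while a write-transaction is open (states 3 through 6 in the list). */
static int toyInWriteTransaction(ToyPagerState e){
  return e >= TOY_PAGER_WRITER_LOCKED && e <= TOY_PAGER_WRITER_FINISHED;
}

/* True once the tree module may have modified cached pages (states 4 through 6). */
static int toyCacheMayBeModified(ToyPagerState e){
  return e >= TOY_PAGER_WRITER_CACHEMOD && e <= TOY_PAGER_WRITER_FINISHED;
}

/* Printable name for a state, for logging or debugging. */
static const char *toyStateName(ToyPagerState e){
  static const char *const azName[] = {
    "OPEN", "READER", "WRITER_LOCKED", "WRITER_CACHEMOD",
    "WRITER_DBMOD", "WRITER_FINISHED", "ERROR"
  };
  assert(e >= TOY_PAGER_OPEN && e <= TOY_PAGER_ERROR);
  return azName[e];
}
```

Ordering the writer states consecutively makes questions such as "is a write-transaction open?" a simple range check.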

Based on the value of the eLock member variable, a Pager object can be in one of the following
four states.
1. NO_LOCK: The pager is not currently reading or writing the database file via this Pager
object.

NO_LOCK: the pager is neither reading nor writing the database file through this Pager object.

2. SHARED_LOCK: The pager has been reading pages (in arbitrary order) from the database
file. There can be multiple read-transactions accessing the same database file at the same
time through their respective Pager objects. Modifying an in-cache page is not permitted.

SHARED_LOCK: the pager has been reading pages. Several read-transactions may be accessing the same file at once, each through its own Pager object, and modifying a cached page is not allowed.

3. RESERVED_LOCK: The pager has reserved the database file for writing but has not yet
made any changes to the file. Only one pager at a time can reserve a given database file. As
the original database file has not been modified, other pagers are allowed to read the file.

RESERVED_LOCK: the pager has reserved the file for writing but has not changed it yet. Only one pager at a time can hold the reservation on a given file; because the file is still unmodified, other pagers may keep reading it.

4. EXCLUSIVE_LOCK: The pager has been writing pages (in arbitrary order) back into the
database file. The file access is exclusive. No other pager can read or write the file while this
pager continues writing the file.

EXCLUSIVE_LOCK: the pager is writing pages back into the database file, so no other pager may read or write the file in the meantime.

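The page cache starts in the NO_LOCK state. The first time the tree module calls sqlite3PagerGet to read a page from the database file, the pager moves to SHARED_LOCK. Once the tree module has released every page by calling sqlite3PagerUnref, the pager returns to NO_LOCK (its cache is not necessarily emptied at that point). The first time the tree module calls sqlite3PagerWrite, the pager moves to RESERVED_LOCK. (Note that sqlite3PagerWrite can only be called on a page that has already been read, so the pager is always in SHARED_LOCK before it moves to RESERVED_LOCK.)

Note: For temporary and in-memory databases, Pager.eLock is always set to EXCLUSIVE_LOCK
because they cannot be accessed by other processes.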
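
The lock transitions just described can be traced with a small state machine. This is a simplified sketch under stated assumptions: ToyPager and the toy* functions are hypothetical stand-ins for sqlite3PagerGet, sqlite3PagerWrite and sqlite3PagerUnref, and commit, rollback and the move to EXCLUSIVE_LOCK are deliberately left out.

```c
#include <assert.h>

typedef enum ToyLock {
  TOY_NO_LOCK, TOY_SHARED_LOCK, TOY_RESERVED_LOCK, TOY_EXCLUSIVE_LOCK
} ToyLock;

typedef struct ToyPager {
  ToyLock eLock;   /* current lock level on the database file */
  int nRef;        /* number of pages the client currently has pinned */
} ToyPager;

/* The first page fetch takes the shared lock; every fetch pins one page. */
static void toyPagerGet(ToyPager *p){
  if( p->eLock == TOY_NO_LOCK ) p->eLock = TOY_SHARED_LOCK;
  p->nRef++;
}

/* A write request is only legal on a page that was fetched first, so the
** lock is already SHARED when it is upgraded to RESERVED. */
static int toyPagerWrite(ToyPager *p){
  if( p->nRef == 0 ) return -1;               /* nothing has been read yet */
  if( p->eLock == TOY_SHARED_LOCK ) p->eLock = TOY_RESERVED_LOCK;
  return 0;
}

/* Releasing the last pinned page of a reader drops back to NO_LOCK.
** Write locks are released by commit or rollback, which this toy omits. */
static void toyPagerUnref(ToyPager *p){
  assert(p->nRef > 0);
  p->nRef--;
  if( p->nRef == 0 && p->eLock == TOY_SHARED_LOCK ) p->eLock = TOY_NO_LOCK;
}
```

The write request is rejected while nothing is pinned, which is the toy's way of expressing "a page must be read before it can be written".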

5.3.2 Cache organization

The cache is managed through a PCache handle object. To speed up lookups, the pages currently in the cache are kept well organized: SQLite uses a hash table to organize the cached pages.

uses page-slots to hold pages in the table. The cache is fully associative, that is, any slot can
store any page. The hash table is initially empty.

Pages are held in slots.

As demand for pages increases, the pager creates new slots and inserts them in the hash table.

There is a maximum limit (the PCache.nMax value) on
the number of slots a cache can have. The default limit is 2000 slots, or 500 for temporary tables.

SQLite represents each page in the cache by an object of PgHdr type.

The pager understands this object, though a pluggable cache can have its own page header object.

Figure 5.8 depicts the layout of SQLite's own pluggable cache, represented by a PCache1 object.

Each slot in the hash table is represented by a header object of PgHdr1 type.

The pluggable component understands this type, but the type is opaque to the pager.

The slot image is stored right before the PgHdr1 object; the size of the slot image is determined by the value of the PCache1.szSize variable.

The slot image holds an object of PgHdr, a database page image, and a piece of private data that is used by the tree module
to keep page-specific in-memory control information there. (In-memory databases have no journal
file, so their recovery information is recorded in in-memory objects. Pointers to those objects are
stored following the private part; these pointers are used by the pager only.) This (additional
non-page) space is initialized to zeros when the pager brings or constructs the page into the cache.

All pages in the cache are accessible via the PCache1.apHash hash array;
the array size is stored in the PCache1.nHash variable.

Each array element points to a "bucket" of slots;
slots in each bucket are organized in an unordered singly linked list.

The PgHdr object is only visible to the pager module, and not visible to the tree and higher-up
modules.

The header has many control variables. The pgno variable identifies the page number of
the database page it represents. The needSync flag is true if the journal needs a flush before writing
this page back into the database file. The dirty flag is true if the page has been modified, and the
new value is not yet written back into the database file. The nRef variable is the reference count
on this page. If the nRef value is greater than zero, the page is in active use, and we say that the
page is pinned down; otherwise, the page is unpinned and free. The pDirtyNext and pDirtyPrev
pointers are used to link together all dirty pages.

In short: pgno is the page number, needSync says the journal must be flushed before this page is written back, dirty says the page has modifications that have not reached the database file, and nRef counts the references currently held on the page.

/* Each page cache is an instance of the following object. Every
open database file (including each in-memory database and each
temporary or transient database) has a single page cache which
is an instance of this object.

Pointers to structures of this type are cast and returned as
opaque sqlite3_pcache* handles.
*/
struct PCache1 {
/* Cache configuration parameters. Page size (szPage) and the purgeable
flag (bPurgeable) and the pnPurgeable pointer are all set when the
cache is created and are never changed thereafter. nMax may be
modified at any time by a call to the pcache1Cachesize() method.
The PGroup mutex must be held when accessing nMax.
*/
PGroup *pGroup; /* PGroup this cache belongs to */
unsigned int *pnPurgeable; /* Pointer to pGroup->nPurgeable */
int szPage; /* Size of database content section */
int szExtra; /* sizeof(MemPage)+sizeof(PgHdr) */
int szAlloc; /* Total size of one pcache line */
int bPurgeable; /* True if cache is purgeable */
unsigned int nMin; /* Minimum number of pages reserved */
unsigned int nMax; /* Configured "cache_size" value */
unsigned int n90pct; /* nMax*9/10 */
unsigned int iMaxKey; /* Largest key seen since xTruncate() */
unsigned int nPurgeableDummy; /* pnPurgeable points here when not used*/

/* Hash table of all pages. The following variables may only be accessed
when the accessor is holding the PGroup mutex.
*/
unsigned int nRecyclable; /* Number of pages in the LRU list */
unsigned int nPage; /* Total number of pages in apHash */
unsigned int nHash; /* Number of slots in apHash[] */
PgHdr1 **apHash; /* Hash table for fast lookup by key */
PgHdr1 *pFree; /* List of unused pcache-local pages */
void *pBulk; /* Bulk memory used by pcache-local */
};

5.3.3 Cache read

It is referenced by using a search key, the page number in our case.

Moving pages between the cache and the database file is the basic function of the pager as the
data manager. It uses the PCache1.apHash array to translate a page number to the appropriate cache
slot via the cache bucket.

Initially, the page cache is empty, but pages are added to the cache on a
demand basis.

To read a page, as mentioned previously, the client (aka, the tree module) invokes the sqlite3PagerGet function on the page number. The function performs the following steps for
a requested page P.

1. It searches the cache space.
(a) It applies a very simple hash function on P to determine the index into the apHash
array: page number modulo the size of the apHash array.
(b) It uses the index into the apHash array and gets the hash bucket.
(c) It searches the bucket by chasing the pNext pointers. If P is found there, we say a cache
hit has occurred. It pins down the page (i.e., increments the PgHdr.nRef value by 1)
and returns the base address of the page-image to the caller.

2. If P is not found in the cache, it is considered a cache miss. The function looks for a free slot
that can be used to load the desired page. (If the cache has not reached the maximum limit
of PCache.nMax, it instead creates a new free slot.)

3. If no free slot is available or can be created, it determines a slot from which the current page
can be released to reuse the slot for P. This is called a victim slot. (Victim selection is
addressed in Section 5.3.6 on page 135.)

4. If the victim (or the free slot) is dirty, it writes the page to the database file. (Following
write-ahead-log (WAL) protocol, it flushes the journal file too.)

5. Two cases. (a) If P is less than or equal to the current max page in the file, it reads page
P from the database file into the free slot, pins down the page (i.e., it sets the PgHdr.nRef
value to 1), and returns the address of the page to the caller. (b) If P is greater than the
current max page in the file, it does not read the page; instead, it initializes the page to zeros.
In either case, it also initializes the bottom private part to zeros whether or not it reads the
page from the file. It also sets the PgHdr.nRef value to 1.

The client acquires (aka, pins down) the page, uses the page, and then
releases (aka, unpins) the page.

When a page address is returned to the client, the page is pinned down (the PgHdr.nRef is greater than zero). The page will be unpinned only when the client calls the sqlite3PagerUnref function on the page and the nRef becomes zero.
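
Step 1 of the read path (hash, walk the bucket, pin on a hit) and the matching unpin can be sketched as follows. ToyPage, ToyCache and the toy* functions are hypothetical simplifications made up for this note; the real PgHdr1 and PCache1 carry more fields, and loading a page from the file on a miss is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy page header: just the fields the lookup needs. */
typedef struct ToyPage ToyPage;
struct ToyPage {
  unsigned int pgno;   /* page number, used as the search key */
  int nRef;            /* pin count; greater than zero means pinned */
  ToyPage *pNext;      /* next page in the same hash bucket */
};

typedef struct ToyCache {
  unsigned int nHash;  /* number of buckets in apHash[] */
  ToyPage **apHash;    /* bucket heads */
} ToyCache;

/* Hash P (page number modulo table size), chase pNext through the bucket,
** and pin the page on a hit. Returns NULL on a cache miss. */
static ToyPage *toyCacheLookup(ToyCache *c, unsigned int pgno){
  ToyPage *p = c->apHash[pgno % c->nHash];
  while( p && p->pgno != pgno ) p = p->pNext;
  if( p ) p->nRef++;
  return p;
}

/* Put a freshly loaded page at the head of its bucket, pinned once. */
static void toyCacheInsert(ToyCache *c, ToyPage *p){
  unsigned int h = p->pgno % c->nHash;
  p->nRef = 1;
  p->pNext = c->apHash[h];
  c->apHash[h] = p;
}

/* Unpin a page; it becomes a recycling candidate when the count reaches zero. */
static int toyCacheUnref(ToyPage *p){
  assert(p->nRef > 0);
  return --p->nRef;
}
```

With four buckets, pages 5, 9 and 13 all hash to bucket 1, so a lookup for any of them walks the same short list.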

5.3.4 Cache update

After acquiring a page, the client can directly modify the content of the page, but as mentioned
previously it must call the sqlite3PagerWrite function on the page prior to making any modifications there. On return from the call, the client can update the page in place as many times as it
wants.

The client calls sqlite3PagerWrite before changing a page's content; at that point the pager moves to the RESERVED_LOCK state.

The first time the client calls the sqlite3PagerWrite function on a page, the pager writes the original content of the page into the rollback journal file as part of a new log record, and sets the PgHdr.needSync flag on.

The pager does not write a modified page
back into the database file until the corresponding needSync flag has been cleared.

Every time the sqlite3PagerWrite function is called on a page, the PgHdr.dirty flag is set;
the flag is cleared only when the pager writes back the page content into the database file.

Because the time when the client modifies a page is not known to the pager, updates on the page are not immediately
propagated to the database file. Thereby, the pager follows a delayed write (aka, write-back) page update policy. The updates are propagated to the database file only when the pager performs a cache flush or selectively recycles dirty pages.

Note: A transaction performs direct updates on cached pages, and the cache manager does deferred updates
on database files. Direct cache updating requires saving old values of pages so that they can be restored if
the transaction aborts itself. Deferred updates to database file can increase a transaction's memory usage.
When the memory usage crosses the upper boundary, the cache manager performs a cache replacement.
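Put differently: the transaction updates cached pages directly while the cache manager defers the writes to the database file. Because updates go straight into the cache, the old page contents must be saved so that they can be restored if the transaction aborts; because the writes are deferred, a transaction's memory use can grow until it crosses the upper bound and triggers cache replacement.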
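
The interplay of journaling, needSync and dirty can be shown with a toy model. Everything here is a hypothetical simplification for this note: ToyPg, ToyJournal, the 8-byte page, the 4-record journal, and the toy* functions are not SQLite's data structures or API.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 8    /* tiny page size, for illustration only */
#define TOY_JRNL_MAX  4    /* capacity of the toy journal */

typedef struct ToyPg {
  unsigned char aData[TOY_PAGE_SIZE];  /* in-cache page image */
  int inJournal;   /* original content already saved in this transaction */
  int needSync;    /* journal must be flushed before this page is written back */
  int dirty;       /* modified in the cache, not yet written to the file */
} ToyPg;

typedef struct ToyJournal {
  unsigned char aSaved[TOY_JRNL_MAX][TOY_PAGE_SIZE]; /* saved originals */
  int nRec;        /* number of records written so far */
} ToyJournal;

/* Make a page writable: save the original once, then mark the page dirty. */
static int toyPageWrite(ToyPg *pg, ToyJournal *j){
  if( !pg->inJournal ){
    if( j->nRec >= TOY_JRNL_MAX ) return -1;   /* toy journal is full */
    memcpy(j->aSaved[j->nRec++], pg->aData, TOY_PAGE_SIZE);
    pg->inJournal = 1;
    pg->needSync = 1;
  }
  pg->dirty = 1;
  return 0;
}

/* Simulated journal flush: the page's saved original is now durable. */
static void toyJournalSync(ToyPg *pg){
  pg->needSync = 0;
}

/* Delayed write-back: refused while needSync is set. Returns 0 if written. */
static int toyWriteBack(ToyPg *pg, unsigned char *aFile){
  if( pg->needSync ) return -1;
  memcpy(aFile, pg->aData, TOY_PAGE_SIZE);
  pg->dirty = 0;
  return 0;
}
```

The second call to toyPageWrite on the same page adds no journal record, which mirrors "the first time the client calls the function on a page" in the text above.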

5.3.5 Cache fetch policy

Many cache systems use sophisticated pre-fetch techniques to bring some pages to the cache in advance to reduce the frequency of stalling.

SQLite strictly follows a fetch-on-demand policy, and avoids any pre-fetch policy to keep the fetch logic very simple and the SQLite library size in check. Also, it reads one page at a time from the database file.

5.3.6 Cache management

In general, a page-cache is of limited size, and unless the database is pretty small, it can hold only a
small number of pages from the database.

The basic idea is to keep in the cache those pages that are immediately required by cache clients. We need to consider three things while devising a cache management policy.

(1) Whenever there is a page in the cache, there is also a master copy of the
page in the database file. Whenever the cache copy is updated, the master copy may need to be
updated too.

(2) For a requested page that is not in the cache, the master copy is referenced and
a new cache copy is made from the master.

(3) If the cache is full and a new page is to be placed
in the cache, a replacement algorithm is invoked to remove some old page from the cache to make
room for the new one.

As the cache is a limited-size storage space, we need to recycle the cache space to 'fit' a larger
collection of pages into a small number of cache slots.

In the figure, there are 26 master pages that we would need to fit into five cache slots by recycling the slots.

Cache management is crucial for cache performance as well as overall system performance.

As long as there are free slots available in the cache for newly requested pages, there is no hard work
for the cache manager to do.

Cache management becomes challenging when the cache becomes full.

The duty of a cache manager is to decide what it will keep in the cache and what it will flush
out of the cache when the cache is full.

The effectiveness of a cache is a measure of how often
requested pages are found in the cache. We need a cache with a very high hit rate.
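
As a worked formula, the hit rate is hits divided by total requests. The helper below is a hypothetical sketch; the Pager structure listed earlier keeps hit and miss counters in aStat[], but how those counters are read out is not shown in these notes, so the function simply takes two counts as arguments.

```c
#include <assert.h>

/* Fraction of page requests served from the cache.
** Returns 0.0 when there have been no requests at all. */
static double toyHitRate(unsigned long nHit, unsigned long nMiss){
  unsigned long nTotal = nHit + nMiss;
  return nTotal ? (double)nHit / (double)nTotal : 0.0;
}
```

Three hits and one miss give a hit rate of 0.75.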

5.3.6.1 Cache replacement

SQLite uses a kind of least recently used (LRU) replacement scheme.

5.3.6.2 LRU cache replacement scheme

SQLite organizes inactive pages in a logical queue. When a page is unpinned, the pager appends the page at the tail end of the queue. (The page at the tail end of the queue is always the most recently accessed one, and the page at the head end is the one accessed farthest in the past.) The victim is chosen from the head end of the queue, but it may not always be the head element, as it would be in a pure LRU scheme. SQLite tries to find a slot on the queue, starting at the head, such that recycling that slot would not involve flushing the journal file.

Before writing a dirty page into the database file, the pager flushes the journal file.

If such a victim is found, the foremost one on the queue is recycled. Otherwise, SQLite flushes the journal file first and then recycles the head slot of the queue. If the victim page is dirty, the pager writes the page to the database file before recycling it.
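
The victim selection described above can be sketched in a few lines of C. This is only an illustration under stated assumptions: `Page`, `LruQueue`, `lru_append` and `lru_pick_victim` are hypothetical names, and the `needs_journal_sync` flag stands in for "recycling this slot would force a journal flush". It is not SQLite's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor (not SQLite's real structure). */
typedef struct Page {
    unsigned pgno;           /* page number */
    int dirty;               /* page has changes not yet in the database file */
    int needs_journal_sync;  /* recycling this page requires a journal flush */
    struct Page *next;       /* next page toward the tail (more recently used) */
} Page;

/* Queue of unpinned (inactive) pages: head is oldest, tail is newest. */
typedef struct LruQueue {
    Page *head;
    Page *tail;
} LruQueue;

/* Append a page at the tail when it becomes unpinned. */
void lru_append(LruQueue *q, Page *p) {
    assert(q != NULL && p != NULL);
    p->next = NULL;
    if (q->tail != NULL) {
        q->tail->next = p;
    } else {
        q->head = p;
    }
    q->tail = p;
}

/* Choose a victim. Scan from the head for the first page that can be
 * recycled without flushing the journal. If there is none, fall back to
 * the head and report that the journal must be flushed first.
 * Returns NULL when the queue is empty. */
Page *lru_pick_victim(const LruQueue *q, int *must_sync_journal) {
    assert(q != NULL && must_sync_journal != NULL);
    for (Page *p = q->head; p != NULL; p = p->next) {
        if (!p->needs_journal_sync) {
            *must_sync_journal = 0;
            return p;
        }
    }
    *must_sync_journal = (q->head != NULL);
    return q->head;
}
```

A caller would check `must_sync_journal`, flush the journal if it is set, write the victim back to the database file if `dirty` is set, and only then reuse the slot.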

5.4 Transaction Management

The pager decides on the mode of locks and on when to acquire and release them.

It follows the strict two-phase locking protocol to produce serializable executions of transactions.

Like any other DBMS, SQLite's transaction management has two components:

(1) normal processing and

(2) recovery processing.

During normal processing, the pager saves enough recovery information in the journal file, and it uses that information during recovery processing when the need arises. The activities of the two components are presented in the next two subsections. The details of transaction management were discussed piecemeal earlier; here they are consolidated into one coherent account.

5.4.1 Normal processing
The cache read operation pins down the page.

Each in-memory page image is followed by a chunk of private space. This extra space is always initialized to zeros the first time the page is loaded from the database file (or created and initialized) into main memory. The space is later reinitialized by the tree (B-tree) module.

As mentioned previously, a requested page may not be in the page cache. In that case the pager finds a free cache slot and reads the page from the database file in a way that is transparent to the user. If all free slots are used up, getting a slot means choosing a victim page, and if that victim is dirty it must first be written to the database file, i.e., a cache flush is required.

5.4.1.2 Write operation

The first time the sqlite3PagerWrite function is called on any page, the pager acquires a reserved lock on the database file. The reserved lock indicates an intention to write the database file in the near future. Only one transaction at a time can hold a reserved lock. If the pager is unable to obtain the lock, another transaction already holds a reserved or stronger lock on the file. In that case the write attempt fails, and the pager returns the SQLITE_BUSY error code to the caller. Once the reserved lock is obtained, the pager also creates and opens the rollback journal.
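
The "one reserved lock at a time" rule can be illustrated with a toy model. Everything here is an assumption made for the sketch: `DbFile`, `Txn`, `toy_pager_write`, `TOY_OK` and `TOY_BUSY` are hypothetical names, and the lock is modeled as a single integer field rather than a real file lock. Only the behavior described above is modeled: the first write takes the reserved lock and opens the journal, and a competing writer gets a busy result.

```c
#include <assert.h>

/* Stand-ins for result codes; TOY_BUSY plays the role of SQLITE_BUSY. */
enum { TOY_OK = 0, TOY_BUSY = 1 };

/* Toy database file: records which transaction holds the reserved lock. */
typedef struct DbFile {
    int reserved_holder;  /* transaction id, or 0 when nobody holds it */
} DbFile;

/* Toy transaction state. */
typedef struct Txn {
    int id;            /* non-zero transaction id */
    int has_reserved;  /* this transaction holds the reserved lock */
    int journal_open;  /* rollback journal has been created and opened */
} Txn;

/* Model of the first-write path: acquire the reserved lock if this
 * transaction does not hold it yet, then open the rollback journal. */
int toy_pager_write(DbFile *f, Txn *t) {
    assert(f != 0 && t != 0 && t->id != 0);
    if (!t->has_reserved) {
        if (f->reserved_holder != 0 && f->reserved_holder != t->id) {
            return TOY_BUSY;  /* someone else already intends to write */
        }
        f->reserved_holder = t->id;
        t->has_reserved = 1;
        t->journal_open = 1;
    }
    return TOY_OK;
}
```

Later writes by the same transaction skip the lock step, which matches the text: the lock is taken only on the first write call.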

SQLite never saves a new page (one that is added, i.e., appended, to the database file by the current transaction) in the journal, because there is no old value for the page. Instead, the initial size of the database file is stored in the journal segment header record (see Fig. 3.7 on page 91) when the journal file is created.

If the database file is expanded by the transaction, the file is truncated to its original size on a rollback.
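
As a rough sketch of the rule above, the decision "does this page need a journal record?" reduces to comparing the page number with the page count saved when the journal was created. `ToyJournal`, `needs_journal_record` and `page_count_after_rollback` are hypothetical names, and the sketch assumes pages are numbered from 1. It ignores everything else a real pager tracks, such as whether the page was already journaled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy journal header: remembers the database size (in pages) at the
 * moment the journal was created. */
typedef struct ToyJournal {
    uint32_t orig_page_count;
} ToyJournal;

/* A page needs its old content journaled only if it existed before the
 * transaction started. Appended pages have no old value to save. */
int needs_journal_record(const ToyJournal *j, uint32_t pgno) {
    assert(j != 0 && pgno >= 1);
    return pgno <= j->orig_page_count;
}

/* On rollback, a file that grew is truncated back to its original size. */
uint32_t page_count_after_rollback(const ToyJournal *j, uint32_t current) {
    assert(j != 0);
    return current > j->orig_page_count ? j->orig_page_count : current;
}
```

For example, with an original size of 10 pages, page 10 would be journaled before modification while page 11 would not, and a file that grew to 14 pages would return to 10 pages after a rollback.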

5.4.1.3 Cache flush

The cache is flushed in one of two situations: (1) the cache has filled up and there is a need for cache replacement, or (2) the transaction is ready to commit its changes.

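The pager does not write the number of log records (nRec) into the current journal segment header right away. (The nRec value is a valuable resource for the rollback operation. When the segment header is formed, the number is set to zero for synchronous transactions and to -1, i.e., 0xFFFFFFFF, for asynchronous transactions.) After flushing the journal, the pager writes the nRec value into the current journal segment header and performs another fsync on the file. Because disk writes are not atomic, the pager does not rewrite the nRec field again. Instead, it creates a new journal segment for the log records that follow. In these cases SQLite uses a multi-segment journal file.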
