Structure of xlog file contents
XLogPageHeaderData
XLogRecord
rmgr-specific data
BkpBlock
backup block data
BkpBlock
backup block data
...
PageHeaderData
/*
* disk page organization
*
* space management information generic to any page
*
* pd_lsn - identifies xlog record for last change to this page.
* pd_checksum - page checksum, if set.
* pd_flags - flag bits.
* pd_lower - offset to start of free space.
* pd_upper - offset to end of free space.
* pd_special - offset to start of special space.
* pd_pagesize_version - size in bytes and page layout version number.
* pd_prune_xid - oldest XID among potentially prunable tuples on page.
*
* The LSN is used by the buffer manager to enforce the basic rule of WAL:
* "thou shalt write xlog before data". A dirty buffer cannot be dumped
* to disk until xlog has been flushed at least as far as the page's LSN.
*
* pd_checksum stores the page checksum, if it has been set for this page;
* zero is a valid value for a checksum. If a checksum is not in use then
* we leave the field unset. This will typically mean the field is zero
* though non-zero values may also be present if databases have been
* pg_upgraded from releases prior to 9.3, when the same byte offset was
* used to store the current timelineid when the page was last updated.
* Note that there is no indication on a page as to whether the checksum
* is valid or not, a deliberate design choice which avoids the problem
* of relying on the page contents to decide whether to verify it. Hence
* there are no flag bits relating to checksums.
*
* pd_prune_xid is a hint field that helps determine whether pruning will be
* useful. It is currently unused in index pages.
*
* The page version number and page size are packed together into a single
* uint16 field. This is for historical reasons: before PostgreSQL 7.3,
* there was no concept of a page version number, and doing it this way
* lets us pretend that pre-7.3 databases have page version number zero.
* We constrain page sizes to be multiples of 256, leaving the low eight
* bits available for a version number.
*
* Minimum possible page size is perhaps 64B to fit page header, opaque space
* and a minimal tuple; of course, in reality you want it much bigger, so
* the constraint on pagesize mod 256 is not an important restriction.
* On the high end, we can only support pages up to 32KB because lp_off/lp_len
* are 15 bits.
*/
typedef struct PageHeaderData
{
    /* XXX LSN is member of *any* block, not only page-organized ones */
    PageXLogRecPtr pd_lsn;          /* LSN: next byte after last byte of xlog
                                     * record for last change to this page */
    uint16      pd_checksum;        /* checksum */
    uint16      pd_flags;           /* flag bits, see below */
    LocationIndex pd_lower;         /* offset to start of free space */
    LocationIndex pd_upper;         /* offset to end of free space */
    LocationIndex pd_special;       /* offset to start of special space */
    uint16      pd_pagesize_version;
    TransactionId pd_prune_xid;     /* oldest prunable XID, or zero if none */
    ItemIdData  pd_linp[1];         /* beginning of line pointer array */
} PageHeaderData;
typedef PageHeaderData *PageHeader;
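A small C sketch of two header conventions described in the comment above: packing the page size and layout version into one uint16, and reading the free-space window between pd_lower and pd_upper. It uses a hypothetical trimmed-down struct (DemoPageHeader) and helper names of my own, not the real bufpage.h macros; the layout version 4 used in the example is an assumed value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down page header: only the fields this sketch uses. */
typedef struct DemoPageHeader
{
    uint16_t    pd_lower;             /* offset to start of free space */
    uint16_t    pd_upper;             /* offset to end of free space */
    uint16_t    pd_special;           /* offset to start of special space */
    uint16_t    pd_pagesize_version;  /* page size | layout version */
} DemoPageHeader;

void
demo_set_size_and_version(DemoPageHeader *ph, uint32_t size, uint8_t version)
{
    /* a multiple of 256 keeps the low byte free; 32 kB is the upper limit */
    assert(size % 256 == 0 && size >= 256 && size <= 32768);
    ph->pd_pagesize_version = (uint16_t) (size | version);
}

uint32_t
demo_page_size(const DemoPageHeader *ph)
{
    /* high byte(s): the page size, since its low 8 bits are always zero */
    return ph->pd_pagesize_version & 0xFF00;
}

uint8_t
demo_layout_version(const DemoPageHeader *ph)
{
    /* low 8 bits: the layout version number */
    return (uint8_t) (ph->pd_pagesize_version & 0x00FF);
}

int
demo_free_gap(const DemoPageHeader *ph)
{
    /* raw gap between the end of the line pointer array and the tuple data */
    return (int) ph->pd_upper - (int) ph->pd_lower;
}
```

With an 8 kB page and version 4, the packed field is 0x2004 (8196). Every legal page size is a multiple of 256, so masking with 0xFF00 recovers the size without disturbing the version byte.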
XLogRecord
/*
* The overall layout of an XLOG record is:
* Fixed-size header (XLogRecord struct)
* rmgr-specific data
* BkpBlock
* backup block data
* BkpBlock
* backup block data
* ...
*
* where there can be zero to four backup blocks (as signaled by xl_info flag
* bits). XLogRecord structs always start on MAXALIGN boundaries in the WAL
* files, and we round up SizeOfXLogRecord so that the rmgr data is also
* guaranteed to begin on a MAXALIGN boundary. However, no padding is added
* to align BkpBlock structs or backup block data.
*
* NOTE: xl_len counts only the rmgr data, not the XLogRecord header,
* and also not any backup blocks. xl_tot_len counts everything. Neither
* length field is rounded up to an alignment boundary.
*/
typedef struct XLogRecord
{
    uint32      xl_tot_len;     /* total len of entire record */
    TransactionId xl_xid;       /* xact id */
    uint32      xl_len;         /* total len of rmgr data */
    uint8       xl_info;        /* flag bits, see below */
    RmgrId      xl_rmid;        /* resource manager for this record */
    /* 2 bytes of padding here, initialize to zero */
    XLogRecPtr  xl_prev;        /* ptr to previous record in log */
    pg_crc32    xl_crc;         /* CRC for this record */
    /* If MAXALIGN==8, there are 4 wasted bytes here */
    /* ACTUAL LOG DATA FOLLOWS AT END OF STRUCT */
} XLogRecord;
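The length rules in the comment above can be checked with a little arithmetic: xl_len covers only rmgr data, xl_tot_len covers everything, and nothing is rounded except the header. This is a sketch under stated assumptions. MAXALIGN is taken to be 8 and BLCKSZ 8192, and DemoXLogRecord/DemoBkpBlock are hand-written mirrors of the 9.3-era layouts shown in these notes, not the real headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed build constants (common defaults on 64-bit builds). */
#define DEMO_MAXIMUM_ALIGNOF 8
#define DEMO_BLCKSZ 8192
#define DEMO_MAXALIGN(len) \
    (((uintptr_t) (len) + (DEMO_MAXIMUM_ALIGNOF - 1)) & \
     ~((uintptr_t) (DEMO_MAXIMUM_ALIGNOF - 1)))

/* Hand-written mirror of the XLogRecord header shown above. */
typedef struct DemoXLogRecord
{
    uint32_t    xl_tot_len;
    uint32_t    xl_xid;
    uint32_t    xl_len;
    uint8_t     xl_info;
    uint8_t     xl_rmid;
    uint64_t    xl_prev;
    uint32_t    xl_crc;
} DemoXLogRecord;

/* Hand-written mirror of BkpBlock: RelFileNode, fork, block, hole. */
typedef struct DemoBkpBlock
{
    uint32_t    spcNode, dbNode, relNode;
    int32_t     fork;
    uint32_t    block;
    uint16_t    hole_offset;
    uint16_t    hole_length;
} DemoBkpBlock;

/* Header rounded up so that rmgr data starts MAXALIGNed. */
#define DEMO_SIZE_OF_XLOG_RECORD DEMO_MAXALIGN(sizeof(DemoXLogRecord))

/*
 * xl_tot_len = aligned header + rmgr data (xl_len) + every backup block,
 * each being an unaligned BkpBlock header followed by
 * BLCKSZ - hole_length bytes of page data. No further rounding.
 */
uint32_t
demo_xl_tot_len(uint32_t xl_len, const DemoBkpBlock *bkp, int nbkp)
{
    uint32_t    total = (uint32_t) DEMO_SIZE_OF_XLOG_RECORD + xl_len;

    assert(nbkp >= 0 && nbkp <= 4);     /* zero to four backup blocks */
    for (int i = 0; i < nbkp; i++)
    {
        assert(bkp[i].hole_length <= DEMO_BLCKSZ);
        total += (uint32_t) sizeof(DemoBkpBlock) + DEMO_BLCKSZ - bkp[i].hole_length;
    }
    return total;
}
```

Under those assumptions the header fields end at byte 28 and MAXALIGN rounds the header to 32, which matches the "4 wasted bytes" remark. A record with 10 bytes of rmgr data and one backup block with a 7000-byte hole then has xl_tot_len = 32 + 10 + 24 + 1192 = 1258.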
PageXLogRecPtr
/*
* For historical reasons, the 64-bit LSN value is stored as two 32-bit
* values.
*/
typedef struct
{
    uint32      xlogid;         /* high bits */
    uint32      xrecoff;        /* low bits */
} PageXLogRecPtr;
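Because the LSN is split into two 32-bit halves, code has to reassemble it before comparing. The sketch below does that and applies the "write xlog before data" rule from the page header comment: a page may be written only once WAL has been flushed at least up to its pd_lsn. The Demo* names are hypothetical. As far as I know, bufpage.h has its own macros for the conversion (PageXLogRecPtrGet and PageXLogRecPtrSet in the 9.3-era source), and the real check happens in the buffer manager, which flushes WAL up to the page LSN before writing a dirty page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical copy of PageXLogRecPtr. */
typedef struct
{
    uint32_t    xlogid;         /* high bits */
    uint32_t    xrecoff;        /* low bits */
} DemoPageXLogRecPtr;

/* Reassemble the 64-bit LSN from its two halves. */
uint64_t
demo_lsn_get(DemoPageXLogRecPtr p)
{
    return ((uint64_t) p.xlogid << 32) | p.xrecoff;
}

/* Split a 64-bit LSN into the on-page representation. */
DemoPageXLogRecPtr
demo_lsn_set(uint64_t lsn)
{
    DemoPageXLogRecPtr p;

    p.xlogid = (uint32_t) (lsn >> 32);
    p.xrecoff = (uint32_t) lsn;
    return p;
}

/* WAL rule: the page may reach disk only if WAL is flushed past its LSN. */
bool
demo_page_may_be_written(DemoPageXLogRecPtr page_lsn, uint64_t wal_flushed_upto)
{
    return wal_flushed_upto >= demo_lsn_get(page_lsn);
}

/* Render an LSN in "high/low" hexadecimal form. */
void
demo_lsn_format(uint64_t lsn, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);
    snprintf(buf, buflen, "%X/%X", (unsigned) (lsn >> 32), (unsigned) lsn);
}
```

The "high/low" hex rendering is the form in which PostgreSQL displays LSNs, so 0x000000010000ABCD appears as 1/ABCD.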
XLogRecData
/*
* The rmgr data to be written by XLogInsert() is defined by a chain of
* one or more XLogRecData structs. (Multiple structs would be used when
* parts of the source data aren't physically adjacent in memory, or when
* multiple associated buffers need to be specified.)
*
* If buffer is valid then XLOG will check if buffer must be backed up
* (ie, whether this is first change of that page since last checkpoint).
* If so, the whole page contents are attached to the XLOG record, and XLOG
* sets XLR_BKP_BLOCK(N) bit in xl_info. Note that the buffer must be pinned
* and exclusive-locked by the caller, so that it won't change under us.
* NB: when the buffer is backed up, we DO NOT insert the data pointed to by
* this XLogRecData struct into the XLOG record, since we assume it's present
* in the buffer. Therefore, rmgr redo routines MUST pay attention to
* XLR_BKP_BLOCK(N) to know what is actually stored in the XLOG record.
* The N'th XLR_BKP_BLOCK bit corresponds to the N'th distinct buffer
* value (ignoring InvalidBuffer) appearing in the rdata chain.
*
* When buffer is valid, caller must set buffer_std to indicate whether the
* page uses standard pd_lower/pd_upper header fields. If this is true, then
* XLOG is allowed to omit the free space between pd_lower and pd_upper from
* the backed-up page image. Note that even when buffer_std is false, the
* page MUST have an LSN field as its first eight bytes!
*
* Note: data can be NULL to indicate no rmgr data associated with this chain
* entry. This can be sensible (ie, not a wasted entry) if buffer is valid.
* The implication is that the buffer has been changed by the operation being
* logged, and so may need to be backed up, but the change can be redone using
* only information already present elsewhere in the XLOG entry.
*/
typedef struct XLogRecData
{
    char       *data;           /* start of rmgr data to include */
    uint32      len;            /* length of rmgr data to include */
    Buffer      buffer;         /* buffer associated with data, if any */
    bool        buffer_std;     /* buffer has standard pd_lower/pd_upper */
    struct XLogRecData *next;   /* next struct in chain, or NULL */
} XLogRecData;
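Two rules in the comment above matter most to redo authors. Data attached to a backed-up buffer is dropped from the record, and bit N of the backup flags refers to the N'th distinct valid buffer in the chain. Both can be modeled in a few lines. This is a simplified, hypothetical model: DemoBuffer is a plain int with 0 standing in for InvalidBuffer, and the caller supplies the backed-up buffers as a bitmask instead of XLOG deciding this from the page LSN and the last checkpoint.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int DemoBuffer;                 /* hypothetical stand-in for Buffer */
#define DEMO_INVALID_BUFFER 0
#define DEMO_MAX_BKP_BLOCKS 4

typedef struct DemoXLogRecData
{
    char       *data;           /* start of rmgr data to include (may be NULL) */
    uint32_t    len;            /* length of rmgr data to include */
    DemoBuffer  buffer;         /* associated buffer, or DEMO_INVALID_BUFFER */
    bool        buffer_std;     /* page has standard pd_lower/pd_upper */
    struct DemoXLogRecData *next;
} DemoXLogRecData;

/* Return N for this buffer among distinct valid buffers, registering it if new. */
static int
demo_bkp_index(DemoBuffer *seen, int *nseen, DemoBuffer buf)
{
    for (int i = 0; i < *nseen; i++)
        if (seen[i] == buf)
            return i;
    assert(*nseen < DEMO_MAX_BKP_BLOCKS);
    seen[*nseen] = buf;
    return (*nseen)++;
}

/*
 * Bytes of rmgr data that end up in the record, given a mask whose bit N
 * says whether the N'th distinct buffer is backed up. Entries tied to a
 * backed-up buffer are skipped because the full page image replaces them.
 */
uint32_t
demo_rmgr_bytes_written(const DemoXLogRecData *rdata, unsigned backed_up_mask)
{
    DemoBuffer  seen[DEMO_MAX_BKP_BLOCKS];
    int         nseen = 0;
    uint32_t    total = 0;

    for (const DemoXLogRecData *rd = rdata; rd != NULL; rd = rd->next)
    {
        if (rd->buffer != DEMO_INVALID_BUFFER)
        {
            int         n = demo_bkp_index(seen, &nseen, rd->buffer);

            if (backed_up_mask & (1u << n))
                continue;
        }
        if (rd->data != NULL)
            total += rd->len;
    }
    return total;
}
```

For a chain made of a 12-byte header with no buffer, followed by 40 bytes of tuple data tied to one buffer, the record normally carries 52 bytes of rmgr data. It carries only 12 when that buffer's page is backed up, which is why the redo routine must test XLR_BKP_BLOCK(0) before reading the tuple.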
xl_xact_abort
typedef struct xl_xact_abort
{
    TimestampTz xact_time;      /* time of abort */
    int         nrels;          /* number of RelFileNodes */
    int         nsubxacts;      /* number of subtransaction XIDs */
    /* Array of RelFileNode(s) to drop at abort */
    RelFileNode xnodes[1];      /* VARIABLE LENGTH ARRAY */
    /* ARRAY OF ABORTED SUBTRANSACTION XIDs FOLLOWS */
} xl_xact_abort;
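xl_xact_abort ends in two variable-length arrays packed back to back, so both its size and the start of the subtransaction array must be computed from nrels and nsubxacts. A minimal sketch, assuming integer (int64) timestamps and 4-byte Oid/TransactionId. The Demo* types are hypothetical copies, and a C99 flexible array member replaces the xnodes[1] idiom. To my knowledge xact.h defines a similar MinSizeOfXactAbort as the offset of xnodes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t DemoOid;
typedef uint32_t DemoTransactionId;
typedef int64_t DemoTimestampTz;        /* assumes integer timestamps */

typedef struct DemoRelFileNode
{
    DemoOid     spcNode, dbNode, relNode;
} DemoRelFileNode;

typedef struct demo_xl_xact_abort
{
    DemoTimestampTz xact_time;  /* time of abort */
    int         nrels;          /* number of RelFileNodes */
    int         nsubxacts;      /* number of subtransaction XIDs */
    DemoRelFileNode xnodes[];   /* nrels entries (flexible array member) */
    /* nsubxacts DemoTransactionIds follow the xnodes array */
} demo_xl_xact_abort;

/* Fixed part: everything before the variable-length arrays. */
#define DEMO_MIN_SIZE_OF_XACT_ABORT offsetof(demo_xl_xact_abort, xnodes)

size_t
demo_xact_abort_size(int nrels, int nsubxacts)
{
    assert(nrels >= 0 && nsubxacts >= 0);
    return DEMO_MIN_SIZE_OF_XACT_ABORT
        + (size_t) nrels * sizeof(DemoRelFileNode)
        + (size_t) nsubxacts * sizeof(DemoTransactionId);
}

/* The subtransaction XIDs start right after the last RelFileNode. */
const DemoTransactionId *
demo_xact_abort_subxacts(const demo_xl_xact_abort *rec)
{
    return (const DemoTransactionId *) &rec->xnodes[rec->nrels];
}
```

With 2 relations and 3 subtransactions, the record is 16 + 2*12 + 3*4 = 52 bytes, and the XIDs begin 40 bytes into it.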
RmgrData
/*
* Method table for resource managers.
*
* This struct must be kept in sync with the PG_RMGR definition in
* rmgr.c.
*
* RmgrTable[] is indexed by RmgrId values (see rmgrlist.h).
*/
typedef struct RmgrData
{
    const char *rm_name;
    void        (*rm_redo) (XLogRecPtr lsn, struct XLogRecord *rptr);
    void        (*rm_desc) (StringInfo buf, uint8 xl_info, char *rec);
    void        (*rm_startup) (void);
    void        (*rm_cleanup) (void);
} RmgrData;
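During replay, the record's xl_rmid is used as an index into RmgrTable[], and that entry's rm_redo is called. The sketch shows only this dispatch step. The two-entry table, the call counters, and the trimmed record and method structs are hypothetical, and rm_desc, rm_startup and rm_cleanup are omitted. The ids 0 for XLOG and 1 for Transaction follow the order in rmgrlist.h as I recall it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t DemoRmgrId;
typedef uint64_t DemoXLogRecPtr;

/* Trimmed record: only the fields dispatch needs. */
typedef struct DemoXLogRecord
{
    uint8_t     xl_info;
    DemoRmgrId  xl_rmid;
} DemoXLogRecord;

/* Trimmed method table entry: name and redo callback only. */
typedef struct DemoRmgrData
{
    const char *rm_name;
    void        (*rm_redo) (DemoXLogRecPtr lsn, DemoXLogRecord *rptr);
} DemoRmgrData;

int         demo_redo_calls[2];    /* per-rmgr call counters for the demo */

static void
demo_xlog_redo(DemoXLogRecPtr lsn, DemoXLogRecord *rptr)
{
    (void) lsn;
    (void) rptr;
    demo_redo_calls[0]++;
}

static void
demo_xact_redo(DemoXLogRecPtr lsn, DemoXLogRecord *rptr)
{
    (void) lsn;
    (void) rptr;
    demo_redo_calls[1]++;
}

/* Indexed by rmgr id, just as RmgrTable[] is indexed by RmgrId. */
static const DemoRmgrData DemoRmgrTable[] = {
    {"XLOG", demo_xlog_redo},           /* id 0 */
    {"Transaction", demo_xact_redo},    /* id 1 */
};

#define DEMO_MAX_RMGR_ID \
    ((int) (sizeof(DemoRmgrTable) / sizeof(DemoRmgrTable[0])) - 1)

/* Replay one record by dispatching on xl_rmid; returns the rmgr name. */
const char *
demo_replay(DemoXLogRecPtr lsn, DemoXLogRecord *rec)
{
    assert(rec->xl_rmid <= DEMO_MAX_RMGR_ID);
    DemoRmgrTable[rec->xl_rmid].rm_redo(lsn, rec);
    return DemoRmgrTable[rec->xl_rmid].rm_name;
}
```

Keeping the table dense and indexed by id makes dispatch a single array lookup. This is also why the comment insists that the struct stay in sync with the PG_RMGR definition.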
BkpBlock
/*
* Header info for a backup block appended to an XLOG record.
*
* As a trivial form of data compression, the XLOG code is aware that
* PG data pages usually contain an unused "hole" in the middle, which
* contains only zero bytes. If hole_length > 0 then we have removed
* such a "hole" from the stored data (and it's not counted in the
* XLOG record's CRC, either). Hence, the amount of block data actually
* present following the BkpBlock struct is BLCKSZ - hole_length bytes.
*
* Note that we don't attempt to align either the BkpBlock struct or the
* block's data. So, the struct must be copied to aligned local storage
* before use.
*/
typedef struct BkpBlock
{
    RelFileNode node;           /* relation containing block */
    ForkNumber  fork;           /* fork within the relation */
    BlockNumber block;          /* block number */
    uint16      hole_offset;    /* number of bytes before "hole" */
    uint16      hole_length;    /* number of bytes in "hole" */
    /* ACTUAL BLOCK DATA FOLLOWS AT END OF STRUCT */
} BkpBlock;
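To undo the hole compression on the redo side, copy the header out to aligned storage, then copy the bytes before the hole, zero-fill the hole, and copy the remainder. A sketch of that procedure, assuming BLCKSZ is 8192. DemoBkpBlock is a hand-written mirror of BkpBlock, and demo_restore_bkp_block is a hypothetical name, not the server's own restore function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BLCKSZ 8192            /* assumed default block size */

/* Hand-written mirror of BkpBlock (RelFileNode expanded inline). */
typedef struct DemoBkpBlock
{
    uint32_t    spcNode, dbNode, relNode;
    int32_t     fork;
    uint32_t    block;
    uint16_t    hole_offset;
    uint16_t    hole_length;
} DemoBkpBlock;

/*
 * 'src' points at an unaligned BkpBlock header inside a record, followed by
 * BLCKSZ - hole_length bytes of page data. Rebuilds the full page into
 * 'page' and returns a pointer just past the consumed bytes.
 */
const char *
demo_restore_bkp_block(const char *src, char *page, DemoBkpBlock *hdr_out)
{
    DemoBkpBlock bkpb;

    /* The header is not aligned in the record: copy it out before use. */
    memcpy(&bkpb, src, sizeof(DemoBkpBlock));
    src += sizeof(DemoBkpBlock);
    assert((uint32_t) bkpb.hole_offset + bkpb.hole_length <= DEMO_BLCKSZ);

    if (bkpb.hole_length == 0)
        memcpy(page, src, DEMO_BLCKSZ);
    else
    {
        /* bytes before the hole, then zeros, then the rest of the page */
        memcpy(page, src, bkpb.hole_offset);
        memset(page + bkpb.hole_offset, 0, bkpb.hole_length);
        memcpy(page + bkpb.hole_offset + bkpb.hole_length,
               src + bkpb.hole_offset,
               DEMO_BLCKSZ - (bkpb.hole_offset + bkpb.hole_length));
    }
    if (hdr_out != NULL)
        *hdr_out = bkpb;
    return src + (DEMO_BLCKSZ - bkpb.hole_length);
}
```

Returning the advanced pointer lets a caller walk several backup blocks in a row. This works because, as the comment says, no padding separates one block's data from the next header.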
RelFileNode
/*
* RelFileNode must provide all that we need to know to physically access
* a relation, with the exception of the backend ID, which can be provided
* separately. Note, however, that a "physical" relation is comprised of
* multiple files on the filesystem, as each fork is stored as a separate
* file, and each fork can be divided into multiple segments. See md.c.
*
* spcNode identifies the tablespace of the relation. It corresponds to
* pg_tablespace.oid.
*
* dbNode identifies the database of the relation. It is zero for
* "shared" relations (those common to all databases of a cluster).
* Nonzero dbNode values correspond to pg_database.oid.
*
* relNode identifies the specific relation. relNode corresponds to
* pg_class.relfilenode (NOT pg_class.oid, because we need to be able
* to assign new physical files to relations in some situations).
* Notice that relNode is only unique within a database in a particular
* tablespace.
*
* Note: spcNode must be GLOBALTABLESPACE_OID if and only if dbNode is
* zero. We support shared relations only in the "global" tablespace.
*
* Note: in pg_class we allow reltablespace == 0 to denote that the
* relation is stored in its database's "default" tablespace (as
* identified by pg_database.dattablespace). However this shorthand
* is NOT allowed in RelFileNode structs --- the real tablespace ID
* must be supplied when setting spcNode.
*
* Note: in pg_class, relfilenode can be zero to denote that the relation
* is a "mapped" relation, whose current true filenode number is available
* from relmapper.c. Again, this case is NOT allowed in RelFileNodes.
*
* Note: various places use RelFileNode in hashtable keys. Therefore,
* there *must not* be any unused padding bytes in this struct. That
* should be safe as long as all the fields are of type Oid.
*/
typedef struct RelFileNode
{
    Oid         spcNode;        /* tablespace */
    Oid         dbNode;         /* database */
    Oid         relNode;        /* relation */
} RelFileNode;
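Because spcNode, dbNode and relNode are all that is needed to locate a relation, a file path can be derived from them. The sketch below covers only the two built-in tablespaces. The fork suffixes, the base/ and global/ directory names, and the OIDs 1663 (pg_default) and 1664 (pg_global) are stated from memory of the standard layout. Paths under pg_tblspc and segment suffixes (.1, .2, ...) are deliberately not handled. The sketch also turns the "no padding bytes" note into a compile-time check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint32_t DemoOid;

typedef struct DemoRelFileNode
{
    DemoOid     spcNode;        /* tablespace */
    DemoOid     dbNode;         /* database */
    DemoOid     relNode;        /* relation */
} DemoRelFileNode;

/* The struct is hashed as raw bytes, so it must contain no padding. */
_Static_assert(sizeof(DemoRelFileNode) == 3 * sizeof(DemoOid),
               "RelFileNode must not contain padding");

#define DEMO_DEFAULTTABLESPACE_OID 1663     /* pg_default (assumed) */
#define DEMO_GLOBALTABLESPACE_OID  1664     /* pg_global (assumed) */

/* Fork numbers 0..3: main, free space map, visibility map, init. */
static const char *const demo_fork_suffix[] = {"", "_fsm", "_vm", "_init"};

/*
 * Build the data-directory-relative path of a fork's first segment.
 * Returns the snprintf result, or -1 for tablespaces not handled here.
 */
int
demo_relpath(DemoRelFileNode rnode, int forknum, char *buf, size_t buflen)
{
    assert(forknum >= 0 && forknum <= 3);
    if (rnode.spcNode == DEMO_GLOBALTABLESPACE_OID)
    {
        assert(rnode.dbNode == 0);  /* shared relations live only in pg_global */
        return snprintf(buf, buflen, "global/%u%s",
                        (unsigned) rnode.relNode, demo_fork_suffix[forknum]);
    }
    if (rnode.spcNode == DEMO_DEFAULTTABLESPACE_OID)
        return snprintf(buf, buflen, "base/%u/%u%s",
                        (unsigned) rnode.dbNode, (unsigned) rnode.relNode,
                        demo_fork_suffix[forknum]);
    return -1;                      /* pg_tblspc/... paths omitted */
}
```

For example, {1663, 16384, 16385} with fork 1 (the free space map) yields base/16384/16385_fsm.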