通过使用主机层面的hexdump命令结合PG插件pageinspect可以对PG的Page分析,这样对PG的page会有更深的了解
--//创建测试表
drop table if exists test;
create table test (id int,name varchar(10));
insert into test values(1,'aaaaa');
insert into test values(2,'bbbbb');
postgres=# drop table if exists test;
DROP TABLE
postgres=# create table test (id int,name varchar(10));
CREATE TABLE
postgres=# insert into test values(1,'aaaaa');
INSERT 0 1
postgres=# insert into test values(2,'bbbbb');
INSERT 0 1
postgres=#
postgres=# select * from test;
id | name
----+-------
1 | aaaaa
2 | bbbbb
(2 rows)
--//获取表的路径
postgres=# \df *filepath*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+----------------------+------------------+---------------------+------
pg_catalog | pg_relation_filepath | text | regclass | func
(1 row)
postgres=# select pg_relation_filepath('test');
pg_relation_filepath
----------------------
base/13237/73829
(1 row)
postgres=#
--//参考获取结构体大小的方法,使用hexdump的方式分析page
--// hexdump -C $PGDATA/base/13237/73829
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829
00000000 00 00 00 00 40 e7 16 03 00 00 00 00 20 00 b0 1f |....@....... ...|
00000010 00 20 04 20 00 00 00 00 d8 9f 44 00 b0 9f 44 00 |. . ......D...D.|
00000020 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00001fb0 65 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 |e...............|
00001fc0 02 00 02 00 02 08 18 00 02 00 00 00 0d 62 62 62 |.............bbb|
00001fd0 62 62 00 00 00 00 00 00 64 00 10 00 00 00 00 00 |bb......d.......|
00001fe0 00 00 00 00 00 00 00 00 01 00 02 00 02 08 18 00 |................|
00001ff0 01 00 00 00 0d 61 61 61 61 61 00 00 00 00 00 00 |.....aaaaa......|
00002000
[postgres@postgres ~]$
--//因为test表只插入了两行数据,所以test表的大小为8K
--//通过上面hexdump的分析可以看到,首先插入aaaaa数据在块的最底部
--//一个Page的组成PageheaderData+ItemData+tupleHeaer+tuple
--//Page的组成可以参考https://blog.csdn.net/m15217321304/article/details/107956507
--//PageheaderData的结构如下
typedef struct PageHeaderData
{
/* XXX LSN is member of *any* block, not only page-organized ones */
PageXLogRecPtr pd_lsn; /* LSN: next byte after last byte of xlog
* record for last change to this page */
uint16 pd_checksum; /* checksum */
uint16 pd_flags; /* flag bits, see below */
LocationIndex pd_lower; /* offset to start of free space */
LocationIndex pd_upper; /* offset to end of free space */
LocationIndex pd_special; /* offset to start of special space */
uint16 pd_pagesize_version;
TransactionId pd_prune_xid; /* oldest prunable XID, or zero if none */
ItemIdData pd_linp[FLEXIBLE_ARRAY_MEMBER]; /* line pointer array */
} PageHeaderData;
--//参考https://blog.csdn.net/m15217321304/article/details/117818394获取结构体变量大小的方法,获取结构体各变量的大小
--//使用hexdump按照变量大小,获取相应的值
1、PageHeader
--//pd_lsn
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 0 -n 8
00000000 00 00 00 00 40 e7 16 03 |....@...|
00000008
[postgres@postgres ~]$
--//数据文件的8个Bytes存储的是LSN,其中最开始的4个Bytes是TimelineID,在这里是\x0000 0000(即数字0),后面的4个Bytes是\x0316e740,LSN为0/316e740
--//X86为小端模式,字节需要高低位变换
--//数据库中查看LSN的方法
postgres=# \df *lsn*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+---------------------------+------------------+---------------------+------
pg_catalog | pg_current_wal_flush_lsn | pg_lsn | | func
pg_catalog | pg_current_wal_insert_lsn | pg_lsn | | func
pg_catalog | pg_current_wal_lsn | pg_lsn | | func
pg_catalog | pg_last_wal_receive_lsn | pg_lsn | | func
pg_catalog | pg_last_wal_replay_lsn | pg_lsn | | func
pg_catalog | pg_lsn_cmp | integer | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_eq | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_ge | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_gt | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_hash | integer | pg_lsn | func
pg_catalog | pg_lsn_hash_extended | bigint | pg_lsn, bigint | func
pg_catalog | pg_lsn_in | pg_lsn | cstring | func
pg_catalog | pg_lsn_le | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_lt | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_mi | numeric | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_ne | boolean | pg_lsn, pg_lsn | func
pg_catalog | pg_lsn_out | cstring | pg_lsn | func
pg_catalog | pg_lsn_recv | pg_lsn | internal | func
pg_catalog | pg_lsn_send | bytea | pg_lsn | func
pg_catalog | pg_wal_lsn_diff | numeric | pg_lsn, pg_lsn | func
(20 rows)
postgres=# select pg_current_wal_lsn();
pg_current_wal_lsn
--------------------
0/316E880
(1 row)
postgres=#
--//pd_checksum
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8 -n 2
00000008 00 00 |..|
0000000a
[postgres@postgres ~]$
--//checksum为\x0000
--//pd_flags
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 12 -n 2
0000000c 20 00 | .|
0000000e
[postgres@postgres ~]$
--//0x0020 转换10进制为32
--//pd_upper
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 14 -n 2
0000000e b0 1f |..|
00000010
[postgres@postgres ~]$
--//0x1fb0 转换10进制为8112
--//pd_special
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 16 -n 2
00000010 00 20 |. |
00000012
[postgres@postgres ~]$
--//0x2000 转换10进制为8192
--//pd_pagesize_version
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 18 -n 2
00000012 04 20 |. |
00000014
[postgres@postgres ~]$
--//0x2004 转换10进制为 8196
--//pd_prune_xid
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 20 -n 4
00000014 00 00 00 00 |....|
00000018
[postgres@postgres ~]$
--//0x0000
--//PageHeaderData的数据可以通过pageinspect来确认
postgres=# \dx
List of installed extensions
Name | Version | Schema | Description
--------------+---------+------------+-------------------------------------------------------
add_one | 1.0 | public | only test publication
pageinspect | 1.7 | public | inspect the contents of database pages at a low level
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers
(4 rows)
postgres=# \dx+ pageinspect
Objects in extension "pageinspect"
Object description
-------------------------------------------------------------------
function brin_metapage_info(bytea)
function brin_page_items(bytea,regclass)
function brin_page_type(bytea)
function brin_revmap_data(bytea)
function bt_metap(text)
function bt_page_items(bytea)
function bt_page_items(text,integer)
function bt_page_stats(text,integer)
function fsm_page_contents(bytea)
function get_raw_page(text,integer)
function get_raw_page(text,text,integer)
function gin_leafpage_items(bytea)
function gin_metapage_info(bytea)
function gin_page_opaque_info(bytea)
function hash_bitmap_info(regclass,bigint)
function hash_metapage_info(bytea)
function hash_page_items(bytea)
function hash_page_stats(bytea)
function hash_page_type(bytea)
function heap_page_item_attrs(bytea,regclass)
function heap_page_item_attrs(bytea,regclass,boolean)
function heap_page_items(bytea)
function page_checksum(bytea,integer)
function page_header(bytea)
function tuple_data_split(oid,bytea,integer,integer,text)
function tuple_data_split(oid,bytea,integer,integer,text,boolean)
(26 rows)
postgres=# select * from page_header(get_raw_page('test',0));
lsn | checksum | flags | lower | upper | special | pagesize | version | prune_xid
-----------+----------+-------+-------+-------+---------+----------+---------+-----------
0/316E740 | 0 | 0 | 32 | 8112 | 8192 | 8192 | 4 | 0
(1 row)
postgres=#
2、PageHeaderData之后就是ItemData
PageHeaderData之后是ItemId数组,每个元素占用的空间为4Bytes,数据结构:
typedef struct ItemIdData
{
unsigned lp_off:15,/* offset to tuple (from start of page) */
lp_flags:2,/* state of item pointer, see below */
lp_len:15;/* byte length of tuple */
} ItemIdData;
typedef * ItemId;
--//ItemIdData的长度为4个字节,可以参考https://blog.csdn.net/m15217321304/article/details/117818394
--//一个ItemIdData代表一行tuple
--//其中前15bit为 off,中间两位为state,后15位tuple的长度
--//ItemIdData前16bit
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 24 -n 2
00000018 d8 9f |..D.|
0000001c
[postgres@postgres ~]$
--//0x9fd8
--//lp_off
echo $((0x9fd8& ~$((1<<15))))
[postgres@postgres ~]$ echo $((0x9fd8& ~$((1<<15))))
8152
[postgres@postgres ~]$
--//表示第1个Item(tuple)从8152开始
--//lp_len
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 26 -n 2
0000001a 44 00 |D.|
0000001c
[postgres@postgres ~]$
[postgres@postgres ~]$ echo $((0x0044 >> 1))
34
[postgres@postgres ~]$
--//表示第1个Item(tuple)的大小为34
--//一下获取4位
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 24 -n 4
00000018 d8 9f 44 00 |..D.|
0000001c
[postgres@postgres ~]$
--//0x00449fd8
--//转换2进制 0000000000100010 01 001111111011000
--//lp_flags
取第16-17位,01,即1
--//另外一种获取方式
[postgres@postgres ~]$ echo $((0x00449fd8>>17))
34
[postgres@postgres ~]$ echo $((0x00449fd8&(0xffff>>1))) //通过与的方式,取后15位
8152
[postgres@postgres ~]$ echo $((0x00449fd8>>15&0xf>>2)) //取中间两位
1
--//通过PageHeaderData的pd_lower可以推算出该Page包含多少行数据
--//(pg_lower-24)/4 = (32-24)/4 = 2 ,其中前24位page固定的大小,然后一个ItemData占用4个字节
--//第二行数据
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 28 -n 4
0000001c b0 9f 44 00 |..D.|
00000020
[postgres@postgres ~]$
[postgres@postgres ~]$ echo $((0x00449fb0>>17))
34
[postgres@postgres ~]$ echo $((0x00449fb0&(0xffff>>1)))
8112
[postgres@postgres ~]$ echo $((0x00449fb0>>15&0xf>>2))
1
[postgres@postgres ~]$
--//通过pageinspect获取item,确认得到的值是一样的
postgres=# select * from heap_page_items(get_raw_page('test',0));
lp | lp_off | lp_flags | lp_len | t_xmin | t_xmax | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits | t_oid | t_data
----+--------+----------+--------+---------+--------+----------+--------+-------------+------------+--------+--------+-------+------------------------
1 | 8152 | 1 | 34 | 1048676 | 0 | 0 | (0,1) | 2 | 2306 | 24 | | | \x010000000d6161616161
2 | 8112 | 1 | 34 | 1048677 | 0 | 0 | (0,2) | 2 | 2306 | 24 | | | \x020000000d6262626262
(2 rows)
postgres=#
3、行数据(Tuple) = tupleheader+tuple数据
--//access/htup_details.h
--//tupleheader数据
struct HeapTupleHeaderData
{
union
{
HeapTupleFields t_heap;
DatumTupleFields t_datum;
} t_choice;
ItemPointerData t_ctid; /* current TID of this or newer tuple (or a
* speculative insertion token) */
/* Fields below here must match MinimalTupleData! */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK2 2
uint16 t_infomask2; /* number of attributes + various flags */
#define FIELDNO_HEAPTUPLEHEADERDATA_INFOMASK 3
uint16 t_infomask; /* various flag bits, see below */
#define FIELDNO_HEAPTUPLEHEADERDATA_HOFF 4
uint8 t_hoff; /* sizeof header incl. bitmap, padding */
/* ^ - 23 bytes - ^ */
#define FIELDNO_HEAPTUPLEHEADERDATA_BITS 5
bits8 t_bits[FLEXIBLE_ARRAY_MEMBER]; /* bitmap of NULLs */
/* MORE DATA FOLLOWS AT END OF STRUCT */
};
--//tulefield数据
typedef struct HeapTupleFields
{
TransactionId t_xmin; /* inserting xact ID */
TransactionId t_xmax; /* deleting or locking xact ID */
union
{
CommandId t_cid; /* inserting or deleting command ID, or both */
TransactionId t_xvac; /* old-style VACUUM FULL xact ID */
} t_field3;
} HeapTupleFields;
--//第1个Tuple的偏移为8152,下面使用hexdump对其中的数据解析:
--//t_xmin
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8152 -n 4
00001fd8 64 00 10 00 |d...|
00001fdc
[postgres@postgres ~]$ echo $((0x00100064))
1048676
[postgres@postgres ~]$
--//t_max
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8156 -n 4
00001fdc 00 00 00 00 |....|
00001fe0
[postgres@postgres ~]$
--//t_cid/t_xvac
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8160 -n 4
00001fe0 00 00 00 00 |....|
00001fe4
[postgres@postgres ~]$
--//t_ctid
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8164 -n 6
00001fe4 00 00 00 00 01 00 |......|
00001fea
[postgres@postgres ~]$
//ip_blkid=\x0000,即blockid=0
//ip_posid=\x0001,即posid=1,第1个tuple
--//t_infomask2
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8170 -n 2
00001fea 02 00 |..|
00001fec
[postgres@postgres ~]$
--//0x0002
--//前11位代表属性的个数(比如一个表,有多少个列,这里表示的就是test表包含了2个列)
--//t_infomask
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8172 -n 2
00001fec 02 08 |..|
00001fee
[postgres@postgres ~]$
--//0x0802 转换10十进制为2050
--//组成为HEAP_XMAX_INVALID + HEAP_HASVARWIDTH
--//t_infomask标识
/*
* information stored in t_infomask:
*/
#define HEAP_HASNULL 0x0001 /* has null attribute(s) */
#define HEAP_HASVARWIDTH 0x0002 /* has variable-width attribute(s) */
#define HEAP_HASEXTERNAL 0x0004 /* has external stored attribute(s) */
#define HEAP_HASOID 0x0008 /* has an object-id field */
#define HEAP_XMAX_KEYSHR_LOCK 0x0010 /* xmax is a key-shared locker */
#define HEAP_COMBOCID 0x0020 /* t_cid is a combo cid */
#define HEAP_XMAX_EXCL_LOCK 0x0040 /* xmax is exclusive locker */
#define HEAP_XMAX_LOCK_ONLY 0x0080 /* xmax, if valid, is only a locker */
/* xmax is a shared locker */
#define HEAP_XMAX_SHR_LOCK (HEAP_XMAX_EXCL_LOCK | HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_LOCK_MASK (HEAP_XMAX_SHR_LOCK | HEAP_XMAX_EXCL_LOCK | \
HEAP_XMAX_KEYSHR_LOCK)
#define HEAP_XMIN_COMMITTED 0x0100 /* t_xmin committed */
#define HEAP_XMIN_INVALID 0x0200 /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)
#define HEAP_XMAX_COMMITTED 0x0400 /* t_xmax committed */
#define HEAP_XMAX_INVALID 0x0800 /* t_xmax invalid/aborted */
#define HEAP_XMAX_IS_MULTI 0x1000 /* t_xmax is a MultiXactId */
#define HEAP_UPDATED 0x2000 /* this is UPDATEd version of row */
#define HEAP_MOVED_OFF 0x4000 /* moved to another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
#define HEAP_MOVED_IN 0x8000 /* moved from another place by pre-9.0
* VACUUM FULL; kept for binary
* upgrade support */
#define HEAP_MOVED (HEAP_MOVED_OFF | HEAP_MOVED_IN)
--//t_hoff
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8174 -n 2
00001fee 18 00 |.|
00001fef
[postgres@postgres ~]$
--//0x0018
[postgres@postgres ~]$ echo $((0x0018))
24
[postgres@postgres ~]$
--//用户数据开始偏移为24,即8152+24
--//Tuple数据
说完了Tuple的头部数据,接下来我们看看实际的数据存储。上一节我们得到Tuple总的长度是34,计算得到数据大小为34-24=10。
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8176 -n 10
00001ff0 01 00 00 00 0d 61 61 61 61 61 |.....aaaaa|
00001ffa
--//第2个Tuple的偏移为8112,下面使用hexdump对其中的数据解析:
--//t_xmin
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8112 -n 4
00001fb0 65 00 10 00 |e...|
00001fb4
[postgres@postgres ~]$
[postgres@postgres ~]$ echo $((0x00100065))
1048677
[postgres@postgres ~]$
--//t_max
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8116 -n 4
00001fb4 00 00 00 00 |....|
00001fb8
[postgres@postgres ~]$
--//t_cid/t_xvac
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8120 -n 4
00001fb8 00 00 00 00 |....|
00001fbc
[postgres@postgres ~]$
--//t_ctid
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8124 -n 6
00001fbc 00 00 00 00 02 00 |......|
00001fc2
[postgres@postgres ~]$
//ip_blkid=\x0000,即blockid=0
//ip_posid=\x0001,即posid=2,第2个tuple
--//t_infomask2
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8130 -n 2
00001fc2 02 00 |..|
00001fc4
--//包含两个属性
--//t_infomask
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8132 -n 2
00001fc4 02 08 |..|
00001fc6
[postgres@postgres ~]$
--//0x0802 转换10十进制为2050
--//组成为HEAP_XMAX_INVALID + HEAP_HASVARWIDTH
--//t_hoff
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8134 -n 2
00001fc6 18 00 |..|
00001fc8
[postgres@postgres ~]$
--//用户数据开始偏移为24,即8112+24
--//Tuple数据
说完了Tuple的头部数据,接下来我们看看实际的数据存储。从最开始我们得到Tuple总的长度是34,计算得到数据大小为34-24=10。
[postgres@postgres ~]$ hexdump -C $PGDATA/base/13237/73829 -s 8136 -n 10
00001fc8 02 00 00 00 0d 62 62 62 62 62 |.....bbbbb|
00001fd2
[postgres@postgres ~]$
可以看到,得到的两行数据和最初创建的TEST一致
create table test (id int,name varchar(10));
insert into test values(1,'aaaaa');
insert into test values(2,'bbbbb');
第一行数据
id=\x00000001 name=\x0d6161616161
第二行数据
id=\x00000002 name=\x0d6262626262
获取数据的几个关键点
1、通过pd_lower的值可以判断该page有多少行记录,(pd_lower-24)/4 = 行数
2、通过t_infomask2 可以获得该page中表有多少个列
不清楚的点
1、怎么知道列的类型,列的长度
2、怎么根据列的个数,把数据拆分成对应列的值,比如第二行数据02 00 00 00 0d 62 62 62 62 62 ,怎么知道第一个ID列占多少位,name列占多少?