PostgreSQL OID 源码分析

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

  • Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
  • Postgres-XL的项目发起人Mason Sharp
  • pgpool的作者石井达夫(Tatsuo Ishii)
  • PG-Strom的作者海外浩平(Kaigai Kohei)
  • Greenplum研发总监姚延栋
  • 周正中(德哥), PostgreSQL中国用户会创始人之一
  • 汪洋,平安科技数据库技术部经理
  • ……
 
  • 2015年度PG大象会报名地址:http://postgres2015.eventdove.com/
  • PostgreSQL中国社区: http://postgres.cn/
  • PostgreSQL专业1群: 3336901(已满)
  • PostgreSQL专业2群: 100910388
  • PostgreSQL专业3群: 150657323



PostgreSQL的许多系统表都用到了OID这个数据类型,OID是一个uint类型,最大可以存储的值为2^32-1

/*
 * Object ID is a fundamental type in Postgres.
 */
typedef unsigned int Oid;

很多系统表使用OID作为主键,PostgreSQL 使用 GetNewOid函数获取下一个可用的OID, 那么怎么保证获取到的OID是可用的呢,并且确保不重复即不违反唯一约束?
例如,我们使用pg_resetxlog可以重置NEXTOid,如果被重置后,又是如何获取一个不会违反唯一约束的OID呢?

看代码最直接,获取OID的底层函数如下,第一次时,一次性获取 VAR_OID_PREFETCH = 8192个作为缓存,
access/transam/varsup.c:#define VAR_OID_PREFETCH                8192

src/backend/access/transam/varsup.c

/*
 * GetNewObjectId -- allocate a new OID
 *
 * OIDs are generated by a cluster-wide counter.  Since they are only 32 bits
 * wide, counter wraparound will occur eventually, and therefore it is unwise
 * to assume they are unique unless precautions are taken to make them so.
 * Hence, this routine should generally not be used directly.  The only
 * direct callers should be GetNewOid() and GetNewRelFileNode() in
 * catalog/catalog.c.
 */
Oid
GetNewObjectId(void)
{
        Oid                     result;

        /* safety check, we should never get this far in a HS slave */
        if (RecoveryInProgress())
                elog(ERROR, "cannot assign OIDs during recovery");

        LWLockAcquire(OidGenLock, LW_EXCLUSIVE);

        /*
         * Check for wraparound of the OID counter.  We *must* not return 0
         * (InvalidOid); and as long as we have to check that, it seems a good
         * idea to skip over everything below FirstNormalObjectId too. (This
         * basically just avoids lots of collisions with bootstrap-assigned OIDs
         * right after a wrap occurs, so as to avoid a possibly large number of
         * iterations in GetNewOid.)  Note we are relying on unsigned comparison.
         *
         * During initdb, we start the OID generator at FirstBootstrapObjectId, so
         * we only wrap if before that point when in bootstrap or standalone mode.
         * The first time through this routine after normal postmaster start, the
         * counter will be forced up to FirstNormalObjectId.  This mechanism
         * leaves the OIDs between FirstBootstrapObjectId and FirstNormalObjectId
         * available for automatic assignment during initdb, while ensuring they
         * will never conflict with user-assigned OIDs.
         */
        if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId))    // 处理非常规OID,改为常规的OID,起到保护作用
        {
                if (IsPostmasterEnvironment)    
                {
                        /* wraparound, or first post-initdb assignment, in normal mode */
                        ShmemVariableCache->nextOid = FirstNormalObjectId;
                        ShmemVariableCache->oidCount = 0;
                }
                else
                {
                        /* we may be bootstrapping, so don't enforce the full range */
                        if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))
                        {
                                /* wraparound in standalone mode (unlikely but possible) */
                                ShmemVariableCache->nextOid = FirstNormalObjectId;
                                ShmemVariableCache->oidCount = 0;
                        }
                }
        }

        /* If we run out of logged for use oids then we must log more */
        if (ShmemVariableCache->oidCount == 0)  // 当一次获取的缓存用完后,重新获取8192个oid
        {
                XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH);  // 一次获取多个
                ShmemVariableCache->oidCount = VAR_OID_PREFETCH;  // 将计数器置为8192
        }

        result = ShmemVariableCache->nextOid;

        (ShmemVariableCache->nextOid)++;  // 每次使用一个oid后nextOid加1
        (ShmemVariableCache->oidCount)--;  // 对应的count减1

        LWLockRelease(OidGenLock);

        return result;
}


16384是第一个数据库启动后,用户可以使用的OID。
src/include/access/transam.h

/* ----------
 *              Object ID (OID) zero is InvalidOid.
 *
 *              OIDs 1-9999 are reserved for manual assignment (see the files
 *              in src/include/catalog/).
 *
 *              OIDS 10000-16383 are reserved for assignment during initdb
 *              using the OID generator.  (We start the generator at 10000.)
 *
 *              OIDs beginning at 16384 are assigned from the OID generator
 *              during normal multiuser operation.  (We force the generator up to
 *              16384 as soon as we are in normal operation.)
 *
 * The choices of 10000 and 16384 are completely arbitrary, and can be moved
 * if we run low on OIDs in either category.  Changing the macros below
 * should be sufficient to do this.
 *
 * NOTE: if the OID generator wraps around, we skip over OIDs 0-16383
 * and resume with 16384.  This minimizes the odds of OID conflict, by not
 * reassigning OIDs that might have been assigned during initdb.
 * ----------
 */
#define FirstBootstrapObjectId  10000
#define FirstNormalObjectId             16384


下面是获取OID的面向上层的接口,这里有保护作用。
例如我们前面的疑问,如何确保获取的OID不违反唯一约束。
src/backend/catalog/catalog.c

/*
 * GetNewOid
 *              Generate a new OID that is unique within the given relation.
 *
 * Caller must have a suitable lock on the relation.
 *
 * Uniqueness is promised only if the relation has a unique index on OID.
 * This is true for all system catalogs that have OIDs, but might not be
 * true for user tables.  Note that we are effectively assuming that the
 * table has a relatively small number of entries (much less than 2^32)
 * and there aren't very long runs of consecutive existing OIDs.  Again,
 * this is reasonable for system catalogs but less so for user tables.
 *
 * Since the OID is not immediately inserted into the table, there is a
 * race condition here; but a problem could occur only if someone else
 * managed to cycle through 2^32 OIDs and generate the same OID before we
 * finish inserting our row.  This seems unlikely to be a problem.  Note
 * that if we had to *commit* the row to end the race condition, the risk
 * would be rather higher; therefore we use SnapshotDirty in the test,
 * so that we will see uncommitted rows.
 */
Oid
GetNewOid(Relation relation)
{
        Oid                     oidIndex;

        /* If relation doesn't have OIDs at all, caller is confused */
        Assert(relation->rd_rel->relhasoids);

        /* In bootstrap mode, we don't have any indexes to use */
        if (IsBootstrapProcessingMode())
                return GetNewObjectId();

        /* The relcache will cache the identity of the OID index for us */
        oidIndex = RelationGetOidIndex(relation);     // 获取OID的对象的OID列的唯一索引的pg_class.oid

        /* If no OID index, just hand back the next OID counter value */
        if (!OidIsValid(oidIndex))   // 如果OID不为InvalidOid ,说明这个需要获取oid的对象的OID列上面没有唯一索引
        {
                /*
                 * System catalogs that have OIDs should *always* have a unique OID
                 * index; we should only take this path for user tables. Give a
                 * warning if it looks like somebody forgot an index.
                 */
                if (IsSystemRelation(relation))
                        elog(WARNING, "generating possibly-non-unique OID for \"%s\"",
                                 RelationGetRelationName(relation));

                return GetNewObjectId();  // 不考虑重复问题,因为OID列没有唯一索引
        }

        /* Otherwise, use the index to find a nonconflicting OID */
        return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber);  // 考虑重复
}


获取 OID的对象的OID列的唯一索引的pg_class.oid,有唯一索引,返回pg_class.oid,没有唯一索引,返回 InvalidOid 

/*
 * RelationGetOidIndex -- get the pg_class OID of the relation's OID index
 *
 * Returns InvalidOid if there is no such index.
 */
Oid
RelationGetOidIndex(Relation relation)
{
List   *ilist;

/*
* If relation doesn't have OIDs at all, caller is probably confused. (We
* could just silently return InvalidOid, but it seems better to throw an
* assertion.)
*/
Assert(relation->rd_rel->relhasoids);

if (relation->rd_indexvalid == 0)
{
/* RelationGetIndexList does the heavy lifting. */
ilist = RelationGetIndexList(relation);
list_free(ilist);
Assert(relation->rd_indexvalid != 0);
}

return relation->rd_oidindex;   // 返回OID列对应的唯一索引的pg_class.OID
}

typedef struct RelationData
{
......
Oid rd_oidindex; /* OID of unique index on OID, if any */  
......
}


对于OID列有唯一索引的表,需要调用这个函数来获取OID,确保不会获得违反唯一约束的OID。
例如pg_class.oid列有唯一索引,那么在向pg_class插入数据时,需要使用 GetNewOidWithIndex来获得next oid.

/*
 * GetNewOidWithIndex
 *              Guts of GetNewOid: use the supplied index
 *
 * This is exported separately because there are cases where we want to use
 * an index that will not be recognized by RelationGetOidIndex: TOAST tables
 * have indexes that are usable, but have multiple columns and are on
 * ordinary columns rather than a true OID column.  This code will work
 * anyway, so long as the OID is the index's first column.  The caller must
 * pass in the actual heap attnum of the OID column, however.
 *
 * Caller must have a suitable lock on the relation.
 */
Oid
GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)
{
        Oid                     newOid;
        SnapshotData SnapshotDirty;
        SysScanDesc scan;
        ScanKeyData key;
        bool            collides;

        InitDirtySnapshot(SnapshotDirty);

        /* Generate new OIDs until we find one not in the table */  // 获取一个不会违反唯一约束的OID
        do
        {
                CHECK_FOR_INTERRUPTS();

                newOid = GetNewObjectId();

                ScanKeyInit(&key,
                                        oidcolumn,
                                        BTEqualStrategyNumber, F_OIDEQ,
                                        ObjectIdGetDatum(newOid));

                /* see notes above about using SnapshotDirty */
                scan = systable_beginscan(relation, indexId, true,
                                                                  &SnapshotDirty, 1, &key);

                collides = HeapTupleIsValid(systable_getnext(scan));

                systable_endscan(scan);
        } while (collides);

        return newOid;
}


例子:

postgres=# select max(oid) from pg_class;
  max   
--------
 186766
(1 row)


从控制文件获得 NextOID =  186769

postgres@digoal-> pg_controldata 
Latest checkpoint's NextOID:          186769


创建一个包含oid的表,并且oid列上面没有建唯一索引。

postgres=# create table t1(id int) with oids;
CREATE TABLE
postgres=# insert into t1 values (1);
INSERT 186772 1
postgres=# select oid,* from t1;
  oid   | id 
--------+----
 186772 |  1
(1 row)
postgres=# insert into t1 values (1);
INSERT 186773 1
postgres=# select oid,* from t1;
  oid   | id 
--------+----
 186772 |  1
 186773 |  1
(2 rows)


修改控制文件,把nextoid改为100,小于normaloid的一个值。

postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped

postgres@digoal-> pg_resetxlog --help
pg_resetxlog resets the PostgreSQL transaction log.

Usage:
  pg_resetxlog [OPTION]... DATADIR

Options:
  -e XIDEPOCH      set next transaction ID epoch
  -f               force update to be done
  -l XLOGFILE      force minimum WAL starting location for new transaction log
  -m MXID,MXID     set next and oldest multitransaction ID
  -n               no update, just show what would be done (for testing)
  -o OID           set next OID
  -O OFFSET        set next multitransaction offset
  -V, --version    output version information, then exit
  -x XID           set next transaction ID
  -?, --help       show this help, then exit

Report bugs to <pgsql-bugs@postgresql.org>.
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset


启动数据库后,插入T1表,我们看到获取到的oid是16384即从 FirstNormalObjectId 开始的第一个值。

postgres=# insert into t1 values (1);
INSERT 16384 1
postgres=# insert into t1 values (1);
INSERT 16385 1
postgres=# insert into t1 values (1);
INSERT 16386 1


控制文件在做完检查点后会被修改。

postgres=# \q
postgres@digoal-> pg_controldata |grep OID
Latest checkpoint's NextOID:          100

postgres=# checkpoint;
CHECKPOINT

修改后的值= 24576

postgres@digoal-> pg_controldata |grep OID
Latest checkpoint's NextOID:          24576

这个值刚好等于 16384+8192,验证了前面的代码,一次获取8192个值。
正常关闭数据库的时候,这个值又会被修改到真实的值,即 16387
如果是非正常关闭,那么预分配的OID会跳掉,不过没有关系,因为OID是循环使用的,以后还有可能被用到跳过的OID。

postgres=# select 16384+8192;
 ?column? 
----------
    24576
(1 row)

再次创建一个T2表,查看pg_class.oid

postgres=# create table t2(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t2';
  oid  
-------
 16387
(1 row)

因为建表时,默认会自动创建表结构对应的复合类型以及复合类型的数组类型。

postgres=# select * from pg_type where oid=16389;
 typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput  | typoutput  | typreceive  |   typsend   | typmodin | typmodout | typan
alyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl 
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+-----------+------------+-------------+-------------+----------+-----------+------
------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
 t2      |         2200 |       10 |     -1 | f        | c       | C           | f              | t            | ,        |    16387 |       0 |    16388 | record_in | record_out | record_recv | record_send | -        | -         | -    
      | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            | 
(1 row)
postgres=# select * from pg_type where oid=16388;
 typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive |  typsend   | typmodin | typmodout |    typana
lyze    | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl 
---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+----------+-----------+------------+------------+----------+-----------+----------
--------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
 _t2     |         2200 |       10 |     -1 | f        | b       | A           | f              | t            | ,        |        0 |   16389 |        0 | array_in | array_out | array_recv | array_send | -        | -         | array_typ
analyze | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            | 
(1 row)

所以两个类型又消耗了2个OID。

再次创建T3表,你会发现这个表的OID和T2表的pg_class.oid不是连续的,原因就在此。

postgres=# create table t3(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t3';
  oid  
-------
 16390
(1 row)


再次修改控制文件的OID

postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset
postgres@digoal-> pg_ctl start
server starting
postgres@digoal->  0LOG:  00000: redirecting log output to logging collector process
 0HINT:  Future log output will appear in directory "/data03/pg_log_1921".
 0LOCATION:  SysLogger_Start, syslogger.c:645


创建表时,用了16384这个OID,这个OID在pg_class中一定是唯一的,因为pg_class.oid有唯一约束,

Indexes:
    "pg_class_oid_index" UNIQUE, btree (oid)

前面分析过代码了,PG会处理唯一约束。

postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# create table t4(id int);
CREATE TABLE
postgres=# select oid from pg_class where relname='t4';
  oid  
-------
 16384
(1 row)

在来看一个例子,就是如果OID上没有唯一约束,是可能存进去重复的OID的,如下:

postgres@digoal-> pg_ctl stop -m fast
waiting for server to shut down.... done
server stopped
postgres@digoal-> pg_resetxlog -o 100 $PGDATA
Transaction log reset
postgres@digoal-> pg_ctl start
server starting
postgres@digoal->  0LOG:  00000: redirecting log output to logging collector process
 0HINT:  Future log output will appear in directory "/data03/pg_log_1921".
 0LOCATION:  SysLogger_Start, syslogger.c:645

postgres@digoal-> psql
psql (9.4.4)
Type "help" for help.
postgres=# \d+ t1
                          Table "public.t1"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 
Has OIDs: yes
postgres=# insert into t1 values (100);
INSERT 16384 1
postgres=# select oid,* from t1;
  oid   | id  
--------+-----
 186772 |   1
 186773 |   1
  16384 |   1
  16385 |   1
  16386 |   1
  16384 | 100
(6 rows)

如果t1表的oid上面建了唯一约束,就不会存进去重复值,获取OID也不会报错,因为PG已经帮你处理了。

postgres=# \d+ t6
                          Table "public.t6"
 Column |  Type   | Modifiers | Storage | Stats target | Description 
--------+---------+-----------+---------+--------------+-------------
 id     | integer |           | plain   |              | 
Indexes:
    "i" UNIQUE, btree (oid)
Has OIDs: yes

INSERT 16397 1
postgres=# insert into t6 values (1);
INSERT 16398 1
OID 16399 被跳过.
postgres=# insert into t6 values (1);
INSERT 16400 1
postgres=# insert into t6 values (1);
INSERT 16401 1


最后小结一下,
1.   每个系统表都可以使用约2^32-1个OID(排除16384之前的OID),PostgreSQL 在获取OID时,会判断需要OID的表是否为
表的oid允许重复,
    我们在PostgreSQL中最多可以建多少个表呢?2^32-1吗?
    否。
    因为pg_type里面每个表对应了2个类型,所以最多只能创建2^31-1个表。
2.   凡是with oids的表,只要这个表的oid字段上面包含了唯一约束,PG就会帮你处理nextoid请求,确保不出现违反唯一约束的oid。但是如果oid上面没有唯一索引或唯一约束,oid就有可能重复哦。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值