PostgreSQL OID 源码分析

最新推荐文章于 2024-06-19 13:44:25 发布

Postgresql中国用户会

最新推荐文章于 2024-06-19 13:44:25 发布

阅读量1.1k

点赞数

分类专栏：转载 PostgreSQL源码分析文章标签： kernel postgresql 代码数据库

转载同时被 2 个专栏收录

108 篇文章 0 订阅

订阅专栏

PostgreSQL源码分析

26 篇文章 0 订阅

订阅专栏

Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大，国内顶级PostgreSQL数据库专家将悉数到场，并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:

Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
Postgres-XL的项目发起人Mason Sharp
pgpool的作者石井达夫(Tatsuo Ishii)
PG-Strom的作者海外浩平(Kaigai Kohei)
Greenplum研发总监姚延栋
周正中(德哥), PostgreSQL中国用户会创始人之一
汪洋，平安科技数据库技术部经理
……

 
 2015年度PG大象会报名地址：http://postgres2015.eventdove.com/
PostgreSQL中国社区： http://postgres.cn/
PostgreSQL专业1群： 3336901（已满）
PostgreSQL专业2群： 100910388
PostgreSQL专业3群： 150657323

PostgreSQL的许多系统表都用到了OID这个数据类型，OID是一个uint类型，最大可以存储的值为2^32-1

* Object ID is a fundamental type in Postgres.

typedef unsigned int Oid;

很多系统表使用OID作为主键，PostgreSQL 使用 GetNewOid函数获取下一个可用的OID，那么怎么保证获取到的OID是可用的呢，并且确保不重复即不违反唯一约束？

例如，我们使用pg_resetxlog可以重置NEXTOid，如果被重置后，又是如何获取一个不会违反唯一约束的OID呢？

看代码最直接，获取OID的底层函数如下，第一次时，一次性获取 VAR_OID_PREFETCH = 8192个作为缓存，

access/transam/varsup.c:#define VAR_OID_PREFETCH 8192

src/backend/access/transam/varsup.c

* GetNewObjectId -- allocate a new OID

* OIDs are generated by a cluster-wide counter. Since they are only 32 bits

* wide, counter wraparound will occur eventually, and therefore it is unwise

* to assume they are unique unless precautions are taken to make them so.

* Hence, this routine should generally not be used directly. The only

* direct callers should be GetNewOid() and GetNewRelFileNode() in

* catalog/catalog.c.

Oid

GetNewObjectId(void)

{

Oid result;

/* safety check, we should never get this far in a HS slave */

if (RecoveryInProgress())

elog(ERROR, "cannot assign OIDs during recovery");

LWLockAcquire(OidGenLock, LW_EXCLUSIVE);

* Check for wraparound of the OID counter. We *must* not return 0

* (InvalidOid); and as long as we have to check that, it seems a good

* idea to skip over everything below FirstNormalObjectId too. (This

* basically just avoids lots of collisions with bootstrap-assigned OIDs

* right after a wrap occurs, so as to avoid a possibly large number of

* iterations in GetNewOid.) Note we are relying on unsigned comparison.

* During initdb, we start the OID generator at FirstBootstrapObjectId, so

* we only wrap if before that point when in bootstrap or standalone mode.

* The first time through this routine after normal postmaster start, the

* counter will be forced up to FirstNormalObjectId. This mechanism

* leaves the OIDs between FirstBootstrapObjectId and FirstNormalObjectId

* available for automatic assignment during initdb, while ensuring they

* will never conflict with user-assigned OIDs.

if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId)) // 处理非常规OID，改为常规的OID，起到保护作用

{

if (IsPostmasterEnvironment)

{

/* wraparound, or first post-initdb assignment, in normal mode */

ShmemVariableCache->nextOid = FirstNormalObjectId;

ShmemVariableCache->oidCount = 0;

}

else

{

/* we may be bootstrapping, so don't enforce the full range */

if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId))

{

/* wraparound in standalone mode (unlikely but possible) */

ShmemVariableCache->nextOid = FirstNormalObjectId;

ShmemVariableCache->oidCount = 0;

}

/* If we run out of logged for use oids then we must log more */

if (ShmemVariableCache->oidCount == 0) // 当一次获取的缓存用完后，重新获取8192个oid

{

XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH); // 一次获取多个

ShmemVariableCache->oidCount = VAR_OID_PREFETCH; // 将计数器置为8192

}

result = ShmemVariableCache->nextOid;

(ShmemVariableCache->nextOid)++; // 每次使用一个oid后nextOid加1

(ShmemVariableCache->oidCount)--; // 对应的count减1

LWLockRelease(OidGenLock);

return result;

}

16384是第一个数据库启动后，用户可以使用的OID。

src/include/access/transam.h

/* ----------

* Object ID (OID) zero is InvalidOid.

* OIDs 1-9999 are reserved for manual assignment (see the files

* in src/include/catalog/).

* OIDS 10000-16383 are reserved for assignment during initdb

* using the OID generator. (We start the generator at 10000.)

* OIDs beginning at 16384 are assigned from the OID generator

* during normal multiuser operation. (We force the generator up to

* 16384 as soon as we are in normal operation.)

* The choices of 10000 and 16384 are completely arbitrary, and can be moved

* if we run low on OIDs in either category. Changing the macros below

* should be sufficient to do this.

* NOTE: if the OID generator wraps around, we skip over OIDs 0-16383

* and resume with 16384. This minimizes the odds of OID conflict, by not

* reassigning OIDs that might have been assigned during initdb.

* ----------

#define FirstBootstrapObjectId 10000

#define FirstNormalObjectId 16384

下面是获取OID的面向上层的接口，这里有保护作用。

例如我们前面的疑问，如何确保获取的OID不违反唯一约束。

src/backend/catalog/catalog.c

* GetNewOid

* Generate a new OID that is unique within the given relation.

* Caller must have a suitable lock on the relation.

* Uniqueness is promised only if the relation has a unique index on OID.

* This is true for all system catalogs that have OIDs, but might not be

* true for user tables. Note that we are effectively assuming that the

* table has a relatively small number of entries (much less than 2^32)

* and there aren't very long runs of consecutive existing OIDs. Again,

* this is reasonable for system catalogs but less so for user tables.

* Since the OID is not immediately inserted into the table, there is a

* race condition here; but a problem could occur only if someone else

* managed to cycle through 2^32 OIDs and generate the same OID before we

* finish inserting our row. This seems unlikely to be a problem. Note

* that if we had to *commit* the row to end the race condition, the risk

* would be rather higher; therefore we use SnapshotDirty in the test,

* so that we will see uncommitted rows.

Oid

GetNewOid(Relation relation)

{

Oid oidIndex;

/* If relation doesn't have OIDs at all, caller is confused */

Assert(relation->rd_rel->relhasoids);

/* In bootstrap mode, we don't have any indexes to use */

if (IsBootstrapProcessingMode())

return GetNewObjectId();

/* The relcache will cache the identity of the OID index for us */

oidIndex = RelationGetOidIndex(relation); // 获取OID的对象的OID列的唯一索引的pg_class.oid

/* If no OID index, just hand back the next OID counter value */

if (!OidIsValid(oidIndex)) // 如果OID不为InvalidOid ，说明这个需要获取oid的对象的OID列上面没有唯一索引

{

* System catalogs that have OIDs should *always* have a unique OID

* index; we should only take this path for user tables. Give a

* warning if it looks like somebody forgot an index.

if (IsSystemRelation(relation))

elog(WARNING, "generating possibly-non-unique OID for \"%s\"",

RelationGetRelationName(relation));

return GetNewObjectId(); // 不考虑重复问题，因为OID列没有唯一索引

}

/* Otherwise, use the index to find a nonconflicting OID */

return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber); // 考虑重复

}

获取 OID的对象的OID列的唯一索引的pg_class.oid，有唯一索引，返回pg_class.oid，没有唯一索引，返回 InvalidOid 。


  
  
   
   
    
    /*
   
   
   
   
    
     * RelationGetOidIndex -- get the pg_class OID of the relation's OID index
   
   
   
   
    
     *
   
   
   
   
    
     * Returns InvalidOid if there is no such index.
   
   
   
   
    
     */
   
   
   
   
    
    Oid
   
   
   
   
    
    RelationGetOidIndex(Relation relation)
   
   
   
   
    
    {
   
   
   
   
    
    	List	   *ilist;
   
   
   
   
    
    

   
   
   
   
    
    	/*
   
   
   
   
    
    	 * If relation doesn't have OIDs at all, caller is probably confused. (We
   
   
   
   
    
    	 * could just silently return InvalidOid, but it seems better to throw an
   
   
   
   
    
    	 * assertion.)
   
   
   
   
    
    	 */
   
   
   
   
    
    	Assert(relation->rd_rel->relhasoids);
   
   
   
   
    
    

   
   
   
   
    
    	if (relation->rd_indexvalid == 0)
   
   
   
   
    
    	{
   
   
   
   
    
    		/* RelationGetIndexList does the heavy lifting. */
   
   
   
   
    
    		ilist = RelationGetIndexList(relation);
   
   
   
   
    
    		list_free(ilist);
   
   
   
   
    
    		Assert(relation->rd_indexvalid != 0);
   
   
   
   
    
    	}
   
   
   
   
    
    

   
   
   
   
    
    	return relation->rd_oidindex;   // 返回OID列对应的唯一索引的pg_class.OID
   
   
   
   
    
    }
   
   
  
  
  
  
   
   

  
  
  
  
   
   
    
    typedef struct RelationData
   
   
   
   
    
    {
   
   
  
  
  
  
   
   ......
  
  
  
  
   
   
    
    	Oid			rd_oidindex;	/* OID of unique index on OID, if any */  
   
   
  
  
  
  
   
   ......
  
  
  
  
   
   }

对于OID列有唯一索引的表，需要调用这个函数来获取OID，确保不会获得违反唯一约束的OID。

例如pg_class.oid列有唯一索引，那么在向pg_class插入数据时，需要使用 GetNewOidWithIndex来获得next oid.

* GetNewOidWithIndex

* Guts of GetNewOid: use the supplied index

* This is exported separately because there are cases where we want to use

* an index that will not be recognized by RelationGetOidIndex: TOAST tables

* have indexes that are usable, but have multiple columns and are on

* ordinary columns rather than a true OID column. This code will work

* anyway, so long as the OID is the index's first column. The caller must

* pass in the actual heap attnum of the OID column, however.

* Caller must have a suitable lock on the relation.

Oid

GetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn)

{

Oid newOid;

SnapshotData SnapshotDirty;

SysScanDesc scan;

ScanKeyData key;

bool collides;

InitDirtySnapshot(SnapshotDirty);

/* Generate new OIDs until we find one not in the table */ // 获取一个不会违反唯一约束的OID

{

CHECK_FOR_INTERRUPTS();

newOid = GetNewObjectId();

ScanKeyInit(&key,

oidcolumn,

BTEqualStrategyNumber, F_OIDEQ,

ObjectIdGetDatum(newOid));

/* see notes above about using SnapshotDirty */

scan = systable_beginscan(relation, indexId, true,

&SnapshotDirty, 1, &key);

collides = HeapTupleIsValid(systable_getnext(scan));

systable_endscan(scan);

} while (collides);

return newOid;

}

例子：


  
  
   
   postgres=# select max(oid) from pg_class;
  
  
  
  
   
     max   
  
  
  
  
   
   --------
  
  
  
  
   
    186766
  
  
  
  
   
   (1 row)

从控制文件获得 NextOID = 186769

postgres@digoal-> pg_controldata

Latest checkpoint's NextOID: 186769

创建一个包含oid的表，并且oid列上面没有建唯一索引。


  
  
   
   postgres=# create table t1(id int) with oids;
  
  
  
  
   
   CREATE TABLE
  
  
  
  
   
   postgres=# insert into t1 values (1);
  
  
  
  
   
   INSERT 186772 1
  
  
  
  
   
   postgres=# select oid,* from t1;
  
  
  
  
   
     oid   | id 
  
  
  
  
   
   --------+----
  
  
  
  
   
    186772 |  1
  
  
  
  
   
   (1 row)
  
  
  
  
   
   postgres=# insert into t1 values (1);
  
  
  
  
   
   INSERT 186773 1
  
  
  
  
   
   postgres=# select oid,* from t1;
  
  
  
  
   
     oid   | id 
  
  
  
  
   
   --------+----
  
  
  
  
   
    186772 |  1
  
  
  
  
   
    186773 |  1
  
  
  
  
   
   (2 rows)

修改控制文件，把nextoid改为100，小于normaloid的一个值。

postgres@digoal-> pg_ctl stop -m fast

waiting for server to shut down.... done

server stopped

postgres@digoal-> pg_resetxlog --help

pg_resetxlog resets the PostgreSQL transaction log.

Usage:

pg_resetxlog [OPTION]... DATADIR

Options:

-e XIDEPOCH set next transaction ID epoch

-f force update to be done

-l XLOGFILE force minimum WAL starting location for new transaction log

-m MXID,MXID set next and oldest multitransaction ID

-n no update, just show what would be done (for testing)

-o OID set next OID

-O OFFSET set next multitransaction offset

-V, --version output version information, then exit

-x XID set next transaction ID

-?, --help show this help, then exit

Report bugs to <pgsql-bugs@postgresql.org>.

postgres@digoal-> pg_resetxlog -o 100 $PGDATA

Transaction log reset

启动数据库后，插入T1表，我们看到获取到的oid是16384即从 FirstNormalObjectId 开始的第一个值。


  
  
   
   postgres=# insert into t1 values (1);
  
  
  
  
   
   INSERT 16384 1
  
  
  
  
   
   postgres=# insert into t1 values (1);
  
  
  
  
   
   INSERT 16385 1
  
  
  
  
   
   postgres=# insert into t1 values (1);
  
  
  
  
   
   INSERT 16386 1

控制文件在做完检查点后会被修改。

postgres=# \q

postgres@digoal-> pg_controldata |grep OID

Latest checkpoint's NextOID: 100

postgres=# checkpoint;

CHECKPOINT

修改后的值= 24576

postgres@digoal-> pg_controldata |grep OID

Latest checkpoint's NextOID: 24576

这个值刚好等于 16384+8192，验证了前面的代码，一次获取8192个值。

正常关闭数据库的时候，这个值又会被修改到真实的值，即 16387 。

如果是非正常关闭，那么预分配的OID会跳掉，不过没有关系，因为OID是循环使用的，以后还有可能被用到跳过的OID。

postgres=# select 16384+8192;

?column?

----------

24576

(1 row)

再次创建一个T2表，查看pg_class.oid


  
  
   
   postgres=# create table t2(id int);
  
  
  
  
   
   
    
    CREATE TABLE
   
   
   
   
    
    postgres=# select oid from pg_class where relname='t2';
   
   
   
   
    
      oid  
   
   
   
   
    
    -------
   
   
   
   
    
     16387
   
   
   
   
    
    (1 row)

因为建表时，默认会自动创建表结构对应的复合类型以及复合类型的数组类型。


  
  
   
   postgres=# select * from pg_type where oid=16389;
  
  
  
  
   
    typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput  | typoutput  | typreceive  |   typsend   | typmodin | typmodout | typan
  
  
  
  
   
   alyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl 
  
  
  
  
   
   ---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+-----------+------------+-------------+-------------+----------+-----------+------
  
  
  
  
   
   ------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
  
  
  
  
   
    t2      |         2200 |       10 |     -1 | f        | c       | C           | f              | t            | ,        |    16387 |       0 |    16388 | record_in | record_out | record_recv | record_send | -        | -         | -    
  
  
  
  
   
         | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            | 
  
  
  
  
   
   (1 row)
  
  
  
  
   
   postgres=# select * from pg_type where oid=16388;
  
  
  
  
   
    typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive |  typsend   | typmodin | typmodout |    typana
  
  
  
  
   
   lyze    | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl 
  
  
  
  
   
   ---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+----------+-----------+------------+------------+----------+-----------+----------
  
  
  
  
   
   --------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------
  
  
  
  
   
    _t2     |         2200 |       10 |     -1 | f        | b       | A           | f              | t            | ,        |        0 |   16389 |        0 | array_in | array_out | array_recv | array_send | -        | -         | array_typ
  
  
  
  
   
   analyze | d        | x          | f          |           0 |        -1 |        0 |            0 |               |            | 
  
  
  
  
   
   (1 row)

所以两个类型又消耗了2个OID。

再次创建T3表，你会发现这个表的OID和T2表的pg_class.oid不是连续的，原因就在此。


  
  
   
   postgres=# create table t3(id int);
  
  
  
  
   
   CREATE TABLE
  
  
  
  
   
   postgres=# select oid from pg_class where relname='t3';
  
  
  
  
   
     oid  
  
  
  
  
   
   -------
  
  
  
  
   
    16390
  
  
  
  
   
   (1 row)

再次修改控制文件的OID

postgres@digoal-> pg_ctl stop -m fast

waiting for server to shut down.... done

server stopped

postgres@digoal-> pg_resetxlog -o 100 $PGDATA

Transaction log reset

postgres@digoal-> pg_ctl start

server starting

postgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process

0HINT: Future log output will appear in directory "/data03/pg_log_1921".

0LOCATION: SysLogger_Start, syslogger.c:645

创建表时，用了16384这个OID，这个OID在pg_class中一定是唯一的，因为pg_class.oid有唯一约束，

Indexes:

"pg_class_oid_index" UNIQUE, btree (oid)

前面分析过代码了，PG会处理唯一约束。


   
   
    
    postgres@digoal-> psql
   
   
   
   
    
    psql (9.4.4)
   
   
   
   
    
    Type "help" for help.
   
   
   
   
    
    postgres=# create table t4(id int);
   
   
   
   
    
    CREATE TABLE
   
   
   
   
    
    postgres=# select oid from pg_class where relname='t4';
   
   
   
   
    
      oid  
   
   
   
   
    
    -------
   
   
   
   
    
     16384
   
   
   
   
    
    (1 row)

在来看一个例子，就是如果OID上没有唯一约束，是可能存进去重复的OID的，如下：

postgres@digoal-> pg_ctl stop -m fast

waiting for server to shut down.... done

server stopped

postgres@digoal-> pg_resetxlog -o 100 $PGDATA

Transaction log reset

postgres@digoal-> pg_ctl start

server starting

postgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process

0HINT: Future log output will appear in directory "/data03/pg_log_1921".

0LOCATION: SysLogger_Start, syslogger.c:645

postgres@digoal-> psql

psql (9.4.4)

Type "help" for help.

postgres=# \d+ t1

Table "public.t1"

--------+---------+-----------+---------+--------------+-------------

Has OIDs: yes

postgres=# insert into t1 values (100);

INSERT 16384 1

postgres=# select oid,* from t1;

oid | id

--------+-----

186772 | 1

186773 | 1

16384 | 1

16385 | 1

16386 | 1

16384 | 100

(6 rows)

如果t1表的oid上面建了唯一约束，就不会存进去重复值，获取OID也不会报错，因为PG已经帮你处理了。


    
    
     
     
      
      
       
       postgres=# \d+ t6
      
      
      
      
       
                                 Table "public.t6"
      
      
      
      
       
        Column |  Type   | Modifiers | Storage | Stats target | Description 
      
      
      
      
       
       --------+---------+-----------+---------+--------------+-------------
      
      
      
      
       
        id     | integer |           | plain   |              | 
      
      
      
      
       
       Indexes:
      
      
      
      
       
           "i" UNIQUE, btree (oid)
      
      
      
      
       
       Has OIDs: yes
      
      
     
     
     
     
      
      

     
     
     
     
      
      INSERT 16397 1
     
     
     
     
      
      postgres=# insert into t6 values (1);
     
     
     
     
      
      INSERT 16398 1
     
     
     
     
      
      OID 16399 被跳过.
     
     
     
     
      
      postgres=# insert into t6 values (1);
     
     
     
     
      
      INSERT 16400 1
     
     
     
     
      
      postgres=# insert into t6 values (1);
     
     
     
     
      
      INSERT 16401 1