Postgres2015全国用户大会将于11月20至21日在北京丽亭华苑酒店召开。本次大会嘉宾阵容强大,国内顶级PostgreSQL数据库专家将悉数到场,并特邀欧洲、俄罗斯、日本、美国等国家和地区的数据库方面专家助阵:
- Postgres-XC项目的发起人铃木市一(SUZUKI Koichi)
- Postgres-XL的项目发起人Mason Sharp
- pgpool的作者石井达夫(Tatsuo Ishii)
- PG-Strom的作者海外浩平(Kaigai Kohei)
- Greenplum研发总监姚延栋
- 周正中(德哥), PostgreSQL中国用户会创始人之一
- 汪洋,平安科技数据库技术部经理
- ……
|
|
PostgreSQL的许多系统表都用到了OID这个数据类型,OID是一个uint类型,最大可以存储的值为2^32-1
/** Object ID is a fundamental type in Postgres.*/typedef unsigned int Oid;
很多系统表使用OID作为主键,PostgreSQL 使用
GetNewOid函数获取下一个可用的OID,
那么怎么保证获取到的OID是可用的呢,并且确保不重复即不违反唯一约束?
例如,我们使用pg_resetxlog可以重置NEXTOid,如果被重置后,又是如何获取一个不会违反唯一约束的OID呢?
看代码最直接,获取OID的底层函数如下,第一次时,一次性获取
VAR_OID_PREFETCH = 8192个作为缓存,
access/transam/varsup.c:#define VAR_OID_PREFETCH 8192
src/backend/access/transam/varsup.c
/** GetNewObjectId -- allocate a new OID** OIDs are generated by a cluster-wide counter. Since they are only 32 bits* wide, counter wraparound will occur eventually, and therefore it is unwise* to assume they are unique unless precautions are taken to make them so.* Hence, this routine should generally not be used directly. The only* direct callers should be GetNewOid() and GetNewRelFileNode() in* catalog/catalog.c.*/OidGetNewObjectId(void){Oid result;
/* safety check, we should never get this far in a HS slave */if (RecoveryInProgress())elog(ERROR, "cannot assign OIDs during recovery");
LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
/** Check for wraparound of the OID counter. We *must* not return 0* (InvalidOid); and as long as we have to check that, it seems a good* idea to skip over everything below FirstNormalObjectId too. (This* basically just avoids lots of collisions with bootstrap-assigned OIDs* right after a wrap occurs, so as to avoid a possibly large number of* iterations in GetNewOid.) Note we are relying on unsigned comparison.** During initdb, we start the OID generator at FirstBootstrapObjectId, so* we only wrap if before that point when in bootstrap or standalone mode.* The first time through this routine after normal postmaster start, the* counter will be forced up to FirstNormalObjectId. This mechanism* leaves the OIDs between FirstBootstrapObjectId and FirstNormalObjectId* available for automatic assignment during initdb, while ensuring they* will never conflict with user-assigned OIDs.*/if (ShmemVariableCache->nextOid < ((Oid) FirstNormalObjectId)) // 处理非常规OID,改为常规的OID,起到保护作用{if (IsPostmasterEnvironment){/* wraparound, or first post-initdb assignment, in normal mode */ShmemVariableCache->nextOid = FirstNormalObjectId;ShmemVariableCache->oidCount = 0;}else{/* we may be bootstrapping, so don't enforce the full range */if (ShmemVariableCache->nextOid < ((Oid) FirstBootstrapObjectId)){/* wraparound in standalone mode (unlikely but possible) */ShmemVariableCache->nextOid = FirstNormalObjectId;ShmemVariableCache->oidCount = 0;}}}
/* If we run out of logged for use oids then we must log more */if (ShmemVariableCache->oidCount == 0) // 当一次获取的缓存用完后,重新获取8192个oid{XLogPutNextOid(ShmemVariableCache->nextOid + VAR_OID_PREFETCH); // 一次获取多个ShmemVariableCache->oidCount = VAR_OID_PREFETCH; // 将计数器置为8192}
result = ShmemVariableCache->nextOid;
(ShmemVariableCache->nextOid)++; // 每次使用一个oid后nextOid加1(ShmemVariableCache->oidCount)--; // 对应的count减1
LWLockRelease(OidGenLock);
return result;}
16384是第一个数据库启动后,用户可以使用的OID。
src/include/access/transam.h
/* ----------* Object ID (OID) zero is InvalidOid.** OIDs 1-9999 are reserved for manual assignment (see the files* in src/include/catalog/).** OIDS 10000-16383 are reserved for assignment during initdb* using the OID generator. (We start the generator at 10000.)** OIDs beginning at 16384 are assigned from the OID generator* during normal multiuser operation. (We force the generator up to* 16384 as soon as we are in normal operation.)** The choices of 10000 and 16384 are completely arbitrary, and can be moved* if we run low on OIDs in either category. Changing the macros below* should be sufficient to do this.** NOTE: if the OID generator wraps around, we skip over OIDs 0-16383* and resume with 16384. This minimizes the odds of OID conflict, by not* reassigning OIDs that might have been assigned during initdb.* ----------*/#define FirstBootstrapObjectId 10000#define FirstNormalObjectId 16384
下面是获取OID的面向上层的接口,这里有保护作用。
例如我们前面的疑问,如何确保获取的OID不违反唯一约束。
src/backend/catalog/catalog.c
/** GetNewOid* Generate a new OID that is unique within the given relation.** Caller must have a suitable lock on the relation.** Uniqueness is promised only if the relation has a unique index on OID.* This is true for all system catalogs that have OIDs, but might not be* true for user tables. Note that we are effectively assuming that the* table has a relatively small number of entries (much less than 2^32)* and there aren't very long runs of consecutive existing OIDs. Again,* this is reasonable for system catalogs but less so for user tables.** Since the OID is not immediately inserted into the table, there is a* race condition here; but a problem could occur only if someone else* managed to cycle through 2^32 OIDs and generate the same OID before we* finish inserting our row. This seems unlikely to be a problem. Note* that if we had to *commit* the row to end the race condition, the risk* would be rather higher; therefore we use SnapshotDirty in the test,* so that we will see uncommitted rows.*/OidGetNewOid(Relation relation){Oid oidIndex;
/* If relation doesn't have OIDs at all, caller is confused */Assert(relation->rd_rel->relhasoids);
/* In bootstrap mode, we don't have any indexes to use */if (IsBootstrapProcessingMode())return GetNewObjectId();
/* The relcache will cache the identity of the OID index for us */oidIndex = RelationGetOidIndex(relation); // 获取OID的对象的OID列的唯一索引的pg_class.oid
/* If no OID index, just hand back the next OID counter value */if (!OidIsValid(oidIndex)) // 如果OID不为InvalidOid ,说明这个需要获取oid的对象的OID列上面没有唯一索引{/** System catalogs that have OIDs should *always* have a unique OID* index; we should only take this path for user tables. Give a* warning if it looks like somebody forgot an index.*/if (IsSystemRelation(relation))elog(WARNING, "generating possibly-non-unique OID for \"%s\"",RelationGetRelationName(relation));
return GetNewObjectId(); // 不考虑重复问题,因为OID列没有唯一索引}
/* Otherwise, use the index to find a nonconflicting OID */return GetNewOidWithIndex(relation, oidIndex, ObjectIdAttributeNumber); // 考虑重复}
获取
OID的对象的OID列的唯一索引的pg_class.oid,有唯一索引,返回pg_class.oid,没有唯一索引,返回
InvalidOid
。
/** RelationGetOidIndex -- get the pg_class OID of the relation's OID index** Returns InvalidOid if there is no such index.*/OidRelationGetOidIndex(Relation relation){List *ilist;
/** If relation doesn't have OIDs at all, caller is probably confused. (We* could just silently return InvalidOid, but it seems better to throw an* assertion.)*/Assert(relation->rd_rel->relhasoids);
if (relation->rd_indexvalid == 0){/* RelationGetIndexList does the heavy lifting. */ilist = RelationGetIndexList(relation);list_free(ilist);Assert(relation->rd_indexvalid != 0);}
return relation->rd_oidindex; // 返回OID列对应的唯一索引的pg_class.OID}
typedef struct RelationData{......Oid rd_oidindex; /* OID of unique index on OID, if any */......}
对于OID列有唯一索引的表,需要调用这个函数来获取OID,确保不会获得违反唯一约束的OID。
例如pg_class.oid列有唯一索引,那么在向pg_class插入数据时,需要使用
GetNewOidWithIndex来获得next oid.
/** GetNewOidWithIndex* Guts of GetNewOid: use the supplied index** This is exported separately because there are cases where we want to use* an index that will not be recognized by RelationGetOidIndex: TOAST tables* have indexes that are usable, but have multiple columns and are on* ordinary columns rather than a true OID column. This code will work* anyway, so long as the OID is the index's first column. The caller must* pass in the actual heap attnum of the OID column, however.** Caller must have a suitable lock on the relation.*/OidGetNewOidWithIndex(Relation relation, Oid indexId, AttrNumber oidcolumn){Oid newOid;SnapshotData SnapshotDirty;SysScanDesc scan;ScanKeyData key;bool collides;
InitDirtySnapshot(SnapshotDirty);
/* Generate new OIDs until we find one not in the table */ // 获取一个不会违反唯一约束的OIDdo{CHECK_FOR_INTERRUPTS();
newOid = GetNewObjectId();
ScanKeyInit(&key,oidcolumn,BTEqualStrategyNumber, F_OIDEQ,ObjectIdGetDatum(newOid));
/* see notes above about using SnapshotDirty */scan = systable_beginscan(relation, indexId, true,&SnapshotDirty, 1, &key);
collides = HeapTupleIsValid(systable_getnext(scan));
systable_endscan(scan);} while (collides);
return newOid;}
例子:
postgres=# select max(oid) from pg_class;max--------186766(1 row)
从控制文件获得 NextOID =
186769
postgres@digoal-> pg_controldataLatest checkpoint's NextOID: 186769
创建一个包含oid的表,并且oid列上面没有建唯一索引。
postgres=# create table t1(id int) with oids;CREATE TABLEpostgres=# insert into t1 values (1);INSERT 186772 1postgres=# select oid,* from t1;oid | id--------+----186772 | 1(1 row)postgres=# insert into t1 values (1);INSERT 186773 1postgres=# select oid,* from t1;oid | id--------+----186772 | 1186773 | 1(2 rows)
修改控制文件,把nextoid改为100,小于normaloid的一个值。
postgres@digoal-> pg_ctl stop -m fastwaiting for server to shut down.... doneserver stopped
postgres@digoal-> pg_resetxlog --helppg_resetxlog resets the PostgreSQL transaction log.
Usage:pg_resetxlog [OPTION]... DATADIR
Options:-e XIDEPOCH set next transaction ID epoch-f force update to be done-l XLOGFILE force minimum WAL starting location for new transaction log-m MXID,MXID set next and oldest multitransaction ID-n no update, just show what would be done (for testing)-o OID set next OID-O OFFSET set next multitransaction offset-V, --version output version information, then exit-x XID set next transaction ID-?, --help show this help, then exit
Report bugs to <pgsql-bugs@postgresql.org>.postgres@digoal-> pg_resetxlog -o 100 $PGDATATransaction log reset
启动数据库后,插入T1表,我们看到获取到的oid是16384即从
FirstNormalObjectId 开始的第一个值。
postgres=# insert into t1 values (1);INSERT 16384 1postgres=# insert into t1 values (1);INSERT 16385 1postgres=# insert into t1 values (1);INSERT 16386 1
控制文件在做完检查点后会被修改。
postgres=# \qpostgres@digoal-> pg_controldata |grep OIDLatest checkpoint's NextOID: 100
postgres=# checkpoint;CHECKPOINT
修改后的值=
24576
postgres@digoal-> pg_controldata |grep OIDLatest checkpoint's NextOID: 24576
这个值刚好等于
16384+8192,验证了前面的代码,一次获取8192个值。
正常关闭数据库的时候,这个值又会被修改到真实的值,即
16387
。
如果是非正常关闭,那么预分配的OID会跳掉,不过没有关系,因为OID是循环使用的,以后还有可能被用到跳过的OID。
postgres=# select 16384+8192;?column?----------24576(1 row)
再次创建一个T2表,查看pg_class.oid
postgres=# create table t2(id int);CREATE TABLEpostgres=# select oid from pg_class where relname='t2';oid-------16387(1 row)
因为建表时,默认会自动创建表结构对应的复合类型以及复合类型的数组类型。
postgres=# select * from pg_type where oid=16389;typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+-----------+------------+-------------+-------------+----------+-----------+------------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------t2 | 2200 | 10 | -1 | f | c | C | f | t | , | 16387 | 0 | 16388 | record_in | record_out | record_recv | record_send | - | - | -| d | x | f | 0 | -1 | 0 | 0 | | |(1 row)postgres=# select * from pg_type where oid=16388;typname | typnamespace | typowner | typlen | typbyval | typtype | typcategory | typispreferred | typisdefined | typdelim | typrelid | typelem | typarray | typinput | typoutput | typreceive | typsend | typmodin | typmodout | typanalyze | typalign | typstorage | typnotnull | typbasetype | typtypmod | typndims | typcollation | typdefaultbin | typdefault | typacl---------+--------------+----------+--------+----------+---------+-------------+----------------+--------------+----------+----------+---------+----------+----------+-----------+------------+------------+----------+-----------+------------------+----------+------------+------------+-------------+-----------+----------+--------------+---------------+------------+--------_t2 | 2200 | 10 | -1 | f | b | A | f | t | , | 0 | 16389 | 0 | array_in | array_out | array_recv | array_send | - | - | array_typanalyze | d | x | f | 0 | -1 | 0 | 0 | | |(1 row)
所以两个类型又消耗了2个OID。
再次创建T3表,你会发现这个表的OID和T2表的pg_class.oid不是连续的,原因就在此。
postgres=# create table t3(id int);CREATE TABLEpostgres=# select oid from pg_class where relname='t3';oid-------16390(1 row)
再次修改控制文件的OID
postgres@digoal-> pg_ctl stop -m fastwaiting for server to shut down.... doneserver stoppedpostgres@digoal-> pg_resetxlog -o 100 $PGDATATransaction log resetpostgres@digoal-> pg_ctl startserver startingpostgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process0HINT: Future log output will appear in directory "/data03/pg_log_1921".0LOCATION: SysLogger_Start, syslogger.c:645
创建表时,用了16384这个OID,这个OID在pg_class中一定是唯一的,因为pg_class.oid有唯一约束,
Indexes:"pg_class_oid_index" UNIQUE, btree (oid)
前面分析过代码了,PG会处理唯一约束。
postgres@digoal-> psqlpsql (9.4.4)Type "help" for help.postgres=# create table t4(id int);CREATE TABLEpostgres=# select oid from pg_class where relname='t4';oid-------16384(1 row)
在来看一个例子,就是如果OID上没有唯一约束,是可能存进去重复的OID的,如下:
postgres@digoal-> pg_ctl stop -m fastwaiting for server to shut down.... doneserver stoppedpostgres@digoal-> pg_resetxlog -o 100 $PGDATATransaction log resetpostgres@digoal-> pg_ctl startserver startingpostgres@digoal-> 0LOG: 00000: redirecting log output to logging collector process0HINT: Future log output will appear in directory "/data03/pg_log_1921".0LOCATION: SysLogger_Start, syslogger.c:645
postgres@digoal-> psqlpsql (9.4.4)Type "help" for help.postgres=# \d+ t1Table "public.t1"Column | Type | Modifiers | Storage | Stats target | Description--------+---------+-----------+---------+--------------+-------------id | integer | | plain | |Has OIDs: yespostgres=# insert into t1 values (100);INSERT 16384 1postgres=# select oid,* from t1;oid | id--------+-----186772 | 1186773 | 116384 | 116385 | 116386 | 116384 | 100(6 rows)
如果t1表的oid上面建了唯一约束,就不会存进去重复值,获取OID也不会报错,因为PG已经帮你处理了。
postgres=# \d+ t6Table "public.t6"Column | Type | Modifiers | Storage | Stats target | Description--------+---------+-----------+---------+--------------+-------------id | integer | | plain | |Indexes:"i" UNIQUE, btree (oid)Has OIDs: yes
INSERT 16397 1postgres=# insert into t6 values (1);INSERT 16398 1OID 16399 被跳过.postgres=# insert into t6 values (1);INSERT 16400 1postgres=# insert into t6 values (1);INSERT 16401 1
最后小结一下,
1. 每个系统表都可以使用约2^32-1个OID(排除16384之前的OID),PostgreSQL 在获取OID时,会判断需要OID的表是否为
表的oid允许重复,
我们在PostgreSQL中最多可以建多少个表呢?2^32-1吗?
否。
因为pg_type里面每个表对应了2个类型,所以最多只能创建2^31-1个表。
2. 凡是with oids的表,只要这个表的oid字段上面包含了唯一约束,PG就会帮你处理nextoid请求,确保不出现违反唯一约束的oid。但是如果oid上面没有唯一索引或唯一约束,oid就有可能重复哦。