-----------------------------------------------------------------------------------------------------------
Database primary keys
Selecting a primary key
The candidate key is a column or set of columns that could be used to identify a
particular row in a table. To become a primary key, a candidate key must satisfy
the following properties:
■ Its value (for any column of the candidate key) is never null.
■ Each row has a unique value.
■ The value of a particular row never changes.
If a table has only one identifying attribute, it’s, by definition, the primary key.
However, several columns or combinations of columns may satisfy these properties
for a particular table; you choose between candidate keys to decide the best
primary key for the table. Candidate keys not chosen as the primary key should be
declared as unique keys in the database.
Many legacy SQL data models use natural primary keys. A natural key is a key
with business meaning: an attribute or combination of attributes that is unique by
virtue of its business semantics. Examples of natural keys are the U.S. Social Security
Number and Australian Tax File Number. Distinguishing natural keys is simple:
If a candidate key attribute has meaning outside the database context, it’s a
natural key, whether or not it’s automatically generated. Think about the application
users: If they refer to a key attribute when talking about and working with the
application, it’s a natural key.
Experience has shown that natural keys almost always cause problems in the
long run. A good primary key must be unique, constant, and required (never null
or unknown). Few entity attributes satisfy these requirements, and some that do
can’t be efficiently indexed by SQL databases (although this is an implementation
detail and shouldn’t be the primary motivation for or against a particular key). In addition,
you should make certain that a candidate key definition can never
change throughout the lifetime of the database before making it a primary key.
Changing the value (or even definition) of a primary key, and all foreign keys that
refer to it, is a frustrating task. Furthermore, natural candidate keys can often be
found only by combining several columns in a composite natural key. These composite
keys, although certainly appropriate for some relations (like a link table in
a many-to-many relationship), usually make maintenance, ad-hoc queries, and
schema evolution much more difficult.
For these reasons, we strongly recommend that you consider synthetic identifiers,
also called surrogate keys. Surrogate keys have no business meaning—they’re
unique values generated by the database or application. Application users ideally
don’t see or refer to these key values; they’re part of the system internals. Introducing
a surrogate key column is also appropriate in a common situation: If there
are no candidate keys, a table is by definition not a relation as defined by the relational
model—it permits duplicate rows—and so you have to add a surrogate key
column. There are a number of well-known approaches to generating surrogate
key values.