Schema: Unnecessary surrogate keys
Applies to ✅ Open Source Edition ✅ Express Edition ✅ Professional Edition ✅ Enterprise Edition
The surrogate key vs natural key discussion is almost as old as SQL itself. Both approaches have their pros and cons, depending on the nature of the table you're designing. In short:
- A surrogate key (e.g. an
UUID) has no business value, and can thus be generated on any data, including not well normalised data.
- A natural key is a true placeholder for the data it represents (e.g. an ISO country code), which can help prevent joins to look up the useful information, as the useful information is already referenced in the foreign keys.
While there is a lot of debate whether a schema should be uniformly designed for consistency reasons (usually all pro surrogate key), or whether a few exceptions on a few tables are possible, there is one exception where a surrogate key is often the wrong choice: Relationship tables! In a relationship table like this:
CREATE TABLE actor ( actor_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, first_name TEXT NOT NULL, last_name TEXT NOT NULL ); CREATE TABLE film ( film_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, title TEXT NOT NULL ); CREATE TABLE film_actor ( actor_id BIGINT NOT NULL REFERENCES actor, film_id BIGINT NOT NULL REFERENCES film, PRIMARY KEY (actor_id, film_id) );
Schema designers might be tempted to add a
FILM_ACTOR_ID to the
FILM_ACTOR table, but why? There is never any need to reference a many-to-many relationship entry by its surrogate key alone. We always use the foreign keys, which, if the schema is properly normalised, should have at least a
UNIQUE constraint on them. Why not just make that the
This blog post discusses the topic of the cost of useless surrogate keys in relationship tables, from a performance perspective.