Francais | English | Espanõl

Surrogate key

From Wikipedia, the free encyclopedia

Jump to: navigation, search

Contents

[edit] History

The concept of a surrogate was first proposed in a conference paper by Hall, Owlett and Todd and the proceedings published in Nijjsen (1976). The purpose of a surrogate is to uniquely identify any entity in the modelled world. Furthermore, it is a number or string that has a unique value throughout the entire database system. It can only be generated by the database system and cannot be changed by the user or application program; it must not even be visible by them. The surrogate concept was influential in the development of the Object identifier, which is similar but not identical. See Wieringa and de Jonge (1991) for an analysis of the differences.

An important distinction exists between a surrogate and a database primary key, depending on whether the database is a current database or a temporal database. A current database stores only currently valid data, therefore there is a one-to-one association between a surrogate in the modelled world and an primary key of some object in the database. However, in a temporal database there is a many-to-one association between primary keys and a surrogate; several objects in the database correspond to a single entity in the modelled world.

From here on we shall use the terms surrogate and surrogate key interchangeably.

[edit] Surrogates in practice

In a current database, the surrogate key is the primary key, generated by the database management system and not derived from any application data in the database. The only significance of the surrogate key is to act as the primary key.

A surrogate key is frequently a sequential number (e.g. a Sybase "identity column" or an Oracle SEQUENCE) but doesn't have to be. Having the key independent of all other columns insulates the database relationships from changes in data values or database design (making your database more agile) and guarantees uniqueness.

In a temporal database, it is necessary to distinguish between the surrogate key and the primary key. Typically, every row would have both a primary key and a surrogate key. The primary key identifies the unique row in the database, the surrogate key identifies the unique entity in the modelled world; these two keys are not the same. For example, table Staff may contain two rows for "John Smith", one row when he was employed between 1990 to 1999, another row when he was employed between 2001 and 2006. The surrogate key is identical (non-unique) in both rows however the primary key will be unique.

Some database designers use surrogate keys religiously regardless of the suitability of other candidate keys, while others will use a key already present in the data, if there is one.

"Surrogate key" may also be called "System-generated key", "Database Sequence number", "Synthetic key", "Technical key" or an "Arbitrary, unique identifier" in practice, though some of these terms describe the way of generating new surrogate key values rather than the surrogate nature of the key, and are thus technically not synonyms.

Some other options for surrogate keys are:

[edit] Advantages of Surrogate Keys

Immutability: generally surrogate keys do not change while the row exists. This has two advantages:

  • Database applications won't lose their "handle" on the row because the data changes;
  • Many database systems do not have good support for cascading updates of keys across foreign key relations. This results in difficulty in modifying the key data when natural keys are used.

Performance: often surrogate keys are composed of some compact data type, such as four-byte integers. In theory, this allows the database system to perform operations and index this value much faster than it could multiple columns or large data types. In many cases, however, this is premature optimization as no real overall performance is gained.

Compatibility: several database application development systems, drivers, and object-relational mapping systems, such as Ruby on Rails or Hibernate, depend on the use of integer or GUID surrogate keys in order to support database-system-agnostic operation and object-to-row mapping.

[edit] Disadvantages of Surrogate Keys

Disassociation: because the surrogate key is completely unrelated to the data of the row to which it is attached, it is possible for the key to become disassociated with that row, or confused with a surrogate key from another row. This can result in persistent data loss bugs in database applications which are extremely hard to trace. Another problem with disassociated keys is that SQL optimisation can be very difficult because there is no semantic data in the indexes to help the optimiser make sensible judgements.

Normalization: in poor database designs, the presence of a surrogate key can cause the developer or database administrator to forget to establish, or accidentally remove, the natural key of the table. This results in a situation where duplicate rows in the table are impossible to identify, and garbage gets introduced into the database.

Business Process Modelling: since surrogate keys are not contextually meaningful, substantial flaws can occur when initiallly mapping business requirements to a data schema. This is different than a natural key that intrinically incorporates one or more business rules.

This article was originally based on material from the Free On-line Dictionary of Computing, which is licensed under the GFDL.

[edit] See also

[edit] References

Nijssen, G.M. (1976). Modelling in Data Base Management Systems. North-Holland Pub. Co.. ISBN 0720404592.

Carter, Breck. Intelligent Versus Surrogate Keys. Retrieved on 2006-12-03.

Berkus, Josh. Database Soup: Primary Keyvil, Part I. Retrieved on 2006-12-03.


Topics in database management systems (DBMS)

view  talk  edit</span>  )

Concepts
Database | Database model | Relational database | Relational model | Relational algebra | Primary key - Foreign key - Surrogate key - Superkey
Database normalization | Referential integrity | Relational DBMS | Distributed DBMS | ACID

Objects
Trigger | View | Table | Cursor | Log | Transaction | Index | Stored procedure | Partition

Topics in SQL
Select | Insert | Update | Merge | Delete | Join | Union | Create | Drop
Comparison of syntax

Implementations of database management systems

Types of implementations
Relational | Flat file | Deductive | Dimensional | Hierarchical | Object oriented | Temporal

Products
Caché | db4o | dBASE | Firebird | Helix database | DB2 | Informix | Ingres | InterBase | Microsoft SQL Server | MySQL | OpenLink Virtuoso | Oracle | PostgreSQL | SQLite | Sybase IQ | Sybase | Teradata | Visual FoxPro | Comparison - relational | Comparison - object-relational

Components
Query language | Query optimizer | Query plan | ODBC | JDBC
Lists
List of object-oriented database management systems
List of relational database management systems


de:Surrogatschlüssel ru:Суррогатный ключ

Personal tools