Blog ala Vidar

SQL, AppFrame and other cool technologies

Primary key != Clustered Index

Leave a comment Posted by Vidar Nordnes on July 10, 2009

There’s been a lot of confusion about this team. What’s the difference between a primary key, unique key, clustered index etc.
For starters, notice the keywords: KEY and INDEX
A key means that it’s used for identifying something. Primary key is the primary identifier for the ROW. A Primary Key MUST be unique, and it can not have NULL-able fields.
A unique key also ensures uniqueness, but can be defined on NULL-able fields. Although, only one NULL pr field (unlike Oracle).
So, what is the primary (and unique) key used for? Making relations between tables. To create a foreign key you first need to create a key on the parent table.
When you create a primary key, it defaults to a clustered index. This doesn’t mean that this is the best way though!
Let’s say we’ve got a Person table:
PersonID NVARCHAR(10)
Name NVARCHAR(200)
Phone VARCHAR(50)
Address ….

and so on.. PersonID will be "NordVida" (first 4 chars in last name and first name). It’s clear that PersonID should be the Primary Key, because it identifies the row.
Putting the Primary Key on this field ensures uniqueness. But, should this be a clustered index? NO!
A clustered index should be ever increasing. What does that mean? It should start on 1, next should be 2, or start with "a" and next should be "b".
PersonID will never be ever increasing, because you add Trygve Haaland (HaalTryg) which is before NordVida, then you add Werner Waage (WaagWern) which is after Trygve etc.
If you had clustered index on PersonID, this means that it has to rearrange the hole "table" (clustered index) EVERY time you insert a record.
Since the clustered indexed-fields are included in all non-clustered indexes, all non-clustered indexes also will be rearranged EVERY TIME you insert a record.

So, what should be the clustered index on this table? Many use an integer field with identity. This is a good solution, as long as you’re not importing/exporting data.
Then the identity column is just a big PAIN IN THE ***. Personally I prefer having a uniqueidentifier field with NEWSEQUENTIALID() as the default. This is unique, ever increasing and VERY nice to deal with when importing/exporting data.

Lets sum up:
My opinion is that Primary keys should be set where you’ve got your "logical keys" (in this case, PersonID). The uniqueidentifier field should be set as clustered index.