Primary Keys: Identity vs. Guids
There always seems to be a great debate with regard to what data type will be used as the primary key for tables. The debate is mostly centered on whatever we should use identity fields vs Guids.
Guids has the advantage that they are globally unique (duh!), so they are a good solution for the cases where you need to merge data from several sources. Identity fields are easier to understand (Quick, what is the difference between 8833075D-8861-4324-8814-421BD5F04C7D and AD7FF558-DFCB-4354-9D68-C1DD926BC22A) and to debug since you can easily see the difference between concecutive rows. They can cause problems in merge scenarios, since it is entirely all too possible that the identity columns from different machines will be identical.
Another, bigger, problem with Guids is that they are random by nature, which mean that using them as a clustered primary key has distinct disadvantages in high insert rate scenarios. The problem is that two concecutive rows will be put in two different pages, assuming an even distribution of the Guids, in high insert scenarios the I/O generated by using it can become the most significant cost of the whole operation. SQL Server will constantly load pages to memory and discard them when it run out of room in the cache.
In SQL Server 2005 there is a better solution for this, although I have not heard anyone talks about it. Instead of using Guids, you can use Sequencial Guids. What does this mean? SQL Server 2005 has a new functions that you can use for creating Guids, called newsequentialid(). This function can generate sequencial (per machine) guids. You get vastly improved performance for high insert rate scenarios, since all your inserts are likely to go into a small set of pages amd you can safely merge between multiply machines, since the sequencial Guids are still globally unique.
Here are a couple that I generated: 13D3AFD4-CDCA-DA11-9389-000C6EFB7322 and 14D3AFD4-CDCA-DA11-9389-000C6EFB7322
One thing to note, though: Don't use Sequencial Guids as a session id, since then an attacker knows what will be the next id generated (obviously) and try to use it.
Comments
Comment preview