What are the best practices for primary keys in tables?

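When designing tables, I have developed a habit of having one column that is unique and making it the primary key. I do this in one of three ways, depending on requirements: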

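1. An identity integer column that auto-increments.
2. A unique identifier (GUID).
3. A short character(x) or integer (or other relatively small numeric type) column that can serve as a row identifier column.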

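Number 3 would be used for fairly small, mostly-read lookup tables that might have a unique fixed-length string code, or a numeric value such as a year or some other number.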

For the most part, all other tables have either an auto-incrementing integer or a unique identifier as the primary key.
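As an illustration of the three key styles listed above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customer, document, currency), the column names, and the sample rows are all hypothetical, and SQLite is only a stand-in for whatever database is actually in use.

```python
import sqlite3
import uuid


def build_key_style_demo():
    """Create one table per key style and insert a sample row into each."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Style 1: auto-incrementing identity integer
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY AUTOINCREMENT,
            name        TEXT NOT NULL
        );
        -- Style 2: GUID, stored as text here because SQLite has no GUID type
        CREATE TABLE document (
            document_id TEXT PRIMARY KEY,
            title       TEXT NOT NULL
        );
        -- Style 3: short code for a small, mostly-read lookup table
        -- (SQLite does not enforce the declared length of CHAR(3))
        CREATE TABLE currency (
            code CHAR(3) PRIMARY KEY,
            name TEXT NOT NULL
        );
    """)
    # The database assigns the identity value
    cur = conn.execute("INSERT INTO customer (name) VALUES (?)", ("Alice",))
    customer_id = cur.lastrowid
    # The application generates the GUID
    document_id = str(uuid.uuid4())
    conn.execute("INSERT INTO document VALUES (?, ?)", (document_id, "Spec"))
    # The short code is the data itself
    conn.execute("INSERT INTO currency VALUES (?, ?)", ("USD", "US dollar"))
    return conn, customer_id, document_id
```

In this sketch the identity value comes from the database, the GUID comes from the application, and the lookup code is meaningful on its own.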

The question :-)

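I have recently started working with databases that have no consistent row identifier, and whose primary keys are currently clustered across various columns. Some examples: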

- datetime / character
- datetime / integer
- datetime / varchar
- char / nvarchar / nvarchar

Is there a valid case for this? I would always have defined an identity or unique identifier column for these cases.

In addition, there are many tables with no primary key at all. What are the valid reasons for this, if any?

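I am trying to understand why the tables were designed the way they were. It looks like a big mess to me, but maybe there were good reasons for it.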

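A third question, to help me decipher the answers: in cases where multiple columns make up a compound primary key, is there a specific advantage to this approach over a surrogate/artificial key? I am thinking mostly of performance, maintenance, administration, and so on.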

Current answer

A table should always have a primary key. If it does not have one, it should be given an auto-increment field.

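Sometimes people omit the primary key because they are transferring a lot of data and the key might slow the process down (depending on the database). Even so, it should be added afterwards.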

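Someone commented about link tables. That is right, they are an exception, but their fields should still be foreign keys to preserve integrity, and in some cases those fields can also form the primary key if duplicate links are not allowed. To keep things simple, though, since exceptions come up often in programming, a primary key should be present to preserve the integrity of your data.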

2008-12-03 15:33:49

Other answers

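If there is a natural key, it is usually the best choice. So if datetime/char uniquely identifies the row and both parts are meaningful to the row, that is great.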

If only the datetime is meaningful, and the char is just tacked on to make it unique, then you might as well use an identity field.

2008-12-03 15:34:08

I always use an autonumber or identity field.

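I worked for a client who had used the SSN as a primary key and then, because of HIPAA regulations, was forced to change to a "MemberID". This caused a ton of problems when updating the foreign keys in related tables. Sticking to a consistent standard of an identity column has helped me avoid similar problems in all of my projects.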
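A minimal sketch of that point, assuming SQLite through Python's sqlite3 module. The member and claim tables, their columns, and the sample identifiers are hypothetical and are not the client's actual schema. The externally visible identifier sits in a UNIQUE column, while child rows reference only the identity column.

```python
import sqlite3


def rename_external_identifier():
    """Swap an SSN-style value for a member number without touching child rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript("""
        CREATE TABLE member (
            member_pk   INTEGER PRIMARY KEY,   -- surrogate identity key
            external_id TEXT NOT NULL UNIQUE   -- SSN at first, later a member number
        );
        CREATE TABLE claim (
            claim_id  INTEGER PRIMARY KEY,
            member_pk INTEGER NOT NULL REFERENCES member(member_pk),
            amount    REAL NOT NULL
        );
        INSERT INTO member VALUES (1, '123-45-6789');
        INSERT INTO claim  VALUES (10, 1, 250.0);
    """)
    # Only the single member row changes; the claim row still points at member_pk = 1
    conn.execute(
        "UPDATE member SET external_id = ? WHERE member_pk = ?",
        ("M-000001", 1),
    )
    return conn.execute("""
        SELECT m.external_id, c.amount
        FROM claim AS c
        JOIN member AS m ON m.member_pk = c.member_pk
        WHERE c.claim_id = 10
    """).fetchone()
```

Had the SSN itself been the primary key, every claim row would have carried a copy of it and would have needed updating too.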

2008-12-03 15:53:28

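This is just an extra comment on something that is often overlooked. Sometimes not using a single surrogate key as the primary key has benefits for child tables. Say we have a design that lets you run multiple companies within one database (maybe it is a hosted solution, or whatever).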

Let's say we have these tables and columns:

Company:
  CompanyId   (primary key)

CostCentre:
  CompanyId   (primary key, foreign key to Company)
  CostCentre  (primary key)

CostElement:
  CompanyId   (primary key, foreign key to Company)
  CostElement (primary key)

Invoice:
  InvoiceId    (primary key)
  CompanyId    (primary key, in foreign key to CostCentre, in foreign key to CostElement)
  CostCentre   (in foreign key to CostCentre)
  CostElement  (in foreign key to CostElement)

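In case that last bit does not make sense: Invoice.CompanyId is part of two foreign keys, one to the CostCentre table and one to the CostElement table. The primary key is (InvoiceId, CompanyId).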

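In this model, it is not possible to screw up and reference a CostElement from one company and a CostCentre from another. If a single surrogate key were used as the primary key on the CostElement and CostCentre tables, without these composite foreign key relationships in the Invoice table, it would be possible.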

The fewer chances to screw up, the better.
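The model above can be sketched in runnable form as follows, assuming SQLite through Python's sqlite3 module. The column types, the helper names (build_demo, try_insert_invoice), and the sample values 'SALES', 'TRAVEL', and 'RENT' are assumptions made for illustration. Only the table names, column names, and key layout come from the listing above.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Company (
    CompanyId INTEGER PRIMARY KEY
);
CREATE TABLE CostCentre (
    CompanyId  INTEGER NOT NULL REFERENCES Company(CompanyId),
    CostCentre TEXT    NOT NULL,
    PRIMARY KEY (CompanyId, CostCentre)
);
CREATE TABLE CostElement (
    CompanyId   INTEGER NOT NULL REFERENCES Company(CompanyId),
    CostElement TEXT    NOT NULL,
    PRIMARY KEY (CompanyId, CostElement)
);
CREATE TABLE Invoice (
    InvoiceId   INTEGER NOT NULL,
    CompanyId   INTEGER NOT NULL,
    CostCentre  TEXT    NOT NULL,
    CostElement TEXT    NOT NULL,
    PRIMARY KEY (InvoiceId, CompanyId),
    -- CompanyId takes part in both composite foreign keys
    FOREIGN KEY (CompanyId, CostCentre)
        REFERENCES CostCentre (CompanyId, CostCentre),
    FOREIGN KEY (CompanyId, CostElement)
        REFERENCES CostElement (CompanyId, CostElement)
);
"""


def build_demo():
    """Create the schema with two companies and a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
    conn.executescript(SCHEMA)
    with conn:
        conn.executemany("INSERT INTO Company VALUES (?)", [(1,), (2,)])
        conn.execute("INSERT INTO CostCentre VALUES (1, 'SALES')")
        conn.execute("INSERT INTO CostElement VALUES (1, 'TRAVEL')")
        conn.execute("INSERT INTO CostElement VALUES (2, 'RENT')")
    return conn


def try_insert_invoice(conn, invoice_id, company_id, cost_centre, cost_element):
    """Return True if the invoice is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Invoice VALUES (?, ?, ?, ?)",
                (invoice_id, company_id, cost_centre, cost_element),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this data, an invoice for company 1 using 'SALES' and 'TRAVEL' is accepted. An invoice for company 1 that names 'RENT' is rejected, because that cost element exists only for company 2.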

2008-12-05 10:38:47

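If you really want to read through all of the back and forth on this age-old debate, search for "natural key" on Stack Overflow. You should get back pages of results.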

2008-12-03 16:34:54

What is special about the primary key?

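What is the purpose of a table in a schema? What is the purpose of a key of a table? What is special about the primary key? The discussions around primary keys seem to miss the point that the primary key is part of a table, and that table is part of a schema. What is best for the table and the table relationships should drive the choice of key.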

Tables (and table relationships) contain facts about information you wish to record. These facts should be self-contained, meaningful, easily understood, and non-contradictory. From a design perspective, other tables added or removed from a schema should not impact on the table in question. There must be a purpose for storing the data related only to the information itself. Understanding what is stored in a table should not require undergoing a scientific research project. No fact stored for the same purpose should be stored more than once. Keys are a whole or part of the information being recorded which is unique, and the primary key is the specially designated key that is to be the primary access point to the table (i.e. it should be chosen for data consistency and usage, not just insert performance).

ASIDE: The unfortunate side effect of most databases being designed and developed by application programmers (which I am sometimes) is that what is best for the application or application framework often drives the primary key choice for tables. This leads to integer and GUID keys (as these are simple to use for application frameworks) and monolithic table designs (as these reduce the number of application framework objects needed to represent the data in memory). These application-driven database design decisions lead to significant data consistency problems when used at scale. Application frameworks designed in this manner naturally lead to table-at-a-time designs. “Partial records” are created in tables and data is filled in over time. Multi-table interaction is avoided or, when used, causes inconsistent data when the application functions improperly. These designs lead to data that is meaningless (or difficult to understand), data spread over tables (you have to look at other tables to make sense of the current table), and duplicated data.

It was said that primary keys should be as small as necessary. I would say that keys should be only as large as necessary. Randomly adding meaningless fields to a table should be avoided. It is even worse to make a key out of a randomly added meaningless field, especially when it destroys the join dependency from another table to the non-primary key. This is only reasonable if there are no good candidate keys in the table, but this occurrence is surely a sign of a poor schema design if used for all tables.

It was also said that primary keys should never change as updating a primary key should always be out of the question. But update is the same as delete followed by insert. By this logic, you should never delete a record from a table with one key and then add another record with a second key. Adding the surrogate primary key does not remove the fact that the other key in the table exists. Updating a non-primary key of a table can destroy the meaning of the data if other tables have a dependency on that meaning through a surrogate key (e.g. a status table with a surrogate key having the status description changed from ‘Processed’ to ‘Cancelled’ would definitely corrupt the data). What should always be out of the question is destroying data meaning.
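The status-table example can be sketched as follows, assuming SQLite through Python's sqlite3 module. The status and orders tables, their columns, and the order number 500 are hypothetical names chosen for illustration.

```python
import sqlite3


def relabel_status():
    """Show how editing a description behind a surrogate key changes old facts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE status (
            status_id   INTEGER PRIMARY KEY,   -- surrogate key
            description TEXT NOT NULL UNIQUE   -- the meaningful (natural) key
        );
        CREATE TABLE orders (
            order_id  INTEGER PRIMARY KEY,
            status_id INTEGER NOT NULL REFERENCES status(status_id)
        );
        INSERT INTO status VALUES (1, 'Processed');
        INSERT INTO orders VALUES (500, 1);
    """)
    query = """
        SELECT s.description
        FROM orders AS o
        JOIN status AS s ON s.status_id = o.status_id
        WHERE o.order_id = 500
    """
    before = conn.execute(query).fetchone()[0]
    # No primary key is updated, yet the meaning of order 500 changes
    conn.execute("UPDATE status SET description = 'Cancelled' WHERE status_id = 1")
    after = conn.execute(query).fetchone()[0]
    return before, after
```

The orders row is never touched, but it reads as 'Processed' before the update and 'Cancelled' after it, which is the kind of silent change in meaning described above.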

Having said this, I am grateful for the many poorly designed databases that exist in businesses today (meaningless-surrogate-keyed-data-corrupted-1NF behemoths), because that means there is an endless amount of work for people that understand proper database design. But on the sad side, it does sometimes make me feel like Sisyphus, but I bet he had one heck of a 401k (before the crash). Stay away from blogs and websites for important database design questions. If you are designing databases, look up CJ Date. You can also reference Celko for SQL Server, but only if you hold your nose first. On the Oracle side, reference Tom Kyte.

2013-01-03 18:57:14
