Is an email address a bad candidate for a primary key compared to an auto-incrementing number?

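Our web application requires email addresses to be unique in the system, so I thought of using the email address as the primary key. However, my colleague argues that string comparison will be slower than integer comparison.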

Is this a valid reason not to use the email address as the primary key?

We are using PostgreSQL.


Current answer

Disadvantages of using an email address as the primary key:

1. Slower when doing joins.
2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)
3. An email address could change, which forces all records using it as a foreign key to be updated. As email addresses don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code yourself, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.
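Here is a minimal sketch of the "on update cascade" case from point 3. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL, which also supports `ON UPDATE CASCADE` on foreign keys. The `users`/`orders` tables and the addresses are hypothetical. It shows that a single `UPDATE` on the parent row causes the database itself to rewrite every referencing row.

```python
import sqlite3


def demo_cascade() -> list[str]:
    """Rename a user's email and return the foreign-key values left in orders."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only with this pragma (PostgreSQL always does).
    conn.execute("PRAGMA foreign_keys = ON")
    # Hypothetical schema: the email address is the natural primary key.
    conn.execute("CREATE TABLE users (email TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        """CREATE TABLE orders (
               id INTEGER PRIMARY KEY,
               user_email TEXT NOT NULL
                   REFERENCES users(email) ON UPDATE CASCADE
           )"""
    )
    conn.execute("INSERT INTO users VALUES ('jsmith@example.com', 'John Smith')")
    conn.executemany(
        "INSERT INTO orders (user_email) VALUES (?)",
        [("jsmith@example.com",), ("jsmith@example.com",)],
    )
    # One UPDATE on the parent; the engine propagates it to every child row.
    conn.execute(
        "UPDATE users SET email = 'john.smith@example.com' "
        "WHERE email = 'jsmith@example.com'"
    )
    rows = [r[0] for r in conn.execute("SELECT user_email FROM orders ORDER BY id")]
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_cascade())
```

Without the `ON UPDATE CASCADE` clause, the same `UPDATE` would be rejected while child rows still reference the old address, and the application code would have to perform the updates itself.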

Advantages of using an email address as the primary key:

1. You may be able to eliminate some joins entirely. If all you need from the "master record" is the email address, an abstract integer key forces you to do a join to retrieve it. If the key is the email address, you already have it and the join is unnecessary. Whether this helps depends on how often this situation comes up.
2. When you are doing ad hoc queries, it's easy for a human to see which master record is being referenced. This can be a big help when tracking down data problems.
3. You will almost certainly need an index on the email address anyway, so making it the primary key eliminates one index. That improves insert performance, because there is now only one index to update instead of two.
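Point 1 can be illustrated with a small sketch, again using `sqlite3` as a stand-in for PostgreSQL and hypothetical `*_nat` (natural key) and `*_sur` (surrogate key) tables holding the same data. Both queries list each order with its email. Only the surrogate-key version needs a join to get it.

```python
import sqlite3


def compare_queries():
    """Return (natural_key_rows, surrogate_key_rows) for the same logical data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Natural key: each order stores the user's email directly.
        CREATE TABLE users_nat  (email TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE orders_nat (id INTEGER PRIMARY KEY,
                                 user_email TEXT REFERENCES users_nat(email),
                                 total INTEGER);
        -- Surrogate key: each order stores an integer user id.
        CREATE TABLE users_sur  (id INTEGER PRIMARY KEY, email TEXT UNIQUE, name TEXT);
        CREATE TABLE orders_sur (id INTEGER PRIMARY KEY,
                                 user_id INTEGER REFERENCES users_sur(id),
                                 total INTEGER);
        INSERT INTO users_nat VALUES ('a@example.com', 'Ann'), ('b@example.com', 'Bob');
        INSERT INTO orders_nat (user_email, total)
            VALUES ('a@example.com', 10), ('b@example.com', 20), ('a@example.com', 30);
        INSERT INTO users_sur VALUES (1, 'a@example.com', 'Ann'), (2, 'b@example.com', 'Bob');
        INSERT INTO orders_sur (user_id, total) VALUES (1, 10), (2, 20), (1, 30);
    """)
    # The email is already in the child row: no join needed.
    natural = conn.execute(
        "SELECT user_email, total FROM orders_nat ORDER BY id"
    ).fetchall()
    # Only an integer is in the child row: a join is needed to fetch the email.
    surrogate = conn.execute(
        """SELECT u.email, o.total
             FROM orders_sur AS o
             JOIN users_sur AS u ON u.id = o.user_id
            ORDER BY o.id"""
    ).fetchall()
    conn.close()
    return natural, surrogate
```

Both result sets are identical. The difference is only in how much work the query has to do to produce them.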

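In my opinion, neither choice is a clear-cut winner. I tend to use natural keys when a practical one exists, because they are easier to work with, and in most cases the disadvantages don't matter much.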

Other answers

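Personally, when designing a database I never use meaningful information as the primary key, because I may well need to change any piece of information later. The only reason I provide a primary key at all is that it makes most SQL operations from the client convenient, and my choice has always been an auto-incrementing integer type.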

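I would also point out that email is a poor choice even for a unique field: some people, and even small businesses, share an email address. Like phone numbers, email addresses can also be reused. Jsmith@somecompany.com could easily belong to John Smith one year and to Julia Smith two years later.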

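Another problem with email addresses is that they change frequently. If you use this key to join to other tables, you have to update those other tables as well, and when an entire client company changes its email addresses, this can have a considerable performance impact (I have seen this happen).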
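As a sketch of the alternative this answer implies, here is what happens with an integer surrogate key. It uses `sqlite3` in place of PostgreSQL, and the `oldcorp.example`/`newcorp.example` domains and table names are hypothetical. When a whole company switches domains, only the `users` rows change and the referencing `orders` rows are untouched.

```python
import sqlite3


def rename_domain_demo():
    """Return (users updated, orders unchanged?, final emails)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users  (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE);
        CREATE TABLE orders (id INTEGER PRIMARY KEY,
                             user_id INTEGER NOT NULL REFERENCES users(id));
        INSERT INTO users (id, email) VALUES
            (1, 'ann@oldcorp.example'), (2, 'bob@oldcorp.example'),
            (3, 'cy@oldcorp.example'),  (4, 'dee@elsewhere.example');
        INSERT INTO orders (user_id) VALUES (1), (1), (2), (3), (4);
    """)
    before = conn.execute("SELECT id, user_id FROM orders ORDER BY id").fetchall()
    # The company changes domains: a single-table UPDATE on users.
    cur = conn.execute(
        "UPDATE users SET email = replace(email, '@oldcorp.example', '@newcorp.example') "
        "WHERE email LIKE '%@oldcorp.example'"
    )
    users_changed = cur.rowcount
    # Child rows only hold integer ids, so they need no rewrite.
    after = conn.execute("SELECT id, user_id FROM orders ORDER BY id").fetchall()
    emails = [r[0] for r in conn.execute("SELECT email FROM users ORDER BY id")]
    conn.close()
    return users_changed, before == after, emails
```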

Don't use the email address as the primary key. Keep the email unique, but use a user ID or username as the primary key.

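String comparison is slower than integer comparison. However, if you only use the email address to retrieve a user from the database, this doesn't matter. It does matter if you have complex queries with multiple joins.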

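If you store information about users in multiple tables, the foreign key to the users table will be the email address. This means you will store the email address many times over.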
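A rough back-of-the-envelope sketch of that duplication follows. The 25-byte average email length and the child-table row counts are hypothetical. The 8-byte key width matches PostgreSQL's `bigint` (`integer` is 4 bytes). The estimate ignores per-value headers, alignment padding, and the extra copies kept in indexes on those foreign-key columns, so treat it as an order-of-magnitude comparison only.

```python
def fk_payload_bytes(child_rows: list[int], avg_email_len: int,
                     key_width: int = 8) -> tuple[int, int]:
    """Approximate raw bytes spent on user foreign keys across child tables.

    Returns (bytes if the key is the email, bytes if the key is an integer).
    """
    total_rows = sum(child_rows)
    return total_rows * avg_email_len, total_rows * key_width


if __name__ == "__main__":
    # Hypothetical: three child tables referencing users.
    print(fk_payload_bytes([1_000_000, 500_000, 250_000], avg_email_len=25))
```

For those hypothetical numbers, the email-keyed design stores about 43.75 MB of key data versus 14 MB for `bigint` keys.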

You should use an integer primary key. If you need the email column to be unique, why not simply put a unique index on that column?
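A minimal sketch of that suggestion follows, using `sqlite3` as a stand-in for PostgreSQL. In PostgreSQL the key column would typically be a `serial` or `GENERATED ... AS IDENTITY` column. The table, index name, and `add_user` helper are hypothetical. The integer key identifies the row, while a separate unique index rejects duplicate emails.

```python
import sqlite3
from typing import Optional


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Surrogate key: SQLite assigns INTEGER PRIMARY KEY values automatically.
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    # Email uniqueness is enforced separately from the primary key.
    conn.execute("CREATE UNIQUE INDEX users_email_key ON users (email)")
    return conn


def add_user(conn: sqlite3.Connection, email: str) -> Optional[int]:
    """Insert a user; return the new id, or None if the email is already taken."""
    try:
        cur = conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Raised when the unique index on email is violated.
        return None
```

Usage: on a fresh database, `add_user(conn, "a@example.com")` returns `1`, a second distinct address returns `2`, and repeating `"a@example.com"` returns `None`. Because the email is not the key, changing it later touches only this one row.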