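Is an email address a bad candidate for a primary key compared with an auto-incrementing number?

Our web application requires email addresses to be unique within the system, so I thought of using the email address as the primary key. However, my colleague argues that string comparison will be slower than integer comparison.

Is that a valid reason not to use the email address as the primary key?

We are using PostgreSQL.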


Current answer

Yes, you are better off using an integer instead. You can also put a unique constraint on the email column.

Like this:

CREATE TABLE myTable(
    id integer primary key,
    email text UNIQUE
);
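
Since the question mentions auto-incrementing numbers, here is a minimal sketch of the same idea with PostgreSQL generating the id itself. It assumes PostgreSQL 10 or later (for identity columns); the table name `users` and the sample address are placeholders, not anything from the original schema.

```sql
-- Sketch only: assumes PostgreSQL 10+ (identity columns).
-- "users" and the sample address are hypothetical.
CREATE TABLE users (
    id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text NOT NULL UNIQUE
);

-- The id is assigned automatically.
INSERT INTO users (email) VALUES ('alice@example.com');

-- Inserts zero rows: the UNIQUE constraint on email detects the duplicate.
INSERT INTO users (email) VALUES ('alice@example.com')
ON CONFLICT (email) DO NOTHING;
```

Other tables would then reference `users (id)`, and the email stays an ordinary column that can change freely.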

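Other answers

You may need to consider any data regulations that apply. An email address is personal information, and if, for example, your users are in the EU, then under GDPR they can instruct you to delete their information from your records (bear in mind that this applies regardless of which country you are in).

If the record itself has to stay in the database for historical reasons such as referential integrity or auditing, a surrogate key lets you set all of the personal-data fields to NULL. That is obviously much harder if their personal data is the primary key.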
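
A rough sketch of what that erasure could look like with a surrogate key. The `customers` and `invoices` tables and their columns are made up for illustration, and whether blanking fields like this is enough for a given regulation is not something the sketch settles.

```sql
-- Hypothetical schema: personal data lives only in customers,
-- and invoices reference the surrogate id, never the email.
CREATE TABLE customers (
    id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email text UNIQUE,   -- nullable on purpose, so it can be erased later
    name  text
);

CREATE TABLE invoices (
    id          integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id integer NOT NULL REFERENCES customers (id),
    total       numeric(10, 2) NOT NULL
);

INSERT INTO customers (email, name) VALUES ('bob@example.com', 'Bob');

INSERT INTO invoices (customer_id, total)
SELECT id, 19.99 FROM customers WHERE email = 'bob@example.com';

-- Erasure request: blank the personal fields, keep the row and its invoices.
UPDATE customers
SET email = NULL, name = NULL
WHERE email = 'bob@example.com';
```

The invoice still points at a valid `customers` row, so referential integrity holds. If `email` were the primary key, it could not be set to NULL at all.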

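Yes, it is a bad primary key, because your users will want to update their email addresses.

Disadvantages of using an email address as the primary key: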

1. Slower when doing joins.
2. Any other record with a posted foreign key now has a larger value, taking up more disk space. (Given the cost of disk space today, this is probably a trivial issue, except to the extent that the record now takes longer to read. See #1.)
3. An email address could change, which forces all records using this as a foreign key to be updated. As email addresses don't change all that often, the performance problem is probably minor. The bigger problem is that you have to make sure to provide for it. If you have to write the code, this is more work and introduces the possibility of bugs. If your database engine supports "on update cascade", it's a minor issue.
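
For what it's worth, PostgreSQL does support "on update cascade". A small sketch of how a changed address propagates when the email is the key (the `accounts` and `login_events` tables are hypothetical):

```sql
-- Hypothetical tables: the email itself is the primary key here.
CREATE TABLE accounts (
    email text PRIMARY KEY
);

CREATE TABLE login_events (
    id            integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    account_email text NOT NULL
        REFERENCES accounts (email) ON UPDATE CASCADE,
    logged_in_at  timestamptz NOT NULL DEFAULT now()
);

INSERT INTO accounts (email) VALUES ('carol@old.example');
INSERT INTO login_events (account_email) VALUES ('carol@old.example');

-- Changing the key also rewrites every referencing row in login_events.
UPDATE accounts
SET email = 'carol@new.example'
WHERE email = 'carol@old.example';

-- The referencing row now carries the new address.
SELECT account_email FROM login_events;
```

Without `ON UPDATE CASCADE`, the `UPDATE` would fail with a foreign-key violation as long as any `login_events` row still referenced the old address.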

Advantages of using an email address as the primary key:

1. You may be able to completely eliminate some joins. If all you need from the "master record" is the email address, then with an abstract integer key you would have to do a join to retrieve it. If the key is the email address, then you already have it and the join is unnecessary. Whether this helps you any depends on how often this situation comes up.
2. When you are doing ad hoc queries, it's easy for a human being to see what master record is being referenced. This can be a big help when trying to track down data problems.
3. You almost certainly will need an index on the email address anyway, so making it the primary key eliminates one index, thus improving the performance of inserts as they now have only one index to update instead of two.
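
To make the join and index points concrete, here is a side-by-side sketch. All table and column names are invented for the comparison; the `_nk` tables use the natural key and the `_sk` tables use a surrogate key.

```sql
-- Natural key: the child row already holds the email.
CREATE TABLE members_nk (
    email text PRIMARY KEY              -- one index on this table
);

CREATE TABLE posts_nk (
    id           integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    member_email text NOT NULL REFERENCES members_nk (email),
    title        text NOT NULL
);

-- No join needed to show who wrote each post.
SELECT title, member_email
FROM posts_nk;

-- Surrogate key: the email has to be fetched through a join.
CREATE TABLE members_sk (
    id    integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,  -- index 1
    email text NOT NULL UNIQUE                               -- index 2
);

CREATE TABLE posts_sk (
    id        integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    member_id integer NOT NULL REFERENCES members_sk (id),
    title     text NOT NULL
);

SELECT p.title, m.email
FROM posts_sk AS p
JOIN members_sk AS m ON m.id = p.member_id;
```

PostgreSQL backs both `PRIMARY KEY` and `UNIQUE` with an index, which is why `members_sk` maintains two indexes on insert while `members_nk` maintains one.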

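In my opinion, it's not a slam dunk either way. I tend to prefer natural keys when a practical one is available, because they are just easier to work with, and in most cases the disadvantages don't matter much.

If you use a non-integer value as the primary key, inserts and lookups on large data sets will be very slow.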