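I'm not very familiar with databases and how they work. From a performance standpoint (inserts/updates/queries), is using a string as the primary key slower than using an integer?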


Current answer

Technically yes, but if a string makes sense as the primary key then you should probably use it. It all depends on the size of the table and the length of the string that will be the primary key (longer strings == harder to compare). I wouldn't necessarily use a string for a table that has millions of rows, but on smaller tables the slowdown from using a string will be minuscule compared to the headaches you can have from an integer that doesn't mean anything in relation to the data.

Other answers

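Another problem with using a string as the primary key is that, because the index is constantly kept in sorted order, the index has to be re-sorted whenever a new key is created... If you use an auto-number integer, the new key is simply added to the end of the index.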

Inserting into a table with a clustered index, where the insertion lands in the middle of the sequence, DOES NOT cause the index to be rewritten. It does not cause the pages comprising the data to be rewritten either. If there is room on the page where the row belongs, the row is placed in that page, and only that single page is reformatted to put the row in the right position. When the page is full, a page split happens: half of the rows on the page go to one page and half go to the other. The pages are then relinked into the linked list of pages that make up the table's data under the clustered index. At most, you will end up writing two database pages.
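The page-split behaviour described above can be sketched with a toy model. This is only an illustration under stated assumptions, not how any particular database engine is implemented: `PAGE_CAPACITY` and `insert_key` are hypothetical names, each page is modelled as a small sorted Python list, and the table is assumed to have at least one page.

```python
import bisect

PAGE_CAPACITY = 4  # hypothetical; real pages hold far more rows


def insert_key(pages, key):
    """Insert key into a list of sorted pages; return how many pages were written."""
    # Pick the last page whose first key is <= the new key (default: first page).
    idx = 0
    for i, page in enumerate(pages):
        if page and page[0] <= key:
            idx = i
    page = pages[idx]
    if len(page) < PAGE_CAPACITY:
        # Room on the page: only this one page is rewritten.
        bisect.insort(page, key)
        return 1
    # Page is full: add the key, then split the rows across two pages.
    bisect.insort(page, key)
    mid = len(page) // 2
    pages[idx:idx + 1] = [page[:mid], page[mid:]]
    return 2


pages = [[10, 20, 30, 40], [50, 60, 70, 80]]
writes_with_split = insert_key(pages, 25)  # lands in the middle of a full page
writes_with_room = insert_key(pages, 35)   # lands in a page that has room
```

In this sketch the page holding 50 to 80 is never touched by either insert, which is the point being made: a mid-sequence insert rewrites one page, or two when a split is needed, not the whole index.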

Strings are slower in joins, and in real life they are very rarely truly unique (even when they are supposed to be). The only advantage is that they can reduce the number of joins if you are joining to the primary table only to get the name. However, strings are also often subject to change, which creates the problem of having to fix all related records when the company name changes or the person gets married. This can be a huge performance hit, and if the tables that should be related are not all actually related (this happens more often than you think), you might end up with data mismatches as well. An integer that will never change through the life of the record is a far safer choice from a data integrity standpoint as well as from a performance standpoint. Natural keys are usually not so good for maintenance of the data.

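I'd also point out that the best of both approaches is often to use an auto-incrementing key (or, in some special cases, a GUID) as the PK and then put a unique index on the natural key. You get faster joins, you don't get duplicate records, and you don't have to update a million child records because a company name changed.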
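A minimal sketch of that combination, using Python's built-in `sqlite3` module with an in-memory database. The `company` and `invoice` tables, their columns, and the index name `ux_company_name` are made up for illustration; other database systems use different syntax for auto-incrementing keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE company (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, never changes
    name TEXT NOT NULL                       -- natural key, may change
);
CREATE UNIQUE INDEX ux_company_name ON company (name);
CREATE TABLE invoice (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    company_id INTEGER NOT NULL REFERENCES company (id),
    amount     REAL NOT NULL
);
""")

company_id = conn.execute(
    "INSERT INTO company (name) VALUES (?)", ("Acme Ltd",)
).lastrowid
conn.executemany(
    "INSERT INTO invoice (company_id, amount) VALUES (?, ?)",
    [(company_id, 100.0), (company_id, 250.0)],
)

# The unique index on the natural key rejects a duplicate company name.
try:
    conn.execute("INSERT INTO company (name) VALUES (?)", ("Acme Ltd",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Renaming the company touches one row; the child rows keep pointing at the id.
renamed_rows = conn.execute(
    "UPDATE company SET name = ? WHERE id = ?", ("Acme Corp", company_id)
).rowcount

rows = conn.execute(
    "SELECT c.name, i.amount FROM invoice AS i "
    "JOIN company AS c ON c.id = i.company_id ORDER BY i.id"
).fetchall()
```

Had `invoice` referenced the company by name instead, the rename would have had to update every invoice row as well.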

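Why would you use a string as the primary key?

I would just make the primary key an auto-incrementing integer field and put an index on the string field.

That way, searches on the table should still be relatively fast, and none of your joins and normal lookups will take a speed hit.

You can also control how much of the string field gets indexed. In other words, you can say "only index the first 5 characters" if you think that is enough. Or, if your data can be relatively similar, you can index the whole field.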
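How you index only the first few characters depends on the database. MySQL, for example, accepts a prefix length in the index definition. The sketch below uses Python's built-in `sqlite3` module instead, which has no prefix-length syntax, so it approximates the idea with an index on the expression `substr(name, 1, 5)`. The `customer` table and the index name `ix_customer_name_prefix` are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL
);
-- Index only the first 5 characters of the name (an expression index).
CREATE INDEX ix_customer_name_prefix ON customer (substr(name, 1, 5));
""")
conn.executemany(
    "INSERT INTO customer (name) VALUES (?)",
    [("Anderson",), ("Andersson",), ("Baker",)],
)

# The query has to use the same expression as the index to benefit from it.
matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM customer WHERE substr(name, 1, 5) = ? ORDER BY name",
        ("Ander",),
    )
]
```

A short prefix keeps the index small, but as the two "Ander..." names show, similar values collapse onto the same index entry, which is why indexing the whole field can be the better choice for data like that.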