What is the difference between utf8_general_ci and utf8_unicode_ci?

Is there any difference between utf8_general_ci and utf8_unicode_ci in terms of performance?

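Current answer

I wanted to know how large the performance difference between utf8_general_ci and utf8_unicode_ci is, but I could not find any benchmarks on the internet, so I decided to create my own.

I created a very simple table with 500,000 rows: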

CREATE TABLE test(
  ID INT(11) DEFAULT NULL,
  Description VARCHAR(20) DEFAULT NULL
)
ENGINE = INNODB
CHARACTER SET utf8
COLLATE utf8_general_ci;

Then I filled it with random data by running this stored procedure:

CREATE PROCEDURE randomizer()
BEGIN
  DECLARE i INT DEFAULT 0;
  DECLARE random CHAR(20) ;
  theloop: loop
    SET random = CONV(FLOOR(RAND() * 99999999999999), 20, 36);
    INSERT INTO test VALUES (i+1, random);
    SET i=i+1;
    IF i = 500000 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END

Then I created the following stored procedures to benchmark a simple SELECT, a SELECT with LIKE, and sorting (a SELECT with ORDER BY):

CREATE PROCEDURE benchmark_simple_select()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: loop
    SELECT *
    FROM test
    WHERE Description = 'test' COLLATE utf8_general_ci;
    SET i = i + 1;
    IF i = 30 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;

CREATE PROCEDURE benchmark_select_like()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: loop
    SELECT *
    FROM test
    WHERE Description LIKE '%test' COLLATE utf8_general_ci;
    SET i = i + 1;
    IF i = 30 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;

CREATE PROCEDURE benchmark_order_by()
BEGIN
  DECLARE i INT DEFAULT 0;
  theloop: loop
    SELECT *
    FROM test
    WHERE ID > FLOOR(1 + RAND() * (400000 - 1))
    ORDER BY Description COLLATE utf8_general_ci LIMIT 1000;
    SET i = i + 1;
    IF i = 10 THEN
      LEAVE theloop;
    END IF;
  END LOOP theloop;
END;

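The stored procedures above use the utf8_general_ci collation, but of course during the tests I used both utf8_general_ci and utf8_unicode_ci.

I called each stored procedure 5 times for each collation (5 times for utf8_general_ci and 5 times for utf8_unicode_ci) and then calculated the average values.

My results are: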

benchmark_simple_select()

with utf8_general_ci: 9,957 ms
with utf8_unicode_ci: 10,271 ms

In this benchmark, using utf8_unicode_ci is 3.2% slower than using utf8_general_ci.

benchmark_select_like()

with utf8_general_ci: 11,441 ms
with utf8_unicode_ci: 12,811 ms

In this benchmark, using utf8_unicode_ci is 12% slower than using utf8_general_ci.

benchmark_order_by()

with utf8_general_ci: 11,944 ms
with utf8_unicode_ci: 12,887 ms

In this benchmark, using utf8_unicode_ci is 7.9% slower than using utf8_general_ci.
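The percentages above can be recomputed from the average timings. The following is a small sketch in Python rather than SQL; the names `timings_ms` and `slowdown_percent` are made up for this illustration, and the only data used are the six averages quoted in the results.

```python
# Average timings in milliseconds, copied from the results above:
# (utf8_general_ci, utf8_unicode_ci) for each benchmark procedure.
timings_ms = {
    "benchmark_simple_select": (9957, 10271),
    "benchmark_select_like": (11441, 12811),
    "benchmark_order_by": (11944, 12887),
}


def slowdown_percent(general_ms, unicode_ms):
    """Return how much slower the unicode_ci run is, relative to general_ci."""
    return (unicode_ms - general_ms) / general_ms * 100


for name, (general_ms, unicode_ms) in timings_ms.items():
    print(f"{name}: {slowdown_percent(general_ms, unicode_ms):.1f}% slower")
```

The slowdown is measured relative to the utf8_general_ci time, which is why the same absolute gap would give a slightly different percentage if utf8_unicode_ci were used as the baseline.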

2013-03-02 02:53:57

Other answers

See the MySQL manual, Unicode Character Sets section:

For any Unicode character set, operations performed using the _general_ci collation are faster than those for the _unicode_ci collation. For example, comparisons for the utf8_general_ci collation are faster, but slightly less correct, than comparisons for utf8_unicode_ci. The reason for this is that utf8_unicode_ci supports mappings such as expansions; that is, when one character compares as equal to combinations of other characters. For example, in German and some other languages “ß” is equal to “ss”. utf8_unicode_ci also supports contractions and ignorable characters. utf8_general_ci is a legacy collation that does not support expansions, contractions, or ignorable characters. It can make only one-to-one comparisons between characters.

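So, to sum up, utf8_general_ci uses a smaller and less correct (according to the standard) set of comparisons than utf8_unicode_ci, which should implement the entire standard. The general_ci collation will be faster because there is less computation to do.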
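To make the difference between one-to-one comparisons and expansions concrete, here is a toy model in Python. It is not how MySQL implements either collation: the two tables `ONE_TO_ONE` and `EXPANSIONS` and both helper functions are invented for this sketch and cover only the "ß" case from the quote. Mapping "ß" to a single "s" in the one-to-one table is an assumption chosen to show what a single-character mapping looks like.

```python
# Toy weight tables, invented for illustration only. Real collations use
# much larger tables and multi-level weights.
ONE_TO_ONE = {"ß": "s"}   # every character maps to exactly one weight
EXPANSIONS = {"ß": "ss"}  # one character may expand to several weights


def sort_key(text, table):
    """Build a comparison key: lowercase, then map each character via table."""
    return "".join(table.get(ch, ch) for ch in text.lower())


def equal_under(a, b, table):
    """Two strings compare equal when their keys are identical."""
    return sort_key(a, table) == sort_key(b, table)


print(equal_under("Straße", "STRASSE", EXPANSIONS))  # expansion-aware table
print(equal_under("Straße", "STRASSE", ONE_TO_ONE))  # one-to-one table
```

With the expansion table the two spellings produce the same key; with the one-to-one table the keys differ in length, so they can never match. The extra lookups and variable-length keys are the kind of additional work that the quoted passage attributes to the _unicode_ci collations.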

2009-04-20 04:09:58

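In simple words:

If you need a better sorting order, use utf8_unicode_ci (this is the preferred method),

but if performance is your overriding concern, use utf8_general_ci, knowing that it is somewhat outdated.

The differences in terms of performance are very slight.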

2017-03-06 11:51:10

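Some details (PL)

As Peter Gulutzan describes, the collations differ in how they sort and compare the Polish letter "Ł" (L with stroke, HTML escape: &#321;; lower case: "ł", HTML escape: &#322;). The behavior is as follows: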

utf8_polish_ci      Ł greater than L and less than M
utf8_unicode_ci     Ł greater than L and less than M
utf8_unicode_520_ci Ł equal to L
utf8_general_ci     Ł greater than Z

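In Polish, the letter Ł comes after the letter L and before M. None of these collations is better or worse than the others; it depends on your needs.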
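The two outcomes in the table can be imitated with a short Python sketch. This is only an analogy: sorting by raw code point happens to put Ł after Z, as the table reports for utf8_general_ci, but it is not MySQL's algorithm, and `POLISH_ORDER` is a hypothetical four-letter alphabet fragment made up for this example.

```python
letters = ["M", "Ł", "Z", "L"]

# Plain code point order: Ł is U+0141, which is greater than Z (U+005A).
by_code_point = sorted(letters)

# Hypothetical fragment of the Polish alphabet, just enough for these letters.
POLISH_ORDER = "LŁMZ"
by_polish_order = sorted(letters, key=POLISH_ORDER.index)

print(by_code_point)    # ['L', 'M', 'Z', 'Ł']
print(by_polish_order)  # ['L', 'Ł', 'M', 'Z']
```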

2018-11-20 08:00:37

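There are two big differences between them, sorting and character matching:

Sorting:

utf8mb4_general_ci removes all accents and sorts characters one by one, which can produce incorrect sort orders. utf8mb4_unicode_ci sorts accurately.

Character matching:

They match characters differently.

For example, in utf8mb4_unicode_ci you have i != ı (dotless i), but in utf8mb4_general_ci ı = i holds.

For example, imagine you have a row with name="Yılmaz". Then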

select id from users where name='Yilmaz';

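would return the row if the collation is utf8mb4_general_ci, but it would not return the row if the collation is utf8mb4_unicode_ci!

On the other hand, in utf8mb4_unicode_ci we have a=ª and ß=ss, which is not the case in utf8mb4_general_ci. So imagine you have a row with name="ªßi", then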

select id from users where name='assi';

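would return the row if the collation is utf8mb4_unicode_ci, but it would not return the row if the collation is set to utf8mb4_general_ci.

A full list of the matches for each collation can be found here.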

2019-12-06 18:31:40
