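I get the following error when trying to run a SELECT through a stored procedure in MySQL.

Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '='

Any idea what might be going wrong here?

The collation of the table is latin1_general_ci, and that of the column in the WHERE clause is latin1_general_cs.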


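MySQL really dislikes mixing collations unless it can coerce them to the same one (which clearly isn't feasible in your case). Can't you just force the same collation to be used via a COLLATE clause? (Or the simpler BINARY shortcut, if applicable...)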


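This is generally caused by comparing two strings of incompatible collation, or by attempting to select data of different collations into a combined column.

The COLLATE clause allows you to specify the collation used in the query.

For example, the following WHERE clause will always give an error like the one you posted: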

WHERE 'A' COLLATE latin1_general_ci = 'A' COLLATE latin1_general_cs

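Your solution is to specify a shared collation for the two columns within the query. Here is an example that uses the COLLATE clause (table and key are reserved words, so they are quoted with backticks):

SELECT * FROM `table` ORDER BY `key` COLLATE latin1_general_ci;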
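Since the error in the question comes from an equality comparison rather than an ORDER BY, here is a minimal sketch of the same idea applied to a join condition. The table and column names (t1, t2, id, name) are hypothetical, and it assumes both columns use the latin1 character set, because a COLLATE clause has to name a collation that belongs to the expression's character set.

```sql
-- Hypothetical schema: t1.name is latin1_general_ci, t2.name is latin1_general_cs.
-- An explicit COLLATE on one operand is enough: it takes precedence over
-- the implicit column collation on the other side, so the mix is no longer ambiguous.
SELECT t1.id
FROM t1
JOIN t2
  ON t1.name COLLATE latin1_general_cs = t2.name;
```

Which side gets the COLLATE clause decides whether the comparison is case-sensitive (latin1_general_cs) or case-insensitive (latin1_general_ci), so pick the one whose semantics you actually want.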

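Another option is to use the BINARY operator:

BINARY str is shorthand for CAST(str AS BINARY).

Your solution might look something like this:

SELECT * FROM `table` WHERE BINARY a = BINARY b;

or,

SELECT * FROM `table` ORDER BY BINARY a;

Keep in mind that, as Jacob Stamm pointed out in the comments, "casting columns to compare them will cause any indexing on that column to be ignored".

For much greater detail about this collation business, I highly recommend eggyal's excellent answer to this same question.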


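Adding my 2c to the discussion for future googlers.

I was investigating a similar issue, where I got the following error when using custom functions that received a varchar parameter: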

Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and 
(utf8_general_ci,IMPLICIT) for operation '='

Using the following query:

mysql> show variables like "collation_database";
    +--------------------+-----------------+
    | Variable_name      | Value           |
    +--------------------+-----------------+
    | collation_database | utf8_general_ci |
    +--------------------+-----------------+

I was able to tell that the DB was using utf8_general_ci, while the tables were defined using utf8_unicode_ci:

mysql> show table status;
    +--------------+-----------------+
    | Name         | Collation       |
    +--------------+-----------------+
    | my_view      | NULL            |
    | my_table     | utf8_unicode_ci |
    ...

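Notice that the view has a NULL collation. It appears that views and functions do have collation definitions, even though this query shows NULL for the view. The collation used is the DB collation that was in effect when the view or function was created.

The sad solution was to both change the DB collation and recreate the views and functions, forcing them to use the current collation.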

Changing the DB's collation:

ALTER DATABASE mydb DEFAULT COLLATE utf8…

Changing the table's collation:

ALTER TABLE mydb CONVERT TO CHARACTER SET utf8 COLLATE …

I hope this will help someone.
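The statements above do not show a complete collation name, so the following is only a sketch of the full sequence. It assumes the target is utf8_unicode_ci (the collation my_table already uses in the SHOW TABLE STATUS output); mydb, my_table, and my_func are hypothetical names.

```sql
-- Assumption: utf8_unicode_ci is the collation you want everywhere.
ALTER DATABASE mydb DEFAULT CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Convert an existing table together with its character columns.
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;

-- Routines keep the database collation they were created under,
-- so drop and recreate them after changing the database default.
-- my_func is a placeholder; substitute your own definition.
DROP FUNCTION IF EXISTS my_func;
CREATE FUNCTION my_func(p_name VARCHAR(40))
RETURNS VARCHAR(40) DETERMINISTIC
RETURN UPPER(p_name);
```

Views need the same treatment (for example with CREATE OR REPLACE VIEW), since they also retain the collation from creation time.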


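Sometimes it can be dangerous to convert character sets, especially on databases with huge amounts of data. I think the best option is to use the BINARY operator:

e.g.: WHERE BINARY table1.column1 = BINARY table2.column1

I used ALTER DATABASE mydb DEFAULT COLLATE utf8_unicode_ci;, but it didn't work.

In this query: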

Select * from table1, table2 where table1.field = date_format(table2.field,'%H');

This works for me:

Select * from table1, table2 where concat(table1.field) = date_format(table2.field,'%H');

Yes, just a concat.


TL;DR

Either change the collation of one (or both) of the strings so that they match, or else add a COLLATE clause to your expression.


What is this "collation" stuff anyway?

As documented under Character Sets and Collations in General:

A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.

Suppose that we have an alphabet with four letters: “A”, “B”, “a”, “b”. We give each letter a number: “A” = 0, “B” = 1, “a” = 2, “b” = 3. The letter “A” is a symbol, the number 0 is the encoding for “A”, and the combination of all four letters and their encodings is a character set.

Suppose that we want to compare two string values, “A” and “B”. The simplest way to do this is to look at the encodings: 0 for “A” and 1 for “B”. Because 0 is less than 1, we say “A” is less than “B”. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.

But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters “a” and “b” as equivalent to “A” and “B”; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.

In real life, most character sets have many characters: not just “A” and “B” but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German “Ö”), and for multiple-character mappings (such as the rule that “Ö” = “OE” in one of the two German collations).

Further examples are given under Examples of the Effect of Collation.

Okay, but how does MySQL decide which collation to use for a given expression?

As documented under Collation of Expressions:

In the great majority of statements, it is obvious what collation MySQL uses to resolve a comparison operation. For example, in the following cases, it should be clear that the collation is the collation of column charset_name:

SELECT x FROM T ORDER BY x;
SELECT x FROM T WHERE x = x;
SELECT DISTINCT x FROM T;

However, with multiple operands, there can be ambiguity. For example:

SELECT x FROM T WHERE x = 'Y';

Should the comparison use the collation of the column x, or of the string literal 'Y'? Both x and 'Y' have collations, so which collation takes precedence?

Standard SQL resolves such questions using what used to be called “coercibility” rules.

[ deletia ]

MySQL uses coercibility values with the following rules to resolve ambiguities:

- Use the collation with the lowest coercibility value.
- If both sides have the same coercibility, then:
  - If both sides are Unicode, or both sides are not Unicode, it is an error.
  - If one of the sides has a Unicode character set, and another side has a non-Unicode character set, the side with Unicode character set wins, and automatic character set conversion is applied to the non-Unicode side. For example, the following statement does not return an error:

    SELECT CONCAT(utf8_column, latin1_column) FROM t1;

    It returns a result that has a character set of utf8 and the same collation as utf8_column. Values of latin1_column are automatically converted to utf8 before concatenating.
  - For an operation with operands from the same character set but that mix a _bin collation and a _ci or _cs collation, the _bin collation is used. This is similar to how operations that mix nonbinary and binary strings evaluate the operands as binary strings, except that it is for collations rather than data types.

So what is an "illegal mix of collations"?

An "illegal mix of collations" occurs when an expression compares two strings of different collations but of equal coercibility and the coercibility rules cannot help to resolve the conflict. It is the situation described under the third bullet-point in the above quotation.

The particular error given in the question, Illegal mix of collations (latin1_general_cs,IMPLICIT) and (latin1_general_ci,IMPLICIT) for operation '=', tells us that there was an equality comparison between two non-Unicode strings of equal coercibility. It furthermore tells us that the collations were not given explicitly in the statement but rather were implied from the strings' sources (such as column metadata).

That's all very well, but how does one resolve such errors?

As the manual extracts quoted above suggest, this problem can be resolved in a number of ways, of which two are sensible and to be recommended:

- Change the collation of one (or both) of the strings so that they match and there is no longer any ambiguity.

  How this can be done depends upon from where the string has come: Literal expressions take the collation specified in the collation_connection system variable; values from tables take the collation specified in their column metadata.

- Force one string to not be coercible.

  I omitted the following quote from the above:

  MySQL assigns coercibility values as follows:

  - An explicit COLLATE clause has a coercibility of 0. (Not coercible at all.)
  - The concatenation of two strings with different collations has a coercibility of 1.
  - The collation of a column or a stored routine parameter or local variable has a coercibility of 2.
  - A “system constant” (the string returned by functions such as USER() or VERSION()) has a coercibility of 3.
  - The collation of a literal has a coercibility of 4.
  - NULL or an expression that is derived from NULL has a coercibility of 5.

  Thus simply adding a COLLATE clause to one of the strings used in the comparison will force use of that collation.

Whilst the others would be terribly bad practice if they were deployed merely to resolve this error:

- Force one (or both) of the strings to have some other coercibility value so that one takes precedence.

  Use of CONCAT() or CONCAT_WS() would result in a string with a coercibility of 1; and (if in a stored routine) use of parameters/local variables would result in strings with a coercibility of 2.

- Change the encodings of one (or both) of the strings so that one is Unicode and the other is not.

  This could be done via transcoding with CONVERT(expr USING transcoding_name); or via changing the underlying character set of the data (e.g. modifying the column, changing character_set_connection for literal values, or sending them from the client in a different encoding and changing character_set_client / adding a character set introducer). Note that changing encoding will lead to other problems if some desired characters cannot be encoded in the new character set.

- Change the encodings of one (or both) of the strings so that they are both the same and change one string to use the relevant _bin collation.

  Methods for changing encodings and collations have been detailed above. This approach would be of little use if one actually needs to apply more advanced collation rules than are offered by the _bin collation.
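To see the coercibility values from the quoted list for yourself, MySQL's built-in COERCIBILITY() and COLLATION() functions can be applied to any string expression. The following is a small sketch; the _latin1 introducer is used only so that the COLLATE clause is valid regardless of the connection character set, and the table t with column cs_col is hypothetical.

```sql
-- COERCIBILITY() reports the value MySQL assigned to an expression.
SELECT COERCIBILITY(_latin1'abc' COLLATE latin1_general_cs);  -- explicit COLLATE: 0
SELECT COERCIBILITY(USER());                                   -- system constant: 3
SELECT COERCIBILITY('abc');                                    -- plain literal: 4

-- COLLATION() shows which collation an expression carries.
SELECT COLLATION(_latin1'abc' COLLATE latin1_general_cs);      -- latin1_general_cs

-- Hypothetical fix: cs_col is a latin1_general_cs column (coercibility 2).
-- The explicit COLLATE below has coercibility 0, so its collation wins.
SELECT * FROM t WHERE cs_col COLLATE latin1_general_ci = 'abc';
```

Running COERCIBILITY() and COLLATION() on both operands of a failing comparison is a quick way to confirm which two collations are clashing before deciding where to put the COLLATE clause.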


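One possible solution is to convert the entire database to UTF8 (see also this question).


Solution if literals are involved.

I am using Pentaho Data Integration and don't get to specify the SQL syntax. Using a very simple DB lookup gave the error "Illegal mix of collations (cp850_general_ci,COERCIBLE) and (latin1_swedish_ci,COERCIBLE) for operation '='"

The generated code was "SELECT DATA_DATE AS latest_DATA_DATE FROM hr_cc_normalised_data_date_v WHERE PSEUDO_KEY = ?"

Cutting the story short, the lookup was against a view, and when I issued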

mysql> show full columns from hr_cc_normalised_data_date_v;
+------------+------------+-------------------+------+-----+
| Field      | Type       | Collation         | Null | Key |
+------------+------------+-------------------+------+-----+
| PSEUDO_KEY | varchar(1) | cp850_general_ci  | NO   |     |
| DATA_DATE  | varchar(8) | latin1_general_cs | YES  |     |
+------------+------------+-------------------+------+-----+

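it explained where the 'cp850_general_ci' came from.

The view was simply created with 'SELECT 'X',......'. According to the manuals, literals like this should inherit their character set and collation from the server settings, which were correctly defined as 'latin1' and 'latin1_general_cs'. As this clearly did not happen, I forced it in the creation of the view: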

CREATE OR REPLACE VIEW hr_cc_normalised_data_date_v AS
SELECT convert('X' using latin1) COLLATE latin1_general_cs        AS PSEUDO_KEY
    ,  DATA_DATE
FROM HR_COSTCENTRE_NORMALISED_mV
LIMIT 1;

Now it shows latin1_general_cs for both columns, and the error has gone away. :)


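If the columns that you are having trouble with are "hashes", then consider the following...

If the "hash" is a binary string, you should really use the BINARY(...) datatype.

If the "hash" is a hex string, you do not need utf8, and should avoid it because of character checks and so on. For example, MySQL's MD5(...) yields a fixed-length 32-byte hex string, and SHA1(...) gives a 40-byte hex string. These could be stored in CHAR(32) CHARACTER SET ascii (or CHAR(40) for SHA1).

Or, better yet, store UNHEX(MD5(...)) in BINARY(16). This cuts the size of the column in half. (It does, however, make it rather unprintable.) Use SELECT HEX(hash) ... if you want it readable.

Comparing two BINARY columns has no collation issues.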
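As a minimal sketch of the BINARY(16) approach described above (the table name file_hashes and its columns are hypothetical):

```sql
-- Store the 16 raw bytes of an MD5 digest instead of its 32-character hex form.
CREATE TABLE file_hashes (
  id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  hash BINARY(16) NOT NULL,
  UNIQUE KEY (hash)
);

-- UNHEX() turns the hex string returned by MD5() into raw bytes.
INSERT INTO file_hashes (hash) VALUES (UNHEX(MD5('hello world')));

-- Binary strings have no collation, so this comparison cannot raise
-- an "illegal mix of collations" error. HEX() makes the value readable again.
SELECT id, HEX(hash) AS hash_hex
FROM file_hashes
WHERE hash = UNHEX(MD5('hello world'));
```

Note that HEX() returns uppercase digits whereas MD5() returns lowercase ones, so wrap it in LOWER() if the readable form has to match MD5() output textually.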


Another source of collation issues is the mysql.proc table. Check the collations of your stored procedures and functions:

SELECT
  p.db, p.db_collation, p.type, COUNT(*) cnt
FROM mysql.proc p
GROUP BY p.db, p.db_collation, p.type;

Also pay attention to the mysql.proc.collation_connection and mysql.proc.character_set_client columns.
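The mysql.proc table was removed in MySQL 8.0, so on newer servers the same information has to come from information_schema.ROUTINES instead. A sketch of an equivalent check, assuming a schema named mydb (a placeholder):

```sql
-- Per-routine view of the collations captured when each routine was created.
SELECT ROUTINE_SCHEMA,
       ROUTINE_NAME,
       ROUTINE_TYPE,
       CHARACTER_SET_CLIENT,
       COLLATION_CONNECTION,
       DATABASE_COLLATION
FROM information_schema.ROUTINES
WHERE ROUTINE_SCHEMA = 'mydb'   -- placeholder schema name
ORDER BY ROUTINE_TYPE, ROUTINE_NAME;
```

Any routine whose DATABASE_COLLATION differs from the current database default is a candidate for being dropped and recreated.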


I had a similar problem when trying to use the FIND_IN_SET function with a string variable.

SET @my_var = 'string1,string2';
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);

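and was receiving the error

Error Code: 1267. Illegal mix of collations (utf8_unicode_ci,IMPLICIT) and (utf8_general_ci,IMPLICIT) for operation 'find_in_set'

Short answer:

There is no need to change any collation_YYYY variables; just add the correct collation next to your variable declaration, i.e.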

SET @my_var = 'string1,string2' COLLATE utf8_unicode_ci;
SELECT * from my_table WHERE FIND_IN_SET(column_name,@my_var);

Long answer:

I first checked the collation variables:

mysql> SHOW VARIABLES LIKE 'collation%';
    +----------------------+-----------------+
    | Variable_name        | Value           |
    +----------------------+-----------------+
    | collation_connection | utf8_general_ci |
    | collation_database   | utf8_general_ci |
    | collation_server     | utf8_general_ci |
    +----------------------+-----------------+

Then I checked the table collation:

mysql> SHOW CREATE TABLE my_table;

CREATE TABLE `my_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `column_name` varchar(40) COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=125 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

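This means that my variable was given the default collation utf8_general_ci, while my table was configured as utf8_unicode_ci.

By adding the COLLATE clause next to the variable declaration, the variable's collation matched the collation configured for the table.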


This code needs to be run in the "Run SQL query/queries on database" window:

ALTER TABLE `table_name` CHANGE `column_name` `column_name`   VARCHAR(128) CHARACTER SET utf8 COLLATE utf8_unicode_ci NULL DEFAULT NULL;

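Please replace table_name and column_name with the appropriate names.


If you have phpMyAdmin installed, you can follow the instructions given at the following link: https://mediatemple.net/community/products/dv/204403914/default-mysql-character-set-and-collation

You have to match the collation of the database with that of all the tables, as well as the fields of the tables, and then recompile all the stored procedures and functions. With that, everything should work again.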


Very interesting... Now, be ready. I looked at all of the "add collate" solutions and, to me, those are band-aid fixes. The reality is that the database design was "bad". Yes, standards change and new things get added, blah blah, but that does not change the fact of the bad database design. I refuse to go the route of adding "collate" all over my SQL statements just to get a query to work. The only solution that works for me, and that will virtually eliminate the need to tweak my code in the future, is to redesign the database and tables to match the character set that I will live with and embrace for the long term. In this case, I chose to go with the character set "utf8mb4".

So when you encounter that "illegal" error message, the solution here is to redesign your database and tables. It is much easier and quicker than it sounds. Exporting your data and re-importing it from a CSV may not even be required. Change the character set of the database and make sure the character sets of all your tables match.

Use these commands to guide you:

SHOW VARIABLES LIKE "collation_database";
SHOW TABLE STATUS;
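Those two commands report the database and table defaults, but individual columns can still carry their own collation. A sketch of a column-level check follows, assuming a schema named mydb, a table named my_table, and utf8mb4_unicode_ci as the target collation (all three are placeholders for illustration; the answer only settles on the utf8mb4 character set).

```sql
-- List every character column whose collation differs from the chosen target.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'mydb'                      -- placeholder schema
  AND COLLATION_NAME IS NOT NULL                 -- skip non-character columns
  AND COLLATION_NAME <> 'utf8mb4_unicode_ci'     -- assumed target collation
ORDER BY TABLE_NAME, ORDINAL_POSITION;

-- Convert one offending table, including its character columns.
ALTER TABLE my_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
```

When the first query returns no rows, every character column in the schema shares the target collation, and only views and routines created under the old default remain to be recreated.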

Now, if you enjoy adding "collate" here and there and beefing up your code with forced full "overrides", be my guest.


The following worked for me.

CONVERT( Table1.FromColumn USING utf8)    =  CONVERT(Table2.ToColumn USING utf8) 

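I personally had this problem in a procedure. If you don't want to alter the table, you can try converting your parameter inside the procedure. I tried several uses of COLLATE (with a SET into the SELECT), but none of them worked for me.

CONVERT(my_param USING utf32) did the trick.


In my case, the default return type of a function was the type/collation from the database (utf8mb4_general_ci), but the database column was ascii.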

WHERE ascii_col = md5(concat_ws(',', a,b,c))

The quick fix was

WHERE ascii_col = BINARY md5(concat_ws(',', a,b,c))