When comparing two strings for equality in C#, what is the difference between InvariantCulture and Ordinal comparison?


Current answer

InvariantCulture

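Uses a "standard" set of character orderings (a, b, c, and so on). This contrasts with specific locales, which may sort characters in a different order ('a-with-acute' may come before or after 'a', depending on the locale, and so on).

Ordinal

On the other hand, looks purely at the numeric values of the UTF-16 code units that represent the characters.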


There's a great sample at http://msdn.microsoft.com/en-us/library/e6883c06.aspx that shows the results of the various StringComparison values. All the way at the end, it shows (excerpt):

StringComparison.InvariantCulture:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is less than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)

StringComparison.Ordinal:
LATIN SMALL LETTER I (U+0069) is less than LATIN SMALL LETTER DOTLESS I (U+0131)
LATIN SMALL LETTER I (U+0069) is greater than LATIN CAPITAL LETTER I (U+0049)
LATIN SMALL LETTER DOTLESS I (U+0131) is greater than LATIN CAPITAL LETTER I (U+0049)

You can see that where InvariantCulture yields (U+0069, U+0049, U+0131), Ordinal yields (U+0049, U+0069, U+0131).
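The Ordinal half of that excerpt can be checked directly, because it depends only on the code unit values. The following is a minimal sketch (the variable names are illustrative, not from the MSDN sample). The InvariantCulture result is printed rather than asserted, since it depends on the collation data the runtime uses.

```csharp
using System;

// Code points taken from the excerpt above.
string smallI = "\u0069";    // LATIN SMALL LETTER I
string capitalI = "\u0049";  // LATIN CAPITAL LETTER I
string dotlessI = "\u0131";  // LATIN SMALL LETTER DOTLESS I

// Ordinal comparison looks only at UTF-16 code unit values,
// so the sign of each result follows from the code points.
int iVsDotless = Math.Sign(string.CompareOrdinal(smallI, dotlessI));         // 0x0069 < 0x0131
int iVsCapital = Math.Sign(string.CompareOrdinal(smallI, capitalI));         // 0x0069 > 0x0049
int dotlessVsCapital = Math.Sign(string.CompareOrdinal(dotlessI, capitalI)); // 0x0131 > 0x0049

Console.WriteLine($"Ordinal: {iVsDotless}, {iVsCapital}, {dotlessVsCapital}"); // Ordinal: -1, 1, 1

// Culture-sensitive comparison: printed, not assumed.
int invariantIVsCapital = Math.Sign(
    string.Compare(smallI, capitalI, StringComparison.InvariantCulture));
Console.WriteLine($"InvariantCulture, i vs I: {invariantIVsCapital}");
```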

Other answers

Here is an example where string equality comparison using InvariantCultureIgnoreCase and OrdinalIgnoreCase will not give the same results:

string str = "\xC4"; //A with umlaut, Ä
string A = str.Normalize(NormalizationForm.FormC);
//Length is 1, this will contain the single A with umlaut character (Ä)
string B = str.Normalize(NormalizationForm.FormD);
//Length is 2, this will contain an uppercase A followed by an umlaut combining character
bool equals1 = A.Equals(B, StringComparison.OrdinalIgnoreCase);
bool equals2 = A.Equals(B, StringComparison.InvariantCultureIgnoreCase);

If you run this, equals1 will be false, and equals2 will be true.
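To see why the ordinal check fails, it can help to look at the two strings' lengths. The sketch below is an illustration only (variable names are hypothetical). It also shows one possible workaround, assuming you want ordinal semantics: normalize both sides to the same form before comparing.

```csharp
using System;
using System.Text;

string composed = "\u00C4".Normalize(NormalizationForm.FormC);   // Ä as a single code unit
string decomposed = "\u00C4".Normalize(NormalizationForm.FormD); // 'A' followed by U+0308

Console.WriteLine(composed.Length);   // 1
Console.WriteLine(decomposed.Length); // 2

// Ordinal comparison sees two different code unit sequences.
bool ordinalRaw = string.Equals(composed, decomposed, StringComparison.Ordinal);

// Normalizing both sides to the same form first makes the sequences identical.
bool ordinalNormalized = string.Equals(
    composed.Normalize(NormalizationForm.FormC),
    decomposed.Normalize(NormalizationForm.FormC),
    StringComparison.Ordinal);

Console.WriteLine(ordinalRaw);        // False
Console.WriteLine(ordinalNormalized); // True
```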

Another handy difference (in English, where accents are uncommon) is that an InvariantCulture comparison first compares the entire strings case-insensitively, and only then, if necessary (and requested), distinguishes by case. (You can also do a case-insensitive comparison, of course, which won't distinguish by case at all.) Accented letters are treated as another flavor of the same base letter: the strings are first compared ignoring accents, and accents are taken into account only if the base letters all match (much like case, except that accents are not ignored in a case-insensitive compare). This groups accented versions of an otherwise identical word near each other, instead of separating them completely at the first accent difference. It is the sort order you would typically find in a dictionary, with capitalized words appearing right next to their lowercase equivalents and accented letters near the corresponding unaccented letter.

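An Ordinal comparison compares strictly on the numeric character values, stopping at the first difference. This sorts capital letters completely separately from lowercase letters (and accented letters presumably separately from both), so capitalized words sort nowhere near their lowercase equivalents.

InvariantCulture also considers capitals to be greater than lowercase letters, whereas Ordinal considers capitals to be less than lowercase letters (a holdover of ASCII from the days before computers had lowercase letters: the uppercase letters were allocated first and so have lower values than the lowercase letters added later).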

For example, by Ordinal: "0" < "9" < "A" < "Ab" < "Z" < "a" < "aB" < "ab" < "z" < "Á" < "Áb" < "á" < "áb"

And by InvariantCulture: "0" < "9" < "a" < "A" < "á" < "Á" < "ab" < "aB" < "Ab" < "áb" < "Áb" < "z" < "Z"
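These two orderings can be reproduced with a short sketch like the one below. The word list and variable names are illustrative, and the accented letters are assumed to be precomposed (single code unit) characters. The Ordinal result follows from the code unit values. The InvariantCulture result is simply printed, because it depends on the collation data of the runtime.

```csharp
using System;
using System.Collections.Generic;

// Deliberately shuffled input; accented letters are precomposed (U+00C1, U+00E1).
var words = new List<string>
    { "z", "Áb", "ab", "0", "Z", "á", "Ab", "9", "a", "áb", "A", "aB", "Á" };

// Sort a copy by raw code unit values.
var ordinal = new List<string>(words);
ordinal.Sort(StringComparer.Ordinal);
string ordinalJoined = string.Join(" < ", ordinal);
Console.WriteLine(ordinalJoined);
// 0 < 9 < A < Ab < Z < a < aB < ab < z < Á < Áb < á < áb

// Sort another copy with the culture-sensitive invariant comparer.
var invariant = new List<string>(words);
invariant.Sort(StringComparer.InvariantCulture);
Console.WriteLine(string.Join(" < ", invariant));
```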

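You don't need fancy Unicode character examples to show the difference. Here's a surprisingly simple example I found today, consisting of only ASCII characters.

According to the ASCII table, 0 (48, 0x30) is smaller than _ (95, 0x5F) when compared ordinally. InvariantCulture says the opposite (PowerShell code below):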

PS> [System.StringComparer]::Ordinal.Compare("_", "0")
47
PS> [System.StringComparer]::InvariantCulture.Compare("_", "0")
-1
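A rough C# equivalent of the PowerShell session, as a sketch. Only the sign of a Compare result is meaningful, so the exact value 47 should not be relied on. The InvariantCulture result is printed rather than asserted, since it comes from the runtime's collation data.

```csharp
using System;

int ordinalResult = StringComparer.Ordinal.Compare("_", "0");
int invariantResult = StringComparer.InvariantCulture.Compare("_", "0");

// Ordinal: '_' (0x5F) has a higher code unit value than '0' (0x30).
Console.WriteLine(Math.Sign(ordinalResult)); // 1

// InvariantCulture: a negative sign here means "_" sorts before "0".
Console.WriteLine(Math.Sign(invariantResult));
```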

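Although the question is about equality, for quick visual reference, here is the order of some strings sorted using a few cultures, illustrating some of the idiosyncrasies out there.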

Ordinal          0 9 A Ab a aB aa ab ss Ä Äb ß ä äb ぁ あ ァ ア 亜 Ａ
IgnoreCase       0 9 a A aa ab Ab aB ss ä Ä äb Äb ß ぁ あ ァ ア 亜 Ａ
--------------------------------------------------------------------
InvariantCulture 0 9 a A Ａ ä Ä aa ab aB Ab äb Äb ss ß ァ ぁ ア あ 亜
IgnoreCase       0 9 A a Ａ Ä ä aa Ab aB ab Äb äb ß ss ァ ぁ ア あ 亜
--------------------------------------------------------------------
da-DK            0 9 a A Ａ ab aB Ab ss ß ä Ä äb Äb aa ァ ぁ ア あ 亜
IgnoreCase       0 9 A a Ａ Ab aB ab ß ss Ä ä Äb äb aa ァ ぁ ア あ 亜
--------------------------------------------------------------------
de-DE            0 9 a A Ａ ä Ä aa ab aB Ab äb Äb ß ss ァ ぁ ア あ 亜
IgnoreCase       0 9 A a Ａ Ä ä aa Ab aB ab Äb äb ss ß ァ ぁ ア あ 亜
--------------------------------------------------------------------
en-US            0 9 a A Ａ ä Ä aa ab aB Ab äb Äb ß ss ァ ぁ ア あ 亜
IgnoreCase       0 9 A a Ａ Ä ä aa Ab aB ab Äb äb ss ß ァ ぁ ア あ 亜
--------------------------------------------------------------------
ja-JP            0 9 a A Ａ ä Ä aa ab aB Ab äb Äb ß ss ァ ぁ ア あ 亜
IgnoreCase       0 9 A a Ａ Ä ä aa Ab aB ab Äb äb ss ß ァ ぁ ア あ 亜

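Observations:

- de-DE, ja-JP and en-US sort the same way.
- Invariant differs from those three cultures only in how it orders ss and ß.
- da-DK sorts quite differently.
- The IgnoreCase flag matters for all sampled cultures.

Code used to generate the table above: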

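using System;
using System.Collections.Generic;
using System.Globalization;

// The second-to-last entry is FULLWIDTH LATIN CAPITAL LETTER A (U+FF21).
var l = new List<string>
    { "0", "9", "A", "Ab", "a", "aB", "aa", "ab", "ss", "ß",
      "Ä", "Äb", "ä", "äb", "あ", "ぁ", "ア", "ァ", "Ａ", "亜" };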

foreach (var comparer in new[]
{
    StringComparer.Ordinal,
    StringComparer.OrdinalIgnoreCase,
    StringComparer.InvariantCulture,
    StringComparer.InvariantCultureIgnoreCase,
    StringComparer.Create(new CultureInfo("da-DK"), false),
    StringComparer.Create(new CultureInfo("da-DK"), true),
    StringComparer.Create(new CultureInfo("de-DE"), false),
    StringComparer.Create(new CultureInfo("de-DE"), true),
    StringComparer.Create(new CultureInfo("en-US"), false),
    StringComparer.Create(new CultureInfo("en-US"), true),
    StringComparer.Create(new CultureInfo("ja-JP"), false),
    StringComparer.Create(new CultureInfo("ja-JP"), true),
})
{
    l.Sort(comparer);
    Console.WriteLine(string.Join(" ", l));
}