Encoding.Default should not be used...
Some answers use Encoding.Default, but Microsoft warns against it:
Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback [i.e. the data gets mangled, so you can't re-encode it back] to map unsupported characters to characters supported by the code page. For these reasons, using the default encoding is not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding. You could also use a higher-level protocol to ensure that the same format is used for encoding and decoding.
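To check what the default encoding is, use Encoding.Default.WindowsCodePage (1250 in my case; sadly, there is no predefined encoding class for CP1250, but the object can be retrieved with Encoding.GetEncoding(1250)).
...UTF-8/UTF-16LE encoding should be used instead...
Encoding.ASCII in the top-voted answer is 7-bit, so it doesn't work in my case either: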
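A minimal sketch of checking the default and fetching CP1250 explicitly, assuming .NET 5+ with top-level statements. Note that on .NET Core and .NET 5+, Encoding.Default always returns UTF-8, so a result of 1250 corresponds to .NET Framework on a Czech Windows. On modern .NET the Windows code pages also have to be registered first via CodePagesEncodingProvider (older .NET Core targets may need the System.Text.Encoding.CodePages NuGet package for it).

```csharp
using System;
using System.Text;

// Make legacy Windows code pages (1250, 852, ...) available on .NET Core / .NET 5+.
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// On .NET Framework this is the system ANSI code page; on .NET Core / .NET 5+ it is UTF-8.
Encoding defaultEnc = Encoding.Default;
Console.WriteLine($"Default: {defaultEnc.WebName}, WindowsCodePage = {defaultEnc.WindowsCodePage}");

// Ask for CP1250 explicitly instead of relying on whatever the default happens to be.
Encoding cp1250 = Encoding.GetEncoding(1250);
byte[] bytes = cp1250.GetBytes("šarže");        // one byte per character in CP1250
Console.WriteLine(BitConverter.ToString(bytes)); // 9A-61-72-9E-65
Console.WriteLine(cp1250.GetString(bytes));      // šarže
```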
byte[] pass = Encoding.ASCII.GetBytes("šarže");
Console.WriteLine(Encoding.ASCII.GetString(pass)); // ?ar?e
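Rather than letting Encoding.ASCII substitute '?' silently, one can request an ASCII encoding that throws on unmappable characters. A sketch using Encoding.GetEncoding(string, EncoderFallback, DecoderFallback) with the sample string from above:

```csharp
using System;
using System.Text;

// Encoding.ASCII replaces every character outside 0x00-0x7F with '?' (0x3F).
byte[] lossy = Encoding.ASCII.GetBytes("šarže");
Console.WriteLine(BitConverter.ToString(lossy)); // 3F-61-72-3F-65

// Same code page, but with exception fallbacks: data loss becomes an error.
Encoding strictAscii = Encoding.GetEncoding(
    "us-ascii",
    EncoderFallback.ExceptionFallback,
    DecoderFallback.ExceptionFallback);

bool threw = false;
try
{
    strictAscii.GetBytes("šarže");
}
catch (EncoderFallbackException ex)
{
    threw = true;
    Console.WriteLine($"Cannot encode '{ex.CharUnknown}' at index {ex.Index}");
}
```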
Here is Microsoft's recommendation:
var utf8 = new UTF8Encoding();
byte[] pass = utf8.GetBytes("šarže");
Console.WriteLine(utf8.GetString(pass)); // šarže
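The same round-trip with the bytes made visible and a stricter decoder. The constructor arguments (no BOM, throw on invalid bytes) are a choice made for this sketch, not part of Microsoft's recommendation; the parameterless new UTF8Encoding() silently replaces malformed input with U+FFFD instead of throwing.

```csharp
using System;
using System.Text;

// encoderShouldEmitUTF8Identifier: false -> no BOM preamble
// throwOnInvalidBytes: true -> malformed input raises DecoderFallbackException
var utf8 = new UTF8Encoding(false, true);

byte[] pass = utf8.GetBytes("šarže");
// 'š' and 'ž' take two bytes each in UTF-8, the ASCII letters one byte each.
Console.WriteLine(BitConverter.ToString(pass)); // C5-A1-61-72-C5-BE-65
string decoded = utf8.GetString(pass);

// A byte that is valid CP1250 'š' (0x9A) but not valid UTF-8 is rejected.
bool rejected = false;
try
{
    utf8.GetString(new byte[] { 0x9A, 0x61 });
}
catch (DecoderFallbackException)
{
    rejected = true;
}
Console.WriteLine($"Invalid input rejected: {rejected}");
```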
Encoding.UTF8, recommended by others, is an instance of the UTF-8 encoding and can also be used directly, or as:
var utf8 = Encoding.UTF8 as UTF8Encoding;
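The two are not quite identical: Encoding.UTF8 reports a UTF-8 BOM (EF BB BF) as its preamble, while new UTF8Encoding() reports none. GetBytes adds no BOM in either case, but stream writers emit the preamble at the start of a stream, as this sketch shows:

```csharp
using System;
using System.IO;
using System.Text;

var fromProperty = Encoding.UTF8 as UTF8Encoding; // UTF-8 that advertises a BOM
var fresh = new UTF8Encoding();                   // UTF-8 without a BOM

Console.WriteLine(fromProperty!.GetPreamble().Length); // 3
Console.WriteLine(fresh.GetPreamble().Length);         // 0

// StreamWriter writes the encoding's preamble before the first character.
using var withBom = new MemoryStream();
using (var w = new StreamWriter(withBom, Encoding.UTF8)) w.Write("š");
Console.WriteLine(BitConverter.ToString(withBom.ToArray())); // EF-BB-BF-C5-A1
```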
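Encoding.Unicode is popular for in-memory string representation because it uses a fixed 2 bytes per char (one UTF-16 code unit; characters outside the Basic Multilingual Plane take two), so one can jump to the n-th char in constant time at the cost of more memory: it is UTF-16LE. In MSVC#, *.cs files are UTF-8 with BOM by default, and string constants in them are converted to UTF-16LE at compile time (see @OwnagelsMagic's comment), but UTF-16LE is NOT defined as the default: many classes, such as StreamWriter, use UTF-8 as the default.
...but it is not always used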
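A short sketch of the points above: the UTF-16LE bytes of the sample string, a character that needs a surrogate pair (so "the n-th character" really means the n-th UTF-16 code unit), and StreamWriter defaulting to UTF-8 without a BOM when no encoding is passed.

```csharp
using System;
using System.IO;
using System.Text;

// Encoding.Unicode is UTF-16 little-endian: 2 bytes per UTF-16 code unit.
byte[] utf16 = Encoding.Unicode.GetBytes("šarže");
Console.WriteLine(BitConverter.ToString(utf16)); // 61-01-61-00-72-00-7E-01-65-00

// U+1F600 lies outside the BMP, so it occupies two chars (a surrogate pair).
string emoji = "\U0001F600";
Console.WriteLine(emoji.Length); // 2

// StreamWriter without an explicit encoding writes UTF-8 (no BOM), not UTF-16.
var ms = new MemoryStream();
using (var w = new StreamWriter(ms)) w.Write("š");
Console.WriteLine(BitConverter.ToString(ms.ToArray())); // C5-A1
```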
Default encoding is misleading: .NET uses UTF-8 everywhere (including strings hardcoded in the source code) and UTF-16LE (Encoding.Unicode) to store strings in memory, but Windows actually uses 2 other, non-UTF-8 defaults: the ANSI code page (for GUI apps before .NET) and the OEM code page (aka the DOS standard). These differ from country to country (for instance, the Czech edition of Windows uses CP1250 and CP852) and are oftentimes hardcoded in Windows API libraries. So if you just set the console to UTF-8 with chcp 65001 (as .NET implicitly does, pretending it is the default) and run some localized command (like ping), it works in the English version, but you get tofu text in the Czech Republic.
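One way to see which ANSI and OEM code pages .NET associates with the current culture is TextInfo.ANSICodePage and TextInfo.OEMCodePage. Treat this as an approximation under an assumption: the values come from the culture's data and may not match the system locale that legacy console programs actually use. The second half of this sketch shows why mixing the two Czech code pages produces garbage.

```csharp
using System;
using System.Globalization;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

// Code pages the current culture's data associates with ANSI and OEM text.
// Approximation only: the system locale used by legacy programs may differ.
TextInfo ti = CultureInfo.CurrentCulture.TextInfo;
Console.WriteLine($"ANSI CP: {ti.ANSICodePage}, OEM CP: {ti.OEMCodePage}");

// CP1250 (ANSI) and CP852 (OEM) encode the same letters with different bytes,
// so reading console (CP852) output as GUI (CP1250) text garbles it.
Encoding ansi = Encoding.GetEncoding(1250);
Encoding oem = Encoding.GetEncoding(852);
byte[] oemBytes = oem.GetBytes("šarže");
string asOem = oem.GetString(oemBytes);   // decoded with the right code page
string asAnsi = ansi.GetString(oemBytes); // decoded with the wrong one
Console.WriteLine($"{asOem} vs {asAnsi}");
```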
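Let me share my real experience: I created a WinForms application that customizes git scripts for teachers. The output is obtained asynchronously in the background by a process, described by Microsoft as follows (my additions in square brackets):
The word "shell" in this context (UseShellExecute) refers to a graphical shell (similar to the Windows shell) [ANSI CP] rather than command shells (for example, bash or sh) [OEM CP] and lets users launch graphical applications or open documents [with garbled output in non-US environments].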
So effectively, the GUI defaults to UTF-8, the process defaults to CP1250, and the console defaults to CP852. So the output is CP852 interpreted as UTF-8 interpreted as CP1250. I got tofu text from which I could not deduce the original code page because of the double conversion. I was pulling my hair out for a week before I figured out how to explicitly set UTF-8 for the process script and convert the output from CP1250 to UTF-8 in the main thread. Now it works here in Eastern Europe, but Western European Windows uses CP1252. The ANSI CP is not easy to determine, as many commands like systeminfo are also localized and other methods differ from version to version: in such an environment, displaying national characters reliably is almost infeasible.
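A sketch of why such a chain cannot be undone, assuming the console wrote CP852: once those bytes are decoded as UTF-8, each invalid sequence becomes U+FFFD and the original byte values are gone. Converting in a known direction, from bytes that are definitely CP1250 to UTF-8 with Encoding.Convert, does work.

```csharp
using System;
using System.Linq;
using System.Text;

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
Encoding cp852 = Encoding.GetEncoding(852);
Encoding cp1250 = Encoding.GetEncoding(1250);

// Console output that was really produced in CP852...
byte[] fromConsole = cp852.GetBytes("šarže");

// ...but decoded as UTF-8: a non-ASCII byte followed by an ASCII letter is not
// valid UTF-8, so the default (lenient) decoder substitutes U+FFFD for it.
string misread = Encoding.UTF8.GetString(fromConsole);
Console.WriteLine(misread.Contains('\uFFFD')); // True

// Re-encoding cannot restore the original bytes: U+FFFD keeps no trace of them.
byte[] reencoded = Encoding.UTF8.GetBytes(misread);
Console.WriteLine(Enumerable.SequenceEqual(reencoded, fromConsole)); // False

// What does work: bytes known to be CP1250 converted explicitly to UTF-8.
byte[] ansiBytes = cp1250.GetBytes("šarže");
byte[] utf8Bytes = Encoding.Convert(cp1250, Encoding.UTF8, ansiBytes);
Console.WriteLine(Encoding.UTF8.GetString(utf8Bytes) == "šarže"); // True
```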
So, until the middle of the 21st century, please do not rely on any "default code page"; set it explicitly (to UTF-8 or UTF-16LE if possible).
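For a child process such as the one in the WinForms story above, "set it explicitly" can look like the sketch below. ExplicitUtf8 is a hypothetical helper name and the git command line is only an example; the declared encoding only decodes correctly if the child really writes UTF-8 (for a batch script, e.g. after chcp 65001).

```csharp
using System;
using System.Diagnostics;
using System.Text;

// Hypothetical helper: the output encoding is chosen explicitly instead of
// being inherited from the machine's ANSI/OEM defaults.
static ProcessStartInfo ExplicitUtf8(string fileName, string arguments) => new ProcessStartInfo
{
    FileName = fileName,
    Arguments = arguments,
    UseShellExecute = false,        // required to redirect the output stream
    RedirectStandardOutput = true,
    StandardOutputEncoding = new UTF8Encoding(false),
    CreateNoWindow = true,
};

ProcessStartInfo psi = ExplicitUtf8("git", "log -1 --format=%s");
Console.WriteLine(psi.StandardOutputEncoding?.WebName); // utf-8

// Starting it (requires git on PATH):
// using Process p = Process.Start(psi)!;
// string output = p.StandardOutput.ReadToEnd();
// p.WaitForExit();
```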