我正在把VB转换成c#。这条语句的语法有问题:

if ((searchResult.Properties["user"].Count > 0))
{
    profile.User = System.Text.Encoding.UTF8.GetString(searchResult.Properties["user"][0]);
}

然后我看到以下错误:

参数1:不能将'object'转换为'byte[]' 匹配的最佳重载方法 'System.Text.Encoding.GetString(byte[])'有一些无效的参数

我试图根据这篇文章修复代码,但仍然没有成功

string User = Encoding.UTF8.GetString("user", 0);

有什么建议吗?


当前回答

编码。默认不应该使用…

一些答案使用编码。违约,但微软提出了警告:

Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback [i.e. the encoding is totally screwed up, so you can't reencode it back] to map unsupported characters to characters supported by the code page. For these reasons, using the default encoding is not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding. You could also use a higher-level protocol to ensure that the same format is used for encoding and decoding.

要检查默认编码是什么,请使用encoding . default . windowscodepage(在我的例子中是1250 -遗憾的是,没有预定义的CP1250编码类,但对象可以作为encoding . getencoding(1250)检索)。

...应该使用UTF-8/UTF-16LE编码…

编码。ASCII在得分最多的答案是7位,所以它也不工作,在我的情况下:

byte[] pass = Encoding.ASCII.GetBytes("šarže");
Console.WriteLine(Encoding.ASCII.GetString(pass)); // ?ar?e

以下是微软的建议:

var utf8 = new UTF8Encoding();
byte[] pass = utf8.GetBytes("šarže");
Console.WriteLine(utf8.GetString(pass)); // šarže

编码。其他人推荐的UTF8是UTF-8编码的一个实例,也可以直接使用或作为

var utf8 = Encoding.UTF8 as UTF8Encoding;

编码。Unicode在内存中的字符串表示中很流行,因为它每个字符使用固定的2个字节,因此可以在固定的时间内以更多内存使用为代价跳到第n个字符:它是UTF-16LE。在msvc#中,*.cs文件默认是UTF-8 BOM,其中的字符串常量在编译时转换为UTF-16LE(参见@OwnagelsMagic注释),但它没有定义为默认值:许多类,如StreamWriter使用UTF-8作为默认值。

...但它并不总是被使用

Default encoding is misleading: .NET uses UTF-8 everywhere (including strings hardcoded in the source code) and UTF-16LE (Encoding.Unicode) to store strings in memory, but Windows actually uses 2 other non-UTF8 defaults: ANSI codepage (for GUI apps before .NET) and OEM codepage (aka DOS standard). These differs from country to country (for instance, Windows Czech edition uses CP1250 and CP852) and are oftentimes hardcoded in windows API libraries. So if you just set UTF-8 to console by chcp 65001 (as .NET implicitly does and pretends it is the default) and run some localized command (like ping), it works in English version, but you get tofu text in Czech Republic.

让我分享一下我的真实经验:我为教师创建了定制git脚本的WinForms应用程序。输出是由微软描述为(我添加的粗体文本)的进程在后台任意地获得的:

在本文中,“shell”一词(UseShellExecute)指的是一个图形shell(类似于Windows shell, ANSI CP)而不是命令shell(例如bash或sh, OEM CP),允许用户在非美国环境中启动图形应用程序或打开输出混乱的文档。

So effectively GUI defaults to UTF-8, process defaults to CP1250 and console defaults to 852. So the output is in 852 interpreted as UTF-8 interpreted as CP1250. I got tofu text from which I could not deduce the original codepage due to the double conversion. I was pulling my hair for a week to figure out to explicitly set UTF-8 for process script and convert the output from CP1250 to UTF-8 in the main thread. Now it works here in the Eastern Europe, but Western Europe Windows uses 1252. ANSI CP is not determined easily as many commands like systeminfo are also localized and other methods differs from version to version: in such environment displaying national characters reliably is almost unfeasible.

因此,在21世纪中叶之前,请不要使用任何“默认代码页”并显式设置它(如果可能的话,设置为UTF-8或UTF-16LE)。

其他回答

static byte[] GetBytes(string str)
{
     byte[] bytes = new byte[str.Length * sizeof(char)];
     System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
     return bytes;
}

static string GetString(byte[] bytes)
{
     char[] chars = new char[bytes.Length / sizeof(char)];
     System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
     return new string(chars);
}

如果您已经有了一个字节数组,那么您将需要知道使用了什么类型的编码使其成为字节数组。

例如,如果字节数组是这样创建的:

byte[] bytes = Encoding.ASCII.GetBytes(someString);

你需要把它转换回一个字符串,就像这样:

string someString = Encoding.ASCII.GetString(bytes);

如果可以在继承的代码中找到用于创建字节数组的编码,则应该进行设置。

var result = System.Text.Encoding.Unicode.GetBytes(text);

这个问题已经被回答了很多,但对我来说,唯一的工作方法是:

    public static byte[] StringToByteArray(string str)
    {
        byte[] array = Convert.FromBase64String(str);
        return array;
    }

对JustinStolle的编辑进行了改进(Eran Yogev使用BlockCopy)。

所提出的解决方案确实比使用编码更快。 问题是它不适用于编码长度不均匀的字节数组。 如上所述,它会引发一个越界异常。 从字符串解码时,长度增加1会留下一个尾随字节。

对我来说,当我想从DataTable编码到JSON时,就有了这个需求。 我正在寻找一种方法将二进制字段编码为字符串,并从字符串解码回字节[]。

因此,我创建了两个类——一个包装上述解决方案(当从字符串编码时,它是好的,因为长度总是偶数),另一个处理字节[]编码。

我通过添加一个字符来解决长度不均匀的问题,该字符告诉我二进制数组的原始长度是奇数('1')还是偶数('0')。

如下:

public static class StringEncoder
{
    static byte[] EncodeToBytes(string str)
    {
        byte[] bytes = new byte[str.Length * sizeof(char)];
        System.Buffer.BlockCopy(str.ToCharArray(), 0, bytes, 0, bytes.Length);
        return bytes;
    }
    static string DecodeToString(byte[] bytes)
    {
        char[] chars = new char[bytes.Length / sizeof(char)];
        System.Buffer.BlockCopy(bytes, 0, chars, 0, bytes.Length);
        return new string(chars);
    }
}

public static class BytesEncoder
{
    public static string EncodeToString(byte[] bytes)
    {
        bool even = (bytes.Length % 2 == 0);
        char[] chars = new char[1 + bytes.Length / sizeof(char) + (even ? 0 : 1)];
        chars[0] = (even ? '0' : '1');
        System.Buffer.BlockCopy(bytes, 0, chars, 2, bytes.Length);

        return new string(chars);
    }
    public static byte[] DecodeToBytes(string str)
    {
        bool even = str[0] == '0';
        byte[] bytes = new byte[(str.Length - 1) * sizeof(char) + (even ? 0 : -1)];
        char[] chars = str.ToCharArray();
        System.Buffer.BlockCopy(chars, 2, bytes, 0, bytes.Length);

        return bytes;
    }
}