I'm converting VB to C#. The syntax of this statement is giving me trouble:

if ((searchResult.Properties["user"].Count > 0))
{
    profile.User = System.Text.Encoding.UTF8.GetString(searchResult.Properties["user"][0]);
}

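Then I see the following error:

Argument 1: cannot convert from 'object' to 'byte[]'. The best overloaded method match for 'System.Text.Encoding.GetString(byte[])' has some invalid arguments

I tried to fix the code based on this post, but still had no success: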

string User = Encoding.UTF8.GetString("user", 0);

Any suggestions?


Current answer

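In C# 11 you can use UTF-8 string literals, which make this very simple and perform better. The literal itself needs no memory allocation as long as you keep it as a ReadOnlySpan<byte>.

byte[] array = "some text"u8.ToArray(); // the u8 literal is a ReadOnlySpan<byte>; ToArray() copies it

Or, if you already have a string value (the u8 suffix only works on literals):

string input = "some text";
byte[] array = Encoding.UTF8.GetBytes(input);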

Here is an example of the difference between the old way of UTF-8 encoding (GetBytes) and the C# 11 UTF-8 string literal way (GetBytesNew).
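A minimal sketch of such a comparison, assuming a C# 11 (or later) compiler and top-level statements. The names GetBytes and GetBytesNew come from the sentence above; their bodies are an assumption about what the two approaches might look like, not the answer's original code.

```csharp
using System;
using System.Text;

// Older approach: the string is encoded at run time and a new array is allocated per call.
static byte[] GetBytes() => Encoding.UTF8.GetBytes("some text");

// C# 11 approach: the compiler emits the UTF-8 bytes into the assembly,
// and the span points at that data without allocating.
static ReadOnlySpan<byte> GetBytesNew() => "some text"u8;

byte[] oldWay = GetBytes();
bool same = GetBytesNew().SequenceEqual(oldWay);

Console.WriteLine(oldWay.Length); // 9
Console.WriteLine(same);          // True
```

Both calls yield the same nine bytes; the difference is when the encoding happens and whether an array is allocated.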

Other answers

Use this:

byte[] myByte= System.Text.ASCIIEncoding.Default.GetBytes(myString);

Building on Ali's answer, I would recommend an extension method that lets you optionally pass in the encoding you want to use:

using System.Text;
public static class StringExtensions
{
    /// <summary>
    /// Creates a byte array from the string, using the 
    /// System.Text.Encoding.Default encoding unless another is specified.
    /// </summary>
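    public static byte[] ToByteArray(this string str, Encoding encoding = null)
    {
        // A default parameter value must be a compile-time constant, so the fallback happens here.
        return (encoding ?? Encoding.Default).GetBytes(str);
    }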
}

Then use it like this:

string foo = "bla bla";

// default encoding
byte[] defaultBytes = foo.ToByteArray(); // "default" is a reserved keyword and cannot be a variable name

// custom encoding
byte[] unicode = foo.ToByteArray(Encoding.Unicode);

Encoding.Default should not be used...

Some answers use Encoding.Default, but Microsoft raises a warning against it:

Different computers can use different encodings as the default, and the default encoding can change on a single computer. If you use the Default encoding to encode and decode data streamed between computers or retrieved at different times on the same computer, it may translate that data incorrectly. In addition, the encoding returned by the Default property uses best-fit fallback [i.e. the encoding is totally screwed up, so you can't reencode it back] to map unsupported characters to characters supported by the code page. For these reasons, using the default encoding is not recommended. To ensure that encoded bytes are decoded properly, you should use a Unicode encoding, such as UTF8Encoding or UnicodeEncoding. You could also use a higher-level protocol to ensure that the same format is used for encoding and decoding.

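To check what the default encoding is, use Encoding.Default.WindowsCodePage (1250 in my case; sadly, there is no predefined class for the CP1250 encoding, but the object can be retrieved with Encoding.GetEncoding(1250)).

...UTF-8/UTF-16LE encoding should be used instead...

Encoding.ASCII, used in the top-scoring answer, is 7-bit, so it does not work either, in my case: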

byte[] pass = Encoding.ASCII.GetBytes("šarže");
Console.WriteLine(Encoding.ASCII.GetString(pass)); // ?ar?e

The following is what Microsoft recommends:

var utf8 = new UTF8Encoding();
byte[] pass = utf8.GetBytes("šarže");
Console.WriteLine(utf8.GetString(pass)); // šarže

Encoding.UTF8, recommended by others, is an instance of the UTF-8 encoding and can also be used directly or as

var utf8 = Encoding.UTF8 as UTF8Encoding;

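Encoding.Unicode is popular for in-memory string representation because it uses a fixed 2 bytes per char (one UTF-16 code unit), so one can jump to the n-th char in constant time at the cost of more memory: it is UTF-16LE. In MS VC#, *.cs files are UTF-8 with BOM by default, and the string constants in them are converted to UTF-16LE at compile time (see @OwnagelsMagic's comment), but it is NOT defined as the default: many classes, such as StreamWriter, use UTF-8 as their default.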
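The size difference between the two encodings can be seen with a short sketch. It reuses the sample word from the snippets above, written with \u escapes so the result does not depend on how the source file itself is saved; the variable names are illustrative only.

```csharp
using System;
using System.Text;

// "\u0161ar\u017Ee" is the word "šarže" with its two non-ASCII letters escaped.
string text = "\u0161ar\u017Ee";

byte[] utf8 = Encoding.UTF8.GetBytes(text);     // 1 byte per ASCII letter, 2 bytes for each of the other two
byte[] utf16 = Encoding.Unicode.GetBytes(text); // always 2 bytes per char (UTF-16LE)

Console.WriteLine(utf8.Length);  // 7
Console.WriteLine(utf16.Length); // 10
Console.WriteLine(Encoding.Unicode.GetString(utf16) == text); // True
```

Whichever encoding is chosen, the same one has to be used for decoding, otherwise the round trip does not restore the original string.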

...but it is not always used

Default encoding is misleading: .NET uses UTF-8 everywhere (including strings hardcoded in the source code) and UTF-16LE (Encoding.Unicode) to store strings in memory, but Windows actually uses 2 other non-UTF8 defaults: ANSI codepage (for GUI apps before .NET) and OEM codepage (aka DOS standard). These differ from country to country (for instance, Windows Czech edition uses CP1250 and CP852) and are oftentimes hardcoded in Windows API libraries. So if you just set UTF-8 for the console with chcp 65001 (as .NET implicitly does and pretends it is the default) and run some localized command (like ping), it works in the English version, but you get tofu text in the Czech Republic.

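Let me share my real-life experience: I created a WinForms application that customizes git scripts for teachers. The output is obtained asynchronously in the background by a process that Microsoft describes as follows (bold text added by me):

The word "shell" in this context (UseShellExecute) refers to a graphical shell (similar to the Windows shell, ANSI CP) rather than command shells (for example, bash or sh, OEM CP) and lets users launch graphical applications or open documents with messed-up output in a non-US environment.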

So effectively the GUI defaults to UTF-8, the process defaults to CP1250 and the console defaults to 852. So the output is in 852, interpreted as UTF-8, interpreted as CP1250. I got tofu text from which I could not deduce the original codepage because of the double conversion. I pulled my hair out for a week before figuring out that I had to explicitly set UTF-8 for the process script and convert the output from CP1250 to UTF-8 in the main thread. Now it works here in Eastern Europe, but Western European Windows uses 1252. The ANSI CP is not easily determined, as many commands like systeminfo are also localized and other methods differ from version to version: in such an environment, displaying national characters reliably is almost unfeasible.

So, until the middle of the 21st century, please do NOT use any "default codepage"; set it explicitly (to UTF-8 or UTF-16LE if possible).
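As a rough illustration of setting the encoding explicitly, the sketch below configures a child process and a StreamWriter with UTF-8 instead of relying on any default. The command name and argument are placeholders and the process is never started here; the sketch also assumes the child really writes UTF-8, which, as described above, is not guaranteed on a localized Windows.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text;

// Placeholder command; only the configuration is shown, nothing is launched.
var startInfo = new ProcessStartInfo("git", "status")
{
    UseShellExecute = false,                // required when redirecting output
    RedirectStandardOutput = true,
    StandardOutputEncoding = Encoding.UTF8, // decode the child's output as UTF-8, not as a default codepage
};

// Streams: pass the encoding explicitly (UTF-8 without a byte order mark in this case).
using var buffer = new MemoryStream();
using (var writer = new StreamWriter(buffer, new UTF8Encoding(encoderShouldEmitUTF8Identifier: false), 1024, leaveOpen: true))
{
    writer.Write("\u0161ar\u017Ee"); // "šarže" written with escapes
}

string roundTrip = Encoding.UTF8.GetString(buffer.ToArray());
Console.WriteLine(roundTrip == "\u0161ar\u017Ee"); // True
```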

This question has been answered many times already, but with C# 7.2 and the introduction of the Span type, there is a faster way to do this in unsafe code:

using System;

public static class StringSupport
{
    private static readonly int _charSize = sizeof(char);

    public static unsafe byte[] GetBytes(string str)
    {
        if (str == null) throw new ArgumentNullException(nameof(str));
        if (str.Length == 0) return new byte[0];

        fixed (char* p = str)
        {
            return new Span<byte>(p, str.Length * _charSize).ToArray();
        }
    }

    public static unsafe string GetString(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        if (bytes.Length % _charSize != 0) throw new ArgumentException($"Invalid {nameof(bytes)} length");
        if (bytes.Length == 0) return string.Empty;

        fixed (byte* p = bytes)
        {
            return new string(new Span<char>(p, bytes.Length / _charSize));
        }
    }
}

Keep in mind that the bytes represent a UTF-16 encoded string (called "Unicode" in C# land).

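Some quick benchmarking shows that the above methods are roughly 5x faster than their Encoding.Unicode.GetBytes(...)/GetString(...) counterparts for medium-sized strings (30-50 chars), and even faster for larger strings. These methods also seem to be faster than using pointers with Marshal.Copy(..) or Buffer.MemoryCopy(...).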
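For readers who cannot enable unsafe code, the same reinterpretation can probably be done with MemoryMarshal. This is a different technique from the one above, shown only as a sketch, and the benchmark figures quoted above say nothing about how it performs. Like the unsafe version, it produces the machine's native UTF-16 byte order, which matches Encoding.Unicode only on little-endian hardware.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

string text = "bla bla";

// View the string's UTF-16 code units as raw bytes and copy them into an array.
byte[] bytes = MemoryMarshal.AsBytes(text.AsSpan()).ToArray();

// Reinterpret the bytes as chars and build a new string from them.
ReadOnlySpan<byte> view = bytes;
string restored = new string(MemoryMarshal.Cast<byte, char>(view));

Console.WriteLine(bytes.Length);     // 14
Console.WriteLine(restored == text); // True
```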