我有一个字节[]数组,从一个文件加载,我碰巧知道包含UTF-8。
在一些调试代码中,我需要将其转换为字符串。是否有一个单行程序可以做到这一点?
在表面之下,它应该只是一个分配和一个memcopy,所以即使没有实现,也应该是可能的。
我有一个字节[]数组,从一个文件加载,我碰巧知道包含UTF-8。
在一些调试代码中,我需要将其转换为字符串。是否有一个单行程序可以做到这一点?
在表面之下,它应该只是一个分配和一个memcopy,所以即使没有实现,也应该是可能的。
至少有四种不同的转换方式。
Encoding's GetString, but you won't be able to get the original bytes back if those bytes have non-ASCII characters. BitConverter.ToString The output is a "-" delimited string, but there's no .NET built-in method to convert the string back to byte array. Convert.ToBase64String You can easily convert the output string back to byte array by using Convert.FromBase64String. Note: The output string could contain '+', '/' and '='. If you want to use the string in a URL, you need to explicitly encode it. HttpServerUtility.UrlTokenEncodeYou can easily convert the output string back to byte array by using HttpServerUtility.UrlTokenDecode. The output string is already URL friendly! The downside is it needs System.Web assembly if your project is not a web project.
完整的例子:
byte[] bytes = { 130, 200, 234, 23 }; // A byte array contains non-ASCII (or non-readable) characters
string s1 = Encoding.UTF8.GetString(bytes); // ���
byte[] decBytes1 = Encoding.UTF8.GetBytes(s1); // decBytes1.Length == 10 !!
// decBytes1 not same as bytes
// Using UTF-8 or other Encoding object will get similar results
string s2 = BitConverter.ToString(bytes); // 82-C8-EA-17
String[] tempAry = s2.Split('-');
byte[] decBytes2 = new byte[tempAry.Length];
for (int i = 0; i < tempAry.Length; i++)
decBytes2[i] = Convert.ToByte(tempAry[i], 16);
// decBytes2 same as bytes
string s3 = Convert.ToBase64String(bytes); // gsjqFw==
byte[] decByte3 = Convert.FromBase64String(s3);
// decByte3 same as bytes
string s4 = HttpServerUtility.UrlTokenEncode(bytes); // gsjqFw2
byte[] decBytes4 = HttpServerUtility.UrlTokenDecode(s4);
// decBytes4 same as bytes
定义:
public static string ConvertByteToString(this byte[] source)
{
return source != null ? System.Text.Encoding.UTF8.GetString(source) : null;
}
使用:
string result = input.ConvertByteToString();
应用(字节)b to弦(“x2”),Outputs b4b5dfe475e58b67
public static class Ext {
public static string ToHexString(this byte[] hex)
{
if (hex == null) return null;
if (hex.Length == 0) return string.Empty;
var s = new StringBuilder();
foreach (byte b in hex) {
s.Append(b.ToString("x2"));
}
return s.ToString();
}
public static byte[] ToHexBytes(this string hex)
{
if (hex == null) return null;
if (hex.Length == 0) return new byte[0];
int l = hex.Length / 2;
var b = new byte[l];
for (int i = 0; i < l; ++i) {
b[i] = Convert.ToByte(hex.Substring(i * 2, 2), 16);
}
return b;
}
public static bool EqualsTo(this byte[] bytes, byte[] bytesToCompare)
{
if (bytes == null && bytesToCompare == null) return true; // ?
if (bytes == null || bytesToCompare == null) return false;
if (object.ReferenceEquals(bytes, bytesToCompare)) return true;
if (bytes.Length != bytesToCompare.Length) return false;
for (int i = 0; i < bytes.Length; ++i) {
if (bytes[i] != bytesToCompare[i]) return false;
}
return true;
}
}
将字节[]转换为字符串似乎很简单,但任何一种编码都有可能把输出字符串弄乱。这个小函数只是工作,没有任何意想不到的结果:
private string ToString(byte[] bytes)
{
string response = string.Empty;
foreach (byte b in bytes)
response += (Char)b;
return response;
}
还有一个类UnicodeEncoding,使用起来非常简单:
ByteConverter = new UnicodeEncoding();
string stringDataForEncoding = "My Secret Data!";
byte[] dataEncoded = ByteConverter.GetBytes(stringDataForEncoding);
Console.WriteLine("Data after decoding: {0}", ByteConverter.GetString(dataEncoded));
当你不知道编码时,从字节数组转换到字符串的一般解决方案:
static string BytesToStringConverted(byte[] bytes)
{
using (var stream = new MemoryStream(bytes))
{
using (var streamReader = new StreamReader(stream))
{
return streamReader.ReadToEnd();
}
}
}
用于将从文件中读取的字节数组byteArrFilename转换为纯ASCII c风格以零结尾的字符串的LINQ一行程序如下:
String filename = new String(byteArrFilename.TakeWhile(x => x != 0)
.Select(x => x < 128 ? (Char)x : '?').ToArray());
我用'?'作为非纯ASCII的默认字符,当然,这是可以改变的。如果您想确保可以检测到它,只需使用'\0',因为开始时的TakeWhile确保以这种方式构建的字符串不可能包含来自输入源的'\0'值。
BitConverter类可用于将字节[]转换为字符串。
var convertedString = BitConverter.ToString(byteAttay);
BitConverter类的文档可以在MSDN上打印。
据我所知,没有一个给出的答案保证正确的行为与空终止。直到有人告诉我不同的,我写了自己的静态类处理以下方法:
// Mimics the functionality of strlen() in c/c++
// Needed because niether StringBuilder or Encoding.*.GetString() handle \0 well
static int StringLength(byte[] buffer, int startIndex = 0)
{
int strlen = 0;
while
(
(startIndex + strlen + 1) < buffer.Length // Make sure incrementing won't break any bounds
&& buffer[startIndex + strlen] != 0 // The typical null terimation check
)
{
++strlen;
}
return strlen;
}
// This is messy, but I haven't found a built-in way in c# that guarentees null termination
public static string ParseBytes(byte[] buffer, out int strlen, int startIndex = 0)
{
strlen = StringLength(buffer, startIndex);
byte[] c_str = new byte[strlen];
Array.Copy(buffer, startIndex, c_str, 0, strlen);
return Encoding.UTF8.GetString(c_str);
}
使用startIndex的原因是在我正在处理的示例中,我需要将byte[]解析为一个以null结尾的字符串数组。在简单的情况下,可以安全地忽略它
这是一个不需要编码的结果。我在我的网络类中使用它,并以字符串的形式发送二进制对象。
public static byte[] String2ByteArray(string str)
{
char[] chars = str.ToArray();
byte[] bytes = new byte[chars.Length * 2];
for (int i = 0; i < chars.Length; i++)
Array.Copy(BitConverter.GetBytes(chars[i]), 0, bytes, i * 2, 2);
return bytes;
}
public static string ByteArray2String(byte[] bytes)
{
char[] chars = new char[bytes.Length / 2];
for (int i = 0; i < chars.Length; i++)
chars[i] = BitConverter.ToChar(bytes, i * 2);
return new string(chars);
}
除了选择的答案,如果你使用。net 3.5或。net 3.5 CE,你必须指定解码的第一个字节的索引,以及解码的字节数:
string result = System.Text.Encoding.UTF8.GetString(byteArray, 0, byteArray.Length);
试试这个控制台应用程序:
static void Main(string[] args)
{
//Encoding _UTF8 = Encoding.UTF8;
string[] _mainString = { "Hello, World!" };
Console.WriteLine("Main String: " + _mainString);
// Convert a string to UTF-8 bytes.
byte[] _utf8Bytes = Encoding.UTF8.GetBytes(_mainString[0]);
// Convert UTF-8 bytes to a string.
string _stringuUnicode = Encoding.UTF8.GetString(_utf8Bytes);
Console.WriteLine("String Unicode: " + _stringuUnicode);
}
我在这篇文章中看到了一些答案,这可能被认为是完整的基础知识,因为我在c#编程中有几种方法来解决相同的问题。唯一需要考虑的是纯UTF-8和带有BOM的UTF-8之间的区别。
Last week, at my job, I needed to develop one functionality that outputs CSV files with a BOM and other CSV files with pure UTF-8 (without a BOM). Each CSV file encoding type will be consumed by different non-standardized APIs. One API reads UTF-8 with a BOM and the other API reads without a BOM. I needed to research the references about this concept, reading the "What's the difference between UTF-8 and UTF-8 without BOM?" Stack Overflow question, and the Wikipedia article "Byte order mark" to build my approach.
最后,我的c#编程的UTF-8编码类型(BOM和纯)需要类似于下面的例子:
// For UTF-8 with BOM, equals shared by Zanoni (at top)
string result = System.Text.Encoding.UTF8.GetString(byteArray);
//for Pure UTF-8 (without B.O.M.)
string result = (new UTF8Encoding(false)).GetString(byteArray);