Best way to read a large file into a byte array in C#?

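I have a web server that reads large binary files (several megabytes) into byte arrays. The server may be reading several files at the same time (for different page requests), so I am looking for the most efficient way to do this without putting too much load on the CPU. Is the code below good enough?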

public byte[] FileToByteArray(string fileName)
{
    byte[] buff = null;
    FileStream fs = new FileStream(fileName, 
                                   FileMode.Open, 
                                   FileAccess.Read);
    BinaryReader br = new BinaryReader(fs);
    long numBytes = new FileInfo(fileName).Length;
    buff = br.ReadBytes((int) numBytes);
    return buff;
}

Top answer

Simply replace the whole thing with:

return File.ReadAllBytes(fileName);

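However, if you are concerned about memory consumption, you should not read the whole file into memory at once. You should read it in chunks.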
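A minimal sketch of chunked reading, assuming the caller can process the data piece by piece (for example hashing it or copying it to a response stream) instead of needing the whole file in one byte[]. `ProcessFileInChunks` and its callback parameter are hypothetical names for illustration, not a framework API.

```csharp
using System;
using System.IO;

// Hypothetical helper: streams a file through one fixed-size buffer and hands each
// chunk to the caller, so memory use stays at bufferSize regardless of file size.
static long ProcessFileInChunks(string path, int bufferSize, Action<byte[], int> onChunk)
{
    byte[] buffer = new byte[bufferSize];   // allocated once, reused for every chunk
    long total = 0;
    using (FileStream fs = File.OpenRead(path))
    {
        int read;
        // Read returns 0 only at end of file; it may return fewer bytes than requested.
        while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
        {
            onChunk(buffer, read);          // only buffer[0..read) holds valid data
            total += read;
        }
    }
    return total;                           // number of bytes processed
}
```

The callback must only look at the first `n` bytes it is given, because the last chunk (and in principle any chunk) can be shorter than the buffer.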

2010-01-08 21:27:08

Other answers

I would say BinaryReader is fine, but the code can be refactored like this, instead of all those lines for getting the buffer length:

public byte[] FileToByteArray(string fileName)
{
    byte[] fileData = null;

    using (FileStream fs = File.OpenRead(fileName)) 
    { 
        using (BinaryReader binaryReader = new BinaryReader(fs))
        {
            fileData = binaryReader.ReadBytes((int)fs.Length); 
        }
    }
    return fileData;
}

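This should be better than using File.ReadAllBytes(): in the comments on the top answer (the one suggesting File.ReadAllBytes()), I saw that one commenter had problems with files larger than 600 MB, and BinaryReader is designed for this kind of task. Also, wrapping both objects in using statements ensures that the FileStream and BinaryReader are closed and disposed.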

2016-10-12 00:18:24

Depending on how often the operation runs, the size of the files, and how many files you are dealing with, there are other performance issues to consider. One thing to remember is that each of your byte arrays will be released at the mercy of the garbage collector. If you are not caching any of that data, you could end up creating a lot of garbage and losing most of your performance to % Time in GC. If the chunks are larger than about 85 KB (85,000 bytes), they are allocated on the Large Object Heap (LOH), which is only freed by a full collection of all generations; this is very expensive, and on a server it can stop all execution while it runs. Additionally, if you have many objects on the LOH, you can end up with LOH fragmentation (the LOH is not compacted by default), which leads to poor performance and out-of-memory exceptions. You can recycle the process once you hit a certain point, but I don't know whether that is a best practice.
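One common way to reduce the garbage described above is to reuse large buffers instead of allocating a new one per request. This sketch swaps in `ArrayPool<byte>.Shared` from System.Buffers (built into .NET Core and .NET 5+, available as the System.Buffers NuGet package on .NET Framework); the answer itself does not mention it. `SumFileBytes` and the 128 KB buffer size are hypothetical choices that just give the loop some work to do.

```csharp
using System;
using System.Buffers;
using System.IO;

// Sketch: rent the read buffer from a shared pool instead of allocating a fresh
// large byte[] per call, so the same LOH-sized arrays are reused across requests.
static long SumFileBytes(string path)
{
    // Rent may return an array larger than requested; always use buffer.Length.
    byte[] buffer = ArrayPool<byte>.Shared.Rent(128 * 1024);
    try
    {
        long sum = 0;
        using (FileStream fs = File.OpenRead(path))
        {
            int read;
            while ((read = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++) sum += buffer[i];
            }
        }
        return sum;
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buffer);   // hand the buffer back for the next caller
    }
}
```

The trade-off is discipline: a rented array must not be used after it is returned, and it may contain stale data from a previous renter.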

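The point is that before reading all the bytes into memory in the fastest possible way, you should consider the whole lifecycle of the application; otherwise you may trade overall performance for short-term performance.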

2010-01-08 22:25:19

If by "a large file" you mean one beyond the 4 GB limit, then the following code is appropriate. The key thing to notice is the long data type used with the Seek method, since a long can address offsets beyond the 2^32 boundary. The code first processes the file in chunks of roughly 1 GB (1,024,000,000 bytes); after all the whole chunks are processed, the leftover (< 1 GB) bytes are processed. I use this code to calculate the CRC of files larger than 4 GB (using https://crc32c.machinezoo.com/ for the CRC32C calculation in this example).

// Requires the CRC32C library from https://crc32c.machinezoo.com/ (Crc32CAlgorithm)
private uint Crc32CAlgorithmBigCrc(string fileName)
{
    uint hash = 0;
    long fileLength = new FileInfo(fileName).Length;
    int blockSize = 1024000000;                                    // ~1 GB per block
    long blocks = fileLength / blockSize;                          // whole blocks (long division)
    int restBytes = (int)(fileLength - blocks * (long)blockSize);  // leftover, always < blockSize
    long offsetFile = 0;                                           // long, so offsets past 2^32 work
    bool firstBlock = true;
    using (FileStream fs = new FileStream(fileName, FileMode.Open, FileAccess.Read))
    using (BinaryReader br = new BinaryReader(fs))
    {
        while (blocks > 0)
        {
            blocks -= 1;
            fs.Seek(offsetFile, SeekOrigin.Begin);
            byte[] buffer = br.ReadBytes(blockSize);
            if (firstBlock)
            {
                firstBlock = false;
                hash = Crc32CAlgorithm.Compute(buffer);
            }
            else
            {
                // Extend the running CRC, not the CRC of the first block only
                hash = Crc32CAlgorithm.Append(hash, buffer);
            }
            offsetFile += blockSize;
        }
        if (restBytes > 0)
        {
            fs.Seek(offsetFile, SeekOrigin.Begin);
            byte[] buffer = br.ReadBytes(restBytes);
            hash = firstBlock
                ? Crc32CAlgorithm.Compute(buffer)          // file smaller than one block
                : Crc32CAlgorithm.Append(hash, buffer);
        }
    }
    return hash;
}
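The block/remainder arithmetic from the CRC method can be isolated to show why the 64-bit types matter: a product such as `blocks * blockSize` computed in `int` silently wraps once it exceeds `int.MaxValue` (about 2.1 GB), which with 1,024,000,000-byte blocks already happens at the third block. `SplitIntoBlocks` is a hypothetical helper for illustration.

```csharp
using System;

// Splits a file length into whole blocks plus a remainder using 64-bit arithmetic.
static (long Blocks, int Rest) SplitIntoBlocks(long fileLength, int blockSize)
{
    long blocks = fileLength / blockSize;              // long / int is long division; no Math.Floor needed
    int rest = (int)(fileLength - blocks * blockSize); // blocks is long, so the product is long too
    return (blocks, rest);                             // rest < blockSize, so the int cast is safe
}
```

For example, a 5,000,000,000-byte file yields 4 whole blocks and a 904,000,000-byte remainder.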

2019-04-26 04:16:45

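Use the BufferedStream class in C# to improve performance. A buffer is a block of bytes in memory used to cache data, which reduces the number of calls to the operating system. Buffering improves read and write performance.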

See the code example and further explanation at: http://msdn.microsoft.com/en-us/library/system.io.bufferedstream.aspx
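A minimal sketch of the BufferedStream approach, assuming a workload that issues many small reads (here one byte at a time), which is where buffering pays off most. `CountNewlines` and the 64 KB buffer size are illustrative choices, not from the answer. Note that FileStream already keeps its own internal buffer (4096 bytes by default), so an extra BufferedStream layer is likely to matter less over a FileStream than over an unbuffered stream such as NetworkStream.

```csharp
using System;
using System.IO;

// Sketch: many tiny ReadByte calls are served from BufferedStream's in-memory
// buffer, which is refilled from the underlying stream in 64 KB reads.
static int CountNewlines(string path)
{
    int count = 0;
    using (FileStream fs = File.OpenRead(path))
    using (BufferedStream bs = new BufferedStream(fs, 64 * 1024))
    {
        int b;
        // ReadByte returns -1 at end of stream
        while ((b = bs.ReadByte()) != -1)
        {
            if (b == '\n') count++;
        }
    }
    return count;
}
```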

2010-01-08 21:37:45

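Overview: if your image is added with Build Action = Embedded Resource, use GetExecutingAssembly to retrieve the jpg resource as a stream, then read the binary data from the stream into a byte array.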

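using System.IO;
using System.Reflection;

public byte[] GetAImage()
{
    var assembly = Assembly.GetExecutingAssembly();
    // Embedded resource names are the default namespace + folder path + file name
    var resourceName = "MYWebApi.Images.X_my_image.jpg";

    using (Stream stream = assembly.GetManifestResourceStream(resourceName))
    {
        if (stream == null)
            throw new FileNotFoundException("Embedded resource not found", resourceName);

        byte[] bytes = new byte[stream.Length];
        int offset = 0;
        // Stream.Read may return fewer bytes than requested, so keep reading until the array is full
        while (offset < bytes.Length)
        {
            int read = stream.Read(bytes, offset, bytes.Length - offset);
            if (read == 0)
                throw new EndOfStreamException();
            offset += read;
        }
        return bytes;
    }
}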
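When the stream's length is unknown or the stream cannot seek (so `stream.Length` is unavailable), a variant that copies into a MemoryStream avoids both the length lookup and the manual read loop. `ReadToEnd` is a hypothetical helper name; `Stream.CopyTo` and `MemoryStream.ToArray` are standard .NET APIs.

```csharp
using System;
using System.IO;

// Reads any readable stream to its end into a byte array, without needing
// stream.Length (works for non-seekable streams as well).
static byte[] ReadToEnd(Stream stream)
{
    using (var ms = new MemoryStream())
    {
        stream.CopyTo(ms);     // copies in chunks until Read returns 0
        return ms.ToArray();   // exact-size copy of everything that was read
    }
}
```

It could be called as `ReadToEnd(assembly.GetManifestResourceStream(resourceName))`; the cost is one extra copy, since the MemoryStream's internal buffer is copied into the returned array.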

2020-06-15 21:45:07
