

#include <fstream>

const unsigned long long size = 64ULL*1024ULL*1024ULL;
unsigned long long a[size];

int main()
{
    std::fstream myfile;
    myfile = std::fstream("file.binary", std::ios::out | std::ios::binary);
    //Here would be some error handling
    for(int i = 0; i < 32; ++i){
        //Some calculations to fill a[]
        myfile.write((char*)&a,size*sizeof(unsigned long long));
    }
    myfile.close();
}

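Compiled with Visual Studio 2010 with full optimizations and run under Windows 7, this program maxes out at around 20MB/s. What really bothers me is that Windows can copy files from another SSD to this SSD at somewhere between 150MB/s and 200MB/s, which is at least 7 times faster. That's why I think I should be able to go faster.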




1. Use a smaller buffer size. Writing ~2 MiB at a time might be a good start. On my last laptop, ~512 KiB was the sweet spot, but I haven't tested on my SSD yet.
   Note: I've noticed that very large buffers tend to decrease performance. I've seen speed losses when using 16-MiB buffers instead of 512-KiB buffers.
2. Use _open (or _topen if you want to be Windows-correct) to open the file, then use _write. This will probably avoid a lot of buffering, but it's not certain to.
3. Use Windows-specific functions like CreateFile and WriteFile. That will avoid any buffering in the standard library.
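A minimal sketch of the first suggestion, assuming the data lives in a std::vector<unsigned long long> rather than the global array: the file is written in fixed-size chunks instead of one huge write() call. The function name writeInChunks and the 2 MiB default are illustrative choices, not measured optima.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// 2 MiB per write() call, the suggested starting point; the best value
// depends on the drive, so measure a few sizes (e.g. 512 KiB to 16 MiB).
constexpr std::size_t kChunkBytes = 2 * 1024 * 1024;

// Writes the whole array to `path`, at most `chunkBytes` bytes per call.
// Returns false if opening, any write, or closing fails.
bool writeInChunks(const std::string& path,
                   const std::vector<unsigned long long>& a,
                   std::size_t chunkBytes = kChunkBytes)
{
    assert(chunkBytes > 0);
    std::ofstream out(path, std::ios::out | std::ios::binary | std::ios::trunc);
    if (!out) return false;

    const char* bytes = reinterpret_cast<const char*>(a.data());
    const std::size_t total = a.size() * sizeof(unsigned long long);
    for (std::size_t off = 0; off < total; off += chunkBytes) {
        const std::size_t n = std::min(chunkBytes, total - off);
        if (!out.write(bytes + off, static_cast<std::streamsize>(n))) return false;
    }
    out.close();  // close() sets failbit if the final flush fails
    return !out.fail();
}
```

Timing this with a few different chunkBytes values on the target SSD is the practical way to find the sweet spot mentioned above.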









------------------------------------------------> (main thread, fills buffers)
------------------------------------------------> (writer thread)

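Legend: F - filling the first buffer; F - filling the second buffer; writing the first buffer to the file; writing the second buffer to the file; _ - waiting for the operation to complete.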

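This buffer-swapping approach is very useful when filling a buffer requires more complex computation (and therefore more time). I always implement a CSequentialStreamWriter class that hides the asynchronous writing, so for the end user the interface consists of just Write functions.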


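Writing the last buffer. When you call the Write function for the last time, you must make sure that the buffer currently being filled is also written to disk. So CSequentialStreamWriter should have a separate method, say Finalize (final buffer flush), which writes the last portion of data to disk.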

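Error handling. While the code starts filling the second buffer, the first buffer is being written on another thread. If that write fails for some reason, the main thread should learn about the failure.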
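The buffer-swapping scheme, including the final flush and the error propagation described above, might look roughly like this. It is a hypothetical sketch, not the author's CSequentialStreamWriter: the class name SequentialStreamWriter, its Write/Finalize signatures, and the use of std::thread with a condition variable are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstring>
#include <fstream>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical double-buffered writer: the caller fills one buffer while a
// background thread writes the other one to the file.
class SequentialStreamWriter {
public:
    SequentialStreamWriter(const std::string& path, std::size_t bufBytes)
        : out_(path, std::ios::out | std::ios::binary | std::ios::trunc) {
        assert(bufBytes > 0);
        bufs_[0].resize(bufBytes);
        bufs_[1].resize(bufBytes);
        failed_ = !out_;  // an open error is reported by the first Write()
        worker_ = std::thread(&SequentialStreamWriter::run, this);
    }
    SequentialStreamWriter(const SequentialStreamWriter&) = delete;
    SequentialStreamWriter& operator=(const SequentialStreamWriter&) = delete;
    ~SequentialStreamWriter() { Finalize(); }

    // Copies data into the current buffer and hands each full buffer to the
    // writer thread. Returns false if this or any earlier write failed.
    bool Write(const void* data, std::size_t n) {
        const char* p = static_cast<const char*>(data);
        while (n > 0) {
            std::vector<char>& buf = bufs_[fillIdx_];
            const std::size_t k = std::min(n, buf.size() - fillLen_);
            std::memcpy(buf.data() + fillLen_, p, k);
            fillLen_ += k; p += k; n -= k;
            if (fillLen_ == buf.size() && !handOff()) return false;
        }
        std::lock_guard<std::mutex> lk(m_);
        return !failed_;
    }

    // Flushes the partially filled last buffer, stops the writer thread and
    // closes the file. Returns true only if every byte was written.
    bool Finalize() {
        if (!worker_.joinable()) return !failed_;  // already finalized
        const bool ok = fillLen_ == 0 || handOff();
        fillLen_ = 0;
        { std::lock_guard<std::mutex> lk(m_); stop_ = true; }
        cv_.notify_all();
        worker_.join();
        out_.close();
        if (out_.fail()) failed_ = true;
        return ok && !failed_;
    }

private:
    // Waits until the writer is idle, gives it the filled buffer, and switches
    // the caller to the other buffer (which the writer has just finished).
    bool handOff() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return !pending_; });
        if (failed_) return false;
        pendingIdx_ = fillIdx_; pendingLen_ = fillLen_;
        pending_ = true;
        fillIdx_ = 1 - fillIdx_; fillLen_ = 0;
        cv_.notify_all();
        return true;
    }

    // Writer thread: writes each handed-off buffer without holding the lock.
    void run() {
        std::unique_lock<std::mutex> lk(m_);
        for (;;) {
            cv_.wait(lk, [this] { return pending_ || stop_; });
            if (!pending_) return;  // stop requested and nothing left to write
            const std::size_t idx = pendingIdx_, len = pendingLen_;
            lk.unlock();
            out_.write(bufs_[idx].data(), static_cast<std::streamsize>(len));
            const bool ok = !out_.fail();
            lk.lock();
            if (!ok) failed_ = true;  // reported to the caller by Write()/Finalize()
            pending_ = false;
            cv_.notify_all();
        }
    }

    std::ofstream out_;
    std::vector<char> bufs_[2];
    std::size_t fillIdx_ = 0, fillLen_ = 0;        // caller's thread only
    std::size_t pendingIdx_ = 0, pendingLen_ = 0;  // guarded by m_
    bool pending_ = false, stop_ = false, failed_ = false;  // guarded by m_
    std::mutex m_;
    std::condition_variable cv_;
    std::thread worker_;
};
```

The caller only blocks inside handOff() when it has filled its buffer before the writer thread has finished the previous one, which corresponds to the `_` waiting periods in the timeline. A background write error is surfaced by the next Write() or by Finalize().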



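Try using the open()/write()/close() API calls and experiment with the output buffer size. I mean: don't pass the whole "many-many-bytes" buffer at once; do several writes (i.e. TotalNumBytes / OutBufferSize of them). OutBufferSize can range from 4096 bytes to megabytes.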
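A POSIX sketch of this chunked open()/write()/close() loop, where outBufferSize is the knob to experiment with. The function name writeWithSyscalls is hypothetical; on Windows the CRT offers _open/_write/_close in <io.h> with similar semantics, which this sketch does not use. Note that write() may write fewer bytes than requested, so the loop advances by the returned count.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <unistd.h>

// Writes `total` bytes from `data` to `path` using at most `outBufferSize`
// bytes per write() call. Returns 0 on success or the errno of the failure.
int writeWithSyscalls(const char* path, const char* data, std::size_t total,
                      std::size_t outBufferSize)
{
    assert(outBufferSize > 0);
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) return errno;

    std::size_t off = 0;
    while (off < total) {
        const std::size_t n = total - off < outBufferSize ? total - off : outBufferSize;
        const ssize_t w = write(fd, data + off, n);
        if (w < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal: retry
            const int err = errno;
            close(fd);
            return err;
        }
        off += static_cast<std::size_t>(w);  // partial writes are possible
    }
    if (close(fd) < 0) return errno;
    return 0;
}
```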

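Another thing to try: use the WinAPI OpenFile/CreateFile and follow the MSDN article on file buffering to turn off buffering (FILE_FLAG_NO_BUFFERING). The MSDN article on WriteFile() also shows how to get the drive's block size, so you know the optimal buffer size.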
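FILE_FLAG_NO_BUFFERING comes with alignment rules described in the MSDN file-buffering documentation: the buffer address and every write size must be multiples of the volume sector size. The sketch below only prepares such a buffer portably (C++17 aligned operator new); it does not call CreateFile/WriteFile, and kSectorSize = 4096 is an assumed value that real code should query from the drive. The names roundUp and SectorBuffer are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>

// ASSUMPTION: 4096-byte sectors; real code should ask the OS for this value.
constexpr std::size_t kSectorSize = 4096;

// Rounds n up to the next multiple of `multiple` (e.g. a sector size).
constexpr std::size_t roundUp(std::size_t n, std::size_t multiple) {
    return (n + multiple - 1) / multiple * multiple;
}

// A sector-aligned byte buffer whose capacity is a whole number of sectors,
// usable as the source buffer for unbuffered writes.
class SectorBuffer {
public:
    explicit SectorBuffer(std::size_t minBytes)
        : size_(roundUp(minBytes, kSectorSize)),
          data_(static_cast<char*>(
              ::operator new(size_, static_cast<std::align_val_t>(kSectorSize)))) {
        assert(reinterpret_cast<std::uintptr_t>(data_) % kSectorSize == 0);
        std::memset(data_, 0, size_);  // padding bytes start out as zero
    }
    ~SectorBuffer() {
        ::operator delete(data_, static_cast<std::align_val_t>(kSectorSize));
    }
    SectorBuffer(const SectorBuffer&) = delete;
    SectorBuffer& operator=(const SectorBuffer&) = delete;

    char* data() { return data_; }
    std::size_t size() const { return size_; }  // always a multiple of kSectorSize

private:
    std::size_t size_;
    char* data_;
};
```

Each WriteFile call on an unbuffered handle would then pass data() and a size that is a multiple of kSectorSize.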

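In any case, std::ofstream is a wrapper, and there may be blocking on I/O operations. Keep in mind that traversing the entire N-GB array also takes time. When you write a small buffer, it fits in the cache and is processed faster.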


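At least the MSVC 2015 implementation copies one char at a time to the output buffer when no stream buffer has been set (see streambuf::xsputn). So make sure to set a stream buffer (of size > 0).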

Using the following code, I can get a write speed of 1500MB/s (the full speed of my M.2 SSD) with fstream:

#include <iostream>
#include <fstream>
#include <chrono>
#include <memory>
#include <cstdint>
#include <cstring>
#include <stdio.h>
#ifdef __linux__
#include <unistd.h>
#endif

using namespace std;
using namespace std::chrono;

const size_t sz = 512 * 1024 * 1024;  // 512 MiB written per iteration
const int numiter = 20;
const size_t bufsize = 1024 * 1024;   // 1 MiB stream buffer

int main(int argc, char**argv)
{
  unique_ptr<char[]> data(new char[sz]);
  unique_ptr<char[]> buf(new char[bufsize]);
  for (size_t p = 0; p < sz; p += 16) {
    memcpy(&data[p], "BINARY.DATA.....", 16);
  }
  int64_t total = 0;
  if (argc < 2 || strcmp(argv[1], "fopen") != 0) {
    cout << "fstream mode\n";
    ofstream myfile("file.binary", ios::out | ios::binary);
    if (!myfile) {
      cerr << "open failed\n"; return 1;
    }
    myfile.rdbuf()->pubsetbuf(buf.get(), bufsize); // IMPORTANT: set before the first write
    for (int i = 0; i < numiter; ++i) {
      auto tm1 = high_resolution_clock::now();
      myfile.write(data.get(), sz);
      if (!myfile)
        cerr << "write failed\n";
      auto tm = (duration_cast<milliseconds>(high_resolution_clock::now() - tm1).count());
      cout << tm << " ms\n";
      total += tm;
    }
    myfile.close();
  }
  else {
    cout << "fopen mode\n";
    FILE* pFile = fopen("file.binary", "wb");
    if (!pFile) {
      cerr << "open failed\n"; return 1;
    }
    setvbuf(pFile, buf.get(), _IOFBF, bufsize); // NOT important
    for (int i = 0; i < numiter; ++i) {
      auto tm1 = high_resolution_clock::now();
      if (fwrite(data.get(), sz, 1, pFile) != 1)
        cerr << "write failed\n";
      auto tm = (duration_cast<milliseconds>(high_resolution_clock::now() - tm1).count());
      cout << tm << " ms\n";
      total += tm;
    }
    fclose(pFile);
  }
  cout << "Total: " << total << " ms, " << (double(sz) * numiter * 1000 / (1024.0 * 1024 * total)) << " MB/s\n";
}

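I tried this code on other platforms (Ubuntu, FreeBSD) and noticed no difference in I/O rates, but there was a CPU usage difference of about 8:1 (fstream used 8 times more CPU). So one can imagine that, with a faster disk, the fstream write would slow down sooner than the stdio version.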