Why is reading lines from stdin much slower in C++ than in Python?

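I wanted to compare reading lines of string input from stdin in Python and in C++, and I was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or misunderstanding something.

(TLDR answer: include the statement cin.sync_with_stdio(false), or just use fgets instead.

TLDR results: scroll all the way down to the bottom of the question and look at the table.)

C++ code: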

#include <iostream>
#include <string>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in  sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:
    lines_per_sec = int(round(count/delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
       lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

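I should note that I tried this under both Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro and the latter is a very beefy server, not that this matters much.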

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done

Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

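Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M-line file on a fast disk. Here is the comparison across several solutions/approaches: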

Implementation	Lines per second
python (default)	3,571,428
cin (default/naive)	819,672
cin (no sync)	12,500,000
fgets	14,285,714
wc (not fair comparison)	54,644,808
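
The table lists an fgets variant, but the question does not show its code. The following is only a rough sketch of what such a line counter could look like, not the code that produced the numbers above; the function name `count_lines_fgets` and the 64 KiB buffer size are assumptions made for illustration.

```cpp
#include <cstdio>
#include <cstring>

// Count lines on a C stream with fgets.
// Assumptions (not from the original post): a 64 KiB buffer, and a final
// line without a trailing '\n' still counts as a line.
long count_lines_fgets(std::FILE* in) {
    char buffer[65536];
    long lines = 0;
    bool mid_line = false;  // true while the current line has not ended yet

    while (std::fgets(buffer, sizeof buffer, in) != nullptr) {
        std::size_t len = std::strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n') {
            ++lines;          // fgets stopped at the end of a line
            mid_line = false;
        } else {
            mid_line = true;  // long line split across calls, or last line without '\n'
        }
    }
    if (mid_line) {
        ++lines;  // count an unterminated final line
    }
    return lines;
}
```

Calling `count_lines_fgets(stdin)` from `main` would count the lines piped into the program. Because fgets goes through the C library's own FILE* buffer, it does not depend on the iostream synchronization setting at all.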

Current answer

Just out of curiosity, I took a look at what happens under the hood, and I used dtruss/strace on each test.

C++

./a.out < in
Saw 6512403 lines in 8 seconds.  Crunch speed: 814050

syscalls sudo dtruss -c ./a.out < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            6
pread                                           8
mprotect                                       17
mmap                                           22
stat64                                         30
read_nocancel                               25958

Python

./a.py < in
Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls sudo dtruss -c ./a.py < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            5
pread                                           8
mprotect                                       17
mmap                                           21
stat64                                         29

2012-03-11 18:10:16

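Other answers

By the way, the reason the line count for the C++ version is one greater than the count for the Python version is that the eof flag only gets set when an attempt is made to read beyond eof. So the correct loop would be: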

while (cin) {
    getline(cin, input_line);

    if (!cin.eof())
        line_count++;
};

2012-03-11 16:37:21

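tl;dr: Because of different default settings in C++ requiring more system calls.

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance: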

std::ios_base::sync_with_stdio(false);

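Normally, when an input stream is buffered, the stream is read in larger chunks instead of one character at a time. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE*-based stdio and iostreams often have separate implementations, and therefore separate buffers, this could lead to a problem if both were used together. For example: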

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

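If cin read more input than it actually needed, the second integer value would not be available to the scanf function, which has its own independent buffer. This would lead to unexpected results.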

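To avoid this, streams are synchronized with stdio by default. One common way to achieve this is to have cin read each character one at a time, as needed, using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.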

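Fortunately, the library designers decided that you should also be able to disable this feature to get better performance if you know what you are doing, so they provided the sync_with_stdio method. From that link (emphasis added):

If the synchronization is turned off, the C++ standard streams are allowed to buffer their I/O independently, which may be considerably faster in some cases.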

2012-02-21 03:24:19

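I reproduced the original result on my computer using g++ on a Mac.

Adding the following statements to the C++ version just before the while loop brings it in line with the Python version: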

std::ios_base::sync_with_stdio(false);
char buffer[1048576];
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

sync_with_stdio brought the time down to 2 seconds, and setting a larger buffer brought it down to 1 second.

2012-02-21 03:33:36

A first element of an answer: <iostream> is slow. Damn slow. I get a huge performance boost with scanf, as shown below, but it is still two times slower than Python.

#include <iostream>
#include <time.h>
#include <cstdio>

using namespace std;

int main() {
    char buffer[10000];
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

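    int read = 1;
    while(read > 0) {
        // %9999s caps each read at the buffer size. Note that %s reads
        // whitespace-delimited tokens, so this only counts lines when
        // the lines contain no spaces.
        read = scanf("%9999s", buffer);
        line_count++;
    };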
    sec = (int) time(NULL) - start;
    line_count--;
    cerr << "Saw " << line_count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = line_count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } 
    else
        cerr << endl;
    return 0;
}

2012-02-21 03:17:34
