std::vector比普通数组慢很多吗?

我一直认为std::vector是“作为数组实现的”，等等等等。今天我去测试了一下，结果似乎不是这样:

以下是一些测试结果:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

这大约要慢3 - 4倍!这并不能证明“向量可能会慢几纳秒”的评论是正确的。

我使用的代码是:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

我做错了吗?还是我刚刚打破了这个性能神话?

我使用Visual Studio 2005中的发布模式。

在Visual c++中，#define _SECURE_SCL 0将UseVector减少了一半(减少到4秒)。在我看来，这真的是件大事。

当前回答

当我第一次看您的代码时，这很难说是一个公平的比较;我还以为你不是在比较苹果和苹果。所以我想，让构造函数和析构函数在所有测试中都被调用;然后比较。

const size_t dimension = 1000;

void UseArray() {
    TestTimer t("UseArray");
    for(size_t j = 0; j < dimension; ++j) {
        Pixel* pixels = new Pixel[dimension * dimension];
        for(size_t i = 0 ; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
        delete[] pixels;
    }
}

void UseVector() {
    TestTimer t("UseVector");
    for(size_t j = 0; j < dimension; ++j) {
        std::vector<Pixel> pixels(dimension * dimension);
        for(size_t i = 0; i < dimension * dimension; ++i) {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = (unsigned char) (i % 255);
        }
    }
}

int main() {
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();

    return 0;
}

我的想法是，在这样的设置下，它们应该是完全相同的。事实证明，我错了。

UseArray completed in 3.06 seconds
UseVector completed in 4.087 seconds
The whole thing completed in 10.14 seconds

那么为什么会出现30%的性能损失呢?STL的所有内容都在头文件中，因此编译器应该能够理解所需的所有内容。

我的想法是，它是在循环如何初始化默认构造函数的所有值。所以我做了一个测试:

class Tester {
public:
    static int count;
    static int count2;
    Tester() { count++; }
    Tester(const Tester&) { count2++; }
};
int Tester::count = 0;
int Tester::count2 = 0;

int main() {
    std::vector<Tester> myvec(300);
    printf("Default Constructed: %i\nCopy Constructed: %i\n", Tester::count, Tester::count2);

    return 0;
}

结果如我所料:

Default Constructed: 1
Copy Constructed: 300

这显然是减速的根源，因为vector使用复制构造函数从默认构造的对象初始化元素。

这意味着，以下伪操作顺序发生在向量的构造过程中:

Pixel pixel;
for (auto i = 0; i < N; ++i) vector[i] = pixel;

由于编译器创建了隐式复制构造函数，扩展为:

Pixel pixel;
for (auto i = 0; i < N; ++i) {
    vector[i].r = pixel.r;
    vector[i].g = pixel.g;
    vector[i].b = pixel.b;
}

所以默认像素保持未初始化，而其余的初始化使用默认像素的未初始化值。

与New[]/Delete[]相比:

int main() {
    Tester* myvec = new Tester[300];

    printf("Default Constructed: %i\nCopy Constructed:%i\n", Tester::count, Tester::count2);

    delete[] myvec;

    return 0;
}

Default Constructed: 300
Copy Constructed: 0

它们都保留了未初始化的值，并且没有对序列进行两次迭代。

有了这些信息，我们如何进行测试呢?让我们试着重写隐式复制构造函数。

Pixel(const Pixel&) {}

结果呢?

UseArray completed in 2.617 seconds
UseVector completed in 2.682 seconds
The whole thing completed in 5.301 seconds

总而言之，如果你经常制作数百个向量:重新考虑你的算法。

在任何情况下，STL实现都不会因为某些未知的原因而变慢，它只是完全按照你的要求去做;希望你能明白。

2011-09-01 10:34:03

其他回答

试试这个:

void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}

我得到了和数组几乎完全一样的性能。

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

当你使用它时，向量和数组一样快。

所以，你还没有打破性能神话。但是你已经证明了只有当你最优地使用向量时它才成立，这也是一个很好的观点。：）

好的一面是，它确实是最简单的用法，但却是最快的。如果您将我的代码片段(一行)与John Kugelman的答案进行对比，其中包含大量的调整和优化，但仍然不能完全消除性能差异，很明显，vector的设计非常巧妙。你不必费尽周折才能得到等于数组的速度。相反，您必须使用最简单的解决方案。

2010-09-08 13:00:45

公平地说，您不能将c++实现与C实现进行比较，即我所说的malloc版本。Malloc不创建对象——它只分配原始内存。然后不调用构造函数就把内存当作对象，这是拙劣的c++(可能是无效的——我把这个问题留给语言律师吧)。

也就是说，简单地将malloc更改为新的Pixel[维度*维度]并自由删除[]个像素，这与您所拥有的Pixel的简单实现没有太大区别。下面是我的盒子(E6600, 64位)上的结果:

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds

但随着一个微小的变化，情况发生了变化:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

编译如下:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

我们得到了非常不同的结果:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

使用Pixel的非内联构造函数，std::vector现在可以击败原始数组。

通过std::vector和std:allocator进行分配的复杂性似乎太多了，无法像简单的新Pixel[n]那样有效地进行优化。然而，我们可以看到问题仅仅是分配而不是vector访问，通过调整几个测试函数来创建vector/数组，将其移到循环之外:

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

我们现在得到这些结果:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

从这里我们可以了解到std::vector可以与原始数组进行访问，但是如果你需要多次创建和删除vector/数组，在元素的构造函数没有内联的情况下，创建一个复杂的对象将比创建一个简单的数组花费更多的时间。我不认为这很令人惊讶。

2010-09-08 12:27:18

尝试禁用检查迭代器并在发布模式下构建。您应该不会看到太大的性能差异。

2010-09-08 02:45:46

根据我的经验，有时候，只是有时候，vector<int>可能比int[]慢很多倍。需要记住的一点是，向量的向量与int[][]非常不同。因为元素在内存中可能不是连续的。这意味着你可以在主向量中调整不同向量的大小，但CPU可能无法像int[][]那样缓存元素。

2020-03-25 08:48:28

下面是vector中的push_back方法的工作原理:

vector在初始化时分配X个空间。如下所述，它检查当前底层数组中是否有空间用于该项。它复制push_back调用中的项。

调用push_back X项后:

vector将kX的空间重新分配到第二个数组中。它将第一个数组的项复制到第二个数组。丢弃第一个数组。现在使用第二个数组作为存储，直到它达到kX项。

重复。如果你没有预留空间，它肯定会变慢。更重要的是，如果复制项目的成本很高，那么像这样的“push_back”会让你生吞活剥。

至于向量和数组的区别，我同意其他人的观点。在发布版中运行，打开优化，并放入更多的标志，这样微软的友好人员就不会为你而烦恼了。