我一直认为std::vector是“作为数组实现的”,等等等等。今天我去测试了一下,结果似乎不是这样:

以下是一些测试结果:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

这大约要慢3 - 4倍!这并不能证明“向量可能会慢几纳秒”的评论是正确的。

我使用的代码是:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

我做错了吗?还是我刚刚打破了这个性能神话?

我使用Visual Studio 2005中的发布模式。


在Visual c++中,#define _SECURE_SCL 0将UseVector减少了一半(减少到4秒)。在我看来,这真的是件大事。


当前回答

使用以下方法:

g++ -O3 Time.cpp -I <MyBoost> . cfg . / a.o ut UseArray完成用时2.196秒 UseVector在4.412秒内完成 UseVectorPushBack在8.017秒内完成 全程用时14.626秒

数组的速度是向量的两倍。

但在更详细地查看代码后,这是预期的;当你遍历向量两次,只遍历数组一次时。注意:当你调整vector的size()时,你不仅是在分配内存,而且还在遍历vector并调用每个成员的构造函数。

稍微重新排列代码,使vector只初始化每个对象一次:

 std::vector<Pixel>  pixels(dimensions * dimensions, Pixel(255,0,0));

现在再做一次同样的计时:

g++ -O3 Time.cpp -I <MyBoost> . cfg . / a.o ut UseVector在2.216秒内完成

vector现在的性能只比数组差一点点。在我看来,这种差异是微不足道的,可能是由一大堆与测试无关的事情造成的。

我也会考虑到,你没有正确初始化/销毁像素对象在UseArrray()方法的构造函数/析构函数都没有被调用(这可能不是这个简单的类的问题,但任何稍微复杂(即指针或指针成员)将导致问题。

其他回答

GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent stackoverflow question about this very point: the construct/copy can be more efficient for reference counted objects etc..

So:

vector<T> x(n);

or

vector<T> x;
x.resize(n);

是-在许多STL实现中-类似于:

T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;

The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.

这可以与更明显的直接实现形成对比:

for (int i = 0; i < n; ++i)
    x[i] = T();

我们可以期待一个编译器优化。

为了更明确地解释vector行为的这一方面,可以考虑:

std::vector<big_reference_counted_object> x(10000);

显然,如果我们创建10000个独立对象,而不是创建10000个引用相同数据的对象,这是一个很大的区别。有一种合理的观点认为,保护普通c++用户不意外地做一些如此昂贵的事情的好处超过了现实世界中难以优化的拷贝构造的非常小的成本。

原始答案(供参考/理解评论): 没有机会。Vector和数组一样快,至少如果你合理地保留空间. ...

我的笔记本电脑是联想G770 (4gb内存)。

操作系统为Windows 7 64位(笔记本电脑版本)

编译器是MinGW 4.6.1。

IDE为Code::Blocks。

我测试了第一篇文章的源代码。

结果

O2优化

UseArray完成用时2.841秒

UseVector在2.548秒内完成

UseVectorPushBack在11.95秒内完成

全程用时17.342秒

系统暂停

O3优化

UseArray完成用时1.452秒

UseVector在2.514秒内完成

UseVectorPushBack在12.967秒内完成

全程用时16.937秒

在O3优化下,向量的性能更差。

如果你把循环改为

    pixels[i].r = i;
    pixels[i].g = i;
    pixels[i].b = i;

在O2和O3下,数组和矢量的速度几乎相同。

试试这个:

void UseVectorCtor()
{
    TestTimer t("UseConstructor");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
    }
}

我得到了和数组几乎完全一样的性能。

The thing about vector is that it's a much more general tool than an array. And that means you have to consider how you use it. It can be used in a lot of different ways, providing functionality that an array doesn't even have. And if you use it "wrong" for your purpose, you incur a lot of overhead, but if you use it correctly, it is usually basically a zero-overhead data structure. In this case, the problem is that you separately initialized the vector (causing all elements to have their default ctor called), and then overwriting each element individually with the correct value. That is much harder for the compiler to optimize away than when you do the same thing with an array. Which is why the vector provides a constructor which lets you do exactly that: initialize N elements with value X.

当你使用它时,向量和数组一样快。

所以,你还没有打破性能神话。但是你已经证明了只有当你最优地使用向量时它才成立,这也是一个很好的观点。:)

好的一面是,它确实是最简单的用法,但却是最快的。如果您将我的代码片段(一行)与John Kugelman的答案进行对比,其中包含大量的调整和优化,但仍然不能完全消除性能差异,很明显,vector的设计非常巧妙。你不必费尽周折才能得到等于数组的速度。相反,您必须使用最简单的解决方案。

我只是想提一下vector(和smart_ptr)只是原始数组(和原始指针)上的一个薄层。 实际上在连续存储器中向量的访问时间比数组快。 下面的代码显示了初始化和访问向量和数组的结果。

#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
#include <vector>
#define SIZE 20000
int main() {
    srand (time(NULL));
    vector<vector<int>> vector2d;
    vector2d.reserve(SIZE);
    int index(0);
    boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        vector2d.push_back(vector<int>(SIZE));
    }
    boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            vector2d[index][index]++;
        }
    }
    boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
    boost::posix_time::time_duration msdiff = end - start_total;
    cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 


    int index(0);
    int** raw2d = nullptr;
    raw2d = new int*[SIZE];
    start_total = boost::posix_time::microsec_clock::local_time();
    //  timer start - build + access
    for (int i = 0; i < SIZE; i++) {
        raw2d[i] = new int[SIZE];
    }
    start_access = boost::posix_time::microsec_clock::local_time();
    //  timer start - access
    for (int i = 0; i < SIZE; i++) {
        index = rand()%SIZE;
        for (int j = 0; j < SIZE; j++) {

            raw2d[index][index]++;
        }
    }
    end = boost::posix_time::microsec_clock::local_time();
    msdiff = end - start_total;
    cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
    msdiff = end - start_acess;
    cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n"; 
    for (int i = 0; i < SIZE; i++) {
        delete [] raw2d[i];
    }
    return 0;
}

输出结果为:

    Vector total time: 925milliseconds.
    Vector access time: 4milliseconds.
    Array total time: 30milliseconds.
    Array access time: 21milliseconds.

所以如果使用得当,速度几乎是一样的。 (正如其他人提到的使用reserve()或resize())。

根据我的经验,有时候,只是有时候,vector<int>可能比int[]慢很多倍。需要记住的一点是,向量的向量与int[][]非常不同。因为元素在内存中可能不是连续的。这意味着你可以在主向量中调整不同向量的大小,但CPU可能无法像int[][]那样缓存元素。