我一直认为std::vector是“作为数组实现的”,等等等等。今天我去测试了一下,结果似乎不是这样:

以下是一些测试结果:

UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds

这大约要慢3 - 4倍!这并不能证明“向量可能会慢几纳秒”的评论是正确的。

我使用的代码是:

#include <cstdlib>
#include <vector>

#include <iostream>
#include <string>

#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>

class TestTimer
{
    public:
        TestTimer(const std::string & name) : name(name),
            start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
        {
        }

        ~TestTimer()
        {
            using namespace std;
            using namespace boost;

            posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
            posix_time::time_duration d = now - start;

            cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
                " seconds" << endl;
        }

    private:
        std::string name;
        boost::posix_time::ptime start;
};

struct Pixel
{
    Pixel()
    {
    }

    Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
    {
    }

    unsigned char r, g, b;
};

void UseVector()
{
    TestTimer t("UseVector");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
        pixels.resize(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

void UseVectorPushBack()
{
    TestTimer t("UseVectorPushBack");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        std::vector<Pixel> pixels;
            pixels.reserve(dimension * dimension);

        for(int i = 0; i < dimension * dimension; ++i)
            pixels.push_back(Pixel(255, 0, 0));
    }
}

void UseArray()
{
    TestTimer t("UseArray");

    for(int i = 0; i < 1000; ++i)
    {
        int dimension = 999;

        Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);

        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }

        free(pixels);
    }
}

int main()
{
    TestTimer t1("The whole thing");

    UseArray();
    UseVector();
    UseVectorPushBack();

    return 0;
}

我做错了吗?还是我刚刚打破了这个性能神话?

我使用Visual Studio 2005中的发布模式。


在Visual c++中,#define _SECURE_SCL 0将UseVector减少了一半(减少到4秒)。在我看来,这真的是件大事。


当前回答

向量类还调用Pixel构造函数。

每一种都会导致你在计时时运行近一百万次。

编辑:然后是外层…1000个循环,所以要做十亿次ctor调用!

编辑2:看到UseArray案例的分解会很有趣。优化器可以优化整个事情,因为它除了消耗CPU外没有其他效果。

其他回答

GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent stackoverflow question about this very point: the construct/copy can be more efficient for reference counted objects etc..

So:

vector<T> x(n);

or

vector<T> x;
x.resize(n);

是-在许多STL实现中-类似于:

T temp;
for (int i = 0; i < n; ++i)
    x[i] = temp;

The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.

这可以与更明显的直接实现形成对比:

for (int i = 0; i < n; ++i)
    x[i] = T();

我们可以期待一个编译器优化。

为了更明确地解释vector行为的这一方面,可以考虑:

std::vector<big_reference_counted_object> x(10000);

显然,如果我们创建10000个独立对象,而不是创建10000个引用相同数据的对象,这是一个很大的区别。有一种合理的观点认为,保护普通c++用户不意外地做一些如此昂贵的事情的好处超过了现实世界中难以优化的拷贝构造的非常小的成本。

原始答案(供参考/理解评论): 没有机会。Vector和数组一样快,至少如果你合理地保留空间. ...

尝试禁用检查迭代器并在发布模式下构建。您应该不会看到太大的性能差异。

公平地说,您不能将c++实现与C实现进行比较,即我所说的malloc版本。Malloc不创建对象——它只分配原始内存。然后不调用构造函数就把内存当作对象,这是拙劣的c++(可能是无效的——我把这个问题留给语言律师吧)。

也就是说,简单地将malloc更改为新的Pixel[维度*维度]并自由删除[]个像素,这与您所拥有的Pixel的简单实现没有太大区别。下面是我的盒子(E6600, 64位)上的结果:

UseArray completed in 0.269 seconds
UseVector completed in 1.665 seconds
UseVectorPushBack completed in 7.309 seconds
The whole thing completed in 9.244 seconds

但随着一个微小的变化,情况发生了变化:

Pixel.h

struct Pixel
{
    Pixel();
    Pixel(unsigned char r, unsigned char g, unsigned char b);

    unsigned char r, g, b;
};

Pixel.cc

#include "Pixel.h"

Pixel::Pixel() {}
Pixel::Pixel(unsigned char r, unsigned char g, unsigned char b) 
  : r(r), g(g), b(b) {}

main.cc

#include "Pixel.h"
[rest of test harness without class Pixel]
[UseArray now uses new/delete not malloc/free]

编译如下:

$ g++ -O3 -c -o Pixel.o Pixel.cc
$ g++ -O3 -c -o main.o main.cc
$ g++ -o main main.o Pixel.o

我们得到了非常不同的结果:

UseArray completed in 2.78 seconds
UseVector completed in 1.651 seconds
UseVectorPushBack completed in 7.826 seconds
The whole thing completed in 12.258 seconds

使用Pixel的非内联构造函数,std::vector现在可以击败原始数组。

通过std::vector和std:allocator进行分配的复杂性似乎太多了,无法像简单的新Pixel[n]那样有效地进行优化。然而,我们可以看到问题仅仅是分配而不是vector访问,通过调整几个测试函数来创建vector/数组,将其移到循环之外:

void UseVector()
{
    TestTimer t("UseVector");

    int dimension = 999;
    std::vector<Pixel> pixels;
    pixels.resize(dimension * dimension);

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
}

and

void UseArray()
{
    TestTimer t("UseArray");

    int dimension = 999;
    Pixel * pixels = new Pixel[dimension * dimension];

    for(int i = 0; i < 1000; ++i)
    {
        for(int i = 0 ; i < dimension * dimension; ++i)
        {
            pixels[i].r = 255;
            pixels[i].g = 0;
            pixels[i].b = 0;
        }
    }
    delete [] pixels;
}

我们现在得到这些结果:

UseArray completed in 0.254 seconds
UseVector completed in 0.249 seconds
UseVectorPushBack completed in 7.298 seconds
The whole thing completed in 7.802 seconds

从这里我们可以了解到std::vector可以与原始数组进行访问,但是如果你需要多次创建和删除vector/数组,在元素的构造函数没有内联的情况下,创建一个复杂的对象将比创建一个简单的数组花费更多的时间。我不认为这很令人惊讶。

顺便说一下,你在使用vector的类中看到的减速也发生在标准类型中,比如int。这是一个多线程代码:

#include <iostream>
#include <cstdio>
#include <map>
#include <string>
#include <typeinfo>
#include <vector>
#include <pthread.h>
#include <sstream>
#include <fstream>
using namespace std;

//pthread_mutex_t map_mutex=PTHREAD_MUTEX_INITIALIZER;

long long num=500000000;
int procs=1;

struct iterate
{
    int id;
    int num;
    void * member;
    iterate(int a, int b, void *c) : id(a), num(b), member(c) {}
};

//fill out viterate and piterate
void * viterate(void * input)
{
    printf("am in viterate\n");
    iterate * info=static_cast<iterate *> (input);
    // reproduce member type
    vector<int> test= *static_cast<vector<int>*> (info->member);
    for (int i=info->id; i<test.size(); i+=info->num)
    {
        //printf("am in viterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

void * piterate(void * input)
{
    printf("am in piterate\n");
    iterate * info=static_cast<iterate *> (input);;
    int * test=static_cast<int *> (info->member);
    for (int i=info->id; i<num; i+=info->num) {
        //printf("am in piterate loop\n");
        test[i];
    }
    pthread_exit(NULL);
}

int main()
{
    cout<<"producing vector of size "<<num<<endl;
    vector<int> vtest(num);
    cout<<"produced  a vector of size "<<vtest.size()<<endl;
    pthread_t thread[procs];

    iterate** it=new iterate*[procs];
    int ans;
    void *status;

    cout<<"begining to thread through the vector\n";
    for (int i=0; i<procs; i++) {
        it[i]=new iterate(i, procs, (void *) &vtest);
    //  ans=pthread_create(&thread[i],NULL,viterate, (void *) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the vector";
    //reuse the iterate structures

    cout<<"producing a pointer with size "<<num<<endl;
    int * pint=new int[num];
    cout<<"produced a pointer with size "<<num<<endl;

    cout<<"begining to thread through the pointer\n";
    for (int i=0; i<procs; i++) {
        it[i]->member=&pint;
        ans=pthread_create(&thread[i], NULL, piterate, (void*) it[i]);
    }
    for (int i=0; i<procs; i++) {
        pthread_join(thread[i], &status);
    }
    cout<<"end of threading through the pointer\n";

    //delete structure array for iterate
    for (int i=0; i<procs; i++) {
        delete it[i];
    }
    delete [] it;

    //delete pointer
    delete [] pint;

    cout<<"end of the program"<<endl;
    return 0;
}

代码中的行为表明vector的实例化是代码中最长的部分。一旦你通过瓶颈。其余的代码运行得非常快。无论在多少个线程上运行,这都是正确的。

顺便说一下,忽略那些疯狂的包含数。我一直在使用这段代码来测试一个项目的东西,所以包含的数量不断增长。

好吧,因为vector::resize()比普通内存分配(由malloc)做更多的处理。

尝试在复制构造函数中设置断点(定义它以便可以设置断点!),就会增加处理时间。