我一直认为std::vector是“作为数组实现的”,等等等等。今天我去测试了一下,结果似乎不是这样:
以下是一些测试结果:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
这大约要慢3 - 4倍!这并不能证明“向量可能会慢几纳秒”的评论是正确的。
我使用的代码是:
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}
~TestTimer()
{
using namespace std;
using namespace boost;
posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;
cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}
private:
std::string name;
boost::posix_time::ptime start;
};
struct Pixel
{
Pixel()
{
}
Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}
unsigned char r, g, b;
};
void UseVector()
{
TestTimer t("UseVector");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
free(pixels);
}
}
int main()
{
TestTimer t1("The whole thing");
UseArray();
UseVector();
UseVectorPushBack();
return 0;
}
我做错了吗?还是我刚刚打破了这个性能神话?
我使用Visual Studio 2005中的发布模式。
在Visual c++中,#define _SECURE_SCL 0将UseVector减少了一半(减少到4秒)。在我看来,这真的是件大事。
我不得不说我不是c++方面的专家。但要补充一些实验结果:
编译:
gcc-6.2.0/bin/g++ -O3 -std=c++14 vector.cpp
机:
Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz
OS:
2.6.32-642.13.1.el6.x86_64
输出:
UseArray completed in 0.167821 seconds
UseVector completed in 0.134402 seconds
UseConstructor completed in 0.134806 seconds
UseFillConstructor completed in 1.00279 seconds
UseVectorPushBack completed in 6.6887 seconds
The whole thing completed in 8.12888 seconds
这里我唯一感到奇怪的是“UseFillConstructor”的性能与“UseConstructor”相比。
代码:
void UseConstructor()
{
TestTimer t("UseConstructor");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels(dimension*dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseFillConstructor()
{
TestTimer t("UseFillConstructor");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels(dimension*dimension, Pixel(255,0,0));
}
}
因此提供的额外“值”大大降低了性能,我认为这是由于多次调用复制构造函数造成的。但是…
编译:
gcc-6.2.0/bin/g++ -std=c++14 -O vector.cpp
输出:
UseArray completed in 1.02464 seconds
UseVector completed in 1.31056 seconds
UseConstructor completed in 1.47413 seconds
UseFillConstructor completed in 1.01555 seconds
UseVectorPushBack completed in 6.9597 seconds
The whole thing completed in 11.7851 seconds
因此,在这种情况下,gcc优化非常重要,但当一个值作为默认值提供时,它帮不了你太多。这,其实是对我的学费。希望它能帮助新程序员选择哪种矢量初始化格式。
这似乎取决于编译器标志。下面是一个基准代码:
#include <chrono>
#include <cmath>
#include <ctime>
#include <iostream>
#include <vector>
int main(){
int size = 1000000; // reduce this number in case your program crashes
int L = 10;
std::cout << "size=" << size << " L=" << L << std::endl;
{
srand( time(0) );
double * data = new double[size];
double result = 0.;
std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
for( int l = 0; l < L; l++ ) {
for( int i = 0; i < size; i++ ) data[i] = rand() % 100;
for( int i = 0; i < size; i++ ) result += data[i] * data[i];
}
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Calculation result is " << sqrt(result) << "\n";
std::cout << "Duration of C style heap array: " << duration << "ms\n";
delete data;
}
{
srand( 1 + time(0) );
double data[size]; // technically, non-compliant with C++ standard.
double result = 0.;
std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
for( int l = 0; l < L; l++ ) {
for( int i = 0; i < size; i++ ) data[i] = rand() % 100;
for( int i = 0; i < size; i++ ) result += data[i] * data[i];
}
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Calculation result is " << sqrt(result) << "\n";
std::cout << "Duration of C99 style stack array: " << duration << "ms\n";
}
{
srand( 2 + time(0) );
std::vector<double> data( size );
double result = 0.;
std::chrono::steady_clock::time_point start = std::chrono::steady_clock::now();
for( int l = 0; l < L; l++ ) {
for( int i = 0; i < size; i++ ) data[i] = rand() % 100;
for( int i = 0; i < size; i++ ) result += data[i] * data[i];
}
std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
auto duration = std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
std::cout << "Calculation result is " << sqrt(result) << "\n";
std::cout << "Duration of std::vector array: " << duration << "ms\n";
}
return 0;
}
不同的优化标志给出不同的答案:
$ g++ -O0 benchmark.cpp
$ ./a.out
size=1000000 L=10
Calculation result is 181182
Duration of C style heap array: 118441ms
Calculation result is 181240
Duration of C99 style stack array: 104920ms
Calculation result is 181210
Duration of std::vector array: 124477ms
$g++ -O3 benchmark.cpp
$ ./a.out
size=1000000 L=10
Calculation result is 181213
Duration of C style heap array: 107803ms
Calculation result is 181198
Duration of C99 style stack array: 87247ms
Calculation result is 181204
Duration of std::vector array: 89083ms
$ g++ -Ofast benchmark.cpp
$ ./a.out
size=1000000 L=10
Calculation result is 181164
Duration of C style heap array: 93530ms
Calculation result is 181179
Duration of C99 style stack array: 80620ms
Calculation result is 181191
Duration of std::vector array: 78830ms
您的确切结果会有所不同,但这在我的机器上是非常典型的。
Martin York的回答让我很困扰,因为他似乎试图掩盖初始化问题。但他将冗余的默认构造确定为性能问题的根源是正确的。
[编辑:Martin的回答不再建议更改默认构造函数。]
对于眼前的问题,你当然可以调用2参数版本的向量<Pixel> ctor:
std::vector<Pixel> pixels(dimension * dimension, Pixel(255, 0, 0));
如果你想用一个常数值初始化,这是一种常见的情况。但更普遍的问题是:如何有效地初始化比常数值更复杂的东西?
为此,您可以使用back_insert_iterator,这是一个迭代器适配器。这里有一个int类型的向量的例子,尽管一般的思想也适用于像素:
#include <iterator>
// Simple functor return a list of squares: 1, 4, 9, 16...
struct squares {
squares() { i = 0; }
int operator()() const { ++i; return i * i; }
private:
int i;
};
...
std::vector<int> v;
v.reserve(someSize); // To make insertions efficient
std::generate_n(std::back_inserter(v), someSize, squares());
或者,您可以使用copy()或transform()来代替generate_n()。
缺点是,构造初始值的逻辑需要移动到一个单独的类中,这比将其放在原位更不方便(尽管c++ 1x中的lambdas使这更好)。此外,我希望这仍然不会像基于malloc()的非stl版本那样快,但我希望它会接近,因为它只对每个元素进行一次构造。
使用以下方法:
g++ -O3 Time.cpp -I <MyBoost> . cfg
. / a.o ut
UseArray完成用时2.196秒
UseVector在4.412秒内完成
UseVectorPushBack在8.017秒内完成
全程用时14.626秒
数组的速度是向量的两倍。
但在更详细地查看代码后,这是预期的;当你遍历向量两次,只遍历数组一次时。注意:当你调整vector的size()时,你不仅是在分配内存,而且还在遍历vector并调用每个成员的构造函数。
稍微重新排列代码,使vector只初始化每个对象一次:
std::vector<Pixel> pixels(dimensions * dimensions, Pixel(255,0,0));
现在再做一次同样的计时:
g++ -O3 Time.cpp -I <MyBoost> . cfg
. / a.o ut
UseVector在2.216秒内完成
vector现在的性能只比数组差一点点。在我看来,这种差异是微不足道的,可能是由一大堆与测试无关的事情造成的。
我也会考虑到,你没有正确初始化/销毁像素对象在UseArrray()方法的构造函数/析构函数都没有被调用(这可能不是这个简单的类的问题,但任何稍微复杂(即指针或指针成员)将导致问题。
GNU's STL (and others), given vector<T>(n), default constructs a prototypal object T() - the compiler will optimise away the empty constructor - but then a copy of whatever garbage happened to be in the memory addresses now reserved for the object is taken by the STL's __uninitialized_fill_n_aux, which loops populating copies of that object as the default values in the vector. So, "my" STL is not looping constructing, but constructing then loop/copying. It's counter intuitive, but I should have remembered as I commented on a recent stackoverflow question about this very point: the construct/copy can be more efficient for reference counted objects etc..
So:
vector<T> x(n);
or
vector<T> x;
x.resize(n);
是-在许多STL实现中-类似于:
T temp;
for (int i = 0; i < n; ++i)
x[i] = temp;
The issue being that the current generation of compiler optimisers don't seem to work from the insight that temp is uninitialised garbage, and fail to optimise out the loop and default copy constructor invocations. You could credibly argue that compilers absolutely shouldn't optimise this away, as a programmer writing the above has a reasonable expectation that all the objects will be identical after the loop, even if garbage (usual caveats about 'identical'/operator== vs memcmp/operator= etc apply). The compiler can't be expected to have any extra insight into the larger context of std::vector<> or the later usage of the data that would suggest this optimisation safe.
这可以与更明显的直接实现形成对比:
for (int i = 0; i < n; ++i)
x[i] = T();
我们可以期待一个编译器优化。
为了更明确地解释vector行为的这一方面,可以考虑:
std::vector<big_reference_counted_object> x(10000);
显然,如果我们创建10000个独立对象,而不是创建10000个引用相同数据的对象,这是一个很大的区别。有一种合理的观点认为,保护普通c++用户不意外地做一些如此昂贵的事情的好处超过了现实世界中难以优化的拷贝构造的非常小的成本。
原始答案(供参考/理解评论):
没有机会。Vector和数组一样快,至少如果你合理地保留空间. ...