我一直认为std::vector是“作为数组实现的”,等等等等。今天我去测试了一下,结果似乎不是这样:
以下是一些测试结果:
UseArray completed in 2.619 seconds
UseVector completed in 9.284 seconds
UseVectorPushBack completed in 14.669 seconds
The whole thing completed in 26.591 seconds
这大约要慢3 - 4倍!这并不能证明“向量可能会慢几纳秒”的评论是正确的。
我使用的代码是:
#include <cstdlib>
#include <vector>
#include <iostream>
#include <string>
#include <boost/date_time/posix_time/ptime.hpp>
#include <boost/date_time/microsec_time_clock.hpp>
class TestTimer
{
public:
TestTimer(const std::string & name) : name(name),
start(boost::date_time::microsec_clock<boost::posix_time::ptime>::local_time())
{
}
~TestTimer()
{
using namespace std;
using namespace boost;
posix_time::ptime now(date_time::microsec_clock<posix_time::ptime>::local_time());
posix_time::time_duration d = now - start;
cout << name << " completed in " << d.total_milliseconds() / 1000.0 <<
" seconds" << endl;
}
private:
std::string name;
boost::posix_time::ptime start;
};
struct Pixel
{
Pixel()
{
}
Pixel(unsigned char r, unsigned char g, unsigned char b) : r(r), g(g), b(b)
{
}
unsigned char r, g, b;
};
void UseVector()
{
TestTimer t("UseVector");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.resize(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
}
}
void UseVectorPushBack()
{
TestTimer t("UseVectorPushBack");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
std::vector<Pixel> pixels;
pixels.reserve(dimension * dimension);
for(int i = 0; i < dimension * dimension; ++i)
pixels.push_back(Pixel(255, 0, 0));
}
}
void UseArray()
{
TestTimer t("UseArray");
for(int i = 0; i < 1000; ++i)
{
int dimension = 999;
Pixel * pixels = (Pixel *)malloc(sizeof(Pixel) * dimension * dimension);
for(int i = 0 ; i < dimension * dimension; ++i)
{
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
free(pixels);
}
}
int main()
{
TestTimer t1("The whole thing");
UseArray();
UseVector();
UseVectorPushBack();
return 0;
}
我做错了吗?还是我刚刚打破了这个性能神话?
我使用Visual Studio 2005中的发布模式。
在Visual c++中,#define _SECURE_SCL 0将UseVector减少了一半(减少到4秒)。在我看来,这真的是件大事。
我只是想提一下vector(和smart_ptr)只是原始数组(和原始指针)上的一个薄层。
实际上在连续存储器中向量的访问时间比数组快。
下面的代码显示了初始化和访问向量和数组的结果。
#include <boost/date_time/posix_time/posix_time.hpp>
#include <iostream>
#include <vector>
#define SIZE 20000
int main() {
srand (time(NULL));
vector<vector<int>> vector2d;
vector2d.reserve(SIZE);
int index(0);
boost::posix_time::ptime start_total = boost::posix_time::microsec_clock::local_time();
// timer start - build + access
for (int i = 0; i < SIZE; i++) {
vector2d.push_back(vector<int>(SIZE));
}
boost::posix_time::ptime start_access = boost::posix_time::microsec_clock::local_time();
// timer start - access
for (int i = 0; i < SIZE; i++) {
index = rand()%SIZE;
for (int j = 0; j < SIZE; j++) {
vector2d[index][index]++;
}
}
boost::posix_time::ptime end = boost::posix_time::microsec_clock::local_time();
boost::posix_time::time_duration msdiff = end - start_total;
cout << "Vector total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
msdiff = end - start_acess;
cout << "Vector access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
int index(0);
int** raw2d = nullptr;
raw2d = new int*[SIZE];
start_total = boost::posix_time::microsec_clock::local_time();
// timer start - build + access
for (int i = 0; i < SIZE; i++) {
raw2d[i] = new int[SIZE];
}
start_access = boost::posix_time::microsec_clock::local_time();
// timer start - access
for (int i = 0; i < SIZE; i++) {
index = rand()%SIZE;
for (int j = 0; j < SIZE; j++) {
raw2d[index][index]++;
}
}
end = boost::posix_time::microsec_clock::local_time();
msdiff = end - start_total;
cout << "Array total time: " << msdiff.total_milliseconds() << "milliseconds.\n";
msdiff = end - start_acess;
cout << "Array access time: " << msdiff.total_milliseconds() << "milliseconds.\n";
for (int i = 0; i < SIZE; i++) {
delete [] raw2d[i];
}
return 0;
}
输出结果为:
Vector total time: 925milliseconds.
Vector access time: 4milliseconds.
Array total time: 30milliseconds.
Array access time: 21milliseconds.
所以如果使用得当,速度几乎是一样的。
(正如其他人提到的使用reserve()或resize())。
这是一个古老而流行的问题。
在这一点上,许多程序员将使用c++ 11。在c++ 11中,OP的代码对于UseArray或UseVector运行得同样快。
UseVector completed in 3.74482 seconds
UseArray completed in 3.70414 seconds
基本的问题是,当你的像素结构未初始化时,std::vector<T>::resize(size_t, T const&=T())接受一个默认构造的像素并复制它。编译器没有注意到它被要求复制未初始化的数据,所以它实际执行了复制。
在c++ 11中,std::vector<T>::resize有两个重载。第一个是std::vector<T>::resize(size_t),另一个是std::vector<T>::resize(size_t, T const&)。这意味着当调用resize而不带第二个参数时,它只是默认构造,而编译器足够聪明,可以意识到默认构造什么也不做,因此它跳过了缓冲区的传递。
(添加这两个重载是为了处理可移动、可构造和不可复制类型——处理未初始化数据时的性能提升是一个额外的好处)。
push_back解决方案还执行fencepost检查,这降低了它的速度,因此它仍然比malloc版本慢。
现场示例(我还用chrono::high_resolution_clock替换了计时器)。
注意,如果你有一个通常需要初始化的结构,但你想在增加缓冲区后处理它,你可以使用自定义std::vector分配器来做到这一点。如果你想把它移动到一个更正常的std::vector,我相信仔细使用allocator_traits和重写==可能会成功,但我不确定。
一些分析器数据(像素对齐为32位):
g++ -msse3 -O3 -ftree-vectorize -g test.cpp -DNDEBUG && ./a.out
UseVector completed in 3.123 seconds
UseArray completed in 1.847 seconds
UseVectorPushBack completed in 9.186 seconds
The whole thing completed in 14.159 seconds
Blah
andrey@nv:~$ opannotate --source libcchem/src/a.out | grep "Total samples for file" -A3
Overflow stats not available
* Total samples for file : "/usr/include/c++/4.4/ext/new_allocator.h"
*
* 141008 52.5367
*/
--
* Total samples for file : "/home/andrey/libcchem/src/test.cpp"
*
* 61556 22.9345
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_vector.h"
*
* 41956 15.6320
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_uninitialized.h"
*
* 20956 7.8078
*/
--
* Total samples for file : "/usr/include/c++/4.4/bits/stl_construct.h"
*
* 2923 1.0891
*/
在分配器:
: // _GLIBCXX_RESOLVE_LIB_DEFECTS
: // 402. wrong new expression in [some_] allocator::construct
: void
: construct(pointer __p, const _Tp& __val)
141008 52.5367 : { ::new((void *)__p) _Tp(__val); }
向量:
:void UseVector()
:{ /* UseVector() total: 60121 22.3999 */
...
:
:
10790 4.0201 : for (int i = 0; i < dimension * dimension; ++i) {
:
495 0.1844 : pixels[i].r = 255;
:
12618 4.7012 : pixels[i].g = 0;
:
2253 0.8394 : pixels[i].b = 0;
:
: }
数组
:void UseArray()
:{ /* UseArray() total: 35191 13.1114 */
:
...
:
136 0.0507 : for (int i = 0; i < dimension * dimension; ++i) {
:
9897 3.6874 : pixels[i].r = 255;
:
3511 1.3081 : pixels[i].g = 0;
:
21647 8.0652 : pixels[i].b = 0;
大部分开销都在复制构造函数中。例如,
std::vector < Pixel > pixels;//(dimension * dimension, Pixel());
pixels.reserve(dimension * dimension);
for (int i = 0; i < dimension * dimension; ++i) {
pixels[i].r = 255;
pixels[i].g = 0;
pixels[i].b = 0;
}
它具有与数组相同的性能。