What are some really good reasons to ditch std::allocator in favor of a custom solution? Have you run across any situations where it was absolutely necessary for correctness, performance, scalability, etc.? Any really clever examples?

Custom allocators have always been a feature of the standard library that I haven't had much need for. I was just wondering whether anyone could provide some compelling examples to justify their existence.


Current answer

A while ago I found this solution very useful for me: Fast C++11 allocator for STL containers. It noticeably speeds up STL containers on VS2017 (~5x) as well as on GCC (~7x). It is a special-purpose allocator based on a memory pool, and it can be used with STL containers thanks to exactly the mechanism you are asking about.
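For reference, here is a minimal sketch of the C++11 allocator interface that such a pool-based allocator has to implement. This is not the linked library; the std::malloc/std::free calls are just placeholders for the pool's own allocate/release operations.

#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Minimal C++11 allocator skeleton (not the linked library). A real pool
// allocator would replace std::malloc/std::free with pool operations.
template <typename T>
struct pool_allocator
{
        using value_type = T;

        pool_allocator() noexcept = default;
        template <typename U>
        pool_allocator(const pool_allocator<U>&) noexcept { }

        T* allocate(std::size_t n)
        {
                // Placeholder: take n*sizeof(T) bytes from the pool here.
                void* p = std::malloc(n * sizeof(T));
                if (!p) throw std::bad_alloc();
                return static_cast<T*>(p);
        }

        void deallocate(T* p, std::size_t) noexcept
        {
                // Placeholder: give the block back to the pool here.
                std::free(p);
        }
};

template <typename T, typename U>
bool operator==(const pool_allocator<T>&, const pool_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const pool_allocator<T>&, const pool_allocator<U>&) noexcept { return false; }

// Usage: std::vector<int, pool_allocator<int>> v; v.push_back(42);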

Other answers

One example of a time I used these was when working with very resource-constrained embedded systems. Let's say you have 2 KB of RAM free and your program has to use some of that memory. You need to store, say, 4-5 sequences somewhere that's not on the stack, and additionally you need very precise control over where these things get stored. This is a situation where you might want to write your own allocator. The default implementations can fragment the memory, which may be unacceptable if you don't have enough memory and cannot restart your program.

One project I was working on used AVR-GCC on some low-powered chips. We had to store 8 sequences of variable length, but with a known maximum. The standard library implementation of memory management is a thin wrapper around malloc/free, which keeps track of where to place items by prepending every allocated block of memory with a pointer to just past the end of that allocated piece of memory. When allocating a new piece of memory, the standard allocator has to walk over each of the pieces of memory to find the next available block where the requested amount of memory will fit. On a desktop platform this would be very fast for this few items, but you have to keep in mind that some of these microcontrollers are very slow and primitive in comparison. Additionally, the memory fragmentation issue was a massive problem, which meant we really had no choice but to take a different approach.

So what we did was implement our own memory pool. Each block of memory was big enough to fit the largest sequence we would need. This allocated fixed-size blocks of memory ahead of time and marked which blocks were currently in use. We did this by keeping one 8-bit integer where each bit represented whether a certain block was used. We traded memory usage here for speed, which in our case was justified, as we were pushing this microcontroller chip close to its maximum processing capacity.
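A rough sketch of that scheme (not the original AVR code; the block size and count are placeholders): a handful of fixed-size blocks carved out ahead of time, with one bit per block in a single uint8_t recording whether it is in use.

#include <cstddef>
#include <cstdint>

// Sketch of the fixed-block pool described above (not the original AVR code):
// at most 8 blocks, each big enough for the longest sequence, tracked by one
// 8-bit bitmask where bit i says whether block i is in use.
template <std::size_t BlockSize, std::size_t BlockCount = 8>
class fixed_block_pool
{
        static_assert(BlockCount <= 8, "a single uint8_t bitmap covers at most 8 blocks");
public:
        void* allocate()
        {
                for (std::size_t i = 0; i < BlockCount; ++i) {
                        const std::uint8_t mask = static_cast<std::uint8_t>(1u << i);
                        if (!(used_ & mask)) {          // block i is free
                                used_ |= mask;          // mark it as used
                                return storage_ + i * BlockSize;
                        }
                }
                return nullptr;                         // pool exhausted
        }

        void deallocate(void* p)
        {
                const std::size_t i =
                        (static_cast<unsigned char*>(p) - storage_) / BlockSize;
                used_ &= static_cast<std::uint8_t>(~(1u << i));  // clear the bit
        }

private:
        std::uint8_t used_ = 0;                         // 1 bit per block
        alignas(std::max_align_t) unsigned char storage_[BlockSize * BlockCount];
};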

In an embedded context I can also see other situations where you would write your own custom allocator, for example if the memory for the sequences is not in main RAM, which may often be the case on these platforms.

I am working on an mmap-allocator that allows vectors to use memory from a memory-mapped file. The goal is to have vectors that use storage that sits directly in the virtual memory mapped by mmap. Our problem is to improve the reading of really big files (>10 GB) into memory with no copy overhead, and for that I need this custom allocator.

So far I have the skeleton of a custom allocator (which derives from std::allocator); I think it is a good starting point for writing your own allocators. Feel free to use this piece of code in whatever way you want:

#include <memory>
#include <stdio.h>

namespace mmap_allocator_namespace
{
        // See StackOverflow replies to this answer for important commentary about inheriting from std::allocator before replicating this code.
        template <typename T>
        class mmap_allocator: public std::allocator<T>
        {
public:
                typedef size_t size_type;
                typedef T* pointer;
                typedef const T* const_pointer;

                template<typename _Tp1>
                struct rebind
                {
                        typedef mmap_allocator<_Tp1> other;
                };

                pointer allocate(size_type n, const void *hint=0)
                {
                        fprintf(stderr, "Alloc %d bytes.\n", n*sizeof(T));
                        return std::allocator<T>::allocate(n, hint);
                }

                void deallocate(pointer p, size_type n)
                {
                        fprintf(stderr, "Dealloc %d bytes (%p).\n", n*sizeof(T), p);
                        return std::allocator<T>::deallocate(p, n);
                }

                mmap_allocator() throw(): std::allocator<T>() { fprintf(stderr, "Hello allocator!\n"); }
                mmap_allocator(const mmap_allocator &a) throw(): std::allocator<T>(a) { }
                template <class U>                    
                mmap_allocator(const mmap_allocator<U> &a) throw(): std::allocator<T>(a) { }
                ~mmap_allocator() throw() { }
        };
}

To use it, declare an STL container as follows:

using namespace std;
using namespace mmap_allocator_namespace;

vector<int, mmap_allocator<int> > int_vec(1024, 0, mmap_allocator<int>());

It could be used, for example, to log whenever memory is allocated. What is necessary is the rebind struct; otherwise the vector container would use the superclass's allocate/deallocate methods.

Update: The memory-mapping allocator is now available at https://github.com/johannesthoma/mmap_allocator and is LGPL. Feel free to use it for your projects.
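For comparison, here is a minimal sketch (not the code from that repository) of what the allocate/deallocate pair looks like when backed directly by mmap/munmap. It uses anonymous mappings, whereas the real project maps a file instead.

#include <cstddef>
#include <new>
#include <sys/mman.h>   // mmap, munmap (POSIX)

// Sketch only (not the linked library): a C++11-style allocator whose storage
// comes straight from anonymous mmap'd pages. A file-backed variant would pass
// a file descriptor instead of MAP_ANONYMOUS / -1.
template <typename T>
struct mmap_backed_allocator
{
        using value_type = T;

        mmap_backed_allocator() noexcept = default;
        template <typename U>
        mmap_backed_allocator(const mmap_backed_allocator<U>&) noexcept { }

        T* allocate(std::size_t n)
        {
                void* p = ::mmap(nullptr, n * sizeof(T),
                                 PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                if (p == MAP_FAILED) throw std::bad_alloc();
                return static_cast<T*>(p);
        }

        void deallocate(T* p, std::size_t n) noexcept
        {
                ::munmap(p, n * sizeof(T));
        }
};

template <typename T, typename U>
bool operator==(const mmap_backed_allocator<T>&, const mmap_backed_allocator<U>&) noexcept { return true; }
template <typename T, typename U>
bool operator!=(const mmap_backed_allocator<T>&, const mmap_backed_allocator<U>&) noexcept { return false; }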

When working with shared memory, it is vital that not only the container head but also the data it contains is stored in shared memory.

The allocator of Boost::Interprocess is a good example. However, as you can read here, the allocator alone is not sufficient to make all STL containers shared-memory compatible (due to different mapping offsets in different processes, pointers might "break").
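A sketch along the lines of the canonical Boost.Interprocess example (the segment name, size, and element type are arbitrary): the allocator draws memory from a managed shared-memory segment, and the vector itself is also constructed inside that segment, so both the container head and its data live in shared memory.

#include <boost/interprocess/managed_shared_memory.hpp>
#include <boost/interprocess/allocators/allocator.hpp>
#include <boost/interprocess/containers/vector.hpp>

namespace bip = boost::interprocess;

int main()
{
        // Create or open a named shared-memory segment (name and size are
        // arbitrary example values).
        bip::managed_shared_memory segment(bip::open_or_create, "MySharedMemory", 65536);

        // Allocator that hands out memory from the segment, not from the heap.
        using ShmAllocator = bip::allocator<int, bip::managed_shared_memory::segment_manager>;
        using ShmVector    = bip::vector<int, ShmAllocator>;

        // Construct the vector itself inside the segment so that the container
        // head and its elements are both in shared memory.
        ShmVector* vec = segment.find_or_construct<ShmVector>("MyVector")(
                ShmAllocator(segment.get_segment_manager()));

        vec->push_back(42);
        return 0;
}

Boost's interprocess allocator and containers use offset pointers internally, which is what keeps the pointers valid despite the different mapping addresses mentioned above.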

I personally use Loki::Allocator / SmallObject to optimize memory usage for small objects; it shows good efficiency and satisfying performance if you have to work with moderate amounts of really small objects (1 to 256 bytes). It can be up to ~30 times more efficient than standard C++ new/delete allocation when allocating moderate amounts of small objects of many different sizes.

There is also a VC-specific solution called "QuickHeap". It brings the best possible performance (allocate and deallocate operations just read and write the address of the block being allocated or returned to the heap, in up to 99.(9)% of cases, depending on settings and initialization), but at the cost of a notable overhead: it needs two pointers per extent and one extra pointer for each new memory block. It is the fastest possible solution for working with huge (10,000+) numbers of objects being created and deleted if you don't need a big variety of object sizes (it creates an individual pool for each object size, from 1 to 1023 bytes in the current implementation, so initialization costs may diminish the overall performance boost, but one can allocate and deallocate some dummy objects before the application enters its performance-critical phases).

The problem with the standard C++ new/delete implementation is that it is usually just a wrapper around C malloc/free allocation, which works well for larger blocks of memory, say 1024+ bytes. It has notable overhead in terms of performance, and sometimes extra memory used for mapping too. So in most cases custom allocators are implemented so as to maximize performance and/or minimize the amount of extra memory needed for allocating small objects (≤1024 bytes).
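To illustrate the usual approach (this is neither Loki nor "QuickHeap" code, just a generic sketch): a pool dedicated to one object size threads a free list through its unused slots, so allocation and deallocation are each roughly a single pointer read and write.

#include <cstddef>
#include <cstdlib>
#include <new>

// Sketch of a fixed-size small-object pool. Freed slots are threaded into a
// singly linked free list, so allocate/deallocate touch one pointer each in
// the common case. Chunks are never returned to the OS in this sketch.
class small_object_pool
{
public:
        small_object_pool(std::size_t object_size, std::size_t objects_per_chunk)
                : size_(object_size < sizeof(void*) ? sizeof(void*) : object_size),
                  per_chunk_(objects_per_chunk) { }

        void* allocate()
        {
                if (!free_list_)
                        grow();                                  // carve a new chunk into slots
                void* slot = free_list_;
                free_list_ = *static_cast<void**>(slot);         // pop the first free slot
                return slot;
        }

        void deallocate(void* p) noexcept
        {
                *static_cast<void**>(p) = free_list_;            // push the slot back
                free_list_ = p;
        }

private:
        void grow()
        {
                // One big malloc per chunk; each free slot's first bytes hold the "next" link.
                char* chunk = static_cast<char*>(std::malloc(size_ * per_chunk_));
                if (!chunk) throw std::bad_alloc();
                for (std::size_t i = 0; i < per_chunk_; ++i)
                        deallocate(chunk + i * size_);
        }

        std::size_t size_;
        std::size_t per_chunk_;
        void* free_list_ = nullptr;
};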