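In the context of C++ dynamic memory allocation, I've heard the term "memory fragmentation" used a few times. I've found some questions about how to deal with memory fragmentation, but I can't find one that deals directly with the concept itself. So:

What is memory fragmentation? How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer? What are good common ways to deal with memory fragmentation?

Also:

I've heard that using dynamic allocations a lot can increase memory fragmentation. Is this true? In the context of C++, I understand that all the standard containers (std::string, std::vector, etc.) use dynamic memory allocation. If these are used throughout a program (especially std::string), is memory fragmentation more likely to be a problem? How can memory fragmentation be dealt with in an STL-heavy application?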


Current answer

When you want to add an item on the heap, the allocator has to search for a space that fits that item. That is why dynamic allocations that are not served from a memory pool or by a pooled allocator can "slow" things down. For an STL-heavy application that is multi-threaded, there are the Hoard allocator and the Intel TBB allocator.

Now, when memory is fragmented, two things can occur:

1. There will have to be more searches to find a good space to stick "large" objects. That is, with many small objects scattered about, finding a nice contiguous chunk of memory could under certain conditions be difficult (these are extreme).

2. Memory is not uniformly cheap to access. A processor can keep only a limited amount of it close at hand, and it copes by swapping pages in when the item it needs is on a page that is not currently resident. If pages constantly have to be swapped, processing can slow down (again, this only affects performance in extreme scenarios). See this posting on virtual memory.

Other answers

What is memory fragmentation?

Memory fragmentation is the problem of memory becoming unusable even though it is theoretically available. There are two kinds of fragmentation:

Internal fragmentation is memory that is allocated but cannot be used (e.g. when memory is allocated in 8-byte chunks but the program repeatedly makes single allocations when it needs only 4 bytes).

External fragmentation is the problem of free memory becoming divided into many small chunks, so that large allocation requests cannot be met although there is enough free memory overall.
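To put a number on the internal-fragmentation case, here is a minimal sketch. It assumes a hypothetical allocator that rounds every request up to a multiple of a fixed chunk size; the names roundUp and internalWaste are made up for illustration, and real allocators use more elaborate size classes.

```cpp
#include <cstddef>

// Assumption: the allocator rounds each request up to a multiple of `chunk`
// bytes (8 in the example above). Real allocators differ in the details.
constexpr std::size_t roundUp(std::size_t request, std::size_t chunk) {
    return (request + chunk - 1) / chunk * chunk;
}

// Bytes that are allocated but unusable across `count` identical requests.
constexpr std::size_t internalWaste(std::size_t request, std::size_t chunk,
                                    std::size_t count) {
    return (roundUp(request, chunk) - request) * count;
}

// 4-byte requests served from 8-byte chunks lose 4 bytes each.
static_assert(roundUp(4, 8) == 8, "a 4-byte request occupies a full chunk");
static_assert(internalWaste(4, 8, 1000) == 4000, "half of the memory is lost");
```

Under that assumption, a thousand 4-byte allocations occupy 8000 bytes while carrying only 4000 bytes of payload.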

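How can I tell if memory fragmentation is a problem for my application? What kind of program is most likely to suffer?

Memory fragmentation is a problem if your program uses much more system memory than its actual payload data would require (and you've ruled out memory leaks).

What are good common ways to deal with memory fragmentation?

Use a good memory allocator. IIRC, those that use a "best fit" strategy are generally much better at avoiding fragmentation, if a little slower. However, it has also been shown that for any allocation strategy there are pathological worst cases. Fortunately, the typical allocation patterns of most applications are actually relatively benign for allocators to handle. There's a bunch of papers out there if you're interested in the details: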

Paul R. Wilson, Mark S. Johnstone, Michael Neely and David Boles. Dynamic Storage Allocation: A Survey and Critical Review. In Proceedings of the 1995 International Workshop on Memory Management, Springer Verlag LNCS, 1995.

Mark S. Johnstone, Paul R. Wilson. The Memory Fragmentation Problem: Solved? In ACM SIGPLAN Notices, volume 34, No. 3, pages 26-36, 1999.

M.R. Garey, R.L. Graham and J.D. Ullman. Worst-Case Analysis of Memory Allocation Algorithms. In Fourth Annual ACM Symposium on the Theory of Computing, 1972.

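Memory fragmentation happens because memory blocks of different sizes are being requested. Consider a buffer of 100 bytes. You request two chars, then an integer. Now you free the two chars, then request a new integer, but that integer (assuming it is 4 bytes) can't fit in the space of the two chars. That memory cannot be reused because it is not in a large enough contiguous block to reallocate. On top of that, you've invoked a lot of allocator overhead for your chars.

Essentially, memory only comes in blocks of a certain size on most systems. Once you split these blocks up, they cannot be rejoined until the whole block is freed. This can lead to whole blocks being held in use when actually only a small part of each block is in use.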

The primary way to reduce heap fragmentation is to make larger, less frequent allocations. In the extreme, you can use a managed heap that is capable of moving objects, at least within your own code. This completely eliminates the problem, from a memory perspective anyway. Obviously, moving objects around has a cost.

In reality, you only really have a problem if you are often allocating very small amounts off the heap. Using contiguous containers (vector, string, etc.) and allocating on the stack as much as humanly possible (always a good idea for performance) is the best way to reduce it. This also improves cache locality, which makes your application run faster.

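What you should remember is that on a typical 32-bit x86 desktop system, a process has about 2GB of address space, which is split into 4KB "pages" (4KB is the standard x86 page size, although larger pages also exist). You would have to invoke some omgwtfbbq fragmentation to have a problem. Fragmentation really is an issue of the past, since modern heaps are excessively large for the vast majority of applications, and there's a prevalence of systems that are capable of withstanding it, such as managed heaps.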

Imagine that you have a "large" (32 bytes) expanse of free memory:

----------------------------------
|                                |
----------------------------------

Now, allocate some of it (5 allocations):

----------------------------------
|aaaabbccccccddeeee              |
----------------------------------

Now, free the first four allocations but not the fifth:

----------------------------------
|              eeee              |
----------------------------------

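Now, try to allocate 16 bytes. Oops, I can't, even though there's nearly double that much free (28 bytes in total, but the longest contiguous run is only 14).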
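The diagrams can be replayed in code. This is only a toy sketch: ToyArena and replayExample are hypothetical names, the "heap" is a string with one character per byte, and the placement policy is assumed to be first fit, which real allocators may or may not use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <string>

// Toy model of the 32-byte region in the diagrams: one char per byte, and
// ' ' marks a free byte. Purely illustrative; real heaps keep metadata.
class ToyArena {
public:
    explicit ToyArena(std::size_t size) : bytes_(size, ' ') {}

    // First fit: claims the first free run of n bytes and returns its
    // offset, or std::string::npos if no contiguous run is long enough.
    std::size_t allocate(std::size_t n, char tag) {
        assert(n > 0 && tag != ' ');
        const std::size_t pos = bytes_.find(std::string(n, ' '));
        if (pos != std::string::npos) bytes_.replace(pos, n, n, tag);
        return pos;
    }

    // Frees every byte owned by the allocation labelled `tag`.
    void release(char tag) {
        std::replace(bytes_.begin(), bytes_.end(), tag, ' ');
    }

    std::size_t freeBytes() const {
        return static_cast<std::size_t>(
            std::count(bytes_.begin(), bytes_.end(), ' '));
    }

    const std::string& layout() const { return bytes_; }

private:
    std::string bytes_;
};

// Replays the diagrams: five allocations, free the first four, then ask
// for 16 bytes. Returns true if the 16-byte request could be satisfied.
bool replayExample(ToyArena& arena) {
    arena.allocate(4, 'a');
    arena.allocate(2, 'b');
    arena.allocate(6, 'c');
    arena.allocate(2, 'd');
    arena.allocate(4, 'e');
    for (char tag : {'a', 'b', 'c', 'd'}) arena.release(tag);
    return arena.allocate(16, 'f') != std::string::npos;
}
```

Calling replayExample on a ToyArena of 32 bytes returns false while freeBytes() still reports 28, which is exactly the situation in the last diagram.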

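On systems with virtual memory, fragmentation is less of a problem than you might think, because large allocations only need to be contiguous in virtual address space, not in physical address space. So in my example, if I had virtual memory with a page size of 2 bytes, then I could make my 16-byte allocation with no problem. Physical memory would look like this: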

----------------------------------
|ffffffffffffffeeeeff            |
----------------------------------

whereas virtual memory (being much bigger) could look like this:

------------------------------------------------------...
|              eeeeffffffffffffffff                   
------------------------------------------------------...
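The paging argument boils down to counting whole free pages rather than looking for one contiguous run. A rough sketch, reusing the one-character-per-byte notation of the diagrams; freePages and fitsWithPaging are hypothetical helpers, and the 2-byte page size is the answer's own thought experiment, not a real page size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts pages that are entirely free (' ') in a toy physical-memory
// layout written like the diagrams, one char per byte.
std::size_t freePages(const std::string& physical, std::size_t pageSize) {
    assert(pageSize > 0);
    std::size_t count = 0;
    for (std::size_t start = 0; start + pageSize <= physical.size();
         start += pageSize) {
        const std::size_t used = physical.find_first_not_of(' ', start);
        if (used == std::string::npos || used >= start + pageSize) ++count;
    }
    return count;
}

// With paging, a request fits if enough whole free pages exist anywhere;
// they do not have to be adjacent in physical memory.
bool fitsWithPaging(const std::string& physical, std::size_t pageSize,
                    std::size_t request) {
    assert(pageSize > 0);
    const std::size_t pagesNeeded = (request + pageSize - 1) / pageSize;
    return freePages(physical, pageSize) >= pagesNeeded;
}
```

For the fragmented layout (14 free bytes, eeee, 14 free bytes), 2-byte pages leave 14 free pages, so a 16-byte request needing 8 pages fits. With 16-byte pages the same layout has no wholly free page, which shows that the trick only helps when pages are small relative to the holes.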

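The classic symptom of memory fragmentation is that you try to allocate a large block and you can't, even though you appear to have enough memory free. Another possible consequence is the inability of the process to release memory back to the OS (because each of the large blocks it has obtained from the OS, for malloc etc. to subdivide, still has something left in it, even though most of each block is now unused).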

Tactics to prevent memory fragmentation in C++ work by allocating objects from different areas according to their size and/or their expected lifetime. So if you're going to create a lot of objects and destroy them all together later, allocate them from a memory pool. Any other allocations you do in between them won't be from the pool, hence won't be located in between them in memory, so memory will not be fragmented as a result. Or, if you're going to allocate a lot of objects of the same size then allocate them from the same pool. Then a stretch of free space in the pool can never be smaller than the size you're trying to allocate from that pool.
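As a rough sketch of the same-size pool idea: the FixedPool class below is hypothetical, not thread-safe, and never grows. It takes one buffer up front and carves it into equal slots, so a freed slot is always exactly large enough for the next object from that pool.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Minimal fixed-size pool for objects of type T. Illustration only:
// single-threaded, fixed capacity, and the caller must destroy every
// object before the pool itself goes away.
template <typename T>
class FixedPool {
    struct Slot {
        alignas(T) unsigned char bytes[sizeof(T)];
    };

public:
    explicit FixedPool(std::size_t capacity) : slots_(capacity) {
        freeList_.reserve(capacity);
        // Push in reverse so that slot 0 is handed out first.
        for (std::size_t i = capacity; i > 0; --i)
            freeList_.push_back(&slots_[i - 1]);
    }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Constructs a T in a free slot; returns nullptr when the pool is full.
    template <typename... Args>
    T* create(Args&&... args) {
        if (freeList_.empty()) return nullptr;
        Slot* slot = freeList_.back();
        // Construct first: if T's constructor throws, the slot stays free.
        T* object = ::new (static_cast<void*>(slot->bytes))
            T(std::forward<Args>(args)...);
        freeList_.pop_back();
        return object;
    }

    // Runs the destructor and makes the slot available again.
    void destroy(T* object) {
        assert(object != nullptr);
        object->~T();
        freeList_.push_back(reinterpret_cast<Slot*>(object));
    }

    std::size_t available() const { return freeList_.size(); }

private:
    std::vector<Slot> slots_;
    std::vector<Slot*> freeList_;
};
```

Because every slot has the same size, holes inside the pool can always be reused, and allocations made elsewhere in the program never land between pooled objects.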

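Generally you don't need to worry about it much, unless your program is long-running and does a lot of allocation and freeing. It's when you have a mixture of short-lived and long-lived objects that you're most at risk, but even then malloc will do its best to help. Basically, ignore it until your program has allocation failures or unexpectedly causes the system to run low on memory (catch this in testing, for preference!).

The standard libraries are no worse than anything else that allocates memory, and the standard containers all have an Alloc template parameter which you could use to fine-tune their allocation strategy if absolutely necessary.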
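For what that allocator parameter looks like in practice, here is a minimal sketch. Arena and ArenaAllocator are hypothetical names: a bump allocator over one fixed buffer that never reuses memory until the whole arena is destroyed, which suits objects that are created together and thrown away together. Since C++17 the standard library also offers std::pmr::monotonic_buffer_resource in <memory_resource>, which plays a similar role.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// One fixed buffer, handed out front to back. Nothing is reused until the
// arena is destroyed. Offsets are aligned relative to the buffer start,
// which is enough for ordinary (non over-aligned) types.
class Arena {
public:
    explicit Arena(std::size_t bytes) : buffer_(bytes), used_(0) {}
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    void* allocate(std::size_t bytes, std::size_t align) {
        assert(align > 0 && (align & (align - 1)) == 0);  // power of two
        const std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > buffer_.size() || bytes > buffer_.size() - start)
            throw std::bad_alloc();
        used_ = start + bytes;
        return buffer_.data() + start;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buffer_;
    std::size_t used_;
};

// Minimal C++11 allocator that forwards to an Arena. The arena must
// outlive every container that uses it.
template <typename T>
struct ArenaAllocator {
    using value_type = T;
    Arena* arena;

    explicit ArenaAllocator(Arena& a) noexcept : arena(&a) {}
    template <typename U>
    ArenaAllocator(const ArenaAllocator<U>& other) noexcept
        : arena(other.arena) {}

    T* allocate(std::size_t n) {
        return static_cast<T*>(arena->allocate(n * sizeof(T), alignof(T)));
    }
    void deallocate(T*, std::size_t) noexcept {}  // freed with the arena
};

template <typename T, typename U>
bool operator==(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return a.arena == b.arena;
}
template <typename T, typename U>
bool operator!=(const ArenaAllocator<T>& a, const ArenaAllocator<U>& b) {
    return !(a == b);
}
```

A container then opts in with std::vector<int, ArenaAllocator<int>> v(alloc), where alloc is an ArenaAllocator<int> built from an Arena. The trade-off is that memory released by the container is not recycled, so this fits short-lived batches rather than long-lived, growing data.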

Update: Google TCMalloc (Thread-Caching Malloc) has been found to be quite good at handling fragmentation in a long-running process.


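I have been developing a server application that had problems with memory fragmentation on HP-UX 11.23/11.31 ia64.

It looked like this. There was a process that made memory allocations and deallocations and ran for days. Even though there were no memory leaks, the memory consumption of the process kept increasing.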

About my experience. On HP-UX it is very easy to find memory fragmentation using HP-UX gdb. You set a breakpoint, and when you hit it you run the command info heap to see all memory allocations for the process and the total size of the heap. Then you continue your program, and some time later you hit the breakpoint again and run info heap again. If the total size of the heap is bigger but the number and the size of the separate allocations are the same, then it is likely that you have memory allocation problems. If necessary, repeat this check a few more times.

My way of improving the situation was this. After I had done some analysis with HP-UX gdb, I saw that the memory problems were caused by the fact that I used std::vector for storing some types of information from a database. std::vector requires that its data be kept in one block. I had a few containers based on std::vector, and these containers were regularly recreated. There were often situations when new records were added to the database and after that the containers were recreated. Since the recreated containers were bigger, they did not fit into the available blocks of free memory, and the runtime asked the OS for a new, bigger block. As a result, even though there were no memory leaks, the memory consumption of the process grew. I improved the situation when I changed the containers: instead of std::vector I started using std::deque, which has a different way of allocating memory for its data.
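The difference between the two containers can be observed with a small instrumented allocator. This is only a sketch: TrackingAllocator and largestBlock are hypothetical names, and the chunk size std::deque uses is implementation-defined, so the exact numbers vary between standard libraries.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <vector>

// Hypothetical instrumentation: forwards to std::allocator and records the
// largest single block ever requested, in bytes.
template <typename T>
struct TrackingAllocator {
    using value_type = T;
    std::size_t* largest;  // counter owned by the caller

    explicit TrackingAllocator(std::size_t& counter) noexcept
        : largest(&counter) {}
    template <typename U>
    TrackingAllocator(const TrackingAllocator<U>& other) noexcept
        : largest(other.largest) {}

    T* allocate(std::size_t n) {
        const std::size_t bytes = n * sizeof(T);
        if (bytes > *largest) *largest = bytes;
        return std::allocator<T>().allocate(n);
    }
    void deallocate(T* p, std::size_t n) noexcept {
        std::allocator<T>().deallocate(p, n);
    }
};

template <typename T, typename U>
bool operator==(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
    return a.largest == b.largest;
}
template <typename T, typename U>
bool operator!=(const TrackingAllocator<T>& a, const TrackingAllocator<U>& b) {
    return !(a == b);
}

using TrackedVector = std::vector<int, TrackingAllocator<int>>;
using TrackedDeque = std::deque<int, TrackingAllocator<int>>;

// Fills a container with `count` ints and returns the largest contiguous
// block it asked the allocator for along the way.
template <typename Container>
std::size_t largestBlock(std::size_t count) {
    std::size_t largest = 0;
    {
        TrackingAllocator<int> alloc(largest);
        Container c(alloc);
        for (std::size_t i = 0; i < count; ++i)
            c.push_back(static_cast<int>(i));
        assert(c.size() == count);
    }
    return largest;
}
```

For the same number of elements, largestBlock<TrackedVector> has to report at least count * sizeof(int) bytes in one piece, while largestBlock<TrackedDeque> reports a much smaller figure because the deque asks for many modest chunks plus a small index. Smaller requests are easier to satisfy from existing holes in a fragmented heap, which is consistent with what the author observed.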

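One of the ways I am aware of to prevent memory fragmentation on HP-UX is to use the Small Block Allocator or MallocNextGen. On RedHat Linux the default allocator seems to handle allocating a lot of small blocks pretty well. On Windows there is the Low-fragmentation Heap, which addresses the problem of a large number of small allocations.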

My understanding is that in an STL-heavy application you first have to identify the problems. Memory allocators (like the one in libc) actually handle the problem of a lot of small allocations, which is typical for std::string (for instance, in my server application there are lots of STL strings, but as I see from running info heap they are not causing any problems). My impression is that you need to avoid frequent large allocations. Unfortunately, there are situations when you can't avoid them and have to change your code. As I said, in my case I improved the situation when I switched to std::deque. If you identify your memory fragmentation, it might be possible to talk about it more precisely.

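This is a super-simplified version for dummies.

As objects get created in memory, they get added to the end of the used portion of memory.

If an object that is not at the end of the used portion of memory is deleted, meaning the object was in between two other objects, it will create a "hole".

This is what's called fragmentation.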