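According to the Linux programmer's manual:
brk() and sbrk() change the location of the program break, which defines the end of the process's data segment.
What does "data segment" mean here? Is it just the data segment, or data, BSS, and heap combined?
According to the Wikipedia article on the data segment:
Sometimes the data, BSS, and heap areas are collectively referred to as the "data segment".
I see no reason to change the size of the data segment alone. If it means data, BSS, and heap together, then it makes sense, because the heap gets more space.
That brings me to my second question. In every article I have read so far, the author says that the heap grows upward and the stack grows downward. But none of them explain what happens when the heap has used up all the space between the heap and the stack.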
Current answer
In the diagram you posted, the "break" (the address manipulated by brk and sbrk) is the dotted line at the top of the heap.
The documentation you've read describes this as the end of the "data segment" because in traditional (pre-shared-libraries, pre-mmap) Unix the data segment was continuous with the heap; before program start, the kernel would load the "text" and "data" blocks into RAM starting at address zero (actually a little above address zero, so that the NULL pointer genuinely didn't point to anything) and set the break address to the end of the data segment. The first call to malloc would then use sbrk to move the break up and create the heap in between the top of the data segment and the new, higher break address, as shown in the diagram, and subsequent use of malloc would use it to make the heap bigger as necessary.
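To see where these pieces sit in a running process, here is a minimal sketch, assuming a Linux/glibc system. The names `struct layout` and `probe_layout` are made up for this illustration. It records one address from each region together with the current break. On a traditional layout the values would increase from text to data to BSS to the break, but ASLR and position-independent executables can change the absolute numbers, so treat the ordering as something to observe rather than rely on.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper type: one sample address per region. */
struct layout {
    uintptr_t text; /* address of a function (text) */
    uintptr_t data; /* address of an initialized global (data) */
    uintptr_t bss;  /* address of a zero-initialized global (BSS) */
    uintptr_t brk;  /* current program break, from sbrk(0) */
};

int initialized_global = 42; /* normally placed in .data */
int zeroed_global;           /* normally placed in .bss */

struct layout probe_layout(void) {
    struct layout l;
    l.text = (uintptr_t)&probe_layout;
    l.data = (uintptr_t)&initialized_global;
    l.bss  = (uintptr_t)&zeroed_global;
    l.brk  = (uintptr_t)sbrk(0); /* sbrk(0) only reads the break */
    return l;
}
```

Printing the four fields with `%p`-style formatting from a small `main` is enough to compare them against the diagram.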
Meantime, the stack starts at the top of memory and grows down. The stack doesn't need explicit system calls to make it bigger; either it starts off with as much RAM allocated to it as it can ever have (this was the traditional approach) or there is a region of reserved addresses below the stack, to which the kernel automatically allocates RAM when it notices an attempt to write there (this is the modern approach). Either way, there may or may not be a "guard" region at the bottom of the address space that can be used for stack. If this region exists (all modern systems do this) it is permanently unmapped; if either the stack or the heap tries to grow into it, you get a segmentation fault. Traditionally, though, the kernel made no attempt to enforce a boundary; the stack could grow into the heap, or the heap could grow into the stack, and either way they would scribble over each other's data and the program would crash. If you were very lucky it would crash immediately.
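The two stack properties described above can be checked with a short sketch, assuming a POSIX system and a GCC- or Clang-compatible compiler (`__attribute__((noinline))` is a compiler extension). The function names are hypothetical. `stack_soft_limit` reads the limit the kernel enforces on automatic stack growth, and `stack_grows_down` compares the address of a local in a nested call with one in its caller.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/resource.h>

/* Returns the soft stack size limit in bytes, or 0 on error.
   RLIM_INFINITY ("no limit") is passed through unchanged. */
rlim_t stack_soft_limit(void) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_STACK, &lim) != 0)
        return 0;
    return lim.rlim_cur;
}

/* Callee: is my own local at a lower address than the caller's? */
__attribute__((noinline))
static int callee_is_lower(uintptr_t caller_local) {
    volatile char here = 0;
    return (uintptr_t)&here < caller_local;
}

/* Returns 1 if a nested call's frame sits at a lower address,
   i.e. the stack grows down on this platform. */
__attribute__((noinline))
int stack_grows_down(void) {
    volatile char here = 0;
    int r = callee_is_lower((uintptr_t)&here);
    here = 1; /* keep this frame alive across the call (no tail call) */
    return r;
}
```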
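I'm not sure where the number 512GB in this diagram comes from. It implies a 64-bit virtual address space, which is inconsistent with the very simple memory map you have there. A real 64-bit address space looks more like this: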
Legend: t: text, d: data, b: BSS
This is not remotely to scale, and it shouldn't be interpreted as exactly how any given OS does stuff (after I drew it I discovered that Linux actually puts the executable much closer to address zero than I thought it did, and the shared libraries at surprisingly high addresses). The black regions of this diagram are unmapped -- any access causes an immediate segfault -- and they are gigantic relative to the gray areas. The light-gray regions are the program and its shared libraries (there can be dozens of shared libraries); each has an independent text and data segment (and "bss" segment, which also contains global data but is initialized to all-bits-zero rather than taking up space in the executable or library on disk). The heap is no longer necessarily contiguous with the executable's data segment -- I drew it that way, but it looks like Linux, at least, doesn't do that. The stack is no longer pegged to the top of the virtual address space, and the distance between the heap and the stack is so enormous that you don't have to worry about crossing it.
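The break is still the upper limit of the heap. However, what I didn't show is that there could be dozens of independent allocations of memory off in the black somewhere, made with mmap instead of brk. (The OS will try to keep these far away from the brk area so they don't collide.)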
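A rough way to see how far an mmap allocation lands from the brk area is sketched below, assuming Linux or another system with `MAP_ANONYMOUS`. The helper name `mmap_distance_from_break` is hypothetical. It creates one anonymous mapping, measures the gap to the current break, and unmaps again.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps len bytes of anonymous memory, stores the absolute distance
   between the mapping and the current break in *distance, then
   unmaps. Returns 0 on success, -1 if mmap failed. */
int mmap_distance_from_break(size_t len, uintptr_t *distance) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    uintptr_t brk_now = (uintptr_t)sbrk(0);
    uintptr_t map = (uintptr_t)p;
    *distance = map > brk_now ? map - brk_now : brk_now - map;
    munmap(p, len);
    return 0;
}
```

On a typical 64-bit Linux process the reported distance is many gigabytes, although the exact value changes from run to run when ASLR is enabled.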
Other answers
There is a special designated anonymous private memory mapping (traditionally located just beyond the data/bss, but modern Linux will actually adjust the location with ASLR). In principle it's no better than any other mapping you could create with mmap, but Linux has some optimizations that make it possible to expand the end of this mapping (using the brk syscall) upwards with reduced locking cost relative to what mmap or mremap would incur. This makes it attractive for malloc implementations to use when implementing the main heap.
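The data segment is the portion of memory that holds all your static data, read in from the executable at launch and usually zero-filled.
You can use brk and sbrk yourself to avoid the "malloc overhead" everyone is always complaining about. But you can't easily use this method together with malloc, so it is only appropriate when you don't have to free anything. Because you can't. You should also avoid any library call that may use malloc internally. I.e., strlen is probably safe, but fopen probably isn't.
You call sbrk much as you would call malloc. It returns a pointer to the current break and then raises the break by the requested amount.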
#include <unistd.h>

/* Bump allocation: returns the old break and moves it up by n bytes. */
void *myallocate(int n) {
    return sbrk(n);
}
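A slightly more defensive variant of the same idea is sketched here. It is an illustration, not part of the original answer: `bump_alloc` and `BUMP_ALIGN` are hypothetical names, and the 16-byte alignment is an assumption that suits most types on common platforms. It pads the break up to the alignment boundary and reports failure as NULL instead of sbrk's `(void *)-1`.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

#define BUMP_ALIGN 16u /* assumed alignment, see the note above */

/* Returns n bytes of aligned memory carved off the break, or NULL. */
void *bump_alloc(size_t n) {
    uintptr_t cur = (uintptr_t)sbrk(0);
    /* Bytes needed to round the current break up to BUMP_ALIGN. */
    size_t pad = (BUMP_ALIGN - cur % BUMP_ALIGN) % BUMP_ALIGN;
    if (n > (size_t)INTPTR_MAX - pad)
        return NULL; /* request does not fit in sbrk's intptr_t */
    void *old = sbrk((intptr_t)(pad + n));
    if (old == (void *)-1)
        return NULL; /* the kernel refused to move the break */
    return (char *)old + pad;
}
```

As with `myallocate`, nothing returned by this function can be freed individually.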
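While you can't free individual allocations (because there is no malloc overhead, remember), you can free the entire space by calling brk with the value returned by the first call to sbrk, thus rewinding the break.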
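void *memorypool;

/* Record the current break as the start of the pool. */
void initmemorypool(void) {
    memorypool = sbrk(0);
}

/* Free everything allocated since initmemorypool() by rewinding the break. */
void resetmemorypool(void) {
    brk(memorypool);
}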
You could even stack these regions, discarding the most recent region by rewinding the break to that region's start.
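The stacking idea could look roughly like the sketch below. The names `region_mark`, `region_alloc`, and `region_release` are hypothetical, and the code assumes nothing else in the process (malloc included) moves the break in between.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Remember the current break so everything allocated after this
   point can be dropped in one step. */
void *region_mark(void) {
    return sbrk(0);
}

/* Allocate n bytes by bumping the break; NULL on failure. */
void *region_alloc(intptr_t n) {
    void *p = sbrk(n);
    return p == (void *)-1 ? NULL : p;
}

/* Rewind the break to a previous mark.
   Returns 0 on success, -1 on error, like brk(). */
int region_release(void *mark) {
    return brk(mark);
}
```

Taking a mark, allocating, taking a second mark, and allocating again gives two nested regions. Releasing the second mark discards only the inner region, and releasing the first discards both.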
One more thing...
sbrk is also useful in code golf, since it is 2 characters shorter than malloc.
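Minimal runnable example
What does the brk() system call do?
It asks the kernel to let you read and write to a contiguous chunk of memory called the heap.
If you don't ask, it might segfault you.
Without brk: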
#define _GNU_SOURCE
#include <unistd.h>

int main(void) {
    /* Get the first address beyond the end of the heap. */
    void *b = sbrk(0);
    int *p = (int *)b;
    /* May segfault because it is outside of the heap. */
    *p = 1;
    return 0;
}
With brk:
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

int main(void) {
    void *b = sbrk(0);
    int *p = (int *)b;

    /* Move it 2 ints forward */
    brk(p + 2);

    /* Use the ints. */
    *p = 1;
    *(p + 1) = 2;
    assert(*p == 1);
    assert(*(p + 1) == 2);

    /* Deallocate back. */
    brk(b);

    return 0;
}
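GitHub upstream.
The above might not hit a new page, and so might not segfault even without the brk. Here is a more aggressive version that allocates 16MiB and is very likely to segfault without the brk: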
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>

int main(void) {
    void *b;
    char *p, *end;

    b = sbrk(0);
    p = (char *)b;
    end = p + 0x1000000;
    brk(end);
    while (p < end) {
        *(p++) = 1;
    }
    brk(b);
    return 0;
}
Tested on Ubuntu 18.04.
Virtual address space visualization
Before brk:
+------+ <-- Heap Start == Heap End
After brk(p + 2):
+------+ <-- Heap Start + 2 * sizeof(int) == Heap End
|      |
| You can now write your ints
| in this memory area.
|      |
+------+ <-- Heap Start
After brk(b):
+------+ <-- Heap Start == Heap End
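To better understand address spaces, you should make yourself familiar with paging: How does x86 paging work?
Why do we need both brk and sbrk?
brk could of course be implemented with sbrk plus offset calculations; both exist just for convenience.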
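The offset calculation mentioned above might look like the following sketch. `brk_via_sbrk` is a hypothetical name, and this is an illustration of the equivalence rather than how any libc actually implements brk.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <unistd.h>

/* Hypothetical re-implementation of brk() on top of sbrk():
   move the break by the difference between the target address
   and the current break. Returns 0 on success, -1 on failure. */
int brk_via_sbrk(void *addr) {
    void *cur = sbrk(0);
    /* May be negative when the break is being moved down. */
    intptr_t delta = (intptr_t)((uintptr_t)addr - (uintptr_t)cur);
    return sbrk(delta) == (void *)-1 ? -1 : 0;
}
```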
In the backend, the Linux kernel v5.0 has a single system call, brk, that is used to implement both: https://github.com/torvalds/linux/blob/v5.0/arch/x86/entry/syscalls/syscall_64.tbl#L23
12 common brk __x64_sys_brk
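The raw system call can also be invoked directly, as sketched below for Linux only. `raw_current_break` is a hypothetical name. Unlike the glibc wrapper, which returns 0 or -1, the raw Linux call returns the resulting break address, and when the requested address cannot be honored (such as 0) it returns the current break unchanged.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Ask the kernel directly for the current break (Linux-specific).
   Address 0 is a request the kernel cannot satisfy, so it simply
   reports where the break currently is. */
uintptr_t raw_current_break(void) {
    return (uintptr_t)syscall(SYS_brk, 0L);
}
```

Calling the raw system call with a nonzero address would move the break behind glibc's back, so this sketch only reads it.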
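Is brk POSIX?
brk used to be POSIX, but it was removed in POSIX 2001, which is why _GNU_SOURCE is needed to access the glibc wrapper.
The removal is likely due to the introduction of mmap, which is a superset that allows multiple ranges to be allocated and offers more allocation options.
I think there is no valid case nowadays where you should use brk instead of malloc or mmap.
brk vs malloc
brk is one old way of implementing malloc.
mmap is the newer and more powerful mechanism, which likely all POSIX systems currently use to implement malloc. Here is a minimal runnable mmap memory allocation example.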
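As a small illustration of why mmap is the more flexible of the two, the sketch below allocates independent ranges that can be released in any order, which a single break cannot do. `map_alloc` and `map_free` are hypothetical names, and the code assumes a system that provides `MAP_ANONYMOUS`.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Allocate len bytes of zero-filled memory straight from the kernel.
   Returns NULL on failure. */
void *map_alloc(size_t len) {
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Release a region obtained from map_alloc; len must match the
   length passed at allocation. Returns 0 on success. */
int map_free(void *p, size_t len) {
    return munmap(p, len);
}
```

The caller has to remember each region's length, which is part of the bookkeeping a malloc implementation normally does for you.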
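Can I mix brk and malloc?
If your malloc is implemented with brk, I have no idea how that could possibly not blow things up, since brk only manages a single range of memory.
However, I could not find anything about it in the glibc docs, e.g.:
https://www.gnu.org/software/libc/manual/html_mono/libc.html#Resizing-the-Data-Segment
I suppose things will likely just work there, since mmap is likely used for malloc.
See also:
What is unsafe/legacy about brk/sbrk?
Why does calling sbrk(0) twice give a different value?
More info
Internally, the kernel decides whether the process can have that much memory, and earmarks memory pages for that usage.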
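The kernel's refusal is visible from user space as a failed call. The sketch below uses a hypothetical helper, `try_grow`, and assumes the glibc `sbrk` wrapper, which returns `(void *)-1` and sets errno when the break cannot be moved.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <unistd.h>

/* Try to move the break up by increment bytes.
   Returns 0 on success, or the errno value (typically ENOMEM)
   when the request is refused. */
int try_grow(intptr_t increment) {
    errno = 0;
    if (sbrk(increment) == (void *)-1)
        return errno;
    return 0;
}
```

A small request such as one page normally succeeds, while an absurd one such as `INTPTR_MAX` bytes is rejected and leaves the break where it was.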
This explains how the stack compares to the heap: What is the function of the push / pop instructions used on registers in x86 assembly?