What are the differences between a heap and a BST?
When should you use a heap, and when a BST?
If you want to get the elements in sorted order, is a BST better than a heap?
Current answer
Summary
Type            BST (*)     Heap
Insert average  log(n)      1
Insert worst    log(n)      log(n) or n (***)
Find any worst  log(n)      n
Find max worst  1 (**)      1
Create worst    n log(n)    n
Delete worst    log(n)      log(n)
All average times on this table are the same as their worst times, except for Insert.
*: everywhere in this answer, BST == balanced BST, since an unbalanced BST is asymptotically inefficient
**: with a trivial modification explained in this answer
***: log(n) for a pointer tree heap, n for a dynamic array heap
Advantages of binary heap over BST
- Average time insertion into a binary heap is O(1), for a BST it is O(log(n)). This is the killer feature of heaps. There are also other heaps which reach O(1) amortized (stronger) like the Fibonacci heap, and even worst case, like the Brodal queue, although they may not be practical because of non-asymptotic performance: Are Fibonacci heaps or Brodal queues used in practice anywhere?
- Binary heaps can be efficiently implemented on top of either dynamic arrays or pointer-based trees, a BST only on pointer-based trees. So for the heap we can choose the more space-efficient array implementation, if we can afford occasional resize latencies.
- Binary heap creation is O(n) worst case, O(n log(n)) for a BST.
Advantage of BST over binary heap
Search for arbitrary elements is O(log(n)). This is the killer feature of BSTs. For a heap, it is O(n) in general, except for the largest element which is O(1).
"False" advantage of heap over BST
Heap is O(1) to find max, BST O(log(n)). This is a common misconception, because it is trivial to modify a BST to keep track of the largest element, and to update that element whenever it could change: on insertion swap if larger, on removal find the second largest. Can we use binary search tree to simulate heap operation? (mentioned by Yeo). Actually, this is a limitation of heaps compared to BSTs: the only efficient search is the one for the largest element.
Average binary heap insert is O(1)
Sources:
- Paper: http://i.stanford.edu/pub/cstr/reports/cs/tr/74/460/cs-tr-74-460.pdf
- WSU slides: https://web.archive.org/web/20161109132222/http://www.eecs.wsu.edu/~holder/courses/CptS223/spr09/slides/heaps.pdf
Intuitive argument:
- bottom tree levels have exponentially more elements than the top levels, so the new element is almost certain to go at the bottom
- heap insertion starts from the bottom, while BST insertion must start from the top
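A back-of-the-envelope version of that argument (my own sketch, under the informal assumption that the chance of having to rise one more level roughly halves at each step): the expected number of bubble-up swaps is bounded by a constant.

```latex
% Expected bubble-up swaps for an average insert into a binary heap,
% assuming P(rise at least k levels) decays like 2^{-k}:
\mathbb{E}[\text{swaps}] \;\le\; \sum_{k \ge 0} k \cdot 2^{-k} \;=\; 2 \;=\; O(1)
```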
Increasing the value at a given index is also O(1) in a binary heap, for the same reason. But if you want to do that, it is likely that you will want to keep an extra index up-to-date on heap operations (How to implement O(logn) decrease-key operation for min-heap based Priority Queue?), e.g. for Dijkstra. Possible at no extra time cost.
GCC C++ standard library insert benchmark on real hardware
I benchmarked the C++ std::set (red-black tree BST) and std::priority_queue (dynamic array heap) inserts to see if I was right about the insert times, and this is what I got:
- benchmark code
- plot script
- plot data
Tested on Ubuntu 19.04, GCC 8.3.0, on a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16 GiB, 2400 Mbps), SSD: Samsung MZVLB512HAJQ-000L7 (512 GB, 3,000 MB/s).
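For reference, a minimal sketch of such an insert benchmark (this is not the benchmark code linked above, just an illustration: it times each push into std::priority_queue and each insert into std::set with std::chrono and prints CSV that can be averaged and plotted):

```cpp
#include <chrono>
#include <cstdio>
#include <queue>
#include <random>
#include <set>

int main() {
    constexpr int n = 1000000;
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist;

    std::priority_queue<int> heap;  // dynamic array heap
    std::set<int> bst;              // red-black tree BST

    for (int i = 0; i < n; ++i) {
        int x = dist(rng);

        auto t0 = std::chrono::steady_clock::now();
        heap.push(x);
        auto t1 = std::chrono::steady_clock::now();
        bst.insert(x);
        auto t2 = std::chrono::steady_clock::now();

        // CSV: insert index, heap push time (ns), BST insert time (ns)
        std::printf("%d,%lld,%lld\n", i,
            (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count(),
            (long long)std::chrono::duration_cast<std::chrono::nanoseconds>(t2 - t1).count());
    }
}
```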
So we can clearly see that:
- Heap insert time is basically constant. We can clearly see the dynamic array resize points. Since we are averaging every 10k inserts to be able to see anything at all above system noise, those peaks are in fact about 10k times larger than shown! The zoomed graph excludes essentially only the array resize points, and shows that almost all inserts fall under 25 nanoseconds.
- BST is logarithmic. All inserts are much slower than the average heap insert.
- BST vs hashmap detailed analysis at: What data structure is inside std::map in C++?
GCC C++ standard library insert benchmark on gem5
gem5 is a full system simulator, and therefore provides an infinitely precise clock with m5 dumpstats. So I tried to use it to estimate timings for individual inserts.
Interpretation:
- Heap is still constant, but now we see in more detail that there are a few distinct lines, and each higher line is more sparse. This presumably corresponds to memory access latencies that are paid for later and later inserts.
- TODO: I can't fully interpret the BST one, as it does not look very logarithmic and is somewhat more constant. With this greater detail we can also see a few distinct lines, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top to bottom?
Benchmarked on an aarch64 HPI CPU with this Buildroot setup.
BST cannot be efficiently implemented on an array
Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) on average.
Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).
Heaps can be efficiently implemented on an array
Parent and children indexes can be computed from the current index, as in the sketch below.
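A minimal sketch of that index arithmetic for a 0-based array max-heap, together with the bubble-up used on insert (the helper names heapInsert, parentIndex, etc. are just illustrative, not from any particular library):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// 0-based array layout:
//   parent of i   -> (i - 1) / 2
//   children of i -> 2*i + 1 and 2*i + 2
std::size_t parentIndex(std::size_t i) { return (i - 1) / 2; }
std::size_t leftChild(std::size_t i)   { return 2 * i + 1; }
std::size_t rightChild(std::size_t i)  { return 2 * i + 2; }

// Insert: append at the end, then bubble the new element up a single branch.
// O(log n) swaps in the worst case, O(1) on average.
void heapInsert(std::vector<int>& heap, int value) {
    heap.push_back(value);
    std::size_t i = heap.size() - 1;
    while (i > 0 && heap[parentIndex(i)] < heap[i]) {
        std::swap(heap[parentIndex(i)], heap[i]);
        i = parentIndex(i);
    }
}
```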
There are no balancing operations like the ones a BST needs.
Removing the minimum is the most worrying operation, as it has to be top-down. But it can always be done by "percolating down" a single branch of the heap, as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.
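A sketch of that percolate-down step for a 0-based array min-heap (again, heapPopMin is just an illustrative name):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Remove the minimum of a 0-based array min-heap: move the last element to
// the root, then percolate it down a single branch. O(log n) worst case.
void heapPopMin(std::vector<int>& heap) {
    if (heap.empty()) return;
    heap.front() = heap.back();
    heap.pop_back();
    std::size_t i = 0;
    const std::size_t n = heap.size();
    while (true) {
        std::size_t smallest = i;
        std::size_t l = 2 * i + 1, r = 2 * i + 2;
        if (l < n && heap[l] < heap[smallest]) smallest = l;
        if (r < n && heap[r] < heap[smallest]) smallest = r;
        if (smallest == i) break;  // heap property restored
        std::swap(heap[i], heap[smallest]);
        i = smallest;
    }
}
```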
If you insert a single node for every node removed, then you lose the advantage of the asymptotic O(1) average insert that heaps provide, as the removals would dominate, and you might as well use a BST. Dijkstra however updates nodes several times per removal, so we are fine.
Dynamic array heap vs pointer tree heap
Heaps can also be efficiently implemented on top of pointer-based trees: Is it possible to make efficient pointer-based binary heap implementations?
The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:
- the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer).
- tree BSTs would also need further balancing information, e.g. red-black color information.
- the dynamic array implementation can be of size 2n just after a doubling. So on average it is going to be 1.5n.
On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes O(n) worst case, while the tree heap just does new small allocations for each node.
Still, the backing array doubling is O(1) amortized, so it comes down to a maximum latency consideration. Mentioned here.
Philosophy
- BSTs maintain a global property between a parent and all descendants (left smaller, right bigger). The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there). This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).
- Heaps maintain a local property between parent and direct children (parent > children). The top node of a heap is the big element, which only requires local knowledge to maintain (knowing your parent).
Comparing BST vs heap vs hashmap:
- BST: can be either a reasonable:
  - unordered set (a structure that determines if an element was previously inserted or not). But a hashmap tends to be better due to O(1) amortized insert.
  - sorting machine. But a heap is generally better at that, which is why heapsort is much more widely known than tree sort.
- heap: is just a sorting machine. It cannot be an efficient unordered set, because you can only check for the smallest/largest element fast.
- hash map: can only be an unordered set, not an efficient sorting machine, because the hashing mixes up any ordering.
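In C++ terms, a rough mapping of those three roles onto standard containers (just my illustration, not part of the original comparison):

```cpp
#include <initializer_list>
#include <iostream>
#include <queue>
#include <set>
#include <unordered_set>

int main() {
    // BST (std::set): ordered membership tests and sorted iteration, O(log n) each.
    std::set<int> bst{3, 1, 2};
    std::cout << bst.count(2) << '\n';      // membership: prints 1

    // Heap (std::priority_queue): only the top element is accessible fast.
    std::priority_queue<int> heap;
    for (int x : {3, 1, 2}) heap.push(x);
    std::cout << heap.top() << '\n';        // max: prints 3

    // Hash set (std::unordered_set): O(1) average membership, no ordering.
    std::unordered_set<int> hashset{3, 1, 2};
    std::cout << hashset.count(2) << '\n';  // membership: prints 1
}
```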
Doubly linked list
A doubly linked list can be seen as a subset of the heap where the first item has the greatest priority, so let's compare them here as well:
- insertion:
  - position:
    - doubly linked list: the inserted item must be either the first or last, as we only have pointers to those elements (unless we have a pointer to the position of interest, e.g. during iteration)
    - binary heap: the inserted item can end up in any position. Less restrictive than the linked list.
  - time:
    - doubly linked list: O(1) worst case, since we have pointers to the items, and the update is really simple
    - binary heap: O(1) average, thus worse than the linked list. Tradeoff for having a more general insertion position.
- search: O(n) for both
A use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.
This can be used to implement an LRU cache. Just like for heap applications such as Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.
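A minimal LRU cache sketch along those lines, with std::list as the doubly linked list and an std::unordered_map from key to list node (the class name LruCache and its methods are just illustrative):

```cpp
#include <cstddef>
#include <list>
#include <optional>
#include <unordered_map>
#include <utility>

// Most recently used entries live at the front of the list; the hashmap
// lets us find and splice any entry to the front in O(1).
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(int key, int value) {
        if (capacity_ == 0) return;  // degenerate cache
        auto it = map_.find(key);
        if (it != map_.end()) {
            it->second->second = value;
            order_.splice(order_.begin(), order_, it->second);  // move to front
            return;
        }
        if (order_.size() == capacity_) {  // evict the least recently used entry
            map_.erase(order_.back().first);
            order_.pop_back();
        }
        order_.emplace_front(key, value);
        map_[key] = order_.begin();
    }

    std::optional<int> get(int key) {
        auto it = map_.find(key);
        if (it == map_.end()) return std::nullopt;
        order_.splice(order_.begin(), order_, it->second);  // mark as recently used
        return it->second->second;
    }

private:
    std::size_t capacity_;
    std::list<std::pair<int, int>> order_;  // front = most recent (key, value)
    std::unordered_map<int, std::list<std::pair<int, int>>::iterator> map_;
};
```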
Comparison between different balanced BSTs
While the asymptotic insert and find times for all data structures I've seen so far that are commonly classified as "balanced BSTs" are the same, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:
- Red-black tree. Appears to be the most commonly used BBST as of 2019, e.g. it is the one used by the GCC 8.3.0 C++ implementation.
- AVL tree. Appears to be a bit more strictly balanced than the red-black tree, so it could be better for find latency, at the cost of slightly more expensive modifications. Wiki summarizes: "AVL trees are often compared with red–black trees because both support the same set of operations and take [the same] time for the basic operations. For lookup-intensive applications, AVL trees are faster than red–black trees because they are more strictly balanced. Similar to red–black trees, AVL trees are height-balanced. Both are, in general, neither weight-balanced nor mu-balanced for any mu < 1/2; that is, sibling nodes can have hugely differing numbers of descendants."
- WAVL. The original paper mentions advantages of that version in terms of bounds on rebalancing and rotation operations.
See also
Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap
Other answers
Another use of a BST over a heap, because of one important difference:
Finding the successor and predecessor in a BST takes O(h) time (O(log n) in a balanced BST), whereas in a heap it takes O(n) time to find the successor or predecessor of some element.
Use of a BST over a heap: say we use a data structure to store the landing times of flights. We cannot schedule a flight to land if the difference in landing times is less than "d". And assume many flights have already been scheduled to land in some data structure (BST or heap).
Now, we want to schedule another flight that will land at t. Hence, we need to calculate the difference of t with its successor and predecessor (it should be > d). Thus, we need a BST, which does this fast, i.e. in O(log n) if balanced.
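A small sketch of that check using std::set and lower_bound (the names canLand and landings are made up for illustration):

```cpp
#include <cmath>
#include <iterator>
#include <set>

// Returns true iff a flight landing at time t keeps a gap greater than d
// from both its predecessor and successor in the schedule. O(log n).
bool canLand(const std::set<double>& landings, double t, double d) {
    auto next = landings.lower_bound(t);  // successor: first landing >= t
    if (next != landings.end() && std::abs(*next - t) <= d) return false;
    if (next != landings.begin() && std::abs(t - *std::prev(next)) <= d) return false;
    return true;
}
```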
Edit:
A BST takes O(n) time to print the elements in sorted order (in-order traversal), whereas a heap can do it in O(n log n) time: the heap extracts the min element and re-heapifies the array, which is how it does the sorting in O(n log n) time.
As others have mentioned, a heap can do findMin or findMax in O(1), but not both in the same data structure. However, I disagree that heaps are better for findMin/findMax. In fact, with a slight modification, a BST can do both findMin and findMax in O(1).
In this modified BST, you keep track of the min node and the max node every time you perform an operation that can potentially modify the data structure. For example, in an insert operation you can check whether the min value is larger than the newly inserted value, and if so assign the min to the newly added node. The same technique can be applied to the max value. Hence, this BST carries this information and you can retrieve it in O(1) (same as a binary heap).
In this BST (balanced BST), when you pop min or pop max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node. Thus it performs in O(1). However, we need to re-balance the tree, so it will still run in O(log n) (same as a binary heap).
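A rough sketch of that bookkeeping on top of std::set (just to illustrate the idea; the class name MinMaxBst is made up, and only insert and popMin are shown):

```cpp
#include <optional>
#include <set>

// A BST (std::set) that additionally caches the current min and max and
// updates them on every modifying operation, as described above.
class MinMaxBst {
public:
    void insert(int v) {
        tree_.insert(v);
        if (!min_ || v < *min_) min_ = v;  // smaller than cached min: replace it
        if (!max_ || v > *max_) max_ = v;  // larger than cached max: replace it
    }

    std::optional<int> findMin() const { return min_; }  // O(1)
    std::optional<int> findMax() const { return max_; }  // O(1)

    void popMin() {
        if (tree_.empty()) return;
        tree_.erase(tree_.begin());  // O(log n), includes rebalancing
        if (tree_.empty()) {
            min_.reset();
            max_.reset();
        } else {
            min_ = *tree_.begin();   // next min is the old min's successor
        }
    }

private:
    std::set<int> tree_;
    std::optional<int> min_, max_;
};
```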
I would love to hear your thoughts in the comments below. Thanks :)
Update
Cross-reference to the similar question Can we use binary search tree to simulate heap operation? for more discussion on simulating a heap with a BST.
A heap is a concrete data structure used to implement a priority queue.
There is no reason to keep the branching factor fixed at 2. In general we can have d-ary trees:
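For a 0-based array layout, the index arithmetic generalizes as follows (my sketch; d is the branching factor):

```cpp
#include <cstddef>

// d-ary heap stored in a 0-based array.
std::size_t dAryParent(std::size_t i, std::size_t d) { return (i - 1) / d; }

// k-th child of node i, with k in [0, d).
std::size_t dAryChild(std::size_t i, std::size_t d, std::size_t k) {
    return d * i + k + 1;
}
```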
- Trees used for heaps are complete trees.
- Heaps are made to retrieve the highest priority element of the data quickly; binary search trees, as the name says, are for searching.
- Heaps are not good at checking whether or not an element is stored in them: you have no choice other than going through the elements until you find the one you are looking for, or you reach the end of the array. This means a linear time algorithm, whereas it is O(log(n)) for binary search trees.
- A binary search tree doesn't allow duplicates, whereas a heap does.
- A binary search tree is ordered, but a heap is not.
- Insert and remove take O(log(n)) time in a heap. In a binary search tree this may take up to O(n) time if the tree is completely unbalanced. In the image below you see two BSTs: the right one is chained, so insert and remove might take O(n), but for the left one O(log(n)).
A heap can be built in linear time (using heapify), whereas building a BST takes O(n * log(n)). Heaps are complete trees, and this is a very important fact: when you insert, you have to insert at the last position, so that after the swaps you still have a complete tree structure. Similarly, when you delete, deletion starts from the root; you remove the root and then move the last element of the tree to the root, so that the complete tree structure is preserved.
Inserting all n elements from an array into a BST takes O(n log n). The n elements of an array can be inserted into a heap in O(n) time, which gives heaps a definite advantage.
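A quick way to see that difference with the standard library (std::make_heap performs the O(n) heapify in place, while constructing std::set from an unsorted range amounts to the O(n log n) BST build):

```cpp
#include <algorithm>
#include <set>
#include <vector>

int main() {
    std::vector<int> data{5, 3, 8, 1, 9, 2};

    // O(n): heapify the array in place into a binary max-heap.
    std::make_heap(data.begin(), data.end());

    // O(n log n): n individual O(log n) insertions into a red-black tree.
    std::set<int> bst(data.begin(), data.end());
}
```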