为什么c++ STL不提供任何“树”容器,用什么代替最好?

我想将对象的层次结构存储为树,而不是使用树来增强性能……


当前回答

"我想把对象的层次结构存储为树"

c++ 11来了又走了,他们仍然认为没有必要提供std::tree,尽管这个想法确实出现了(见这里)。也许他们没有添加的原因是,在现有容器的基础上构建自己的容器非常简单。例如……

template< typename T >
struct tree_node
   {
   T t;
   std::vector<tree_node> children;
   };

一个简单的遍历将使用递归…

template< typename T >
void tree_node<T>::walk_depth_first() const
   {
   cout<<t;
   for ( auto & n: children ) n.walk_depth_first();
   }

如果您想要维护一个层次结构,并且希望它与STL算法一起工作,那么事情可能会变得复杂。您可以构建自己的迭代器并实现一些兼容性,但是许多算法对于层次结构没有任何意义(例如,任何改变范围顺序的东西)。即使是在层次结构中定义一个范围也可能是一件混乱的事情。

其他回答

std::map基于红黑树。您还可以使用其他容器来帮助您实现自己的树类型。

问题在于,没有放之四海而皆准的解决方案。此外,对于树,甚至没有一个通用的接口。也就是说,甚至不清楚这样的树数据结构应该提供哪些方法,甚至不清楚树是什么。

这解释了为什么没有STL支持:STL是用于大多数人需要的数据结构,基本上每个人都同意什么是合理的接口和有效的实现。对于树来说,这样的东西根本不存在。

血腥的细节

如果想进一步了解问题是什么,请继续阅读。否则,上面这段话已经足以回答你的问题了。

我说过,甚至没有一个共同的界面。您可能不同意,因为您脑海中只有一个应用程序,但如果您进一步思考,您将看到在树上有无数种可能的操作。您可以使用一种数据结构,它可以有效地实现大多数操作,但因此总体上更复杂,并为这种复杂性带来开销;或者您可以使用更简单的数据结构,它只允许基本操作,但尽可能快地执行这些操作。

如果你想了解完整的故事,可以看看我关于这个主题的论文。在那里,你会发现可能的接口,不同实现上的渐近复杂性,以及问题的一般描述,以及与更多可能的实现相关的工作。

树是什么?

它已经从你认为是树的东西开始:

Rooted or unrooted: most programmers want rooted, most mathematicians want unrooted. (If you wonder what unrooted is: A - B - C is a tree where either A, B, or C could be the root. A rooted tree defines which one is. An unrooted doesn't) Single root/connected or multi root/disconnected (tree or forest) Is sibling order relevant? If no, then can the tree structure internally reorder children on updates? If so, iteration order among siblings is no longer defined. But for most trees, sibiling order is actually not meaningful, and allowing the data structure to reorder children on update is very beneficial for some updates. Really just a tree, or also allow DAG edges (sounds weird, but many people who initially want a tree eventually want a DAG) Labeled or unlabled? Do you need to store any data per node, or is it only the tree structure you're interested in (the latter can be stored very succinctly)

查询操作

在我们弄清楚我们定义什么是树之后,我们应该定义查询操作:基本操作可能是“导航到子节点,导航到父节点”,但还有更多可能的操作,例如:

Navigate to next/prev sibling: Even most people would consider this a pretty basic operation, it is actually almost impossible if you only have a parent pointer or a children array. So this already shows you that you might need a totally different implementation based on what operations you need. Navigate in pre/post order Subtree size: the number of (transitive) descendants of the current node (possibly in O(1) or O(log n), i.e., don't just enumerate them all to count) the height of the tree in the current node. That is, the longest path from this node to any leave node. Again, in less than O(n). Given two nodes, find the least common ancestor of the node (with O(1) memory consumption) How many nodes are between node A and node B in a pre-/post-order traversal? (less than O(n) runtime)

我强调了这里有趣的事情是这些方法是否能比O(n)执行得更好,因为只是枚举整个树总是一个选项。根据您的应用程序,某些操作比O(n)快可能是绝对重要的,或者您可能根本不在乎。同样,根据您的需要,您将需要非常不同的数据结构。

更新操作

到目前为止,我只讨论了查询操作。现在来更新一下。同样,有多种方法可以更新树。根据你的需要,你需要一个或多或少复杂的数据结构:

叶子更新(简单):删除或添加一个叶子节点 内部节点更新(更难):移动或删除移动内部节点,使其子节点成为子节点 它的父类 子树更新(更难):移动或删除根于节点的子树

To just give you some intuition: If you store a child array and your sibling order is important, even deleting a leaf can be O(n) as all siblings behind it have to be shifted in the child array of its parent. If you instead only have a parent pointer, leaf deletion is trivially O(1). If you don't care about sibiling order, it is also O(1) for the child array, as you can simply replace the gap with the last sibling in the array. This is just one example where different data structures will give you quite different update capabilities.

在父指针的情况下,移动整个子树仍然是O(1),但如果你有一个存储所有节点的数据结构,例如,以预顺序存储,则可以是O(n)。

然后,还有一些正交的考虑因素,比如如果执行更新,哪些迭代器保持有效。一些数据结构需要使整个树中的所有迭代器失效,即使您插入了一个新的叶。其他的迭代器只会使树中被修改的部分失效。其他迭代器则保持所有迭代器(删除节点的迭代器除外)有效。

空间的考虑

树形结构可以非常简洁。如果您需要节省空间,大约每个节点2位就足够了(例如,DFUDS或LOUDS,请参阅此解释以了解要点)。当然,很简单,一个父指针已经是64位了。一旦选择了易于导航的结构,则可能需要每个节点20个字节。

With a lot of sophisication, one can also build a data structure that only takes some bits per entry, can be updated efficiently, and still enables all query operations asymptotically fast, but this is a beast of a structure that is highly complex. I once gave a practical course where I had grad students implement this paper. Some of them were able to implement it in 6 weeks (!), others failed. And while the structure has great asymptotics, its complexity makes it have quite some overhead for very simple operations.

再说一次,没有一刀切的方法。

结论

I worked 5 years on finding the best data structure to represent a tree, and even though I came up with some and there is quite some related work, my conclusion was that there is not one. Depending on the use case, a highly sophsticated data struture will be outperformed by a simple parent pointer. Even defining the interface for a tree is hard. I tried defining one in my paper, but I have to acknowledge that there are various use cases where the interface I defined is too narrow or too large. So I doubt that this will ever end up in STL, as there are just too many tuning knobs.

The STL's philosophy is that you choose a container based on guarantees and not based on how the container is implemented. For example, your choice of container may be based on a need for fast lookups. For all you care, the container may be implemented as a unidirectional list -- as long as searching is very fast you'd be happy. That's because you're not touching the internals anyhow, you're using iterators or member functions for the access. Your code is not bound to how the container is implemented but to how fast it is, or whether it has a fixed and defined ordering, or whether it is efficient on space, and so on.

所有STL容器都可以与迭代器一起使用。你不能在树中使用迭代器,因为你没有遍历树的“唯一正确”方法。

"我想把对象的层次结构存储为树"

c++ 11来了又走了,他们仍然认为没有必要提供std::tree,尽管这个想法确实出现了(见这里)。也许他们没有添加的原因是,在现有容器的基础上构建自己的容器非常简单。例如……

template< typename T >
struct tree_node
   {
   T t;
   std::vector<tree_node> children;
   };

一个简单的遍历将使用递归…

template< typename T >
void tree_node<T>::walk_depth_first() const
   {
   cout<<t;
   for ( auto & n: children ) n.walk_depth_first();
   }

如果您想要维护一个层次结构,并且希望它与STL算法一起工作,那么事情可能会变得复杂。您可以构建自己的迭代器并实现一些兼容性,但是许多算法对于层次结构没有任何意义(例如,任何改变范围顺序的东西)。即使是在层次结构中定义一个范围也可能是一件混乱的事情。