为什么c++ STL不提供任何“树”容器,用什么代替最好?

我想将对象的层次结构存储为树,而不是使用树来增强性能……


当前回答

问题在于,没有放之四海而皆准的解决方案。此外,对于树,甚至没有一个通用的接口。也就是说,甚至不清楚这样的树数据结构应该提供哪些方法,甚至不清楚树是什么。

这解释了为什么没有STL支持:STL是用于大多数人需要的数据结构,基本上每个人都同意什么是合理的接口和有效的实现。对于树来说,这样的东西根本不存在。

血腥的细节

如果想进一步了解问题是什么,请继续阅读。否则,上面这段话已经足以回答你的问题了。

我说过,甚至没有一个共同的界面。您可能不同意,因为您脑海中只有一个应用程序,但如果您进一步思考,您将看到在树上有无数种可能的操作。您可以使用一种数据结构,它可以有效地实现大多数操作,但因此总体上更复杂,并为这种复杂性带来开销;或者您可以使用更简单的数据结构,它只允许基本操作,但尽可能快地执行这些操作。

如果你想了解完整的故事,可以看看我关于这个主题的论文。在那里,你会发现可能的接口,不同实现上的渐近复杂性,以及问题的一般描述,以及与更多可能的实现相关的工作。

树是什么?

它已经从你认为是树的东西开始:

Rooted or unrooted: most programmers want rooted, most mathematicians want unrooted. (If you wonder what unrooted is: A - B - C is a tree where either A, B, or C could be the root. A rooted tree defines which one is. An unrooted doesn't) Single root/connected or multi root/disconnected (tree or forest) Is sibling order relevant? If no, then can the tree structure internally reorder children on updates? If so, iteration order among siblings is no longer defined. But for most trees, sibiling order is actually not meaningful, and allowing the data structure to reorder children on update is very beneficial for some updates. Really just a tree, or also allow DAG edges (sounds weird, but many people who initially want a tree eventually want a DAG) Labeled or unlabled? Do you need to store any data per node, or is it only the tree structure you're interested in (the latter can be stored very succinctly)

查询操作

在我们弄清楚我们定义什么是树之后,我们应该定义查询操作:基本操作可能是“导航到子节点,导航到父节点”,但还有更多可能的操作,例如:

Navigate to next/prev sibling: Even most people would consider this a pretty basic operation, it is actually almost impossible if you only have a parent pointer or a children array. So this already shows you that you might need a totally different implementation based on what operations you need. Navigate in pre/post order Subtree size: the number of (transitive) descendants of the current node (possibly in O(1) or O(log n), i.e., don't just enumerate them all to count) the height of the tree in the current node. That is, the longest path from this node to any leave node. Again, in less than O(n). Given two nodes, find the least common ancestor of the node (with O(1) memory consumption) How many nodes are between node A and node B in a pre-/post-order traversal? (less than O(n) runtime)

我强调了这里有趣的事情是这些方法是否能比O(n)执行得更好,因为只是枚举整个树总是一个选项。根据您的应用程序,某些操作比O(n)快可能是绝对重要的,或者您可能根本不在乎。同样,根据您的需要,您将需要非常不同的数据结构。

更新操作

到目前为止,我只讨论了查询操作。现在来更新一下。同样,有多种方法可以更新树。根据你的需要,你需要一个或多或少复杂的数据结构:

叶子更新(简单):删除或添加一个叶子节点 内部节点更新(更难):移动或删除移动内部节点,使其子节点成为子节点 它的父类 子树更新(更难):移动或删除根于节点的子树

To just give you some intuition: If you store a child array and your sibling order is important, even deleting a leaf can be O(n) as all siblings behind it have to be shifted in the child array of its parent. If you instead only have a parent pointer, leaf deletion is trivially O(1). If you don't care about sibiling order, it is also O(1) for the child array, as you can simply replace the gap with the last sibling in the array. This is just one example where different data structures will give you quite different update capabilities.

在父指针的情况下,移动整个子树仍然是O(1),但如果你有一个存储所有节点的数据结构,例如,以预顺序存储,则可以是O(n)。

然后,还有一些正交的考虑因素,比如如果执行更新,哪些迭代器保持有效。一些数据结构需要使整个树中的所有迭代器失效,即使您插入了一个新的叶。其他的迭代器只会使树中被修改的部分失效。其他迭代器则保持所有迭代器(删除节点的迭代器除外)有效。

空间的考虑

树形结构可以非常简洁。如果您需要节省空间,大约每个节点2位就足够了(例如,DFUDS或LOUDS,请参阅此解释以了解要点)。当然,很简单,一个父指针已经是64位了。一旦选择了易于导航的结构,则可能需要每个节点20个字节。

With a lot of sophisication, one can also build a data structure that only takes some bits per entry, can be updated efficiently, and still enables all query operations asymptotically fast, but this is a beast of a structure that is highly complex. I once gave a practical course where I had grad students implement this paper. Some of them were able to implement it in 6 weeks (!), others failed. And while the structure has great asymptotics, its complexity makes it have quite some overhead for very simple operations.

再说一次,没有一刀切的方法。

结论

I worked 5 years on finding the best data structure to represent a tree, and even though I came up with some and there is quite some related work, my conclusion was that there is not one. Depending on the use case, a highly sophsticated data struture will be outperformed by a simple parent pointer. Even defining the interface for a tree is hard. I tried defining one in my paper, but I have to acknowledge that there are various use cases where the interface I defined is too narrow or too large. So I doubt that this will ever end up in STL, as there are just too many tuning knobs.

其他回答

在我看来,这是个疏忽。但是我认为有很好的理由不在STL中包含Tree结构。维护树有很多逻辑,最好将树作为成员函数编写到基本TreeNode对象中。当TreeNode被封装在STL头中时,它会变得更加混乱。

例如:

template <typename T>
struct TreeNode
{
  T* DATA ; // data of type T to be stored at this TreeNode

  vector< TreeNode<T>* > children ;

  // insertion logic for if an insert is asked of me.
  // may append to children, or may pass off to one of the child nodes
  void insert( T* newData ) ;

} ;

template <typename T>
struct Tree
{
  TreeNode<T>* root;

  // TREE LEVEL functions
  void clear() { delete root ; root=0; }

  void insert( T* data ) { if(root)root->insert(data); } 
} ;

因为STL不是一个“万能”库。本质上,它包含了建造东西所需的最小结构。

这一个看起来很有前途,似乎是你正在寻找的: http://tree.phi-sci.com/

在某种程度上,std::map是一棵树(它被要求具有与平衡二叉树相同的性能特征),但它不公开其他树功能。不包含真正的树数据结构的原因可能只是不包含stl中的所有内容。stl可以被视为一个框架,用于实现您自己的算法和数据结构。

一般来说,如果你想要一个基本的库功能,而它不在stl中,修复方法是查看BOOST。

除此之外,还有很多库,这取决于树的需要。

使用树有两个原因:

你想用树形结构来镜像问题: 为此我们有boost图形库

或者您想要一个具有树形访问特征的容器 我们有

Std::map(和Std::multimap) Std::set(和Std::multiset)

基本上,这两个容器的特点是,它们实际上必须使用树来实现(尽管这实际上不是一个要求)。

还有这个问题: C树实现