为什么c++ STL不提供任何“树”容器,用什么代替最好?

我想将对象的层次结构存储为树,而不是使用树来增强性能……


当前回答

所有STL容器在外部都被表示为具有一种迭代机制的“序列”。 树不遵循这个成语。

其他回答

我认为没有STL树有几个原因。树是一种递归数据结构,就像容器(列表,向量,集合)一样,具有非常不同的精细结构,这使得正确的选择非常棘手。使用STL也很容易以基本形式构造它们。

一个有限根树可以被认为是一个容器,它有一个值或有效负载,比如一个类A的实例和一个可能为空的根树(子)集合;具有空子树集合的树被认为是叶树。

template<class A>
struct unordered_tree : std::set<unordered_tree>, A
{};

template<class A>
struct b_tree : std::vector<b_tree>, A
{};

template<class A>
struct planar_tree : std::list<planar_tree>, A
{};

人们必须考虑一下迭代器的设计等,以及允许在树之间定义哪些积和积操作并使之高效——原始的STL必须写得很好——这样在默认情况下,空的集合、向量或列表容器就真的是空的。

树在许多数学结构中起着至关重要的作用(参见Butcher、Grossman和Larsen的经典论文;还有康内斯和克里默的论文,他们可以结合的例子,以及他们是如何用来列举的)。认为它们的作用只是促进某些其他行动是不正确的。相反,由于它们作为数据结构的基本角色,它们有助于完成这些任务。

然而,除了树,还有“协同树”;上面这些树都有一个特性,如果你删除根,你就删除了所有东西。

考虑树中的迭代器,它们可能会被实现为一个简单的迭代器堆栈,指向一个节点,以及它的父节点……一直到根部。

template<class TREE>
struct node_iterator : std::stack<TREE::iterator>{
operator*() {return *back();}
...};

但是,你想要多少就可以有多少;它们共同组成了一个“树”,但所有的箭头都指向根,这个协同树可以通过迭代器迭代平凡迭代器和根;但是它不能向下或向下导航(它不知道其他迭代器),也不能删除迭代器集合,除非通过跟踪所有实例。

Trees are incredibly useful, they have a lot of structure, this makes it a serious challenge to get the definitively correct approach. In my view this is why they are not implemented in the STL. Moreover, in the past, I have seen people get religious and find the idea of a type of container containing instances of its own type challenging - but they have to face it - that is what a tree type represents - it is a node containing a possibly empty collection of (smaller) trees. The current language permits it without challenge providing the default constructor for container<B> does not allocate space on the heap (or anywhere else) for an B, etc.

就我个人而言,如果它以一种良好的形式出现在标准中,我会很高兴。

在某种程度上,std::map是一棵树(它被要求具有与平衡二叉树相同的性能特征),但它不公开其他树功能。不包含真正的树数据结构的原因可能只是不包含stl中的所有内容。stl可以被视为一个框架,用于实现您自己的算法和数据结构。

一般来说,如果你想要一个基本的库功能,而它不在stl中,修复方法是查看BOOST。

除此之外,还有很多库,这取决于树的需要。

问题在于,没有放之四海而皆准的解决方案。此外,对于树,甚至没有一个通用的接口。也就是说,甚至不清楚这样的树数据结构应该提供哪些方法,甚至不清楚树是什么。

这解释了为什么没有STL支持:STL是用于大多数人需要的数据结构,基本上每个人都同意什么是合理的接口和有效的实现。对于树来说,这样的东西根本不存在。

血腥的细节

如果想进一步了解问题是什么,请继续阅读。否则,上面这段话已经足以回答你的问题了。

我说过,甚至没有一个共同的界面。您可能不同意,因为您脑海中只有一个应用程序,但如果您进一步思考,您将看到在树上有无数种可能的操作。您可以使用一种数据结构,它可以有效地实现大多数操作,但因此总体上更复杂,并为这种复杂性带来开销;或者您可以使用更简单的数据结构,它只允许基本操作,但尽可能快地执行这些操作。

如果你想了解完整的故事,可以看看我关于这个主题的论文。在那里,你会发现可能的接口,不同实现上的渐近复杂性,以及问题的一般描述,以及与更多可能的实现相关的工作。

树是什么?

它已经从你认为是树的东西开始:

Rooted or unrooted: most programmers want rooted, most mathematicians want unrooted. (If you wonder what unrooted is: A - B - C is a tree where either A, B, or C could be the root. A rooted tree defines which one is. An unrooted doesn't) Single root/connected or multi root/disconnected (tree or forest) Is sibling order relevant? If no, then can the tree structure internally reorder children on updates? If so, iteration order among siblings is no longer defined. But for most trees, sibiling order is actually not meaningful, and allowing the data structure to reorder children on update is very beneficial for some updates. Really just a tree, or also allow DAG edges (sounds weird, but many people who initially want a tree eventually want a DAG) Labeled or unlabled? Do you need to store any data per node, or is it only the tree structure you're interested in (the latter can be stored very succinctly)

查询操作

在我们弄清楚我们定义什么是树之后,我们应该定义查询操作:基本操作可能是“导航到子节点,导航到父节点”,但还有更多可能的操作,例如:

Navigate to next/prev sibling: Even most people would consider this a pretty basic operation, it is actually almost impossible if you only have a parent pointer or a children array. So this already shows you that you might need a totally different implementation based on what operations you need. Navigate in pre/post order Subtree size: the number of (transitive) descendants of the current node (possibly in O(1) or O(log n), i.e., don't just enumerate them all to count) the height of the tree in the current node. That is, the longest path from this node to any leave node. Again, in less than O(n). Given two nodes, find the least common ancestor of the node (with O(1) memory consumption) How many nodes are between node A and node B in a pre-/post-order traversal? (less than O(n) runtime)

我强调了这里有趣的事情是这些方法是否能比O(n)执行得更好,因为只是枚举整个树总是一个选项。根据您的应用程序,某些操作比O(n)快可能是绝对重要的,或者您可能根本不在乎。同样,根据您的需要,您将需要非常不同的数据结构。

更新操作

到目前为止,我只讨论了查询操作。现在来更新一下。同样,有多种方法可以更新树。根据你的需要,你需要一个或多或少复杂的数据结构:

叶子更新(简单):删除或添加一个叶子节点 内部节点更新(更难):移动或删除移动内部节点,使其子节点成为子节点 它的父类 子树更新(更难):移动或删除根于节点的子树

To just give you some intuition: If you store a child array and your sibling order is important, even deleting a leaf can be O(n) as all siblings behind it have to be shifted in the child array of its parent. If you instead only have a parent pointer, leaf deletion is trivially O(1). If you don't care about sibiling order, it is also O(1) for the child array, as you can simply replace the gap with the last sibling in the array. This is just one example where different data structures will give you quite different update capabilities.

在父指针的情况下,移动整个子树仍然是O(1),但如果你有一个存储所有节点的数据结构,例如,以预顺序存储,则可以是O(n)。

然后,还有一些正交的考虑因素,比如如果执行更新,哪些迭代器保持有效。一些数据结构需要使整个树中的所有迭代器失效,即使您插入了一个新的叶。其他的迭代器只会使树中被修改的部分失效。其他迭代器则保持所有迭代器(删除节点的迭代器除外)有效。

空间的考虑

树形结构可以非常简洁。如果您需要节省空间,大约每个节点2位就足够了(例如,DFUDS或LOUDS,请参阅此解释以了解要点)。当然,很简单,一个父指针已经是64位了。一旦选择了易于导航的结构,则可能需要每个节点20个字节。

With a lot of sophisication, one can also build a data structure that only takes some bits per entry, can be updated efficiently, and still enables all query operations asymptotically fast, but this is a beast of a structure that is highly complex. I once gave a practical course where I had grad students implement this paper. Some of them were able to implement it in 6 weeks (!), others failed. And while the structure has great asymptotics, its complexity makes it have quite some overhead for very simple operations.

再说一次,没有一刀切的方法。

结论

I worked 5 years on finding the best data structure to represent a tree, and even though I came up with some and there is quite some related work, my conclusion was that there is not one. Depending on the use case, a highly sophsticated data struture will be outperformed by a simple parent pointer. Even defining the interface for a tree is hard. I tried defining one in my paper, but I have to acknowledge that there are various use cases where the interface I defined is too narrow or too large. So I doubt that this will ever end up in STL, as there are just too many tuning knobs.

The STL's philosophy is that you choose a container based on guarantees and not based on how the container is implemented. For example, your choice of container may be based on a need for fast lookups. For all you care, the container may be implemented as a unidirectional list -- as long as searching is very fast you'd be happy. That's because you're not touching the internals anyhow, you're using iterators or member functions for the access. Your code is not bound to how the container is implemented but to how fast it is, or whether it has a fixed and defined ordering, or whether it is efficient on space, and so on.

在我看来,这是个疏忽。但是我认为有很好的理由不在STL中包含Tree结构。维护树有很多逻辑,最好将树作为成员函数编写到基本TreeNode对象中。当TreeNode被封装在STL头中时,它会变得更加混乱。

例如:

template <typename T>
struct TreeNode
{
  T* DATA ; // data of type T to be stored at this TreeNode

  vector< TreeNode<T>* > children ;

  // insertion logic for if an insert is asked of me.
  // may append to children, or may pass off to one of the child nodes
  void insert( T* newData ) ;

} ;

template <typename T>
struct Tree
{
  TreeNode<T>* root;

  // TREE LEVEL functions
  void clear() { delete root ; root=0; }

  void insert( T* data ) { if(root)root->insert(data); } 
} ;