将平面表格解析为树的最有效/最优雅的方法是什么?

假设你有一个扁平的表，存储一个有序的树层次结构:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

这是一个图表，我们有[id] Name。根节点0是虚构的。

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

您将使用什么极简的方法将其输出到HTML(或文本，就此而言)，作为一个正确有序、正确缩进的树?

进一步假设您只有基本的数据结构(数组和hashmap)，没有带有父/子引用的花哨对象，没有ORM，没有框架，只有您的两只手。该表表示为一个结果集，可以随机访问。

伪代码或简单的英语是可以的，这纯粹是一个概念问题。

附加问题:在RDBMS中是否存在从根本上更好的方法来存储这样的树结构?

编辑和添加

回答一位评论者(Mark Bessey)的问题:根节点是不必要的，因为无论如何它都不会显示。ParentId = 0是表示“这些是顶级”的惯例。Order列定义了具有相同父节点的节点如何排序。

我所说的“结果集”可以被描绘成一个hashmap数组(继续使用这个术语)。因为我的例子本来就应该在那里。有些答案是额外的，首先构建它，但这没关系。

树可以任意深。每个节点可以有N个子节点。不过，我脑子里并没有“数百万条”树。

不要把我选择的节点命名(“节点1.1.1”)误认为是可以依赖的。节点也可以被称为“Frank”或“Bob”，没有隐含的命名结构，这只是为了让它更具可读性。

我已经发布了我自己的解决方案，所以你们可以把它拆成碎片。

假设你知道根元素是0，下面是输出到文本的伪代码:

function PrintLevel (int curr, int level)
    //print the indents
    for (i=1; i<=level; i++)
        print a tab
    print curr \n;
    for each child in the table with a parent of curr
        PrintLevel (child, level+1)


for each elementID where the parentid is zero
    PrintLevel(elementID, 0)

2008-10-10 16:59:37

如果可以创建嵌套的哈希映射或数组，那么我只需从表的开头往下走，并将每个项添加到嵌套数组中。我必须跟踪每一行到根节点，以便知道要插入到嵌套数组中的哪一层。我可以使用记忆，这样我就不需要一遍又一遍地查找相同的父节点。

编辑:我会先将整个表读入一个数组，这样它就不会重复查询DB。当然，如果您的表非常大，这是不实际的。

在构建结构之后，我必须先对其进行深度遍历，并打印出HTML。

没有更好的基本方法来存储这些信息使用一个表(我可能是错的;)，并希望看到一个更好的解决方案)。然而，如果您创建了一个使用动态创建的db表的方案，那么您以牺牲简单性和SQL地狱的风险为代价打开了一个全新的世界;)。

2008-10-10 17:02:40

您可以使用hashmap模拟任何其他数据结构，因此这并不是一个可怕的限制。从上到下扫描，为数据库的每一行创建hashmap，为每一列创建一个条目。将这些hashmap添加到“master”hashmap中，并以id为键。如果任何节点都有一个尚未见过的“父”节点，请在主hashmap中为它创建一个占位符条目，并在看到实际节点时填充它。

要将其打印出来，只需对数据进行简单的深度优先遍历，并在此过程中跟踪缩进级别。您可以通过为每一行保留一个“子”条目，并在扫描数据时填充它来简化这一点。

至于是否有“更好”的方法在数据库中存储树，这取决于您将如何使用数据。我曾经见过一些系统，它们具有已知的最大深度，但却为层次结构中的每个级别使用不同的表。如果树中的级别并不完全相同(顶级类别与叶类别不同)，那么这就很有意义了。

2008-10-10 17:24:34

如果有选择的话，我会用对象。我将为每条记录创建一个对象，其中每个对象都有一个子集合，并将它们全部存储在一个assoc数组(/hashtable)中，其中Id是键。并在集合中闪动一次，将子元素添加到相关的子字段中。简单。

但是因为你限制一些好的OOP的使用是没有乐趣的，我可能会基于:

function PrintLine(int pID, int level)
    foreach record where ParentID == pID
        print level*tabs + record-data
        PrintLine(record.ID, level + 1)

PrintLine(0, 0)

编辑:这与其他几个条目类似，但我认为它稍微干净一些。我要补充一点:这是非常sql密集的。这是令人讨厌的。如果可以选择的话，选择面向对象的方法。

2008-10-10 17:36:21

现在MySQL 8.0支持递归查询，我们可以说所有流行的SQL数据库都支持标准语法的递归查询。

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

我在2017年的演讲中测试了MySQL 8.0中的递归查询。

以下是我2008年的原始答案:

有几种方法可以在关系数据库中存储树形结构的数据。您在示例中展示的内容使用了两个方法:

邻接表(“父”列)和路径枚举(名称列中的虚线数字)。

另一种解决方案称为嵌套集，它也可以存储在同一个表中。阅读Joe Celko的“Smarties SQL中的树和层次结构”，了解更多关于这些设计的信息。

我通常更喜欢一种称为闭包表(又名“邻接关系”)的设计来存储树状结构的数据。它需要另一个表，但是查询树非常简单。

我在演示中用SQL和PHP构建分层数据模型时提到了闭包表，在我的书《SQL反模式第1卷:避免数据库编程的陷阱》中也提到了闭包表。

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

将所有路径存储在闭包表中，其中存在从一个节点到另一个节点的直接祖先。为每个节点包含一行以引用自身。例如，使用你在问题中展示的数据集:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

现在你可以得到一个从节点1开始的树，像这样:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

输出(在MySQL客户端中)如下所示:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

换句话说，节点3和5被排除在外，因为它们是单独层次结构的一部分，而不是从节点1向下延伸。

回复:e-satis关于直系子女(或直系父母)的评论。您可以向ClosureTable添加一个“path_length”列，以便更容易查询直接的子节点或父节点(或任何其他距离)。

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

然后，您可以在搜索中添加一个术语，以查询给定节点的直接子节点。这些是path_length为1的后代。

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+

回复来自@ashraf的评论:“把整棵树(按名字)排序怎么样?”

下面是一个示例查询，返回节点1的所有后代节点，将它们连接到包含其他节点属性(如名称)的FlatTable，并按名称排序。

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

关于@Nate的评论:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+

一位用户建议今天进行编辑。版主批准了编辑，但我要撤销它。

编辑建议上面最后一个查询中的ORDER BY应该是ORDER BY b.path_length, f.name，大概是为了确保顺序与层次结构匹配。但这行不通，因为它会将“Node 1.1.1”排在“Node 1.2”之后。

如果您希望排序以合理的方式匹配层次结构，这是可能的，但不是简单地按路径长度排序。例如，请参阅我对MySQL闭包表分层数据库的回答-如何以正确的顺序提取信息。

2008-10-10 17:58:07

这写得很快，既不漂亮也不高效(加上它自动装箱很多，在int和Integer之间转换很烦人!)，但它是有效的。

这可能打破了规则，因为我创建自己的对象，但嘿，我这样做是为了从实际工作中转移注意力:)

这还假定在开始构建Nodes之前，resultSet/table已完全读入某种结构，如果您有数十万行，这不是最佳解决方案。

public class Node {

    private Node parent = null;

    private List<Node> children;

    private String name;

    private int id = -1;

    public Node(Node parent, int id, String name) {
        this.parent = parent;
        this.children = new ArrayList<Node>();
        this.name = name;
        this.id = id;
    }

    public int getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public List<Node> getChildren() {
        return children;
    }

    public boolean isRoot() {
        return (this.parent == null);
    }

    @Override
    public String toString() {
        return "id=" + id + ", name=" + name + ", parent=" + parent;
    }
}

public class NodeBuilder {

    public static Node build(List<Map<String, String>> input) {

        // maps id of a node to it's Node object
        Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();

        // maps id of a node to the id of it's parent
        Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();

        // create special 'root' Node with id=0
        Node root = new Node(null, 0, "root");
        nodeMap.put(root.getId(), root);

        // iterate thru the input
        for (Map<String, String> map : input) {

            // expect each Map to have keys for "id", "name", "parent" ... a
            // real implementation would read from a SQL object or resultset
            int id = Integer.parseInt(map.get("id"));
            String name = map.get("name");
            int parent = Integer.parseInt(map.get("parent"));

            Node node = new Node(null, id, name);
            nodeMap.put(id, node);

            childParentMap.put(id, parent);
        }

        // now that each Node is created, setup the child-parent relationships
        for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
            int nodeId = entry.getKey();
            int parentId = entry.getValue();

            Node child = nodeMap.get(nodeId);
            Node parent = nodeMap.get(parentId);
            parent.addChild(child);
        }

        return root;
    }
}

public class NodePrinter {

    static void printRootNode(Node root) {
        printNodes(root, 0);
    }

    static void printNodes(Node node, int indentLevel) {

        printNode(node, indentLevel);
        // recurse
        for (Node child : node.getChildren()) {
            printNodes(child, indentLevel + 1);
        }
    }

    static void printNode(Node node, int indentLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("\t");
        }
        sb.append(node);

        System.out.println(sb.toString());
    }

    public static void main(String[] args) {

        // setup dummy data
        List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
        resultSet.add(newMap("1", "Node 1", "0"));
        resultSet.add(newMap("2", "Node 1.1", "1"));
        resultSet.add(newMap("3", "Node 2", "0"));
        resultSet.add(newMap("4", "Node 1.1.1", "2"));
        resultSet.add(newMap("5", "Node 2.1", "3"));
        resultSet.add(newMap("6", "Node 1.2", "1"));

        Node root = NodeBuilder.build(resultSet);
        printRootNode(root);

    }

    //convenience method for creating our dummy data
    private static Map<String, String> newMap(String id, String name, String parentId) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("id", id);
        row.put("name", name);
        row.put("parent", parentId);
        return row;
    }
}

2008-10-10 18:25:29

要扩展Bill的SQL解决方案，基本上可以使用平面数组来实现相同的功能。此外，如果你的字符串都有相同的长度，你的最大子代数是已知的(比如在一个二叉树中)，你可以使用一个单一的字符串(字符数组)。如果你有任意数量的孩子，事情就会变得复杂一些……我必须检查我的旧笔记，看看能做些什么。

然后，牺牲一点内存，特别是如果你的树是稀疏的和/或不平衡的，你可以，通过一些索引数学，通过存储你的树随机访问所有的字符串，宽度优先在数组中，就像这样(对于二叉树):

String[] nodeArray = [L0root, L1child1, L1child2, L2Child1, L2Child2, L2Child3, L2Child4] ...

你知道弦的长度，你知道我现在在工作，所以不能花太多时间在上面，但有兴趣，我可以获取一些代码来做到这一点。我们过去用它来搜索由DNA密码子组成的二叉树，一个构建树的过程，然后我们将其平铺以搜索文本模式，当找到时，尽管索引数学(从上面反向)，我们将节点找回…非常快速和有效，我们的树很少有空节点，但我们可以在一瞬间搜索千兆字节的数据。

2008-10-10 18:42:36

从Oracle 9i开始，您可以使用CONNECT BY。

SELECT LPAD(' ', (LEVEL - 1) * 4) || "Name" AS "Name"
FROM (SELECT * FROM TMP_NODE ORDER BY "Order")
CONNECT BY PRIOR "Id" = "ParentId"
START WITH "Id" IN (SELECT "Id" FROM TMP_NODE WHERE "ParentId" = 0)

从SQL Server 2005开始，您可以使用递归公共表表达式(CTE)。

WITH [NodeList] (
  [Id]
  , [ParentId]
  , [Level]
  , [Order]
) AS (
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , 0 AS [Level]
    , CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
  WHERE [Node].[ParentId] = 0
  UNION ALL
  SELECT [Node].[Id]
    , [Node].[ParentId]
    , [NodeList].[Level] + 1 AS [Level]
    , [NodeList].[Order] + '|'
      + CONVERT([varchar](MAX), [Node].[Order]) AS [Order]
  FROM [Node]
    INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[ParentId]
) SELECT REPLICATE(' ', [NodeList].[Level] * 4) + [Node].[Name] AS [Name]
FROM [Node]
  INNER JOIN [NodeList] ON [NodeList].[Id] = [Node].[Id]
ORDER BY [NodeList].[Order]

两者都将输出以下结果。

Name
'Node 1'
'    Node 1.1'
'        Node 1.1.1'
'    Node 1.2'
'Node 2'
'    Node 2.1'

2008-10-10 20:06:21

如果元素是按树顺序排列的，如你的例子所示，你可以使用以下Python示例:

delimiter = '.'
stack = []
for item in items:
  while stack and not item.startswith(stack[-1]+delimiter):
    print "</div>"
    stack.pop()
  print "<div>"
  print item
  stack.append(item)

这样做的目的是维护一个表示树中当前位置的堆栈。对于表中的每个元素，它弹出堆栈元素(关闭匹配的div)，直到找到当前项的父元素。然后它输出该节点的开始并将其推入堆栈。

如果希望使用缩进而不是嵌套元素输出树，可以简单地跳过print语句来打印div，并在每个项之前打印等于堆栈大小的若干倍的空格。例如，在Python中:

print "  " * len(stack)

您还可以轻松地使用此方法构造一组嵌套的列表或字典。

编辑:我从你的澄清中看到，这些名称并不是节点路径。这就提出了另一种方法:

idx = {}
idx[0] = []
for node in results:
  child_list = []
  idx[node.Id] = child_list
  idx[node.ParentId].append((node, child_list))

这将构造一个元组数组树(!)。Idx[0]表示树的根。数组中的每个元素都是一个二元组，由节点本身及其所有子元素的列表组成。构造后，可以保留idx[0]并丢弃idx，除非您希望通过节点的ID访问节点。

2008-10-10 21:45:05

如果您使用嵌套集(有时称为Modified preorder Tree Traversal)，您可以通过一个查询以树顺序提取整个树结构或其中的任何子树，但插入的代价更大，因为您需要管理通过树结构描述有序路径的列。

对于django-mptt，我使用了这样的结构:

id  parent_id  tree_id  level  lft  rght
--  ---------  -------  -----  ---  ----
 1       null        1      0    1    14
 2          1        1      1    2     7
 3          2        1      2    3     4
 4          2        1      2    5     6
 5          1        1      1    8    13
 6          5        1      2    9    10
 7          5        1      2    11   12

它描述了一个像这样的树(id代表每一项):

 1
 +-- 2
 |   +-- 3
 |   +-- 4
 |
 +-- 5
     +-- 6
     +-- 7

或者，作为一个嵌套的集合图，这使得left和right值的工作方式更加明显:

 __________________________________________________________________________
|  Root 1                                                                  |
|   ________________________________    ________________________________   |
|  |  Child 1.1                     |  |  Child 1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|__________________________________________________________________________|

如您所见，要获得给定节点的整个子树，按照树的顺序，您只需选择在其left和right值之间具有left和right值的所有行。检索给定节点的祖先树也很简单。

The level column is a bit of denormalisation for convenience more than anything and the tree_id column allows you to restart the lft and rght numbering for each top-level node, which reduces the number of columns affected by inserts, moves and deletions, as the lft and rght columns have to be adjusted accordingly when these operations take place in order to create or close gaps. I made some development notes at the time when I was trying to wrap my head around the queries required for each operation.

为了实际使用这些数据来显示树，我创建了一个tree_item_iterator实用函数，对于每个节点，它应该为您提供足够的信息来生成您想要的任何类型的显示。

更多关于MPTT的信息:

SQL中的树在数据库中存储分层数据在MySQL中管理分层数据

2008-10-11 12:31:22

比尔的回答非常好，这个答案增加了一些东西，这让我希望SO支持线程的答案。

无论如何，我想要支持树结构和Order属性。我在每个Node中都包含了一个名为leftSibling的属性，它所做的事情与Order在原始问题中所做的事情相同(保持从左到右的顺序)。

mysql> desc nodes ;
+-------------+--------------+------+-----+---------+----------------+
| Field       | Type         | Null | Key | Default | Extra          |
+-------------+--------------+------+-----+---------+----------------+
| id          | int(11)      | NO   | PRI | NULL    | auto_increment |
| name        | varchar(255) | YES  |     | NULL    |                |
| leftSibling | int(11)      | NO   |     | 0       |                |
+-------------+--------------+------+-----+---------+----------------+
3 rows in set (0.00 sec)

mysql> desc adjacencies;
+------------+---------+------+-----+---------+----------------+
| Field      | Type    | Null | Key | Default | Extra          |
+------------+---------+------+-----+---------+----------------+
| relationId | int(11) | NO   | PRI | NULL    | auto_increment |
| parent     | int(11) | NO   |     | NULL    |                |
| child      | int(11) | NO   |     | NULL    |                |
| pathLen    | int(11) | NO   |     | NULL    |                |
+------------+---------+------+-----+---------+----------------+
4 rows in set (0.00 sec)

更多细节和SQL代码在我的博客。

谢谢你，比尔，你的回答对我的开始很有帮助!

2010-12-22 04:31:40

考虑使用像neo4j这样的nosql工具来处理层次结构。例如，像linkedin这样的网络应用程序使用couchbase(另一个nosql解决方案)

但是nosql只能用于数据集市级别的查询，而不能用于存储/维护事务

2012-11-26 15:49:12

这是一个相当老的问题，但由于有很多观点，我认为有必要提出一个替代方案，在我看来，非常优雅的解决方案。

为了读取树结构，可以使用递归通用表表达式(CTEs)。它提供了一次获取整个树结构的可能性，有关于节点的级别，它的父节点和父节点的子节点顺序的信息。

让我向你展示这在PostgreSQL 9.1中是如何工作的。

Create a structure CREATE TABLE tree ( id int NOT NULL, name varchar(32) NOT NULL, parent_id int NULL, node_order int NOT NULL, CONSTRAINT tree_pk PRIMARY KEY (id), CONSTRAINT tree_tree_fk FOREIGN KEY (parent_id) REFERENCES tree (id) NOT DEFERRABLE ); insert into tree values (0, 'ROOT', NULL, 0), (1, 'Node 1', 0, 10), (2, 'Node 1.1', 1, 10), (3, 'Node 2', 0, 20), (4, 'Node 1.1.1', 2, 10), (5, 'Node 2.1', 3, 10), (6, 'Node 1.2', 1, 20); Write a query WITH RECURSIVE tree_search (id, name, level, parent_id, node_order) AS ( SELECT id, name, 0, parent_id, 1 FROM tree WHERE parent_id is NULL UNION ALL SELECT t.id, t.name, ts.level + 1, ts.id, t.node_order FROM tree t, tree_search ts WHERE t.parent_id = ts.id ) SELECT * FROM tree_search WHERE level > 0 ORDER BY level, parent_id, node_order;

以下是调查结果:

     id |    name    | level | parent_id | node_order 
    ----+------------+-------+-----------+------------
      1 | Node 1     |     1 |         0 |         10
      3 | Node 2     |     1 |         0 |         20
      2 | Node 1.1   |     2 |         1 |         10
      6 | Node 1.2   |     2 |         1 |         20
      5 | Node 2.1   |     2 |         3 |         10
      4 | Node 1.1.1 |     3 |         2 |         10
    (6 rows)

树节点按深度排序。在最终输出中，我们将在随后的行中显示它们。

对于每一层，它们都是根据父级中的parent_id和node_order进行排序的。这告诉我们如何在输出链接节点中将它们按此顺序呈现给父节点。

有了这样的结构，用HTML制作一个真正漂亮的演示就不难了。

递归cte可用于PostgreSQL, IBM DB2, MS SQL Server, Oracle和SQLite。

如果你想阅读更多关于递归SQL查询的内容，你可以查看你最喜欢的DBMS的文档，或者阅读我关于这个主题的两篇文章:

做它在SQL:递归树遍历了解SQL递归查询的强大功能

2014-03-13 11:19:21

有一些很好的解决方案利用了sql索引的内部btree表示。这是基于1998年左右的一些伟大的研究。

下面是一个示例表(在mysql中)。

CREATE TABLE `node` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `tw` int(10) unsigned NOT NULL,
  `pa` int(10) unsigned DEFAULT NULL,
  `sz` int(10) unsigned DEFAULT NULL,
  `nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
  PRIMARY KEY (`id`),
  KEY `node_tw_index` (`tw`),
  KEY `node_pa_index` (`pa`),
  KEY `node_nc_index` (`nc`),
  CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)

树表示中唯一需要的字段是:

tw:从左到右的DFS预购索引，其中根= 1。 pa:对父节点的引用(使用tw)，根节点为空。 sz:包括节点本身在内的节点分支的大小。 Nc:用作语法糖。它是tw+sz，表示节点的“下一个子”的tw。

下面是一个例子，24个节点填充，按tw排序:

+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|   2 | A       |  2 |    1 |   14 |   16 |
|   3 | AA      |  3 |    2 |    1 |    4 |
|   4 | AB      |  4 |    2 |    7 |   11 |
|   5 | ABA     |  5 |    4 |    1 |    6 |
|   6 | ABB     |  6 |    4 |    3 |    9 |
|   7 | ABBA    |  7 |    6 |    1 |    8 |
|   8 | ABBB    |  8 |    6 |    1 |    9 |
|   9 | ABC     |  9 |    4 |    2 |   11 |
|  10 | ABCD    | 10 |    9 |    1 |   11 |
|  11 | AC      | 11 |    2 |    4 |   15 |
|  12 | ACA     | 12 |   11 |    2 |   14 |
|  13 | ACAA    | 13 |   12 |    1 |   14 |
|  14 | ACB     | 14 |   11 |    1 |   15 |
|  15 | AD      | 15 |    2 |    1 |   16 |
|  16 | B       | 16 |    1 |    1 |   17 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
|  18 | D       | 23 |    1 |    1 |   24 |
|  19 | E       | 24 |    1 |    1 |   25 |
+-----+---------+----+------+------+------+

每个树的结果都是非递归的。例如，要获取tw='22'节点的父节点列表

的祖先

select anc.* from node me,node anc 
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw 
order by anc.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

兄弟姐妹和孩子是微不足道的-只需使用pa字段按tw排序。

的后代

例如，根在tw = 17的节点的集合(分支)。

select des.* from node me,node des 
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw 
order by des.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

额外的笔记

当读取的数量远远大于插入或更新的数量时，这种方法非常有用。

因为树中节点的插入、移动或更新需要调整树，所以在开始操作之前必须锁定表。

插入/删除成本很高，因为tw索引和sz(分支大小)值需要在插入点之后的所有节点上更新，并且需要分别对所有祖先节点更新。

分支移动涉及到将分支的tw值移出范围，因此在移动分支时禁用外键约束也是必要的。移动一个分支需要四个查询:

把树枝移出范围。填补它留下的缺口。(剩下的树现在是正常化的)。打开它要去的地方的缺口。移动树枝到它的新位置。

调整树查询

树中间隙的打开/关闭是创建/更新/删除方法使用的一个重要子函数，因此我将它包含在这里。

我们需要两个参数——一个标志表示是缩小还是扩大，另一个是节点的tw索引。因此，例如tw=18(分支大小为5)。让我们假设我们正在缩小(删除tw) -这意味着我们在下面的例子的更新中使用'-'而不是'+'。

我们首先使用一个(稍微改变的)祖先函数来更新sz值。

update node me, node anc set anc.sz = anc.sz - me.sz from 
node me, node anc where me.tw=18 
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));

然后我们需要为那些tw高于要移除的分支调整tw。

update node me, node anc set anc.tw = anc.tw - me.sz from 
node me, node anc where me.tw=18 and anc.tw >= me.tw;

然后我们需要调整那些pa的tw比要移除的分支高的父类。

update node me, node anc set anc.pa = anc.pa - me.sz from 
node me, node anc where me.tw=18 and anc.pa >= me.tw;

2017-03-14 08:43:51

基于邻接表示的动态路径枚举的预序截线

嵌套集来自:

Konchog https://stackoverflow.com/a/42781302/895245 约翰尼·布坎南https://stackoverflow.com/a/194031/895245

是我见过的唯一有效的遍历方式，但代价是更新速度较慢。这可能是大多数人想要预订的。

来自https://stackoverflow.com/a/192462/895245的闭包表很有趣，但我不知道如何强制提前:MySQL闭包表分层数据库-如何以正确的顺序拉出信息

主要是为了好玩，这里有一个递归计算1.3.2.5的方法。前缀，并在最后根据它们进行排序，仅基于父ID/子索引表示。

好处:

更新只需要更新每个兄弟节点的索引

缺点:

N ^2内存使用量对于超深树来说是最坏的情况。这可能是相当严重的，这就是为什么我说这种方法可能主要只是为了好玩。但也许在某些超高更新的情况下，有人会想要使用它?谁知道递归查询，所以读的效率比嵌套集要低

创建并填充表:

CREATE TABLE "ParentIndexTree" (
  "id" INTEGER PRIMARY KEY,
  "parentId" INTEGER,
  "childIndex" INTEGER NOT NULL,
  "value" INTEGER NOT NULL,
  "name" TEXT NOT NULL,
  FOREIGN KEY ("parentId") REFERENCES "ParentIndexTree"(id)
)
;
INSERT INTO "ParentIndexTree" VALUES
  (0, NULL, 0, 1, 'one'  ),
  (1, 0,    0, 2, 'two'  ),
  (2, 0,    1, 3, 'three'),
  (3, 1,    0, 4, 'four' ),
  (4, 1,    1, 5, 'five' )
;

代表树:

然后，对于像PostgreSQL这样的数组DBMS (https://www.postgresql.org/docs/14/arrays.html):

WITH RECURSIVE "TreeSearch" (
  "id",
  "parentId",
  "childIndex",
  "value",
  "name",
  "prefix"
) AS (
  SELECT
    "id",
    "parentId",
    "childIndex",
    "value",
    "name",
    array[0]
  FROM "ParentIndexTree"
  WHERE "parentId" IS NULL

  UNION ALL

  SELECT
    "child"."id",
    "child"."parentId",
    "child"."childIndex",
    "child"."value",
    "child"."name",
    array_append("parent"."prefix", "child"."childIndex")
  FROM "ParentIndexTree" AS "child"
  JOIN "TreeSearch" AS "parent"
    ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;

这将创建动态的表单前缀:

1 -> 0
2 -> 0, 0
3 -> 0, 1
4 -> 0, 0, 0
5 -> 0, 0, 1

然后PostgreSQL按字母顺序排序:

1 -> 0
2 -> 0, 0
4 -> 0, 0, 0
5 -> 0, 0, 1
3 -> 0, 1

这就是我们想要的预购结果。

对于像SQLite这样没有数组的DBMS，可以通过使用固定宽度的整数字符串来编码前缀。二进制是理想的，但我不知道怎么做，所以十六进制可以工作。当然，这意味着你必须选择一个最大深度，以适应所选字节的数量，例如下面我选择6，允许每个节点最多16^6个子节点。

WITH RECURSIVE "TreeSearch" (
  "id",
  "parentId",
  "childIndex",
  "value",
  "name",
  "prefix"
) AS (
  SELECT
    "id",
    "parentId",
    "childIndex",
    "value",
    "name",
    '000000'
  FROM "ParentIndexTree"
  WHERE "parentId" IS NULL

  UNION ALL

  SELECT
    "child"."id",
    "child"."parentId",
    "child"."childIndex",
    "child"."value",
    "child"."name",
    "parent"."prefix" || printf('%06x', "child"."childIndex")
  FROM "ParentIndexTree" AS "child"
  JOIN "TreeSearch" AS "parent"
    ON "child"."parentId" = "parent"."id"
)
SELECT * FROM "TreeSearch"
ORDER BY "prefix"
;

一些嵌套的集合注释

在看了其他嵌套的答案后，这里有几个点让我有点困惑。

Jonny Buchanan展示了他的嵌套设置:

__________________________________________________________________________
|  Root 1                                                                  |
|   ________________________________    ________________________________   |
|  |  Child 1.1                     |  |  Child 1.2                     |  |
|  |   ___________    ___________   |  |   ___________    ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  |  |  C 1.2.2  |  |  |
1  2  3___________4  5___________6  7  8  9___________10 11__________12 13 14
|  |________________________________|  |________________________________|  |
|__________________________________________________________________________|

这让我想知道为什么不使用更简单的外观:

__________________________________________________________________________
|  Root 1                                                                 |
|   ________________________________    _______________________________   |
|  |  Child 1.1                     |  |  Child 1.2                    |  |
|  |   ___________    ___________   |  |   ___________   ___________   |  |
|  |  |  C 1.1.1  |  |  C 1.1.2  |  |  |  |  C 1.2.1  | |  C 1.2.2  |  |  |
1  2  3___________|  4___________|  |  5  6___________| 7___________|  |  | 
|  |________________________________|  |_______________________________|  |
|_________________________________________________________________________|

每个端点都没有额外的数字。

但当我真正尝试实现它时，我注意到很难/不可能实现这样的更新查询，除非我有Konchog所使用的父级信息。问题是，当树被移动时，在某种情况下很难/不可能区分兄弟姐妹和父母，我需要这来决定是否要在缩小差距时减少右手边。

左/大小vs左/右:你可以在数据库中以任何一种方式存储它，但我认为左/右可以更有效，因为你可以用多列索引(左，右)索引DB，然后可以用来加速祖先查询，这是类型:

left < curLeft AND right > curLeft

在Ubuntu 22.04, PostgreSQL 14.5, SQLite 3.34.0上测试。

2022-09-16 07:59:35

将平面表格解析为树的最有效/最优雅的方法是什么?

推荐文章

最新文章

标签