将平面表格解析为树的最有效/最优雅的方法是什么?

假设你有一个扁平的表，存储一个有序的树层次结构:

Id   Name         ParentId   Order
 1   'Node 1'            0      10
 2   'Node 1.1'          1      10
 3   'Node 2'            0      20
 4   'Node 1.1.1'        2      10
 5   'Node 2.1'          3      10
 6   'Node 1.2'          1      20

这是一个图表，我们有[id] Name。根节点0是虚构的。

                       [0] ROOT
                          /    \ 
              [1] Node 1          [3] Node 2
              /       \                   \
    [2] Node 1.1     [6] Node 1.2      [5] Node 2.1
          /          
 [4] Node 1.1.1

您将使用什么极简的方法将其输出到HTML(或文本，就此而言)，作为一个正确有序、正确缩进的树?

进一步假设您只有基本的数据结构(数组和hashmap)，没有带有父/子引用的花哨对象，没有ORM，没有框架，只有您的两只手。该表表示为一个结果集，可以随机访问。

伪代码或简单的英语是可以的，这纯粹是一个概念问题。

附加问题:在RDBMS中是否存在从根本上更好的方法来存储这样的树结构?

编辑和添加

回答一位评论者(Mark Bessey)的问题:根节点是不必要的，因为无论如何它都不会显示。ParentId = 0是表示“这些是顶级”的惯例。Order列定义了具有相同父节点的节点如何排序。

我所说的“结果集”可以被描绘成一个hashmap数组(继续使用这个术语)。因为我的例子本来就应该在那里。有些答案是额外的，首先构建它，但这没关系。

树可以任意深。每个节点可以有N个子节点。不过，我脑子里并没有“数百万条”树。

不要把我选择的节点命名(“节点1.1.1”)误认为是可以依赖的。节点也可以被称为“Frank”或“Bob”，没有隐含的命名结构，这只是为了让它更具可读性。

我已经发布了我自己的解决方案，所以你们可以把它拆成碎片。

当前回答

现在MySQL 8.0支持递归查询，我们可以说所有流行的SQL数据库都支持标准语法的递归查询。

WITH RECURSIVE MyTree AS (
    SELECT * FROM MyTable WHERE ParentId IS NULL
    UNION ALL
    SELECT m.* FROM MyTABLE AS m JOIN MyTree AS t ON m.ParentId = t.Id
)
SELECT * FROM MyTree;

我在2017年的演讲中测试了MySQL 8.0中的递归查询。

以下是我2008年的原始答案:

有几种方法可以在关系数据库中存储树形结构的数据。您在示例中展示的内容使用了两个方法:

邻接表(“父”列)和路径枚举(名称列中的虚线数字)。

另一种解决方案称为嵌套集，它也可以存储在同一个表中。阅读Joe Celko的“Smarties SQL中的树和层次结构”，了解更多关于这些设计的信息。

我通常更喜欢一种称为闭包表(又名“邻接关系”)的设计来存储树状结构的数据。它需要另一个表，但是查询树非常简单。

我在演示中用SQL和PHP构建分层数据模型时提到了闭包表，在我的书《SQL反模式第1卷:避免数据库编程的陷阱》中也提到了闭包表。

CREATE TABLE ClosureTable (
  ancestor_id   INT NOT NULL REFERENCES FlatTable(id),
  descendant_id INT NOT NULL REFERENCES FlatTable(id),
  PRIMARY KEY (ancestor_id, descendant_id)
);

将所有路径存储在闭包表中，其中存在从一个节点到另一个节点的直接祖先。为每个节点包含一行以引用自身。例如，使用你在问题中展示的数据集:

INSERT INTO ClosureTable (ancestor_id, descendant_id) VALUES
  (1,1), (1,2), (1,4), (1,6),
  (2,2), (2,4),
  (3,3), (3,5),
  (4,4),
  (5,5),
  (6,6);

现在你可以得到一个从节点1开始的树，像这样:

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1;

输出(在MySQL客户端中)如下所示:

+----+
| id |
+----+
|  1 | 
|  2 | 
|  4 | 
|  6 | 
+----+

换句话说，节点3和5被排除在外，因为它们是单独层次结构的一部分，而不是从节点1向下延伸。

回复:e-satis关于直系子女(或直系父母)的评论。您可以向ClosureTable添加一个“path_length”列，以便更容易查询直接的子节点或父节点(或任何其他距离)。

INSERT INTO ClosureTable (ancestor_id, descendant_id, path_length) VALUES
  (1,1,0), (1,2,1), (1,4,2), (1,6,1),
  (2,2,0), (2,4,1),
  (3,3,0), (3,5,1),
  (4,4,0),
  (5,5,0),
  (6,6,0);

然后，您可以在搜索中添加一个术语，以查询给定节点的直接子节点。这些是path_length为1的后代。

SELECT f.* 
FROM FlatTable f 
  JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
  AND path_length = 1;

+----+
| id |
+----+
|  2 | 
|  6 | 
+----+

回复来自@ashraf的评论:“把整棵树(按名字)排序怎么样?”

下面是一个示例查询，返回节点1的所有后代节点，将它们连接到包含其他节点属性(如名称)的FlatTable，并按名称排序。

SELECT f.name
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id)
WHERE a.ancestor_id = 1
ORDER BY f.name;

关于@Nate的评论:

SELECT f.name, GROUP_CONCAT(b.ancestor_id order by b.path_length desc) AS breadcrumbs
FROM FlatTable f 
JOIN ClosureTable a ON (f.id = a.descendant_id) 
JOIN ClosureTable b ON (b.descendant_id = a.descendant_id) 
WHERE a.ancestor_id = 1 
GROUP BY a.descendant_id 
ORDER BY f.name

+------------+-------------+
| name       | breadcrumbs |
+------------+-------------+
| Node 1     | 1           |
| Node 1.1   | 1,2         |
| Node 1.1.1 | 1,2,4       |
| Node 1.2   | 1,6         |
+------------+-------------+

一位用户建议今天进行编辑。版主批准了编辑，但我要撤销它。

编辑建议上面最后一个查询中的ORDER BY应该是ORDER BY b.path_length, f.name，大概是为了确保顺序与层次结构匹配。但这行不通，因为它会将“Node 1.1.1”排在“Node 1.2”之后。

如果您希望排序以合理的方式匹配层次结构，这是可能的，但不是简单地按路径长度排序。例如，请参阅我对MySQL闭包表分层数据库的回答-如何以正确的顺序提取信息。

2008-10-10 17:58:07

其他回答

有一些很好的解决方案利用了sql索引的内部btree表示。这是基于1998年左右的一些伟大的研究。

下面是一个示例表(在mysql中)。

CREATE TABLE `node` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `tw` int(10) unsigned NOT NULL,
  `pa` int(10) unsigned DEFAULT NULL,
  `sz` int(10) unsigned DEFAULT NULL,
  `nc` int(11) GENERATED ALWAYS AS (tw+sz) STORED,
  PRIMARY KEY (`id`),
  KEY `node_tw_index` (`tw`),
  KEY `node_pa_index` (`pa`),
  KEY `node_nc_index` (`nc`),
  CONSTRAINT `node_pa_fk` FOREIGN KEY (`pa`) REFERENCES `node` (`tw`) ON DELETE CASCADE
)

树表示中唯一需要的字段是:

tw:从左到右的DFS预购索引，其中根= 1。 pa:对父节点的引用(使用tw)，根节点为空。 sz:包括节点本身在内的节点分支的大小。 Nc:用作语法糖。它是tw+sz，表示节点的“下一个子”的tw。

下面是一个例子，24个节点填充，按tw排序:

+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|   2 | A       |  2 |    1 |   14 |   16 |
|   3 | AA      |  3 |    2 |    1 |    4 |
|   4 | AB      |  4 |    2 |    7 |   11 |
|   5 | ABA     |  5 |    4 |    1 |    6 |
|   6 | ABB     |  6 |    4 |    3 |    9 |
|   7 | ABBA    |  7 |    6 |    1 |    8 |
|   8 | ABBB    |  8 |    6 |    1 |    9 |
|   9 | ABC     |  9 |    4 |    2 |   11 |
|  10 | ABCD    | 10 |    9 |    1 |   11 |
|  11 | AC      | 11 |    2 |    4 |   15 |
|  12 | ACA     | 12 |   11 |    2 |   14 |
|  13 | ACAA    | 13 |   12 |    1 |   14 |
|  14 | ACB     | 14 |   11 |    1 |   15 |
|  15 | AD      | 15 |    2 |    1 |   16 |
|  16 | B       | 16 |    1 |    1 |   17 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
|  18 | D       | 23 |    1 |    1 |   24 |
|  19 | E       | 24 |    1 |    1 |   25 |
+-----+---------+----+------+------+------+

每个树的结果都是非递归的。例如，要获取tw='22'节点的父节点列表

的祖先

select anc.* from node me,node anc 
where me.tw=22 and anc.nc >= me.tw and anc.tw <= me.tw 
order by anc.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|   1 | Root    |  1 | NULL |   24 |   25 |
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

兄弟姐妹和孩子是微不足道的-只需使用pa字段按tw排序。

的后代

例如，根在tw = 17的节点的集合(分支)。

select des.* from node me,node des 
where me.tw=17 and des.tw < me.nc and des.tw >= me.tw 
order by des.tw;
+-----+---------+----+------+------+------+
| id  | name    | tw | pa   | sz   | nc   |
+-----+---------+----+------+------+------+
|  17 | C       | 17 |    1 |    6 |   23 |
| 359 | C0      | 18 |   17 |    5 |   23 |
| 360 | C1      | 19 |   18 |    4 |   23 |
| 361 | C2(res) | 20 |   19 |    3 |   23 |
| 362 | C3      | 21 |   20 |    2 |   23 |
| 363 | C4      | 22 |   21 |    1 |   23 |
+-----+---------+----+------+------+------+

额外的笔记

当读取的数量远远大于插入或更新的数量时，这种方法非常有用。

因为树中节点的插入、移动或更新需要调整树，所以在开始操作之前必须锁定表。

插入/删除成本很高，因为tw索引和sz(分支大小)值需要在插入点之后的所有节点上更新，并且需要分别对所有祖先节点更新。

分支移动涉及到将分支的tw值移出范围，因此在移动分支时禁用外键约束也是必要的。移动一个分支需要四个查询:

把树枝移出范围。填补它留下的缺口。(剩下的树现在是正常化的)。打开它要去的地方的缺口。移动树枝到它的新位置。

调整树查询

树中间隙的打开/关闭是创建/更新/删除方法使用的一个重要子函数，因此我将它包含在这里。

我们需要两个参数——一个标志表示是缩小还是扩大，另一个是节点的tw索引。因此，例如tw=18(分支大小为5)。让我们假设我们正在缩小(删除tw) -这意味着我们在下面的例子的更新中使用'-'而不是'+'。

我们首先使用一个(稍微改变的)祖先函数来更新sz值。

update node me, node anc set anc.sz = anc.sz - me.sz from 
node me, node anc where me.tw=18 
and ((anc.nc >= me.tw and anc.tw < me.pa) or (anc.tw=me.pa));

然后我们需要为那些tw高于要移除的分支调整tw。

update node me, node anc set anc.tw = anc.tw - me.sz from 
node me, node anc where me.tw=18 and anc.tw >= me.tw;

然后我们需要调整那些pa的tw比要移除的分支高的父类。

update node me, node anc set anc.pa = anc.pa - me.sz from 
node me, node anc where me.tw=18 and anc.pa >= me.tw;

2017-03-14 08:43:51

如果元素是按树顺序排列的，如你的例子所示，你可以使用以下Python示例:

delimiter = '.'
stack = []
for item in items:
  while stack and not item.startswith(stack[-1]+delimiter):
    print "</div>"
    stack.pop()
  print "<div>"
  print item
  stack.append(item)

这样做的目的是维护一个表示树中当前位置的堆栈。对于表中的每个元素，它弹出堆栈元素(关闭匹配的div)，直到找到当前项的父元素。然后它输出该节点的开始并将其推入堆栈。

如果希望使用缩进而不是嵌套元素输出树，可以简单地跳过print语句来打印div，并在每个项之前打印等于堆栈大小的若干倍的空格。例如，在Python中:

print "  " * len(stack)

您还可以轻松地使用此方法构造一组嵌套的列表或字典。

编辑:我从你的澄清中看到，这些名称并不是节点路径。这就提出了另一种方法:

idx = {}
idx[0] = []
for node in results:
  child_list = []
  idx[node.Id] = child_list
  idx[node.ParentId].append((node, child_list))

这将构造一个元组数组树(!)。Idx[0]表示树的根。数组中的每个元素都是一个二元组，由节点本身及其所有子元素的列表组成。构造后，可以保留idx[0]并丢弃idx，除非您希望通过节点的ID访问节点。

2008-10-10 21:45:05

如果有选择的话，我会用对象。我将为每条记录创建一个对象，其中每个对象都有一个子集合，并将它们全部存储在一个assoc数组(/hashtable)中，其中Id是键。并在集合中闪动一次，将子元素添加到相关的子字段中。简单。

但是因为你限制一些好的OOP的使用是没有乐趣的，我可能会基于:

function PrintLine(int pID, int level)
    foreach record where ParentID == pID
        print level*tabs + record-data
        PrintLine(record.ID, level + 1)

PrintLine(0, 0)

编辑:这与其他几个条目类似，但我认为它稍微干净一些。我要补充一点:这是非常sql密集的。这是令人讨厌的。如果可以选择的话，选择面向对象的方法。

2008-10-10 17:36:21

您可以使用hashmap模拟任何其他数据结构，因此这并不是一个可怕的限制。从上到下扫描，为数据库的每一行创建hashmap，为每一列创建一个条目。将这些hashmap添加到“master”hashmap中，并以id为键。如果任何节点都有一个尚未见过的“父”节点，请在主hashmap中为它创建一个占位符条目，并在看到实际节点时填充它。

要将其打印出来，只需对数据进行简单的深度优先遍历，并在此过程中跟踪缩进级别。您可以通过为每一行保留一个“子”条目，并在扫描数据时填充它来简化这一点。

至于是否有“更好”的方法在数据库中存储树，这取决于您将如何使用数据。我曾经见过一些系统，它们具有已知的最大深度，但却为层次结构中的每个级别使用不同的表。如果树中的级别并不完全相同(顶级类别与叶类别不同)，那么这就很有意义了。

2008-10-10 17:24:34

这写得很快，既不漂亮也不高效(加上它自动装箱很多，在int和Integer之间转换很烦人!)，但它是有效的。

这可能打破了规则，因为我创建自己的对象，但嘿，我这样做是为了从实际工作中转移注意力:)

这还假定在开始构建Nodes之前，resultSet/table已完全读入某种结构，如果您有数十万行，这不是最佳解决方案。

public class Node {

    private Node parent = null;

    private List<Node> children;

    private String name;

    private int id = -1;

    public Node(Node parent, int id, String name) {
        this.parent = parent;
        this.children = new ArrayList<Node>();
        this.name = name;
        this.id = id;
    }

    public int getId() {
        return this.id;
    }

    public String getName() {
        return this.name;
    }

    public void addChild(Node child) {
        children.add(child);
    }

    public List<Node> getChildren() {
        return children;
    }

    public boolean isRoot() {
        return (this.parent == null);
    }

    @Override
    public String toString() {
        return "id=" + id + ", name=" + name + ", parent=" + parent;
    }
}

public class NodeBuilder {

    public static Node build(List<Map<String, String>> input) {

        // maps id of a node to it's Node object
        Map<Integer, Node> nodeMap = new HashMap<Integer, Node>();

        // maps id of a node to the id of it's parent
        Map<Integer, Integer> childParentMap = new HashMap<Integer, Integer>();

        // create special 'root' Node with id=0
        Node root = new Node(null, 0, "root");
        nodeMap.put(root.getId(), root);

        // iterate thru the input
        for (Map<String, String> map : input) {

            // expect each Map to have keys for "id", "name", "parent" ... a
            // real implementation would read from a SQL object or resultset
            int id = Integer.parseInt(map.get("id"));
            String name = map.get("name");
            int parent = Integer.parseInt(map.get("parent"));

            Node node = new Node(null, id, name);
            nodeMap.put(id, node);

            childParentMap.put(id, parent);
        }

        // now that each Node is created, setup the child-parent relationships
        for (Map.Entry<Integer, Integer> entry : childParentMap.entrySet()) {
            int nodeId = entry.getKey();
            int parentId = entry.getValue();

            Node child = nodeMap.get(nodeId);
            Node parent = nodeMap.get(parentId);
            parent.addChild(child);
        }

        return root;
    }
}

public class NodePrinter {

    static void printRootNode(Node root) {
        printNodes(root, 0);
    }

    static void printNodes(Node node, int indentLevel) {

        printNode(node, indentLevel);
        // recurse
        for (Node child : node.getChildren()) {
            printNodes(child, indentLevel + 1);
        }
    }

    static void printNode(Node node, int indentLevel) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < indentLevel; i++) {
            sb.append("\t");
        }
        sb.append(node);

        System.out.println(sb.toString());
    }

    public static void main(String[] args) {

        // setup dummy data
        List<Map<String, String>> resultSet = new ArrayList<Map<String, String>>();
        resultSet.add(newMap("1", "Node 1", "0"));
        resultSet.add(newMap("2", "Node 1.1", "1"));
        resultSet.add(newMap("3", "Node 2", "0"));
        resultSet.add(newMap("4", "Node 1.1.1", "2"));
        resultSet.add(newMap("5", "Node 2.1", "3"));
        resultSet.add(newMap("6", "Node 1.2", "1"));

        Node root = NodeBuilder.build(resultSet);
        printRootNode(root);

    }

    //convenience method for creating our dummy data
    private static Map<String, String> newMap(String id, String name, String parentId) {
        Map<String, String> row = new HashMap<String, String>();
        row.put("id", id);
        row.put("name", name);
        row.put("parent", parentId);
        return row;
    }
}

2008-10-10 18:25:29

将平面表格解析为树的最有效/最优雅的方法是什么?

推荐文章

最新文章

标签