Model Tree Structures with CockroachDB

Model hierarchies using the Nested Set Model with CockroachDB

·

11 min read

Overview

Modeling hierarchical tree structures in SQL databases has traditionally been difficult. At least until the introduction of Recursive Common Table Expressions (Recursive CTEs) that made it much simpler to craft queries to traverse a recursive path within a hierarchy, aka a tree structure.

A tree is an undirected, connected graph with no cycle, so when modeling product catalogs with categories for example, it's formally a special form of graph:

image.png

You don't need to use a graph database for that though. There are two main approaches to modeling tree structures in relational SQL databases:

  • Adjacency List Model
  • Nested Set Model

The Adjacency List Model with recursive CTEs provides the easiest approach. It's much less taxing on mutations of the tree structure, more performant and supports arbitrarily deep nestings / levels. The SQL is also much simpler and client code cleaner as well.

The Nested Set Model requires tracking two references per item instead of a single one in the adjacency model, making things less efficient and more difficult to implement in client code and SQL.

The Nested Set Model is not commonly used in SQL databases anymore since the introduction of recursive CTEs decades ago. That doesn't make it less interesting though, and it's likely still in use in many systems, in particular MySQL. Although completely unrelated to this, it's like exploring shortest path algorithms like Dijkstra's. Just of those things interesting to know a bit more about.

image.png

Let's look at an implementation of the nested set model using Spring Boot and CockroachDB to follow the historical order. We could have used the adjacency list model with recursive CTEs instead, but that's for a follow up post.

Source Code

The source code for this example is available in GitHub.

The Adjacency List Model

The adjacency list model is pretty straightforward. It's a parent-child relationship where each item in the table contains a pointer to its parent. The benefits are that it's simple, provides referential integrity and its also easy and efficient to update the structure using SQL.

The main drawback is that it performs poorly with deeply nested tree's (many levels), unless using recursive CTEs. Combined with CTEs, this is the preferred and portable approach to use in modern databases over the nested set model.

tree.jpg

create table category
(
    id            int          not null default unordered_unique_rowid(),
    name          varchar(64)  not null,
    description   varchar(256),
    parent_id     int,

    primary key (id)
);

Listing all the categories:

select id,
       name,
       parent_id
from category order by name;
idnameparent_id
1090122137911951370-605980017278022057985
1854157069397262337101-2505980017278022057985
3771564610750251009251-5005980017278022057985
134299851169071104161-805980017278022057985
868611770411830476981-1005980017278022057985
9138447991692328961Anis1692308957788635137
7371629562879541249Argentina9184328412896165889
4597412192419315713Australia9184328412896165889
7882788120586092545Austria3785638359585783809
1066449347072491521Available in store1942399474596052993
3447586912556285953Box1468043770094419969
5176406219513135105Bread1692308957788635137
888979374256422913Butter1692308957788635137
3175119135100370945Champagne1468043770094419969
5261939428061085697Chardonnay2664488342975152129
9124655717833506817Countrynull
939081920110919681Dessert1468043770094419969
3785638359585783809Europe9124655717833506817
7803834389618753537Findings under 1501942399474596052993
2379389375939346433France3785638359585783809
6464154237964386305Germany3785638359585783809
2664488342975152129Grapenull
7704051510374825985Honey1692308957788635137
412442238685282305Italy3785638359585783809
4060217199367028737Kanonkop3346396658428805121
1692308957788635137Labelnull
9197698474289922049Masi3346396658428805121
4113521523081609217Merlot2664488342975152129
5410171187671334913Miguel Torres3346396658428805121
9184328412896165889New World9124655717833506817
5980017278022057985Pricenull
3346396658428805121Producernull
6128495328236929025Recent News1942399474596052993
4539709822193631233Red1468043770094419969
3694686757736153089Riesling2664488342975152129
2098758824158822401Rose1468043770094419969
2381641175753031681South Africa9184328412896165889
2032612204631818241Spain3785638359585783809
6512286458981908481Sparkling1468043770094419969
1942399474596052993Specialnull
1468043770094419969Typesnull
3501067158131310593Vanilla1692308957788635137
7836344749428834305White1468043770094419969
4807111050068754433Young1692308957788635137

Tree listing using recursive CTEs

Just to show the usefulness of recursive CTEs, this query will list the entire tree structure in one single query:

with recursive category_tree (id, name, parent_id, level, path) as
                   (select id,
                           name,
                           parent_id,
                           0,
                           concat('/', name, '/')
                    from category
                    where parent_id is null -- anchor

                    union all

                    select c.id,
                           c.name,
                           c.parent_id,
                           ct.level + 1,
                           concat(ct.path, c.name, '/')
                    from category c
                             join category_tree ct
                                  on c.parent_id = ct.id)
select *
from category_tree
order by path;

To filter down on a particular sub-tree name, just change the CTE anchor query:

with recursive category_tree (id, name, parent_id, level, path) as
                   (select id,
                           name,
                           parent_id,
                           0,
                           concat('/', name, '/')
                    from category
                    where parent_id is null
                      and name = 'Label' -- anchor

                    union all

                    select c.id,
                           c.name,
                           c.parent_id,
                           ct.level + 1,
                           concat(ct.path, c.name, '/')
                    from category c
                             join category_tree ct
                                  on c.parent_id = ct.id)
select *
from category_tree
order by path;

Now, forget about the CTE approach and let's do it the hard way.

The Nested Set Model

In the Nested Set Model, each item in the table contains two pointers to track the containment of sublevels. This allows for a query to extract sub-trees in one single query rather than use recursion or one self-join per level, which is the drawback of the adjacency model without a recursive CTE.

As such, a tree hierarchy is not expressed as vertices and edges in a parent-child relationship, but instead as nested containers. Instead, picture the wine catalog like this:

tree_2.jpg

Let's update the schema:

create table category
(
    id            int          not null default unordered_unique_rowid(),
    name          varchar(64)  not null,
    description   varchar(256),
    lft           int          not null,
    rgt           int          not null,
    parent_id     int,

    primary key (id)
);
select id,
       name,
       lft,
       rgt
from category order by name;
idnamelftrgt
1090122137911951370-6023
1854157069397262337101-25045
3771564610750251009251-50067
134299851169071104161-8089
868611770411830476981-1001011
9138447991692328961Anis1415
7371629562879541249Argentina6566
4597412192419315713Australia6768
7882788120586092545Austria5354
1066449347072491521Available in store2829
3447586912556285953Box3637
5176406219513135105Bread1617
888979374256422913Butter1819
3175119135100370945Champagne3839
5261939428061085697Chardonnay8283
9124655717833506817Country5172
939081920110919681Dessert4041
3785638359585783809Europe5263
7803834389618753537Findings under 1503031
2379389375939346433France5556
6464154237964386305Germany5758
2664488342975152129Grape8188
7704051510374825985Honey2021
412442238685282305Italy5960
4060217199367028737Kanonkop7475
1692308957788635137Label1326
9197698474289922049Masi7677
4113521523081609217Merlot8687
5410171187671334913Miguel Torres7879
9184328412896165889New World6471
5980017278022057985Price112
3346396658428805121Producer7380
6128495328236929025Recent News3233
4539709822193631233Red4243
3694686757736153089Riesling8485
2098758824158822401Rose4445
2381641175753031681South Africa6970
2032612204631818241Spain6162
6512286458981908481Sparkling4647
1942399474596052993Special2734
1468043770094419969Types3550
3501067158131310593Vanilla2223
7836344749428834305White4849
4807111050068754433Young2425

The left and right values for each item is determined by traversing the graph starting from the root (there are multiple roots here) and descending to each leaf node, before assigning a number to the right and moving on. Gradually progressing from left to right. This is known as the modified preorder tree traversal (MPTT) algorithm.

Category structure example with the left and right numbers laid out:

tree_3.jpg

Tree listing using Nested Set Model

To show the nested set model in action, this is the query that will list the entire tree structure using a cross join:

select category0_.name as col_0_0_, count(category1_.name) - 1 as col_1_0_
from category category0_
         cross join category category1_
where category0_.lft between category1_.lft and category1_.rgt
group by category0_.name, category0_.lft
order by category0_.lft;
col_0_0_col_1_0_
Price0
0-601
101-2501
251-5001
61-801
81-1001
Label0
Anis1
Bread1
Butter1
Honey1
Vanilla1
Young1
Special0
Available in store1
Findings under 1501
Recent News1
Types0
Box1
Champagne1
Dessert1
Red1
Rose1
Sparkling1
White1
Country0
Europe1
Austria2
France2
Germany2
Italy2
Spain2
New World1
Argentina2
Australia2
South Africa2
Producer0
Kanonkop1
Masi1
Miguel Torres1
Grape0
Chardonnay1
Riesling1
Merlot1

To filter down on a particular sub-tree name:

SELECT node.name AS name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM category AS node,
     category AS parent,
     category AS sub_parent,
     (SELECT node.name, (COUNT(parent.name) - 1) AS depth
      FROM category AS node,
           category AS parent
      WHERE node.lft BETWEEN parent.lft AND parent.rgt
        AND node.name = 'Label'
      GROUP BY node.name, node.lft
      ORDER BY node.lft) AS sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
  AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
  AND sub_parent.name = sub_tree.name
GROUP BY node.name, node.lft, depth
ORDER BY node.lft;
namedepth
Label0
Anis1
Bread1
Butter1
Honey1
Vanilla1
Young1

The subquery could also be extracted to a CTE to improve readability a bit by rewriting the query:

with sub_tree as (SELECT node.name, (COUNT(parent.name) - 1) AS depth
                  FROM category AS node,
                       category AS parent
                  WHERE node.lft BETWEEN parent.lft AND parent.rgt
                    AND node.name = 'Label'
                  GROUP BY node.name, node.lft
                  ORDER BY node.lft)
SELECT node.name AS name, (COUNT(parent.name) - (sub_tree.depth + 1)) AS depth
FROM category AS node,
     category AS parent,
     category AS sub_parent,
     sub_tree
WHERE node.lft BETWEEN parent.lft AND parent.rgt
  AND node.lft BETWEEN sub_parent.lft AND sub_parent.rgt
  AND sub_parent.name = sub_tree.name
GROUP BY node.name, node.lft, depth
ORDER BY node.lft;

Demo Project

To explore the nested set model in a semi-realistic context, let us use a service with the following product catalog schema:

categorized_product.jpg

The entity model contains of products, product variations with attributes and categories organized into a tree structure. A fairly common layout for an online webshop and similar.

The main artifacts of interest are:

  • Category - Domain entity that models a hierarchy of categories.
  • JpaCategoryRepository - The JPA implementation using Criteria API to craft the nested set model operations.
  • ProductCatalogTest - Functional tests of the product catalog.

This implementation is using JPA with Hibernate and partly also the dreaded but type safe Criteria API which happens to be fairly suitable for this problem domain.

To take one query example, finding all products within a given non-leaf category such as "Europe" that has several country categories below it:

Client code:

    @Test
    @Order(11)
    @Transactional
    @Commit
    public void whenFindingBeveragesByInheritedCategory_expectResults() {
        // Expects beverages under 'Europe'
        Category category = categoryRepository.getByTypeAndName(DistrictCategory.class, "Europe");
        List<Product> products = productRepository.findByCategory(category);
        Assertions.assertEquals(products.size(), 1);
    }

SQL:

First for the category:

select distinct districtca0_.id          as id2_1_,
                districtca0_.description as descript3_1_,
                districtca0_.lft         as lft4_1_,
                districtca0_.name        as name5_1_,
                districtca0_.parent_id   as parent_i8_1_,
                districtca0_.rgt         as rgt6_1_,
                districtca0_.country     as country7_1_,
                categorize1_.category_id as category1_0_0__,
                categorize1_.expires_at  as expires_2_0_0__,
                categorize1_.product_id  as product_3_0_0__
from category districtca0_
         left outer join categorized_product categorize1_ on districtca0_.id = categorize1_.category_id
where districtca0_.category_type = 'DISTRICT'
  and districtca0_.name='Europe';

Then for the products in that category:

select product2_.id          as id1_2_,
       product2_.description as descript2_2_,
       product2_.name        as name3_2_,
       product2_.sku_code    as sku_code4_2_
from category category0_
         inner join categorized_product categorize1_ on category0_.id = categorize1_.category_id
         inner join product product2_ on categorize1_.product_id = product2_.id
where category0_.lft between 52 and 63;

Conclusion

Modeling tree structures in SQL databases used to be a pain in the past until recursive CTEs were introduced. Before that, the nested set model offered a more efficient method in contrast to the adjacency list model without CTEs.