Handle self reference in database using Nested Set Model

Post by Lưu Đại at 26-07-2023
When I have to handle self reference in database, lets say parent - child in a families table I often add 1 parent_id column to families, this method is called Adjacency List Model. However, if I want to fetch all parents records or all children records I have to use recursion in our code and query multiple times to database. Another way to achieve this is Recursion in SQL just like I wrote in my previous blog post. 
Adjacency List Model is good enough if the relation between records in database is simple, 1 parent have 1 or 2 children and the nested level is not too deep (1 child can have 2 - 3 ancestors). If the problem require more complex tree structure you should consider Nested Set Model.

1. Nested Set Model overview

Nested Set Model is a technique that help us handle tree structure data in relational database.
The methodology: Imagine tree structure is stored as nested circles. The children circles will be placed inside the parent and so on. After that, I numbered (ascending order) for the circle from left to right.
For example, I have a family tree:
Ví dụ gia phả
I numbered it as below:
Đánh số nested attribute
- The data will be saved to the database like this:

name          count_left  count_right group_id
Lưu Thiện     2           7           1
Lưu Bị        1           14          1
Lưu Duệ       3           4           1
Lưu Dao       5           6           1
Lưu Phong     8           9           1
Lưu Vĩnh      10          11          1
Lưu Lý        12          13          1
count_left and count_right is 2 attributes corresponding to most left number and most right number of each circle. group_id is for distinguish each family, I store this value equal to id of most ancestor (root node) - in the previous example it's will be id of Lưu Bị.

2. Use Nested Set Model

In the following code blocks there is parent_id variable, let's assume this is parent_id in a param that client send to server. Nested Set Model doesn't need parent_id but I choose to keep it to compare Nested Set Model with Adjacency List Model.
Nested Set Model will increase the amount of time create a new record (in case we already have at least 1 parent node) because it have to iterate through all the old family node and renumbered them. The code is here: 

def update_left_and_right
  parent_member = Member.find(parent_id)
  need_shift_members = Member.by_group_id(parent_member.group_id)
                             .where("count_right >= #{parent_member.count_right} OR count_left >= #{parent_member.count_right}")
  upsert_member_attributes = need_shift_members.map do |member|
                               member.count_left += 2 if member.count_left >= parent_member.count_right
                               member.count_right += 2 if member.count_right >= parent_member.count_right
                               member.attributes
                             end

  Member.upsert_all(upsert_member_attributes) if upsert_member_attributes.present?
end
for the entire file you can check the source code on github. I attach the github link in the end of this post.
Nested Set Model reduce the amount of time execute the select query. 

def ancestor_nested_set
  return unless nested_set?

  Member.by_group_id(group_id)
        .where("count_left < #{count_left} AND count_right > #{count_right}")
        .order_by_left
end

def descendant_nested_set
  return unless nested_set?

  Member.by_group_id(group_id)
        .where("count_left > #{count_left} AND count_right < #{count_right}")
        .order_by_left
end
If you want to read more about another query please check this articles

3. Summary

Nested Set Model is a method to store tree structure data.
Nested Set Model have better performance when read (query select) but worst when write (create, update parent, delete node).
There're multiple gem use NestedSetModel like act_as_tree.
Source code: https://github.com/LuuDai-bit/family-tree/blob/main/app/models/member.rb

4. References