Quick Links:
This week, we'll discuss how to design a database schema related to hierarchical data, which can be visualized as a nested structure. There are several use cases for hierarchical data.
Category and Sub-Category Classification: One of the most common use cases for hierarchical data is in organizing information into categories and sub-categories. For example, an e-commerce website may use a hierarchical structure to organize their products into different categories (such as "electronics" and "clothing"), and then further divide those categories into sub-categories (such as "laptops" and "t-shirts").
Comment and Reply Systems: Another common use case for hierarchical data is in comment and reply systems, where users can leave comments on a post or article, and other users can respond to those comments. These responses can then be nested under the original comment, creating a hierarchical structure that makes it easy to follow the conversation.
Organizational Structures with Nested Direct Reports: Hierarchical data is also useful in organizing organizational structures, particularly in companies where each position may have one or more nested direct reports. By creating a hierarchical structure for organizational data, it becomes easy to see the relationships between different positions, and to understand who reports to whom.
Multi-Level Referral Systems: Finally, as mentioned earlier, multi-level referral systems are another use case for hierarchical data. In these systems, members earn commissions by recruiting new members to join their network, and each member can have an unlimited number of nested referrals. By creating a hierarchical structure for referral data, it becomes easy to see the relationships between different members and to understand how the referral network is growing over time.
Today, we'll focus on the multi-level referral system, which is a form of MLM that is adapted to the digital business model. This use case is particularly interesting because it poses a unique challenge in terms of structuring data, as it allows for an unlimited depth of referral levels.
Understanding Multi-Level Referral Systems
A multi-level referral system is a type of marketing strategy that falls under the umbrella of Multi Level Marketing (MLM). MLM is a business model where a company compensates its members for promoting and selling its products or services, as well as for recruiting new members into the system.
It is called multi-level referral system because the hierarchical structure is generated by the network of members, with higher-level members (also known as "upline") earning commissions not only from their own sales but also from the sales of lower-level members in their downline organization. The downline can have their own downline, extending multiple levels deep. While the number of levels and commission schemes can vary depending on the company's structure, it is possible to have an unlimited number of levels.
To illustrate, imagine a person, John Stone, joins a multi-level referral system as a member. Mary then recruits two more members, Ponnappa and Mia, who become part of John downline. Ponnappa and Mia can then recruit their own downline members, extending the network further. John would earn commissions not only from her own sales but also from the sales of Ponnappa and Mia, as well as from the sales of their downline members.
Multi-level referral systems can be a lucrative opportunity for members to earn income, but they require careful management of the hierarchical network structure. A well-designed database schema can be useful in managing and organizing the data generated by this complex system, helping members and companies keep track of sales and commissions across multiple levels.
The Real-world Use Case
Suppose you're designing a multi-level referral program that has the following requirements:
Number of level can be unlimited
Commission schemas:
15% for sell products by themselves
8% from level 1 downline sales
5% from level 2 downline sales
2% from level 3 downline sales
1% from level 4 downline sales
0 % for level 5 - last
Downline can only have 1 upline
Here are the use cases you need to consider:
Adding new members to the database when they are recruited by an upline
Allowing members to view a chart of their downline organization, with each downline member visible to their respective upline members.
Providing each member with a summary of their sales and commissions, including total
commission earned
,commission breakdown by level
, andtotal sales breakdown by level
.Allowing members to request a change to their upline if their direct upline has been inactive for at least three months. For example, if
Mia Wong
becomes inactive,Nguta Ithya
,Tamzyn Frechn
,Salome Simoes
, and all of their downlines can be moved intoPonnapriya's organization
.
Database Design 1: ParentId Column
This approach is easy to read and simple when it comes to inserting or updating data. However, retrieving data becomes more challenging as this design requires complex logic.
The idea is to add upline_id
and level
as columns where upline_id
refers to another id
from the table members (similar to the parent and child concept).
Let's break down how this design solves the use cases.
Insert New Member
Suppose there is a new member who will become the downline of Nguta Ithya (id=4)
. Here is the query:
-- 1. Get Nguta Ithya's "level"
SELECT level FROM members WHERE id = 4;
-- 2. From query no. 1, the result will be "3".
-- You can use this value to insert the new member into the database
INSERT INTO members (name, upline_id, level, last_active)
VALUES ('new_member', 4, @upline_level+1, NOW());
Show Downline Organization
With this design, we need to use a recursive
function. Suppose we want to retrieve all members in Mia Wong's
organization (id = 3)
, then here is the query:
WITH RECURSIVE os (id, name, upline_id) AS (
SELECT id,
name,
upline_id
FROM members
WHERE id = 3
UNION ALL
SELECT m.id,
m.name,
m.upline_id
FROM members m
INNER JOIN os
ON m.upline_id = os.id
)
SELECT * FROM os;
This query will result in the following:
Retrieve Summary of Commission & Sales
We need to adjust how we use the recursive
function from the previous query by adding the level
and relative_level
columns to be able to calculate the members' commission. Here is the updated query:
WITH RECURSIVE
os (id, name, upline_id, level, relative_level)
AS (
select id,
name,
upline_id,
level,
0 as relative_level
from members
where id = 3
union all
select m.id,
m.name,
m.upline_id,
m.level,
os.relative_level + 1
from members m
inner join os
on m.upline_id = os.id
)
select
os.relative_level,
sum(t.total) as total_sales, sum(
CASE
WHEN os.relative_level = 0 THEN 0.15 * t.total
WHEN os.relative_level = 1 THEN 0.08 * t.total
WHEN os.relative_level = 2 THEN 0.05 * t.total
WHEN os.relative_level = 3 THEN 0.02 * t.total
WHEN os.relative_level = 4 THEN 0.01 * t.total
ELSE 0
END) commission
from os
left join transaction t on os.id = t.member_id
group by os.relative_level
This will return the following results.
To get the total commission, we just need to sum the commission values calculated in the query. In this case, the total commission would be 787
Change Upline
Let's say Mia Wong (id=3)
is inactive and all of her downlines have requested to change their upline to Ponnappa Priya. With this database design, we just need to change the upline_id
of Mia Wong's
direct downlines to the ID of Ponnappa Priya (id=2)
.
UPDATE members
SET upline_id = 2
WHERE upline_id = 3
Database Design #2: Hierarchy Column
This approach relies on a structured hierarchy that consists of several ids concatenated by “."
. With this structure, it's easier to get the level and retrieve data since we can use LIKE
syntax to filter out the members.
With this design, we need rules for the hierarchy column to make this design possible:
This column must be indexed, to make it blazingly fast.
For MySQL, there is limitation when indexing `TEXT` datatype `CREATE INDEX idx_members_hierarchy ON members (hierarchy(768))` but it is enough to create around 38 level of depthness.
For Postgres, no problem and you can indexed in without specify the length of the column `CREATE INDEX idx_members_hierarchy ON members (hierarchy)`
What is 1.3.4 on `Nguta Ithya's` data? It is hierarchy structured from id column.
“3”
means direct upline of id“4”
, which isNguta Ithya
“1”
means upline of upline for id“4”
Insert New Member
Suppose that Salome Simoes (id=6)
has recruited a new member under his downline. To add the member to the database using this design, you would need to execute the following queries
-- 1. Get Salome Simoes "hierarchy. This query will return "1.3.6"
SELECT hierarchy
FROM members
WHERE id = 6
-- 2a. [MySQL] Insert Data
START TRANSACTION;
INSERT INTO members (name, hierarchy, last_active)
VALUES ('new_member', '', NOW());
UPDATE members
SET hierarchy = CONCAT('1.3.6', '.', LAST_INSERT_ID())
WHERE id = LAST_INSERT_ID();
COMMIT;
-- 2b. [Postgres] Insert Data
BEGIN
INSERT INTO members (name, hierarchy, last_active)
VALUES ('new_member', '', NOW()) returning id;
-- Since the insert query returns the id, we can use it to update the "hierarchy".
UPDATE members
SET hierarchy = CONCAT('1.3.6', '.', 10::text)
WHERE id = 10;
COMMIT
Using a transaction ensures that all the necessary operations are executed correctly and completely, without leaving the database in an inconsistent state.
Show Downline Organization
With this design, we can use the LIKE
query for the hierarchy column. Because the hierarchy column is indexed, the query is fast.
For example, to retrieve all the members under Nguta Ithya (id=4)
, we can use the following query:
-- 1. Get Nguta Ithya's "hierarchy"
SELECT hierarchy
FROM members
WHERE id = 4
-- 2. From query no. 1, the result will be "1.3.4".
-- You can use this value to get the organization by using "LIKE"
SELECT m.*
FROM members m
WHERE
hierarchy LIKE concat('1.3.4.%')
OR hierarchy = '1.3.4';
Note that using an index on a LIKE query is tricky. In this scenario, we are always using a query with WHERE hierarchy LIKE 'keyword%,
so the indexing works. If you use a query with WHERE hierarchy LIKE '%keyword%',
the indexing will not work.
Retrieve Summary of Commission & Sales
To retrieve the summary and count the commission, we need to get the relative_level
of each member. One way to calculate the relative_level
is by counting the number of periods (.
) in the hierarchy
column and adding 1. We can achieve this using the following expression:
(LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1)
For example, if the hierarchy
of a member is 1.2.3
, then the relative_level
would be 3 (since there are 3 section in the hierarchy
string).
We can then use the relative_level
to calculate the commission based on the commission rules for each level.
-- 1. Get upline / Nguta Ithya's "hierarchy". This query will return "1.3.4"
SELECT hierarchy
FROM members
WHERE id = 4;
-- 2. Get Nguta Ithya's level. This query will return "3"
SELECT
LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1
FROM members m
WHERE id = 4;
-- 3. Use the result from 1st and 2nd query to build summary query
SELECT
(LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 as relative_level,
SUM(t.total) as total_sales,
SUM(COALESCE((
CASE
WHEN (LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 = 0
THEN 0.15 * t.total
WHEN (LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 = 1
THEN 0.08 * t.total
WHEN (LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 = 2
THEN 0.05 * t.total
WHEN (LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 = 3
THEN 0.02 * t.total
WHEN (LENGTH(m.hierarchy) - LENGTH(REPLACE(m.hierarchy, '.', '')) + 1) - 3 = 4
THEN 0.01 * t.total
ELSE 0
END),0)) commission
FROM members m
LEFT JOIN transaction t on m.id = t.member_id
WHERE hierarchy LIKE '1.3.4.%'
OR hierarchy = '1.3.4'
GROUP BY relative_level
The query above will return results like this:
Change Upline
With the same situation as in the first design, suppose Mia Wong (id=3)
is inactive and all her downline
asks to change their upline to Ponnappa Priya
. Here are the queries:
-- 1. Get Mia Wong's "hierarchy". This query result is "1.3"
SELECT hierarchy
FROM members
WHERE id = 3;
-- 2. Get Ponnappa Priya's "hierarchy". This query result is "1.2"
SELECT hierarchy
FROM members
WHERE id = 2;
-- 3. Use 1st and 2nd query to update all related hierarchy
UPDATE members
SET hierarchy = replace(hierarchy, '1.3', '1.2')
WHERE
hierarchy LIKE '1.3.%'
OR hierarchy = '1.3';
Conclusion
It's interesting to note that database design can have a significant impact on the performance of the system. The database design #1 is simple and straightforward, it may not be optimal for certain use cases where retrieving data is more important than inserting or updating it.
On the other hand, the database design #2 offers more flexibility and makes it easier to retrieve data using structured hierarchies. However, it requires more queries and careful application of the hierarchy rules to ensure consistency.
Ultimately, the choice of database design depends on the specific use case and performance requirements. In this case, since the use case is read-heavy, for me design #2 is the clear winner despite having more write queries, as it offers greater flexibility and ease of data retrieval.
Thank you for reading today's newsletter! If you find it valuable, here are some actions you can take:
1) ✉️ Subscribe — if you aren’t already, consider becoming a paid subscriber.
2) ❤️ Share — you can help spread the word by sharing the article with your team or someone who might find it useful!
Hi Mr.HERRY GUNAWAN. Thank you for your sharing. It's really helpful for me. However, I have a concern about the in this query:
WHERE
hierarchy LIKE concat('1.3.4.%')
OR hierarchy = '1.3.4';
Why do we need to use Or hierarchy='1.3.4' in this case? I'm a newbie. Could you help to explain it for me? Thank you