“Hierarchical and recursive queries in SQL”… this is quite a hot topic, so hot that it has its own wikipedia entry. The typical “hello world” is employees and managers.
I will describe a simple use case, the same one as the wikipedia page cited above. Let’s first create the (temporary) dataset:
This table contains both the employees information and their direct relationship. This way of storing information is probably good enough for applicative usage, but it is clearly limited for exploration/analysis usage.
The main ideas behing a “hierarchical and recursive query” are
- make sense of the hierarchy
- be able to retrieve for each record (here employee), its parent (here manager) record information
in a single (yet “quite” complex) query.
We have the following table (
We want, using a single query the same information the wikipedia page provides (using
CONNECT BY), that is:
- the employee id
- the absolute hierarchical position
- the relative hierarchical position to the branch: the tree (presented as left padding)
- the manager
idname, the manager id would be too easy too retrieve from
The way to do it with Microsoft SQL Server dialect – T-SQL – is to use a CTE (Common Table Expression) query:
level part comes from a built-in feature and is understood by the parser/optimzer when parsing the query. The “tree” is constructed using the row_number() aggregating function. The basic idea is too build the hierarchy as a sorting-ready string field for examples: KING is
188.8.131.52 whereare FORD is
1.1.2 etc. We get the “direct manager” just by following the query relationship as
d.employee where d is our
cte. The only trick to add is in the last step: we don’t need to know that KING is his own manager:
Let’s now add a new head manager and a subordinate in the dataset:
The query behaves as expected, but we somehow lack one piece of information: the top manager. Let’s add it to the query:
top_manager also comes from a built-in feature, and we use the same trick to avoid displaying non-useful information.
We have seen the typical example with an employees table but there are much more situations where the hierarchy between records is stored within the same table. A common example in the e-commerce business is credits granted to customers and their usage. For example, a customer is granted 100 USD that yields a record with
id=1, amount=+100, parent_id=null in a table somewhere. The customer starts using 60 USD yielding a record with
id=6, amount=-60, parent_id=1 and then uses it
again for 40 USD:
id=200, amount=-40, parent_id=6. Now we can with a single query see the complete credit usage for the initial deposit id=1.