Tuesday, August 18, 2015

Recursive Select - tree traversal

Walking a tree is not that big of a challenge. There are a few different variations in the SQL language but they are similar enough.

Given a schema that looks like:
create table tree( id int, parentid int, type int);
SQLite:
;with recursive    tree_cte (id, parentid, type) as (       select 1,0,0       union all       select t.id, t.parentid, t.type         from tree t         inner join tree_cte tc on t.parentid=tc.id     )     select * from tree_cte tc     ;
SQL Server:
;with     tree_cte (id, parentid, type) as (       select id, parentid, type from (values(1,0,0) X(id, parentid, type, path))
       union all       select t.id, t.parentid, t.type         from tree t         inner join tree_cte tc on t.parentid=tc.id     )     select * from tree_cte tc     ;
The differences are '+' instead of '||' and "recursive" and a little of the initiator SQL in the SQL Server version.

In a project I'm working on I need to create a graph of the entire tree, however, there are a few constraints. (a) the graph is huge so limit the graph as soon as possible; (b) only report full paths (no downline nodes); (c) a downline node can only be of type '1'. (NOTE: nodes of type 1 will not be separated by other node types.)

The original query returned:
1|0|02|1|13|2|14|2|15|2|16|4|27|4|38|4|19|4|110|9|111|9|112|9|1
Nothing special to see, it's the same as if I had selected everything:
select * from tree
Let's look at the individual paths:
;with recursive    tree_cte (id, parentid, type, path) as (       select 1,0,0,'1'       union all       select t.id, t.parentid, t.type, tc.path || '.' || cast(t.id as varchar)         from tree t         inner join tree_cte tc on t.parentid=tc.id     )     select * from tree_cte tc
     ;
With this output:
1|0|0|12|1|1|1.23|2|1|1.2.34|2|1|1.2.45|2|1|1.2.56|4|2|1.2.4.67|4|3|1.2.4.78|4|1|1.2.4.89|4|1|1.2.4.910|9|1|1.2.4.9.1011|9|1|1.2.4.9.1112|9|1|1.2.4.9.12
But according to my rules there are a few extra lines:
*1|0|0|1*2|1|1|1.23|2|1|1.2.3*4|2|1|1.2.45|2|1|1.2.5+6|4|2|1.2.4.6+7|4|3|1.2.4.78|4|1|1.2.4.8*9|4|1|1.2.4.910|9|1|1.2.4.9.1011|9|1|1.2.4.9.1112|9|1|1.2.4.9.12
'+' when the node was the wrong type
'*' when the node was not a leaf node

This is the final code unless someone posts a recommendation... it is pretty simple:
;with recursive    tree_cte (id, parentid, type, path) as (       select 1,0,0,'1'       union all       select t.id, t.parentid, t.type, tc.path || '.' || cast(t.id as varchar)         from tree t         inner join tree_cte tc on t.parentid=tc.id         where t.type =1     )     select * from tree_cte tc     where not exists (select 1 from tree t where t.parentid=tc.id and type=1)
     ;
and it produced:
3|2|1|1.2.35|2|1|1.2.58|4|1|1.2.4.810|9|1|1.2.4.9.1011|9|1|1.2.4.9.1112|9|1|1.2.4.9.12
The differences being the where clause in the recursive SQL in the CTE and the conditional check in the outer select.
(1) where t.type =1
and
(2) where not exists (select 1 from tree t where t.parentid=tc.id and type=1)
(1) being some sort of a sentinel terminator for quick termination of the recursion instead of exhausting the data.
(2) when there is a mix of nodes where the non-desired nodes appear at different depths in the tree we need to make sure that the perceived leaf node is actually a leaf or the last of it's type.

Feels good to me. Now I have to put this into practice.

No comments:

Post a Comment

Fire OS and Android

The relationship between Amazon's FireOS and Google's Android OS is probably very complicated. And while there is probably some very...