Saturday, January 2, 2016

Basic recursive Postgres updates

I recently came across a need to update a large amount of Postgres data (many millions of rows) laid out in a hierarchical format with potentially inherited attributes. To make the use case generic, consider the following structure that a bookstore might employ:
db=# create table object_type (id bigint, parent_id bigint,
label text, location text, direct_location boolean);
db=# select * from object_type;
id | parent_id | label | location | direct_location
-----+-----------+-------------+------------+-----------------
100 | 0 | book | main floor | t
101 | 100 | non-fiction | main floor | f
102 | 101 | history | main floor | f
103 | 102 | ww2 | main floor | t
104 | 101 | biography | 2nd floor | t
105 | 100 | fiction | main floor | f
view raw gistfile1.txt hosted with ❤ by GitHub
If it's not clear, this is the structure being modeled. Locations with asterisks indicate those locations are not being inherited - ie, they are direct. Note 'ww2' books just happen to have the same location as its logical parent ('history' books).  This is intentional just to show a potential condition.
item
book (main floor)*
non-fiction (main floor)
history (main floor)
ww2 (main floor)*
biography (2nd floor)*
fiction (main floor)
view raw gistfile1.txt hosted with ❤ by GitHub
Let's say we want to update the 'book' location from 'main floor' to 'new floor'. This should cascade to all object_types which inherit their location from 'book'. The query (pardon the bizarre cast to bigint) utilizes a WITH RECURSIVE CTE to process through an adjacency tree:
update object_type set location = 'new floor' where id in (
with recursive tree (child) as (
values(cast(100 as bigint))
union
select ot.id from object_type ot, tree where ot.parent_id = child and ot.direct_location = false)
select child from tree)
view raw gistfile1.txt hosted with ❤ by GitHub
Perhaps more readable as a JPA named query. Note the inner query is traversing the hierarchy looking for candidates to give to the outer UPDATE.
<named-native-query name="ObjectType.updateLocation">
<query>
<![CDATA[
update object_type set location = :location where id in (
with recursive tree (child) as (
values(:parent_id)
union
select ot.id from object_type ot, tree where ot.parent_id = child and ot.direct_location = false)
select child from tree)
]]>
</query>
</named-native-query>
view raw gistfile1.txt hosted with ❤ by GitHub
The result shows the 4 relevant rows ('books' and those which descend from 'books' AND inherit their location) are updated:
db=# select * from object_type;
id | parent_id | label | location | direct_location
-----+-----------+-------------+------------+-----------------
103 | 102 | ww2 | main floor | t
104 | 101 | biography | 2nd floor | t
100 | 0 | book | new floor | t
101 | 100 | non-fiction | new floor | f
102 | 101 | history | new floor | f
105 | 100 | fiction | new floor | f
view raw gistfile1.txt hosted with ❤ by GitHub