What are the Options for Storing Hierarchical 分层 Data in a Relational Database?
Generally speaking you’re making a decision between fast read times (e.g. nested set) or fast write times (adjacency list). Usually you end up with a combination of the options below that best fit your needs. The following provides some in depth reading:
- One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I’ve found.
- Models for hierarchical data: slides with good explanations of tradeoffs and example usage
- Representing hierarchies in MySQL: very good overview of Nested Set in particular
- Hierarchical data in RDBMSs: most comprehensive and well organized set of links I’ve seen, but not much in the way on explanation
Options
Ones I am aware of and general features:
- Adjacency List:
- Columns: ID, ParentID
- Easy to implement.
- Cheap node moves, inserts, and deletes.
- Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).
- Use Common Table Expressions in those databases that support them to traverse.
- Nested Set (a.k.a Modified Preorder Tree Traversal)
- First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties
- Columns: Left, Right
- Cheap level, ancestry, descendants
- Compared to Adjacency List, moves, inserts, deletes more expensive.
- Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
- Nested Intervals
- Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information. In the later development of this idea nested intervals gave rise to matrix encoding.
- Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach)
- Columns: ancestor, descendant
- Stands apart from table it describes.
- Can include some nodes in more than one hierarchy.
- Cheap ancestry and descendants (albeit not in what order)
- For complete knowledge of a hierarchy needs to be combined with another option.
- Flat Table
- A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
- Expensive move and delete
- Cheap ancestry and descendants
- Good Use: threaded discussion - forums / blog comments
- Lineage Column (a.k.a. Materialized Path, Path Enumeration)
- Column: lineage (e.g. /parent/child/grandchild/etc…)
- Limit to how deep the hierarchy can be.
- Descendants cheap (e.g.
LEFT(lineage, #) = '/enumerated/path'
) - Ancestry tricky (database specific queries)
Database Specific Notes
MySQL
Oracle
- Use CONNECT BY to traverse Adjacency Lists
PostgreSQL
- ltree datatype for Materialized Path
SQL Server
- General summary
- 2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.
Some articles from my blog on the subject: