The Category Problem: graph traversal by recursive SQL queries
Published on Oct 17, 2016 • Ruslan Ledesma-Garza
An interview question in the wild, the Category Problem asks to model a graph of categories of goods in SQL. The sample solution involves a couple of recursive SQL queries that traverse the edges of the model graph. The sample solution works in PostgreSQL 9.6. This post includes a reference implementation in Ruby.
Problem
The problem asks you to implement in SQL the following rules.
There are products. Each product has a name.
There are categories. Each category has a name.
Each product may belong to one or more categories.
Each category may have subcategories.
Categories that have products do not have subcategories.
Each category has at most one parent.
You may ask a product for its corresponding categories.
You may ask a category for its corresponding products.
Consider the following diagram of an example state consisting of products and categories.
In the diagram, a box corresponds to a product and a circle to a category.
Arrows start from a parent category and end in a child category.
This is relationsubcategory.
Dotted lines connect categories with their corresponding products.
This is relation belongs-to.
Your database must forbid violations of the rules.
Consider the following examples.
By rule 5, it must not be possible to add products to category Food.
By rule 6, it must not be possible for Fast Food to be subcategory of Food and Home Delivered Goods.
The categories of Sandwich are Gas Station Food and Food.
The categories of Pizza are Fast Food and Food.
The only product of category Fast Food is Pizza.
The products of category Food are Sandwich and Pizza.
Solution
The sample solution is the following.
The sample solution consists of two parts.
The first part is the construction of the database schema required by rules 1 to 6.
The second part is the construction of two recursive queries, one for rule 7 and one for rule 8.
Section Reference implementation in Ruby includes a reference implementation in Ruby that you can use to understand the semantics of the sample solution.
Database schema
The following is an explanation of the construction of the database schema.
1. Each product has a name.
We create a table of products.
Example usage.
Gives.
2. Each category has a name.
We create a table of categories.
Example usage.
Gives.
3. Each product may belong to one or more categories.
Meaning, each product may or may not belong to one or more categories.
We assume that categories may correspond to multiple products.
Given these reasons, relation belongs-to is a many-to-many relation.
The following associative table (a.k.a. cross-reference table) corresponds to relation belongs-to.
Example usage.
Gives.
4. Each category may have subcategories. AND 6. Each category has at most one parent.
We address rules 4 and 6 in one step.
These rules mean that relation subcategory is a one-to-many relation.
We implement relation subcategory by appending an additional attribute to table categories.
Example usage.
Gives.
5. Categories that have products do not have subcategories.
We implement this rule by functions if_no_sub_c and if_no_prod.
We constraint relation belongs-to by function if_no_sub_c.
Example usage.
Gives.
We constraint relation subcategory by function if_no_prod.
Example usage.
Gives.
Recursive queries
The following is an explanation of the two recursive queries.
7. You may ask a product for its corresponding categories.
This rule asks for those categories that are ancestors of a given product in relation subcategory.
We compute the ancestor categories by function categories.
For a given product, the computation starts from the categories that contain the product.
Those categories are given by the following SELECT statement.
The SELECT statement is the base case of a recursive WITH clause.
The recursive step consists in selecting those categories that are parent of some category given by a previous recursive call.
The the recursive step corresponds to the following SELECT statement.
The result of the recursive WITH clause is projected onto the name before returning by the following SELECT statement.
Example usage.
Gives.
8. You may ask a category for its corresponding products.
This rule asks that we collect the products of categories that are descendant of given category in relation subcategory.
The following function products implements the rule.
Example usage.
Gives.
Reference implementation in Ruby
The reference implementation is the following.
Example usage.
Gives.
Click here for an explanation of the reference implementation.
1. Each product has a name.
We create a class of products.
Example usage.
Gives.
2. Each category has a name.
We create a class of categories.
Example usage.
Gives.
3. Each product may belong to one or more categories.
Given a category, we record the products that belong to the category in array Category#@products.
We append products to a category by means of method Category#add_product.
Example usage.
Gives.
4. Each category may have subcategories.
Given a category, we record the children categories in array Category#@children.
We append subcategories by means of method Category#add_category.
Example usage.
Gives.
5. Categories that have products do not have subcategories.
This is a constraint on relations _belongs-to_ and _subcategory_.
We constraint each relation by constraining corresponding methods Category#add_product and Category#add_category.
Example usage.
Gives.
6. Each category has at most one parent.
This is a constraint on relation _subcategory_.
We apply the constraint by constraining method Category#add_category.
We create instance variable Category#@parent to label each category with its parent.
Example usage.
Gives.
7. You may ask a product for its corresponding categories.
This rule asks for those categories that are ancestors of a given product in relation _subcategory_.
We compute the ancestor categories by method Product#categories.
For a given product, the computation starts from the categories that contain the product.
Those categories are recorded in array Product#@parent_categories.
The array is populated by method Category#add_product.
Example usage.
Gives.
8. You may ask a category for its corresponding products.
This rule asks that we collect the products of categories that are descendant of a given category in relation _subcategory_.
We do so by traversing relation _subcategory_ from the given category all the way down to leaf descendants.
We do the traversal and collect corresponding products by method Category#all_products.
Example usage.
Gives.
Other uses of WITH clauses
Apply several data modifications in the same query (section 7.8.2).
Apply recursive self-references in a data-modifying query (section 7.8.2).