# The Category Problem: graph traversal by recursive SQL queries

An interview question in the wild, the Category Problem asks to model a graph of categories of goods in SQL. The sample solution involves a couple of recursive SQL queries that traverse the edges of the model graph. The sample solution works in PostgreSQL 9.6. This post includes a reference implementation in Ruby.

# Problem

The problem asks you to implement in SQL the following rules.

- There are products. Each product has a name.
- There are categories. Each category has a name.
- Each product may belong to one or more categories.
- Each category may have subcategories.
- Categories that have products do not have subcategories.
- Each category has at most one parent.
- You may ask a product for its corresponding categories.
- You may ask a category for its corresponding products.

Consider the following diagram of an example state consisting of products and categories.

In the diagram, a box corresponds to a product and a circle to a category.
Arrows start from a parent category and end in a child category.
This is relation *subcategory*.
Dotted lines connect categories with their corresponding products.
This is relation *belongs-to*.

Your database must forbid violations of the rules. Consider the following examples. By rule 5, it must not be possible to add products to category Food. By rule 6, it must not be possible for Fast Food to be subcategory of Food and Home Delivered Goods.

The categories of Sandwich are Gas Station Food and Food. The categories of Pizza are Fast Food and Food.

The only product of category Fast Food is Pizza. The products of category Food are Sandwich and Pizza.

# Solution

The sample solution is the following.

The sample solution consists of two parts. The first part is the construction of the database schema required by rules 1 to 6. The second part is the construction of two recursive queries, one for rule 7 and one for rule 8. Section Reference implementation in Ruby includes a reference implementation in Ruby that you can use to understand the semantics of the sample solution.

# Database schema

The following is an explanation of the construction of the database schema.

**1. Each product has a name.**

We create a table of products.

Example usage.

Gives.

**2. Each category has a name.**

We create a table of categories.

Example usage.

Gives.

**3. Each product may belong to one or more categories.**

Meaning, each product may or may not belong to one or more categories.
We assume that categories may correspond to multiple products.
Given these reasons, relation *belongs-to* is a many-to-many relation.
The following associative table (a.k.a. cross-reference table) corresponds to relation *belongs-to*.

Example usage.

Gives.

**4. Each category may have subcategories.**

**AND**

6. Each category has at most one parent.

We address rules 4 and 6 in one step.
These rules mean that relation *subcategory* is a one-to-many relation.
We implement relation *subcategory* by appending an additional attribute to table `categories`

.

Example usage.

Gives.

**5. Categories that have products do not have subcategories.**

We implement this rule by functions `if_no_sub_c`

and `if_no_prod`

.

We constraint relation *belongs-to* by function `if_no_sub_c`

.

Example usage.

Gives.

We constraint relation *subcategory* by function `if_no_prod`

.

Example usage.

Gives.

# Recursive queries

The following is an explanation of the two recursive queries.

**7. You may ask a product for its corresponding categories.**

This rule asks for those categories that are ancestors of a given product in relation *subcategory*.
We compute the ancestor categories by function `categories`

.

For a given product, the computation starts from the categories that contain the product. Those categories are given by the following SELECT statement.

The SELECT statement is the base case of a recursive WITH clause. The recursive step consists in selecting those categories that are parent of some category given by a previous recursive call. The the recursive step corresponds to the following SELECT statement.

The result of the recursive WITH clause is projected onto the name before returning by the following SELECT statement.

Example usage.

Gives.

**8. You may ask a category for its corresponding products.**

This rule asks that we collect the products of categories that are descendant of given category in relation *subcategory*.
The following function `products`

implements the rule.

Example usage.

Gives.

# Reference implementation in Ruby

The reference implementation is the following.

Example usage.

Gives.

## Click here for an explanation of the reference implementation.

**1. Each product has a name.**

We create a class of products. Example usage. Gives.

**2. Each category has a name.**

We create a class of categories. Example usage. Gives.

**3. Each product may belong to one or more categories.**

Given a category, we record the products that belong to the category in array

`Category#@products`

.
We append products to a category by means of method `Category#add_product`

.
Example usage.
Gives.
**4. Each category may have subcategories.**

Given a category, we record the children categories in array

`Category#@children`

.
We append subcategories by means of method `Category#add_category`

.
Example usage.
Gives.
**5. Categories that have products do not have subcategories.**

This is a constraint on relations _belongs-to_ and _subcategory_. We constraint each relation by constraining corresponding methods

`Category#add_product`

and `Category#add_category`

.
Example usage.
Gives.
**6. Each category has at most one parent.**

This is a constraint on relation _subcategory_. We apply the constraint by constraining method

`Category#add_category`

.
We create instance variable `Category#@parent`

to label each category with its parent.
Example usage.
Gives.
**7. You may ask a product for its corresponding categories.**

This rule asks for those categories that are ancestors of a given product in relation _subcategory_. We compute the ancestor categories by method

`Product#categories`

.
For a given product, the computation starts from the categories that contain the product.
Those categories are recorded in array `Product#@parent_categories`

.
The array is populated by method `Category#add_product`

.
Example usage.
Gives.
**8. You may ask a category for its corresponding products.**

This rule asks that we collect the products of categories that are descendant of a given category in relation _subcategory_. We do so by traversing relation _subcategory_ from the given category all the way down to leaf descendants. We do the traversal and collect corresponding products by method

`Category#all_products`

.
Example usage.
Gives.
# Other uses of WITH clauses

- Apply several data modifications in the same query (section 7.8.2).
- Apply recursive self-references in a data-modifying query (section 7.8.2).

# Want to read more?

I love to explain and answer questions on programming problems. I publish a new programming problem and its solution every month. Did I mention that I love to answer questions?

If you would like to get the latest problem + solution, subscribe to the newsletter or subscribe via RSS. You can also follow me on Twitter, GitHub, and LinkedIn.