Advanced Product Filtering in Ecommerce
One of my recent projects for Paper Source has been to introduce advanced product filtering (or faceted filtering). Paper Source runs on Interchange, a Perl-based open source ecommerce platform that End Point has been involved with (as core developers & maintainers) for many years.
In the case of Paper Source, personalized products such as wedding invitations and save the dates can be filtered by print method, number of photos, style, etc. Advanced product filtering is a very common feature in ecommerce systems with a large number of products; it allows a user to narrow down a set of products to meet their needs. It is not unlike the faceted filtering offered by many search engines, which similarly allows a user to narrow down products based on specific tags or facets (e.g. see the many Amazon filters in the left column). In the case of Paper Source, I wrote the filtering code layered on top of the current navigation. Below I’ll go through some of the details with small code examples.
The best place to start is the data model. A simplified existing data model that represents product taxonomy might look like the following:
Basic data model linking categories to products.
The existing data model links products to categories via a many-to-many relationship. This is fairly common in the ecommerce space: when a user views a specific category, often identified by URL slug or id, the products tied to that category are displayed.
And here’s where we go with the filtering:
Data model with filtering layered on top of existing category to product relationship.
Some notes on the above filtering data model:
- filters contains a list of all the filters. Examples of entries in this table include “Style”, “Color”, “Size”
- filters_categories links filters to categories, to allow fine-grained control over which filters show on which category pages, and in what order. For example, this table would link category “Shirts” to filters “Style”, “Color”, “Size” and the preferred sort order of those filters.
- filter_options includes all the options for a specific filter. Examples here for various options include “Large”, “Medium”, and “Small”, all linked to the “Size” filter.
- filter_options_products links filter options to a specific product id with a many-to-many relationship.
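As a rough sketch of the tables in these notes, the snippet below creates them in an in-memory SQLite database. It is written in Python purely for illustration (the site itself runs on Perl), and the column names (name, url_slug, sort_order and so on) are assumptions; only the table names come from the notes above, and the real schema may well differ.

```python
import sqlite3

# Hypothetical columns for illustration; only the table names follow the notes above.
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE categories (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    url_slug TEXT NOT NULL UNIQUE
);
CREATE TABLE categories_products (
    category_id INTEGER NOT NULL REFERENCES categories(id),
    product_id  INTEGER NOT NULL REFERENCES products(id),
    PRIMARY KEY (category_id, product_id)
);
CREATE TABLE filters (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE filters_categories (
    filter_id   INTEGER NOT NULL REFERENCES filters(id),
    category_id INTEGER NOT NULL REFERENCES categories(id),
    sort_order  INTEGER NOT NULL DEFAULT 0,  -- display order on the category page
    PRIMARY KEY (filter_id, category_id)
);
CREATE TABLE filter_options (
    id INTEGER PRIMARY KEY,
    filter_id INTEGER NOT NULL REFERENCES filters(id),
    name TEXT NOT NULL,
    url_slug TEXT NOT NULL UNIQUE
);
CREATE TABLE filter_options_products (
    filter_option_id INTEGER NOT NULL REFERENCES filter_options(id),
    product_id       INTEGER NOT NULL REFERENCES products(id),
    PRIMARY KEY (filter_option_id, product_id)
);
"""


def create_schema(conn):
    """Create the category and filter tables on an open SQLite connection."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = {
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
}
```

The composite primary keys on the three linking tables keep a product from being attached to the same category or filter option twice.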
Filter Options Exclusivity
One thing to consider before coding is the set of business rules pertaining to filter option exclusivity. If a product is assigned to one filter option, can it also have another filter option for that same filter type? For example, if a product is marked as blue, can it also be marked as red? When a user filters by color, can they select products that are both blue and red? Or, if a product is blue, can it not have any other filter options for that filter? In the case of Paper Source product filtering, we went with the former, where filter options are not exclusive to each other.
A real-life example of filter non-exclusivity is how Paper Source filters wedding invitations. Products are filtered by print method and style. Because some products have multiple print methods and styles, non-exclusivity allows a user to narrow down to a specific combination of filter options, e.g. a wedding invitation that is both tagged as “foil & embossed” and “vintage”.
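To make the two rules concrete, here is a small Python sketch (Python is used only for illustration) that reports which filters a product would violate if options were exclusive. The dictionary shape and the function name are hypothetical and not taken from the Paper Source code.

```python
def exclusivity_violations(product_options):
    """Return the filters for which one product carries more than one option.

    product_options maps a filter name to the set of option slugs assigned to
    a single product (a hypothetical in-memory shape, not the site's schema).
    """
    return sorted(
        name for name, options in product_options.items() if len(options) > 1
    )


# A shirt tagged with two colors: fine under the non-exclusive rule,
# but it would be rejected if color options had to be exclusive.
shirt = {"color": {"blue", "red"}, "size": {"large"}}
violations = exclusivity_violations(shirt)
```

Under the non-exclusive rule chosen here, a non-empty result is perfectly valid data; under an exclusive rule it would signal a product that needs to be corrected.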
Another thing to determine before coding is the URL structure. The URL must communicate the current category of products and current filter options (or what I refer to as active/activated filters).
I designed the code to recognize the first component of the URL path as the category slug, and to map the remaining path components to filter option URL slugs. For example, the URL for the category of shirts is “/shirts”, the URL for large shirts is “/shirts/large”, and the URL for large blue shirts is “/shirts/blue/large”. The code not only has to accept this format, but it also must create consistently ordered URLs: we don’t want both “/shirts/blue/large” and “/shirts/large/blue” (representing the same content) to be generated by the code. Here’s what simplified pseudocode might look like to retrieve the category and set the activated filters:
# $request_url is e.g. "/shirts/blue/large"; grep drops the empty leading element
my @url_paths = grep { length } split('/', $request_url);
my $category_slug = shift(@url_paths);
# find Category where slug = $category_slug
# redirect if not found
# whatever remains in @url_paths is the list of active filter option slugs
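The same parsing step can be sketched in Python (used here only for illustration). The function name and the choice to return None for an empty path are assumptions; looking up the category and redirecting when it is missing are left to the caller.

```python
def parse_filter_path(request_path):
    """Split a path such as "/shirts/blue/large" into (category_slug, active_slugs)."""
    # Drop empty components produced by leading, trailing or doubled slashes.
    parts = [part for part in request_path.split("/") if part]
    if not parts:
        return None, []  # no category in the path; the caller decides how to redirect
    return parts[0], parts[1:]


category_slug, active_slugs = parse_filter_path("/shirts/blue/large")
```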
Applying the Filtering
Next, we need a couple of things to happen:
- If there is an activated filter for any filter option, apply it.
- Generate URLs to toggle filter options.
First, all products in this category are retrieved with a query like this:
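-- filters: comma-separated option slugs for each product ('' when it has none)
-- (filters_options_item and filters_options correspond to filter_options_products and filter_options in the model above)
SELECT products.*,
  COALESCE((SELECT GROUP_CONCAT(fo.url_slug) FROM filters_options_item foi
    JOIN filters_options fo ON fo.id = foi.filter_option_id
    WHERE foi.product_id = products.id), '') AS filters
FROM products
JOIN categories_products cp ON cp.product_id = products.id
JOIN categories c ON c.id = cp.category_id
WHERE c.url_slug = ?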
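Each row of this query carries its filter options as one comma-separated string, since a comma is the default GROUP_CONCAT separator. A small Python sketch (for illustration only) of turning that column into a set of slugs follows; the row shape and function name are hypothetical, and it assumes slugs never contain a comma.

```python
def option_slugs(filters_column):
    """Turn the comma-separated "filters" column into a set of option slugs."""
    # The COALESCE in the query yields '' for untagged products, giving an empty set.
    return {slug for slug in filters_column.split(",") if slug}


# Hypothetical rows, shaped like the query result above.
rows = [
    {"id": 1, "filters": "blue,large"},
    {"id": 2, "filters": ""},
]
tagged = {row["id"]: option_slugs(row["filters"]) for row in rows}
```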
Next is where the code gets pretty hairy, so instead I’ll try to explain with pseudocode:
# @filters = all applicable filters for current category
# loop through @filters
#   loop through filter options for this filter
#   filter product results to include any selected filter options for this filter
#   if there are no filter options selected for this filter, include all products
#   build the url for each filter option, to toggle the filter option (on or off)
# loop through @filters (yes, a second time)
#   loop through filter options for this filter
#   count remaining products for each filter option, if none, set filter option to inactive
#   build the final url for each filter option, based on all filters turned on and off
My pseudocode shows that I iterate through the filters twice, first to apply the filter and determine the base URL to toggle each filter option, and second to count the remaining filtered products and build the URL to toggle each filter. The output of this code is a) a set of filtered products and b) a set of ordered filters and filter options with corresponding counts and links to toggle on or off.
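A minimal Python sketch of the filtering and counting parts of those two passes is below (Python is used only for illustration, and URL building is left out to keep it short). It follows my reading of the pseudocode: within one filter a product matches if it has any selected option, and products must match every filter that has a selection. The data shapes, names and sample shirts are all hypothetical.

```python
def apply_filters(products, filters, active):
    """Narrow products by the active options, then count what is left per option.

    products: {product_id: set of option slugs}
    filters:  ordered list of (filter_name, [option slugs])
    active:   set of option slugs taken from the URL
    """
    remaining = dict(products)
    # First pass: narrow the product set one filter at a time.
    for _name, options in filters:
        selected = active.intersection(options)
        if not selected:
            continue  # nothing selected for this filter, so keep every product
        remaining = {
            pid: tags for pid, tags in remaining.items() if tags & selected
        }
    # Second pass: count the remaining products for each option.
    # A count of zero marks the option as inactive.
    counts = {}
    for _name, options in filters:
        for slug in options:
            counts[slug] = sum(1 for tags in remaining.values() if slug in tags)
    return remaining, counts


filters = [
    ("Style", ["longsleeve", "shortsleeve"]),
    ("Color", ["blue", "red"]),
    ("Size", ["large", "medium", "small"]),
]
products = {
    1: {"longsleeve", "blue", "large"},
    2: {"shortsleeve", "blue", "red", "large"},
    3: {"shortsleeve", "red", "small"},
    4: {"longsleeve", "blue", "medium"},
}
remaining, counts = apply_filters(products, filters, {"blue", "large"})
```

With "blue" and "large" active, only shirts 1 and 2 remain, and the "medium" and "small" options end up with a count of zero.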
Here’s a more specific example:
- Let’s say we have a set of shirts, with the following filters & options: Style (Long Sleeve, Short Sleeve), Color (Red, Blue), Size (Large, Medium, Small).
- A URL request comes in for /shirts/blue/large
- The code recognizes this is the shirts category and retrieves all shirts.
- First, we look at the style filter. No style filter is active in this request, so to toggle these filters on, the activation URLs must include “longsleeve” and “shortsleeve”. No products are filtered out here.
- Next, we look at the color filter. The blue filter option is active because it is present in the URL. In this first loop, products not tagged as blue are removed from the set of products. To toggle the red option on, the activation URL must include “red”, and to toggle the blue filter off, the URL must not include “blue”, which is set here.
- Next, we look at the size filter. Products not tagged as large are removed from the set of products. Again, the large filter has to be toggled off in the URL because it is active, and the medium and small filters need to be toggled on.
- In the second pass through filters, the remaining items applicable to each filter option are counted, for long sleeve, short sleeve, red, medium, and small options. The URLs are then built to turn on and off all filter options (e.g. applying the longsleeve filter will yield the URL “/shirts/longsleeve/blue/large”, applying the red filter will yield the URL “/shirts/blue/red/large”, turning off the blue filter will yield the URL “/shirts/large”).
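The toggle URLs in this example can be reproduced with a short Python sketch (for illustration only). Note that it is a simplification and not the two-pass code described here: it works from the complete set of active options and always emits slugs in filter order, then option order, which is what keeps the URLs consistently ordered. The function name and the option order within each filter are assumptions chosen to match the URLs above.

```python
def toggle_url(category_slug, filters, active, slug):
    """Return the URL that switches `slug` on if it is off, or off if it is on."""
    toggled = set(active) ^ {slug}  # symmetric difference flips just this option
    # Walk filters and options in their fixed order so that each combination
    # of options maps to exactly one URL.
    ordered = [
        option
        for _name, options in filters
        for option in options
        if option in toggled
    ]
    return "/" + "/".join([category_slug] + ordered)


# Hypothetical filter and option ordering for the shirts category.
FILTERS = [
    ("Style", ["longsleeve", "shortsleeve"]),
    ("Color", ["blue", "red"]),
    ("Size", ["large", "medium", "small"]),
]
active = {"blue", "large"}  # from the request /shirts/blue/large

print(toggle_url("shirts", FILTERS, active, "longsleeve"))  # /shirts/longsleeve/blue/large
print(toggle_url("shirts", FILTERS, active, "red"))         # /shirts/blue/red/large
print(toggle_url("shirts", FILTERS, active, "blue"))        # /shirts/large
```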
The important thing to note here is the double pass through filters is required to build non-duplicate URLs and to determine the product count after all filter options have been applied. This isn’t simple logic, and of course changing the business rules like exclusivity will change the loop behavior and URL logic.
Finally, a few notes regarding alternative approaches here:
- Rather than going with the blacklist approach described here, one could go with a whitelist approach where a set of products is built up based on the filter options set.
- Filtering could be done entirely via AJAX, in which case URL structure may not be a concern.
- If the data is simple enough, products could potentially be filtered in the database query itself. In our case, this wasn’t feasible since we generate product filter option details from a number of product attributes, not just what is shown in the simplified product filter data model above.
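For comparison, the whitelist alternative from the first note could look roughly like the Python sketch below (for illustration only). It builds the result up from an index of option slug to product ids, where the filtering described earlier starts from all products and removes those that do not match. The index shape and names are hypothetical.

```python
def whitelist_products(option_index, filters, active):
    """Build the matching product ids up from the selected options.

    option_index: {option slug: set of product ids tagged with it}
    Returns None when no option is active, meaning every product qualifies.
    """
    result = None
    for _name, options in filters:
        selected = [slug for slug in options if slug in active]
        if not selected:
            continue
        # Any selected option within one filter qualifies a product...
        matching = set().union(*(option_index.get(slug, set()) for slug in selected))
        # ...but a product must qualify under every filter that has a selection.
        result = matching if result is None else result & matching
    return result


filters = [
    ("Color", ["blue", "red"]),
    ("Size", ["large", "medium", "small"]),
]
option_index = {"blue": {1, 2, 4}, "red": {2, 3}, "large": {1, 2}, "small": {3}}
matches = whitelist_products(option_index, filters, {"blue", "large"})
```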