Faceted Search = SEO Death

October 20th, 2008

This probably doesn’t apply to your standard SMB website, but if you run a big local search site, this one’s for you.

“Faceted search” is the rubric for a site that allows users to refine search results by categories (aka “facets”). This kind of interface appears on a lot of sites (Kayak and Kudzu are two examples that come to mind). If not implemented with care, faceted metadata search interfaces can be a huge problem SEO-wise, because they create an almost infinite set of pages for a search engine robot to get hung up on.

Consider the following sets of category metadata on a restaurant search:

- restaurant

- chinese

- italian

- quiet

- romantic

- pleasanton

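By clicking around on these categories you can generate a huge number of results pages. Each of the six values can be switched on or off, which gives 2^6 = 64 combinations. If every click order gets its own URL, that rises to 1,957 distinct click paths. Examples include: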

- Chinese restaurant

- Quiet Chinese restaurant

- Romantic Chinese restaurant

- Romantic quiet Chinese restaurant

- Romantic quiet Chinese restaurant in Soho

- Romantic quiet Chinese Italian restaurant in Soho

- etc.
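If each of the six category values above is treated as an on/off filter, the page count is easy to work out. Here is a minimal Python sketch. It assumes that every distinct set of filters gets its own crawlable URL, or worse, that every distinct click order does. It also assumes that no combination is ever empty, which real data will not bear out.

```python
import math

# The six category values from the restaurant example above.
FACET_VALUES = ["restaurant", "chinese", "italian", "quiet", "romantic", "pleasanton"]


def count_combinations(n: int) -> int:
    """Distinct sets of active filters (click order ignored, empty set included)."""
    return sum(math.comb(n, k) for k in range(n + 1))  # equals 2**n


def count_click_paths(n: int) -> int:
    """Distinct ordered click sequences, assuming each order gets its own URL."""
    return sum(math.perm(n, k) for k in range(n + 1))


n = len(FACET_VALUES)
print(count_combinations(n))  # 64
print(count_click_paths(n))   # 1957
```

With 20 values, the combination count alone is 2^20 = 1,048,576, before click order is even considered.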

So you get a bunch of problems: too many pages, confusing navigation and, to boot, probably a duplicate-content problem, since many of these pages end up showing the same listings.
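One way to spot the duplicate-page problem is to group facet combinations by the set of listings they return. Any group reachable from more than one combination is a cluster of near-identical pages. This is a minimal sketch with made-up data. The restaurant names and tags are hypothetical, and it assumes AND semantics, meaning a page lists only restaurants that match every selected value.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical listings: restaurant name -> category values it matches.
LISTINGS = {
    "Golden Dragon": {"chinese", "quiet"},
    "Lotus Garden": {"chinese", "quiet", "romantic"},
    "Trattoria Roma": {"italian", "romantic"},
}
FACETS = ["chinese", "italian", "quiet", "romantic"]


def results_for(selection):
    """Restaurants matching every selected value (assumed AND semantics)."""
    wanted = set(selection)
    return frozenset(name for name, tags in LISTINGS.items() if wanted <= tags)


def duplicate_groups():
    """Map each result set to the facet combinations that produce it,
    keeping only result sets reachable from more than one combination."""
    groups = defaultdict(list)
    for k in range(1, len(FACETS) + 1):
        for selection in combinations(FACETS, k):
            results = results_for(selection)
            if results:  # empty combinations are dead ends, not duplicates
                groups[results].append(selection)
    return {res: sels for res, sels in groups.items() if len(sels) > 1}


for results, selections in duplicate_groups().items():
    print(sorted(results), "<-", [" + ".join(sel) for sel in selections])
```

Even with three restaurants, “chinese”, “quiet” and “chinese + quiet” all return the same two listings. Each group is a candidate for consolidation: expose one URL to the bots and keep the rest out of the index.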

So how do you solve this and still offer faceted search?

1. Figure out which of the million possible combinations are your most important pages

2. Create linear paths for the search engine robots to follow, using noindex meta tags, nofollow links and your robots.txt file

3. Make sure that no page you point the bots to has more than one value of each data type (e.g. quiet or romantic, but not both), unless you think having two of a type is good for SEO, which would involve some complex planning and coding.

4. Remember that if you already have this problem, you will need to purge the problem pages from the search engines’ indexes before you try to fix it with bot herding. Otherwise, once the bots are herded away, they will never revisit those pages to discover that they have changed.
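Steps 1 to 3 can be sketched in Python, with some caveats. The facet types, the priority list, and the choice of “noindex, follow” for off-path pages are all assumptions for the restaurant example. A real site would derive its priority list from keyword research and traffic data. All names below are hypothetical.

```python
# Hypothetical facet types for the restaurant example. "restaurant" is treated
# as the base category rather than a facet, so it is not listed here.
FACET_TYPES = {
    "cuisine": {"chinese", "italian"},
    "ambience": {"quiet", "romantic"},
    "city": {"pleasanton"},
}

# Step 1 (assumed input): combinations judged worth ranking, e.g. from keyword
# research. frozensets make click order irrelevant.
PRIORITY_PAGES = {
    frozenset({"chinese"}),
    frozenset({"chinese", "pleasanton"}),
    frozenset({"italian", "romantic", "pleasanton"}),
}


def facet_type(value: str) -> str:
    """Look up which data type a facet value belongs to."""
    for ftype, values in FACET_TYPES.items():
        if value in values:
            return ftype
    raise ValueError(f"unknown facet value: {value!r}")


def one_value_per_type(selection) -> bool:
    """Step 3: true if no data type appears more than once in the selection."""
    types = [facet_type(v) for v in selection]
    return len(types) == len(set(types))


def is_bot_path(selection) -> bool:
    """A page is on a bot path only if it is a priority page obeying step 3."""
    return frozenset(selection) in PRIORITY_PAGES and one_value_per_type(selection)


def robots_meta(selection) -> str:
    """Step 2: content for <meta name="robots"> on the page for `selection`."""
    # Assumption: off-path pages stay out of the index but still pass bots
    # along to the restaurant listings they link to.
    return "index, follow" if is_bot_path(selection) else "noindex, follow"


def link_rel(selection) -> str:
    """Step 2: rel attribute for a facet link pointing at `selection`."""
    return "" if is_bot_path(selection) else "nofollow"


if __name__ == "__main__":
    for sel in (["chinese"], ["pleasanton", "chinese"], ["quiet"], ["chinese", "italian"]):
        print(sel, robots_meta(sel), repr(link_rel(sel)))
```

The same decision could drive robots.txt Disallow rules instead of meta tags. Keep step 4 in mind, though: a URL blocked in robots.txt is not recrawled, so search engines never see a noindex tag added to it later.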

For more on faceted search check out the following:


Drupal faceted search

MOSS faceted search

SharePoint faceted search

Endeca Technologies

Tags: Local Search · Search Engine Ranking Factors

14 responses so far ↓

  • 1 Gib Olander // Oct 20, 2008 at 8:11 am

    Great stuff, thanks.

  • 2 Paul Pedersen // Oct 20, 2008 at 9:56 am

    Each restaurant is simply an object in the database and each object should have its own page. How you get to that page, however, depends on the interests to which the page is relevant.

    We need to keep in mind that search engines are not in the business of showing results. They are in the business of answering questions. Each one of the searches you outlined represents a unique question, by a unique person, with a unique job to be done. The sites that answer these questions best are rewarded most (via search results).

    Because of this, each of the most common questions, like “soho chinese restaurants” or “romantic chinese italian restaurants”, should have a unique page (a specific answer to a specific question). The data object (restaurant) should be one of the answers presented within these pages, but that restaurant profile would not become part of that page. It would be a link from that “job to be done” page to its own unique page.

    In this way, just as if we had searched, we have separated the searcher’s need from the data object. We’ve controlled duplication of the restaurant while hitting each of these faceted needs, all while providing our audience with a richer answer to the questions they are asking.

  • 3 Daniel Tunkelang // Oct 20, 2008 at 12:13 pm

    I can’t speak for the other vendors, but Endeca has a solution for this problem: http://endeca.com/retail/sem.html

  • 4 Andrew Shotland // Oct 20, 2008 at 8:58 pm

    Paul, I agree with your semantic breakdown of the issue, but once you have broken your pages down into “unique questions” (or more like “unique answers to unique questions”), you then are faced with which of these pages are more important from a search perspective and which are not. Coming up with a formula to dynamically prioritize these pages across millions of records and thousands, if not millions, of keyword combinations is where it gets tricky.

    Daniel, thanks for pointing us towards the Endeca sitemap product page. This morning I heard from Steve Papa, Endeca’s CEO, who also mentioned this. Steve claims:

    “it is impossible to calculate the number of valid paths through a set of facets. The only way to do it is to empirically crawl the data…We used an 80-90,000 record data set with an average of 12 facets per record. If you did the simple factorial calculation you would get 10^34 possible paths. But as it turns out most of those paths don’t exist (for example in a car data set there are no BMW pickup trucks or Hummer hybrids but the facets suggest one might be possible). The actual number of paths was on the order of 250,000,000 — still a 3,000x increase from the number of records! But a small fraction of the “theoretically possible” paths.”

    I have not looked into how Endeca’s sitemap product solves the SEO problem, but I suspect it is a tool that allows you to create linear paths to results pages. If that is the case then you still have to decide which of the 250,000,000 results you are going to prioritize for search and that’s where the SEO headache begins. I’d love to hear from Endeca if that is not the case.

  • 5 Andrew Shotland // Oct 20, 2008 at 9:03 pm

    One other point on this re Steve’s Hummer hybrid example: In my experience a lot of product mgrs and engineers fail to anticipate the empty data set, or at least they fail to prioritize dealing with it, and so you are left with millions of empty, “theoretical” pages that seriously choke the search engines and provide a poor user experience. You can definitely tell the difference between a faceted search site that has been put together with care and one that still has some work to do.

  • 6 Joel Brazil // Oct 21, 2008 at 1:45 am

    Interesting discussion. Makes me think maybe my site has a problem. Any way to know/test?


  • 7 Daniel Tunkelang // Oct 21, 2008 at 5:44 pm

    Andrew, the Endeca tool is flexible, but you’re right that ultimately you can’t submit an infinite number of pages and hence will want to prioritize. Our sitemap generator tool – part of the product I referenced earlier – is highly configurable and allows you to include product pages, pages corresponding to selections from particular facets or combinations of facets, specification of depth for hierarchical facets (e.g. categories), etc. And, of course, our software eliminates dead ends (combinations that lead to no results).

  • 8 Andrew Shotland // Oct 21, 2008 at 8:11 pm

    Joel, the best way to know/test is to have someone with SEO experience review your site :)

    Thanks Daniel. BTW for the record Daniel is chief scientist at Endeca.

  • 9 Patrick // Oct 23, 2008 at 9:52 am

    What I like about faceted search is that it can kind of replace advanced search. Any thoughts?

  • 10 Andrew Shotland // Oct 23, 2008 at 11:53 am

    I agree Patrick, but it’s a fine line between adding more helpful/granular info and providing such an overwhelming amount of choice that it’s unusable.

  • 11 Patrick // Oct 24, 2008 at 1:55 am

    But that’s why metadata and taxonomy definitions are important…right?

  • 12 Andrew Shotland // Oct 24, 2008 at 7:46 am

    Yes, but I often see sites that don’t spend enough time on this.

  • 13 Daniel Tunkelang // Oct 24, 2008 at 10:46 am

    I agree with both of you here: faceted search is a great interface when it’s done right. But not all sites do it right, especially when there are a large number of facets or a large number of values per facet. Nonetheless, faceted search offers you a much better framework for addressing information overload than conventional relevance-ranked search or parametric search.

  • 14 Gareth Dismore // Jul 27, 2010 at 3:09 pm

    We generally advocate that our merchants use rel="nofollow" attributes on faceted navigation links. Often we will also employ robots.txt-excluded redirects to help guide search engine robots down the right paths.