Advanced Suggest-As-You-Type with Solr

In my previous post, I talked about implementing Search-As-You-Type using Solr. In this post I’ll cover a closely related functionality called Suggest-As-You-Type.

Here’s the use case: A user comes to your search-driven website to find something, and it is your goal to be as helpful as possible. Part of this is making term suggestions as they type. When you make these suggestions, it is critical that each suggestion leads to search results. If you suggest a word just because it is somewhere in your index, but it is inconsistent with the other terms that the user has typed, then the user is going to get a results page full of white space and you’re going to get another dissatisfied customer!

A lot of search teams jump at the Solr suggester component because, after all, this is what it was built for. However, I haven’t found a way to configure the suggester so that it suggests only completions that correspond to search results. Rather, it is based upon a dictionary lookup that is agnostic of what the user is currently searching for. (Please someone tell me if I’m wrong!) In any case, getting the suggester working takes a bit of configuration. Why not use a solution that is based upon the normal, out-of-the-box Solr setup? Here’s how:

Facet Based Suggestions

As your user searches through your inventory, the Solr query q parameter will constantly be changing. Additionally, you may add any number of filter queries fq to further constrain the search results. As the search results change, consider what is happening to your facets. The only facet values with a non-zero count will be those that are consistent with the current search results (setting facet.mincount=1 drops the zero-count values from the response altogether). Additionally, the facet values are sorted according to the number of documents associated with each value.

Now consider what happens if you include a text field as a faceted field. When the user makes a query, a list will be returned that contains all the words that exist in the remaining documents, sorted by count. What’s more, the user can type these terms in and be guaranteed to find results.

But there’s one small matter to contend with. Ideally, you would want the suggestions to complete the word that your user is currently typing. However, the values remaining in the field may start with any letter of the alphabet. Fortunately, there is a less commonly used parameter that fixes our problem: facet.prefix. By specifying facet.prefix, the facet values will be constrained to only those values that start with the stated prefix.
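To make the idea concrete, here is a minimal Python sketch of how a client might turn whatever the user has typed into such a request. The function names, the lowercasing step, and the extra rows, facet.mincount, facet.limit and wt parameters are my own assumptions for illustration; only q, facet, facet.field and facet.prefix come from the pattern described above. It also does not escape Solr query syntax characters in the user's input, which a real client would need to handle.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # example server used in this post


def suggest_params(user_input, field="title", limit=10):
    """Return Solr query parameters for completing the word being typed.

    Returns None when there is nothing to complete (empty input, or the
    input ends in whitespace because the last word is already finished).
    """
    if not user_input or user_input[-1].isspace():
        return None
    # Assumption: the faceted field is lowercased at index time, so the
    # prefix is lowercased here to match the indexed terms.
    words = user_input.lower().split()
    prefix = words[-1]
    return [
        # Completed words are kept as-is; the last word becomes a prefix query.
        ("q", " ".join(words[:-1] + [prefix + "*"])),
        ("rows", "0"),              # only the facet counts are needed
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", prefix),   # keep only completions of the current word
        ("facet.mincount", "1"),    # drop terms that match no documents
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]


def suggest_url(user_input, base_url=SOLR_SELECT):
    """Build the full request URL, or None if there is nothing to complete."""
    params = suggest_params(user_input)
    return None if params is None else base_url + "?" + urlencode(params)
```

Calling suggest_url("red ap") produces a request whose q is "red ap*" and whose facet.prefix is "ap", so the facet values that come back are completions of "ap" among documents matching the rest of the query.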

Implementation

This is all theoretical so far. It’s a lot easier to understand if you have an example in hand. To start things off, download and start the example Solr:

cd solr-4.2.0/example
java -jar start.jar

Next, click this link and POOF! You will have the following documents indexed:

  • There’s nothing better than a shiny red apple on hot summer day.
  • Eat an apple!
  • I prefer a Grannie Smith apple over Fuji.
  • Apricots is kinda like a peach minus the fuzz.

(Kinda cool how that link works, isn’t it?) We’ve indexed these sentences into the “title” field of the example setup. This is because there is no stemming on the titles here. This is important! If you facet over a stemmed field, then the users will see suggestions that aren’t English!

In your case, you’ll probably be searching over several fields, so make sure that you’re dumping the text of each of these fields into a single non-stemmed field.

Now let’s take a look at a quick search and get oriented:

http://localhost:8983/solr/select?q=ap*&facet=true&facet.field=title

Here, I appear to be searching for “apricot” or “apple”, but I haven’t completed my search. (They’re both so tasty… who can choose?) I’ve also turned on faceting and I am faceting over the title field. Let’s take a look at the contents of that facet:

(Facet counts for the title field, term names not preserved: two terms with a count of 3, followed by 26 terms with a count of 1 each.)

As you can see, this is a list of every token in the title field sorted by count. Let’s add the prefix parameter.

http://localhost:8983/solr/select?q=ap*&facet=true&facet.field=title&facet.prefix=ap

Now, the facets are constrained to only those that complete the user’s current term. What’s more, either of these terms is guaranteed to provide results to the user. We even know how many results will be returned!

  apple: 3
  apricots: 1
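If you ask Solr for JSON (wt=json) rather than the default XML, the facet values for a field come back by default as a flat list that alternates term and count. Here is a small Python sketch of turning that list into (term, count) pairs for a suggestion box. The function name is hypothetical, and the sample response is hand-written to mirror the example documents above rather than captured from a live server.

```python
import json

# Hand-written sample in the shape of Solr's JSON response, trimmed to the
# part that matters here. Assumes the request used wt=json.
SAMPLE_RESPONSE = """
{
  "facet_counts": {
    "facet_fields": {
      "title": ["apple", 3, "apricots", 1]
    }
  }
}
"""


def parse_suggestions(response_text, field="title"):
    """Return [(term, count), ...] in the order Solr sent them."""
    data = json.loads(response_text)
    flat = data["facet_counts"]["facet_fields"][field]
    # The flat list alternates term, count, term, count, ...
    return list(zip(flat[0::2], flat[1::2]))


suggestions = parse_suggestions(SAMPLE_RESPONSE)
```

Because Solr sorts facet values by count unless told otherwise, the first pair is the completion that leads to the most results.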

Demo

The only thing left is to stick these results into a drop-down box. And that’s just what I’ve done here. Nope, it’s not much to look at, but I’ve built a simple interface that implements this pattern, works with the example here, and displays suggestions back to the user on a per-term basis. (What’s more, this also demonstrates the Search-As-You-Type described in my previous post.)

Improvements

There are plenty of fun things you could do to improve upon this pattern. As far as user experience goes, rather than a drop-down, consider providing the suggestions as greyed-out text ahead of the user’s cursor. If they see a suggestion they like, they can tab-complete and be confident that when they press return the result page will not be blank.

In the past, I’ve implemented this pattern to make suggestions from two separate fields. In that instance, our client wanted to provide autocomplete suggestions of user names, first or last. So we simply extended this pattern to both the first_name and last_name fields, and when users typed “joh” in the search box, they would quickly be offered John from first_name and Johnson from last_name. The suggestions were presented in a drop-down menu separated according to first and last names.
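A rough Python sketch of the two-field variant follows. It is not the client implementation, just an illustration: the function names, the match-all q, and the sample counts are invented, and the sample assumes the name fields are lowercased at index time (facet.prefix is compared against the indexed terms, so its case has to match).

```python
def name_suggest_params(prefix, fields=("first_name", "last_name"), limit=5):
    """Build parameters that facet over several fields with one shared prefix."""
    params = [
        ("q", "*:*"),               # assumption: suggest across all users
        ("rows", "0"),
        ("facet", "true"),
        ("facet.prefix", prefix),   # applies to every facet.field below
        ("facet.mincount", "1"),
        ("facet.limit", str(limit)),
        ("wt", "json"),
    ]
    # Repeating facet.field asks Solr for one list of values per field.
    params += [("facet.field", name) for name in fields]
    return params


def group_suggestions(facet_fields):
    """Map each field to its suggested terms, dropping the counts."""
    return {field: flat[0::2] for field, flat in facet_fields.items()}


# Invented sample of the facet_fields section of a JSON response for "joh".
sample_facet_fields = {
    "first_name": ["john", 12, "johnny", 3],
    "last_name": ["johnson", 7],
}
grouped = group_suggestions(sample_facet_fields)
```

Keeping the suggestions grouped by field is what makes it easy to render the drop-down in separate first-name and last-name sections.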

You can also extend this pattern to make multi-term suggestions. How? Just send the text to a shingling field and facet over that. (Text shingling is basically like n-gramming, but for terms rather than individual characters.) Obviously, one thing you’ll have to look out for with this method is pretty substantial growth in index size.
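In case shingling is unfamiliar, here is a tiny Python illustration of what it does to a sentence. In Solr itself this happens inside the field's analysis chain (a shingle filter), not in client code; the function below, its name, and its size parameters are purely for illustration.

```python
import re


def shingles(text, min_size=2, max_size=3):
    """Return every run of min_size..max_size consecutive words in text."""
    tokens = re.findall(r"\w+", text.lower())
    result = []
    for size in range(min_size, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


example = shingles("Eat an apple!")
# example == ["eat an", "an apple", "eat an apple"]
```

Even this three-word sentence yields three extra terms on top of its individual words, which gives a feel for why the index grows so quickly.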

Use With Caution

This pattern provides an excellent user experience, but you should make sure to test it out first to see that it works well in your situation. Consider the following points:

  • Response time: The most obvious problem is that this technique performs yet another facet search, so the request will take longer.
  • Memory: Calculating facet values requires allocation of memory for each token in the field. This is rarely a problem for a field composed of enumerated tags, but for a text field you may want to keep an eye on memory usage.
  • Inappropriate Content: Be very cautious about the content of the fields being used for suggestions. For instance, if the content has misspellings, so will the suggestions. And don’t include user comments unless you want to endorse their opinions and choice of language as your search suggestions!

Thanks to David Smiley who pointed this pattern out to me a while back.