What’s up with multi-term synonyms in Solr?

Some questions about multi-term synonyms have been floating around the Solr mailing lists, and a few notable answers follow. The short version: it’s complicated, and every use case has different considerations. Doh!

As an aside, I’ve been giving hon-lucene-synonyms some love since December. I got it working on Solr 5.3.1 and Solr 6.0.0 but neglected the documentation. The latest release of hon-lucene-synonyms included a number of namespace changes that weren’t completely reflected in the README.md, so there has been some confusion about how to get the plugin running. The hon-lucene-synonyms README.md is now up to date and explains how to get the plugin working in Solr 6.0.0.

Doug Turnbull said Re: Solutions for Multi-word Synonyms,

Honestly half the time I run into this problem, I end up creating a QParserPlugin because I need to do something specific. With a QParserPlugin I can run whatever analysis, slicing and dicing of the query string to manually construct whatever I need to

http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit 

One thing I often do is repeat the functionality of Elasticsearch's match query. Elasticsearch's match query does the following:

- Analyze the query string using the field's query-time analyzer
- Create an OR query with the tokens that come out of the analysis

You can look at the field query parser as something of a starting point for this.

I usually do this in the context of a boost query, not as the main edismax query.

OSC has gone on to open source and document his solution here.
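
The two match-query steps Doug describes can be sketched in a few lines. This is only an illustration, not Solr or Elasticsearch code: `analyze` is a toy stand-in for a field's query-time analyzer, and `match_query` is a hypothetical helper that emits a Lucene-syntax query string rather than real query objects.

```python
import re


def analyze(text):
    """Toy stand-in for a query-time analyzer (an assumption, not Solr's):
    lowercase the text and split it on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def match_query(field, text):
    """Step 1: analyze the query string. Step 2: OR the resulting tokens."""
    tokens = analyze(text)
    return " OR ".join(f"{field}:{tok}" for tok in tokens)


print(match_query("title", "Food and Drug Administration"))
# title:food OR title:and OR title:drug OR title:administration
```

In a real QParserPlugin the analysis would come from the field's configured analyzer and the clauses would be built as query objects, but the shape of the logic is the same.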

Bernd Fehling added Re: Solutions for Multi-word Synonyms,

you should really try to build your own solution for Multi-term Synonyms because every need is different and you can customize it for your special use case, like adding a Thesaurus. 

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

And from me, Re: Solutions for Multi-word Synonyms (where APT refers to Lucidworks’ auto-phrasing-tokenfilter),

The auto-phrasing-tokenfilter (APT) is a two-pronged solution that requires both index-time and query-time processing, whereas hon-lucene-synonyms (HLS) is strictly a query-time implementation. The primary takeaway is that APT requires reindexing your data when you update the autophrases and synonyms, while HLS does not.

APT is more precise while HLS is more flexible.

Note that hon-lucene-synonyms is also very useful when you have a single term in documents but want multiple multi-term synonyms to find it. For example, you could have FDA in your documents and use a rule like Food and Drug Administration,Food Drug Administration=>FDA, which allows multi-term synonyms to be searched for and inserted without reindexing the entire system.
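
To make the query-time idea concrete, here is a rough sketch of how a rule like the FDA one could be applied to an incoming query. It is not how hon-lucene-synonyms is implemented; `parse_rule` and `expand` are hypothetical helpers, and the greedy longest-match strategy and the query-string output are assumptions chosen to keep the example short.

```python
def parse_rule(line):
    """Parse 'phrase a,phrase b=>x' into {(tokens of phrase): [targets]}."""
    left, right = line.split("=>")
    targets = [t.strip().lower() for t in right.split(",") if t.strip()]
    rules = {}
    for phrase in left.split(","):
        key = tuple(phrase.lower().split())
        if key:
            rules[key] = targets
    return rules


def expand(query, rules):
    """Scan the query left to right, preferring the longest matching phrase,
    and OR each matched phrase with its synonym targets."""
    tokens = query.lower().split()
    max_len = max((len(k) for k in rules), default=0)
    out = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i:i + n])
            if key in rules:
                alts = ['"' + " ".join(key) + '"'] + rules[key]
                out.append("(" + " OR ".join(alts) + ")")
                i += n
                break
        else:
            # No rule starts at this token, so keep it unchanged.
            out.append(tokens[i])
            i += 1
    return " ".join(out)


rules = parse_rule("Food and Drug Administration,Food Drug Administration=>FDA")
print(expand("food and drug administration recalls", rules))
# ("food and drug administration" OR fda) recalls
```

Because the rewrite happens entirely on the query string, changing the rule only requires reloading the synonyms, not reindexing documents that contain FDA.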

Update 2016-06-24: Scott Stults pointed out that Querqy, maintained by René Kriegler, is another alternative. Querqy describes itself well in its README.md:

Querqy is a framework for query preprocessing in Java-based search engines. It comes with a powerful, rule-based preprocessor named 'Common Rules Preprocessor', which provides query-time synonyms, query-dependent boosting and down-ranking, and query-dependent filters. While the Common Rules Preprocessor is not specific to any search engine, Querqy provides a plugin to run it within the Solr search engine.

Because Querqy is a general toolset for manipulating queries, it runs on top of Solr via a query handler. Almost everything is configured through a rules.txt file, which is fed through rewrite chains.

personal computer =>
      SYNONYM: pc
personal computers =>
      SYNONYM: pc

Great stuff! The world of search is ever expanding. Whether you are using an existing plugin or trying to write a new one, please reach out and contact us!