
What’s up with multi-term synonyms in Solr?

There have been some questions floating around the Solr mailing lists about multi-term synonyms, and a few notable answers follow. The short version: it’s complicated, and every use case has different considerations. Doh!

As an aside, I’ve been giving hon-lucene-synonyms some love since December. I got it working on Solr 5.3.1 and Solr 6.0.0 but neglected the documentation. The latest release of hon-lucene-synonyms included a number of namespace changes that weren’t fully reflected in the README.md, so there has been some confusion about how to get the plugin running. With that fixed, the hon-lucene-synonyms README.md is now up to date and explains how to get the plugin working in Solr 6.0.0.

Doug Turnbull said Re: Solutions for Multi-word Synonyms,

Honestly half the time I run into this problem, I end up creating a QParserPlugin because I need to do something specific. With a QParserPlugin I can run whatever analysis, slicing and dicing of the query string to manually construct whatever I need to

http://www.supermind.org/blog/1134/custom-solr-queryparsers-for-fun-and-profit

One thing I often do is repeat the functionality of Elasticsearch's match query. Elasticsearch's match query does the following:

- Analyze the query string using the field's query-time analyzer
- Create an OR query with the tokens that come out of the analysis

You can look at the field query parser as something of a starting point for this.

I usually do this in the context of a boost query, not as the main edismax query.
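The two match-query steps Doug lists can be sketched without any Solr or Lucene dependencies. This is a minimal sketch, not his plugin: it assumes a toy analyzer (lowercasing, splitting on anything that isn't a letter or digit, and dropping a small hypothetical stopword list) in place of the field's real query-time analyzer, and it emits a Lucene-syntax OR query string instead of building a Query object. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class MatchQuerySketch {

    // Hypothetical stopword list standing in for a real stop filter.
    private static final Set<String> STOPWORDS = Set.of("a", "an", "and", "of", "the");

    // Toy stand-in for a field's query-time analyzer: lowercase,
    // split on anything that is not a letter or digit, drop stopwords.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && !STOPWORDS.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    // Step 2: OR together one clause per token produced by analysis.
    static String matchQuery(String field, String queryString) {
        List<String> clauses = new ArrayList<>();
        for (String token : analyze(queryString)) {
            clauses.add(field + ":" + token);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        // prints: title:food OR title:drug OR title:administration
        System.out.println(matchQuery("title", "The Food and Drug Administration"));
    }
}
```

Because every analyzed token becomes an optional clause, a query like this is forgiving about word order and missing terms, which is one reason it suits a boost query better than the main edismax query.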

OSC (OpenSource Connections) has since gone on to open source and document his solution.

Bernd Fehling added Re: Solutions for Multi-word Synonyms,

you should really try to build your own solution for Multi-term Synonyms because every need is different and you can customize it for your special use case, like adding a Thesaurus.

http://www.ub.uni-bielefeld.de/~befehl/base/solr/InsideBase_eurovocThesaurus.html

From myself, Re: Solutions for Multi-word Synonyms (where APT refers to Lucidworks’ auto-phrasing token filter),

The auto-phrasing-token (APT) filter is a two-pronged solution that requires index-time and query-time processing, versus hon-lucene-synonyms (HLS), which is strictly a query-time implementation. The primary takeaway from that is, APT requires reindexing your data when you update the autophrases and synonyms, while HLS does not.

APT is more precise while HLS is more flexible.

Note that hon-lucene-synonyms is also very useful when your documents contain a single term but you want several multi-term synonyms to find it. For example, your documents could contain FDA, and a synonym rule like Food and Drug Administration,Food Drug Administration=>FDA lets searches for either multi-term phrase match them. Because the expansion happens at query time, multi-term synonyms can be added and searched for without reindexing the entire system.
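To make the query-time idea concrete, here is a minimal sketch, loosely in the spirit of hon-lucene-synonyms but not its actual implementation. It assumes whitespace tokenization and lowercase indexing, parses one explicit mapping in the synonyms.txt style above, and returns the original query plus a variant with the first matching multi-term phrase replaced by the target. The class, record, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class QueryTimeSynonymSketch {

    // One explicit mapping: several multi-term inputs rewritten to one target.
    record Mapping(List<List<String>> inputs, String target) {}

    // Assumed tokenization: lowercase and split on whitespace.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    // Parse an explicit mapping such as "a b,c d=>e".
    static Mapping parse(String line) {
        String[] sides = line.split("=>", 2);
        if (sides.length != 2) {
            throw new IllegalArgumentException("not an explicit mapping: " + line);
        }
        List<List<String>> inputs = new ArrayList<>();
        for (String phrase : sides[0].split(",")) {
            List<String> input = tokens(phrase);
            if (!input.isEmpty()) {
                inputs.add(input);
            }
        }
        return new Mapping(inputs, sides[1].trim().toLowerCase(Locale.ROOT));
    }

    // At search time, keep the original query and add a variant with the
    // first occurrence of each matching input phrase replaced by the target.
    // Nothing about the index changes, so no reindexing is needed.
    static List<String> expand(String query, Mapping mapping) {
        List<String> q = tokens(query);
        List<String> variants = new ArrayList<>();
        variants.add(String.join(" ", q));
        for (List<String> input : mapping.inputs()) {
            int at = Collections.indexOfSubList(q, input);
            if (at >= 0) {
                List<String> rewritten = new ArrayList<>(q.subList(0, at));
                rewritten.add(mapping.target());
                rewritten.addAll(q.subList(at + input.size(), q.size()));
                String variant = String.join(" ", rewritten);
                if (!variants.contains(variant)) {
                    variants.add(variant);
                }
            }
        }
        return variants;
    }

    public static void main(String[] args) {
        Mapping fda = parse("Food and Drug Administration,Food Drug Administration=>FDA");
        // prints: [food and drug administration recalls, fda recalls]
        System.out.println(expand("food and drug administration recalls", fda));
    }
}
```

Each variant would then be parsed and ORed together, so documents that only contain FDA can still match a query for the spelled-out name.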

Update 2016-06-24: Scott Stults pointed out that Querqy, maintained by René Kriegler, is another alternative. Querqy describes itself well in its README.md,

Querqy is a framework for query preprocessing in Java-based search engines. It comes with a powerful, rule-based preprocessor named 'Common Rules Preprocessor', which provides query-time synonyms, query-dependent boosting and down-ranking, and query-dependent filters. While the Common Rules Preprocessor is not specific to any search engine, Querqy provides a plugin to run it within the Solr search engine.

Because Querqy is a general toolset for manipulating queries, it runs on top of Solr via a query handler. Nearly everything is configured through a rules.txt file, which is fed through rewrite chains. For example:

personal computer =>
    SYNONYM: pc
personal computers =>
    SYNONYM: pc
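The rules.txt layout above (an input line ending in "=>", followed by indented instruction lines) can be read with a few lines of Java. This is a minimal sketch, not Querqy's parser: it only understands the SYNONYM instruction shown here, skips blank lines, silently ignores every other instruction type, and lowercases inputs as an assumption. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class RulesTxtSketch {

    // Collect SYNONYM instructions per input phrase, in file order.
    static Map<String, List<String>> parseSynonyms(String rulesText) {
        Map<String, List<String>> rules = new LinkedHashMap<>();
        String currentInput = null;
        for (String line : rulesText.split("\\R")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            int arrow = trimmed.indexOf("=>");
            if (arrow >= 0) {
                // A new rule starts; normalize its input phrase.
                currentInput = trimmed.substring(0, arrow).trim().toLowerCase(Locale.ROOT);
                rules.putIfAbsent(currentInput, new ArrayList<>());
            } else if (currentInput != null && trimmed.startsWith("SYNONYM:")) {
                String synonym = trimmed.substring("SYNONYM:".length()).trim();
                rules.get(currentInput).add(synonym);
            }
            // Any other instruction line is ignored in this sketch.
        }
        return rules;
    }

    public static void main(String[] args) {
        String rulesTxt = String.join("\n",
                "personal computer =>",
                "    SYNONYM: pc",
                "personal computers =>",
                "    SYNONYM: pc");
        // prints: {personal computer=[pc], personal computers=[pc]}
        System.out.println(parseSynonyms(rulesTxt));
    }
}
```

Note that the singular and plural inputs need separate rules in this sketch, just as they appear as separate rules in the example file.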

Great stuff! The world of search is ever expanding. Whether you are using an existing plugin or trying to write a new one, please reach out and contact us!