Do you know the feeling when you think that all has been said about a specific topic and it feels like it’s time to move on? This is how I feel about applying synonyms.
How come? The synonyms module of our “Think Like a Relevance Engineer” training shows several ways to apply synonyms. Searching our blog turns up a decent number of posts diving into various aspects of synonyms. So has everything been said about synonyms? Is applying synonyms to solve search issues a topic that we can just declare “solved” (or “answered”, or “concluded”, or…)?
Let’s take a fresh look and try to figure out where we are with synonyms today.
Different kinds of synonyms
Synonym use cases can vary and we describe them in “A Synonym By Any Other Name: From Alt. Labels to Knowledge”. As relevance engineers it is important to know the use cases and how to deal with them accordingly.
I’m calling “classic” those synonyms that we find most often in synonym lists: interchangeable words, words with equal or at least very similar meaning. Usually you want the original search term to score higher than the expanded synonyms.
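The following is a minimal sketch of one way to express this at query time. It assumes an Elasticsearch-style `bool`/`match` query, a hypothetical `title` field, a hypothetical synonym map and example boost values. Many setups apply synonyms at analysis time instead; the point here is only that the original term gets a higher weight than its expansions.

```python
# Minimal sketch (assumptions: Elasticsearch-style bool/match query DSL,
# a hypothetical "title" field, a hypothetical synonym map, example boosts).
import json
from typing import Any

# Hypothetical flat list of "classic" synonyms: term -> interchangeable terms
SYNONYMS: dict[str, list[str]] = {
    "couch": ["sofa", "settee"],
}


def build_query(user_term: str, field: str = "title",
                original_boost: float = 2.0,
                synonym_boost: float = 1.0) -> dict[str, Any]:
    """Expand user_term with its synonyms, weighting the original term higher."""
    should = [{"match": {field: {"query": user_term, "boost": original_boost}}}]
    for synonym in SYNONYMS.get(user_term, []):
        should.append({"match": {field: {"query": synonym, "boost": synonym_boost}}})
    return {"query": {"bool": {"should": should}}}


if __name__ == "__main__":
    print(json.dumps(build_query("couch"), indent=2))
```

Terms without an entry in the map simply produce a single clause for the original term.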
In most search use cases synonyms aren’t really synonyms but form something like a hierarchy. In that case your synonym list is more of a tree-like taxonomy than a flat list. How synonyms form a taxonomy and how to use a taxonomy in Elasticsearch is covered in another blog post. A hierarchy has several levels, and the farther an entry sits from the user’s search term in that hierarchy, the smaller its impact on search ranking typically is.
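To illustrate the idea that distance in the hierarchy determines impact, here is a minimal sketch that turns a taxonomy into expansion weights. The taxonomy, the decay factor of 0.5 and the choice to expand only towards narrower concepts are all illustrative assumptions, not a prescribed method.

```python
# Minimal sketch: derive expansion weights from distance in a taxonomy.
# Assumptions: hypothetical taxonomy, weight = decay ** distance, and
# expansion only towards narrower (child) concepts.
from collections import deque

# Hypothetical taxonomy: parent -> children
TAXONOMY: dict[str, list[str]] = {
    "footwear": ["shoes", "sandals"],
    "shoes": ["sneakers", "boots"],
    "sandals": ["flip-flops"],
}


def expansion_weights(term: str, decay: float = 0.5, max_depth: int = 2) -> dict[str, float]:
    """Breadth-first walk to narrower concepts; farther concepts get lower weights."""
    weights = {term: 1.0}
    queue = deque([(term, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the configured depth
        for child in TAXONOMY.get(node, []):
            if child not in weights:
                weights[child] = decay ** (depth + 1)
                queue.append((child, depth + 1))
    return weights


if __name__ == "__main__":
    print(expansion_weights("footwear"))
```

The resulting weights could then be used as query-time boosts, similar to the classic synonym case.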
Ontologies and knowledge graphs
Moving beyond hierarchical structures takes you to knowledge graphs, which model relationships between concepts. They can incorporate the data from hierarchies and enrich it. Knowledge graphs are particularly useful for answering factual questions, but they can also be used to detect entities and improve query understanding.
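As a rough illustration of the query understanding use case, the sketch below stores a tiny, hypothetical knowledge graph as subject/predicate/object triples, detects known entities in a query with a longest-match-first scan, and looks up their relationships. Real knowledge graph tooling is far richer; the triples, predicates and matching strategy are assumptions for illustration only.

```python
# Minimal sketch: dictionary-based entity detection using labels from a tiny,
# hypothetical knowledge graph stored as (subject, predicate, object) triples.
Triple = tuple[str, str, str]

GRAPH: list[Triple] = [
    ("birkenstock", "is_a", "brand"),
    ("birkenstock", "makes", "sandals"),
    ("sandals", "is_a", "footwear"),
    ("hiking boots", "is_a", "footwear"),
]


def labels(graph: list[Triple]) -> set[str]:
    """All node labels (subjects and objects) known to the graph."""
    return {s for s, _, _ in graph} | {o for _, _, o in graph}


def detect_entities(query: str, graph: list[Triple]) -> list[str]:
    """Scan the query left to right, preferring the longest known label."""
    known = labels(graph)
    max_len = max(len(label.split()) for label in known)
    tokens = query.lower().split()
    found, i = [], 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in known:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no known label starts at this token
    return found


def related(entity: str, graph: list[Triple]) -> list[tuple[str, str]]:
    """Outgoing relationships of an entity."""
    return [(p, o) for s, p, o in graph if s == entity]


if __name__ == "__main__":
    for entity in detect_entities("Birkenstock sandals for men", GRAPH):
        print(entity, related(entity, GRAPH))
```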
How should you tackle synonyms?
Almost all of our synonym blog posts say, one way or another, “there is no silver bullet”, “it depends” or “there is no cut-and-dried solution”, and I tend to agree. However, I’ll try to summarize some actions that have helped us in the past when dealing with synonyms.
Identify problematic queries from your search log
First of all, you need to find out which synonyms you should look into. This requires proper search analytics that surface your problematic queries; some of these queries can typically be improved by adding synonyms. Here’s a post on how to deal with multi-term synonyms in particular (called keyphrases in that context).
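A minimal sketch of what such analytics might compute: it aggregates a search log per normalized query and flags frequent queries with many zero-result searches or a low click-through rate as candidates to review for synonyms. The log format (`query`, `num_hits`, `clicked`) and all thresholds are hypothetical assumptions; real analytics usually needs more signals than this.

```python
# Minimal sketch: flag candidate queries for synonym work from a search log.
# Assumptions: hypothetical log format and thresholds.
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    query: str
    num_hits: int
    clicked: bool


def problematic_queries(log: list[LogEntry], min_count: int = 2,
                        max_zero_rate: float = 0.5,
                        max_ctr: float = 0.1) -> dict[str, dict[str, float]]:
    stats: dict[str, dict[str, int]] = defaultdict(
        lambda: {"count": 0, "zero": 0, "clicks": 0})
    for entry in log:
        s = stats[" ".join(entry.query.lower().split())]  # normalize case/whitespace
        s["count"] += 1
        s["zero"] += int(entry.num_hits == 0)
        s["clicks"] += int(entry.clicked)

    flagged = {}
    for query, s in stats.items():
        if s["count"] < min_count:
            continue  # too rare to judge
        zero_rate = s["zero"] / s["count"]
        ctr = s["clicks"] / s["count"]
        if zero_rate > max_zero_rate or ctr <= max_ctr:
            flagged[query] = {"count": s["count"], "zero_rate": zero_rate, "ctr": ctr}
    # Most frequent problems first
    return dict(sorted(flagged.items(), key=lambda kv: -kv[1]["count"]))
```

Queries such as a zero-result “settee” next to a well-performing “sofa” are typical hints that a synonym might be missing.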
Know your overall strategy
What general strategy does your search application follow? Is it a precision-first approach, or one that balances recall and precision? Knowing your overall strategy is important because it already suggests how to configure some of the settings your search engine of choice offers. A couple of considerations are listed in a post focused on Solr and multi-term synonyms.
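Here is a minimal sketch of how a strategy might translate into concrete settings, using parameters of the Elasticsearch `match` query (`operator`, `minimum_should_match`, `auto_generate_synonyms_phrase_query`). The `title` field, the two strategy names and the `75%` threshold are hypothetical choices for illustration, not recommendations.

```python
# Minimal sketch: map a high-level strategy to Elasticsearch match query
# parameters. Field name, strategy names and thresholds are hypothetical.
from typing import Any


def match_query(text: str, strategy: str = "balanced", field: str = "title") -> dict[str, Any]:
    if strategy == "precision":
        params = {
            "query": text,
            "operator": "and",                             # every term must match
            "auto_generate_synonyms_phrase_query": True,   # multi-term synonyms as phrases
        }
    elif strategy == "balanced":
        params = {
            "query": text,
            "operator": "or",
            "minimum_should_match": "75%",                 # tolerate some missing terms
            "auto_generate_synonyms_phrase_query": False,  # no forced phrase matching
        }
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return {"query": {"match": {field: params}}}
```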
We cover synonyms extensively in our TLRE training for OpenSearch, Solr and Elasticsearch so that participants know which knobs to tune to align the search engine’s behavior with their overall search strategy. This usually lays a solid foundation, which then needs to be built on through experimentation to find your ideal settings, especially for handling edge cases. Specific combinations of settings can result in odd behavior: a query you expect to be term-centric can internally be turned into a field-centric one because of inconsistent field type configuration (e.g. field-specific synonyms).
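One concrete instance of this in Elasticsearch: the `cross_fields` type of `multi_match` only works term-centrically across fields that share the same analyzer, and groups fields by analyzer otherwise. The sketch below is a hypothetical sanity check that groups query fields by their search analyzer and reports whether a single group remains. The field names and analyzer names are made up; in practice you would read them from your mapping.

```python
# Minimal sketch: detect field sets that cannot be queried fully term-centrically
# because their analyzers differ (e.g. synonyms on only one field).
# Field and analyzer names below are hypothetical.
from collections import defaultdict


def analyzer_groups(field_analyzers: dict[str, str]) -> dict[str, list[str]]:
    """Group query fields by the analyzer used at search time."""
    groups: dict[str, list[str]] = defaultdict(list)
    for field, analyzer in field_analyzers.items():
        groups[analyzer].append(field)
    return dict(groups)


def is_fully_term_centric(field_analyzers: dict[str, str]) -> bool:
    """True only if every field shares one analyzer."""
    return len(analyzer_groups(field_analyzers)) == 1


if __name__ == "__main__":
    fields = {"title": "english_with_synonyms", "description": "english", "brand": "english"}
    print(analyzer_groups(fields))
    print(is_fully_term_centric(fields))
```

Adding the synonym filter consistently to all fields (or to none) would put them back into one group.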
Use the right tools for your synonym use case
As mentioned above and in this blog post, there are different kinds of synonym use cases that you may want to integrate into your search application. Not all of them can be solved equally well with the classic flat lists that Lucene-based search engines offer. Would a tool to build and curate a knowledge graph be worth looking into, improving not only search but other parts of the user experience as well? Or would a query rewriting library be a useful addition to your stack? Telling the different synonym use cases apart and mapping them to the right tools can get you far.
Applying synonyms – the tooling landscape
In the past, there were multiple approaches to addressing synonyms (or rather, the issues with synonyms in Solr and Elasticsearch) with custom plugins. Some of these plugins are now obsolete, archived or inactive projects without any releases in the last couple of years.
One solution exists to this day: Querqy, the open source library for query rewriting. It is actively maintained and has a growing user base. Query rewriting is broader than just synonyms: it also covers techniques like boosting, blocking, burying or redirecting, which makes it particularly useful in e-commerce contexts. Although the handling of multi-term synonyms has improved a lot over time in Solr and Elasticsearch, Querqy comes with the extra benefit of expanding queries before field analysis is done. It also addresses a scoring issue: terms treated as synonyms can have very different document frequencies, which leads to inconsistent scoring (the documents with the “rarer” synonyms win).
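To make the idea of query rewriting more tangible, here is a toy rewriter covering the rule types named above (synonyms, boosting, burying, blocking, redirecting). It works on the raw query string, i.e. before any field analysis. This is not Querqy’s API or rule syntax; the `Rule` structure, the filter strings and the redirect URL are hypothetical, and the exact-match lookup is a deliberate simplification.

```python
# Toy query rewriter (NOT Querqy's API or rule format). Rule structure,
# filter strings and URLs are hypothetical; rules match the whole query only.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Rule:
    synonyms: list[str] = field(default_factory=list)
    boost: dict[str, float] = field(default_factory=dict)  # promote matching docs
    bury: dict[str, float] = field(default_factory=dict)   # demote matching docs
    block: list[str] = field(default_factory=list)         # exclude matching docs
    redirect: Optional[str] = None                         # send user to a landing page


@dataclass
class RewrittenQuery:
    alternatives: list[str]
    boosts: dict[str, float]
    blocks: list[str]
    redirect: Optional[str]


# Hypothetical rules keyed by the normalized raw query string
RULES: dict[str, Rule] = {
    "flip flops": Rule(synonyms=["thongs", "flip-flops"], bury={"category:socks": 50.0}),
    "returns": Rule(redirect="/help/returns"),
}


def rewrite(raw_query: str) -> RewrittenQuery:
    """Rewrite the raw query string before any field analysis happens."""
    normalized = " ".join(raw_query.lower().split())
    rule = RULES.get(normalized, Rule())
    boosts = dict(rule.boost)
    # Burying is expressed as a negative boost in this sketch
    boosts.update({filter_query: -weight for filter_query, weight in rule.bury.items()})
    return RewrittenQuery(
        alternatives=[normalized, *rule.synonyms],
        boosts=boosts,
        blocks=list(rule.block),
        redirect=rule.redirect,
    )
```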
These advantages make Querqy our go-to-tool when it comes to applying synonyms in Solr, Elasticsearch or OpenSearch.
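The document-frequency problem mentioned above can be shown with Lucene’s BM25 IDF formula, `log(1 + (N - df + 0.5) / (df + 0.5))`. The sketch uses made-up document frequencies for two synonyms and then applies one simple correction: scoring every synonym with the group’s maximum document frequency. This is an illustration in the spirit of document frequency correction, not Querqy’s actual implementation.

```python
# Minimal sketch: why rarer synonyms win, and one simple correction.
# Document frequencies are made up; using the max df of the synonym group
# is an illustrative choice, not Querqy's exact algorithm.
import math


def bm25_idf(doc_freq: int, doc_count: int) -> float:
    """IDF as used by Lucene's BM25 similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def idf_per_term(doc_freqs: dict[str, int], doc_count: int) -> dict[str, float]:
    """Each synonym scored with its own df: the rarer one gets a higher IDF."""
    return {term: bm25_idf(df, doc_count) for term, df in doc_freqs.items()}


def corrected_idf(doc_freqs: dict[str, int], doc_count: int) -> dict[str, float]:
    """All synonyms scored with one shared df, so none is favored for rarity."""
    shared_df = max(doc_freqs.values())
    return {term: bm25_idf(shared_df, doc_count) for term in doc_freqs}


if __name__ == "__main__":
    dfs = {"sofa": 900, "settee": 12}  # hypothetical document frequencies
    print(idf_per_term(dfs, doc_count=10_000))
    print(corrected_idf(dfs, doc_count=10_000))
```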
What’s the future of synonyms?
With the advent of large language models and vector search as a technique, I expect the overall importance of synonyms to decline in the future. Because these models are trained on vast amounts of textual data, they capture the “distance” between words. Without any manual effort, these technologies can tell that “sandals” and “birkenstocks” are closer to each other in the vector space than “sandals” and “oranges”. By using this notion of distance as a relevance signal, distantly related concepts can score lower than closely related ones. So searching for “sandals” should show “birkenstocks” higher in the result set than “waterproof hiking boots”.
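A minimal sketch of distance as a relevance signal: rank candidates by cosine similarity to the query vector. The three-dimensional vectors are made-up toy values chosen only for illustration; real embeddings come from a trained model and have hundreds of dimensions.

```python
# Minimal sketch: cosine similarity as a relevance signal.
# The embeddings are hypothetical toy values, not output of any real model.
import math

EMBEDDINGS: dict[str, list[float]] = {
    "sandals": [0.9, 0.8, 0.1],
    "birkenstocks": [0.85, 0.9, 0.05],
    "waterproof hiking boots": [0.7, 0.2, 0.3],
    "oranges": [0.05, 0.1, 0.95],
}


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def rank(query: str, candidates: list[str]) -> list[str]:
    """Closest candidates (highest cosine similarity) first."""
    q = EMBEDDINGS[query]
    return sorted(candidates, key=lambda c: cosine(q, EMBEDDINGS[c]), reverse=True)


if __name__ == "__main__":
    print(rank("sandals", ["oranges", "waterproof hiking boots", "birkenstocks"]))
    # → ['birkenstocks', 'waterproof hiking boots', 'oranges']
```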
This all sounds great, but are we there yet? Not quite, and I don’t expect the techniques for applying synonyms to vanish completely. Especially in high-precision scenarios, e.g. in e-commerce, the manual curation of synonyms will remain an important part of active search management. It will continue to be vital to show documents (or products in e-commerce) that are somewhat similar to what the user has typed in, and this will still mean manual effort, although likely to a smaller extent than today.
For some use cases, like the ones outlined in our blog post on e-commerce search with vectors, this technique is a great solution. For others, where you rely on granular precision control, curating synonyms in one way or another will remain necessary.
If you need help with the right strategy to treat synonyms in your search application or want to evaluate how emerging techniques like vector search can help, contact us.
Image from Dictionary Vectors by Vecteezy