Back in early 2021 I wrote about how Amazon Web Services (AWS) had forked Elasticsearch 7.x following Elastic’s change of license, and how they had promised that this would be a true open source project under the Apache 2.0 license. As with any large project, momentum builds slowly, and it would be several months before a 1.x release of OpenSearch. The question in my mind at the time was how long it would take for OpenSearch to become a serious contender to the more established search engines based on Apache Lucene.
A first major OpenSearch event
This week AWS is hosting the first OpenSearchCon in Seattle, and OpenSearch v2.3 has been released to coincide with it. Speakers will include our own Jeff Zemerick, a veteran of open source projects including Apache OpenNLP, where he is currently the chair of the Project Management Committee. As he writes: “With OpenSearch being available under the Apache License, this provides exciting possibilities to contribute to the future of the project – but how do you get the most benefit from your contributions?” This theme speaks directly to AWS’ aim to make the project truly open source and collaborative, rather than entirely written and controlled by AWS employees. They already have at least one external committer to the project, and various commentators are applauding AWS’ more constructive and collaborative approach.
Search tuning tools for OpenSearch
We’re very happy to announce that Splainer, an OSC tool to display and explain why results matched a query, now works with OpenSearch. Quepid, the search relevancy toolbox, will shortly be able to send queries to OpenSearch (it’s being updated this week), allowing you to gather human quality ratings, calculate search metrics such as NDCG and experiment with new queries to see how these metrics might improve. Both of these tools are heavily used by OSC’s clients and by many others working on tuning search.
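For readers who haven’t met NDCG (normalised discounted cumulative gain) before, here is a minimal Python sketch of how such a metric can be computed from human ratings. It is an illustration only and not Quepid’s implementation: the function names are hypothetical, the exponential gain `2^rating - 1` is one common variant (a linear gain is also widely used), and the ideal ordering is assumed to come from the rated results alone.

```python
import math


def dcg(ratings, k=None):
    """Discounted cumulative gain of ratings listed in ranked order.

    Assumes the exponential-gain variant: gain = 2^rating - 1,
    discounted by log2(position + 1) with positions starting at 1.
    """
    top = ratings if k is None else ratings[:k]
    return sum((2 ** r - 1) / math.log2(i + 2) for i, r in enumerate(top))


def ndcg(ratings, k=None):
    """DCG divided by the DCG of the best possible ordering of the same ratings."""
    ideal = dcg(sorted(ratings, reverse=True), k)
    # With no relevant results at all, define the score as 0.0.
    return dcg(ratings, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical 0-3 ratings for the top five results of one query.
    judged = [3, 1, 0, 2, 0]
    print(f"NDCG@5 = {ndcg(judged, k=5):.3f}")
```

A score of 1.0 means the results are already in the best possible order according to the ratings; lower scores suggest that a query change which moves highly rated results up the list would improve the metric.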
A few weeks ago Querqy, the rules-based query rewriting engine used in many e-commerce applications to fix problematic queries with synonyms, boosting and other techniques (and co-created by OSC’s Director of E-commerce Search), also gained OpenSearch support.
OSC’s own Elasticsearch Learning to Rank plugin has been forked and ported to OpenSearch by a small group led by Grant Ingersoll (if you want a sign that the world has changed, remember that Grant is an Apache Solr committer and was a founder and CTO at Lucidworks, who offer a search engine product built on Solr).
Note that while we’re helping Grant with this effort, we’re also making sure the Elasticsearch LTR plugin works with Elasticsearch v8 after recent breaking changes to their plugin architecture. Luckily, and thanks to folks at Elastic, it’s nearly back in action and should work with the soon-to-be-released version 8.5 of Elasticsearch.
Is OpenSearch a serious third option?
We’re now speaking to a number of clients interested in OpenSearch as a core search technology, some migrating from Elasticsearch and some choosing it from scratch. There are signs of laudable ambition at OpenSearch to create new tools for the ‘plain old text search’ use case, which one might feel has been a second-level priority at Elastic with their primary focus on log analysis and monitoring. More news on these initiatives is sure to appear soon.
So is OpenSearch truly part of a new triad of Lucene-powered engines? We think so, and with AWS’ vast resources behind it – and a true open source license – it looks like it’s here to stay. Unlike some other engines, it can benefit from a vast existing knowledge base (at present it’s not much different to Elasticsearch, although the two will diverge in time).
Vectors point the way
Perhaps true differentiation between Apache Solr, Elasticsearch and OpenSearch will occur as vector/neural search features are added to each engine (read our article in Search Insights 2022 for more on the rise of these new search features). Each engine has chosen a slightly different approach – OpenSearch relying on external libraries for its first cut, Elasticsearch using Lucene’s relatively new vector storage types for its own ANN features, and Solr being a little behind but with vector features in the 9.x release. It remains to be seen how vector search can be successfully combined with traditional querying, at scale and for real applications – a true hybrid search. Of course there are many other choices for vector-powered search, including some established players and a range of new projects – we’ll be hearing a lot about vector search at our Haystack Europe conference in Berlin next week.
It will be interesting to see how these new challenges are addressed by each of the three Lucene-powered projects as we look to the future.