Dogfights in Open Source Search: Solr, OpenSearch and Elasticsearch

I’ve had some interesting conversations over the last few months around the three big hitters of open source search: Solr, OpenSearch and Elasticsearch. Solr is the old dog in the fight, with a history stretching back to the mid 2000s; OpenSearch is the new, eager puppy, cloned from a slightly older canine Elasticsearch, but now growing up on its own. All have slightly different arrangements at home in terms of care and feeding, but who would come out with fur intact? Are any on their last legs?

‘Which is the best search engine?’ is, of course, a rather silly question, as I’ve said before when hosting The Great Search Engine Debate, a fun panel at Berlin Buzzwords. Forgive my consultant’s language, but this is a clear case of ‘it depends’ – all have broadly the same feature set (unsurprising as they share a Lucene core) but there are differences in approach, community and, most importantly, future direction. Let’s take a deeper look:

Solr, that gets there in the end

Solr is a lumbering beast of a search engine, a classic Apache project in some ways – there are often five different options to choose from when implementing something (take highlighting for example), it has a configuration system that has grown rather than been designed, and it lags a little in providing some of the latest vector-powered AI features. Solr’s strength, and simultaneously its weakness, is that no one company controls it or holds any kind of majority share. Its roadmap is decided by a group of committers from many companies large and small, and what is being worked on will often depend on the interests and focus areas of the active committers (many of whom aren’t paid much, if at all, to work on Solr). However, it still has a huge market share, with giants like Apple, Bloomberg and Salesforce all adopting it and funding contributions to the codebase. In the past, Lucidworks were major drivers of Solr, as they provided support for enterprise users and ran the popular Lucene Revolution conferences – but although Solr still powers their commercial search products you wouldn’t know this from their product pages, and many of the Lucene/Solr committers who worked there have moved on to other places.

I often say that even though it can seem to lag behind, eventually Solr will catch up. This apparent lag can be simply an effect of not having the marketing budget of the other engines – but you can be pretty sure that this week at Community over Code, the big US Apache conference, there will be talks about the latest amazing features coming in Solr 10. It’s not going away any time soon, and many of our clients here at OSC, such as German e-commerce giant Otto, are using Solr quite happily; they understand the strength of the pure open source model and, like many others, contribute to Solr’s development, as do we (my colleague Eric Pugh is a longstanding committer).

Elasticsearch, a hard working dog

Elasticsearch’s original USP was that it wasn’t Solr: it was unencumbered by history and cleanly designed for cloud. It also found a new market in log analytics, biting at Splunk’s heels very successfully. However, it’s always been very clear that Elasticsearch has one, and only one, owner, and that owner sometimes changes their mind – although they also want this dog to work for a living. This means the product roadmap is clear, but the commercial focus is also very clear – Elastic would much rather you paid them for Elasticsearch services, training and hosting than ran your own setup. The search community have benefited from the investment in events (there’s a huge number of Search Meetups started by Elastic, although sadly a lot of them only ever had one event), documentation and examples, plus the development of the underlying Lucene engine. This has been a bone of contention between Elastic and Lucidworks, with both companies keen to control in some way the direction of Lucene – which is fair as they have both employed Lucene committers over the years – although luckily not all the committers work for either company. We’ve thus seen Lucene merge with Solr, only to un-merge a few years later, and the occasional flame war on mailing lists.

Elastic’s commercial focus led to a kerfuffle with Amazon Web Services a few years ago, where both wanted a slice of the same juicy steak of hosted Elasticsearch. As I wrote at the time, although this was dressed up as a battle for the soul of open source search, it was more about who got the biggest bite. Emerging from Amazon’s secret cloning lab soon after was a fork of Elasticsearch, which, although initially identical, has grown up into a somewhat different animal. In the meantime Elasticsearch has moved from being properly open source, to open-code (not open source), to open source again, all with a sprinkling of commercial options… it’s all very confusing, somewhat like putting a poodle costume on a Rottweiler, leaving the lingering feeling that no matter how fluffy it looks it might still bite you. I do also sometimes wish they worked with existing community-driven solutions like our Elasticsearch Learning to Rank plugin rather than creating their own, but I can understand the commercial imperatives driving this: they want you to come and play with their dog in their back garden!

OpenSearch, the puppy that’s growing up to be a contender

Diverging from its original parent, OpenSearch is no longer the puppy it once was, living in a massive kennel in the clouds. They say having a pet can change you – interestingly, OpenSearch seems to have changed how its owner, Amazon Web Services, approaches open source projects. More recently, ownership moved to a wider group under the Linux Foundation, a testament to the commitment of the OpenSearch team to ‘true’, non-single-vendor open source as a model for development. No one company will train and feed OpenSearch; we’ll have shared parenting much like Solr, but we also hope and expect AWS to continue to invest significantly in the project.

At OSC we’ve been helping raise OpenSearch, working with AWS on the Learning to Rank plugin (originally forked from our Elasticsearch LTR plugin) and on User Behaviour Insights, a way to record how users interact with search results – and more excitingly, how you can use that data to tune search relevance, balance vector and keyword search etc. The OpenSearch roadmap is much more open than either of the other two options, and there are ambitions to build lots of tools to make tuning search easier for the masses – truly democratising search, which is something we’re very keen on at OSC. Some have suggested to me that OpenSearch is now taking over as the leading open source search engine, but I’m not quite ready to say that. This puppy still has some growing up to do!

Why we need all three – and more

The question ‘which is the best’ can also be answered by ‘why can’t we have all three?’ A healthy search ecosystem needs many options for end users, to promote choice and to drive innovation through competition (we’ve seen this before, when Elasticsearch’s rise drove much-needed innovation in Solr). I’ve not covered the rest of the menagerie either – from Vespa, a Norwegian moose, to the Berlin-based bear Qdrant, to the Dutch lion Weaviate, to… [alright, enough of the animal metaphors – Editor]. Choice is good, even outside the Lucene ecosystem!

Amazingly, there are also people still writing new search engines – here’s a new one based on Lucene and S3.

A wish for the future of open source search (and AI)

I had a chat with our new associate, AI-Powered Search guru Trey Grainger, last week at Haystack Europe: he asked what my wish would be if I could make anything happen in the world of search, and I replied ‘a new, well funded, well led company hosting and supporting Solr with a visible commitment to open source’ – filling the gap Lucidworks seem to have left and providing healthy competition to the other two engines with a Lucene base (any VCs out there with some serious dogfood, do get in touch and I can probably point you at the right people…). As search becomes the solid foundation for AI – you can’t build RAG without great Retrieval – we need a whole pack of options.

Walkies!


Need someone to lead the way? If you need help choosing or migrating to a search engine, or improving search quality no matter what search engine you use, get in touch today.
