Stop Worrying about Solr vs Elasticsearch Decisions

February 28, 2019 Doug Turnbull
Category: Solr

I used to fall into a trap that many search teams do. Faced with a seemingly consequential choice between two products that seem fairly similar, it’s easy to fret about whether “we’re making the best choice”. It’s like choosing between two seemingly similar cars. You worry one will be a lemon, an expensive mistake your team will regret later. Or perhaps you’re not sure you really need that 3rd row of seats or the extra cup holders. Everyone worries they’ll get stuck with a bad choice.

For many, Solr vs Elasticsearch decisions are approached with this ‘product comparison’ mindset. Teams look at a feature comparison and try to work out the best technical choice, hoping to avoid an expensive mistake.

But choosing an open source search engine is not like buying a car. Solr and Elasticsearch are not fully baked ‘products’ to compare blow-by-blow. Instead, these search engines are the malleable raw material your own search team’s product is made of. They don’t come with a beautiful paint job, a third row of seats, or cup holders, just a bunch of raw parts and a pretty solid engine connected to some wheels. It’s up to you to mold it into something useful.

Looking at how people solve search problems with Solr or Elasticsearch makes this more concrete:

Where does the search engine fit into your tech? How search relates to adjacent backend technologies changes a lot from team to team. Some orgs keep the search engine at arm’s length, integrating a lot of cool “magic” outside the search engine. Others become Lucene hackers, getting under the hood to really leverage the available data structures.
How skillfully is it configured? Much like relational databases, how you use the search engine depends on how skilled you are at configuring it. Some things you can get a search engine to do defy a simple feature comparison. A clever query or way of performing text analysis has less to do with the specific search engine and more to do with the skill of the relevance engineer.
Extend the search engine with plugins? Of course, when all else fails, you can create plugins! You can simply add the ‘3rd row seats’ as you desire, or remold the search engine to do what you need.
The hardest work isn’t search engine specific. Really good relevance teams spend more of their time obsessed with experimental methodology (testing/measuring relevance) than even using the search engine.
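To illustrate that last point, here is a minimal sketch of the kind of engine-agnostic relevance measurement a team might start with. It assumes you already have a ranked list of document IDs returned for a query (from either Solr or Elasticsearch) and a set of human relevance judgments. The function names, document IDs, and grades are all hypothetical and chosen only for illustration.

```python
import math


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    if k <= 0:
        return 0.0
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k


def dcg(gains):
    """Discounted cumulative gain: later positions count for less."""
    return sum(gain / math.log2(rank + 2) for rank, gain in enumerate(gains))


def ndcg_at_k(ranked_ids, judgments, k):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    gains = [judgments.get(doc_id, 0) for doc_id in ranked_ids[:k]]
    ideal_gains = sorted(judgments.values(), reverse=True)[:k]
    ideal = dcg(ideal_gains)
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical judgments for one query: doc ID -> relevance grade (0 = irrelevant).
judgments = {"doc1": 3, "doc2": 0, "doc3": 2, "doc4": 1}

# Hypothetical ranking returned by the search engine for that query.
ranked = ["doc2", "doc1", "doc4", "doc3"]

relevant = {doc_id for doc_id, grade in judgments.items() if grade > 0}
print("precision@2:", precision_at_k(ranked, relevant, 2))
print("nDCG@4:", round(ndcg_at_k(ranked, judgments, 4), 3))
```

Nothing in this sketch refers to Solr or Elasticsearch. Only the step that produces the ranked list depends on the engine, which is why these measurement skills carry over between the two.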

These areas point at one ingredient missing from these discussions: team skill. If open source search is ‘raw material’, then teams need skill to mold it to their needs. I have never seen a search team fail because they chose Solr when they should have chosen Elasticsearch (or vice versa). Smart teams succeed with either: Wikipedia uses Elasticsearch, and Reddit uses Solr via Fusion. Where I have seen teams fall short, it is because they simply don’t have the skills to mold the putty into the ‘work of art’ that the organization needs. And these aren’t truly ‘Solr’ or ‘Elasticsearch’ skills, but skills in relevance.

How should Solr and Elasticsearch be compared?

The short answer is don’t worry about it, just pick one, and you’ll be fine.

The long answer is to think about differences in communities rather than features. Compare stock car racing with Formula One racing.

They are different communities, with different histories and different assumptions about how you should build race cars. Each has its own modes of operating, rituals, norms, and beliefs about what car racing should prioritize. Teams choose Solr or Elasticsearch because they feel an affinity to a community’s ethos, not for technical reasons.

One imperfect way to think about Solr vs Elasticsearch is whether the community is optimizing more for the “contributor” or the “user” community. “Contributor” meaning teams that want to pull the direction of the project. “Users” meaning teams that wish to download it and ‘make it work’ with no interest in pushing back changes to the main project. Even “users” here are still rather advanced, configuring and writing plugins to do their work without pushing changes back to the project.

Solr, for example, hails from the Apache Software Foundation (ASF). The ASF works to encourage community ownership. It wants to create norms and a culture that allow many individuals and companies to craft the direction of the project. Consequently, there’s not just “one vision” of what Solr is or could be; there are competing voices, all pushing and pulling the project through contributions and democratic discussion. It’s messy and full of conflict, but it also has the opportunity to get buy-in (and resources) from all parties interested in the direction of the project.

On the plus side, Solr’s ‘contributor’ focus means that there’s a lot of opportunity in Solr to really dig into the source code. On the negative side, there are often lots of “opportunities” in Solr to really dig into the source code. The contributor focus means Solr can feel very ‘design by committee’. Features are often at various levels of “baked”, with one-off, barely working, or rapidly evolving features making it into Solr releases. These features go through an evolution of dying off, growing, getting fixed, or maybe just being orphaned and staying buggy. This is seen as OK… or maybe even a contributor recruiting strategy. Because after all, this is a democratic, open source project! Pull requests welcome. YOU can fix it! As a pure user, however, being surprised by bugs at the 11th hour can be rather frustrating.

Elasticsearch, on the other hand, optimizes for “users” of the search engine. Elastic, the company behind Elasticsearch, controls the direction of the project. Generally speaking, Elastic has been a good steward of the community and project. Elastic thinks carefully about where it wants the project to go, trying to deploy well-thought-out, complete features. It eagerly prunes the ‘bonsai tree’, removing directions it disagrees with. For an organization that wants to steer the direction of the search engine, this can be a frustrating model. It looks very undemocratic. Where you want to take the search engine might go nowhere unless you can convince someone at Elastic to back your play.

But put yourself in a “user’s” shoes. Someone uninterested in contributing (most orgs) wants it to “just work”. Having Elastic’s single vision pushing the direction of the project helps those users stand something up and simply get it to work. Since most people are users, not contributors to the code, having a well-thought-out API with only fully baked features is a huge advantage of the Elastic stack. It ‘just works’.

More alike than different

At the end of the day, Solr and Elasticsearch are more alike than they are different. As open source projects, they’re raw material and infrastructure, not fully baked solutions. You would never compare a group of car parts to a fully formed car. Of course a car taken off the lot will perform better than your pile of car parts. But enough work can turn those car parts into a really tailored, purpose-built car, or bulldozer, or whatever weird thing you’re building.

Or, more directly, see them both as frameworks for implementing your specific search and discovery solution. And don’t worry too much when your team chooses one over the other.

At OSC we specialise in both Solr and Elasticsearch, so do get in touch if you need further advice or help with your project. We also offer search relevancy training for teams using either engine.