Building a Search Technology Radar - OpenSource Connections

January 10, 2022 Eric Pugh
Category: Community

Search is a field that is constantly reinventing itself, and it can be hard to figure out which emerging technologies you should be investing in. I’ve been a longtime fan of the Technology Radar concept, created by the smart folks at ThoughtWorks; it’s been my go-to resource for making technology choices in software development.

In September 2021, as part of the Haystack conference, we solicited ideas from the community on what the future technologies for search look like, using a question like: “If you were to send out pings to the horizon, what would bounce back as weak signals of future interest and what would bounce back as a strong signal that you should immediately pay attention to?” We collected over forty suggestions from the community, called “blips” in Radar parlance, and then Jeff Zemerick, Khalifeh Al Jadda, and I spent a day together trying to make sense of them.

This is the very first time we’ve done this, and we learned a few things:

After our brainstorming session we grouped the items as best we could into four groups. We struggled with this because the categories overlap and are not evenly balanced. For now, those groups are:
1. Techniques – These items are the individual methods of improving search.
2. Applications – These items are use-cases of search.
3. AI/ML – This is all about the ways in which machine learning can improve search.
4. Tools – In-house and third-party tools.
The borders of search are very nebulous, and we were hard pressed at times to decide whether a “blip” was part of the field of search or not. We did end up bringing in almost everything related to ML, as we feel that Machine Learning is increasingly becoming the heart of Search.
We dropped the blips that were focused on a specific technology, whether a vendor service like Amazon Kendra or a specific model like BERT. There are so many of those specific efforts that attempting to categorize them would be exhausting, and we felt that none was so individually game changing that we needed to include it by name. This may change in future Radars!
Lastly, this Radar was completely shaped by the views of the people who submitted blips and the people who filtered them, so please don’t take this as an objective assessment. This is very much an opinionated view of the future!

The first Search Technology Radar

The Technology Radar itself is published at https://haystackconf.com/radar, and we look forward to hearing from you on where we hit the mark and, more importantly, where you think we missed! We hope to revisit this during future Haystack events. You can give us your feedback in Relevance Slack or on Twitter.

We did want to call out a couple of items we noticed during this exercise:

The Rise of Neural Search

Neural Search is the label we gave to a whole cluster of related blips: Deep Neural Nets, Using Vector Specific Search Engines, No Inverted Index, Embedding Search, and Semantic Search. Neural Search is the label that academic communities are using for going beyond token-based search. We placed Neural Search in the Trial category because we see it as the future of search: if you aren’t using it today, you will be using it tomorrow. It remains to be seen what form Neural Search actually ends up taking, as there are many different ideas bubbling around under that label, so we look forward to analyzing this area in more detail next year.
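
To make “going beyond token-based search” a bit more concrete, here is a minimal Python sketch that ranks the same three documents two ways: by counting shared tokens, and by cosine similarity between vectors. Everything in it is an assumption made up for illustration: the documents, the query, and especially the three-dimensional vectors, which are hand-written stand-ins for the embeddings a trained model would produce. It is not how any particular engine on the Radar works.

```python
import math

# Toy corpus, invented for this example.
DOCS = {
    "doc1": "cheap laptop for students",
    "doc2": "affordable notebook computer",
    "doc3": "garden hose repair",
}

# Hand-written stand-ins for embeddings (hypothetical values). A real system
# would get these from a trained model, typically with hundreds of dimensions.
DOC_VECTORS = {
    "doc1": [0.80, 0.90, 0.00],
    "doc2": [0.85, 0.75, 0.10],
    "doc3": [0.00, 0.10, 0.90],
}

QUERY = "budget laptop"
QUERY_VECTOR = [0.90, 0.80, 0.10]  # also hand-written, not model output


def token_overlap(query, text):
    """Count the distinct lowercase tokens shared by the query and the text."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_tokens(query):
    """Return the ids of documents sharing at least one token, best first."""
    scores = {doc_id: token_overlap(query, text) for doc_id, text in DOCS.items()}
    matches = [doc_id for doc_id, score in scores.items() if score > 0]
    return sorted(matches, key=lambda doc_id: scores[doc_id], reverse=True)


def rank_by_vectors(query_vector):
    """Return all document ids ordered by cosine similarity, best first."""
    return sorted(
        DOC_VECTORS,
        key=lambda doc_id: cosine(query_vector, DOC_VECTORS[doc_id]),
        reverse=True,
    )
```

With these made-up values, rank_by_tokens(QUERY) can only surface doc1, because “affordable notebook computer” shares no token with “budget laptop”. rank_by_vectors(QUERY_VECTOR) places both doc1 and doc2 ahead of doc3, which is the kind of match that token-based search misses.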

Blossoming of Non-Lucene-based Engines

2021 makes me think of the heyday of the NoSQL movement, when a new database was revealed every month that was going to change the future of data storage and eliminate SQL forever! It was an exciting time. For the past few years, the conventional wisdom has been to use either Elasticsearch or Solr for search. These two products have been a duopoly for anyone looking for an open source search solution to deploy, and both of them depend on the Lucene library to power them. However, today we have a wealth of competitors that are all trying to be the Neural Search engine of the future. Some continue to use Lucene as part of the mix, others like Tantivy re-implement Lucene in a non-Java language, and some like Milvus and Weaviate reject token matching completely in favour of vectors. This makes the question of “Solr or Elasticsearch” seem very 2016 in nature! You may remember we focused on some of these new search engines during last year’s Haystack LIVE! Meetups. You can find talks given by their creators on our YouTube channel.

If you need help navigating the changing world of search, do get in touch.

Image from Radar Vectors by Vecteezy