Solving The Search Management Minefield in Open Source Search

August 7, 2017 Doug Turnbull
Category: Elasticsearch

A common theme in software development is the tension between:

1. Fixing painful bugs now!!
2. Creating tools & practices that let us fix bugs faster (though at some theoretical time in the future)

The same is true in open source search. Item (1), “Fix bugs NOW!!!”, can be the cry of the search sponsor. So for a very technical search team, the work begins. The techies scramble to try to fix search complaints. They find it frustrating and counterproductive. Feedback is inconsistent, with different answers seemingly coming from analytics, experts, and bosses. Much of the time testing is manual, a hopeless task with millions of search terms. The techies don’t have enough insight into the domain/business to resolve many fine-grained technical decisions. Even more disastrous, the technical changes tend to have dramatic and broad impact. If you’re not extremely careful, these changes damage the business by creating more bugs. Fixing everything with technical solutions can be a bit like getting a bulldozer to dig a few holes to plant flowers.

Ironically, while Solr / Elasticsearch are open source tools, they are more opaque to the business than targeted proprietary search engines that “speak the language” of a domain or business. How many e-commerce store managers miss the Endeca experience, only to be pooh-poohed by their technical team for not keeping up with the latest tech? Technical teams, meanwhile, burdened with fixing search, tend to reach for geekier solutions than what’s actually needed. Those solutions make tuning search for a domain even more obscure, and the teams don’t realize search is getting even further from those who know what’s relevant. We need to move in the opposite direction: making search more manageable by the non-technical team. I congratulate vendors like HawkSearch, Algolia, and AlphaSense for being more targeted at helping a specific type of domain/user rather than promising broad silver bullets.

Sometimes open source search teams prioritize (1), geeking out & fixing “bugs”, but end up making future bugs much harder to fix. This is because in open source search, we fail to consider how technical and non-technical roles work together to build good search for a given domain. We think of every bug as technical, and end up gardening with bulldozers. Open source search can be a constant cycle of “fixing” (in reality creating) bugs. We never take on the real problem of open source search. We never make resolving bugs easier. We don’t tackle the gargantuan gap between the technology and the domain.

The role of the tech team in open source search is sorely misunderstood. Instead of the open source search team being the group responsible for search bugs, it needs to be the group responsible for enabling non-technical specialists to manage search and their own relevance challenges. The tech team fixing the organization’s search bugs is almost anathema to their true purpose.

Technical and non-technical search stakeholders need to work together to define a working contract for how search should work. This contract should:

Communicate what’s technically possible/not possible to the non-technical search team
Negotiate what assets (synonyms, taxonomies, etc) the non-technical team can maintain to have targeted impacts on search relevance
Define a rough sense for how ranking works, without getting too fine-grained or jargon-heavy
Establish criteria for high-quality content to enable other technical/non-technical teams to edit/tune/curate content appropriately (see Google’s SEO guidelines)
Coach the whole team to understand how to measure the impact of changes, especially technical and asset changes
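To make the last item concrete, here is a minimal sketch of measuring the impact of a change. It assumes the team keeps a small set of human relevance judgments per query and can capture the ranked document IDs returned before and after a change. The names `precision_at_k` and `compare_runs`, the sample queries, and the document IDs are all hypothetical, and precision@k is just one simple metric chosen for illustration, not a recommendation from any particular tool.

```python
from typing import Dict, List, Set


def precision_at_k(results: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots filled by documents judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in results[:k] if doc_id in relevant)
    return hits / k


def compare_runs(
    judgments: Dict[str, Set[str]],
    before: Dict[str, List[str]],
    after: Dict[str, List[str]],
    k: int = 10,
) -> Dict[str, float]:
    """Per-query change in precision@k (positive means the change helped)."""
    report = {}
    for query, relevant in judgments.items():
        p_before = precision_at_k(before.get(query, []), relevant, k)
        p_after = precision_at_k(after.get(query, []), relevant, k)
        report[query] = p_after - p_before
    return report


# Hypothetical judgments and result lists, for illustration only.
judgments = {"red shoes": {"d1", "d2"}, "laptop bag": {"d7"}}
before = {"red shoes": ["d1", "d9", "d8", "d3"], "laptop bag": ["d7", "d5", "d6", "d4"]}
after = {"red shoes": ["d1", "d2", "d9", "d8"], "laptop bag": ["d5", "d6", "d4", "d7"]}

report = compare_runs(judgments, before, after, k=2)
for query, delta in sorted(report.items()):
    print(f"{query}: {delta:+.2f}")
```

In this made-up example the change helps "red shoes" and hurts "laptop bag", which is exactly the kind of broad side effect that manual spot checks tend to miss.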

The #1 differentiator between a successful relevance project and an unsuccessful one has not been technical wizardry. Instead it’s been my ability to enable the client to perform this kind of robust & targeted search management. It’s been using technology to enable search management best practices, as defined by people like Martin White: making it faster for the non-techies to fix the bugs by giving them powerful tools and workflows.

Given a “contract”, it’s easier for the technologists and non-technologists to take a search bug and, with some work, escalate it accordingly. Ideally, you would first see if the bug is actually a content issue. Fixing a poorly formatted title or a misused tag has a narrow and conservative effect and is unlikely to upset the search apple cart. But this can often be too fine a scalpel. Extremely non-technical teams only have this option, limiting the breadth of changes to only careful pruning.

That’s why technologists arm the organization with the second level of escalation, which has a slightly broader impact. Modifying an asset like a synonym or taxonomy entry can impact a set of search use cases at one time, but without the big sledgehammer of modifying the fundamental technology. An entire field of information science is available for us to tap into, yet sadly it goes ignored by technologists. The power is there to implement these assets in our organization and give domain experts power tools to control search.
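As one illustration of such an asset, the sketch below turns synonym rows maintained by domain experts (say, exported from a spreadsheet) into the comma-separated and `=>` rule syntax used by Solr-style synonym files, which Elasticsearch can also read. The `SynonymRow` structure, the function names, and the sample terms are hypothetical. Lowercasing is an assumption about the analysis chain, not something the engines require.

```python
from typing import Iterable, NamedTuple, Tuple


class SynonymRow(NamedTuple):
    """One row as a domain expert might enter it (hypothetical structure)."""
    terms: Tuple[str, ...]         # equivalent terms, or the left side of a mapping
    maps_to: Tuple[str, ...] = ()  # if set, terms are rewritten to these targets


def clean(term: str) -> str:
    """Normalize case and whitespace; reject characters that are rule syntax."""
    if "," in term or "=>" in term:
        raise ValueError(f"term contains rule syntax: {term!r}")
    return " ".join(term.lower().split())


def to_rule(row: SynonymRow) -> str:
    """Render one row as a single Solr-style synonym rule line."""
    terms = [clean(t) for t in row.terms if t.strip()]
    targets = [clean(t) for t in row.maps_to if t.strip()]
    if targets:
        if not terms:
            raise ValueError("a mapping rule needs at least one source term")
        return f"{', '.join(terms)} => {', '.join(targets)}"
    if len(terms) < 2:
        raise ValueError("an equivalence rule needs at least two terms")
    return ", ".join(terms)


def render_synonyms(rows: Iterable[SynonymRow]) -> str:
    """Render all rows, one rule per line."""
    return "\n".join(to_rule(row) for row in rows)


# Hypothetical rows, for illustration only.
rows = [
    SynonymRow(("Couch", "sofa", " love seat ")),
    SynonymRow(("tv",), ("television",)),
]
print(render_synonyms(rows))
```

The validation step is the point of the design: the non-technical team edits plain rows, and malformed entries are rejected before they can reach the search engine.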

As the search team gets even more sophisticated, the organization needs to break through the murky mystical corners of machine learning to manage that too. The inputs to machine learning models, their validation, and the careful application of domain expertise need to be owned by the non-techies at the end of the day. Silver bullet promises of machine learning can never get around “garbage in, garbage out.”

Finally, when all else fails, the technical and non-technical teams can sit together to ask: how can we innovate with technology to create more ways for the non-technical team to impact search? Would adding a new asset help control search? If we introduced, for example, named entity extraction from content or queries, how would the non-technical team use the available bells and whistles to solve their own search problems? How would that functionality interact with the other, existing pieces of the search contract?

What I’m describing is an organizational need, not a technical one: search management. Search management is how the organization works together to manipulate search to support the business. Moving beyond the idea that the user’s search bugs are technical bugs is the work of the organization adopting open source search.

Combining good contract-based search management with the deep customizability of open source search is how you get the power of open source search. YOU get to constantly change what’s possible without being dependent on a vendor. Solr and Elasticsearch aren’t Amazon or Google; they are YourThing(™), where you get to iterate on what the business can do. Your audience as a search developer ISN’T USERS. Instead, it’s your colleagues who have to answer the 3AM calls from the sales team. Your job is to give them tools to make more possible.

This is why I’m passionate about tools like Quepid, Splainer, FindTuner, PoolParty and Querqy. They each solve a targeted niche, helping to create contracts between domain experts and techies. I see tools like these across many clients: proprietary tools that make search management easier. Because proper search management tends to be neglected in the open source search field, I’m eager to see what others come up with. It’s a wide open market and there’s room to make broad impact.

But while that market is developing, be aware of what you’re getting into with open source search. You’re entering a brave new frontier where you’ll need to own not just a production search infrastructure, but also the tooling and management of complex issues specific to your domain. Often these latter topics are what make a solution complex and powerful!