Let’s Stop Saying ‘Cognitive Search’

I consume a lot of search materials: blogs, webinars, papers, and marketing collateral. One theme crops up consistently over the years: buzzwords! I understand why this happens, and it’s not all negative. We want to invite others outside our community into what we do. Heck, I write my own marketing collateral, so I get the urge to jump on a buzzword bandwagon from time to time.

That being said, I want to tell you a dirty little secret: nobody really knows what ‘cognitive search’ means in any concrete sense. Sit two people down and ask them what problem ‘cognitive search’ solves, and you’ll get two different answers. Most likely each imagines some silver-bullet solution to the unique, painful search relevance problem they’re experiencing: the kind of problem that requires careful, deep thought and has no easy, one-size-fits-all answer. Since this inevitably leads to disappointment, buzzwords like this ultimately disenchant the very people we brought into the community. Let’s go over why it’s important to scrub buzzwords from our language if we want to help each other solve our relevance problems.

1. The ML techniques applied to search are extremely varied

‘Cognitive search’ implies a single machine learning technique. But there’s such a broad range of potential machine learning ‘lego bricks’ that just saying ‘cognitive search’ doesn’t concretely convey how a problem is solved. You might as well say the problem is solved with ‘computers.’ It could mean learning to rank to optimize relevance ranking. It might mean embeddings or knowledge graphs to improve semantic understanding of queries and content. Maybe they use contextual bandits or neural language models?
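
To make just one of those lego bricks concrete, here’s a minimal sketch of embedding-based semantic matching, assuming the sentence-transformers library. The model name, documents, and query are all toy illustrations; a production system would add indexing, approximate nearest-neighbor search, and much more:

```python
# A minimal sketch of ONE technique behind the buzzword: embedding-based
# semantic matching. Assumes the sentence-transformers library is installed;
# the documents and query below are toy examples.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small general-purpose embedding model

docs = [
    "Wireless noise-cancelling headphones",
    "Bluetooth over-ear headset with microphone",
    "Stainless steel kitchen knife set",
]
query = "cordless headphones"

# Encode everything into dense vectors, then rank documents by cosine similarity.
doc_vecs = model.encode(docs, normalize_embeddings=True)
query_vec = model.encode([query], normalize_embeddings=True)[0]
scores = doc_vecs @ query_vec  # normalized vectors: dot product == cosine similarity

for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {docs[i]}")
```

Contrast this with learning to rank, contextual bandits, or knowledge graphs: entirely different code, data requirements, and failure modes, all hiding behind the same two-word buzzword.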

As many Haystack talks on these subjects attest, each of these solutions can just as easily create a disaster as be a perfect fit for the problem. The takeaway: if someone says they’re going to apply ‘cognitive search’ to a problem, ask them to get specific about the techniques they use and why those techniques are appropriate to your problem.

2. Expect customization, not turn-key solutions

Every search application has subtly unique relevance requirements that are hard for users to consciously express. An internal enterprise search engine has little in common with Google. Even within a domain, one job search experience (e.g., the general-purpose CareerBuilder) might bear little resemblance to another (e.g., the hourly-wage-oriented Snag). None of these look anything like e-commerce search or patent examiner search.

Matching the machine learning lego pieces to the subtle, hard-to-capture specifics of an application is hard. You should expect a lot of customization. I’ve noticed that search engine product companies have A LOT of professional services. If you care about the problem, you’ll need either a competent internal team or a long-term implementation relationship with a vendor. An honest question: would you rather own that customization yourself, or outsource it to an external entity? If getting it right is core to your business, tread carefully. Don’t assume that because it’s all just ‘cognitive search’ you don’t need to think deeply about the problem.

3. ‘Cognitive Search’ is a hypothesis, not a solution. Can you measure it?

The silver bullet for improving search quality is not machine learning; it’s measurement and experimental methodology. With so many possible ways of applying machine learning to relevance, you need a system for evaluating which will work and which won’t. It’s not as simple as buying a ‘cognitive search’ engine and declaring the job done and dusted. Every proposed solution should be approached as a hypothesis, not a guarantee.

Can your organization scientifically study how a hypothesis impacts your business? Do you have KPIs tied to business value that demonstrate search relevance’s impact? Do you have a way to systematically measure which search results were good, bad, or indifferent for a given query? Have you broken this down by persona and user segment?
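
To make the measurement question concrete, here’s a minimal sketch of one offline approach: computing NDCG@10 for a single query from graded relevance judgments. The judgments below are hypothetical; a real program would gather them from human raters or click models across thousands of queries.

```python
# A minimal sketch of offline relevance measurement: NDCG@k for one query.
# The graded judgments (0 = bad, 1 = ok, 2 = great) are hypothetical.
import math

def dcg_at_k(gains, k):
    """Discounted cumulative gain over the top-k ranked results."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(ranked_gains, k=10):
    """DCG of the actual ranking divided by the DCG of the ideal ranking."""
    ideal_dcg = dcg_at_k(sorted(ranked_gains, reverse=True), k)
    return dcg_at_k(ranked_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0

# Judgments for the results your engine returned, in ranked order,
# for a single query such as "running shoes".
judgments = [2, 0, 1, 2, 0, 0, 1, 0, 0, 0]
print(f"NDCG@10 = {ndcg_at_k(judgments):.3f}")
```

With a metric like this tracked per query, per persona, and per segment, every ‘cognitive search’ claim becomes a testable hypothesis.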

At any stage in a search team’s progression, this is the first and best use of data scientists. It’s the foundation for everything else the team does. It’s hard work, but an extremely worthwhile way to increase your pace of innovation with real, concrete machine learning methods.

No more buzzwords, let’s talk about the real techniques used

Instead of talking about vague buzzwords like ‘cognitive search’, let’s pierce the veil. Let’s become aware of the actual techniques being used. In the 90s and 00s, regular software development was in the position machine learning is in now. It all sounded vaguely magical if you weren’t a software developer. But over time that’s changed. Product teams and business analysts can now reason carefully about a broad range of software techniques. Should an app run entirely client-side in the browser, in JavaScript? Or should it be built out entirely server-side? Does this call for a relational database, or something else? Your average BA might not be able to get deep into the technical weeds, but many know enough to think critically and push back on how these technical decisions impact product needs. We need to get there with machine learning and search relevance.

Everyone needs to be a scientist, not just data scientists

To become a ‘relevance centered’ enterprise, the hardest adjustment I’ve seen in organizations is moving away from a traditional software development mindset to one that’s more scientific and hypothesis-driven. The idea that 93% of experiments will fail is hard to fathom. All that investment, to get that little nugget of success? The reality is, that’s just honest science! In a mature, hypothesis-driven environment, negative results and failures are expected. The goal is to improve the pace and quality of experimentation, where value comes as much from knowledge gained as from movement in metrics. Fail fast, succeed with confidence.
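
As a sketch of what hypothesis-driven looks like day to day, here’s a two-proportion z-test comparing click-through rates between a control ranker and an experimental one, using the statsmodels library. The counts are hypothetical, and a real experimentation program would add guardrail metrics, interleaving, and corrections for multiple comparisons:

```python
# A minimal sketch of hypothesis testing for a search experiment:
# did the new ranker's click-through rate beat the control's?
# All counts below are hypothetical.
from statsmodels.stats.proportion import proportions_ztest

clicks = [4_120, 4_310]      # sessions with a click: [control, variant]
sessions = [50_000, 50_000]  # total search sessions in each arm

# alternative="smaller" tests whether the control CTR is below the variant CTR.
stat, p_value = proportions_ztest(clicks, sessions, alternative="smaller")
print(f"z = {stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("Variant CTR is significantly higher: ship it, and record why it won.")
else:
    print("No significant lift: a negative result, and that is honest science.")
```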

Changing how our organizations work on software is challenging. So, understandably, organizations want to outsource the problem to a ‘cognitive search’ vendor. Organizations, managers, and executives often think ‘my vendor can handle this headache for me’. But really this just punts organizational issues down the road. If search is core to your business value, you’ll eventually lose to competitors who can attack marketplace problems with more scientific agility. When you can quickly experiment with real solutions in the marketplace, without disruptive organizational bureaucracy or vendor dependencies, you gain an incredible advantage that’s more fundamental than any machine learning technique. So think carefully about how your team, vendors, and the broader organization fit together to create a fast, experimental pace.

We need Machine Learning literacy beyond the Data Science team

The underlying problem behind buzzwords like ‘cognitive search’ is the lack of machine learning literacy outside of data science teams. A common problem we see when consulting with search teams is a data science team developing capabilities the rest of the search team ends up not using, largely because they don’t know what to do with them. To the search team, the delivered ML model is a black box, impossible to reason about or manage, usually developed in a siloed data science team without much external input. Understandably, the data science team is equally frustrated: why aren’t they using this promising new technique? This dynamic needs to change.

I’m pleased to see job titles like “Machine Learning Engineer” beginning to appear: someone who understands machine learning well enough to think critically about how it works in production. Our conception of ‘Relevance Engineer’ is similar: someone aware enough to use and develop machine learning information retrieval capabilities in production, even if they’re not going to innovate the fundamental science. For example, as engineers we can all appreciate and learn how LambdaMART works, but we don’t all need to create brand-new learning to rank methods.
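
For instance, a minimal LambdaMART-style sketch using LightGBM’s lambdarank objective might look like the following. Everything here is synthetic: the features, grades, and hyperparameters are placeholders, and the real relevance engineering work lies in the feature design, judgment collection, and production integration around a snippet like this:

```python
# A minimal sketch of LambdaMART-style learning to rank with LightGBM.
# All features and relevance grades below are synthetic placeholders.
import numpy as np
import lightgbm as lgb

rng = np.random.default_rng(42)

# 100 queries with 10 results each; 3 features per result
# (think BM25 score, title match, recency: synthetic here).
n_queries, n_results, n_features = 100, 10, 3
X = rng.normal(size=(n_queries * n_results, n_features))
y = rng.integers(0, 3, size=n_queries * n_results)  # graded relevance 0-2
group = [n_results] * n_queries                     # results per query

ranker = lgb.LGBMRanker(
    objective="lambdarank",  # the LambdaMART objective
    n_estimators=50,
    learning_rate=0.1,
)
ranker.fit(X, y, group=group)

# Score and rank the results of one new (synthetic) query.
candidates = rng.normal(size=(n_results, n_features))
scores = ranker.predict(candidates)
print("Ranked result indices:", np.argsort(-scores))
```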

These are just initial baby steps; we can’t make progress until everyone has awareness of, and education in, machine learning concerns. In the same way product teams can now think critically about many aspects of software development, so must they now think about how the large and extremely diverse universe of machine learning techniques fits in. It’s not one thing called ‘cognitive search’ but a huge universe of techniques, with tradeoffs and areas for your organization to innovate and grow.

That’s not all, folks!

I hope this wasn’t too much of a diatribe! Let me know if you disagree with my thoughts here; I’m eager to hear other viewpoints!

If you do take machine learning literacy seriously, please check out our Summer of Relevance: 4 days of training that goes from basic relevance and measuring search quality up to advanced machine learning techniques you can implement on your own. As always, get in touch if we can be a part of increasing your org’s maturity with search and relevance through consulting or training.