The other day I was working with a colleague on improving a Known Item Lookup use case. Our goal is to correctly look up an item, and otherwise return nothing. We are very much focused on the @1 result position.
To illustrate this kind of query using the TMDB dataset featured in our Think Like a Relevance Engineer training courses: if I search for "Star Wars", I want Star Wars: A New Hope to come back in position one. However, if my query is "space movies", the response we want is zero search results (a ZSR), since that is a broad category search and NOT a known item lookup.
This means that sometimes a ZSR is the right result, and sometimes it is the wrong result.
Default scorers like Precision@1 or MRR@1 don’t look at ZSRs at all; they simply don’t exist in the scoring model. In this blog, however, I want to share how you CAN decide when a ZSR is correct behavior, when it is incorrect behavior, and how to model that in Quepid.
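As a point of reference, plain RR@1 logic in a Quepid custom scorer looks roughly like the sketch below (it is the same Reciprocal Rank logic we will reuse in the custom scorer later in this post). Notice that nothing in it consults the total number of results: a query returning zero documents simply scores 0, whether or not zero was the outcome we wanted.
const k = 1; // @Rank
let rank = 0;
eachDoc(function(doc, i) {
  // Treat ratings as binary: 0 and 1 are irrelevant, 2 and above is relevant.
  if (rank === 0 && hasDocRating(i) && docRating(i) > 1) {
    rank = i + 1; // remember the rank of the first relevant document
  }
}, k);
setScore(rank > 0 ? 1.0 / rank : 0.0);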
To start out, we’re going to create a simple case in Quepid. Go to app.quepid.com and run through the default TMDB case wizard using Solr as your endpoint. Then add three queries: "Star Wars", "Star Trek", and "Space Movies".
The best we could hope for is:
| Query | Number of Results |
| --- | --- |
| Star Wars | > 0 |
| Star Trek | > 0 |
| Space Movies | 0 (ZSR) |
Letting Quepid Know Which Queries Should Return Zero Search Results
We’re going to build a custom scorer based on Reciprocal Rank that understands that the query Space Movies getting 0 results is a good thing, and that Star Wars or Star Trek getting 0 results is a bad thing. First we need to annotate our Space Movies query with some JSON that tells the scorer it is a ZSR query. Click the Set Options button to pop open the modal that lets you provide additional information via JSON that can be used by our custom scorer:
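A minimal sketch of what you might paste into that modal; the only requirement is that the key matches what the scorer reads via qOption, so here we name the flag zsr_expected to match the scorer below:
{
  "zsr_expected": true
}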

Now we can create a custom scorer based on Reciprocal Rank with additional logic that forces the score depending on whether or not a ZSR is expected for the query.
const k = 1; // @Rank
let rank = 0;
//alert("numFound:" + numFound() + ", zsr_expected:" + qOption('zsr_expected'));
if (qOption('zsr_expected') === true) {
  // This query is expected to return zero results:
  // an empty result set is a perfect score, any results at all is a failure.
  if (numFound() > 0) {
    setScore(0);
  } else {
    setScore(1);
  }
} else {
  // Standard Reciprocal Rank at k.
  eachDoc(function(doc, i) {
    // Treat ratings as binary: 0 and 1 are irrelevant, 2 and above is relevant.
    if (rank === 0 && hasDocRating(i) && docRating(i) > 1) {
      rank = i + 1; // remember the rank of the first relevant document
    }
  }, k);
  const score = rank > 0 ? 1.0 / rank : 0.0;
  setScore(score);
}
Notice the commented-out alert(); uncomment it to debug your custom scorer.
Seeing the Zero Search Results Scorer in Action
In the screenshot below you can see that we are returning plenty of documents for all three searches. Star Wars is returning the original Star Wars movie in the first position, and Star Trek is returning the Star Trek: Generations movie (you can’t see this in the screenshot, but trust me!). The results are being scored with basic RR@1.
The fairly broad query Space Movies is returning 176 results, of which the first result was rated a two, so partially relevant. With RR@1, this would normally lead to that query/doc pair being considered relevant and the query scoring a 1.

However, we know that in our specific use case of Known Item Lookup, the correct result for Space Movies is zero documents. Therefore, returning zero documents should give us a 1, and returning 176 documents should score a 0.
Swapping over to the RR@1 w/ ZSR scorer that we created above, you can see that the overall score has dropped from 1.0 to 0.67: our zsr_expected query property is now consulted during scoring, so returning results for Space Movies is considered bad, regardless of the fact that the first doc was rated a 2. Star Wars and Star Trek still score 1, Space Movies scores 0, and averaging the three queries gives the 0.67.

The only way to get back to a 1 for Space Movies is to make the query return zero documents! To do this, we can change the query parameters to search only the title field and add a Minimum Should Match of 100%:
q=#$query##
&defType=edismax
&qf=title
&bf=vote_average
&mm=100%
Now our score has jumped, because we have reduced our recall and increased our precision, and we can see that reflected in the ZSR scorer:

Conclusion
Now you know how to use Quepid to take into account zero search results as part of your scoring model! There are plenty of other clever things you can do with custom scorers in Quepid – remember that it’s not just the number of results that counts, but whether they help your user achieve their search task.
If you need more help with Quepid, jump into the #quepid channel in Relevance Slack, or contact us to find out how to set up powerful processes and tools for improving search result quality.
Image from Zero Vectors by Vecteezy