What Should Your Search Document Be? - OpenSource Connections

December 4, 2019 Doug Turnbull

In a search engine, the “document” is the basic unit of indexing and retrieval. It’s the “result” on the search results screen when a user enters a query. Many teams make the mistake of choosing whatever entity makes sense to them as backend developers to be the “document”. Often this works out, but many other times we don’t take a nuanced enough view of what the document should be. What is the real “thing” the users want back?

I wrote in Relevant Search that users don’t care how your backend data systems work; they don’t want some searchable view of your RDBMS. They want a view specific to search, one that structures the data in a way that satisfies their information need: it answers their question, solves their problem, finds the best job, or forwards them along a more complex decision making process.

Comparing Grubhub (food delivery) and Kroger (grocery) is an interesting example of this tension around a similar information need (mmm food…).

On Grubhub, most people search for restaurant names, and then explore that restaurant’s menu. However, there seems to be a growing shift toward searching for food items directly. Here are my search results for my local area, for the query “Shrimp Tempura”:

[Screenshot: Grubhub search results for “Shrimp Tempura”]

We wouldn’t really make this decision without more data (so we can’t really critique Grubhub…). But just as a fun thought experiment, what do you think this search UI should return as a document? Should it return

Restaurants, because the product is really about checking out from restaurants?
Individual food items, with more detail on each item?
Both?
It Depends?
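
To make the first two options concrete, here is a minimal sketch of building both kinds of documents from the same backend data. Everything in it is an assumption for illustration: the `Restaurant` and `MenuItem` classes, the field names, and the sample restaurants are hypothetical, and this is not how Grubhub models its data.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MenuItem:
    name: str
    price: float


@dataclass
class Restaurant:
    name: str
    cuisine: str
    menu: List[MenuItem]


def restaurant_documents(restaurants: List[Restaurant]) -> List[Dict]:
    """Option 1: one document per restaurant, menu item names folded into a text field."""
    return [
        {
            "doc_type": "restaurant",
            "name": r.name,
            "cuisine": r.cuisine,
            "menu_text": " ".join(item.name for item in r.menu),
        }
        for r in restaurants
    ]


def item_documents(restaurants: List[Restaurant]) -> List[Dict]:
    """Option 2: one document per menu item, with restaurant details copied onto each item."""
    return [
        {
            "doc_type": "menu_item",
            "name": item.name,
            "price": item.price,
            "restaurant_name": r.name,
            "cuisine": r.cuisine,
        }
        for r in restaurants
        for item in r.menu
    ]


# Hypothetical sample data, made up for this sketch.
restaurants = [
    Restaurant("Sakura Sushi", "Japanese",
               [MenuItem("Shrimp Tempura", 8.5), MenuItem("Miso Soup", 3.0)]),
    Restaurant("Noodle Bar", "Asian Fusion",
               [MenuItem("Shrimp Tempura Roll", 11.0)]),
]

for doc in item_documents(restaurants):
    print(doc["name"], "from", doc["restaurant_name"])
```

The same rows produce two very different result lists. With restaurant documents, a “Shrimp Tempura” match is buried in `menu_text`; with item documents, the matching thing is the result itself.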

Compare that to what we expect in a grocery search UI, where we are hyperfocused on single items, as in Kroger’s UI:

[Screenshot: Kroger search results for “Shrimp Tempura”]

Arguably, there may be domain-specific concerns at play in these examples (on Grubhub, I’m checking out from one restaurant among many; in grocery there’s a more traditional e-commerce workflow). So Grubhub is in a bit more of a pickle.

Notice the problem with this query on the Grubhub site. I’m not that convinced the results are relevant for my query. Whereas in Kroger, it looks clearly like Shrimp Tempura. There’s clearly a picture of Shrimp Tempura! I’m sold that the search knows what I want. I can kind of see that on Grubhub these restaurants offer these menu items, but it’s not as easy a sell.

Perceived Relevance – It’s about selling the user this is the right ‘document’ for them

Deciding what a “document” is comes down to convincing users in the search UI that we understand the thing they are asking for. This is perceived relevance: Does the “document” returned relate to my query? Does the search UI appear to know what I want? Does it look relevant to my query?

E-commerce search results give lots of information on why you would want to purchase one product or another. For example, in an Amazon search for blue shoes, you’ll notice a clear picture (all blue), badges, reviews, brands, and other facts that might be relevant to your decision making process. It says “we think you want this sort of thing, and here’s why.”

[Screenshot: Amazon search results for “blue shoes”]

Of course, it’s not just about showing me blue shoes. It’s about all the unspoken bits of my information need (that I’m an Amazon Prime member, and what I really want are sneakers!). This is why diversity is so important for head queries: we often can’t fill in the rest of the user’s specification, so we need to present a range of options and give users whatever information helps clarify their information need so they can take another step. (Also notice how the irrelevant ads detract from my belief that the search engine knows what I want…)

Bringing perceived relevance into our Grubhub vs Kroger example, Kroger has an easier time convincing us that what was returned was actually relevant. All the results from Grubhub, despite their relevance, don’t look as relevant. It may be harder to take the next step. The fact that restaurants are the documents for my food-item query is a big part of this; a better type of document returned for my query might have convinced us more of the relevance of the results.

Teams come to us all the time with what they say are “relevance” issues. Sometimes what’s really happening is the search UI isn’t selling anyone on the relevance of the returned documents. The choice of the search document may not appear to relate to the query in the search system. The items may actually be relevant, in a roundabout way. But they don’t look the part, they don’t move forward the user’s journey, and if they don’t do that, they’re not as useful “documents” in our search system.

Porque No Los Dos? (why not both kinds of documents?!)


In the future, in your search system, will it make sense for there to be one kind of “document”? What if I go to an e-commerce search experience and ask a question about products:

> Which kind of running shoe should I buy for trail running?

We’re not there yet for e-commerce:

[Screenshot: e-commerce search results for the trail running shoe question]

Would the right “document” in this case be a set of products? Or would it be best to return some kind of community curated answer (as exists on many Amazon products)?

We’d still need to keep supporting our regular e-commerce use cases, but perhaps alongside another kind of document. We’d have to look at the user’s intent (which will often be ambiguous) and return one or more sets of results, each corresponding to one interpretation of that intent. Google, as I’m sure you know, must support many kinds of results for this purpose:

[Screenshot: Google results page mixing several kinds of results]

There are three kinds of results here: products, regular web pages, and question/answer pairs.

So for our friends at Grubhub, maybe there’s a future where both restaurants and food could be search results, depending on the user’s query?

We’re seeing more and more teams split up their content into many indexes or types of documents to satisfy different kinds of information needs, with an intent classification system in between. In other words, there’s a big “It Depends” box before a query gets to a search engine. What should the document be? What kinds of things should be returned? It depends on what we think the user wants. It’s not about just one search engine and one kind of result anymore. It’s potentially moving to all-federated-all-the-time based on how we interpret the user’s intent.
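
As a rough illustration of that “It Depends” box, here is a toy router that sits in front of two indexes. It’s only a sketch under loud assumptions: the index names, the restaurant list, and the food vocabulary are all hypothetical, and a real intent classifier would more likely be a trained model than a pair of set lookups.

```python
import re
from typing import List

# Hypothetical index names, one per kind of document.
RESTAURANT_INDEX = "restaurants"
ITEM_INDEX = "menu_items"

# Hypothetical vocabularies; in practice these might come from catalog data or query logs.
KNOWN_RESTAURANTS = {"sakura sushi", "noodle bar"}
FOOD_TERMS = {"tempura", "shrimp", "pizza", "burger"}


def classify_intent(query: str) -> str:
    """Return 'restaurant', 'food_item', or 'ambiguous' for a raw query string."""
    normalized = query.lower().strip()
    if normalized in KNOWN_RESTAURANTS:
        return "restaurant"
    tokens = set(re.findall(r"[a-z]+", normalized))
    if tokens & FOOD_TERMS:
        return "food_item"
    return "ambiguous"


def route(query: str) -> List[str]:
    """Pick which index (or indexes) a query should be sent to."""
    intent = classify_intent(query)
    if intent == "restaurant":
        return [RESTAURANT_INDEX]
    if intent == "food_item":
        return [ITEM_INDEX]
    # When we can't tell, federate: ask both and let the UI present both kinds of result.
    return [RESTAURANT_INDEX, ITEM_INDEX]


for q in ["Shrimp Tempura", "Sakura Sushi", "late night"]:
    print(q, "->", route(q))
```

The ambiguous branch is the interesting one: rather than guessing, the router fans out to every index and leaves the choice to the user.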

In other words, it all comes back to the world’s most important search relevance pun:

[Image: the search relevance pun]

And if you put that pun into action, like any good pun, you interpret it many different ways, hitting different processing centers of your brain:

Here, with potentially different information needs and use cases behind an ambiguous query, we need to think about the likelihood that the user wants a specific kind of response. We can use what we know about the user and other contextual factors (location, time, etc.) to weigh the probability of each likely response type.
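
One simple way to picture this weighing is a naive Bayes style update: start from a prior over intents for the query, multiply in a likelihood for each contextual signal, and normalize. The sketch below does that and then splits a page of result slots in proportion to the outcome. The numbers, the intent names, and the `lunchtime` signal are made up for illustration; this is one possible approach, not a description of any particular system.

```python
from typing import Dict, List


def intent_probabilities(prior: Dict[str, float],
                         context_likelihoods: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine a prior over intents with per-signal likelihoods, then normalize to sum to 1."""
    scores = dict(prior)
    for likelihood in context_likelihoods:
        for intent in scores:
            # A signal that says nothing about an intent leaves its score unchanged.
            scores[intent] *= likelihood.get(intent, 1.0)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all intents were ruled out by the context")
    return {intent: score / total for intent, score in scores.items()}


def allocate_slots(probs: Dict[str, float], slots: int = 10) -> Dict[str, int]:
    """Split result slots across intents in proportion to probability (largest remainder)."""
    raw = {intent: p * slots for intent, p in probs.items()}
    allocation = {intent: int(value) for intent, value in raw.items()}
    leftover = slots - sum(allocation.values())
    by_remainder = sorted(raw, key=lambda i: raw[i] - allocation[i], reverse=True)
    for intent in by_remainder[:leftover]:
        allocation[intent] += 1
    return allocation


# Hypothetical numbers: query logs suggest restaurants are the more common intent...
prior = {"restaurant": 0.6, "food_item": 0.4}
# ...but a made-up "lunchtime" signal is twice as likely under a food-item intent.
lunchtime = {"restaurant": 0.5, "food_item": 1.0}

probs = intent_probabilities(prior, [lunchtime])
print(probs)
print(allocate_slots(probs, slots=10))
```

Allocating slots instead of picking a single winner keeps the less likely interpretation on the page, which matters when the query really is ambiguous.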

The future of search is not 10 hot links. It’s conversational, multi-modal UIs that can navigate between the many possible intents of the users.

In other words, we are teaching search engines puns. Prepare for more search dad jokes. I’m sorry.

If you have thoughts on this article, please get in touch! We’d love to work with you on your tough search & relevance problems.