When click scoring can hurt search relevance – towards better signals processing in search

Have you heard of “click scoring” or “click tracking”? In the context of search, click scoring is the method whereby you collect statistics on where users click in their search results, then use that information to prefer the clicked result for the queried text.

Consider Virginia Decoded, a set of Virginia’s state laws. For example, if the user searches for “Virginia Tech”, these are the search results that come back:

Searches for “Virginia Tech” don’t look so great

What would happen if we relied simply on click scoring as a measure of the user’s behavior? There’s a good chance that the first result would be reinforced as a good search result for “Virginia Tech”, which, in the absence of other data, seems like the best possible search result for Virginia Tech. So click scoring helps, right?

Well, it turns out that “Virginia Tech” is only one way to say this school’s name. As a proud Hokie, I know the right thing to search for:

We get better results when searching for Virginia Polytechnic

Wow, the search results are much better! Clearly, laws that mention Virginia Tech in the title are likely much more relevant to my search.

Your next thought might be (and should be): would click tracking help or hurt in this instance? Well, certainly a lot depends on the implementation. We would see many users go through something like the following click log of behavior, which our click tracker will then reinforce:

  • Query: Virginia Tech
  • Click: First search result

In the case of the search for “Virginia Tech”, the most relevant search results aren’t even on the page. Instead of helping, we’re reinforcing something fundamentally irrelevant and wrong in our relevancy implementation. It builds inertia behind what’s already there on the search results page, damaging search relevancy.

Let’s look at another case, one where the search results are already looking pretty good:

  • Query: Virginia Polytechnic
  • Click: One of many results

Here, we already surface any number of potentially relevant documents to the user. Click tracking can help us optimize the placement of the top search results. For example, if the 3rd result is the dominant search result, we can boost it accordingly, putting it in the #1 spot.
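
As a rough illustration of that kind of boost, here is a minimal sketch. It assumes a click log of (query, document id) pairs and a list of results with base relevance scores; the function name, the `boost` parameter, and the document ids are all made up for this example, and a real search engine would apply the boost inside its own scoring.

```python
from collections import Counter

def rerank_by_clicks(results, click_log, query, boost=1.0):
    """Re-order (doc_id, base_score) pairs by adding each doc's share of clicks for `query`."""
    clicks = Counter(doc_id for q, doc_id in click_log if q == query)
    total = sum(clicks.values())

    def boosted(item):
        doc_id, base_score = item
        share = clicks[doc_id] / total if total else 0.0
        return base_score + boost * share

    return sorted(results, key=boosted, reverse=True)

# Hypothetical data: the 3rd result gets most of the clicks.
results = [("doc_a", 1.0), ("doc_b", 0.9), ("doc_c", 0.8)]
click_log = ([("virginia polytechnic", "doc_c")] * 8
             + [("virginia polytechnic", "doc_a")] * 2)

reranked = rerank_by_clicks(results, click_log, "virginia polytechnic")
print([doc_id for doc_id, _ in reranked])  # ['doc_c', 'doc_a', 'doc_b']
```

Note that a query with no recorded clicks keeps its original order, which is exactly why this only helps when the good results are already on the page.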

Yeah, but do we even have this data?

If you build a bad search, and nobody clicks on it – does it make a sound? 🙂

To detect these patterns, we need to be generating enough search traffic for the data to be statistically significant. Does your search application fall into that category? Consider an internal search application that searches laws for a small law firm. Search traffic is relatively low. Every day the problems and intent are different. There are always new cases to work on, and it’s unlikely that a search from last week will be seen again for years. It may also be unlikely for the right answer (the best result) for a search, say “car tax law”, to be the same from case to case.

For this law firm, does it even matter what gets clicked on? Does this feedback help relevancy? Probably not. Every search is an outlier, stuck in the vagaries and oddities of the current problem and the current person doing the search. What is clicked could only interfere with relevancy. The next time a term is searched, the context will be different, and the right answer will not be quite the same.

Now let’s say we’ve convinced a lot of law firms to use our cloud-based legal search application. It’s cheaper, but in exchange, we’re going to use your click tracking data to make everyone’s search more relevant. We’ve got the entire body of laws available. We’ve captured a lot of lawyers and paralegals doing search. Now does click tracking help?

Perhaps! It’s still unclear. In some cases, where traffic is significant and the right answer obvious, it’s a clear win. If our searches are highly clustered around a small set of concepts, say if “car tax” is searched frequently, click tracking could help us optimize those search results. However, if the diversity of legal searches is far greater, if everyone’s case is different, and both the queries and the right answers to them differ a lot, click tracking becomes less of a clear win.
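
One rough way to check which situation you are in is to measure how much of your traffic comes from your most frequent queries. The sketch below is only an assumption about how you might do that: `head_query_share` and the sample query lists are invented for illustration, and the normalization (lowercasing and trimming) is deliberately naive.

```python
from collections import Counter

def head_query_share(queries, top_n=10):
    """Fraction of all searches accounted for by the `top_n` most frequent queries."""
    counts = Counter(q.strip().lower() for q in queries)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    head = sum(count for _, count in counts.most_common(top_n))
    return head / total

# Hypothetical logs: one clustered around a few concepts, one where every query is unique.
clustered = (["car tax"] * 80 + ["dog license"] * 15
             + ["zoning appeal", "boat title", "estate tax", "noise ordinance", "fence height"])
diverse = [f"query {i}" for i in range(100)]

print(head_query_share(clustered, top_n=2))  # 0.95
print(head_query_share(diverse, top_n=2))    # 0.02
```

A high share suggests click data will accumulate on a handful of queries worth optimizing; a low share suggests most clicks will be one-off outliers.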

Another important component is your audience. It’s likely that when you’re sitting down doing really intense legal research, you’re willing to page through a couple of pages of search results. So this could be good for click tracking – if the right answer for “car tax” is always page 3, item 2.

Or it could have no impact on search results, if the “right answer” differs among 30 correct search results depending on whether our user is searching for “car tax” because their client is late paying taxes or because their client wants to get a tax break next year for their hybrid.

Better Relevancy before click tracking

The utility of click tracking rests on already having close to a good relevancy solution. If your first page of results gives you good results, then click tracking can help clarify user intent among what’s on the page. With poor relevancy, the clarification will be misguided, like choosing the least-worst option of those presented. It’s like a good chef giving you a menu that looks like (1) steakums, (2) Kraft mac & cheese, (3) microwavable fish sticks. If everyone chooses fish sticks, does that mean the chef is pleasing their customer with fish sticks?

No, that chef still needs to serve good food. And we still need to build good search results. We need to do important work to make search relevant. Things like:

  • Proper text analysis, stemming, lemmatization
  • Proper field weighting: titles are important to match on, body text less so.
  • Weighting of phrases: “Virginia Tech” as a phrase over just “Virginia” and “Tech”
  • Recognizing that “Virginia Tech” and “Virginia Polytechnic” are synonyms or, better, representing that these strings correspond to the entity (id 12345) which has text representations “Virginia Tech”, “Virginia Polytechnic”, “VT”, “VPI”, etc. (but don’t confuse VT for Vermont!)
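
To make the entity idea in the last bullet concrete, here is a minimal sketch of an alias index. The Virginia Tech id and aliases come from the bullet above; the Vermont id (67890) and the function names are hypothetical. In practice this would more likely live in your search engine’s synonym or entity-extraction configuration than in application code.

```python
from collections import defaultdict

# Entity id -> text representations. 67890 is a made-up id for Vermont.
ENTITIES = {
    12345: ["Virginia Tech", "Virginia Polytechnic", "VT", "VPI"],
    67890: ["Vermont", "VT"],
}

def build_alias_index(entities):
    """Map each lowercased alias to the set of entity ids it may refer to."""
    index = defaultdict(set)
    for entity_id, names in entities.items():
        for name in names:
            index[name.lower()].add(entity_id)
    return index

def resolve(query, index):
    """Return the sorted entity ids whose alias matches the whole query."""
    return sorted(index.get(query.strip().lower(), set()))

index = build_alias_index(ENTITIES)
print(resolve("Virginia Polytechnic", index))  # [12345]
print(resolve("vt", index))                    # [12345, 67890] -> ambiguous, needs context
```

Returning every candidate for an ambiguous alias, rather than picking one, keeps the “VT means Vermont” mistake visible instead of silently baking it in.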

More importantly, you need a process for iterating on and improving these search results: a test-driven approach for provably making progress on these issues.
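
What might such a test look like? One possible shape, sketched under assumptions: a small set of human relevance judgments per query, and a check that flags queries whose top results fall below a precision bar. The judgments, document ids, threshold, and the canned `fake_search` stand-in are all hypothetical; in a real setup `search_fn` would call your search engine.

```python
def precision_at_k(result_ids, relevant_ids, k=3):
    """Fraction of the top-k results that were judged relevant."""
    top = result_ids[:k]
    if not top:
        return 0.0
    return sum(1 for doc_id in top if doc_id in relevant_ids) / len(top)

def failing_queries(search_fn, judgments, k=3, min_precision=0.5):
    """Queries whose precision@k falls below the bar we have set."""
    return sorted(
        query
        for query, relevant_ids in judgments.items()
        if precision_at_k(search_fn(query), relevant_ids, k) < min_precision
    )

# Hypothetical judgments: both queries should surface the same documents.
JUDGMENTS = {
    "virginia tech": {"doc_vt_1", "doc_vt_2"},
    "virginia polytechnic": {"doc_vt_1", "doc_vt_2"},
}

# Canned results standing in for a real search engine call.
CANNED_RESULTS = {
    "virginia tech": ["doc_misc_1", "doc_misc_2", "doc_vt_1"],
    "virginia polytechnic": ["doc_vt_1", "doc_vt_2", "doc_misc_1"],
}

def fake_search(query):
    return CANNED_RESULTS.get(query, [])

print(failing_queries(fake_search, JUDGMENTS))  # ['virginia tech']
```

Each relevancy change can then be judged by whether the list of failing queries shrinks without previously passing queries reappearing on it.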

All of this has to be built before we can start contemplating the degree to which click tracking, or other signals processing methods, matter.

Better Signals, Not just clicks

Fundamentally, another problem is trying to ascertain when clicks correspond to user success or failure. Sometimes, we might be able to ascertain that a user backtracked and searched for another term to clarify their intent just with clicks. Consider the following query log representing a user session:

  • Query: Virginia Tech
  • Click: First search result
  • Query: Virginia Polytechnic
  • Click: One of many results

If this happens enough in user sessions, a click tracker should hopefully determine that Virginia Polytechnic is a clarification on Virginia Tech. If our click tracker does this, the next question we need to ask is how often does this happen? It could be more likely that this happens:

  • Query: Virginia Tech
  • Click: First search result

If we have enough statistically significant traffic, we may determine that a small subset of users (Hokies like me) know to clarify their search with a different term. More likely we’ll see that Virginia Tech results in a click on the first search result 95% of the time, with an odd 5% changing their mind and searching for Virginia Polytechnic. Those weirdo Hokies have no idea what they’re talking about – so we’ll stick with the obvious 95% of the users that know what they’re doing.
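
Measuring that split from session logs could look something like the sketch below. It assumes each session is an ordered list of (event kind, text) pairs; that format, the function name, and the sample sessions are invented for illustration, with the sample sized to mirror the 95%/5% split described above.

```python
def reformulation_rate(sessions, first, second):
    """Share of sessions containing query `first` in which `second` is issued later."""
    started = 0
    followed = 0
    for events in sessions:
        queries = [text for kind, text in events if kind == "query"]
        if first in queries:
            started += 1
            if second in queries[queries.index(first) + 1:]:
                followed += 1
    return followed / started if started else 0.0

# Hypothetical sessions: 19 users stop at the first click, 1 reformulates.
plain = [("query", "virginia tech"), ("click", "result_1")]
clarified = plain + [("query", "virginia polytechnic"), ("click", "result_3")]
sessions = [plain] * 19 + [clarified]

rate = reformulation_rate(sessions, "virginia tech", "virginia polytechnic")
print(f"{rate:.0%}")  # 5%
```

The number alone says nothing about which group was right, which is the gap the next paragraphs are about.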

Most importantly – what are we not capturing here? We aren’t making a determination on whether or not the user was ultimately successful with a result. Did the 5% of Hokie users turn out to be more successful with the search app than the other 95% of searchers? And how do we measure this? Did they spend time with that result? Did they add an item to their cart? Was there a conversion to something we want (and they would want)?

And nothing here captures whether the user failed in their attempt. What does an anti-conversion look like? The user ends up on a document another way? More likely, in today’s world of impatient users, you never see them again. If measuring failure means measuring the absence of a user, then it’s extremely challenging to measure or, more importantly, to engage that user ever again.

A better signal processing and monitoring framework reinforces more than clicks – it builds upon statistically significant user success. It defines metrics for success custom to your application and user base. It determines when and if a signal is statistically significant to have meaning in search results scoring, and it optimizes the statistically measurable use cases.
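
One possible way to decide whether a success signal has enough data behind it is to use the lower bound of a Wilson score interval rather than the raw success rate. This is a sketch of that one technique, not a prescription: what counts as a “success” (a conversion, a long dwell time) is up to your application, and the `min_rate` threshold and function names are assumptions.

```python
import math

def wilson_lower_bound(successes, trials, z=1.96):
    """Lower bound of the Wilson score interval for a success rate (z=1.96 is roughly 95%)."""
    if trials == 0:
        return 0.0
    p = successes / trials
    z2 = z * z
    centre = p + z2 / (2 * trials)
    margin = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials * trials))
    return (centre - margin) / (1 + z2 / trials)

def is_trustworthy(successes, trials, min_rate=0.5):
    """Only let a signal influence scoring if even its pessimistic estimate clears the bar."""
    return wilson_lower_bound(successes, trials) >= min_rate

# 3 successes out of 3 looks perfect, but there is too little data to trust it.
print(is_trustworthy(3, 3))    # False
# 30 out of 40 is a lower raw rate, but far better supported.
print(is_trustworthy(30, 40))  # True
```

For the small law firm, almost every query/result pair would sit in the first case; for the cloud service with clustered queries, some would reach the second.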

Perhaps more importantly, it’s a monitoring solution to prevent users from fleeing search. It’s an opportunity to flag where more in-the-trenches relevancy work can help: search grunt work like realizing that “Virginia Tech” and “Virginia Polytechnic” are the same thing, or that a search for “Tyler O’Connell” turns into a search for “tyler o connell”, not “tyler oconnell”, because of how text is tokenized by a search engine. It’s a way to file bugs back to your search team for areas where their expertise with Solr, Elasticsearch, Lucene, etc. could be valuable.
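
The tokenization difference is easy to see with two toy tokenizers. These are simple regex stand-ins written for this example, not the analyzers that Solr, Elasticsearch, or Lucene actually ship; the point is only that the choice of what to do with the apostrophe changes which tokens get indexed and matched.

```python
import re

def split_on_punctuation(text):
    """Toy tokenizer: lowercase, then treat every non-alphanumeric character as a break."""
    return re.findall(r"[a-z0-9]+", text.lower())

def strip_apostrophes_then_split(text):
    """Toy tokenizer: drop straight and curly apostrophes first, then split as above."""
    cleaned = re.sub(r"['’]", "", text.lower())
    return re.findall(r"[a-z0-9]+", cleaned)

name = "Tyler O’Connell"
print(split_on_punctuation(name))          # ['tyler', 'o', 'connell']
print(strip_apostrophes_then_split(name))  # ['tyler', 'oconnell']
```

If documents were indexed one way and queries are analyzed the other way, the name will not match, and no amount of click data will reveal why.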

Is “click tracking” by itself useful? Maybe. But more important is search success and failure monitoring, followed by a contemplation of how to make this monitoring data useful to search for your application. It’s not an easy button. It’s another variable to tune, another dimension to factor into search along with your relevancy engineering work.

Would you like to take advantage of our expertise building contextual, relevant search applications? Interested in determining whether click or signals tracking is appropriate to your application? Contemplating LucidWorks Fusion or Datastax Enterprise to combine search and signals analytics? Contact us! We’d love to help.