Blog

Pete finds out how to rate search results to create a judgement list

Pete has just been hired as Product Manager for Search at electronics retailer Chorus Electronics. His boss, a Vice President, has asked him to build a ‘best-in-class’ e-commerce search engine, to ‘increase customer satisfaction and drive improved revenue’. 

Now that his team is starting to tune search relevance, Pete has realised that he needs an effective and accurate way to measure how good or bad search results are for particular queries. So far, he’s been reacting to known issues (like the accessories problem) or business drivers (like vendor deals) – he’d prefer to get ahead of the game and introduce some regular testing of search queries.

He’s also realised that his developers know a lot about software and how their search engine has been configured, but they don’t know as much about the products Chorus Electronics sells. How are they going to rate search results and give them a score?

An in-house expert

Pete’s colleague Manny is a merchandiser for Chorus Electronics: his job is to help sell certain products. He’s got a great overall view of the product catalogue – for example, he knows that a drip coffee maker and a pod coffee maker are different things, and that batteries come in sizes from D to AAA. He’s a subject matter expert or SME for all the things Chorus Electronics sells. Manny is going to help Pete’s team come up with some ratings, using his knowledge of both the product catalogue and what users search for. We call this collection of ratings a judgement list.

Deciding which queries to test

Pete and Manny’s first step is to decide what to test. Chorus Electronics’ website gets thousands of search queries every day – which are most important? Pete asks Katherine, a data engineer on his team, to gather some website logs from the last month and prepare some graphs, which show the classic ‘long tail’ curve – a bit like this one from Max Irwin’s blog on Site Search KPIs (which shows queries from a gardening supplies website):
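Finding the head of that long tail is mostly a matter of counting. A minimal sketch of what Katherine's analysis might look like (the log format here is an assumption – real search logs would need parsing and normalisation first):

```python
from collections import Counter

def top_queries(log_lines, n=10):
    """Count how often each query appears and return the n most popular.

    log_lines is assumed to be an iterable of raw query strings, one per
    search - a simplification of a real web server log.
    """
    counts = Counter(line.strip().lower() for line in log_lines if line.strip())
    return counts.most_common(n)

# Toy example: four logged searches
logs = ["bluetooth speaker", "phone case", "Bluetooth Speaker", "usb cable"]
print(top_queries(logs, n=2))
```

Sorting every query by frequency like this produces exactly the long-tail curve: a handful of queries account for most of the traffic, followed by thousands of rare ones.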

A graph of query frequency.

Taking Max’s advice to “Look at the queries that you know are popular and are important to you as a business”, Pete chooses a list of important or high-value queries:

bluetooth speaker
surround sound speaker
phone case
...

They also agree on a simple scale for rating:

0 - poor
1 - fair
2 - good
3 - perfect

If he chose, Manny could now try all these queries on the Chorus Electronics website and record how good or bad the results were. He’d have to record all the results manually – perhaps in a giant spreadsheet. However, our Chorus platform provides a much better alternative – Quepid – with an easy-to-use web interface, a customisable scoring system and built-in snapshots.
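At its simplest, a judgement list is just a table of (query, result, rating) rows using the 0–3 scale above. A hypothetical sketch of what Manny's spreadsheet might contain – the product IDs are invented for illustration, and this is not Quepid's internal format:

```python
import csv
import io

# Each row: the query Manny tried, the product that came back,
# and his rating on the agreed 0 (poor) to 3 (perfect) scale.
judgements = [
    ("bluetooth speaker", "SKU-1001", 3),  # perfect: a bluetooth speaker
    ("bluetooth speaker", "SKU-2040", 1),  # fair: a speaker stand accessory
    ("phone case", "SKU-3300", 2),         # good: right product, wrong colour
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["query", "product_id", "rating"])
writer.writerows(judgements)
print(buf.getvalue())
```

A file like this is enough to replay the same queries after every relevance change and check whether the highly rated products still appear near the top.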

Rating Search Results with Quepid

In our next video Eric Pugh will show you how Quepid can be used to rate the results of a search query. You can find out more about best practices for judgement rating, as well as many other tips and guides, in the Quepid Wiki.

Contact us if you need our help with Quepid and tuning your e-commerce search.

Chorus is a joint initiative by Eric Pugh, Johannes Peter, Paul M. Bartusch and René Kriegler.


Read the complete Meet Pete series about e-commerce search:
1. Meet Pete, the e-commerce search product manager.
2. How does Pete, the e-commerce search product manager, build a web shop?
3. Pete solves the e-commerce search accessories problem with boosting & synonyms
4. Pete learns about selling search keywords with Chorus
5. Pete finds out how to rate search results to create a judgement list
6. Pete learns how to scale up search result rating
7. Pete learns how to curate search results for a single query
8. Pete establishes a Baseline Relevance Metric
9. Pete improves a new class of queries with redirects