New Quepid Features - OpenSource Connections

February 8, 2024 Eric Pugh
Category: Quepid

Quepid is a free tool, created and maintained by the OSC team, for gathering human relevance judgements and using these to generate overall metrics. It also lets you change the structure of queries and see how this impacts those metrics – so you can use it to tune and improve search relevance. 2023 was a very active year for the Quepid Qommunity and I wanted to share how Quepid has grown and improved over the last twelve months. We’ve added new interfaces for different types of user, new metrics, and ways to work with new – or indeed, any – search backends.

Quepid Terminology

Quepid allows you to create Cases to encompass particular things you want to measure and test, defined by groups of Queries. You might for example create a Case about ‘3 word queries about soup’ if you noticed your search performance for that area needed improvement. For each Query you then need some Judgements created by humans – ratings of the relevance of each search result. Quepid organizes Judgements in a Book that consists of query and document pairs for human evaluation.

New Quepid Features in 2023

1. Dedicated Human Judging Interface

As anyone who has taken our Think Like a Relevance Engineer training class knows, the classic Quepid rating interface, a Query with a listed set of document Results, is very prone to human bias. It also requires lots of mouse movements, allows only a single rating per result and needs a live connection to your search engine.

Today, we have a dedicated Human Rating interface that lets you gather judgements from multiple human judges. It is integrated into your Case so you can update the Judgements based on changes in your search engine, and then easily feed those updated ratings back into your Cases.

2. New Judges Homepage for Quepid

With a proper Human Judging Interface, we realized that we have two main users of Quepid, the Judges (people who will just rate a search result) and the Relevance Engineers (people who may do some rating but will also try tweaking query structure to improve relevance). Quepid historically just dropped you right into your last Case, which was useful to Relevance Engineers, but was quite confusing to our Judges. We now have a basic homepage that lists out Cases & Judgement Books and shows some trend information. It also prompts you on what judging is needed to stay up to date and make sure you have a complete picture of relevance as time goes by.

3. Improved Export/Import between Quepid installations

Lots of people start out using the free hosted Quepid and then move to their own on-premise installation. Some also have development and production versions of Quepid. We now have proper export/import functions for a Case and a Judgement Book that allow you to migrate your data between different Quepid installations.

4. Interact with Quepid APIs from your own scripts

Often people have their own specific ways of working with Quepid that aren’t supported directly. Luckily you can interact with Quepid’s APIs to do what you need. For example, Jeremiah Via of the New York Times has a case where the queries need to expire on an almost daily basis, and Quepid doesn’t support that. With a script (and Quepid’s new Personal Access Token support) he can delete and add Queries to a Case programmatically and automatically. We also started documenting the APIs, though this is going to be a long-term effort (and a great place to get involved – we need your help!).
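To make the scripting idea concrete, here is a minimal sketch of how such a script might prepare authenticated requests using only the Python standard library. Everything specific in it is an assumption for illustration: the base URL, the token value, the `/api/cases/{case_id}/queries` routes, the payload shape and the Bearer-style header are not taken from this post, so check the API documentation of your own Quepid installation before relying on them.

```python
import json
import urllib.request

# Assumptions: the base URL, token value and endpoint paths are illustrative.
# Check your Quepid installation's API documentation for the real routes.
QUEPID_URL = "https://quepid.example.com"
TOKEN = "your-personal-access-token"


def build_request(method, path, payload=None):
    """Build (but do not send) an authenticated JSON request."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    return urllib.request.Request(
        url=f"{QUEPID_URL}{path}",
        data=data,
        method=method,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


def add_query_request(case_id, query_text):
    # Hypothetical route and payload shape for adding a Query to a Case.
    return build_request(
        "POST",
        f"/api/cases/{case_id}/queries",
        {"query": {"query_text": query_text}},
    )


def delete_query_request(case_id, query_id):
    # Hypothetical route for removing a Query from a Case.
    return build_request("DELETE", f"/api/cases/{case_id}/queries/{query_id}")


def send(request):
    """Send a prepared request and decode the JSON body, if there is one."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
        return json.loads(body) if body else None
```

A daily job could then call `send(delete_query_request(case_id, query_id))` for each expired Query and `send(add_query_request(case_id, text))` for each new one. Building the request separately from sending it keeps the script easy to dry-run and inspect.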

5. Enhanced OpenSearch Support

AWS OpenSearch really took off in 2023, and so we went deeper on supporting it. We reworked how OpenSearch query templates are supported in Quepid to make it simpler to reference them.

6. Vector Search Support with Vectara

Speaking of search engines, 2023 was the year of vector search, and we added our first vector-based search engine to Quepid, Vectara! This was exciting for two reasons. Firstly, vector search is a super cool and interesting area to work in, and collaborating with the Vectara folks was a great experience. Secondly, we finally broke out of the historical constraint that Quepid only worked with Lucene-based search engines. While conceptually we knew Quepid could be used for any search engine, we had never attempted to make the underlying library Splainer work with a non-Lucene engine, and it turned out to be “no big deal”. A few days of JavaScript coding and we had Vectara integration!

7. Static and Custom Search APIs for Any Search Engine

Once we added Vectara, we added two more search engines in short order, although the first is actually not a search engine! The “Static Search Engine” is a file of queries and document results that you want to evaluate using the Quepid tooling. We’ve had this request from a lot of Data Scientists who believe they have a better model for ranking/re-ranking and want to gather human judgements and compare it to an existing search engine. Now with the Static Search Engine you can upload that static file and interact with it as if there were a real search engine responding.

This was exciting, and opened the door to the number 1 request we have received for Quepid: integrate your custom Search API with Quepid. This lets you hook up Quepid to any search engine that is reachable over HTTP and responds with JSON. This has been game-changing for evaluating search engines, and we’ve been actively using it to evaluate AI-powered retrieval-augmented generation (RAG) for our clients.
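As a rough illustration of what “HTTP plus JSON” can mean in practice, here is a minimal sketch of a toy search service built with the Python standard library. The corpus, the `q` parameter name and the response fields (`query`, `total`, `docs`) are all made up for this example; Quepid does not require this particular shape, and how you map your own API’s response onto Quepid’s fields is configured on the Quepid side.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Toy corpus standing in for a real ranking model or search backend.
DOCS = [
    {"id": "1", "title": "Tomato soup recipe"},
    {"id": "2", "title": "Chicken noodle soup"},
    {"id": "3", "title": "Garden salad"},
]


def search(query, rows=10):
    """Return docs whose title contains every query term, in corpus order."""
    terms = query.lower().split()
    hits = [d for d in DOCS if all(t in d["title"].lower() for t in terms)]
    return {"query": query, "total": len(hits), "docs": hits[:rows]}


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Read the hypothetical "q" parameter, e.g. GET /search?q=soup
        params = parse_qs(urlparse(self.path).query)
        query = params.get("q", [""])[0]
        body = json.dumps(search(query)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the toy service locally until interrupted."""
    HTTPServer(("localhost", port), SearchHandler).serve_forever()
```

Calling `serve()` starts the service on `http://localhost:8000`, and a request such as `/search?q=soup` returns the two soup documents as JSON. A real deployment would replace `search` with a call into your own ranking model or retrieval pipeline.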

8. Reusable Search Endpoints

As you can imagine, after adding a number of search engines, we had to go back and do some work on the plumbing of Quepid. We refactored how Quepid supports the data related to your search engine configuration and introduced a new concept of “Search Endpoint”. Search Endpoints are reusable across Cases by sharing them with your Team, and are tied to individual Tries. So you can easily compare Dev to Prod, or search engine A to search engine B within a single Case by just swapping the Search Endpoint.

9. Jaccard & RBO Metrics

Lastly, we’ve done a number of migration projects in 2023, so we added some new metrics, Jaccard and Rank Biased Overlap (RBO) – these are great for comparing the results from two different systems, preventing nasty surprises when your new search engine doesn’t quite measure up to the old one. A quick shoutout to Tito Sierra for introducing me to RBO at this year’s US Haystack conference and to Atita Arora for making it happen. These metrics are implemented as new Jupyter notebooks that come with Quepid and can be used to compare snapshots or even two separate Cases.
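For readers who have not met these two metrics, here is a minimal Python sketch of each, comparing two ranked lists of document IDs for the same query. It is not the code from the Quepid notebooks: the function names are invented here, and the RBO shown is the simple truncated form (summed only to the depth of the shorter list, with no extrapolation), so its values can differ from other RBO implementations. It also assumes each list contains no duplicate IDs.

```python
def jaccard(results_a, results_b):
    """Set overlap of two result lists, ignoring rank: |A & B| / |A | B|."""
    set_a, set_b = set(results_a), set(results_b)
    union = set_a | set_b
    if not union:
        return 1.0  # Two empty result lists are treated as identical.
    return len(set_a & set_b) / len(union)


def rbo_truncated(results_a, results_b, p=0.9):
    """Truncated Rank Biased Overlap.

    At each depth d, measure the fraction of the top-d results the two
    lists share, then weight it by p**(d - 1) so that agreement near the
    top counts more. Lower p means a more top-heavy comparison.
    """
    depth = min(len(results_a), len(results_b))
    seen_a, seen_b = set(), set()
    score = 0.0
    for d in range(1, depth + 1):
        seen_a.add(results_a[d - 1])
        seen_b.add(results_b[d - 1])
        agreement = len(seen_a & seen_b) / d
        score += (p ** (d - 1)) * agreement
    return (1 - p) * score


old_engine = ["a", "b", "c"]
new_engine = ["b", "c", "d"]
print(jaccard(old_engine, new_engine))        # 0.5 (2 shared of 4 distinct)
print(rbo_truncated(old_engine, new_engine))
```

Jaccard only asks whether the same documents came back, while RBO also rewards them for appearing in the same order near the top. Note that because this sketch truncates the sum, two identical lists of length k score 1 - p**k rather than 1.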

Future improvements – how you can help

Quepid is available as a free hosted service and is also an Apache 2 licensed open source project. We’d love your help improving Quepid as a tool for the search relevance community – you can let us know how you’re using Quepid (for example, we had a great conversation with a UK-based recruitment company recently), let us know about features or bugs, or get involved by contributing to the project. Come and join the discussion in Relevance Slack in the #quepid channel – you can meet other Quepid users, get help or tell us your story!

If you need assistance improving search quality with Quepid, get in touch.