The Annual Search Shootout – how we’re competing at TREC to build better news recommendation

What exactly is TREC?

If you’re involved in search, you have likely heard about TREC, short for the Text REtrieval Conference. This annual conference has been running since 1992. Its mission is to “foster the research community in information retrieval (IR) and accelerate the transition of search technology from academia to industry”. That tagline sounds good, but what really makes TREC great is its workshop format. Participating in TREC means getting your hands dirty tuning search engines; in fact, you are required to participate and publish your work if you attend!

Model real-world search problems

Because search applications are diverse and always evolving, TREC is flexibly organized around specific IR tasks called tracks. Individual tracks can come and go, but each has its own objective, example data, and grading rubric. Tracks are usually run in collaboration with an industry partner and represent active areas of search research. In 2019 there were eight tracks, and it’s common for groups to compete in several. But I wanted to start small, so I entered a single track: the News track.

News recommendation systems

The News track is relatively new: it debuted in 2018 and is supported with data from the Washington Post. The track is composed of two separate tasks:

  1. Background linking — Given a source article, identify other articles with background information useful for understanding the source article
  2. ‘Wikification’ of article entities — Given a source article, identify and rank the people, places, and things that are most relevant to the article and that also have an entry in Wikipedia.

Both of these tasks are aimed at finding new tools to help readers get an unbiased context surrounding a given article. Pew Research conducted a study in 2018 and found that 34% of Americans get their news primarily online (up from 28% in 2016), and 77% say the internet is important to them for getting the news. While we are becoming more reliant on the internet for news, online news publication is exploding beyond traditional news agencies. In this sea of news, there is a new need for IR systems to give readers the context around a given piece. By giving readers a fuller picture of complex issues, these systems will ultimately lead to more informed decisions.

Conference agenda

The first phase of each conference is search retrieval experimentation. Once you’re registered for a track, you are given access to the example data and the topics from past years. Topics are the grading rubric for comparing retrieval systems; for the News track, each topic is a reference document for background recommendation. Given a topic, the researcher is expected to retrieve the top 100 results (relevant context articles) from their search engine. This is called a run. Researchers are allowed to submit up to five runs to hedge their bets. At the time of run submission, researchers only have the unassessed topics for the upcoming conference; the actual relevance assessments are yet to be collected, so they must make their best guess based on prior years of data.
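
Concretely, a run is just a ranked list of documents per topic, written out in TREC’s whitespace-delimited run format (topic id, the literal `Q0`, document id, rank, score, and a run tag). Here is a minimal sketch of formatting one; the topic and document ids are made up for illustration:

```python
# Each line of a TREC run file: <topic> Q0 <docid> <rank> <score> <run_tag>
# The topic and document ids below are hypothetical.
results = {
    "816": [("WaPo-doc-123", 14.2), ("WaPo-doc-456", 11.7)],  # topic -> (docid, score), best first
    "817": [("WaPo-doc-789", 9.8)],
}

def format_run(results, run_tag="osc_baseline", k=100):
    """Render search results as TREC run-file lines, keeping at most k per topic."""
    lines = []
    for topic, hits in results.items():
        for rank, (docid, score) in enumerate(hits[:k], start=1):
            lines.append(f"{topic} Q0 {docid} {rank} {score} {run_tag}")
    return lines

run_lines = format_run(results)
```

For the News track, `k` would be 100, matching the top-100 results requested per topic.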

After all of the runs are collected, assessment begins. The retrieved documents are pooled across all teams, and each document in the pool receives a relevance assessment (judgment) from a trained assessor (judge) for that particular topic. Each year this collection of assessments expands, giving researchers a better way to benchmark their search systems. This is why TREC competitions are so valuable to the search community: they actively generate training data for specific search domains.
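
The pooling step can be sketched as taking the union of each run’s top documents per topic, so that overlapping results only need to be judged once. This is a simplified illustration with hypothetical run data; the actual pool depth varies by track:

```python
def pool(runs, depth=10):
    """Union of the top-`depth` docids from each run, grouped per topic."""
    pooled = {}
    for run in runs:  # each run is {topic: [docid, ...], best first}
        for topic, docids in run.items():
            pooled.setdefault(topic, set()).update(docids[:depth])
    return pooled

# Two hypothetical runs over one topic; doc "d2" appears in both but is judged once.
run_a = {"816": ["d1", "d2", "d3"]}
run_b = {"816": ["d2", "d4"]}
judgment_pool = pool([run_a, run_b], depth=3)
```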

Finally, the different runs are compared by computing the nDCG@5 score for each topic and averaging across all of the new topics for that year. The conference is then held as a forum to share results and ideas between groups.
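
For readers unfamiliar with the metric: nDCG@5 takes the graded relevance judgments of a run’s top five results, discounts them by rank, and normalizes by the best score any ranking of the judged documents could achieve. A minimal sketch, with made-up relevance grades:

```python
import math

def dcg_at_k(relevances, k=5):
    """Discounted cumulative gain over the top-k results (0-based rank, log2 discount)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_rels, all_rels, k=5):
    """nDCG@k: the run's DCG divided by the DCG of an ideal ranking of all judged docs."""
    ideal = dcg_at_k(sorted(all_rels, reverse=True), k)
    return dcg_at_k(ranked_rels, k) / ideal if ideal > 0 else 0.0

# Hypothetical graded judgments: the run's top-5 results vs. every judged doc for the topic.
run_rels = [2, 0, 4, 1, 0]
judged = [4, 4, 2, 2, 1, 1, 0, 0, 0]
score = ndcg_at_k(run_rels, judged, k=5)
```

A perfect top-5 ranking scores 1.0; the per-topic scores are then averaged to compare runs.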


Besides being a place for cross-pollination of search ideas, TREC is also an excellent simulation of the problems real search teams face in production. Given a small set of assessments, a search team must make its best guess about search performance across all of the queries its system must respond to. At OSC we are always preaching about measuring search well and adopting a hypothesis-driven development approach to search engineering. Participating in TREC fits perfectly with those beliefs and gives us a great venue to test our methodology.

This is the first post in a blog series about TREC. Next up is a high-level look at our strategy, followed by some technical posts about implementing specific features (NER fields, sentence embeddings, and genetic algorithms for parameter tuning). So while I’m nervously awaiting the assessments for this year’s topics, I hope you will follow along with this series. I look forward to sharing my experience and learning with the search community at large, and as ever, do get in touch if we can help you with any of your search projects.