Test Driving Elasticsearch Learning to Rank with a Linear Model

Last time, I created a simple linear model using three ranking signals. Using TMDB movie data, I came up with a naive model that computed a relevance score from a title field’s TF*IDF score, an overview field’s TF*IDF score, and the user rating of the movie. Our model learned the weight to apply to each score when summing – similar to the boosts you apply when doing manual relevance tuning.

What I didn’t tell you was that I was using Elasticsearch to compute those ranking signals. In this blog post, I want to take the simple model we derived and make it usable via the Elasticsearch Learning to Rank plugin. In a future blog post, we’ll load the same model into Solr. This will give you a chance to see a very simple 101 model in action with the two search engines’ learning to rank plugins.

The ranking signals…

If you recall, learning to rank learns a ranking function from ranking-time signals. In machine learning these are classically called “features.” But I like the term “signals” because they signal something about the relationship between a query and a document. Plus, selfishly, it’s what we call ranking-time information in Relevant Search, to differentiate it from features that exist purely in the content or are derived from queries.

In this blog post, we’ll use the Python Elasticsearch client library, but mostly I’ll just be showing off the basic queries I use to derive the signals. I’ve already loaded the TMDB movie data locally; if you’d like to have this data at your disposal, follow the directions in the Learning to Rank demo README.

On to the action. Below, you’ll see the three queries we use to generate the signal values: titleSearch, overviewSearch, and ratingSearch. The first two are straightforward match queries. The last is a function score query that just returns a movie’s rating, which has no relationship to the search keywords.

from elasticsearch import Elasticsearch

keywords = "rambo"

titleSearch = {
    "query": {
        "match": {
            "title": keywords
        }
    }
}

overviewSearch = {
    "query": {
        "match": {
            "overview": keywords
        }
    }
}

ratingSearch = {
    "query": {
        "function_score": {
            "functions": [
                {"field_value_factor": {
                    "field": "vote_average",
                    "missing": -1
                }}
            ]
        }
    }
}

es = Elasticsearch()
# Run each signal query on its own to see its per-document scores
es.search(index='tmdb', doc_type='movie', body=titleSearch)
es.search(index='tmdb', doc_type='movie', body=overviewSearch)
es.search(index='tmdb', doc_type='movie', body=ratingSearch)

If you recall, these three features were gathered for a set of judgments. Judgments tell us how relevant a document is for a query. So Rambo is a “4” (exact match) for the keyword search “rambo.” Conversely, “Rocky and Bullwinkle” is a 0 (not at all relevant) for a “Rambo” query. With enough judgments, we logged the relevance scores of the above queries for each judged document. This gave us a training set that looked like:

grade,titleScore,overviewScore,movieRating,comment
4,12.28,9.82,6.40,# 7555	rambo@Rambo
0,0.00,10.76,7.10,# 1368	rambo@First Blood
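The logging step itself isn’t shown here. One way to do it, sketched as an assumption rather than the method actually used, is to run each signal query restricted to a single judged document with an `ids` filter: filter clauses don’t contribute to the score, so the hit’s `_score` is that signal’s score alone, and no hit means the signal is 0. `logging_query` and `score_from_response` are hypothetical helper names.

```python
import copy


def logging_query(feature_query, doc_id):
    """Hypothetical helper: wrap one signal query so it only matches doc_id.

    The ids filter restricts the hit without affecting its score, so the
    returned _score is the signal query's score for that one document.
    """
    return {
        "query": {
            "bool": {
                "must": [copy.deepcopy(feature_query)],
                "filter": [{"ids": {"values": [str(doc_id)]}}],
            }
        }
    }


def score_from_response(response):
    """Return the single hit's _score, or 0.0 if the signal query didn't match."""
    hits = response["hits"]["hits"]
    return hits[0]["_score"] if hits else 0.0


# Usage against a live cluster (not run here), assuming the titleSearch dict above:
# resp = es.search(index='tmdb', doc_type='movie',
#                  body=logging_query(titleSearch['query'], 7555))
# titleScore = score_from_response(resp)
```

Returning 0.0 for a non-matching document is what produces rows like the `0.00` title score for First Blood.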

In that blog post, we used scikit-learn to run linear regression to learn which signals best predicted the resulting relevance grade. We came up with a model with a weight for each signal and a y-intercept. This model was:

coefs = [0.04999419, 0.22958357, 0.00573909]  # each signal's weight
yIntercept = 0.97040804634516986
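To hand logged rows like the ones above to scikit-learn, each line has to be split into a relevance grade and a feature vector. A minimal stdlib sketch using only the two rows shown, assuming the first column is the grade and the last is the `# id<TAB>keywords@title` comment; `parse_training_rows` is a hypothetical name.

```python
TRAINING_CSV = """grade,titleScore,overviewScore,movieRating,comment
4,12.28,9.82,6.40,# 7555\trambo@Rambo
0,0.00,10.76,7.10,# 1368\trambo@First Blood"""


def parse_training_rows(text):
    """Split logged rows into (grades, feature vectors, comments).

    Assumes a header line followed by rows of the form
    grade,titleScore,overviewScore,movieRating,comment.
    maxsplit=4 keeps any commas inside the trailing comment intact.
    """
    grades, features, comments = [], [], []
    for line in text.strip().splitlines()[1:]:  # skip the header
        grade, title, overview, rating, comment = line.split(",", 4)
        grades.append(int(grade))
        features.append([float(title), float(overview), float(rating)])
        comments.append(comment)
    return grades, features, comments


grades, X, comments = parse_training_rows(TRAINING_CSV)
# With the full judgment list, X and grades are what you'd pass to
# sklearn.linear_model.LinearRegression().fit(X, grades); the fitted
# model's coef_ and intercept_ play the roles of coefs and yIntercept.
```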

Uploading our Linear model to Elasticsearch

The Elasticsearch learning to rank plugin uses a scripting format known as ranklib to encode models. Following the documentation for the ranklib scripting language, we can encode a linear model like this:

## Linear Regression
0:<y-intercept> 1:<weight for feature 1> 2:<weight for feature 2> 3:<weight for feature 3> ...

So in Python, we can write our model above in that format:

linearModel = """
## Linear Regression
0:0.97040804634516986 1:0.04999419 2:0.22958357 3:0.00573909
"""
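Typing weights into the model string by hand is easy to get wrong, so a small helper can render them straight from `coefs` and `yIntercept`. A sketch assuming the convention above (feature 0 holds the y-intercept, features 1..n hold one weight per signal in the order the signal queries are passed); `to_ranklib_linear` is a hypothetical name.

```python
def to_ranklib_linear(intercept, coefs):
    """Render a linear model in the ranklib text layout used above.

    Feature 0 carries the y-intercept; features 1..n carry one weight per
    signal, in the same order as the feature queries in the ltr query.
    """
    terms = ["0:%r" % intercept]
    terms += ["%d:%r" % (i, w) for i, w in enumerate(coefs, start=1)]
    return "## Linear Regression\n" + " ".join(terms)


coefs = [0.04999419, 0.22958357, 0.00573909]
yIntercept = 0.97040804634516986
linearModel = to_ranklib_linear(yIntercept, coefs)
```

`%r` uses Python’s shortest round-tripping float representation, so the intercept may print with fewer digits than the literal while still being the same float.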

Following the documentation for the Learning to Rank plugin, we can upload this model as a ranklib script and give it a name.

es.put_script(lang='ranklib', id='our_silly_model', body={'script': linearModel})
{'acknowledged': True}

Elasticsearch has acknowledged our upload. Great! Now we should be able to execute a simple query!

How do we construct a query that uses the model?

You almost always want to run a learning to rank model in a rescore query, but the TMDB data set isn’t huge, so we can run the model directly over the whole corpus in only a few hundred milliseconds. This is fun, because it lets us informally evaluate how well our model is doing.
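For a larger corpus, the rescore setup mentioned above might look roughly like this. It’s a sketch that assumes the plugin’s `ltr` query (the same one runLtrQuery uses) is accepted as a `rescore_query`; `ltr_rescore_query` is a hypothetical name, and `window_size=1000` is an arbitrary example value. `query_weight` and `rescore_query_weight` are standard rescore parameters, set here so the model’s score alone orders the rescored hits.

```python
def ltr_rescore_query(keywords, features, window_size=1000):
    """Hypothetical helper: retrieve candidates with a cheap match query,
    then rescore the top window_size hits (per shard) with the stored model.
    """
    return {
        "query": {
            # First-pass retrieval over the two text fields the model uses.
            "multi_match": {"query": keywords, "fields": ["title", "overview"]}
        },
        "rescore": {
            "window_size": window_size,
            "query": {
                "rescore_query": {
                    "ltr": {
                        "model": {"stored": "our_silly_model"},
                        "features": features,
                    }
                },
                # Drop the first-pass score so rescored hits are ordered
                # by the model's score alone.
                "query_weight": 0.0,
                "rescore_query_weight": 1.0,
            },
        },
        "size": 3,
    }


# Usage against a live cluster (not run here):
# es.search(index='tmdb', doc_type='movie',
#           body=ltr_rescore_query('rambo', [titleSearch['query'],
#                                            overviewSearch['query'],
#                                            ratingSearch['query']]))
```

The trade-off is that only documents the first-pass query retrieves ever reach the model, which is exactly why evaluating over the whole corpus is handy while experimenting.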

To query with the model, we create a function that runs the ltr query. Remember, the model we built computes a relevance score for a document from three inputs that relate to the query and document:

  • The keyword’s title TF*IDF score
  • The keyword’s overview TF*IDF score
  • The movie’s rating

To compute the first two, we need to run the titleSearch and overviewSearch queries above for our current keywords, so we need to pass our model versions of these queries with the current keywords. That’s what happens first in the function below: we inject our keywords into the query-dependent inputs. Then we add our ratingSearch, which depends only on the document.

These three queries are scored per document and then fed into the linear model. Remember from last time that this model is simple: each coefficient is just a weight on one signal’s score. The model is simply the y-intercept plus a weighted sum of the scores of titleSearch, overviewSearch, and ratingSearch, using the coefficients as the weights!
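The weighted sum can be checked by hand. A sketch assuming the model is evaluated exactly as described above (intercept plus weight times signal, in feature order), applied to the two logged rows from the training set; `linear_score` is a hypothetical name.

```python
coefs = [0.04999419, 0.22958357, 0.00573909]  # title, overview, rating weights
yIntercept = 0.97040804634516986


def linear_score(signals, weights=coefs, intercept=yIntercept):
    """Model score = intercept + sum of weight * signal, in feature order."""
    if len(signals) != len(weights):
        raise ValueError("expected %d signals, got %d" % (len(weights), len(signals)))
    return intercept + sum(w * s for w, s in zip(weights, signals))


# Signals from the logged rows: (titleScore, overviewScore, movieRating)
rambo = linear_score([12.28, 9.82, 6.40])        # judged 4
first_blood = linear_score([0.00, 10.76, 7.10])  # judged 0
```

This works out to roughly 3.88 for Rambo and 3.48 for First Blood, so the overview weight alone lifts a document graded 0 most of the way toward one graded 4.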

def runLtrQuery(keywords):
    # Plug in our keywords
    titleSearch['query']['match']['title'] = keywords
    overviewSearch['query']['match']['overview'] = keywords

    # Format query
    ltrQuery = {
        "query": {
            "ltr": {
                "model": {
                    "stored": "our_silly_model"
                },
                "features": [titleSearch['query'], overviewSearch['query'], ratingSearch['query']]
            }
        },
        "size": 3
    }

    # Search and print results!
    results = es.search(index='tmdb', doc_type='movie', body=ltrQuery)
    for result in results['hits']['hits']:
        if 'title' in result['_source']:
            print(result['_source']['title'])
        else:
            print("Movie %s (unknown title)" % result['_id'])
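runLtrQuery writes the keywords into the module-level titleSearch and overviewSearch dicts, so each call changes them for any code that runs afterwards. If you’d rather keep those templates untouched, one option is to build fresh copies per call. A sketch; `build_feature_queries` is a hypothetical name, and it takes the templates as arguments so it stands alone.

```python
import copy


def build_feature_queries(keywords, title_search, overview_search, rating_search):
    """Return the three signal queries for `keywords` without mutating the
    template dicts passed in."""
    title_q = copy.deepcopy(title_search["query"])
    title_q["match"]["title"] = keywords
    overview_q = copy.deepcopy(overview_search["query"])
    overview_q["match"]["overview"] = keywords
    # The rating query doesn't depend on the keywords, so it's copied as-is.
    return [title_q, overview_q, copy.deepcopy(rating_search["query"])]
```

Inside runLtrQuery, the `"features"` list would then be `build_feature_queries(keywords, titleSearch, overviewSearch, ratingSearch)`, and the two assignment lines at the top go away.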

Taking the model on a test spin…

With this function in place, let’s run some searches! First a simple title search:

runLtrQuery('Forrest Gump')
Forrest Gump
Dead Men Don't Wear Plaid
Maniac Cop

Hey, not too shabby! How long will our luck hold? Let’s try another one:

runLtrQuery('Robin Hood')
Robin Hood: Men in Tights
Welcome to Sherwood! The Story of 'The Adventures of Robin Hood'
Robin Hood

Wow, lucky again. It’s almost like these aren’t lucky guesses but rather a prescient author selecting examples they know will look good! OK, now let’s try something closer to home: a Stallone movie:

runLtrQuery('rambo')
Rambo III
Rambo
Rambo: First Blood Part II

Err, not bad, but a bit off the mark… This is actually a case, as we discussed before, of nuance that the linear model can fail to capture. The linear model just knows “more title score is good!” But here, First Blood should be closer to the top: it was the original Rambo movie! Moreover, Rambo III really shouldn’t come before plain Rambo.

Still, not bad for 40-odd examples of training data!

Let’s try something where we don’t know the title, like an example from Relevant Search: “basketball with cartoon aliens.” Here we’re grasping at straws, hoping “Space Jam,” a movie where Michael Jordan saves the world by playing aliens at basketball, comes up first:

runLtrQuery('basketball with cartoon aliens')
Aliens in the Attic
Above the Rim
Meet Dave

Sadly, not even close! To be fair, we didn’t show our training process any examples of this use case. Most of our examples were direct, known-title navigational searches. This goes to show how important it is, for learning to rank to work well, to gather a broad, representative sample of how your users search.

It also continues to demonstrate how the linear model struggles with nuance. Other models, like gradient boosting (à la LambdaMART), can grok nuance faster and aren’t constrained to our boring linear functions. We’ll see how these models work in future blog posts.

Next up – Solr!

One of my colleagues will be taking Solr learning to rank out for a test spin for you. A simple linear model is very easy to understand, so it’s fun for these little test-spins.

I’d love to hear from you. If you’d like our help evaluating a learning to rank solution for your business, please get in touch! And I’m always eager for feedback on these posts. Please let me know if I can learn a few things from you!

This blog post was created with Jupyter Notebook: View the source!