Splainer: The Elasticsearch Relevance Sandbox That Tells You Why

May 3, 2016 Doug Turnbull
Category: Elasticsearch

Have you ever looked at a scoring explain from Elasticsearch? Perhaps you’ve had a tricky relevance problem and needed to debug Elasticsearch’s scoring.

Unfortunately, relevance scoring in Elasticsearch is a beast. Here’s a simple match query with explain turned on:

{
  "query": {
    "match": {
      "title": {
        "query": "star trek"
      }
    }
  },
  "explain": true
}

And here’s the corresponding relevance explain for a single document:

{
  "value": 4.369631,
  "description": "sum of:",
  "details": [{
    "value": 1.9833124,
    "description": "weight(title:star in 385) [PerFieldSimilarity], result of:",
    "details": [{
      "value": 1.9833124,
      "description": "score(doc=385,freq=1.0), product of:",
      "details": [{
        "value": 0.6737103,
        "description": "queryWeight, product of:",
        "details": [{
          "value": 5.8877306,
          "description": "idf(docFreq=22, maxDocs=3051)",
          "details": []
        }, {
          "value": 0.11442614,
          "description": "queryNorm",
          "details": []
        }]
      }, {
        "value": 2.9438653,
        "description": "fieldWeight in 385, product of:",
        "details": [{
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [{
            "value": 1,
            "description": "termFreq=1.0",
            "details": []
          }]
        }, {
          "value": 5.8877306,
          "description": "idf(docFreq=22, maxDocs=3051)",
          "details": []
        }, {
          "value": 0.5,
          "description": "fieldNorm(doc=385)",
          "details": []
        }]
      }]
    }]
  }, {
    "value": 2.3863184,
    "description": "weight(title:trek in 385) [PerFieldSimilarity], result of:",
    "details": [{
      "value": 2.3863184,
      "description": "score(doc=385,freq=1.0), product of:",
      "details": [{
        "value": 0.73899555,
        "description": "queryWeight, product of:",
        "details": [{
          "value": 6.4582753,
          "description": "idf(docFreq=12, maxDocs=3051)",
          "details": []
        }, {
          "value": 0.11442614,
          "description": "queryNorm",
          "details": []
        }]
      }, {
        "value": 3.2291377,
        "description": "fieldWeight in 385, product of:",
        "details": [{
          "value": 1,
          "description": "tf(freq=1.0), with freq of:",
          "details": [{
            "value": 1,
            "description": "termFreq=1.0",
            "details": []
          }]
        }, {
          "value": 6.4582753,
          "description": "idf(docFreq=12, maxDocs=3051)",
          "details": []
        }, {
          "value": 0.5,
          "description": "fieldNorm(doc=385)",
          "details": []
        }]
      }]
    }]
  }]
}

Wow! Are your eyes bleeding yet? Now imagine debugging non-trivial queries, like function_score queries, nested boolean queries, or multi_match!

Now I’m being a little hard on Elasticsearch. Under the hood is an extremely sophisticated search engine, with every little function pluggable. Because every little bit can change, you need this kind of deep visibility into the search engine’s behavior.
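To make that visibility concrete, the numbers in the explain above can be reproduced by hand. The sketch below is not taken from Elasticsearch or Splainer; it assumes Lucene's classic TF-IDF similarity, which the explain does not name but which its numbers are consistent with. The helper names `classic_idf` and `term_score` are made up for illustration.

```python
import math

def classic_idf(doc_freq, max_docs):
    # Assumed classic Lucene formula: idf = 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))

def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    idf = classic_idf(doc_freq, max_docs)
    tf = math.sqrt(freq)                  # classic tf is the square root of termFreq
    query_weight = idf * query_norm       # "queryWeight, product of:"
    field_weight = tf * idf * field_norm  # "fieldWeight in 385, product of:"
    return query_weight * field_weight    # "score(doc=385,freq=1.0), product of:"

# Constants copied from the explain output above.
QUERY_NORM = 0.11442614
FIELD_NORM = 0.5
MAX_DOCS = 3051

star = term_score(1.0, 22, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # explain reports 1.9833124
trek = term_score(1.0, 12, MAX_DOCS, QUERY_NORM, FIELD_NORM)  # explain reports 2.3863184
total = star + trek                                           # explain reports 4.369631 ("sum of:")

print(f"star={star:.4f} trek={trek:.4f} total={total:.4f}")
```

The takeaway is that "trek" outscores "star" only because it is rarer (docFreq of 12 versus 22); everything else in the two branches is identical.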

But you certainly don’t need deep visibility for day-to-day work! Splainer helps you see the forest for the trees. It starts with a top-down view of the relevance explain, showing you the strength of each match and letting you drill down into the explain with a simpler, human-readable breakdown of what’s happening.
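To give a feel for what a top-down view means, here is a rough sketch that collapses an explain tree into one line per match. It is only an illustration of the idea, not how Splainer is actually implemented; the function name `top_level_matches` is hypothetical, and the sample input is a trimmed copy of the explain above with the nested details left out.

```python
import re

def top_level_matches(explain):
    """Collapse an explain tree into (label, score) pairs, strongest first."""
    # A "sum of:" node just adds up its children, so report the children;
    # any other node is treated as a single match.
    if explain["description"].startswith("sum of") and explain["details"]:
        nodes = explain["details"]
    else:
        nodes = [explain]

    matches = []
    for node in nodes:
        # Pull "title:star" out of "weight(title:star in 385) [...]".
        found = re.match(r"weight\((.+?) in \d+\)", node["description"])
        label = found.group(1) if found else node["description"]
        matches.append((label, node["value"]))
    return sorted(matches, key=lambda pair: pair[1], reverse=True)

# Trimmed copy of the explain above (nested details omitted for brevity).
explain = {
    "value": 4.369631,
    "description": "sum of:",
    "details": [
        {"value": 1.9833124,
         "description": "weight(title:star in 385) [PerFieldSimilarity], result of:",
         "details": []},
        {"value": 2.3863184,
         "description": "weight(title:trek in 385) [PerFieldSimilarity], result of:",
         "details": []},
    ],
}

for label, score in top_level_matches(explain):
    print(f"{score:.4f}  {label}")
```

Two short lines, one per matched term, carry most of what you need from the hundred-odd lines of raw JSON.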

As an example, below we’ve set up Splainer with the same query as above, only this time you get a very top-down view of exactly how relevance is working:

[Screenshot: Splainer’s top-down view of the query above]

Above, notice the match explanation on the left and the documents on the right. The matches give you a high-level sense of which factors determine each document’s relevance, along with the magnitude of each factor. Need to drill a little deeper into the ranking math? Click “detailed” to get a human-readable description of the explain:

[Screenshot: Splainer’s detailed, human-readable explain]

And when you need to really dig into what’s happening, you can use the Full Explain tab to get back Elasticsearch’s JSON explain.

You’ll notice Splainer gives you a JSON editor so you can tweak your Elasticsearch query and analyze the impact. In this way, it acts as a sandbox: a convenient place to dork around with different ideas. You shouldn’t feel terrified of Elasticsearch’s query DSL! Embrace it! Play with it! See what happens!

(Puts on Billy Mays Mask) and that’s not all! One of the best features of Splainer is its ability to share URLs. For example, you can see the example above by going here. This is actually the killer feature of Splainer. With URLs you can share your work with colleagues, saying “hey buddy, what do you think of these results?”

Using Splainer

Using Splainer is as easy as following along in this GIF. Simply go to http://splainer.io and enter your Elasticsearch search URL and Query DSL query. Then hit “splain this.” Click “Tweak” to bring out the Query DSL editor.

Splainer walkthrough

Enabling CORS

Currently, you need CORS enabled in Elasticsearch before Splainer can talk to it. Here’s the snippet we use in elasticsearch.yml:

http.cors.allow-origin: "/https?:\\/\\/(.*?\\.)?(quepid\\.com|splainer\\.io)/"
http.cors.enabled: true
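If you want a sanity check on which origins that pattern lets through, you can try it outside Elasticsearch. The sketch below strips the YAML escaping and the surrounding slashes and tests a few origins with Python's `re` module. It assumes the whole Origin value must match the pattern, and Python's regex engine is standing in for Java's here, so treat it as an approximation. The `origin_allowed` helper and the sample origins are made up for illustration.

```python
import re

# The allow-origin pattern with YAML escaping and the /.../ delimiters removed.
ALLOWED_ORIGIN = re.compile(r"https?://(.*?\.)?(quepid\.com|splainer\.io)")

def origin_allowed(origin):
    # Assumption: the entire Origin header has to match, not just a prefix.
    return ALLOWED_ORIGIN.fullmatch(origin) is not None

for origin in ("http://splainer.io",
               "https://app.quepid.com",
               "http://example.com",
               "http://splainer.io.example.com"):
    print(origin, origin_allowed(origin))
```

The optional `(.*?\.)?` group is what admits subdomains such as app.quepid.com while still rejecting lookalike hosts that merely start with an allowed name.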

In the future, we may release this as an Elasticsearch site plugin or Chrome extension to avoid the CORS annoyance. Stay tuned!

Relevance: you can do this!

Splainer has been a game changer for our relevance practice. As I write in Relevant Search, relevance shouldn’t be mystical. It should be a transparent and reliable engineering practice, accessible to everyone. How else can the entire team make investments in relevance? Splainer helps achieve this by boiling the Elasticsearch explain down to something engineers can easily understand, rather than leaving it the purview of search engine mystics.

All of these features are available in our product Quepid – our test-driven relevance toolbench. Quepid stores a set of keyword searches and analyzes your search solution’s correctness against validation criteria. Did fixing the “sandals” query break the “dress shoes” query? Quepid tells you, preventing you from shipping search algorithm changes that damage your bottom line.

But back to Splainer. Splainer for Elasticsearch is in beta, so while I’ve been using it for my own work, I’m sure you’ll find a few bugs. Splainer is open source, so your bug reports and contributions are welcome. If you’d like to chat about Splainer, Quepid, or our Solr/Elasticsearch relevance consulting, please get in touch!