I’ve already written about the first day of Haystack 2019; here’s a quick summary of the second day. We returned to the cinema, noted that Avengers Endgame was still playing in the next theatre (I believe everyone resisted temptation!), fought off various technical issues with the help of our committed AV team and proceeded with some excellent talks.
The first presentation I attended was by Jeremiah Via of the New York Times. Jeremiah described how Elasticsearch is used to index 18 million items at the Times and how they developed both online and offline metrics to improve relevance. The Times’ index contains over 22 million unique tokens and nearly 2 million tags. He stressed the importance of being able to easily iterate through configuration changes – as he said, “improving search is about making lots of little improvements”.
Next up was Tom Burgmans, describing how his team established a relevance-focused culture at Wolters Kluwer. I particularly enjoyed seeing a screenshot of their advanced relevance testing tool, which showed relevance judgements and also broke down the various contributions to relevance scores – I hope, as he did, that this tool eventually becomes open source. Wolters Kluwer have also developed a set of loosely coupled, reusable search components which help to share knowledge and experience across the organisation. His last point was ‘don’t stop’ – relevance improvement is never finished!
My colleague Bertrand Rigaldies of OSC then talked about Solr query parsers (he noted that there are no fewer than 29 different query parsers supplied with Solr, including a good few I’d never heard of). He showed how to build a simple proximity query parser (to handle queries like “‘fish’ within 3 words of ‘chips’”) and stressed that although custom parsers can be very powerful, they are complex to write, so one should try to use an out-of-the-box parser where possible.
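To give a flavour of what a proximity query involves, here is a minimal sketch in Python. It is not Bertrand’s code and has nothing to do with Solr’s parser plugin API; the query syntax, the function names and the simple “positions differ by at most N” rule are all my own assumptions, based only on the example query above.

```python
import re

# Hypothetical query syntax modelled on the example query in this post.
# Accepts straight or curly single quotes around each term.
PROXIMITY_QUERY = re.compile(
    r"^\s*['‘](?P<left>[^'’]+)['’]\s+within\s+(?P<distance>\d+)\s+words?\s+of\s+"
    r"['‘](?P<right>[^'’]+)['’]\s*$",
    re.IGNORECASE,
)


def parse_proximity_query(query):
    """Split a proximity query into (left term, right term, max distance)."""
    match = PROXIMITY_QUERY.match(query)
    if match is None:
        raise ValueError(f"Not a proximity query: {query!r}")
    return (
        match.group("left").lower(),
        match.group("right").lower(),
        int(match.group("distance")),
    )


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def matches_proximity(text, left, right, distance):
    """Return True if the two terms occur at most `distance` positions apart."""
    tokens = tokenize(text)
    left_positions = [i for i, token in enumerate(tokens) if token == left]
    right_positions = [i for i, token in enumerate(tokens) if token == right]
    return any(
        abs(l - r) <= distance for l in left_positions for r in right_positions
    )


left, right, distance = parse_proximity_query("'fish' within 3 words of 'chips'")
print(matches_proximity("Fish and chips on Friday", left, right, distance))
```

A real parser would hand the extracted terms and distance to the search engine’s own positional query support rather than scanning text itself, which is part of why an out-of-the-box parser is usually the better starting point.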
Lunch followed, attendees again taking advantage of the various outlets in Charlottesville’s Downtown Mall.
John Berryman, one half of the team behind the Relevant Search book and now at Eventbrite, gave an engaging talk on automatic tagging using search logs and machine learning. His system creates a training set from user interactions (the events that users clicked after a particular query) then attempts to predict what tags to apply to other events – the tags being the search queries themselves.
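The general shape of that idea can be sketched in a few lines of Python. This is a toy illustration only, not John’s system: the click log format, the event descriptions, the function names and the crude token-overlap scoring (standing in for a real machine learning model) are all assumptions of mine.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_training_set(click_log, event_text):
    """Pair each clicked event's text with the query that led to the click."""
    return [
        (event_text[event_id], query)
        for query, event_id in click_log
        if event_id in event_text
    ]


def train(training_set):
    """Count the tokens seen in events clicked for each query (the 'tag')."""
    token_counts = defaultdict(Counter)
    for text, tag in training_set:
        token_counts[tag].update(tokenize(text))
    return token_counts


def predict_tags(model, text, top_n=2):
    """Rank tags by the share of their training tokens that appear in the text."""
    tokens = set(tokenize(text))
    scores = {}
    for tag, counts in model.items():
        total = sum(counts.values())
        score = sum(counts[token] for token in tokens) / total if total else 0.0
        if score > 0:
            scores[tag] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical data: (query, clicked event id) pairs and event descriptions.
click_log = [("jazz", "e1"), ("jazz", "e2"), ("yoga", "e3")]
event_text = {
    "e1": "Live jazz quartet at the blue room",
    "e2": "Sunday jazz brunch with live band",
    "e3": "Morning yoga class in the park",
}

model = train(build_training_set(click_log, event_text))
print(predict_tags(model, "Evening live band performance"))
```

The key point is that nobody labels anything by hand: the queries users typed become the labels, and the events they clicked become the examples.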
The next session was a panel discussion on ‘Does Learning to Rank Actually Work’ (my alternative title ‘Learning to Rank – or learning to tank?’ was sadly discarded 🙂), with René Kriegler, Doug Turnbull, Xun Wang (Snag) and Erik Bernhardson (Wikimedia). The audience provided some great questions for the panel.
I sadly missed most of Simon Hughes of DHI’s talk on Search with Vectors, but what I did see was very interesting, including how he had built a special query parser for Lucene that stored vectors as payloads. Luckily there’s lots of detail in this GitHub repository.
The conference ended with thanks to all the speakers, organisers and most importantly the attendees – without whom Haystack would of course not be possible! Thanks to everyone who came and made it such a great event. Haystack will return!
If you’d like a richer description of the second day, including some of the talks I missed, please do read Jettro Coenradie’s blog. Alessandro Benedetti of Sease has also written about his experience of the event. You can also join many of the conference attendees in Relevance Slack – there’s a #haystack-conference channel.
You’ll be glad to know we will be releasing the slides for all the main talks and the Lightning Talks very soon, and unlike last year we managed to video all the sessions – so anything you (or I) missed (or simply didn’t understand well enough at the time) will be available to peruse at your leisure. UPDATE – The slides & video are now available here – click the ‘More’ link on each talk to see them.
If you missed out on Haystack there will be a European event on October 28th in Berlin (details to be confirmed) – and if you have questions about this or indeed anything search or relevance related, please do contact us.