Greetings! I was very fortunate to attend the OpenSource Connections (OSC) Haystack Europe 2019 conference in Berlin, Germany, on October 28th, and below are my notes from the conference. Thank you again to OSC and my colleagues for sending me over the Atlantic to attend a great conference on search relevancy.
First, let me recognize that our event partners Plain Schwarz had secured us a bright and comfortable room in ExRotaprint, a converted print works now owned and run by a collective. Lunch was served by the ExRotaprint cafe and was healthy and very tasty.
I was honored to give the conference’s keynote address. In the keynote, I wished Lucene a happy 20th birthday, and showed my deep appreciation of the open source search technologies that we use and enjoy every day. I also looked back at six decades of Information Retrieval (IR) research & development to reflect on where we, search practitioners, came from; what the major research milestones have been along the way; and where we are today with state-of-the-art IR. I finished the keynote by bringing attention to the renewed interest in thinking of search as a vector distance (similarity) calculation in a dense vector space, which provided a segue into the day’s extremely high-quality talks on advanced search topics such as automatic filters, AI-powered search, knowledge graphs, Markov chain-based query rewriting, relevant facets, reinforcement learning, and multi-objective Learning-to-Rank (LTR).
All the talks were of high quality, and delivered by experienced and passionate search practitioners. The talks covered topics that are important to search professionals who try to improve search relevancy on a daily basis, from user intent detection, to the leveraging of knowledge graphs, to query reformulation, to relevant facets, and advanced learning-to-rank techniques.
My great thanks to all the speakers for sharing their experiences, insights, and visions. Below, I provide a quick summary of what impressed me in each talk.
See the conference’s program here. Click on any talk on the program page for the talk’s abstract as well as links to the talk’s slides and recorded presentation.
Improving precision of e-commerce search results to generate value for customers and business
Jens Kürsten and Arne Vogt from Otto addressed the classic issues and challenges with high-recall queries containing a lot of “noise”, and with balancing the end-users’ and the business relevancy requirements. Jens and Arne presented their hypotheses that two issues turn end-users away: a) poor relevancy (“Effectiveness” issue); and b) the amount of work necessary to reach relevancy (“Efficiency” issue); and showed that boosting on product category helped alleviate both issues. Jens and Arne shared their KPIs, and both offline- and online-testing methodologies to reach that conclusion.
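To make the idea of category boosting concrete, here is my own minimal sketch, not Otto's implementation. It assumes a category has already been predicted for the query; the function name `boost_by_category`, the multiplicative `boost` factor, and the sample products are all hypothetical.

```python
def boost_by_category(results, predicted_category, boost=2.0):
    """Multiply the score of results in the predicted category, then re-sort.

    `results` is a list of dicts with "title", "category" and "score" keys.
    The input list is left untouched; rescored copies are returned.
    """
    rescored = [
        {
            **r,
            "score": r["score"] * boost
            if r["category"] == predicted_category
            else r["score"],
        }
        for r in results
    ]
    return sorted(rescored, key=lambda r: r["score"], reverse=True)


# Hypothetical result list for the query "laptop": accessories match the
# query text well and crowd out the products the user most likely wants.
results = [
    {"title": "Laptop sleeve", "category": "accessories", "score": 3.1},
    {"title": "Laptop 15 inch", "category": "laptops", "score": 2.8},
    {"title": "Laptop stand", "category": "accessories", "score": 2.5},
]

ranked = boost_by_category(results, predicted_category="laptops")
for r in ranked:
    print(f'{r["score"]:.1f}  {r["title"]}')
```

In a real search engine the boost would normally be applied at query time inside the engine rather than by re-sorting results afterwards; the sketch only shows the effect on the ordering.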
Balancing the Dimensions of User Intent
Trey Grainger from LucidWorks treated us to his invigorating vision of AI-powered search, where the confluence of Search, AI, Machine Learning, Deep Learning, and Data Science in general provides fertile ground for solutions to many search problems. In his talk, Trey focused on understanding the user’s intent, the holy grail of search practitioners! Trey broke this down into: a) Content Understanding; b) Domain Understanding; and c) User Understanding, on a scale of increasing sophistication in the techniques used to infer understanding. Content Understanding is basically the “dumb” matching of documents against the query terms. Domain Understanding can be inferred using a spectrum of techniques from synonyms, to taxonomies and hierarchies, to ontologies, and finally to knowledge graphs. User Understanding can be inferred using various Collaborative Filtering techniques that leverage user-item, item-item, and query-item relationships gathered from usage. I bought Trey’s AI-Powered Search “MEAP” book, and I look forward to enjoying Trey’s visions of our future.
Query Intelligence: Understanding User Intent
Erica Lesyshyn from EBSCO Health delivered a convincing presentation on the use of knowledge graphs for content enrichment as well as query segmentation and re-writing. Erica shared the pros and cons of her approaches, and in particular noted that although knowledge graph-enhanced queries can be effective at correctly inferring the end-user’s intent, they can also hurt precision with the increased recall. Ha, the joy of the Precision/Recall yin yang!
Search to Search recommendations (Collaborative Synonym and Spell corrections)
Sadat Anwar and Matthieu Pons from Rebuy showed an innovative technique for query expansion with synonyms and spelling corrections using Markov chains. After trying, with limited success, text-to-numbers embedding techniques such as word2vec and prod2vec (both with the Python library gensim), Sadat and Matthieu had better results with Markov chain-based models for generating synonyms. The technique is interesting in that it shows us that despite the buzz and the hype around neural network-based applications in search, sometimes a good old probabilistic model such as a Markov chain can provide an effective mechanism for traditional search problems such as synonym generation.
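For readers who have not used Markov chains on query logs, here is a minimal sketch of the general idea. It is my own illustration under simple assumptions, not Rebuy's model: I assume that queries typed one after another in the same session are rewrites of each other, and I estimate first-order transition probabilities between them. The function names and the sample sessions are hypothetical.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Estimate P(next query | current query) from consecutive session queries."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current != following:  # ignore repeated identical queries
                counts[current][following] += 1
    return {
        query: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for query, nexts in counts.items()
    }


def suggest(transitions, query, k=3):
    """Return up to k most probable rewrites of `query`, best first."""
    candidates = transitions.get(query, {})
    return sorted(candidates, key=candidates.get, reverse=True)[:k]


# Hypothetical search sessions: each list is one user's consecutive queries.
sessions = [
    ["iphnoe", "iphone"],
    ["iphnoe", "iphone", "iphone 8"],
    ["handy", "smartphone"],
    ["iphnoe", "iphone case"],
]

transitions = build_transitions(sessions)
print(suggest(transitions, "iphnoe"))
print(suggest(transitions, "handy"))
```

The most probable transitions out of a query then serve as candidate spelling corrections or synonyms. A production system would also need a minimum-count threshold so that rare, accidental reformulations are not treated as synonyms.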
Lucian Precup and Radu Pop from Adelean presented an often overlooked but critical aspect of the search results: How relevant are the facets? And how can we make them more relevant? Lucian and Radu explained with deep insight how relevant facets can provide a “holistic” view of the search results, help disambiguate query terms, and, overall, provide effective support for finding the “needle in the haystack”.
Ranking article comments using reinforcement learning
Lester Solbakken from Verizon Media gave us, Lucene aficionados, another chance to look at the still mysterious, recently open-sourced, and exciting search engine Vespa. Using Vespa’s native support for Machine Learning-based ranking models (aka, “tensors” in Vespa lingo), Lester shared with us how to relevancy-rank the numerous comments associated with news articles on Yahoo using reinforcement learning techniques.
How to Kill Two Birds with One Stone: Learning to Rank with Multiple Objectives
Alexey Kurennoy from Zalando shared his research on the training and execution of a learning-to-rank model based on multi-objective optimization. As Jens and Arne also told us earlier in the morning, it is often the case in e-commerce applications that relevancy is based on competing requirements between the end-users’ information needs and the business’ numerous and complex objectives (financial, inventory management, marketing, etc.). Alexey addressed this multi-faceted relevancy problem as a multi-objective optimization problem, and accordingly, trained a Learning-to-Rank model using an “objective function” that accounts for these objectives as the model’s loss function.
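One common way to turn several ranking objectives into a single loss is a weighted sum, often called scalarization. The sketch below is my own illustration of that general idea and should not be read as Alexey's formulation: the pairwise logistic loss, the `alpha` weight, and the use of "margin" as the business objective are all assumptions made for the example.

```python
import math


def pairwise_logistic_loss(scores, labels):
    """Mean logistic loss over all pairs (i, j) where labels[i] > labels[j].

    The loss is small when the model scores item i above item j for
    every pair in which item i has the higher label.
    """
    losses = [
        math.log1p(math.exp(-(scores[i] - scores[j])))
        for i in range(len(scores))
        for j in range(len(scores))
        if labels[i] > labels[j]
    ]
    return sum(losses) / len(losses) if losses else 0.0


def combined_loss(scores, relevance, margin, alpha=0.7):
    """Weighted sum of a relevance loss and a business (margin) loss."""
    return (
        alpha * pairwise_logistic_loss(scores, relevance)
        + (1 - alpha) * pairwise_logistic_loss(scores, margin)
    )


# Hypothetical example: three products whose relevance and margin disagree.
scores = [2.0, 1.0, 0.0]   # model scores, best first
relevance = [2, 1, 0]      # user relevance labels (higher is better)
margin = [0, 1, 2]         # business margin labels (higher is better)

for alpha in (1.0, 0.7, 0.0):
    print(alpha, round(combined_loss(scores, relevance, margin, alpha), 4))
```

With `alpha=1.0` the model is trained for user relevance only, with `alpha=0.0` for margin only, and values in between trade one objective off against the other.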
After the conference a large number of the attendees visited Café Pförtner, a nearby bar and restaurant for further networking over dinner and drinks. The café is set in a former bus garage and you can even sit down to eat in a converted bus!
As I mentioned in the keynote address, I marvel at the levels of creativity and ingenuity that search practitioners exhibit in their relentless quest to bridge the gap between end-users’ information needs and the search results. I am so excited to have joined the search field a few years ago, and I look forward to continuing to help push the envelope of the possible.
The Haystack conference will return - in the USA and Europe - in 2020! Contact us if you’d like to know how OSC can help you build relevant search.