Wallapop - OpenSource Connections multi-objective optimisation

Wallapop’s journey to improved search relevance for an intermediated marketplace

Wallapop is the leading platform in conscious and human consumption. Founded in Barcelona in 2013 and present in Spain, Italy and Portugal, Wallapop aspires to create a unique inventory ecosystem of reused products that facilitates a more humane and sustainable consumption model. The platform connects the 15 million users who visit it each month, who collectively create around 100 million listings annually. On Wallapop, users can buy and sell products from all categories of consumer goods easily, quickly and safely. It is also a benchmark in categories such as motors, where it is the leader among private users in Spain.

Key features of the platform, such as a chat option and a focus on finding items nearby, help Wallapop’s vibrant user community buy and sell goods easily. Using the platform is free of charge for Wallapop’s users. The company generates revenue from optional services such as paid ad placements, shipping services and options for professional sellers (subscriptions, bumps etc.).

Building a search engine (in this case using Apache Solr) for a unique goods marketplace has some very specific challenges, sometimes very different from more conventional e-commerce search. The catalogue is very volatile (new listings are uploaded and others deleted every few minutes) and each item is unique. Also, as we will discuss more deeply later on, search is very dependent on the location of the buyer and the seller (searching 5km away can lead to completely different results). Finally, as the content is user-generated, Wallapop’s team have to cope with high variability in quality and consistency of the catalogue.

Since its launch in 2013, Wallapop’s business has evolved from a very local marketplace to a nationwide unique goods platform. This evolution has strongly affected search relevance as well: search has moved from retrieving only very local items to a more complex task that involves many relevance factors and has to deal with what is usually known as multi-objective optimisation, the optimisation of search quality for two or more different objectives. Wallapop tries very hard to bring back the most relevant results to the searcher at the top of the search result list, but also includes listings with promoted visibility.

“Solving this hard problem usually requires a lot of clarity about the objectives of search optimization. The team also needs advanced capabilities in terms of understanding search relevance, measurability and experimentation.”
René Kriegler, OSC’s Director of E-commerce Search

Step 1: Setting the direction for multi-objective optimisation with OSC’s Proven Process

Over a period of nine months, Wallapop and the dedicated e-commerce team at OpenSource Connections (OSC) worked together to provide the Wallapop team with the capabilities and the maturity to tackle difficult search-related challenges. The collaboration started in 2021 when some Wallapop team members attended OSC’s Think Like a Relevance Engineer training. The Wallapop search team had already established a vision of what they wanted to achieve:

“Improve search results for users by satisfying their information needs in the context of a particular user experience, while balancing how ranking impacts our business’s needs.”
Wallapop internal document

This vision clearly mentions two objectives for search optimisation. One relates to user needs, the other to business needs – but how would the team achieve both of them?

To prepare for such a complex goal, the Wallapop search team followed OSC’s systematic Proven Process, building up the means and the methods to cope with difficult search-related challenges on their own in the future.

Using OSC's Proven Process for multi-objective optimisation

The process starts off with a Discovery, which assesses how mature a team is with respect to managing and improving search. OSC leads a week of intensive conversations with the search team to assess several maturity areas – from how the search team is being held accountable to the business, to capabilities in understanding user needs and in experimentation, to search technology and aspects of the user interface.

Besides providing an assessment, an OSC Discovery results in recommendations of measures that should be taken to improve search, including any potential adjustments to structure and processes within the organisation. In the case of Wallapop, the Discovery revealed that it would be premature to try to tackle multi-objective optimisation straight away. Instead, the focus was initially set on improving the team’s capabilities as a prerequisite:

to define and measure search-related business KPIs and user satisfaction metrics,
to run experiments more easily and quickly, both online and offline,
to change the search product management process to use longer term goals and to pursue these with more focus.

Step 2: Better experimentation and optimisation of the base algorithm

The cadence of search improvements that a team can find and implement heavily depends on their experimentation capabilities.

“Search is not a solved problem, thus the more hypotheses a team can test in experiments and the more their company culture allows them to fail and iterate on experiments, the more likely it becomes that they will achieve an improvement of their search quality.”
René Kriegler, Director of E-commerce Search, OSC

Wallapop’s company culture has always welcomed experimentation, but there were opportunities to give it a more relevant role within search development.

Wallapop and OSC thus decided to work together on improving the experimentation capabilities of Wallapop’s search team. As a case for optimisation, they chose to verify and improve the base textual matching algorithm: finding the best Solr search query structure, using optimal field weights and the best text analysis. This focus on the base algorithm was expected to improve search relevance to the benefit of both the user and the business. Equipped with the new experimentation mindset and starting from a verifiably optimal base retrieval algorithm, the team were now much better prepared to approach multi-objective optimisation.
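To make the idea of field weights concrete, here is a minimal sketch of how weighted query fields could be passed to Solr's edismax query parser through its `qf` parameter. The field names (`title`, `description`, `category`), the weights and the helper `build_solr_params` are assumptions for illustration only; the source does not describe Wallapop's actual schema or query structure.

```python
from urllib.parse import urlencode


def build_solr_params(user_query, field_weights, rows=10):
    """Build edismax request parameters with one boost per query field.

    field_weights maps a (hypothetical) field name to its weight.
    """
    # edismax expects qf as space-separated "field^boost" entries
    qf = " ".join(f"{field}^{weight}" for field, weight in field_weights.items())
    return {"defType": "edismax", "q": user_query, "qf": qf, "rows": rows}


# Hypothetical fields and weights, not Wallapop's real configuration
params = build_solr_params(
    "bicicleta montaña",
    {"title": 5.0, "description": 1.0, "category": 2.0},
)
query_string = urlencode(params)  # ready to append to a /select request URL
```

Changing a weight only changes the `qf` string, which is what makes field weights a convenient target for systematic experimentation.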

Introducing offline experimentation

Every attempt to improve search quality needs to be validated in an online experiment, usually by an A/B or multivariate test that tells whether the new variant improves metrics that were chosen to indicate search quality for the given search application. On the other hand, it is normally not feasible to try out each potential improvement by using online testing. For example, the Wallapop team wanted to find the optimal field weights for their search queries. If they wanted to try out just ten different field weights for three query fields, they would already need one thousand experiments to identify the best set of field weights.
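The arithmetic behind the one thousand experiments can be sketched in a few lines: ten candidate weights for each of three fields give 10 × 10 × 10 combinations. The field names and the weight values below are placeholders, not Wallapop's real ones.

```python
from itertools import product

# Hypothetical query fields and candidate weights, for illustration only
fields = ["title", "description", "category"]
candidate_weights = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Every combination of one weight per field is one configuration to evaluate
grid = [
    dict(zip(fields, combo))
    for combo in product(candidate_weights, repeat=len(fields))
]

print(len(grid))  # 10 ** 3 = 1000 configurations
```

Adding a fourth field would multiply the count by ten again, which is why exhaustive online testing quickly becomes impractical.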

A common practice to overcome this problem is to use offline experimentation. The idea is to evaluate many candidate improvements quickly in a non-production environment and only pick the top candidates for verification in a longer running online experiment.

Offline evaluation for search usually depends on a set of data that is labelled for search result quality, metrics that can be calculated based on the labelled data and that serve as a proxy for the online KPIs, and finally on a technical environment and on suitable processes to organise and run the experiments.

*from “Search Quality – A Business-Friendly Perspective”, presented at Haystack US 2018 by Peter Fries*
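As an illustration of a metric calculated from labelled data, the sketch below computes nDCG@k for one query and averages it over a set of queries. The source does not say which offline metrics Wallapop used, so nDCG is an assumption here, as are the function names and the 0 to 3 grading scale.

```python
import math


def dcg_at_k(grades, k):
    """Discounted cumulative gain over the first k relevance grades."""
    return sum(
        (2 ** grade - 1) / math.log2(rank + 2)
        for rank, grade in enumerate(grades[:k])
    )


def ndcg_at_k(ranked_doc_ids, judgements, k=10):
    """nDCG@k for one query; unlabelled documents count as grade 0."""
    grades = [judgements.get(doc_id, 0) for doc_id in ranked_doc_ids]
    ideal_grades = sorted(judgements.values(), reverse=True)
    ideal_dcg = dcg_at_k(ideal_grades, k)
    return dcg_at_k(grades, k) / ideal_dcg if ideal_dcg > 0 else 0.0


def mean_ndcg(results_by_query, judgements_by_query, k=10):
    """Average nDCG@k over all labelled queries."""
    scores = [
        ndcg_at_k(results_by_query[query], judgements_by_query[query], k)
        for query in judgements_by_query
    ]
    return sum(scores) / len(scores)


# Made-up labels for a single query: document id -> grade (0 = bad, 3 = perfect)
judgements = {"a": 3, "b": 1, "c": 0}
perfect = ndcg_at_k(["a", "b", "c"], judgements)
reversed_order = ndcg_at_k(["c", "b", "a"], judgements)
```

Each candidate configuration can then be scored with `mean_ndcg` over the labelled query set, and only the best-scoring candidates move on to an online test.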

Offline experimentation at Wallapop

At Wallapop, the following measures were critical for the successful introduction of offline experimentation:

Compilation of a search catalogue: a list of around 100 queries that were assumed to be representative of Wallapop’s search traffic. For each query, the query intent was formulated explicitly in cooperation with a UX researcher and with the help of data analysts. The objective was to have a clear understanding of what results should be accepted for each of these queries. Queries for which the query intent could not be established clearly were discarded.
Setup of a dedicated experimentation environment. This environment consisted of a stable search index and dataset, and of Quepid, a search relevance workbench for quick search result labelling, calculation of offline metrics and for easy search query manipulation and debugging. This environment helped to quickly try out hypotheses and solutions.
Working with Supa, an OSC partner agency that manually labels search results at scale.
Introduction of an experiment backlog and a process to generate hypotheses and to track experiment results.

Results

Better search results through improved base algorithm

The team ran almost 20 experiments in 4 months, some of them trying out hundreds of thousands of configurations. The following two changes to the base retrieval algorithm were considered the best candidates to improve search result quality:

Introduction of stemming. A custom lightweight stemmer for the Spanish language yielded the best results in offline experimentation. It only evens out differences between singular and plural forms in the queries and the index and does not deal with further word forms.
Optimal field weights. The team tried the best candidate based on offline metrics that were calculated over the labelled search results.
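The idea of a stemmer that only evens out singular and plural can be illustrated with a deliberately naive sketch. This is not the stemmer the team built or contributed to Lucene: the three rules and the function name `stem_plural` are assumptions, and the sketch ignores exceptions (for example "padres") and accent changes (for example "camión" and "camiones").

```python
VOWELS = "aeiouáéíóú"


def stem_plural(token):
    """Map a Spanish plural to its singular using three naive suffix rules."""
    word = token.lower()
    if len(word) < 5:
        return word  # leave short words such as "tres" untouched
    if word.endswith("ces"):
        return word[:-3] + "z"  # luces -> luz
    if word.endswith("es") and word[-3] in "dlnr":
        return word[:-2]  # ordenadores -> ordenador
    if word.endswith("s") and word[-2] in VOWELS:
        return word[:-1]  # coches -> coche
    return word


# The same function is applied to query terms and indexed terms,
# so that singular and plural forms end up as the same token.
examples = {w: stem_plural(w) for w in ["coches", "coche", "ordenadores", "luces"]}
```

Because nothing beyond the plural ending is removed, other word forms stay distinct, which matches the "lightweight" behaviour described above.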

The two candidate improvements were validated in an A/B test. They resulted in an increase in search conversions of about 5% and significant business impact.

“Offline experimentation had a huge impact on our experimentation capacity. We managed to have more business impact, but also to better understand how our search actually works for our end users.”
Joan Tapia, Search Engineering Manager, Wallapop

Empowerment & Team Growth

Wallapop’s team had become convinced of the benefits of using offline experimentation as an approach to improving search quality, not just because of the success of the online test but also because of the very structured approach to generating hypotheses and to validating and following up on them. When they later implemented a new search application for Italy, they decided to set up an offline experimentation environment early in the process. They also decided to set it up and run experiments without OSC’s help as a proof that they had fully adopted the method.

Another cultural change became visible related to connecting with the open source search community. With guidance from OSC, Wallapop’s search team contributed their stemmer implementation to the Lucene search library (which powers the Solr search engine) where it has now become the default lightweight Spanish stemmer. The team also presented it in a lightning talk at the MICES 2021 conference.

*Xavier Sanchez of Wallapop presents at MICES 2021*

Being part of the wider search community brings multiple benefits, including knowledge sharing and meeting others who face similar challenges. The Wallapop team have made time to attend subsequent events, including MICES 2022 and Berlin Buzzwords 2022.

“It’s wonderful to see our friends at Wallapop present their achievements and contributions with pride to the rest of the search community – it also reinforces our belief in the open source model as the best way to build powerful and accurate search applications”
Charlie Hull, Managing Consultant, OSC

“We are really grateful that OSC encouraged us to open source our Spanish stemmer. It has been really gratifying to contribute back to the Lucene community as well as presenting this work to MICES 2021.”
Xavi Sanchez, Search Engineer, Wallapop

Step 3: Transaction distance and multi-objective optimisation

The Wallapop team had started some UX research to address the question of how the geographical distance between the seller of an item and a potential buyer influences the buyer’s purchase decision. The research showed that whether distance matters is a property of the type of item to be purchased. For example, before the introduction of a bulky items shipping service, some items could not easily be shipped. However, it is also a question of the user’s intent: some items might be shippable but users still prefer to buy them locally. For example, some users would prefer to physically inspect the condition of a collector’s item before making a buying decision.

While Wallapop was historically a very local marketplace where buyers and sellers had to meet physically to complete the transactions, Wallapop’s business model is shifting towards intermediated transactions, especially via shipping. This has several advantages in terms of scale and lower friction for the users, making it convenient to prioritize this trading model.

Distance sensitivity, one factor in multi-objective optimisation

Following Wallapop’s more focussed approach to search product development, the qualitative UX research about transaction distance was carried out in the background while the majority of the team implemented better experimentation capabilities and worked on improving the base retrieval algorithm in the first step.

In the next step, their data scientists worked together with OSC on a model to identify ‘distance-sensitive queries’ – queries for which users prefer results that come from sellers that are located nearby. In this classification model, distance-sensitive queries were derived from tracked user behaviour, combining NLP techniques to identify query structures with a Bayesian hierarchical approach to deal with low-traffic queries.
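The source gives no details of the classification model, so the following is only a much simpler stand-in for the general idea of handling low-traffic queries in a Bayesian way: a beta-binomial estimate in which a query with little data is pulled towards a prior, while a query with a lot of data is dominated by its own behaviour. The function name, the notion of a "local" interaction, the prior and all numbers are assumptions.

```python
def posterior_local_share(local_events, total_events, prior_share, prior_strength):
    """Posterior mean share of 'local' interactions under a Beta prior.

    prior_share:    expected share of local interactions for similar queries
    prior_strength: how many pseudo-observations the prior is worth
    """
    alpha = prior_share * prior_strength
    beta = (1.0 - prior_share) * prior_strength
    return (alpha + local_events) / (alpha + beta + total_events)


# Low-traffic query: 2 of 2 interactions were local, but the estimate
# stays close to the prior of 0.3 because there is so little evidence.
low_traffic = posterior_local_share(2, 2, prior_share=0.3, prior_strength=20)

# High-traffic query: 900 of 1000 interactions were local, so the
# observed behaviour dominates the prior.
high_traffic = posterior_local_share(900, 1000, prior_share=0.3, prior_strength=20)
```

In a hierarchical setting the prior itself would be estimated from groups of related queries rather than fixed by hand as it is here.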

First experiment: encouraging results

The long-term objective was to create a generalised model that would make a prediction at query time whether a query is sensitive to distance or not and use this information for a better ranking of search results. On the other hand, and following the new ‘experimentation first’ mindset, the hypothesis that some queries are distance-sensitive, while others are not, could be validated online more quickly: The team pre-calculated a fixed list of 10,000 queries that were thought to be distance-sensitive based on past user behaviour and the classification model instead of implementing a model that would cover all queries at runtime. Depending on whether a query is on the list or not, nearby search results would be promoted or not.
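A fixed list of this kind can be applied with a simple lookup at query time. The sketch below is an assumption about how such a switch might look, not Wallapop's implementation: it adds a Solr distance boost (`bf` with `recip(geodist(),...)`, plus the spatial parameters `sfield` and `pt`) only when the normalised query is on the list. The field name `location`, the boost constants and the helper names are hypothetical.

```python
def normalise(query):
    """Lower-case the query and collapse whitespace before the lookup."""
    return " ".join(query.lower().split())


def ranking_params(query, distance_sensitive_queries, user_lat, user_lon):
    """Return Solr parameters, promoting nearby items for listed queries only."""
    params = {"defType": "edismax", "q": query}
    if normalise(query) in distance_sensitive_queries:
        params.update(
            {
                "sfield": "location",  # hypothetical geo field name
                "pt": f"{user_lat},{user_lon}",  # the searcher's position
                "bf": "recip(geodist(),2,200,20)",  # higher score when closer
            }
        )
    return params


# Made-up list entries; the real list held 10,000 pre-calculated queries
distance_sensitive = {"sofá", "armario"}
boosted = ranking_params("  Sofá ", distance_sensitive, 41.39, 2.17)
unboosted = ranking_params("iphone 12", distance_sensitive, 41.39, 2.17)
```

A static set keeps the online test cheap to implement: the model runs offline, and the search service only needs a membership check per request.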

The team performed an A/B test in which this treatment was compared with the baseline variant that gave high emphasis to semantic relevance. The results clearly confirmed the hypothesis that queries differ in how sensitive they are to the distance between the buyer and the seller: metrics that indicated business between buyers and sellers – such as sales transactions that were reported back to the platform and chat requests between buyers and sellers – went up considerably. On the other hand, the sales of shipping services for items – essential to Wallapop’s business – went down. Maximizing one objective can have a negative impact on another objective.

An improved model for multi-objective optimisation

The team now had a clearer understanding of both the user needs and the business objectives for which they need to optimise their application. But how could this be tackled in the specific case of distance sensitivity? The team realized that by adding other factors, such as item shippability, they could actually further improve the model and generate a positive impact on both objectives! The details of this work were presented by Xavier Sanchez in his talk “Understanding Distance: Moving from Local Search to Nation-Wide Search” at MICES EU 2022. Multi-objective optimisation problems do not always force bad trade-offs: it is important to be able to challenge the initial assumptions in order to discover possible win-win solutions.

The A/B test for the new algorithm improved all critical metrics for both Wallapop’s users and for their business objectives. The algorithm was finally released to production.

“This work on distance sensitivity has been critical not only for the search team but also for business stakeholders as it made explicit how critical search is for our users. It also forced the team to challenge the assumptions of the initial solution to find an even better model that would eventually improve all metrics!”
Carolina Costanzo, Search Product Manager, Wallapop

Conclusion & Next Steps

Guided by OSC’s Proven Process, Wallapop’s search team developed their capabilities and their maturity to tackle complex challenges such as optimising search for potentially conflicting objectives. Measurability and experimentation stand at the center of their improved approach and this can now be applied to future search challenges.

“OSC’s contribution to Wallapop search has been a game changer for us. We are really glad to have met such unique search experts and professionals. While Marketplace Search was something OSC had no explicit experience of, they adapted their Proven Process to our unique challenges and it resulted in a great impact, a company-wide cultural change on experimentation and a skilled-up, more autonomous team to face new upcoming challenges. We would like to thank all OSC collaborators for their kind guidance and support and we’re looking forward to the next phase of this collaboration.”
Julien Meynet, Principal Search Relevance Engineer, Wallapop

The two companies continued their collaboration beyond the first nine-month period. In the next phase, they focused on collecting search quality signals from tracked user behaviour in preparation for introducing machine learning for ranking (Learning to Rank).

If you want to empower your search team to build more relevant search that drives conversions and delights your users, contact us.