Pete has just been hired as Product Manager for Search at electronics retailer Chorus Electronics. His boss, a Vice President, has asked him to build a ‘best-in-class’ e-commerce search engine, to ‘increase customer satisfaction and drive improved revenue’.
Pete’s colleague Manny has spent the week manually rating the results from a few search queries and this has given Chorus Electronics a great set of data to baseline their search relevance – but at busy times the Chorus Electronics website gets thousands of search queries every minute! Pete is now worried that they aren’t testing enough queries to get a proper, representative sample of how users are searching their product data. He needs to scale up search result rating!
However, Pete’s in-house team doesn’t have the capacity to test more than this small number of queries on a regular basis. He’s now looking for alternatives – perhaps he can outsource the work? But he’s concerned about the quality of the ratings generated. Scaling up search result rating while maintaining this quality is going to be a challenge. Who can help, and how can he trust the results?
Options to scale up search result rating
“I’m Greg from Supahands, and we’re a fully-managed data labeling solution that helps organizations scale their business by providing quality training data for their Machine Learning and AI. OSC asked us to write about the options available to Pete for scaling up how search results are rated.”
Pete’s concern is a very common one: how to rate more queries in a short period of time without compromising the quality of the ratings. We’re happy to share some insights into the solutions available to overcome this challenge.
Crowdsourcing
The crowdsourcing model would allow Pete to save processing time and costs by outsourcing search query rating work to non-managed individuals. It’s easy to scale up operations with this method, as many contributors can be recruited in a short period of time, which removes the traditional barriers to talent sourcing. It is also relatively cost-effective, since workers are paid on a per-task basis.
That said, if Pete’s biggest concern is quality, this might not be the ideal solution. Because the external workforce in this model is non-managed, quality, compliance, and overall visibility into the rating process can all suffer. Crowdsourced teams are also not very agile: if Pete identifies new challenges or best practices for the query ratings along the way, he might have to start the training process from scratch.
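One way Pete could check whether outsourced ratings can be trusted is to have his in-house team and the external raters judge the same small set of query-result pairs, then measure how often they agree beyond what chance would produce. The sketch below computes Cohen's kappa for that purpose. The 0 to 3 rating scale, the sample ratings, and the function name are illustrative assumptions and are not taken from the Chorus project.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two raters on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items")

    n = len(ratings_a)
    # Share of items where both raters gave exactly the same rating.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings (0 = irrelevant ... 3 = perfect) for eight query-result pairs.
in_house = [3, 2, 0, 1, 3, 0, 2, 1]
crowd = [3, 2, 0, 0, 3, 1, 2, 1]

kappa = cohens_kappa(in_house, crowd)
print(f"Cohen's kappa: {kappa:.2f}")
```

A value close to 1 suggests the external raters are applying the guidelines the same way the in-house experts do, while a value near 0 suggests agreement is no better than chance. Where to draw the line between acceptable and unacceptable agreement is a judgment call for each project.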
Supporting tools
To help Pete’s team get through the large number of queries they are receiving, another option would be to invest in a tool that speeds up the rating process. Many such ‘data labeling’ tools are available for purchase, or the team could even build their own. Building a tool is more of a long-term investment, though, and is only a viable option if the process of rating these search queries is expected to stay the same for the foreseeable future.
The biggest benefit of introducing a labeling tool is that Pete’s own subject-matter experts can process queries faster, so output speeds up while the quality of the ratings is maintained. On the flip side, the biggest hurdle is the time it takes before any rating work can start: sourcing or building the right tool is not as easy as it might seem, as every tool is different and ideally needs to be customizable to suit the project’s needs. This is also a very investment-heavy way to solve Pete’s problem, because the cost of building or buying a tool has to be covered upfront.
Engaging a fully-managed service to scale up search result rating
This is a tried-and-tested method: a study released at the 2019 Open Data Science Conference (ODSC) in Boston demonstrated that managed teams outperformed crowdsourced workers on accuracy and overall cost across a series of identical data labeling tasks.
This is because fully-managed services typically provide an end-to-end solution. This means taking the time to understand the project requirements, recruiting the right workforce, and supplying the right tools to do the job. Additionally, if Pete hires a great fully-managed service, they will ensure that the process and its output not only produce accurate, high-quality data sets but also align with any regulatory requirements or special needs the client might have.
Fully-managed teams can offer a happy medium between crowdsourcing and in-house teams, but may require clients to commit to a certain volume of data or pay an upfront fee. Additionally, ending up with the wrong partner may cost you more time, money and resources, so make sure to choose your partner wisely.
To help you understand the range of processes and quality assurances a fully-managed service covers, we have put together a quick video on how we rate search queries for our partners here at Supahands and how we generated a larger set of relevance ratings for the Chorus project.
The Chorus project includes some public datasets. These datasets let the community learn, experiment, and collaborate in a safe manner and are a key part of demonstrating how to build measurable and tunable e-commerce search with open source components. The ratings data (a.k.a. explicit judgements) allows you to measure the impact of your changes to relevance. We are profoundly grateful to the team at Supahands for voluntarily generating multiple ratings for the set of 125 representative e-commerce queries and for sharing that data with the Chorus community.
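To illustrate how multiple ratings per query can be used to measure the impact of a relevance change, here is a minimal sketch that combines several raters' judgements for each query-result pair and then scores two result orderings with nDCG. The data layout, the product IDs, the 0 to 3 scale, the use of the median, and the choice to treat unrated results as irrelevant are all assumptions made for this example; they do not describe the actual format of the Chorus datasets or the method used to produce them.

```python
import math
from collections import defaultdict
from statistics import median


def aggregate_ratings(raw_ratings):
    """Collapse several raters' scores per (query, doc_id) into one via the median."""
    grouped = defaultdict(list)
    for query, doc_id, rating in raw_ratings:
        grouped[(query, doc_id)].append(rating)
    return {key: median(values) for key, values in grouped.items()}


def dcg(gains):
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(gain / math.log2(rank + 1) for rank, gain in enumerate(gains, start=1))


def ndcg_at_k(query, ranked_doc_ids, judgements, k=10):
    """nDCG@k for one query; unrated results count as 0 (an assumption)."""
    gains = [judgements.get((query, doc_id), 0) for doc_id in ranked_doc_ids[:k]]
    ideal = sorted(
        (gain for (q, _), gain in judgements.items() if q == query), reverse=True
    )[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical ratings: three raters each judged three products for one query.
raw = [
    ("laptop", "p1", 3), ("laptop", "p1", 2), ("laptop", "p1", 3),
    ("laptop", "p2", 1), ("laptop", "p2", 1), ("laptop", "p2", 0),
    ("laptop", "p3", 0), ("laptop", "p3", 0), ("laptop", "p3", 1),
]
judgements = aggregate_ratings(raw)

# Compare the ranking before and after a (hypothetical) relevance change.
before = ndcg_at_k("laptop", ["p3", "p2", "p1"], judgements)
after = ndcg_at_k("laptop", ["p1", "p2", "p3"], judgements)
print(f"nDCG before: {before:.3f}, after: {after:.3f}")
```

Averaging such a score over the whole query set gives a single number to track from one relevance change to the next. The median is only one way to combine raters; a mean or a majority vote would also be reasonable choices.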
Contact us if you need our help with measuring and tuning your e-commerce search.
Chorus is a joint initiative by Eric Pugh, Johannes Peter, Paul M. Bartusch and René Kriegler.