Blog

# Estimating Ad Conversion Rates using Cassandra

Dont let anybody fool you; a good bit of what a data scientist does is a glorified form of counting. And if you had a few billion fingers, you yourself could do data science on one hand. In my recent work with Cassandra Im finding that Cassandra is quite good at counting. As a matter of fact, you can treat Cassandra as a giant, distributed, redundant, “infinitely” scalable counting framework. Thus, for analysis applications that rely heavily upon accumulating counts, Cassandra can be a useful tool. In this and the next blog post well take a look at a couple of big data analytics applications that rely heavily upon counting and we will demonstrate how Cassandra can be put to good use.

Lets say that you run an online ad company. When you started the company, your customers were happy enough to just have you host their advertising and handle the dirty work. But now, with ever increasing competition, you would provide your customers with feedback analytics so that you can retain your competitive edge. And the first thing that customers want to know is how their advertising is performing. The most straightforward metric for this: conversion rate. The conversion rate is portion of all ad clicks which lead to actual purchases. This number can be represented as a probability from 0 to 1 or as a percentage from 0 to 100%. And the easiest way to estimate the conversion rate, is to count! Basically, you need to keep track of how many times each ad has been clicked and how many times such a click leads to an eventual product purchase.

Heres how I would put it together in Cassandra. First we create a table to hold all of our counts:

CREATE TABLE conversion_rate (  company text,  ad text,  product text,  count counter,  PRIMARY KEY (company, ad, product))

UPDATE conversion_rate  SET count=count+1WHERE company='Apple'  AND ad='iPodAd'  AND product='NO_PRODUCT';

You might be wondering what product='NO_PRODUCT' is about. This is the way that we encode ad clicks, regardless of whether or not a purchase was eventually made. Of course, most of the time, a user who clicks an ad will look around for a bit and then leave. But in this case, lets assume this user goes on to buy an iPod:

UPDATE conversion_rate  SET count=count+1WHERE company='Apple'  AND ad='iPodAd'  AND product='iPod';

And so it goes; many different ads are shown to many different users. Some users click the ads and of those users, some go on to purchase products. Eventually, the marketing execs of Apple come to us to see how things are going with their new ad campaign. Specifically, they want to know which ads are performing best for their various products. To make this determination, all we have to do is make a single query:

SELECT ad,product,count  FROM conversion_rate  WHERE company = 'apple'

And the results will look something like this.

    ad    |    product    |  count----------+---------------+--------- iPadAd   |  iPad         |      210 iPadAd   |  iPhone       |        3 iPadAd   |  iPod         |      152 iPadAd   |  NO_PRODUCT   |    17533 iPhoneAd |  iPad         |        7 iPhoneAd |  iPhone       |     1409 iPhoneAd |  iPod         |        4 iPhoneAd |  NO_PRODUCT   |    28131 iPodAd   |  iPad         |        6 iPodAd   |  iPhone       |        2 iPodAd   |  iPod         |       15 iPodAd   |  NO_PRODUCT   |    11299

The results here do not directly represent the conversion rate, but if youre thinking ahead then I bet you can see that were close. Rather than giving the direct counts to the Apple, we first need to divide the count of ad clicks that lead to a purchase by the total number of ad clicks. So for instance, here we see that 17533 people clicked the iPadAd and this lead to 210 iPad purchases. This implies a conversion rate of 210/17533 or roughly 1.2%. If we assemble this into a nice table for the customer, it will like something like this: