# Estimating Ad Conversion Rates using Cassandra

John Berryman — November 12, 2013 | 0 Comments | Filed in: Cassandra

Don’t let anybody fool you; a good bit of what a data scientist does is a glorified form of counting. And if you had a few billion fingers, you yourself could do data science on one hand. In my recent work with Cassandra I’m finding that Cassandra is quite good at counting. As a matter of fact, you can treat Cassandra as a giant, distributed, redundant, “infinitely” scalable counting framework. Thus, for analysis applications that rely heavily upon accumulating counts, Cassandra can be a useful tool. In this and the next blog post we’ll take a look at a couple of big data analytics applications that rely heavily upon counting and we will demonstrate how Cassandra can be put to good use.

Let’s say that you run an online ad company. When you started the company, your customers were happy enough to just have you host their advertising and handle the dirty work. But now, with ever increasing competition, you would provide your customers with feedback analytics so that you can retain your competitive edge. And the first thing that customers want to know is how their advertising is performing. The most straightforward metric for this: conversion rate. The conversion rate is portion of all ad clicks which lead to actual purchases. This number can be represented as a probability from 0 to 1 or as a percentage from 0 to 100%. And the easiest way to estimate the conversion rate, is to count! Basically, you need to keep track of how many times each ad has been clicked and how many times such a click leads to an eventual product purchase.

Here’s how I would put it together in Cassandra. First we create a table to hold all of our counts:

``````CREATE TABLE conversion_rate (
company text,
product text,
count counter,
)
``````

``````UPDATE conversion_rate
SET count=count+1
WHERE company='Apple'
AND product='NO_PRODUCT';
``````

You might be wondering what `product='NO_PRODUCT'` is about. This is the way that we encode ad clicks, regardless of whether or not a purchase was eventually made. Of course, most of the time, a user who clicks an ad will look around for a bit and then leave. But in this case, let’s assume this user goes on to buy an iPod:

``````UPDATE conversion_rate
SET count=count+1
WHERE company='Apple'
AND product='iPod';
``````

And so it goes; many different ads are shown to many different users. Some users click the ads and of those users, some go on to purchase products. Eventually, the marketing execs of Apple come to us to see how things are going with their new ad campaign. Specifically, they want to know which ads are performing best for their various products. To make this determination, all we have to do is make a single query:

``````SELECT ad,product,count
FROM conversion_rate
WHERE company = 'apple'
``````

And the results will look something like this.

``````    ad    |    product    |  count
----------+---------------+---------
``````

The results here do not directly represent the conversion rate, but if you’re thinking ahead then I bet you can see that we’re close. Rather than giving the direct counts to the Apple, we first need to divide the count of ad clicks that lead to a purchase by the total number of ad clicks. So for instance, here we see that 17533 people clicked the iPadAd and this lead to 210 iPad purchases. This implies a conversion rate of 210/17533 or roughly 1.2%. If we assemble this into a nice table for the customer, it will like something like this: