At work we have our own in-app ad system we use to cross-promote our apps. It’s been up and running for a couple months now logging impression and click-through data into a MySQL database. One of the tasks I was assigned with was creating the analytics page that turns thousands of rows of ad data into a user-friendly chart and table.
My first hacked-together iteration of this involved running a COUNT query for each day from the first logged entry to the last. The entire result was displayed in a chart and table using Google’s Visualization API. You could narrow your search by manually typing in a start and end date into a search or by moving the zoom sliders on the annotated timeline (this wouldn’t run a new query, just narrow the existing result). I knew this was an extremely un-optomized way of going about this, but I just wanted to get my feet wet with the Visualization API and we also wanted to see how our ads had been faring in the past few months.
The second time around I knew I had to limit the amount of data a user could look at. There was just no way the server could handle running a count query for each day. I also recently added logging of ad impressions to our ad API which meant that now 2 COUNT queries (1 for impressions, 1 for click-throughs) would have to be run for each day. After discussing with someone in our marketing department we decided that monthly reports would be sufficient. This limited the number of COUNT queries that needed to be run to a maximum of 60 at any given time. The results were better, but still bad:
I threw around some ideas on how to improve the speed. I knew that the most optimized way of fixing this problem was to create a new table that just contained rows with the count of impressions and click-throughs for each day for each ad. Every time the server received a notification of an ad impression or click-though it would just up the count for that day. There were several problems with this approach, though. For starters we wouldn’t be able to track any additional information easily (device UDID, geo-location, etc). Also if the row was locked by an existing query other queries to update the count could fail resulting in loss of data. More importantly, it would have rendered much of our existing data useless without writing a script to convert it.
Eventually I realized my whole idea of using COUNT in the first place was way off base. My thinking was that running a bunch of smaller queries returning minimal a amount of data was better than running one large query returning a ton of data that PHP would then have to sort through. The fundamental problem with this (which I only recently learned) is that COUNT is still running a SELECT statement to get the result (looking through the entire table and returning data). Doing this 60 times adds up especially since our table is hundreds of thousands of rows long. It’s also wasteful if nothing shows up for that query. So I modified the query to return all the data for the selected month and had PHP do the hard work instead:
Success! I was extremely happy to see such an improvement in execution time. In the next month we’ll see how this system scales up.




