Visualizing StackOverflow Data: When Does Jon Skeet Answer Questions?

March 15, 2013 OpenSource Connections

Last month we found the best time to ask a question on StackOverflow using the oft-missed ‘join’ feature in Solr.

Numbers will get you far in isolating the top-performing times, but patterns and off-the-chart clusters are lost when scanning lists of numbers.

Information-rich graphics lend themselves naturally to finding patterns, and the punchcard view makes it easy to spot data densities grouped around times.

So, let’s set up the visualization.

The first thing we need is data, which we’ll get from Solr. See how to set up a working StackOverflow index here. For now, since we’re just using a local instance of Solr, we make a JSONP request to get the data we need.
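If you have not used JSONP before, here is a minimal sketch of the mechanism. Solr wraps its JSON in a call to whatever function name is passed as json.wrf, so we expose a temporary global function and load the URL through a script tag. The helper name `jsonp` and its `doc` and `root` parameters are our own inventions for illustration, not part of Solr or D3, and the URL you pass is assumed to already contain wt=json.

```javascript
// Hypothetical JSONP helper (a sketch, not the code behind this post).
// `doc` and `root` default to the browser globals; they are parameters
// only so the helper can be exercised outside a browser.
function jsonp(url, onData, doc = globalThis.document, root = globalThis) {
  // Unique callback name so concurrent requests do not collide.
  const name = "solrCallback_" + Date.now() + "_" + Math.floor(Math.random() * 1e6);
  const script = doc.createElement("script");

  root[name] = function (data) {
    // Clean up the temporary global and the script tag, then hand off.
    delete root[name];
    if (script.parentNode) script.parentNode.removeChild(script);
    onData(data);
  };

  // Solr calls the function named by json.wrf with the JSON response.
  const separator = url.includes("?") ? "&" : "?";
  script.src = url + separator + "json.wrf=" + name;
  doc.head.appendChild(script);
  return name;
}
```

In a browser you would call it as `jsonp(solrUrl, function (data) { /* use data */ })`.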

In a future post we’ll discuss how to set up a single-page search app using EmberJS.

A quick query to:

http://localhost:8983/solr/select?q=OwnerUserId:22656&fq=PostTypeId:2&rows=0&facet=true&facet.pivot=CreationDay,CreationHour&wt=json&json.wrf=callback

This returns a summary (facet.pivot=CreationDay,CreationHour) of all of the answers (PostTypeId:2) written by Jon Skeet (OwnerUserId:22656).
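Solr reports pivot facets as a nested list under facet_counts.facet_pivot, keyed by the comma-separated field list, with one entry per day and a pivot array of hours inside each. A rough sketch of flattening that into one record per day and hour might look like this. The function name `flattenPivot` and the record shape are our own choices, and how CreationDay values are encoded depends on how the index was built, so they are passed through untouched.

```javascript
// Hypothetical helper: flatten Solr's nested pivot facet into flat records.
function flattenPivot(response, pivotKey = "CreationDay,CreationHour") {
  const pivots = response.facet_counts.facet_pivot[pivotKey] || [];
  const records = [];
  for (const dayEntry of pivots) {
    // Each day entry carries its own list of hour buckets under `pivot`.
    for (const hourEntry of dayEntry.pivot || []) {
      records.push({
        day: dayEntry.value,            // left as indexed (assumption: unknown encoding)
        hour: Number(hourEntry.value),  // hour of day as a number
        count: hourEntry.count,         // answers posted in that day/hour cell
      });
    }
  }
  return records;
}
```

Each record is then one cell of the punchcard: a day, an hour, and how many answers fall into it.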

D3 is primarily a visualization library; it does not come with data processing tools. Therefore all scrubbing must happen before D3. In this case, Solr provides clean JSON output by selecting wt=json, so our cleaning is reduced to converting UTC dates into Eastern Standard Time.
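The time zone conversion is a small piece of arithmetic: Eastern Standard Time is UTC-5, so each hour moves back by five, and anything that falls before midnight rolls over to the previous day. Here is a sketch, assuming days are numbered 0 through 6 and hours 0 through 23. The function name `utcToEst` is ours, and it deliberately ignores daylight saving time.

```javascript
// Eastern Standard Time is UTC-5. Daylight saving time is not handled here.
const EST_OFFSET_HOURS = -5;

// Hypothetical helper: shift a (day, hour) cell from UTC to EST.
// Assumes `day` is an integer 0..6 and `hour` is an integer 0..23.
function utcToEst(day, hour) {
  const shifted = hour + EST_OFFSET_HOURS;
  if (shifted >= 0) {
    return { day: day, hour: shifted };
  }
  // Before midnight EST: wrap to the previous day (day 0 wraps to day 6).
  return { day: (day + 6) % 7, hour: shifted + 24 };
}
```

Because the shift is the same for every cell, no two UTC cells land on the same EST cell, so the counts carry over unchanged.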

We set up the visualization by calling draw() from the callback that handles Solr’s response. draw() takes the data, processes it, and hands it to the visualization. And here is what we get:
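As for draw() itself, one way to sketch a punchcard is to split it into a layout step and a rendering step. This is an illustration under our own assumptions rather than the exact code behind the chart: records are taken to be objects with a numeric day (0 through 6), hour (0 through 23), and count, and the names `punchcardLayout`, `cell`, and `maxRadius` are made up for this example. The D3 calls used (select, append, attr, selectAll, data, enter) are the standard selection methods.

```javascript
// Layout step: one circle per {day, hour, count} record.
// Hours run along the x axis, days along the y axis.
function punchcardLayout(records, { cell = 30, maxRadius = 13 } = {}) {
  const maxCount = Math.max(1, ...records.map((r) => r.count));
  return records.map((r) => ({
    cx: (r.hour + 0.5) * cell,
    cy: (r.day + 0.5) * cell,
    // Square-root scale so that circle area, not radius, tracks the count.
    r: maxRadius * Math.sqrt(r.count / maxCount),
  }));
}

// Rendering step. `d3` is passed in explicitly so the layout above
// can be checked without a browser.
function draw(d3, records, selector = "body") {
  const cell = 30;
  const circles = punchcardLayout(records, { cell: cell });
  d3.select(selector)
    .append("svg")
    .attr("width", 24 * cell)
    .attr("height", 7 * cell)
    .selectAll("circle")
    .data(circles)
    .enter()
    .append("circle")
    .attr("cx", (d) => d.cx)
    .attr("cy", (d) => d.cy)
    .attr("r", (d) => d.r);
}
```

Scaling the radius by the square root of the count keeps a cell with four times the answers from looking sixteen times as large.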

Conclusion

So, what have we learned?

Jon Skeet does in fact sleep! This is kind of a letdown considering all the other cool things Jon Skeet can do!
Your best chance for getting your question answered by Jon Skeet is somewhere between 8 and 4 EST, Monday through Friday. You can refine your question by using Solr JOIN, an advanced feature similar to a MySQL JOIN.

The best approach to discovering patterns in large data sets is to iteratively investigate new hypotheses. From simple charts and summary statistics to detailed, interactive diagrams, all visualizations provide a unique opportunity to literally see if the data makes sense.

What other questions can you ask the data? What other visualizations would you like to see? Keep track of our progress as we explore this dataset further on GitHub.