Hey Lanyrd and SpeakerRate, come crawl us!

Eric Pugh, January 28, 2015

When it comes time to share data, as a developer, my mind shifts to fun topics like building APIs, providing data in multiple formats, putting data into public S3 buckets, or even publishing information in various RDF formats to support SPARQL endpoints.

But there is a simpler way, and all it requires is the ability to write some HTML!

Microdata is a standard, sponsored by the WHATWG, for embedding metadata into existing HTML pages. The biggest consumer of microdata is Google, which uses it to simplify its understanding of the content it crawls. You can see microdata in action when you look up a movie playing in your town:

[Screenshot: Google results for 'paddington charlottesville']

The showtime information is pulled from the HTML pages of the various theaters.

Which brings me to one of my pet peeves about going to conferences: updating my information in Lanyrd (which often means I'm the first one to set up the event) and then, if I'm speaking, doing the same thing all over again in SpeakerRate.

So, in the spirit of making information sharing easier, I went ahead and added some microdata to the event information that we show on our website so that in the shiny future, the information about what events we’ll be at will already be indexed!

The best resource I found for getting started was this StackOverflow question.

Here is a simple example of the itemscope, itemtype, and itemprop attributes mixed into an event listing:

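<div itemscope itemtype="http://schema.org/Event">
	<!-- itemprop on a plain element: its text content becomes the value -->
	<h1 class="title" itemprop="name">StrataConf 2015</h1>
	<!-- itemprop plus itemscope: the value is a nested Place item -->
	<span itemprop="location" itemscope itemtype="http://schema.org/Place">
		<span itemprop="name">San Jose, CA</span>
	</span>
	<div class="event-url">
		<!-- on an <a>, the property value is taken from href, not the link text -->
		<a href="http://strataconf.com/" itemprop="url">http://strataconf.com/</a>
	</div>
</div>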
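To get a feel for what a crawler pulls out of markup like this, here is a minimal sketch of a microdata extractor in Python, using only the standard library's html.parser. It is a simplification of the WHATWG microdata parsing rules (no itemref, itemid, or URL resolution), and the MicrodataExtractor and extract_microdata names are our own illustrations, not part of any existing library.

```python
import json
from html.parser import HTMLParser

# Properties on these elements take their value from an attribute rather
# than from the element's text (a simplified subset of the microdata rules).
ATTR_VALUES = {
    "a": "href", "area": "href", "link": "href",
    "img": "src", "audio": "src", "video": "src", "source": "src",
    "iframe": "src", "embed": "src", "meta": "content", "object": "data",
}
# Void elements never get an end tag, so they are never pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class MicrodataExtractor(HTMLParser):
    """Collects itemscope/itemtype/itemprop data into nested dicts."""

    def __init__(self):
        super().__init__()
        self.items = []  # top-level items (itemscope without an owning itemprop)
        self.stack = []  # one frame per currently open element

    def _current_item(self):
        # The innermost open element that started an item owns new properties.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)  # boolean attributes such as itemscope map to None
        owner = self._current_item()
        props = (a.get("itemprop") or "").split() if owner is not None else []
        frame = {"tag": tag, "item": None, "text_props": [], "text": [], "owner": owner}
        if "itemscope" in a:
            # A new item: nested under its owner if it has itemprop, else top level.
            item = {"type": a.get("itemtype"), "properties": {}}
            frame["item"] = item
            for p in props:
                owner["properties"].setdefault(p, []).append(item)
            if not props:
                self.items.append(item)
        elif props:
            attr = ATTR_VALUES.get(tag)
            if tag == "time" and "datetime" in a:
                attr = "datetime"
            if attr is None:
                frame["text_props"] = props  # value is the text, set at the end tag
            else:
                for p in props:
                    owner["properties"].setdefault(p, []).append(a.get(attr) or "")
        if tag not in VOID:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text counts toward every open element whose value is its text content.
        for frame in self.stack:
            if frame["text_props"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag, tolerating unclosed children.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i]["tag"] == tag:
                while len(self.stack) > i:
                    self._finish(self.stack.pop())
                return

    def _finish(self, frame):
        value = " ".join("".join(frame["text"]).split())  # collapse whitespace
        for p in frame["text_props"]:
            frame["owner"]["properties"].setdefault(p, []).append(value)


def extract_microdata(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


SNIPPET = """
<div itemscope itemtype="http://schema.org/Event">
  <h1 class="title" itemprop="name">StrataConf 2015</h1>
  <span itemprop="location" itemscope itemtype="http://schema.org/Place">
    <span itemprop="name">San Jose, CA</span>
  </span>
  <div class="event-url">
    <a href="http://strataconf.com/" itemprop="url">http://strataconf.com/</a>
  </div>
</div>
"""

if __name__ == "__main__":
    print(json.dumps(extract_microdata(SNIPPET), indent=2))
```

Running it prints the Event as JSON, with the Place item nested under the Event's location property and the url value taken from the link's href rather than its text.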

There are a number of tools that help you validate your microdata and get a sense of what will be extracted. The Structured Data Linter, for example, validates a page and shows a preview of how the extracted content might look in a search engine:

[Screenshot: Structured Data Linter output for the OSC page]

Google also provides its Structured Data Testing Tool (https://developers.google.com/webmasters/structured-data/testing-tool/) as well as a handy tool for visually marking up existing HTML content with microdata, the Structured Data Markup Helper (https://www.google.com/webmasters/markup-helper).

One disappointment was that I couldn't get the people attending or speaking at a conference added to the data structure. I tried adding both an attendee item property and a performer property. According to the documentation at http://schema.org/attendee, attendee is a valid property of an http://schema.org/Event; however, the extraction libraries I was testing with may not honor it. So, for now, the people are just listed separately.
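For reference, schema.org lists both performer and attendee as Event properties whose values can be Person items, so one way to express speakers in microdata is to nest a Person item under the Event's performer property. The sketch below generates that nesting from Python; the event_markup helper and the "Jane Speaker" name are hypothetical, and we have not verified how any particular extraction tool treats the result.

```python
from html import escape


def event_markup(name, place, url, people, role="performer"):
    """Hypothetical helper: render a schema.org Event as microdata, nesting
    each person as a Person item under `role` ("performer" for speakers,
    "attendee" for people who are only attending)."""
    if role not in ("performer", "attendee"):
        raise ValueError("role must be 'performer' or 'attendee'")
    # One nested Person item per person, each with its own itemscope.
    person_divs = "".join(
        f'  <div itemprop="{role}" itemscope itemtype="http://schema.org/Person">\n'
        f'    <span itemprop="name">{escape(person)}</span>\n'
        "  </div>\n"
        for person in people
    )
    return (
        '<div itemscope itemtype="http://schema.org/Event">\n'
        f'  <h1 itemprop="name">{escape(name)}</h1>\n'
        '  <span itemprop="location" itemscope itemtype="http://schema.org/Place">\n'
        f'    <span itemprop="name">{escape(place)}</span>\n'
        "  </span>\n"
        f'  <a href="{escape(url)}" itemprop="url">{escape(url)}</a>\n'
        f"{person_divs}"
        "</div>\n"
    )


if __name__ == "__main__":
    # "Jane Speaker" is a placeholder name, not a real speaker.
    print(event_markup("StrataConf 2015", "San Jose, CA",
                       "http://strataconf.com/", ["Jane Speaker"]))
```

Restricting role to the two schema.org property names keeps the unescaped attribute value safe; every other value goes through html.escape.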

If you know how to do it, please do let me know!
