Microdata in HTML5 and what it means for technology

Thursday, August 25th, 2011 by Chris Jason

Recently I’ve had the opportunity to work with some of Google’s best and brightest to create a set of sports schemas for Microdata. We announced the ESPN/Google collaboration earlier this week via a guest post on Google’s Inside Search blog.

Microdata is essentially five new attributes in the proposed HTML5 specification that, when used in conjunction with a Microdata schema, convey the semantic meaning of HTML code to a computer application (such as a search engine).

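The HTML5 attributes that serve as the building blocks of Microdata markup are itemscope (starts a new item), itemtype (names the item’s schema by URL), itemid (gives the item a global identifier), itemprop (names a property of the item), and itemref (pulls in properties from elements outside the item’s own markup).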

A common question is how Microdata differs from Microformats. The answer is pretty simple. While both are forms of rich HTML markup intended to apply meaning to HTML code, Microformats relies on the “rel” and “class” attributes of HTML tags to supply the data. Neither of these attributes was really intended for this usage, so Microformats is a sort of “hack” that exists because HTML4 offers no native way to achieve this. Microdata, on the other hand, has the above five attributes built into the proposed HTML5 specification. There is no need to hack existing HTML attributes for uses they weren’t intended for. HTML5 aims to solve the problem the right way.
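To make the contrast concrete, here is a minimal Python sketch that reads the same name from an hCard-style Microformats snippet and from a Microdata snippet. The two snippets, the `TokenTextFinder` class, and the `find_text` helper are illustrative assumptions, not part of the ESPN/Google work. The only real difference between the two lookups is which attribute carries the meaning: a shared, overloaded "class" or a dedicated "itemprop".

```python
from html.parser import HTMLParser

# The same fact marked up both ways (illustrative snippets, not ESPN markup).
HCARD = '<div class="vcard"><span class="fn">Evander Holyfield</span></div>'
MICRODATA = (
    '<div itemscope itemtype="http://schema.org/Person">'
    '<span itemprop="name">Evander Holyfield</span></div>'
)

# Elements without a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TokenTextFinder(HTMLParser):
    """Collect the text of elements whose `attr` value contains `token`."""

    def __init__(self, attr, token):
        super().__init__()
        self.attr = attr
        self.token = token
        self.depth = 0    # greater than 0 while inside a matching element
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif self.token in (dict(attrs).get(self.attr) or "").split():
            self.depth = 1
            self.found.append("")

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.found[-1] += data


def find_text(html, attr, token):
    """Return the text of every element whose `attr` includes `token`."""
    finder = TokenTextFinder(attr, token)
    finder.feed(html)
    finder.close()
    return finder.found


# Microformats: meaning rides on a class name that CSS may also be using.
print(find_text(HCARD, "class", "fn"))
# Microdata: meaning lives in an attribute reserved for that purpose.
print(find_text(MICRODATA, "itemprop", "name"))
```

Both calls print a list containing the single name. The Microformats lookup only works because the consumer knows the hCard convention that "fn" means "formatted name", while the Microdata lookup reads an attribute that exists for nothing else.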

In this project with Google, we tackled schemas to represent athletes, teams, associations, series, and matches (games). The initial implementation is essentially a proof of concept that appears in Google’s results when ESPN pages are returned for MLB players, teams, or generic baseball searches. Though we started with baseball, we plan to extend this in the coming months to other sports, including football, basketball, hockey, soccer, tennis, and golf.

Here’s an example of a possible representation of a boxing athlete using Microdata:


<div itemid="boxer~253" itemscope="itemscope" itemtype="http://schema.org/SportsAthlete/Boxing">
	<img itemprop="image" src="http://someurlhere.jpg" alt="Evander Holyfield" />
	<h3 itemprop="name">Evander Holyfield</h3>
	<link itemprop="url" href="http://somewebsite.com/evander-holyfield.html" />
	<link itemprop="sport" href="http://schema.org/Boxing" />
	<time itemprop="birthDate" datetime="1962-10-19">1962-10-19</time>
	<span itemprop="height">188</span>
	<span itemprop="weight">100</span>
	<div itemprop="statistics" itemscope="itemscope" itemtype="http://schema.org/SportStat/Wins">
		<span itemprop="name">Wins</span>
		<span itemprop="abbreviation">W</span>
		<span itemprop="value">44</span>
	</div>
	<div itemprop="statistics" itemscope="itemscope" itemtype="http://schema.org/SportStat/Losses">
		<span itemprop="name">Losses</span>
		<span itemprop="abbreviation">L</span>
		<span itemprop="value">10</span>
	</div>
	<div itemprop="statistics" itemscope="itemscope" itemtype="http://schema.org/SportStat/Draws">
		<span itemprop="name">Draws</span>
		<span itemprop="abbreviation">D</span>
		<span itemprop="value">2</span>
	</div>
	<div itemprop="statistics" itemscope="itemscope" itemtype="http://schema.org/SportStat/Knockouts">
		<span itemprop="name">Knockouts</span>
		<span itemprop="abbreviation">KO</span>
		<span itemprop="value">29</span>
	</div>
</div>

The ultimate goal is to standardize the sports schemas and then publish them for other developers and publishers to use in their own development. At the end of the day, a richer online experience based on standards is better for everyone involved in digital technology, both content creators and consumers.

While Google and other search engines such as Yahoo! and Bing are early adopters of Microdata and already use it to improve their search results, there will undoubtedly be other uses as well. If a computer application can know, in a structured way, what a block of HTML code represents, it can do some interesting things to create a richer experience. Leaving the sports world for a second, marking up a Web page with Person Microdata, for example, could let a user call a friend directly from a smartphone after reading the friend’s name in a Facebook post or Tweet. Place Microdata could be used to launch Google Maps directly from a tablet to get more information about a location referenced in the Web page. When reading a review of a restaurant, a user could click the linked restaurant name to quickly reserve a table. These simple uses are just the beginning, and they promise to improve the utility of the smartphones, tablets, and other devices that we all use every day.

Depending on how sophisticated the schemas become for Microdata, they could also potentially be used as a form of API (an insecure one, but still an API) for organizations to easily share data with one another without having to do data dumps or register for developer keys, as in the standard use of a formal API. Microdata, if widely adopted, could essentially wipe out the need for “scraper” technology and complicated regular expressions that have long been used by software engineers and Web developers to extract data from Web pages.
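As a rough illustration of the "Microdata as API" idea, the Python sketch below reads items straight out of marked-up HTML with no regular expressions. It is a simplified sketch, not the full extraction algorithm from the HTML5 draft: it ignores itemref and does not resolve relative URLs. The names `MicrodataParser` and `extract_microdata` are hypothetical, and the sample markup is a trimmed copy of the boxing example above.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they are never pushed on the open stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
# Elements whose property value comes from an attribute rather than text.
VALUE_ATTRS = {"a": "href", "area": "href", "link": "href", "img": "src",
               "audio": "src", "video": "src", "source": "src",
               "embed": "src", "iframe": "src", "track": "src",
               "meta": "content", "data": "value", "meter": "value"}


class MicrodataParser(HTMLParser):
    """Simplified Microdata reader (assumption: no itemref support)."""

    def __init__(self):
        super().__init__()
        self.items = []     # top-level items found so far
        self._scopes = []   # items whose element is still open
        self._open = []     # open non-void elements, innermost last

    @staticmethod
    def _add(item, names, value):
        # A property may repeat (e.g. "statistics"), so values are lists.
        for name in names:
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = (attrs.get("itemprop") or "").split()
        parent = self._scopes[-1] if self._scopes else None
        entry = {"tag": tag, "scope": False, "names": names,
                 "target": None, "text": None}
        if "itemscope" in attrs and tag not in VOID_TAGS:
            item = {"type": attrs.get("itemtype"),
                    "id": attrs.get("itemid"), "properties": {}}
            if names and parent is not None:
                self._add(parent, names, item)   # nested item
            else:
                self.items.append(item)          # top-level item
            self._scopes.append(item)
            entry["scope"] = True
        elif names and parent is not None:
            if tag in VALUE_ATTRS:
                self._add(parent, names, attrs.get(VALUE_ATTRS[tag]) or "")
            elif tag == "time" and "datetime" in attrs:
                self._add(parent, names, attrs.get("datetime") or "")
            else:
                # Value is the element's text; collect it until the end tag.
                entry["target"], entry["text"] = parent, []
        if tag not in VOID_TAGS:
            self._open.append(entry)

    def handle_data(self, data):
        for entry in self._open:
            if entry["text"] is not None:
                entry["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        # Find the nearest open element with this tag; ignore stray end tags.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i]["tag"] == tag:
                break
        else:
            return
        while len(self._open) > i:
            entry = self._open.pop()
            if entry["text"] is not None:
                text = "".join(entry["text"]).strip()
                self._add(entry["target"], entry["names"], text)
            if entry["scope"]:
                self._scopes.pop()


def extract_microdata(html):
    """Return the list of top-level Microdata items found in `html`."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/SportsAthlete/Boxing">
  <h3 itemprop="name">Evander Holyfield</h3>
  <link itemprop="url" href="http://somewebsite.com/evander-holyfield.html" />
  <time itemprop="birthDate" datetime="1962-10-19">1962-10-19</time>
  <div itemprop="statistics" itemscope itemtype="http://schema.org/SportStat/Wins">
    <span itemprop="name">Wins</span>
    <span itemprop="value">44</span>
  </div>
</div>
"""

athlete = extract_microdata(SAMPLE)[0]
wins = athlete["properties"]["statistics"][0]
print(athlete["properties"]["name"], wins["properties"]["value"])
```

The consumer here never has to know where on the page the win total sits or how it is styled, only that the item exposes a "statistics" property with a "value". That is what makes the markup behave like an informal API rather than something to scrape.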

I believe that Microdata is going to become a major technology that attains more uses than just improving search results. My money is on big things for Microdata as more and more smart people start experimenting with the technology and extending it beyond what is being done today.
