
Microformats thoughts

I finally took the plunge and did some experimentation with microformats - http://microformats.org/. I converted my Vancouver Restaurant reviews page to use the hreview mformat. I wrote up a detailed experience report on converting to hreview.

I think microformats are a really interesting competitive technology to XML and the Semantic Web. Briefly, mformats are an alternative architecture to the XML-first or RDF-first architectures. Instead of publishing XML and then styling it into HTML, the data is embedded in the HTML and can be extracted from the HTML. The idea is that you put class= attributes in your HTML and then an mformat parser will extract content based on the class(es). For example, the name of a restaurant is <span class="fn org">Tojo's</span>. I registered my restaurants page on the kritx web site, and now my web pages can be found by searching on kritx for reviewer, rating, restaurant name, etc.
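
To make the class-based extraction idea concrete, here is a minimal sketch in Python using only the standard library. It is not how kritx or any real mformat parser works; the ClassExtractor and extract names are made up for illustration, and it assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; skipped so the open-element stack stays aligned.
VOID = {"br", "img", "hr", "input", "meta", "link"}


class ClassExtractor(HTMLParser):
    """Collect the text of every element whose class attribute contains `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.stack = []    # one bool per open element: did it match?
        self.buffers = []  # text buffers for matched elements that are still open
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        # class is a space-separated list, so "fn org" matches both "fn" and "org"
        classes = (dict(attrs).get("class") or "").split()
        matched = self.wanted in classes
        self.stack.append(matched)
        if matched:
            self.buffers.append([])

    def handle_endtag(self, tag):
        if tag in VOID or not self.stack:
            return
        if self.stack.pop():
            self.results.append("".join(self.buffers.pop()).strip())

    def handle_data(self, data):
        # Text belongs to every matched element that is currently open.
        for buf in self.buffers:
            buf.append(data)


def extract(html, wanted):
    parser = ClassExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.results


sample = (
    '<div class="hreview">'
    '<span class="fn org">Tojo\'s</span> '
    '<span class="rating">5</span>'
    "</div>"
)
print(extract(sample, "fn"))      # text of the element carrying class "fn"
print(extract(sample, "rating"))  # text of the element carrying class "rating"
```

The same span answers to both "fn" and "org", which is the whole trick: the HTML stays ordinary HTML, and the structure rides along in the class values.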

Some people believe that microformats will be the way the Semantic Web really gets deployed, because there will be far more mformat data in HTML than there is RDF. A description I like is that "microformats is the small s semantic web".

Apps
The real reason to use mformats is for applications. What are the killer apps for extracting XML from HTML? There seems to be lots of potential. An app I'd love to write is a Google mashup that shows all the restaurants that I and others I trust have reviewed, with rating, cost, and food type sliders. Then I could see the best mid-priced sushi restaurants nearby.
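
Once the reviews are extracted, the slider part of such a mashup is mostly filtering. A rough sketch in Python, assuming the reviews have already been parsed into records: the Review fields, the 1-5 rating and 1-4 cost scales, and the sample data are all invented for illustration, and the "nearby" part is left out.

```python
from dataclasses import dataclass


@dataclass
class Review:
    restaurant: str
    reviewer: str
    rating: int     # assumed scale: 1 (poor) to 5 (excellent)
    cost: int       # assumed scale: 1 (cheap) to 4 (expensive)
    food_type: str


def best_matches(reviews, trusted, food_type, min_cost, max_cost, min_rating=1):
    """Keep reviews by trusted reviewers that fit the sliders, best rated first."""
    hits = [
        r for r in reviews
        if r.reviewer in trusted
        and r.food_type == food_type
        and min_cost <= r.cost <= max_cost
        and r.rating >= min_rating
    ]
    return sorted(hits, key=lambda r: r.rating, reverse=True)


# Placeholder data, not real reviews.
reviews = [
    Review("Sushi A", "dave", 4, 2, "sushi"),
    Review("Sushi B", "dave", 5, 4, "sushi"),
    Review("Sushi C", "stranger", 5, 2, "sushi"),
    Review("Sushi D", "norm", 5, 3, "sushi"),
    Review("Pasta E", "dave", 5, 2, "italian"),
]

mid_priced_sushi = best_matches(reviews, {"dave", "norm"}, "sushi", 2, 3)
print([r.restaurant for r in mid_priced_sushi])
```

The hard part is not this code, it is getting enough people to publish the data so there is something to filter.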

But where's the incentive for me to do the work to publish my stuff in mformats? It's got to be around search engines/apps extracting it and doing something really interesting. Until some really killer apps show up, it'll be hard to get the mainstream to use mformats.

Tools
The tools are really rough right now, at least for the hreviews. I used the tails firefox plug-in to test the HTML and then kritx for indexing. It was a fairly lengthy and iterative process getting the HTML into shape.

Validation
The problem is that when I create the HTML, I have to render it as HTML and also run it through some mformat parser to see if the document works in both views. If the mformat didn't work properly, I had a tough time knowing whether it was the HTML or the tool. I'm sure this will improve over time, but it is a barrier currently.

Authoring
I think mformats really need some good authoring apps, by which I don't mean the simple "fill in form data and mformatted html is produced". How do I integrate writing my blog with the various mformats? Say I do a restaurant review on a particular date, how does the blog authoring software integrate with the hreview/hcal mformats? I don't know if a generic call-out to an mformat plug-in would work, as it seems that the authoring software needs to be tightly aware of the mformat.

Extensibility
I've been a champion of thinking about extensibility and versioning for a long time. The mformats use the html extensibility of the class attribute to add in the class of the "thing". But how far does this go? I was joking with Norm Walsh that they'll need square brackets or something to wedge extensibility into the mformat. Here's a simple example. In XML, in a decent XML language design, I can add a middle element between a first and a last name. As long as a flavour of "must ignore" is followed, the software works with the extension. But how do I add a middle name into the "fn" mformat class, which currently talks about first/last names? Currently it's space separated. Should I mark it with angle brackets? And how do I say that it is version 2 of the "fn" language, do I say it is "fn2", like <span class="fn2">David [Bryce] Orchard</span>? And how do I tell the difference between my version of a middle name and somebody else's? Hint: XML has namespaces. I know, how about colon characters meaning something in the square brackets, like [title:Mr] David [middle:Bryce] Orchard. Or will I embed the extensibility inside another span and pray that my class ignores the embed? <span class="fn">David <span class="middle">Bryce</span> Orchard</span>. This is almost exactly like the Extensibility Element schema design pattern for XML that I worked on a while ago.
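
Here is a small sketch of what "must ignore" could look like for the embedded-span option, in Python with the standard library. The "middle" class is my own hypothetical extension, not part of any mformat, and read_fn_v1 and read_fn_with_middle are illustrative names. The fragment happens to be well-formed XML, so an XML parser is enough for the demonstration.

```python
import xml.etree.ElementTree as ET

FRAGMENT = '<span class="fn">David <span class="middle">Bryce</span> Orchard</span>'


def has_class(el, name):
    """True if `name` is one of the space-separated class values on the element."""
    return name in el.get("class", "").split()


def text_of(el):
    """All text inside the element, with nested markup ignored and whitespace collapsed."""
    return " ".join("".join(el.itertext()).split())


def read_fn_v1(root):
    """A consumer that only knows 'fn': nested spans it does not understand are ignored."""
    for el in root.iter():
        if has_class(el, "fn"):
            return text_of(el)
    return None


def read_fn_with_middle(root):
    """A consumer that also understands the hypothetical 'middle' extension."""
    for el in root.iter():
        if has_class(el, "fn"):
            middle = next(
                (text_of(c) for c in el.iter() if c is not el and has_class(c, "middle")),
                None,
            )
            return {"fn": text_of(el), "middle": middle}
    return None


root = ET.fromstring(FRAGMENT)
print(read_fn_v1(root))
print(read_fn_with_middle(root))
```

The old consumer keeps working only because it treats unknown nested markup as plain text. Nothing in the class attribute says whose "middle" this is, which is the namespace question again.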

It seems to me that there is real potential for mformats to get into trouble without the full extensibility/versioning story that we can have with XML. Now maybe they won't need it and this is just some architectural ivory tower concern and I've been too close to angle brackets for too long.

Speaking of ext/vers, a way that I was thinking of mformats with class attribs is like multiple inheritance in O-O languages. I can say that a particular thing is an fn org etc. Which brings up the obvious diamond problem. What happens if I have a conflict between the class values? Say the fn says you can use square brackets for extensibility but org says use embedded span?

And let's talk versioning. Every successful XML vocabulary deals with versioning, usually by adding things into the language. The use of markup + namespaces gives a processor some hope of disambiguating the versions of the terms. The mformats use a "version" attribute way up in the hreview class. With no markup and no namespaces near the actual content, I'm having a tough time seeing how the versioning story works.

But maybe it's not needed? Maybe that fine grained control and identification that XML gives isn't needed? Looking at hcal, it's a mformat version of the iCal IETF spec. There's a big standardization process for that. If each of the mformats is an html representation of some well-defined existing spec, then that hits a different sweet spot than the publish-XML-first approach. And maybe by constraining the vocabularies to a few existing formats that are mapped to mformats, we'll get a huge number of interesting applications that don't need XML's full decentralized creation and extensibility model.

There are some interesting effects around extensibility that can result from combining microformats and XML. For example, I want to do pub-sub to my restaurant reviews with multiple vocabularies, such as including an hcard for contact info. Doing RSS/HTML with microformats seems easier than doing RSS or WS-Eventing of a review + icard XML document.

Multiple Copies
The exact problem of needing differently formatted data often results in multiple copies of the data. For example, date/times might be in human readable form and then in a machine processable form. This causes the classic problem of what if I change one and forget to change the other?

This is generally a problem of re-using content across different pieces of software. To the extent that different software needs different formats or data and can't single source the data, the author is asking for trouble.
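
One way around the two-copies problem is to generate both forms from a single value at authoring time. A minimal sketch in Python, assuming the abbr-with-title pattern for dates and the hreview "dtreviewed" class; the review_date_markup function is just an illustrative name, and I hard-code English month names to keep the output independent of locale.

```python
from datetime import date
from html import escape

MONTHS = (
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
)


def review_date_markup(d: date) -> str:
    """Emit the human-readable and machine-readable date from one source value."""
    human = f"{MONTHS[d.month - 1]} {d.day}, {d.year}"
    machine = d.isoformat()  # e.g. 2006-03-08
    return f'<abbr class="dtreviewed" title="{machine}">{escape(human)}</abbr>'


print(review_date_markup(date(2006, 3, 8)))
```

This only helps if the authoring tool owns the date. As soon as I hand-edit the generated HTML, the two copies can drift apart again.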

Test Cases
One thing that is sorely missing from the hreview microformat is a bunch of test cases. It seems really obvious, but please, make a whole bunch of test cases and evangelize them like crazy.

Net
Microformats look really interesting. They give the ability to embed structured data in the "master" presentation format, HTML. The key for adoption is what the killer-apps are going to be that justify the extra cost. There are some challenges with current tooling that would surely be solved over time. I worry about the architectural issues of extensibility and versioning, but I wonder if that's a carry-over from my work on XML.


About

This page contains a single entry from the blog posted on March 8, 2006 12:10 PM.
