Why HTTP URIs are better than URNs and even id: URIs for identifiers

When creating a URI-based identifier, perhaps the most important decision is which URI scheme to use. Two of the most common are the http: and urn: schemes. A common reason given for using URNs for identifiers, such as namespace names, is that an http: identifier looks like a location to humans and hence appears dereferenceable. Another common reason is the wish for an identifier that is location-independent, or "movable" from one location to another.

URIs have context
The first argument, that http: URIs are "locations", is based upon an incomplete understanding of how URIs are used. Any data type exists in a context, and URIs are no exception. The context defines the use of a URI, and it includes both social and technical context. A URI on the side of a van conveys the social meaning that it can be typed into a browser and some good stuff will show up in the window. Other contexts for the use of URIs include namespace names, references to documents, and identifiers for *things*. A URI is never simply "found" without a context. The key point is that every use of a URI as an identifier has a context.

The use of URIs in namespace names is enlightening. Imagine two scenarios, one with a urn and another with an http: URI. The namespace specification defines a context which, roughly speaking, says that a namespace name SHOULD NOT be considered dereferenceable. Any software component that is written assuming that a namespace name MUST be dereferenceable is violating the namespace specification, i.e. the context. It may be that a namespace owner has guaranteed that they will provide a document at the namespace name, but such a guarantee can only cover a subset of the entire set of namespace names. Clearly, generic XML software should not be written to assume dereferenceability of namespace names.
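As a minimal sketch of what "generic XML software" does with a namespace name, the Python standard library parser below treats it as an opaque string. The document and the namespace name `http://example.org/ns/orders` are made up for illustration; nothing is fetched from that address.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; the namespace name is only ever compared as a string.
DOC = (
    '<p:order xmlns:p="http://example.org/ns/orders">'
    "<p:item>tea</p:item>"
    "</p:order>"
)

root = ET.fromstring(DOC)

# ElementTree folds the namespace name into the tag as "{namespace}local".
namespace, local_name = root.tag[1:].split("}", 1)

# Lookup by qualified name is plain string matching; no network access occurs.
items = root.findall("{http://example.org/ns/orders}item")
```

Swapping the namespace name for a urn: value would not change this code at all, which is the point: in this context the scheme carries no instruction to dereference.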

It is natural for a human reading an XML document with an unfamiliar namespace name to want to understand more about the namespace. This is why the TAG recommends providing a document at a namespace name that provides both human- and machine-readable information.

Scenarios for using identifiers

The use of http: namespace names enables 3 separate scenarios:
1) an identifier that can be created in a decentralized manner;
2) an identifier that may be dereferenced in a browser for more human understanding;
3) an identifier that may be dereferenced and examined for machine-centric information on a namespace name, such as schemas, WSDLs, and policies.

The last two are distinct interaction patterns for dereferencing, with (#2) and without (#3) human involvement. The software-only interaction pattern is clearly erroneous if it assumes that every namespace name is dereferenceable, and it is unlikely that any XML software written today makes this assumption.

Contrast these scenarios with the approach of using a urn. A urn provides an identifier, though in some cases these are not decentralized. A human looking at an XML document with a urn namespace name will not be confused about whether it is dereferenceable or not.

It may be the case that machinery is provided to transform the urn into a dereferenceable address, such as XRI. This always requires the use of a dereferenceable registry, so the software must have some location "hard-coded". Indeed, most of the urn-to-location schemes use http for their lookup.

In the http: identifier scenario, the "location" to be used for knowledge is embedded in the identifier and available in a decentralized manner via DNS. In the urn: identifier scenario, the "location" to be used for knowledge is hard-coded somewhere in the application or the urn identifier. It is substantially easier for software to use a single identifier and the existing http infrastructure than to use an intermediary identifier and, quite probably, the existing http infrastructure anyway.
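The difference can be sketched in a few lines of Python. Everything named here is an assumption for illustration: the registry address `http://registry.example.org/resolve`, its `urn` query parameter, and the sample identifiers do not correspond to any real resolution service.

```python
from urllib.parse import quote, urlsplit

# Hypothetical registry: a urn: resolver has to be told this out of band.
URN_REGISTRY = "http://registry.example.org/resolve"


def lookup_location(identifier):
    """Return the URL a client would fetch to learn about the identifier."""
    scheme = urlsplit(identifier).scheme
    if scheme in ("http", "https"):
        # The identifier already names the authority to ask; DNS finds it.
        return identifier
    if scheme == "urn":
        # The identifier says nothing about where to ask, so fall back to
        # the hard-coded registry, which is itself reached over http.
        return f"{URN_REGISTRY}?urn={quote(identifier, safe='')}"
    raise ValueError(f"no lookup strategy for scheme {scheme!r}")
```

Only the urn: branch depends on configuration that lives outside the identifier.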

What about an id: scheme?
A main advantage of http URIs is the use of DNS to allow decentralized creation of vocabularies, but this does bear the cost that humans can be confused by the mixing of locations and identifiers. Another possibility is to create a scheme that does not have any protocol associated with it, and I thought seriously about doing this at one point. I did not proceed because it does not work: it does not address the issue of humans needing to understand context, and it does not allow the flexibility of providing a namespace document.

Imagine that I create a scheme called "id" that uses exactly the same syntax as the http scheme and specifically does not define a protocol. I can create "id://example.org/ns/foo" URIs. There is no confusion about the namespace being dereferenceable.

But what value is in this identifier? If one of these URIs shows up somewhere in a document, how will the human find out about the meaning? The human and machine dereferencing scenarios (#2 and #3) have been lost, for what benefit? In the case of the http: scheme, somebody might have dereferenced the URI and gotten a 404. Is that so bad? If they try to dereference an id: URI, they will get an error too. In either scheme, there's an error if nothing exists upon dereference. With id: that error is guaranteed for every resource, whereas the http: scheme allows some identifiers to be dereferenceable. This is a feature, not a bug.
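A small Python sketch makes the comparison concrete. The `id` scheme is the imagined one from this post, not a registered scheme, and `can_attempt_dereference` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def can_attempt_dereference(uri):
    """True only when the scheme names a protocol a generic client speaks."""
    return urlsplit(uri).scheme in ("http", "https")


http_ns = urlsplit("http://example.org/ns/foo")
id_ns = urlsplit("id://example.org/ns/foo")

# Both forms carry the same DNS authority and path, so both can be minted
# in a decentralized way. Only the http: form leaves the door open to
# finding a namespace document; the id: form can only ever fail.
same_authority = http_ns.netloc == id_ns.netloc
same_path = http_ns.path == id_ns.path
```

The two identifiers differ only in whether a lookup may be tried, which is exactly the discovery case the id: scheme gives up.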

This is why I didn't pursue registering an id: scheme: It does nothing to prevent errors from occurring, and it precludes some very important discovery cases.

The use of URIs for namespace names is just one example. Any use of the URI data type in an XML document has the same issues. A provider of a URI must specify in their language how the URI will be used: whether it is intended as an identifier, a location, or both. Using a urn instead of an http URI does not make the software's or the human's job any easier, and it precludes some important cases.

Last gasp: persistent identifiers past the Web
One final argument that I've heard for using urn: identifiers is to come up with an identifier scheme that will last beyond the existence of http. Imagine the Web gets replaced by Something Better(tm). All those http identifiers look pretty silly in the Brave New World(tm).

That certainly doesn't fly when most of the urn: schemes that allow dereferencing use HTTP for the protocol! So let's just focus on the non-dereferenceable identifiers. Well, these still work in the Brave New World(tm). If the http: identifier isn't dereferenceable now, it doesn't matter whether http exists in the new world, because the identifier isn't being dereferenced! So there it is: if it's not dereferenceable now then it will work fine post-Web, and if it's dereferenceable now then most of the urn: schemes are broken post-Web too, because they use HTTP.

And finally, I just have a really tough time throwing away simple solutions to problems that exist now for a dubious solution to problems that won't exist for decades. Does anybody really think that the Web is going to get seriously wiped out in 10 years?

Irony alert: creating urn: identifiers from qnames rather than http: identifiers
Rich Salz and I have written up a QName-to-URN conversion scheme. It says to create a URI by taking the name followed by the namespace name, i.e. urn:qname:name:foons. It leverages the decentralized aspect of http because the foons can be an http: URI, and it solves the problem of how to encode a QName into a URI and then back to a QName without the URI problem of "what's the separator between the name and the namespace name".

I think it is really ironic that I like using http: schemes for creating namespace names, but then I propose using a urn: identifier rather than an http: URI for creating URIs from QNames. Why is this so? The fundamental problem is that given an http: URI, you can't know where the namespace ends and the name begins. The XML Namespace folks never provided a canonical QName to URI mapping. I've thought up a bunch of different schemes for mapping QNames to URIs, and they all require prior agreement on how to turn the URI back into the QName.
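The ambiguity is easy to show with the most obvious mapping, plain concatenation. This is a sketch of the problem only, not of any mapping actually proposed; the namespace names and local names are invented.

```python
def naive_qname_to_uri(namespace, local_name):
    """Hypothetical mapping: append the local name to the namespace name."""
    return namespace + local_name


# Two different QNames...
uri_a = naive_qname_to_uri("http://example.org/ns/", "fooBar")
uri_b = naive_qname_to_uri("http://example.org/ns/foo", "Bar")

# ...produce the same URI, so the reverse mapping cannot be recovered
# without prior agreement on where the namespace name stops.
collision = uri_a == uri_b
```

Choosing a separator such as "#" or "/" does not help by itself, since namespace names may already contain or end with those characters.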

However, the QName URN scheme does allow for dereferencing http: namespace names because the namespace name portion would contain the http: portion. If I get a urn:qname:*:Reject:http://w3.org/2002/xkms# URI, I know that I can look at the 5th part (the namespace name) and possibly dereference it. So all is well in the world.
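A round trip through this form might be sketched as follows. The function names are hypothetical, and the third field is treated as an opaque placeholder (the "*" in the example), since its meaning is not spelled out in this post.

```python
def qname_to_urn(namespace, local_name, third_field="*"):
    """Build a urn:qname URI shaped like the example in the text."""
    if ":" in local_name or ":" in third_field:
        raise ValueError("name fields must not contain ':'")
    return f"urn:qname:{third_field}:{local_name}:{namespace}"


def urn_to_qname(urn):
    """Return (namespace, local_name) from a urn:qname URI."""
    # Split on the first four colons only: the 5th part is the namespace
    # name, which may itself contain colons (for example "http://...").
    parts = urn.split(":", 4)
    if len(parts) != 5 or parts[0] != "urn" or parts[1] != "qname":
        raise ValueError(f"not a urn:qname URI: {urn!r}")
    _, _, _third_field, local_name, namespace = parts
    return namespace, local_name


urn = qname_to_urn("http://w3.org/2002/xkms#", "Reject")
namespace, local_name = urn_to_qname(urn)
```

Because XML local names cannot contain a colon, putting the namespace name last gives an unambiguous split, and the recovered namespace name is still an http: URI that can be dereferenced.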

2 Comments

The DNS might be the best disambiguator we have at the moment, but it's pretty inadequate at times too... what if I create some software using http://glens.cool.domain and then I sell my domain name, or let it lapse? All of a sudden the identifiers aren't in any sense "mine" anymore except in that I was the one to invent them in the first place.... I don't disagree with your thesis here, but there's certainly no ideal solution.

Doctor, doctor, it hurts when I don't renew my domain name! 8-)

About this Entry

This page contains a single entry by Dave Orchard published on April 11, 2005 2:02 PM.
