We've been evangelizing a model of versioning called the "Must Ignore Unknown" rule for a while now, as described in our versioning article at http://www.xml.com/pub/a/2004/10/27/extend.html. Roughly, it means that any content the receiver doesn't know is ignored and, specifically, no error is generated. This works very well in the Web model because the browser ignores any extra markup, so the human reader never sees the extra content.
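To make the rule concrete, here is a minimal sketch of a receiver that follows it, written with Python's standard xml.etree.ElementTree module. The function name read_name and the set of "known" elements are assumptions for illustration; the namespace and element names come from the name example used later in this article.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openuri.org/name/1}"
KNOWN = {NS + "given", NS + "family"}  # what this (hypothetical) version understands

def read_name(xml_text):
    """Return the known fields of a name; unknown children are ignored, not rejected."""
    root = ET.fromstring(xml_text)
    fields = {}
    for child in root:
        if child.tag in KNOWN:
            fields[child.tag[len(NS):]] = child.text
        # Must Ignore Unknown: any other element is skipped silently, no error is raised
    return fields

doc = """<name xmlns="http://www.openuri.org/name/1">
<given>Dave</given><family>Orchard</family><middle>Bryce</middle>
</name>"""
print(read_name(doc))  # {'given': 'Dave', 'family': 'Orchard'}
```

The unknown middle element neither causes a failure nor appears in the result, which is exactly the "last piece of software" behavior a browser gives you.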
This works very well when the software doing the ignoring is the last piece of software to look at the data. In many applications, though, the software that receives an extension isn't the last piece. So what does it mean for it to ignore the extra content? Should it throw the content away? Should it keep the content without faulting? I'll call these two models "Ignore and Discard" and "Ignore but Retain".
The application designer must choose which of the Ignore models to implement, and each has pros and cons. The Discard model has the advantage that it may be simpler to implement, and it gives at least a simple versioning story.
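The difference between the two models is easiest to see in an intermediary that passes a name along to the next piece of software. The following is a sketch under assumptions: the function names ignore_and_discard and ignore_but_retain are hypothetical, and "forwarding" is reduced to returning the element that would be sent on.

```python
import copy
import xml.etree.ElementTree as ET

NS = "{http://www.openuri.org/name/1}"
KNOWN = {NS + "given", NS + "family"}  # what this version 1 intermediary understands

def ignore_and_discard(root):
    """Forward only the elements this version understands; unknown content is dropped."""
    out = ET.Element(root.tag, root.attrib)
    for child in root:
        if child.tag in KNOWN:
            out.append(copy.deepcopy(child))
    return out

def ignore_but_retain(root):
    """Forward the message unchanged; unknown content is not processed, but it is kept."""
    return copy.deepcopy(root)

doc = ('<name xmlns="http://www.openuri.org/name/1">'
       '<given>Dave</given><family>Orchard</family><middle>Bryce</middle></name>')
root = ET.fromstring(doc)
print([c.tag[len(NS):] for c in ignore_and_discard(root)])  # ['given', 'family']
print([c.tag[len(NS):] for c in ignore_but_retain(root)])   # ['given', 'family', 'middle']
```

Neither function raises an error on the middle name; the only difference is whether a downstream consumer that does understand middle names will ever get to see it.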
Let us explore the Retain model with our much-used name example. An XML application receives names containing given and family names. A client extends the name structure by adding a middle name. Whether to discard or retain depends on what the application does with the data. If it forwards the message, retaining may be as simple as keeping the extra content in the message.
One flavor of this extension adds the new element at the end, if the schema allows it:
<name xmlns="http://www.openuri.org/name/1">
<given>Dave</given>
<family>Orchard</family>
<middle>Bryce</middle>
</name>
Alternatively, the middle name can be placed inside the "extension element mechanism" outlined in the earlier article:
<name xmlns="http://www.openuri.org/name/1">
<given>Dave</given>
<family>Orchard</family>
<extension>
<middle>Bryce</middle>
</extension>
</name>
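A receiver that wants to retain the middle name first has to find it, and the two flavors put it in different places. This sketch, with a hypothetical split_name function, separates known fields from unknown elements for both layouts. It assumes the container is literally an extension element in the same namespace, as in the second example.

```python
import xml.etree.ElementTree as ET

NS = "{http://www.openuri.org/name/1}"
KNOWN = {NS + "given", NS + "family"}
EXTENSION = NS + "extension"  # container element from the second example

def split_name(xml_text):
    """Return (known fields, unknown elements) for either extension flavor."""
    root = ET.fromstring(xml_text)
    known, unknown = {}, []
    for child in root:
        if child.tag in KNOWN:
            known[child.tag[len(NS):]] = child.text
        elif child.tag == EXTENSION:
            unknown.extend(child)  # unwrap: keep what is inside <extension>
        else:
            unknown.append(child)  # flavor 1: unknown element appended at the end
    return known, unknown

appended = ('<name xmlns="http://www.openuri.org/name/1"><given>Dave</given>'
            '<family>Orchard</family><middle>Bryce</middle></name>')
wrapped = ('<name xmlns="http://www.openuri.org/name/1"><given>Dave</given>'
           '<family>Orchard</family><extension><middle>Bryce</middle></extension></name>')
for doc in (appended, wrapped):
    known, unknown = split_name(doc)
    print(known, [e.tag[len(NS):] for e in unknown])
# both lines print: {'given': 'Dave', 'family': 'Orchard'} ['middle']
```

The middle element inherits the default namespace in both documents, so the same tag comparison works whether or not it sits inside the container.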
If the application instead stores the data and may return it in response to later requests, there are some interesting designs available for retaining it. Imagine that the family and given names are columns in a table. The middle name extension could be stored in an extra Extension column in that table. That way, whenever the name is returned, the application can recreate a name containing the family, given and middle names.
Family    Given   Extension
Orchard   David   <middle>Bryce</middle>
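One way to realize the table above is sketched here with Python's built-in sqlite3 module standing in for the application's database. The table name names, its columns, and the functions store_name and load_name are assumptions for illustration. Each unknown element is stored with its own namespace declaration so that it can be reparsed on its own, which means the Extension column holds slightly more text than the table shows.

```python
import copy
import sqlite3
import xml.etree.ElementTree as ET

NS_URI = "http://www.openuri.org/name/1"
NS = "{" + NS_URI + "}"
KNOWN = {NS + "given", NS + "family"}
ET.register_namespace("", NS_URI)  # serialize with a default namespace instead of ns0:

def store_name(conn, xml_text):
    """Store known fields in their own columns and unknown elements in 'extension'."""
    root = ET.fromstring(xml_text)
    unknown = []
    for child in root:
        if child.tag not in KNOWN:
            child = copy.deepcopy(child)
            child.tail = None  # drop whitespace that followed the element
            unknown.append(ET.tostring(child, encoding="unicode"))
    conn.execute(
        "INSERT INTO names (family, given, extension) VALUES (?, ?, ?)",
        (root.findtext(NS + "family"), root.findtext(NS + "given"), "".join(unknown)),
    )

def load_name(conn, family):
    """Rebuild a name element, re-inserting retained extensions after the known fields."""
    given, extension = conn.execute(
        "SELECT given, extension FROM names WHERE family = ?", (family,)
    ).fetchone()
    name = ET.Element(NS + "name")
    ET.SubElement(name, NS + "given").text = given
    ET.SubElement(name, NS + "family").text = family
    # Each stored element carries its own xmlns, so a dummy wrapper is enough to reparse
    name.extend(list(ET.fromstring("<wrapper>" + extension + "</wrapper>")))
    return name

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (family TEXT, given TEXT, extension TEXT)")
store_name(conn, '<name xmlns="http://www.openuri.org/name/1">'
                 '<given>Dave</given><family>Orchard</family><middle>Bryce</middle></name>')
print(ET.tostring(load_name(conn, "Orchard"), encoding="unicode"))
```

Because anything that is not a known field is treated as an extension, a name that uses the extension element mechanism would have its whole extension element stored and restored in the same way.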
Retaining the data also lets the XML application evolve in further interesting ways. If a later version is updated to "understand" middle names, including adding a Middle column, then all the middle names can be moved out of the Extension column and into the Middle column.
Family    Given   Middle   Extension
Orchard   David   Bryce
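The migration step could look roughly like this, again with sqlite3 standing in for the real store. It assumes the Extension column holds serialized elements that each carry their own namespace declaration; the function name promote_middle and the sample row are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

NS_URI = "http://www.openuri.org/name/1"
NS = "{" + NS_URI + "}"
ET.register_namespace("", NS_URI)  # keep re-serialized leftovers free of ns0: prefixes

def promote_middle(conn):
    """Add a middle column and move <middle> out of the extension column into it."""
    conn.execute("ALTER TABLE names ADD COLUMN middle TEXT")
    rows = conn.execute("SELECT rowid, extension FROM names").fetchall()
    for rowid, extension in rows:
        wrapper = ET.fromstring("<wrapper>" + (extension or "") + "</wrapper>")
        middle = wrapper.find(NS + "middle")
        if middle is None:
            continue  # nothing to promote for this row
        wrapper.remove(middle)
        # Any other, still-unknown extensions stay in the extension column
        remaining = "".join(ET.tostring(c, encoding="unicode") for c in wrapper)
        conn.execute("UPDATE names SET middle = ?, extension = ? WHERE rowid = ?",
                     (middle.text, remaining, rowid))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names (family TEXT, given TEXT, extension TEXT)")
conn.execute("INSERT INTO names VALUES (?, ?, ?)",
             ("Orchard", "David", '<middle xmlns="http://www.openuri.org/name/1">Bryce</middle>'))
promote_middle(conn)
print(conn.execute("SELECT family, given, middle, extension FROM names").fetchall())
# [('Orchard', 'David', 'Bryce', '')]
```

Only the element the new version understands is moved; other retained extensions are written back untouched, so the next version can promote them the same way.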
There are a few tricky parts in this architecture. Multiple extensions are possible, and each of them needs to be stored. The order in which still-unknown content is returned in the XML might be important, so the application may need to retain some notion of the order of the extensions. This could be accomplished by recording an order position for each stored extension. Another option is to retain the entire document in raw form, such as:
Family    Given   Raw
Orchard   David   <given>David</given><family>Orchard</family><middle>Bryce</middle>
This model has obvious downsides, such as space, duplication and complexity, but it does safely preserve all extensions and their order.
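For comparison, the order-position approach mentioned above avoids keeping the whole raw document. This sketch gives each retained extension its own row with a position number; the table names, column names, functions, and the second extension element alias are all hypothetical, and the stored XML strings are assumed to come from a parsing step like the ones sketched earlier.

```python
import sqlite3

SCHEMA = """
CREATE TABLE names (id INTEGER PRIMARY KEY, family TEXT, given TEXT);
CREATE TABLE name_extensions (
    name_id  INTEGER REFERENCES names(id),
    position INTEGER,  -- order in which the unknown element appeared
    xml      TEXT      -- the serialized unknown element
);
"""

def store(conn, family, given, extensions):
    """Store one name plus any number of extensions, remembering their order."""
    cur = conn.execute("INSERT INTO names (family, given) VALUES (?, ?)", (family, given))
    name_id = cur.lastrowid
    conn.executemany("INSERT INTO name_extensions VALUES (?, ?, ?)",
                     [(name_id, pos, xml) for pos, xml in enumerate(extensions)])
    return name_id

def extensions_in_order(conn, name_id):
    """Return the retained extensions in their original document order."""
    rows = conn.execute("SELECT xml FROM name_extensions WHERE name_id = ? "
                        "ORDER BY position", (name_id,))
    return [xml for (xml,) in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
name_id = store(conn, "Orchard", "David", ["<middle>Bryce</middle>", "<alias>example</alias>"])
print(extensions_in_order(conn, name_id))
# ['<middle>Bryce</middle>', '<alias>example</alias>']
```

This trades the duplication of the raw form for an extra table and a join, while still letting the application return unknown content in the order it arrived.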
Conclusion
Language designers who build their systems for extensibility and versioning will usually adopt some flavor of the Must Ignore Unknown rule. This article has examined which flavor of ignoring to use and described a sample architecture that preserves unknown content.