Why putting extra structure in V1.0 is good


In this article, I'm going to argue that structural evolution is similar to data evolution, and that designers can design their interfaces to enable structural evolution. Designers should provide as much structure as possible in V1.0 in order to enable compatible evolution.

Over the past while, I've muttered a few things about extensibility and versioning:
- substitutability is the key to extensibility and versioning and needs to be in v1.0 of a format
- substitutability is the combination of allowing extension and how to substitute known things for the extensions if necessary
- ignoring unknown content is the simplest possible format substitution rule
- a lengthy list of extensibility techniques using XML Schema and some thoughts on what Schema 1.1 could do
- protocol substitutability is really hard

This starts the examination of structural evolution, now that I've taken a look at protocol and data evolution.

Another big change that often happens in interfaces is refactoring. In most of the cases that I've mentioned, additional information or content is added. My favourite is adding a last name or middle name. In the refactoring case, I have another favourite example. A name string is changed to become a name structure.

So let's imagine that V1 of an interface has a method that takes a name string, and in V2 it takes a name structure. In the scenario of "adding structure", as opposed to adding data, it ends up like some very campy movies: there's an easy way, and a hard way. The easy way for compatible evolution of structure is the forwards compatible case, where the older receiver gets the newer structure. The hard way is the backwards compatible case, where a newer receiver that wants the structure doesn't get it.

It so happens that additional "optional" structural information is similar to additional "optional" data. An older receiver can easily ignore newer content, and the rules for ignoring new structure are modestly straightforward. In comparison, a receiver that requires new content has a very difficult time in "magicking" up the missing content, and it has a similarly difficult time in magicking up the missing structure.

Taking a bit of a closer look at the name example, we find that it is fairly straightforward to map a name structure into a name string. Something like "%prefix %first %middle %last %suffix" would do as a transformation. Because the new structure has provided additional information, it is easy to "lose" that information when binding to a string.
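A minimal sketch of that structure-to-string transform might look like the following. The `Name` class and the `name_to_string` function are illustrative names of my own, not part of any real interface; the only thing taken from the text is the field order prefix, first, middle, last, suffix.

```python
from dataclasses import dataclass


@dataclass
class Name:
    """Hypothetical V2 name structure; field names are assumptions."""
    first: str
    last: str
    prefix: str = ""
    middle: str = ""
    suffix: str = ""


def name_to_string(name: Name) -> str:
    """Flatten the structure into a V1-style string.

    Follows the "%prefix %first %middle %last %suffix" pattern and
    skips empty parts so no doubled spaces appear.
    """
    parts = [name.prefix, name.first, name.middle, name.last, name.suffix]
    return " ".join(part for part in parts if part)
```

Note that the function throws away which part was which. That loss is exactly why going in this direction is easy.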

When a sender of the information moves from V1 to V2, it's fairly easy for a receiver to transform the V2 information to V1. There's a variety of techniques that could be used for specifying the transform, but let's just assume for simplicity's sake that the receiver will use an intermediary to do the transform. The intermediary gets the V2 message, applies the transform to convert it to a V1 message, then merrily sends it on its way.

Now it is possible, though I don't think very useful, to have built-in structural substitution rules, such as a "must ignore unknown structure" rule. The rule would be something like: if you expect a string and get a structure instead, replace all structure markup with spaces. So a name whose markup wraps "Dave" and "Orchard" in separate elements becomes "Dave Orchard". Like I said, while it is perhaps theoretically possible to come up with an extensible and optional structural extension, it seems dubiously useful.
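To make the "must ignore unknown structure" idea concrete, here is a rough sketch assuming the structure arrives as XML-like markup. The element names in the usage line are made up for illustration, and the regular expression is a deliberately crude stand-in for a real parser.

```python
import re


def flatten_markup(fragment: str) -> str:
    """Replace every tag with a space, then collapse runs of whitespace.

    This is the "replace all structure markup with spaces" rule, plus a
    whitespace clean-up step that I am assuming a receiver would want.
    """
    text = re.sub(r"<[^>]*>", " ", fragment)
    return " ".join(text.split())


# Hypothetical element names, chosen only for this example.
flattened = flatten_markup("<name><first>Dave</first><last>Orchard</last></name>")
```

The sketch works for this simple case, but it has no idea what order the parts should appear in or which elements should be dropped entirely, which is part of why such a rule seems dubiously useful.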

Doing the conversion of less structure to more is extremely difficult. Trying to extract the name structure from a string is insanely complicated. Companies that do this, like credit card companies, can have over 10,000 rules for extracting name structures. Seriously. "David Orchard" is easy, "David Van Orchard" is harder, "Baron Jim-David Bryce Von Orchard-Warner III" is even harder.
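To see how quickly the reverse direction goes wrong, consider a deliberately naive parser. This is a strawman of my own, not anyone's real algorithm: it treats the first token as the first name, the last token as the last name, and everything in between as the middle name.

```python
def naive_parse(full_name: str) -> dict:
    """Strawman string-to-structure rule: first token, last token, rest."""
    tokens = full_name.split()
    return {
        "first": tokens[0],
        "middle": " ".join(tokens[1:-1]),
        "last": tokens[-1],
    }


easy = naive_parse("David Orchard")
harder = naive_parse("David Van Orchard")
hardest = naive_parse("Baron Jim-David Bryce Von Orchard-Warner III")
```

The first case comes out right. In the second, "Van" lands in the middle name when it may well belong to the last name. In the third, the title "Baron" is taken as the first name and the suffix "III" as the last name. Each of those mistakes needs another rule, and that is how you end up with thousands of them.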

Not only are names hard, so are addresses and phone numbers. We could try to standardize on these structures to help. UBL is such an effort, but I don't see your average system moving to standardized name, address, phone numbers any time soon.

When a receiver moves from V1 to V2, it can be very hard for any kind of processing to enrich the structural information.

A solution that I happen to like a lot for moving from V1 to V2 is to convert the structure change into an additional data change. Then we can re-use all of our mechanisms for doing type substitution, like ignoring unknown content.

The solution is to require that V2 support both structures by making the string required and the structure optional. A V2 sender is required to send both the string and the structure. A V2 receiver will ignore the string and use the structure, and a V1 receiver will ignore the structure and use the string. We've already shown that the transform from structure to string is easy.
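Here is one way the both-representations approach could be sketched, using plain dictionaries as messages. The field names `name` and `nameStructure`, and all three function names, are assumptions for illustration. The fallback in the V2 receiver for messages from V1 senders is also my own assumption about what such a receiver might do.

```python
NAME_PARTS = ("prefix", "first", "middle", "last", "suffix")


def v2_send(structure: dict) -> dict:
    """A V2 sender emits both the required string and the structure."""
    parts = [structure.get(key, "") for key in NAME_PARTS]
    return {
        "name": " ".join(part for part in parts if part),
        "nameStructure": structure,
    }


def v1_receive(message: dict) -> str:
    """A V1 receiver reads only "name" and ignores unknown content."""
    return message["name"]


def v2_receive(message: dict) -> dict:
    """A V2 receiver prefers the structure and ignores the string.

    If the structure is missing (a V1 sender), this sketch keeps the raw
    string unparsed rather than guessing at its parts.
    """
    structure = message.get("nameStructure")
    if structure is not None:
        return structure
    return {"unparsed": message["name"]}


message = v2_send({"first": "Dave", "last": "Orchard"})
```

The same message satisfies both generations of receiver, and neither needs a transform. The cost is that the sender carries the information twice.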

The lesson to learn is that your V1 interface is more likely to evolve gracefully if you design in more structure. A V1 interface with a name structure can easily be "downgraded" to a string, but the converse is not true. The more structure you can put into V1, the more evolvable your interface is. Instead of asking users for a "name", ask them for the name structure. Let users put the structure in for you in V1.0.

And in case you can't get the structure into v1.0, you can convert the structural change into a data extensibility problem if and only if you've followed the regularly described rules for format evolution, such as Must Ignore Unknowns.

Obviously, there are caveats. If the data source to the system can't provide the structure, then you're kind of stuck. It may be difficult when integrating systems to require the structure in v1.0, and fair enough. I'm simply suggesting that, if you can get away with it, more structure is better than less, and oftentimes it's fairly simple to get the extra structure.

Because structural substitution is harder than data substitution, it's difficult to evolve a format that adds structure without affecting all the parties. The best way forward is to provide more structure rather than less in the first version of the interface. And make sure to plan for extensibility and versioning in case you need to add structure later on.


About this Entry

This page contains a single entry by Dave Orchard published on June 14, 2004 2:39 PM.
