I've spent a lot of time over the past few years thinking about the Web, Web services and distributed object technology: how they differ, and what the critical success factors are. Our industry talks a lot about the various reasons, with the (I guess) expected amount of hype: "Web services are great because they're new and improved. For you, they can even be red!".
But how are the Web, Web services and distributed objects really different? What are the specific architectural differences? And I'm not talking about the "Web services are about coarse-grained components and distributed objects are about fine-grained objects" kind of ambiguity.
One way that I think the Web and Web services differ from distributed objects is that they make the data format on the wire (HTML, XML, ...), the object references (URIs), the protocol messages (HTTP) and the description human-readable. Well, maybe WSDL isn't the most human-readable. Distributed objects, on the other hand, make the description (IDL) human-readable and the wire format, object references and protocol messages binary. Now that alone might be a sufficient reason. There are plenty of others, such as the late binding of the information content to the user, and the ability to achieve high performance because it is easy to inspect an HTTP message's method and URI to determine whether two requests are equivalent.
But I think there's another big and untouted reason: extensibility. If you compare the Web to distributed objects, the invocation mechanics are quite different. In distributed objects, you might say something like:
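To make the inspection point concrete, here is a minimal Python sketch of how an intermediary could derive a cache key straight from a raw HTTP request. It assumes only GET responses are cacheable and treats the method, the Host header and the request target as the whole key; real caches also consult headers such as Vary and Cache-Control, which this sketch deliberately ignores. The function name cache_key is hypothetical.

```python
# Sketch: because the HTTP request line and headers are plain text, an
# intermediary can tell whether two requests are equivalent by reading
# the method and the URI (request target plus Host header).


def cache_key(raw_request: bytes):
    """Return a hypothetical cache key for a GET request, or None."""
    head = raw_request.split(b"\r\n\r\n", 1)[0].decode("ascii")
    request_line, *header_lines = head.split("\r\n")
    method, target, _version = request_line.split(" ", 2)
    if method != "GET":  # assumption: only GET responses are cached here
        return None
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Vary, Cache-Control, etc. are deliberately ignored in this sketch.
    return (method, headers.get("host", ""), target)


a = b"GET /po/42 HTTP/1.1\r\nHost: example.org\r\nAccept: text/html\r\n\r\n"
b = b"GET /po/42 HTTP/1.1\r\nHost: example.org\r\nUser-Agent: demo\r\n\r\n"
print(cache_key(a) == cache_key(b))  # True
```

In this sketch, two requests that differ only in headers like Accept or User-Agent map to the same key, a decision that is much harder to make when the method and the object reference are buried in a binary encoding.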
public interface PO {
  void getMyPo(in int poId, out PO purchaseOrder);
}
whereas HTTP might be modeled somewhat like this:
public interface HTTP {
  void get(in URI address, out MIME body);
}
Now let's focus for just a minute on the HTML side of the house, where much of the innovation happened, so we'll slightly modify the interface:
public interface HTTP {
  void get(in URI address, out HTML body);
}
There are two very different aspects to each of these interfaces. The PO interface can be extended with new and arbitrary methods, and the PO class can also be extended. Both are arbitrarily extensible, that is, specific to the interface, as opposed to generic or uniform. HTTP, SQL, etc. are called uniform because the application programmer cannot vary the methods. HTTP verbs can be extended only with an extraordinary amount of work, because that happens through a standards process. The same is true of official HTML.
You would think that, given the arbitrary extensibility of a distributed-object-style interface, this style would have taken off. But there is something not expressed in the interface that is incredibly important, and it's perhaps one of the biggest reasons why systems that focus on contracts rather than APIs tend to win. Whereas the HTTP interface is constrained to a specific number of verbs, the content is extensible: you can put XML, MIME, etc. in the content. Looking at the HTML case, there's a critical piece of information that is in the specification.
This is the "Must Ignore" rule. HTML, HTTP headers and even much of the URI spec have a rule that any unknown content must be ignored. So if any content appears, in any place, and the receiver doesn't know about it, the receiver can validate the instance as if the unknown content were "projected" out of it. This rule is specified in the HTML specification but is not expressed in the schema/DTD. In fact, I believe that if HTML had not been able to express the "Must Ignore" rule outside of the schema, HTML probably wouldn't have allowed nearly as much extensibility.
This allowed HTML to evolve enormously without affecting the HTTP API. The two were orthogonal: the formats and the verbs could evolve separately.
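Taking the "projected out" description literally, here is a minimal Python sketch using the standard library's ElementTree: unknown elements are removed first, and what remains is checked against a strict version-1 content model. The element names and the KNOWN_CHILDREN and REQUIRED_CHILDREN tables are hypothetical, and this illustrates the rule only; it is not a model of any real HTML user agent or validator.

```python
import xml.etree.ElementTree as ET

# Hypothetical version-1 purchase-order vocabulary (illustrative names only).
KNOWN_CHILDREN = {"po": {"id", "item", "total"}, "item": {"sku", "qty"}}
REQUIRED_CHILDREN = {"po": ["id", "item"], "item": ["sku", "qty"]}


def project_out_unknown(elem):
    """Must Ignore: recursively remove child elements this receiver doesn't know."""
    known = KNOWN_CHILDREN.get(elem.tag, set())
    for child in list(elem):  # iterate over a copy because we mutate elem
        if child.tag in known:
            project_out_unknown(child)
        else:
            elem.remove(child)
    return elem


def validate_v1(elem):
    """Strict version-1 check: unknown children or missing required ones fail."""
    known = KNOWN_CHILDREN.get(elem.tag, set())
    tags = [child.tag for child in elem]
    if any(tag not in known for tag in tags):
        return False
    if any(name not in tags for name in REQUIRED_CHILDREN.get(elem.tag, [])):
        return False
    return all(validate_v1(child) for child in elem)


# A "version 2" instance with extensions the version-1 receiver has never seen.
v2 = ("<po><id>42</id><currency>EUR</currency>"
      "<item><sku>A-1</sku><qty>3</qty><giftWrap/></item></po>")
print(validate_v1(ET.fromstring(v2)))                       # False
print(validate_v1(project_out_unknown(ET.fromstring(v2))))  # True
```

The same instance fails strict validation as-is and passes once projected, which is the sense in which the receiver validates "as if" the extensions were never there.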
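A toy Python sketch of that orthogonality: the verb set is fixed, while new formats and richer content arrive simply as new content. The in-memory resources table, the handle function and the URIs are hypothetical, and nothing here models real HTTP beyond the idea of a uniform GET.

```python
# Toy uniform interface: the verb set is fixed, the content is not.
VERBS = frozenset({"GET"})  # application code cannot add verbs here

# Hypothetical resources: URI -> (media type, body).
resources = {
    "/po/42": ("text/html", "<html><body><p>PO 42</p></body></html>"),
}


def handle(verb, uri):
    """Dispatch a request through the one fixed verb."""
    if verb not in VERBS:
        return (405, "text/plain", "method not allowed")
    if uri not in resources:
        return (404, "text/plain", "not found")
    media_type, body = resources[uri]
    return (200, media_type, body)


# New formats and richer content evolve without touching handle():
resources["/po/42.xml"] = ("application/xml", "<po><id>42</id></po>")
resources["/po/42"] = (
    "text/html",
    '<html><body><p>PO 42</p><img src="po.png"></body></html>',
)
```

Adding the XML representation and the img element changed only the data; the dispatch code and its single verb stayed exactly as they were.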
Distributed object systems made a critical decision: any kind of extension required both sides to understand the extended interface. This is the fallacy of the "single administrator". Much has been made of the fragility of distributed object systems, and I'm convinced that this lack of "touchless" extensibility was a key contributor to their lack of uptake and to the triumph of the Web.
Now imagine distributed object and Web worlds in which the rules were reversed. In distributed object systems, you could insert any kind of content you wanted in the PO class, and if the receiver didn't know about it, it would simply ignore it. Distributed object systems would be far more resilient, with fewer of the exploding interfaces with a gajillion different methods. I think that distributed object technology would have had wider appeal if parameter extensibility, and perhaps even method extensibility, had been supported in a manner that didn't require both sides to change simultaneously.
And it's hard to imagine that HTML would have evolved so quickly and successfully if the Must Ignore rule had not been built in, because without it HTML could not have evolved outside of the standards committee. Images, forms, CSS, etc. were all innovated after the first version of HTML.
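A minimal Python sketch of the two policies, using a dataclass to stand in for the receiver's version-1 PO type. The names PurchaseOrder, unmarshal_strict and unmarshal_tolerant are hypothetical, and this is not how CORBA or RMI actually marshal parameters; it only contrasts "both sides must know the extension" with "ignore what you don't know".

```python
from dataclasses import dataclass, fields


@dataclass
class PurchaseOrder:
    """Hypothetical version-1 PO type as the receiver knows it."""
    po_id: int
    total: float


def unmarshal_strict(data: dict) -> PurchaseOrder:
    """Distributed-object style: an unknown field is an error (TypeError)."""
    return PurchaseOrder(**data)


def unmarshal_tolerant(data: dict) -> PurchaseOrder:
    """Must Ignore style: silently drop fields the receiver doesn't know."""
    known = {f.name for f in fields(PurchaseOrder)}
    return PurchaseOrder(**{k: v for k, v in data.items() if k in known})


# A sender that has already moved to "version 2" and added a field.
v2 = {"po_id": 42, "total": 99.5, "currency": "EUR"}
print(unmarshal_tolerant(v2))  # PurchaseOrder(po_id=42, total=99.5)
try:
    unmarshal_strict(v2)
except TypeError as err:
    print("strict receiver rejected it:", err)
```

The tolerant version still fails if a field the receiver requires is missing; Must Ignore relaxes only what happens to unknown content.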
How about Web services? It appears that most authors effectively make the same decision as distributed objects when they design their interfaces. They recreate the "getPO" distributed object method in a SOAP message without allowing extensibility in the PO. Any time they need to extend the PO, they have to extend the schema and roll out a new version. Now, they could extend the PO using XML Schema's extensibility mechanism.
XML Schema made the decision that validating any extension requires the updated schema to be on both sides, and further, that all notions of extensibility compliance are expressible in the schema language. This is awfully close to the distributed object decisions.
However, they aren't quite the same. XML Schema provides a wildcard element, <xs:any>, which allows elements from constrained namespaces to appear in an instance. If the PO schema provides wildcards, then PO authors can use them for extensibility. They won't get the arbitrary extensibility that HTML had, and I think this is quite a problem. There's a common pattern of allowing elements from namespaces other than the target namespace. This provides at least some extensibility, but it has some issues that I detail in "Examining wildcards for versioning". In an article on xml.com called "Versioning XML Languages", I argue that authors can use <xs:any> in particular ways to get touchless extensibility with full validation, but this still requires the developer to actively specify extensibility points.
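Here is a rough Python sketch of what the "other namespaces" pattern permits, as if the PO schema author had placed <xs:any namespace="##other" processContents="skip"/> in the PO content model. It is an emulation, not a schema validator: it ignores element order and occurrence constraints, and the namespace URIs and element names are hypothetical. It assumes XML Schema 1.0 semantics, where ##other matches neither the target namespace nor unqualified elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical target namespace and version-1 elements of the PO language.
PO_NS = "http://example.org/po"
KNOWN = {f"{{{PO_NS}}}id", f"{{{PO_NS}}}total"}


def namespace_of(tag):
    """Return the namespace URI of an ElementTree tag, or '' if unqualified."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def allowed_by_other_wildcard(po):
    """Mimic <xs:any namespace="##other" processContents="skip"/>: known PO
    elements pass, elements from any other namespace pass unchecked, and
    anything else is rejected."""
    for child in po:
        if child.tag in KNOWN:
            continue
        ns = namespace_of(child.tag)
        if ns == PO_NS or ns == "":
            # Unknown element in the target namespace (or unqualified):
            # ##other does not match it, so the instance is invalid.
            return False
    return True


ext = ET.fromstring(
    '<po xmlns="http://example.org/po" xmlns:x="http://example.org/ext">'
    "<id>42</id><total>10</total><x:giftWrap/></po>")
same_ns = ET.fromstring(
    '<po xmlns="http://example.org/po"><id>42</id><currency>EUR</currency></po>')
print(allowed_by_other_wildcard(ext), allowed_by_other_wildcard(same_ns))  # True False
```

The extension in a different namespace is accepted, while a currency element added to the PO's own namespace is rejected, so this pattern on its own does not let the owner of the PO language add elements to its own namespace without a schema change on both sides.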
There is another significant difference between the Web and XML Schema extensibility technology: active versus passive specification of extensibility. XML Schema requires the author to actively insert something in the schema document to predict where extensibility needs to occur for touchless evolution, whereas much of the Web technology has extensibility passively built into the system as the default. Perhaps the ideal solution would be to unify the "Must Ignore" rule with our validation logic in a passive manner. Then we could build the full Must Ignore functionality into the infrastructure rather than requiring each data format and its corresponding code to specify where ignorable elements are and are not allowed. And learning the lesson of HTML, the mechanism for specifying "Must Ignore" may need to be expressed outside of the schema language.
Now some folks (the RESTafarians) argue that the very fact that Web services lack constrained verbs means they are doomed to the dustbin of history. I think that the difficulty of providing touchless extensibility is harming the ability to deploy loosely coupled applications, but there are enough techniques to provide sufficient extensibility for loosely coupled Web services. This does require explicit action on the part of the interface designer. I do think that the community should provide an easier model for creating and validating extensible XML languages.
XML + "touchless extensibility" = RDF
See http://www.markbaker.ca/2002/09/Blog/2003/10/10#2003-10-rdf-and-xml
Although you clearly make very valid comments on the extensibility of interfaces (something that is critical in a successful deployment of loosely coupled components), I think the title of your article is confusing and illogical. You could make a case for a title like "Web Services = or != CORBA, IDL, etc.", but not distributed objects. Web services, CORBA and DCE are forms of distributed computing. Forget the word objects: in the strictest sense of the OO definition, an object is an instantiation of a class. Web services are components. I know this is a 'religious' topic, but I hope you see my point.