As I've thought through the awesome deployment of web software and its relationship to extensibility and versioning, I keep coming up with more instances of extensibility in Web specs. I've been involved in discussions in a number of groups about the right way to do extensibility and versioning. I hope that a fairly exhaustive list of the places where the Web architecture pairs extensibility with an Ignore rule will provide a canonical reference point for ongoing analysis.
We listed a few of these in the Web Architecture document section on general principles on Extensibility, so I'll start with that and expand.
HTML
HTML 2.0, RFC 1866, contains the following text in Section 4.2.1, Undeclared Markup Handling:
To facilitate experimentation and interoperability between
implementations of various versions of HTML, the installed base of
HTML user agents supports a superset of the HTML 2.0 language by
reducing it to HTML 2.0: markup in the form of a start-tag or end-
tag, whose generic identifier is not declared is mapped to nothing
during tokenization. Undeclared attributes are treated similarly. The
entire attribute specification of an unknown attribute (i.e., the
unknown attribute and its value, if any) should be ignored.
Thus HTML elements follow the Must Ignore rule, and attributes follow the Should Ignore rule.
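To make the rule concrete, here is a minimal sketch in Python of a user agent that maps undeclared markup to nothing while keeping the content inside it. The vocabulary in KNOWN_ELEMENTS and the function name reduce_to_known are assumptions made up for illustration; they are not taken from RFC 1866.

```python
import html
from html.parser import HTMLParser

# Hypothetical, deliberately tiny vocabulary standing in for "HTML 2.0":
# element name -> set of declared attribute names.
KNOWN_ELEMENTS = {"p": {"align"}, "a": {"href", "name"}, "em": set()}


class IgnoringParser(HTMLParser):
    """Rebuilds the input, dropping undeclared tags and attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in KNOWN_ELEMENTS:
            return  # unknown start-tag is ignored; its content is still kept
        rendered = ""
        for name, value in attrs:
            if name not in KNOWN_ELEMENTS[tag]:
                continue  # unknown attribute and its value are ignored
            if value is None:
                rendered += f" {name}"
            else:
                rendered += f' {name}="{html.escape(value, quote=True)}"'
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in KNOWN_ELEMENTS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def reduce_to_known(markup: str) -> str:
    parser = IgnoringParser()
    parser.feed(markup)
    parser.close()  # flushes any buffered text
    return "".join(parser.out)
```

Feeding it a paragraph that uses an unknown blink element and an unknown glow attribute yields the same paragraph with both stripped and the text left intact.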
HTTP
HTTP, RFC 2616, has a number of extensibility points that follow the ignore rule.
HTTP Response and Request headers
Section 5.3 says:
However, new or
experimental header fields MAY be given the semantics of request-
header fields if all parties in the communication recognize them to
be request-header fields. Unrecognized header fields are treated as
entity-header fields.
and Section 6.2 says:
However, new or
experimental header fields MAY be given the semantics of response-
header fields if all parties in the communication recognize them to
be response-header fields. Unrecognized header fields are treated as
entity-header fields.
Here we see the Must Ignore rule at work: unrecognized header fields are treated as entity-header fields, which, as we shall see shortly, follow the Should Ignore rule.
HTTP Entity headers
Section 7.1 says:
The extension-header mechanism allows additional entity-header fields
to be defined without changing the protocol, but these fields cannot
be assumed to be recognizable by the recipient. Unrecognized header
fields SHOULD be ignored by the recipient and MUST be forwarded by
transparent proxies.
Here we see the Should Ignore rule, plus additional constraints on intermediaries.
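The difference between an end recipient and an intermediary can be sketched in a few lines of Python. The KNOWN_HEADERS set and both function names are assumptions for illustration, and the proxy side is simplified: a real proxy also has to deal with hop-by-hop fields, which this sketch leaves out.

```python
# Illustrative subset of header fields this hypothetical recipient understands.
KNOWN_HEADERS = {"content-type", "content-length", "host"}


def interpret_headers(headers):
    """End recipient: act on recognized fields and ignore the rest."""
    recognized = {}
    for name, value in headers:
        key = name.lower()  # header field names are case-insensitive
        if key in KNOWN_HEADERS:
            recognized[key] = value
    return recognized


def forward_headers(headers):
    """Transparent proxy: pass every field along, recognized or not."""
    return list(headers)
```

Given a message carrying an experimental X-Experimental field, interpret_headers drops it silently while forward_headers passes it through unchanged.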
HTTP Status codes
Section 6.1.1 says:
HTTP status codes are extensible. HTTP applications are not required
to understand the meaning of all registered status codes, though such
understanding is obviously desirable. However, applications MUST
understand the class of any status code, as indicated by the first
digit, and treat any unrecognized response as being equivalent to the
x00 status code of that class, with the exception that an
unrecognized response MUST NOT be cached.
HTTP status codes carry a Must Understand on the class, given by the first digit. Because an unknown code is cast to the x00 code of its class, this is effectively the Must Ignore rule for the remaining digits.
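A small Python sketch of that casting, assuming a client with an illustrative KNOWN_STATUS set (the set and the name effective_status are hypothetical):

```python
from typing import Tuple

# Illustrative set of status codes this hypothetical client understands.
KNOWN_STATUS = {200, 201, 204, 301, 302, 304, 400, 401, 403, 404, 500, 503}


def effective_status(code: int) -> Tuple[int, bool]:
    """Return (status to act on, whether the code itself was recognized).

    The class (first digit) must be understood, so anything outside
    1xx-5xx is an error. An unrecognized code within a known class is
    treated as x00, and the caller must not cache that response.
    """
    if not 100 <= code <= 599:
        raise ValueError(f"unknown status class: {code}")
    if code in KNOWN_STATUS:
        return code, True
    return (code // 100) * 100, False
```

For example, an unrecognized 499 is handled as a 400, and the False flag tells the caller not to cache the response.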
HTTP Chunked encoding
Section 3.6.1 says:
All HTTP/1.1 applications MUST be able to receive and decode the
"chunked" transfer-coding, and MUST ignore chunk-extension extensions
they do not understand.
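As a rough illustration, a decoder that understands no chunk-extensions at all can simply discard everything after the first semicolon on the chunk-size line. The sketch below is in Python, works on a complete in-memory body, and does not handle trailers; both function names are made up for this example.

```python
def parse_chunk_size_line(line: bytes) -> int:
    """Return the chunk size, ignoring any chunk-extensions after ';'."""
    size_part, _, _extensions = line.rstrip(b"\r\n").partition(b";")
    return int(size_part.strip().decode("ascii"), 16)  # size is hexadecimal


def decode_chunked(body: bytes) -> bytes:
    """Decode a complete chunked body held in memory (trailers not handled)."""
    out = bytearray()
    pos = 0
    while True:
        eol = body.index(b"\r\n", pos)
        size = parse_chunk_size_line(body[pos:eol])
        pos = eol + 2
        if size == 0:
            return bytes(out)  # last-chunk reached
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

A body whose first chunk carries an unknown extension such as ext=1 decodes to exactly the same bytes as one without it.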
Cache Control Extensions
Section 14.9.6 says:
This extension mechanism depends on an HTTP cache obeying all of the
cache-control directives defined for its native HTTP-version, obeying
certain extensions, and ignoring all directives that it does not
understand.
and
Unrecognized cache-directives MUST be ignored;
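In code, a cache that follows this rule might parse the header along these lines. This is a Python sketch under stated assumptions: KNOWN_DIRECTIVES is an illustrative subset, parse_cache_control is a hypothetical name, and the naive split on commas assumes no quoted value contains a comma.

```python
# Illustrative subset of directives this hypothetical cache understands.
KNOWN_DIRECTIVES = {
    "no-cache", "no-store", "max-age", "private", "public", "must-revalidate",
}


def parse_cache_control(value: str) -> dict:
    """Map each understood directive to its argument (or None).

    Directives the cache does not understand are ignored rather than
    treated as an error.
    """
    result = {}
    for item in value.split(","):
        name, _, arg = item.strip().partition("=")
        name = name.strip().lower()
        if name in KNOWN_DIRECTIVES:
            result[name] = arg.strip().strip('"') or None
    return result
```

A header containing an extension directive such as community="UCI" is parsed as if that directive were not there, so the cache falls back to the standard directives next to it.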
Finishing HTTP
It's pretty darned obvious where SOAP headers, mustUnderstand, and Actor/Role came from if you think about HTTP extensibility.
URIs
The URI specification is designed for extensibility. It has a number of major sections that are extensible: schemes, domain, path, query, fragment identifiers. However, the "ignore" rule doesn't really apply. For all the sections excluding frag-ids the mustUnderstand rule is in effect. A browser or software that doesn't understand a scheme or domain of a URI will generate an error. And a server or other software that doesn't understand a path or query will generate an error.
However, frag-ids, which are interpreted by the client and not sent to the origin server, have typically been implemented with the Ignore rule. A browser that retrieves a resource but does not understand the frag-id will ignore the frag-id. A representation that doesn't have an element with an attribute matching the frag-id (e.g. the name attribute in HTML) is still rendered.
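The client-side behaviour can be sketched as follows in Python. urldefrag is the standard-library helper for splitting off the fragment; split_for_request, scroll_target, and the idea of a set of anchor names collected from the document are assumptions for illustration.

```python
from urllib.parse import urldefrag


def split_for_request(uri: str):
    """Separate what is sent to the server from what stays on the client."""
    url, fragment = urldefrag(uri)
    return url, fragment


def scroll_target(fragment: str, anchors: set):
    """Return the anchor to scroll to, or None to render from the top.

    A fragment that matches nothing is ignored; it is not an error.
    """
    return fragment if fragment in anchors else None
```

With anchors sec1 and sec2 in the document, a request for #sec9 still retrieves and renders the page; the client just has nowhere particular to scroll.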
CSS
CSS planned for forwards compatibility and applied the Must Ignore rule at each of its extensibility points. The CSS Level 1 section on Forward Compatibility applies the ignore rule to a number of extensibility points:
Properties: a declaration with an unknown property is ignored.
Illegal values: illegal values, or values with illegal parts, are treated as if the declaration weren't there at all.
At-keywords: an invalid at-keyword is ignored together with everything following it, up to and including the next semicolon (;) or brace pair ({...}), whichever comes first.
There are more explanations and cases in the section, all of which apply the Must Ignore rule.
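The first two cases, unknown properties and illegal values, can be sketched in Python as below. The two-property table and its validators are assumptions made up for this example and cover only a sliver of CSS; at-keywords are not handled.

```python
# Hypothetical property table: name -> function that accepts legal values.
KNOWN_PROPERTIES = {
    "color": lambda v: v in {"red", "green", "blue", "black"},
    "font-size": lambda v: v.endswith("pt") and v[:-2].isdigit(),
}


def parse_declarations(block: str) -> dict:
    """Parse 'prop: value; ...' and keep only what is understood."""
    result = {}
    for declaration in block.split(";"):
        prop, sep, value = declaration.partition(":")
        prop, value = prop.strip().lower(), value.strip()
        validator = KNOWN_PROPERTIES.get(prop)
        # Unknown property or illegal value: behave as if the
        # declaration weren't there at all.
        if sep and validator is not None and validator(value):
            result[prop] = value
    return result
```

Given "color: red; font-vendor: any; font-size: huge", only the color declaration survives; the other two are dropped without raising an error.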
Others
Many other examples exist, such as XSLT and SOAP.
Conclusion
The Ignore rule, in its two variants of SHOULD and MUST, is the key rule that has enabled forwards compatibility: it lets older software deal with new extensions it doesn't know about.
You'll probably want to remove this comment since this doesn't apply to web architecture, just a related note.
I can't verify it for sure since it pre-dates my time doing computer-related work, but a friend of mine informed me yesterday that the old Microsoft DOS API was designed such that if a function with an unknown name was called, DOS would return the success code of 0.
So it looks like they were employing the Ignore rule over at Microsoft back in the olden days.
Regarding URI
"And a server or other software that doesn't understand a path or query will generate an error."
The URI definitions are orthogonal to an operation performed by a server. For example, with HTTP, the URI in a PUT operation could be generically understood by a server: it maps the URI to a bag of bytes, no interpretation necessary. So some operations will succeed even with a never-before-seen URI.
Of course, syntactic problems probably would be 'must understand'.
David Bau has written some very coherent articles on this issue, including a mathematical formalism for discussing compatibility. See http://davidbau.com/
What might be interesting is a compare and contrast with systems that opt for partial understanding rather than ignore of unknown entities, e.g. RDF.