Validation by Projection implementations
I gave a short introduction to Validation by Projection
WSDL 2.0
In March 2004, the WSDL 2.0 and Schema 1.1 Working Groups first discussed the idea of “pruning” extra elements from an XML document if they didn’t match an element declaration in the schema (http://lists.w3.org/Archives/Public/www-ws-desc/2004Mar/0038.html). Paul Biron first proposed the idea of stripping away the nodes that, after validation is attempted, are valid but unknown. The WSDL 2.0 Working Group ultimately decided not to include validation by projection in WSDL 2.0.
Validate Twice with Surgery (V2S)
Henry Thompson presented an implementation of validation by projection called Validate Twice with Surgery (V2S) at XML Europe 2004 (http://www.markuptechnology.com/XMLEu2004/). V2S is available online (http://www.markup.co.uk/showcase/V2S-pipe.html). V2S validates the document, eliminates any elements with the PSVI property of ‘notKnown’, and then validates the document again.
XML Beans
XML Beans supports validation by projection, though this is not documented. XML Beans generates Java classes for the schema types. Any unknown elements in an XML document are ignored from the Java perspective, and the document is considered valid. The unknown elements can still be accessed using the XML Beans cursor methods.
Custom XSLT
XSLT can be used to transform a document into a projection. A very simple example is an XSLT that matches any personName elements and their family and given children and copies them to the output. The XSLT could use the definitions from XML Schema, or the elements to match could be embedded in the XSLT. The first approach is more complicated to write, but once built it can handle arbitrary schemas without recoding. The second is simple to write but requires recoding for every schema change.
UBL
UBL uses XSLT for Validation by Projection. There are XSLT stylesheets defined for each version of UBL. When a UBL document is received, validity checking is performed against the version of the schema that the UBL receiver has. If the document fails validity checking, the XSLT stylesheet for the receiver's version of UBL is applied. This creates a projection from which any unknown elements have been removed. Then validity checking is performed again.
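The validate, project, validate-again flow described above can be sketched in a few lines. This is only an illustration in Python, not the UBL stylesheets themselves: the element names, KNOWN_CHILDREN, toy_is_valid and toy_project are all hypothetical stand-ins for a real schema validator and a real projection stylesheet.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical stand-in for "the schema version the receiver has":
# the child element names this receiver knows about.
KNOWN_CHILDREN = {"ID", "IssueDate"}

def toy_is_valid(doc):
    """Toy validity check: every child must be a known element."""
    return all(child.tag in KNOWN_CHILDREN for child in doc)

def toy_project(doc):
    """Return a copy of doc that keeps only the known children."""
    projected = ET.Element(doc.tag, doc.attrib)
    for child in doc:
        if child.tag in KNOWN_CHILDREN:
            projected.append(copy.deepcopy(child))
    return projected

def validate_by_projection(doc, is_valid, project):
    """Validate; if that fails, project and validate once more.

    Returns (valid, document_that_was_checked).
    """
    if is_valid(doc):
        return True, doc
    projected = project(doc)
    return is_valid(projected), projected

# A document from a hypothetical newer version, with one unknown element.
received = ET.fromstring(
    "<Invoice><ID>42</ID><NewerField>x</NewerField></Invoice>"
)
ok, usable = validate_by_projection(received, toy_is_valid, toy_project)
print(ok, [child.tag for child in usable])
```

The projection is built as a copy, so the document as received is still available if the receiver wants to look at what was dropped.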
Other Programming Languages
Conceivably any programming language could be used instead of XSLT to either prune extra elements or to create a content model from only defined elements. Probably the greater the XML support, the easier the implementation would be. I don't know of any other Java implementations, or of any Python, Ruby, .NET, or other implementations.
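As a rough idea of what pruning might look like outside XSLT, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It assumes the known elements are embedded in the code rather than read from a schema, it ignores namespaces and attributes, and the KNOWN_CHILDREN table and the project function are hypothetical names made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical content model: each known element maps to the set of
# child element names it is allowed to contain.
KNOWN_CHILDREN = {
    "personName": {"given", "family"},
    "given": set(),
    "family": set(),
}

def project(element):
    """Remove, in place, every child element the content model does not know."""
    allowed = KNOWN_CHILDREN.get(element.tag, set())
    for child in list(element):  # iterate over a copy because we mutate
        if child.tag in allowed:
            project(child)
        else:
            element.remove(child)
    return element

doc = ET.fromstring(
    "<personName><given>Ada</given><middle>King</middle>"
    "<family>Lovelace</family></personName>"
)
projected = ET.tostring(project(doc), encoding="unicode")
print(projected)
```

Here the unknown middle element is dropped and the given and family children are kept. Like the embedded-names XSLT approach, this has to be recoded for every schema change.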
Others?
Are there any implementations of validation by projection that I haven't listed? Does anybody know of an XSLT that will project based upon a schema document?
Ken Holman has a Python implementation for UBL filters as well as the XSLT implementation.
On XML.COM I've published, along with my versioning article, a zip with a set of schemas, sample instances and stylesheets which implement validation by projection. It's not an implementation, but a nice illustration.
David,
I'd not heard this term before. We are definitely using this exact concept as "agile" document handling - in the OASIS CAM technology and specification.
In fact we've created many subsets and templates for handling them, such as for UBL and, right now, the new EXDL schemas. Anytime you have large complex schemas - you can definitely use CAM to get to simpler use instances and rulesets.
CAM provides a "lax" option - which allows automagical handling of "projected" content - simply skipping over content it does not care about; or you can use a mix of ##any and leaf focused XPath expressions to be flexible in the content models you will accept.
The best news is there is an Eclipse editor tool and runtime engine available - where people can quickly create CAM templates from their own XML instance samples.
You can download this OSS resource from
http://www.jcam.org.uk
Enjoy, DW
Chair OASIS CAM TC