phloc-schematron
Version 2.6.1, 28-05-2013, by Philip Helger - ph@phloc.com
Table of contents
1 Introduction
1.1 Prerequisites
2 XML document validation
2.1 Validation via XSLT
2.2 Validation via Pure Schematron
3 Technical details
3.1 Usage with Maven
3.2 Common API
3.2.1 Validation via XSLT
3.2.2 Validation via Pure Schematron
3.3 Extensibility of Pure Schematron
3.3.1 Reading
3.3.2 New Query Binding
3.3.3 Modify Existing Query Binding
4 Benchmarks
1 Introduction
phloc-schematron is a Java library that validates XML documents via ISO Schematron (http://www.schematron.com). It offers several ways to perform this task, each with its own advantages and disadvantages, which are outlined below in more detail. phloc-schematron supports only ISO Schematron and no other Schematron version. The most common way is to convert the source Schematron file to an XSLT script and apply this XSLT to the XML document to be validated. Alternatively, phloc-schematron offers a native implementation of the Schematron XPath binding, which performs better than the XSLT approach but has some minor limitations.
1.1 Prerequisites
It is assumed that you have a basic knowledge of what Schematron is and what Schematron can do for you. A good introduction can be found in Dave Pawson's Schematron tutorial at http://www.dpawson.co.uk/schematron/.
phloc-schematron - an introduction - 2013 by Philip Helger, version as of May 28, 2013
It is also assumed that you have basic knowledge of the Java language, so that you can understand the code examples, that you have at least a basic understanding of XSLT (Extensible Stylesheet Language Transformations), and that you have good knowledge of XML itself.
3. Now the Schematron needs to be pre-processed to resolve abstract patterns and abstract rules and to perform variable replacement.
4. Finally the pre-processed Schematron must be "bound". In this step the Schematron phase to be used can be selected. When the default query binding is used, all XPath expressions are pre-compiled so that they can be evaluated faster. When you supply your own query binding, you need to make sure to create an efficient representation to use as a bound schema.
5. The bound schema can now be used to validate arbitrary XML documents. Ideally it should also be cached like the XSLT script from above, because XPath compilation is costly, although far less costly than XSLT creation.

Pure Schematron is designed for maximum extensibility, meaning that you can create your own query binding, configure the reading and pre-processing of Schematron objects etc.

The drawbacks of Pure Schematron are currently:
- Include handling works only when you read a Schematron from a resource and not when you create your Schematron from scratch. If you keep this in mind when creating your Schematron files, it should not affect you much.
- XML attributes and elements from other namespaces are read from an existing Schematron resource, but they have no impact on the validation process itself when the default query binding is used. If you have an idea how this can be solved in a proper way, please drop me an email.
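The pre-compilation idea from step 4 can be illustrated with the standard javax.xml.xpath API alone. The following is a minimal sketch and not the phloc-schematron implementation: the class name PrecompiledAssertSketch and its methods are hypothetical, and it models a single rule context with a single assert test. The XPath expressions are compiled once in the constructor and then reused for every document that is validated.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of phloc-schematron.
final class PrecompiledAssertSketch {
  private final XPathExpression context; // selects the nodes a rule applies to
  private final XPathExpression test;    // must be true for each selected node

  PrecompiledAssertSketch(String contextXPath, String testXPath) {
    try {
      // Compile both expressions once; this is the costly part
      XPath xpath = XPathFactory.newInstance().newXPath();
      this.context = xpath.compile(contextXPath);
      this.test = xpath.compile(testXPath);
    } catch (XPathExpressionException e) {
      throw new IllegalArgumentException("Invalid XPath expression", e);
    }
  }

  // Counts the context nodes for which the assert test evaluates to false
  int countFailures(Document doc) {
    try {
      NodeList nodes = (NodeList) context.evaluate(doc, XPathConstants.NODESET);
      int failures = 0;
      for (int i = 0; i < nodes.getLength(); i++) {
        Boolean ok = (Boolean) test.evaluate(nodes.item(i), XPathConstants.BOOLEAN);
        if (!ok.booleanValue())
          failures++;
      }
      return failures;
    } catch (XPathExpressionException e) {
      throw new IllegalStateException("XPath evaluation failed", e);
    }
  }

  // Helper: parse an XML string into a DOM document
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse XML", e);
    }
  }

  public static void main(String[] args) {
    PrecompiledAssertSketch rule = new PrecompiledAssertSketch("//item", "@price > 0");
    Document doc = parse("<order><item price='5'/><item price='0'/></order>");
    System.out.println("Failed asserts: " + rule.countFailures(doc));
  }
}
```

Note that XPathExpression objects are not guaranteed to be thread-safe, so a cache that is shared between threads would need its own synchronization.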
Additionally phloc-schematron lets you write a Schematron rule set to disk easily, and it lets you check whether a Schematron is minified, preprocessed and valid. It also supports validating a Schematron resource against the RelaxNG Compact schema with the additional library called phloc-schematron-validator. This library was externalized because it is not used in any regular workflow and brings a lot of additional dependencies.
3 Technical details
phloc-schematron is an operating system independent Java 1.6 library. It uses SaxonHE 9.5.0.2 (http://saxon.sourceforge.net/) as the underlying XPath engine. Compared to Apache Xalan 2.7.1 (http://xml.apache.org/xalan-j/), Saxon offers more XPath functions out of the box. phloc-schematron also depends on our OSS library phloc-commons (http://code.google.com/p/phloc-commons/). For usage with Maven please look at the Wiki page http://code.google.com/p/phloc-schematron/wiki/FirstSteps for details. phloc-schematron is built as an OSGi bundle via the org.apache.felix:maven-bundle-plugin. The full code of the examples used in this document can be found in the file src/test/java/com/phloc/schematron/docs/DocumentationExamples.java.
document node, where the second method type applies a JAXB binding, so that it is easier to access the information inside the SVRL. Internally these methods call each other depending on the concrete implementation, so they are guaranteed to deliver exactly the same result. The XSLT implementation is natively done in applySchematronValidation and the result is then converted to a SchematronOutputType using the com.phloc.schematron.svrl.SVRLReader class. With Pure Schematron a SchematronOutputType object is created directly and then converted to an XML document node via the class com.phloc.schematron.svrl.SVRLWriter. The classes SVRLReader and SVRLWriter can be used generically to read and write SVRL files in a structured way. Both classes validate the SVRL against the SVRL XML Schema contained in the library.
3.2.1 Validation via XSLT
As described above it is highly recommended to cache the XSLT script that is created from the source Schematron rule set. Nevertheless phloc-schematron offers both possibilities to use Schematron.

The easiest way to start working is by starting from a Schematron file. com.phloc.schematron.xslt.SchematronResourceSCH is the implementation of the ISchematronResource interface to be used for this. The constructor takes at least the Schematron resource that contains the rules. When using this class it is possible to specify an optional Schematron phase to be used for validation. Additionally some static factory methods are present that allow creating SchematronResourceSCH objects from a String path or a java.io.File object.

If a precompiled XSLT script is present (e.g. via the schematron2xslt Maven plugin or via manual preprocessing) the implementation class com.phloc.schematron.xslt.SchematronResourceXSLT should be instantiated. It offers the same constructors and factory methods as the SchematronResourceSCH class. Please recall that the chosen phase already affected the created XSLT script, so it is not possible to specify a phase when using this implementation.

Both implementations use an internal cache that keeps the created precompiled javax.xml.transform.Templates objects in memory while the application is running. The cache for SchematronResourceSCH is located in the class com.phloc.schematron.xslt.SchematronResourceSCHCache whereas the cache for SchematronResourceXSLT is located in the class com.phloc.schematron.xslt.SchematronResourceXSLTCache.

A simple example to validate an XML file based on Schematron rules from a file looks like this:
public static boolean validateXMLViaXSLTSchematron (@Nonnull final File aSchematronFile, @Nonnull final File aXMLFile) throws Exception
{
  // Create the Schematron resource from the rule file
  final ISchematronResource aResSCH = SchematronResourceSCH.fromFile (aSchematronFile);
  if (!aResSCH.isValidSchematron ())
    throw new IllegalArgumentException ("Invalid Schematron!");
  // Validate the XML file and evaluate the result
  return aResSCH.getSchematronValidity (new StreamSource (aXMLFile)).isValid ();
}
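The benefit of keeping precompiled javax.xml.transform.Templates objects in memory can be sketched with the JDK alone. The following example is an illustration under stated assumptions and does not show how SchematronResourceSCHCache or SchematronResourceXSLTCache are implemented: the class TemplatesCacheSketch, its cache key and the tiny inline stylesheet are hypothetical, and the code assumes Java 8 or later. The stylesheet is compiled only on the first request for a key; later requests reuse the same Templates object and only create a cheap Transformer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical illustration only; not part of phloc-schematron.
final class TemplatesCacheSketch {
  // Stand-in for an XSLT created from a Schematron: counts all <item> elements
  static final String COUNT_ITEMS_XSLT =
      "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
    + "<xsl:output method='text'/>"
    + "<xsl:template match='/'><xsl:value-of select='count(//item)'/></xsl:template>"
    + "</xsl:stylesheet>";

  private static final Map<String, Templates> CACHE = new ConcurrentHashMap<>();
  private static final AtomicInteger COMPILE_COUNT = new AtomicInteger();

  // Returns the cached Templates for the key, compiling the XSLT only once
  static Templates getTemplates(String key, String xslt) {
    return CACHE.computeIfAbsent(key, k -> {
      try {
        COMPILE_COUNT.incrementAndGet();
        return TransformerFactory.newInstance()
                                 .newTemplates(new StreamSource(new StringReader(xslt)));
      } catch (TransformerConfigurationException e) {
        throw new IllegalArgumentException("Invalid XSLT", e);
      }
    });
  }

  // Number of real compilations performed so far
  static int compileCount() {
    return COMPILE_COUNT.get();
  }

  // Applies the precompiled stylesheet to an XML string
  static String apply(Templates templates, String xml) {
    try {
      StringWriter out = new StringWriter();
      templates.newTransformer().transform(new StreamSource(new StringReader(xml)),
                                           new StreamResult(out));
      return out.toString();
    } catch (TransformerException e) {
      throw new IllegalStateException("Transformation failed", e);
    }
  }

  public static void main(String[] args) {
    Templates first = getTemplates("demo", COUNT_ITEMS_XSLT);
    Templates second = getTemplates("demo", COUNT_ITEMS_XSLT);
    System.out.println("Same object reused: " + (first == second));
    System.out.println("Items: " + apply(second, "<order><item/><item/></order>"));
  }
}
```

Templates objects are thread-safe by contract, whereas the Transformer instances created from them are not, which is why the sketch creates a new Transformer for each call to apply.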
3.2.2 Validation via Pure Schematron
For Pure Schematron the implementation of the ISchematronResource interface resides in the class com.phloc.schematron.pure.SchematronResourcePure. The constructor also takes at least the resource to read the Schematron rules from. Additionally a Schematron phase and a custom error handler can be supplied. Be careful when using the validation methods that take a javax.xml.transform.Source object as parameter. Only DOMSource and StreamSource objects are supported at the moment!

A simple example to validate an XML file based on Schematron rules from a file looks like this:
public static boolean validateXMLViaPureSchematron (@Nonnull final File aSchematronFile, @Nonnull final File aXMLFile) throws Exception
Alternatively, you can validate via the internal API, in which case the code can look like this:
01 public static boolean validateXMLViaPureSchematron2 (@Nonnull final File aSchematronFile, @Nonnull final File aXMLFile) throws Exception
02 {
03   // Read the schematron from file
04   final PSSchema aSchema = new PSReader (new FileSystemResource (aSchematronFile)).readSchema ();
05   if (!aSchema.isValid ())
06     throw new IllegalArgumentException ("Invalid Schematron!");
07   // Resolve the query binding to use
08   final IPSQueryBinding aQueryBinding = PSQueryBindingRegistry.getQueryBindingOfNameOrThrow (aSchema.getQueryBinding ());
09   // Pre-process schema
10   final PSPreprocessor aPreprocessor = new PSPreprocessor (aQueryBinding);
11   aPreprocessor.setKeepTitles (true);
12   final PSSchema aPreprocessedSchema = aPreprocessor.getAsPreprocessedSchema (aSchema);
13   // Bind the pre-processed schema
14   final IPSBoundSchema aBoundSchema = aQueryBinding.bind (aPreprocessedSchema, null, null);
15   // Read the XML file
16   final Document aXMLNode = XMLReader.readXMLDOM (aXMLFile);
17   if (aXMLNode == null)
18     return false;
19   // Perform the validation
20   return aBoundSchema.validatePartially (aXMLNode).isValid ();
21 }
The code is clearly separated into the following steps:
1. Read the Schematron file from a File (lines 04-06). This part contains the Schematron include resolution.
2. Determine the Schematron query binding to be used (line 08). The query binding is required to correctly pre-process the Schematron afterwards.
3. Pre-process the read Schematron file (lines 10-12). This resolves all abstract rules and patterns.
4. Create the bound Schematron (line 14). This is the pre-compilation step, depending on the selected query binding. The second parameter, which is null in the example, is the name of the phase to use. When no phase is passed, the defaultPhase attribute of the Schematron schema is checked and used. If no defaultPhase is present, all patterns are active.
5. Read the XML file to be validated via DOM (lines 16-18). Technical note: this is the class com.phloc.commons.xml.serialize.XMLReader, which offers a simplified API to read XML files and is not to be confused with org.xml.sax.XMLReader.
6. Perform the Schematron validation of the read XML file (line 20).
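The phase selection rule described for the binding step can be written down as a small function. This is a sketch of the rule only, based on plain DOM access, and it is an assumption that the library resolves the phase in a comparable way. The class PhaseResolutionSketch and its methods are hypothetical. The constant #ALL is the reserved phase name that ISO Schematron defines for "all patterns are active"; whether phloc-schematron represents that case with this name internally is not stated in this document.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration only; not part of phloc-schematron.
final class PhaseResolutionSketch {
  // Reserved ISO Schematron phase name meaning that all patterns are active
  static final String PHASE_ALL = "#ALL";

  // Resolution order: explicit phase, then defaultPhase attribute, then all patterns
  static String resolvePhase(String requestedPhase, Document schema) {
    if (requestedPhase != null && !requestedPhase.isEmpty())
      return requestedPhase;
    // getAttribute returns an empty string when the attribute is absent
    String defaultPhase = schema.getDocumentElement().getAttribute("defaultPhase");
    return defaultPhase.isEmpty() ? PHASE_ALL : defaultPhase;
  }

  // Helper: parse an XML string into a DOM document
  static Document parse(String xml) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setNamespaceAware(true);
      return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    } catch (Exception e) {
      throw new IllegalArgumentException("Cannot parse XML", e);
    }
  }

  public static void main(String[] args) {
    Document withDefault = parse(
        "<schema xmlns='http://purl.oclc.org/dsdl/schematron' defaultPhase='quick'/>");
    Document withoutDefault = parse(
        "<schema xmlns='http://purl.oclc.org/dsdl/schematron'/>");
    System.out.println(resolvePhase(null, withDefault));
    System.out.println(resolvePhase(null, withoutDefault));
    System.out.println(resolvePhase("full", withDefault));
  }
}
```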
It is important to note that in the second case no caching is performed and that the Schematron file is interpreted each time the method is called, which may not be as efficient as possible.

Most customization can be done in the pre-processor. The Schematron ISO standard defines a Minimal Syntax that is still compliant Schematron but in which, among other things, all includes, all abstract patterns and all abstract rules are resolved. Because a Schematron that is minified has implications on the created SVRL document, the class is called PSPreprocessor and not PSMinifier. For example, if all <report> elements are converted to <assert> elements, the SVRL would contain a <failed-assert> element instead of a <successful-report> element. By default the preprocessor creates a minimal Schematron, but it offers the possibility to avoid certain minimizations.
3.3.2 New Query Binding
It is also possible to implement your own query binding that is different from the default XPath-based query binding. To do so, a class implementing the interface com.phloc.schematron.pure.binding.IPSQueryBinding must be present. This implementation class must then be registered in the com.phloc.schematron.pure.binding.PSQueryBindingRegistry via the static method registerQueryBinding. It is not possible to replace an existing query binding. The predefined XPath-based query binding is registered under the names xslt and xslt2 as well as for the default (meaning unspecified) query binding. Implementing your own query binding is time consuming, as you need to implement at least the interfaces com.phloc.schematron.pure.binding.IPSQueryBinding and com.phloc.schematron.pure.bound.IPSBoundSchema.

3.3.3 Modify Existing Query Binding
Additionally you may alter the existing Schematron processing by using the Pure Schematron API as outlined in the example above. Or you may subclass
protected methods for easy customization when using SchematronResourcePure. In case you have a customized implementation, you need to use the special SchematronResourcePure constructor taking the Schematron IReadableResource and the PSBoundSchemaCacheKey implementation. See the documentation in the code for details on overriding PSBoundSchemaCacheKey.
By default the plugin is run in the Maven lifecycle phase generate-resources. The basic configuration of the plugin in the pom.xml looks like this (inside the <build> + <plugins> elements):
<plugin>
  <groupId>com.phloc.maven</groupId>
  <artifactId>schematron2xslt-maven-plugin</artifactId>
  <version>2.6.1</version>
  <executions>
    <execution>
      <goals>
        <goal>convert</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <schematronDirectory>${basedir}/src/main/schematron</schematronDirectory>
    <xsltDirectory>${basedir}/src/main/resources/xslt</xsltDirectory>
    <xsltExtension>.xsl</xsltExtension>
  </configuration>
</plugin>
The possible configuration parameters are:
schematronDirectory - The directory where the Schematron files reside.
schematronPattern - A pattern for the Schematron files. Can contain Ant-style wildcards and double wildcards. All files that match the pattern will be converted. Files in the schematronDirectory and its subdirectories will be considered.
xsltDirectory - The directory where the XSLT files will be saved.
xsltExtension - The file extension of the created XSLT files.
overwriteWithoutQuestion - Overwrite existing XSLT files without notice? If this is set to false then existing XSLT files are not overwritten.
phaseName - Define the phase to be used for XSLT creation. By default the defaultPhase attribute of the Schematron file is used.
languageCode - Define the language code for the XSLT creation. Default is English. Supported language codes are: cs, de, en, fr, nl.
An example project that uses schematron2xslt-maven-plugin can be found in the Google Code repository at https://phloc-schematron.googlecode.com/svn/trunk/schematron2xslt-demo.
4 Benchmarks
To do