

XML has two main advantages: first, it offers a standard way of structuring data, and, second, we can specify
the vocabulary the data uses. We can define the vocabulary (what elements and attributes an XML
document can use) using either a document type definition (DTD) or the XML Schema language.

DTDs were inherited from XML's origins in SGML (Standard Generalized Markup Language) and, as such,
are limited in their expressiveness. DTDs were designed to express a text document's structure, so all content is
assumed to be text. The XML Schema language more closely resembles the way a database describes
data.

Schemas provide the ability to define an element's type (string, integer, etc.) and much finer constraints (a
positive integer, a string starting with an uppercase letter, etc.). DTDs enforce a strict ordering of elements;
schemas have a more flexible range of options (elements can be optional as a group, in any order, in strict
sequence, etc.). Finally, schemas are written in XML, whereas DTDs have their own syntax.

As you'll see in this article, schemas themselves are quite straightforward; I find them easier than DTDs, as
there is no extra syntax to remember. The difficulties arise in using XML Namespaces and in getting the
Java parsers to validate XML against a schema.

In this article, I first cover the basics of XML Schema, then validate XML against a schema using several
popular APIs, and finally cover some of the more powerful elements of the XML Schema language. But first,
a short detour.

A detour via the W3C


XML, the XML Schema language, XML Namespaces, and a whole range of other standards (such as
Cascading Style Sheets (CSS), HTML and XHTML, SOAP, and pretty much any standard that starts with an
X) are defined by the World Wide Web Consortium, otherwise known as the W3C. A document is XML only if
it conforms to the XML Recommendation issued by the W3C.

Various experts and interested parties gather under the umbrella of the W3C and, after much deliberation,
issue a recommendation. Companies, individuals, and foundations such as Apache then write
implementations of those recommendations.

This article's documents are a combination of these three recommendations:

• XML 1.0
• XML Namespaces
• XML Schema

XML 1.0 or 1.1


XML exists in two versions: 1.0 defined in 1998 and 1.1 defined in 2004. XML 1.1 adds very little to 1.0:
support for defining elements and attributes in languages such as Mongolian or Burmese, support for IBM
mainframe end-of-line characters, and almost nothing else. For the vast majority of applications, these
changes are not needed. Plus, a document declared as XML 1.1 will be rejected by a 1.0 parser. So stick
with 1.0.

Well-formed and valid XML


For an application to accept an XML document, it must be both well formed and valid. These terms are
defined in the XML 1.0 Recommendation, with XML Schema extending the meaning of valid.

To be well formed, an XML document must follow these rules:

• The document must have exactly one root element.
• Every element is either self closing (like <this />) or has a closing tag.
• Elements are nested properly (i.e., <this><and></this></and> is not allowed).
• The characters < and & appear only as markup. In text and attribute values they are written as &lt; and
&amp; (> is conventionally written as &gt; too).
• Attribute values are quoted.

For the full formal details, see Resources.

When producing XML, remember to escape text fields that might contain special characters such as &. This
is a common oversight.
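Here is a minimal sketch of that escaping in Java. The helper is hypothetical and not part of any library. It rewrites the characters that cannot appear literally in element content or in a double-quoted attribute value. In practice, a DOM serializer or similar XML-writing API escapes text for you. Hand-written escaping matters mostly when XML is built by string concatenation.

```java
// Hypothetical helper (not part of any library): escapes a string so it can be
// placed inside element content or a double-quoted attribute value.
final class XmlEscape {

    private XmlEscape() { }

    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;");  break; // must always be escaped
                case '<': out.append("&lt;");   break; // must always be escaped
                case '>': out.append("&gt;");   break; // optional, but customary
                case '"': out.append("&quot;"); break; // needed inside "..." attribute values
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <fullname>Smith &amp; Sons &lt;Ltd&gt;</fullname>
        System.out.println("<fullname>" + escapeText("Smith & Sons <Ltd>") + "</fullname>");
    }
}
```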

A document that is not well formed is not really XML and doesn't conform to the W3C's stipulations for an
XML document. A parser will fail when given that document, even if validation is turned off.
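The JDK-only sketch below makes this concrete. It parses strings with a non-validating DOM parser obtained through JAXP. The mis-nested and unescaped examples are rejected anyway. The class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative class: reports whether a string is well-formed XML.
final class WellFormedCheck {

    private WellFormedCheck() { }

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD or schema: only well-formedness is checked
            DocumentBuilder builder = factory.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser refused the document
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<this><and></and></this>"));  // true
        System.out.println(isWellFormed("<this><and></this></and>"));  // false: bad nesting
        System.out.println(isWellFormed("<name>Smith & Sons</name>")); // false: bare &
    }
}
```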

To be valid, a document must be well formed, it must have an associated DTD or schema, and it must
comply with that DTD or schema. Ensuring a document is well formed is easy. In this article, we focus on
ensuring our documents are valid.

Let's get right down to it. First, we're going to need an XML file to validate.


The XML document


Let's assume we have a client (say a terminal in a shop) that posts an XML order back to a server. The XML
might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<order>
  <user>
    <fullname>Bob Jones</fullname>
    <deliveryAddress>
      123 This road,
      That town,
      Bobsville
    </deliveryAddress>
  </user>
  <products>
    <product id="12345" quantity="1" />
    <product id="3232" quantity="3" />
  </products>
</order>

Save this document as test.xml. We will use it later in this article to try validation and some interesting
schema rules.

The first line, <?xml version="1.0" encoding="UTF-8"?>, is the XML declaration, which opens the document's
prolog. It is optional in XML 1.0 and compulsory in XML 1.1. If it is absent, parsers assume we're using
XML 1.0, but we like to be thorough.

The schema
For the server to validate our XML, we need a schema:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            xmlns="urn:nonstandard:test"
            targetNamespace="urn:nonstandard:test">
  <xsd:element name="order" type="Order" />
  <xsd:complexType name="Order">
    <xsd:all>
      <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
      <xsd:element name="products" type="Products" minOccurs="1" maxOccurs="1" />
    </xsd:all>
  </xsd:complexType>

  <xsd:complexType name="User">
    <xsd:all>
      <xsd:element name="deliveryAddress" type="xsd:string" />
      <xsd:element name="fullname">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:maxLength value="30" />
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:all>
  </xsd:complexType>

  <xsd:complexType name="Products">
    <xsd:sequence>
      <xsd:element name="product" type="Product" minOccurs="1" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Product">
    <xsd:attribute name="id" type="xsd:long" use="required" />
    <xsd:attribute name="quantity" type="xsd:positiveInteger" use="required" />
  </xsd:complexType>

</xsd:schema>

Save this schema as test.xsd in the same directory as the XML document. And, for the moment, ignore the
root node's attributes and the fact that everything is prefixed with xsd.

The first entry after the root schema element is:

<xsd:element name="order" type="Order" />

This says our document will have an element called order of type Order. This element is a global declaration
(with scope like a global variable). In fact, it is our only global element, so it will be the root element of any
document that conforms to this schema.

An element's type will be either built-in (such as string, long, or positiveInteger) or custom. Custom types
can be either a simpleType or a complexType. Simple types are variations on the built-in types, derived by
restriction, list, or union. If the element has child elements or attributes, its type must be a complexType. For
a full list of built-in types, see Resources.

Our Order is a complex type made up of two elements: user and products. These two elements are local:
we cannot refer to them anywhere outside the Order type. This distinction between global and local
declarations will prove important when we look at XML Namespaces.

The User type is again made up of two elements. The first, deliveryAddress, is of built-in type string. The
second, fullname, lacks a type in its element declaration. Instead, the type is given in-line. This is an
anonymous type: it has no name, so we cannot refer to it anywhere else. Anonymous types prevent reuse,
and I find them harder to read than named types. Unless a type is simple and unlikely to be reused, it is
best to avoid anonymous types.

The type of fullname is the built-in string type, like deliveryAddress, but with the restriction that it has a
maximum length of 30 characters.

The Products type is simply a sequence of product entries. The sequence element allows its children to
appear multiple times (all does not).

Finally, the Product type has two attributes and no body. For an example of a type with both attributes and a
body, see the "Database style constraints: Primary keys and foreign keys" section later in this article.


Add a schema
We must link the document to the schema. To do this, we only need to change the root element. Thus, the
start of the document becomes:

<?xml version="1.0" encoding="UTF-8"?>
<order xmlns="urn:nonstandard:test"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="urn:nonstandard:test
                           file:./test.xsd">
  <user>
  (...)

Edit the XML document you saved earlier and change the root element to match the entry above.

To understand what we have just added, we need to know about XML Namespaces, but first, let's review
URIs.

A quick detour via URIs


A Uniform Resource Identifier (URI) is a compact string of characters for identifying an abstract or physical
resource. It can be almost anything. An absolute URI has the format <scheme>:<scheme-specific-part>,
where <scheme> starts with a letter and continues with any combination of letters, digits, and the characters
+, -, and . (schemes are case-insensitive and conventionally lowercase). The scheme-specific part can be
almost anything. A relative URI doesn't even need the scheme part. So this:something is a valid URI, and
anythingAtAll is a valid relative URI. To make this workable, a URI is usually a name or a locator.
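The java.net.URI class applies these rules, so it offers a quick way to see how a string splits into a scheme and a scheme-specific part. This small sketch uses only the JDK. The class name is arbitrary. The string "1bad:x" is invented to show that a scheme cannot start with a digit.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative only: shows how java.net.URI classifies the URIs mentioned above.
final class UriParts {

    private UriParts() { }

    static String describe(String text) {
        try {
            URI uri = new URI(text);
            if (!uri.isAbsolute()) {
                return "relative: " + uri.getPath(); // no scheme at all
            }
            return "scheme=" + uri.getScheme() + ", part=" + uri.getSchemeSpecificPart();
        } catch (URISyntaxException e) {
            return "invalid: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("urn:nonstandard:test"));  // scheme=urn, part=nonstandard:test
        System.out.println(describe("this:something"));        // scheme=this, part=something
        System.out.println(describe("anythingAtAll"));         // relative: anythingAtAll
        System.out.println(describe("1bad:x"));                 // invalid: ... (scheme starts with a digit)
    }
}
```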

A Uniform Resource Name (URN) identifies a resource permanently, regardless of where it is located; good
examples are a book's ISBN and a product's barcode.

A Uniform Resource Locator (URL) identifies a resource by its location. URIs, in the context of XML
Namespaces, are nearly always URLs. The URI identifying a namespace is not required to point to a
document, so, if the URI is pasted into a browser, it may not find anything. However, as the URI identifying
your namespace looks exactly like a URL, users will expect there to be something at that address, so it is
good practice to put something there. Sun and the W3C, for example, have pages at their namespace
URLs.

This article's example document does not have a URL as its namespace identifier; instead, it has a made-up
URN. Though unusual, it helps to show that the namespace identifier is just that: an identifier. In a real
application, our root element would probably read:

<order xmlns="http://www.mycompany.com/xml/myproject"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.mycompany.com/xml/myproject file:./test.xsd">
Namespaces
An XML namespace is a collection of names, identified by a URI reference, which are used in XML
documents as element types and attribute names. A namespace in XML is a bit like a package in Java: it
groups a set of names together. The element user in the urn:nonstandard:test namespace differs from an
element user in any other namespace.

Only one namespace can be the default at a time; the others must be given a prefix. The xmlns attribute
(which comes from the XML Namespaces Recommendation) defines the default namespace, i.e., the
namespace for unprefixed elements. The form xmlns:xsd defines the namespace for entries prefixed with xsd
(xsd is commonly used for the schema prefix, but any prefix would do).
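A prefix is only a local alias for a namespace URI. A namespace-aware parser therefore reports the same namespace URI and local name whichever prefix a document uses, or none. This JDK-only sketch parses two forms of an empty schema root element to show it. The class name is made up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative class: prints the expanded name of a document's root element.
final class PrefixDemo {

    private PrefixDemo() { }

    // Returns "{namespaceURI}localName" for the root element.
    static String expandedRootName(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, getNamespaceURI() returns null
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefixed = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'/>";
        String unprefixed = "<schema xmlns='http://www.w3.org/2001/XMLSchema'/>";
        // Both lines print {http://www.w3.org/2001/XMLSchema}schema
        System.out.println(expandedRootName(prefixed));
        System.out.println(expandedRootName(unprefixed));
    }
}
```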

When defining a schema, we refer to our own types (Order, User, Product, etc.) and use names from the
schema namespace (element, complexType, string, etc.). For this reason, we usually prefix the schema
namespace. We could instead prefix our types and use the schema namespace unprefixed. The first
part of our schema would then look like this:

<?xml version="1.0"?>
<schema xmlns="http://www.w3.org/2001/XMLSchema" targetNamespace="urn:nonstandard:test"
        elementFormDefault="qualified" xmlns:ts="urn:nonstandard:test">

  <element name="order" type="ts:Order" />

  <complexType name="Order">
    <sequence>
      <element name="user" type="ts:User" minOccurs="1" maxOccurs="1" />
      <element name="products" type="ts:Products" minOccurs="1" maxOccurs="1" />
    </sequence>
  </complexType>
  (...)

Prefixed names are called qualified names. They contain a single colon separating the name into a
namespace prefix and a local part. The prefix, which is mapped to a URI reference, selects a namespace.
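Since J2SE 5, Java represents such a name with javax.xml.namespace.QName. Its equals method compares only the namespace URI and the local part, never the prefix. This matches the way names are treated under XML Namespaces. A short sketch:

```java
import javax.xml.namespace.QName;

// Illustrative class: a qualified name's identity is (namespace URI, local part).
final class QNameDemo {

    public static void main(String[] args) {
        // ts:order, with ts mapped to urn:nonstandard:test
        QName prefixed = new QName("urn:nonstandard:test", "order", "ts");
        // order in a document whose default namespace is urn:nonstandard:test
        QName unprefixed = new QName("urn:nonstandard:test", "order");

        System.out.println(prefixed.equals(unprefixed)); // true: the prefix is ignored
        System.out.println(prefixed.getPrefix());        // ts
        System.out.println(prefixed);                    // {urn:nonstandard:test}order
    }
}
```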

When writing a schema, we define new elements and attributes. The targetNamespace attribute specifies the
namespace these new elements will belong to. An XML document that conforms to this schema will import
that namespace (via an xmlns or xmlns:prefix attribute).

The xmlns:xsi attribute simply imports a namespace and maps it to the xsi prefix. The namespace here is a
special one: the XML Schema instance namespace. Every XML document that uses xsi:schemaLocation, as
ours does, must import that namespace. The XMLSchema-instance namespace defines only four attributes: type,
nil, schemaLocation, and noNamespaceSchemaLocation.

The schemaLocation attribute indicates where to find the schema to validate each namespace. The format is
the namespace, whitespace, and the URL. Several namespace/URL pairs can be listed, all separated by
whitespace. Since we are only interested in validating our namespace, we just declare the location of the
schema for urn:nonstandard:test, in this case a file called test.xsd in the current directory (the schema we
saved earlier). In a real application, the location would usually be a publicly accessible URL. schemaLocation
just provides a hint to the parser; if the code invoking the parser supplies a different schema, the parser will
use that schema, not schemaLocation's.
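To make the pairing rule concrete, here is a hypothetical helper. It is not part of any parser's API. It reads an xsi:schemaLocation value the way the XML Schema Recommendation describes it: whitespace-separated tokens, taken two at a time as a namespace followed by a location.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns an xsi:schemaLocation value into namespace -> location pairs.
final class SchemaLocations {

    private SchemaLocations() { }

    static Map<String, String> parse(String value) {
        String trimmed = value.trim();
        // Any run of whitespace (spaces, tabs, line breaks) separates tokens
        String[] tokens = trimmed.length() == 0 ? new String[0] : trimmed.split("\\s+");
        if (tokens.length % 2 != 0) {
            throw new IllegalArgumentException("expected namespace/location pairs: " + value);
        }
        Map<String, String> pairs = new LinkedHashMap<String, String>();
        for (int i = 0; i < tokens.length; i += 2) {
            pairs.put(tokens[i], tokens[i + 1]); // namespace URI first, then schema location
        }
        return pairs;
    }

    public static void main(String[] args) {
        // The value from our document, which spans two lines
        System.out.println(parse("urn:nonstandard:test\n    file:./test.xsd"));
        // prints {urn:nonstandard:test=file:./test.xsd}
    }
}
```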

If the XML document we want to validate comes in at the interface between our application and the external
world, we will probably want to use our own copy of the schema for validation. For internal documents,
trusting the document's header is probably okay.

The targetNamespace is an attribute of the schema's root element, whereas xsi:schemaLocation belongs on the
root element of the instance document. For a schema like ours, which defines elements in a namespace, the
root element (xsd:schema) must include at least:

• The namespace of the schema elements (usually via the line xmlns:xsd="http://www.w3.org/2001/XMLSchema").
• The namespace of the elements we are defining, via the targetNamespace attribute.

Unless all our types are anonymous, we must also declare the namespace of the types we are defining, so
that references such as type="Order" can be resolved within the schema. This namespace is usually
unprefixed: xmlns="sameAsTargetNamespace".

The elementFormDefault attribute indicates whether locally declared elements should be qualified (prefixed)
or not. The following section describes that attribute.


Global versus local declarations


In our schema, the order entry is the only globally declared element. Every other element is local to a type.
For example, user is local to the Order type. We could declare more elements globally and refer to them. For
example, the schema's beginning could read:

(...)
<xsd:element name="order" type="Order" />
<xsd:element name="user" type="User" />
<xsd:element name="products" type="Products" />
<xsd:complexType name="Order">
  <xsd:all>
    <xsd:element ref="user" minOccurs="1" maxOccurs="1" />
    <xsd:element ref="products" minOccurs="1" maxOccurs="1" />
  </xsd:all>
</xsd:complexType>
(...)

In a DTD, all elements are global, which can make DTDs difficult to read. In a schema, declaring only the
root element as global makes the schema easier to read.

A schema's root element can take the elementFormDefault attribute, which indicates whether locally
declared elements should be qualified or unqualified. If elementFormDefault is unqualified (the default), our
XML document will need to specify which namespace the global elements are in (remember our only global
element is the root node order), but not the namespace of the local elements. If elementFormDefault is
unqualified, putting local elements in a namespace will result in an error.

This document shows unqualified locally declared elements, which is how your documents will usually look:

<ts:order xmlns:ts="urn:nonstandard:test">
  <user>
    <!-- etc -->
  </user>
</ts:order>

This tells the parser that order is in the urn:nonstandard:test namespace and says nothing about user.
Internally, order turns into urn:nonstandard:test:order, but user remains as is. It is not qualified by a
namespace; instead, the parser finds its declaration through its nearest globally declared ancestor, in this
case order.

If, in the schema, we set elementFormDefault="qualified", we would have to do this:

<ts:order xmlns:ts="urn:nonstandard:test">
  <ts:user>
    <!-- etc -->
  </ts:user>
</ts:order>

In qualified mode, the parser does not assume anything about local elements; we must specify their
namespaces too. Internally, order becomes urn:nonstandard:test:order, as before, and user now becomes
urn:nonstandard:test:user. The internal expansion of namespace prefixes is important, because it explains
why the example below will only work if the schema is set as elementFormDefault="qualified":

<order xmlns="urn:nonstandard:test">
  <user>
    <!-- etc -->
  </user>
</order>

Here, we declare the default namespace as urn:nonstandard:test, so all elements are assumed to be in that
namespace. It is an easier-to-read version of the example above, where we qualified everything.

If elementFormDefault had been left as unqualified (remember, that's the default) we would get the error:

error: cvc-complex-type.2.4.a: Invalid content was found starting with element 'user'. One of '{"":user, "":products}' is expected.

This error indicates that the parser was looking for an unqualified local element ("":user), but instead found
an element qualified by the default namespace (urn:nonstandard:test:user).
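The difference is easy to observe with a namespace-aware DOM parser, and no validation is needed. This sketch uses only the JDK and an illustrative class name. It reports the namespace URI of the user element in the instance documents shown above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Illustrative class: which namespace does the parser put <user> in?
final class LocalElementNamespace {

    private LocalElementNamespace() { }

    // Returns the namespace URI of the first user element (null means "no namespace").
    static String userNamespace(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // "*" matches any namespace; "user" is the local name
            Element user = (Element) doc.getElementsByTagNameNS("*", "user").item(0);
            return user.getNamespaceURI();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefixedRoot = "<ts:order xmlns:ts='urn:nonstandard:test'><user/></ts:order>";
        String defaultNs = "<order xmlns='urn:nonstandard:test'><user/></order>";
        System.out.println(userNamespace(prefixedRoot)); // null: fits elementFormDefault="unqualified"
        System.out.println(userNamespace(defaultNs));    // urn:nonstandard:test: needs "qualified"
    }
}
```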

The schema element can also be given the attributeFormDefault attribute, which behaves exactly like
elementFormDefault, but for attributes.

Now that we have an XML file and a schema we understand, let's validate the first against the second.

Parse and validate XML with JAXP


We will use the Java API for XML Processing (JAXP) to find a parser, which we will use to validate the XML,
and then the W3C DOM (Document Object Model) to look at our document. JAXP is an API for finding a parser
and has shipped as part of Java since version 1.4 (earlier versions of Java need a separate download). JAXP
allows our code to be parser-independent. J2SE 5's default parser is Xerces 2.6.2. J2SE 1.4's default parser
is Crimson (Crimson cannot validate XML Schema). Other parsers include Aelfred and Oracle's parser, XDK
(short for XML Developer Kit).

If a different parser is on the classpath, JAXP will automatically use that parser. For example, if you include
Oracle's parser on the classpath, the DocumentBuilderFactory you get from
DocumentBuilderFactory.newInstance() will be an Oracle implementation, instead of one based on Xerces,
which is packaged into J2SE 5.

If you have J2SE 5, the code below should work as is. If you have 1.4, then download either Xerces or
Oracle XDK, and make sure it is on your classpath. Both those parsers are XML Namespaces and XML
Schema-aware. If you have an earlier version of Java, you'll also need to download JAXP.

This class takes an XML file as a command line argument, validates it using a parser obtained via JAXP, and
prints the name of the XML document's root node:

import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

public class XmlTester {

    public static final void main(String[] args) {
        if ( args.length != 1 ) {
            System.out.println("Usage: java XmlTester myFile.xml");
            System.exit(-1);
        }
        String xmlFile = args[0];

        try {
            XmlTester xmlTester = new XmlTester(xmlFile);
        }
        catch (Exception e) {
            System.out.println( e.getClass().getName() +": "+ e.getMessage() );
        }
    }

    public XmlTester(String xmlFile) throws ParserConfigurationException, SAXException, IOException {

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        System.out.println("DocumentBuilderFactory: "+ factory.getClass().getName());

        factory.setNamespaceAware(true);
        factory.setValidating(true);
        // Validate against W3C XML Schema rather than a DTD
        factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                             "http://www.w3.org/2001/XMLSchema");

        // Specify our own schema - this overrides the schemaLocation in the xml file
        //factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaSource", "file:./test.xsd");

        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler( new SimpleErrorHandler() );

        Document document = builder.parse(xmlFile);

        // getDocumentElement() skips any comment or processing instruction before the root
        Element rootNode = document.getDocumentElement();
        System.out.println("Root node: "+ rootNode.getNodeName() );
    }
}

For the above code to work, we also need an error handler:

import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

public class SimpleErrorHandler implements ErrorHandler {

    public void error(SAXParseException exception) {
        System.out.println("error: "+ exception.getMessage());
    }

    public void fatalError(SAXParseException exception) {
        System.out.println("fatalError: "+ exception.getMessage());
    }

    public void warning(SAXParseException exception) {
        System.out.println("warning: "+ exception.getMessage());
    }
}

Parsers can have various features set on them. Two standard features are turning on support for
namespaces and turning on validation (which, by default, will look for a DTD). As these are standard, JAXP
has methods to set them, and you don't need to remember the feature strings.

Here are the DocumentBuilderFactory methods to set those two standard features:

factory.setNamespaceAware(true);
factory.setValidating(true);

Unfortunately, the feature for turning on schema validation has not been standardized. With JAXP you do:

factory.setAttribute("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");

For a full list of feature strings, see Resources.

Save the XmlTester class given at the start of this section to XmlTester.java and the error handler that
follows it to SimpleErrorHandler.java, in the same directory as the document (XML file) and schema.
Compile them, then run using java XmlTester test.xml. The name of the class implementing
DocumentBuilderFactory and the document's root node should print. You should now be able to change the
XML document and/or the schema and check that validation fails if they don't match.
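J2SE 5 also ships the javax.xml.validation package (JAXP 1.3), which separates validation from parsing. The following sketch is an alternative to the setAttribute approach above, not the article's method. It compiles a schema, validates a document against it, and collects the error messages instead of printing them. The class name is illustrative. The schema in main is a cut-down, no-namespace stand-in for test.xsd, used to keep the example self-contained.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Sketch: validate a document against a W3C XML Schema with javax.xml.validation.
final class SchemaValidation {

    private SchemaValidation() { }

    // Returns the messages of all problems found; an empty list means the document is valid.
    static List<String> validate(String xsd, String xml) {
        final List<String> problems = new ArrayList<String>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // Compiling throws SAXException if the schema itself is broken
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { }
                // Validity errors are recorded and validation carries on
                public void error(SAXParseException e) { problems.add(e.getMessage()); }
                // Well-formedness errors stop validation
                public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            problems.add(e.getMessage());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return problems;
    }

    public static void main(String[] args) {
        String xsd = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                + "<xsd:element name='product'><xsd:complexType>"
                + "<xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>"
                + "</xsd:complexType></xsd:element></xsd:schema>";
        System.out.println(validate(xsd, "<product quantity='3'/>")); // []
        System.out.println(validate(xsd, "<product quantity='0'/>")); // one or more cvc-... messages
    }
}
```

Because the error handler records problems rather than throwing, one run reports every validity error in the document instead of stopping at the first.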

Unless you are parsing big documents, you probably won't use the Simple API for XML (SAX) directly, as it is
a cumbersome API. For most parsing, the W3C DOM included in J2SE is a good choice. Learning and using
the W3C DOM has a big benefit: it is standard, meaning that, in Python, JavaScript, or C#, you will use the
same objects with the same methods. However, the W3C DOM can be verbose, so sometimes you will want a
more powerful API, or one that fits more naturally with Java.

Commons Digester
The best example of an API more powerful than DOM is Jakarta Commons Digester. This API turns XML
into Java objects on the fly. Commons Digester removes the need for manual XML parsing in cases where
you need to read the whole file and allows you to work with regular JavaBeans instead.

Since Commons Digester turns XML into JavaBeans, we need a bean. Here is a simple Order bean:

public class Order {

    // Digester creates this bean through its public no-argument constructor
    public String getName() { return "order"; }
}

Then, to parse and validate a document, we do:

import java.io.IOException;
import org.apache.commons.digester.Digester;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

public class DigesterTester {

    public static final void main(String[] args) {

        if ( args.length != 1 ) {
            System.out.println("Usage: java DigesterTester myFile.xml");
            System.exit(-1);
        }
        String xmlFile = args[0];
        try {
            DigesterTester xmlTester = new DigesterTester(xmlFile);
        }
        catch (Exception e) {
            System.out.println( e.getClass().getName() +": "+ e.getMessage() );
        }
    }

    public DigesterTester(String xmlFile) throws SAXNotRecognizedException, IOException, SAXException,
            SAXNotSupportedException {

        Digester digester = new Digester();
        digester.setValidating(true);
        digester.setNamespaceAware(true);
        digester.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                             "http://www.w3.org/2001/XMLSchema");
        digester.setErrorHandler( new SimpleErrorHandler() );

        // Create an Order bean whenever an <order> element is matched
        digester.addObjectCreate("order", "Order");

        Order order = (Order) digester.parse( xmlFile );
        System.out.println("Order: "+ order.getName() );
    }
}

To compile this class, you need the Commons Digester jar on your classpath. To run it, you will need
Digester, Commons Collections, and Commons Logging on your classpath. These are all available from the
Jakarta Commons project.

Running the example on a valid XML file should print Order: order. An invalid file will produce an exception
stack trace.

The three important lines from our earlier XmlTester example resemble the ones in this DigesterTester class.
They are:

digester.setValidating(true);
digester.setNamespaceAware(true);
digester.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");

In fact, this approach can be used with most APIs.

JDOM
JDOM is an API similar to DOM, but designed to fit more naturally with Java. For JDOM, the relevant code is:

import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class JDOMTester {

    public static final void main(String[] args) {

        if ( args.length != 1 ) {
            System.out.println("Usage: java JDOMTester file:./myFile.xml");
            System.exit(-1);
        }
        String xmlFile = args[0];

        try {
            JDOMTester tester = new JDOMTester(xmlFile);
        }
        catch (Exception e) {
            System.out.println( e.getClass().getName() +": "+ e.getMessage());
        }
    }

    public JDOMTester(String xmlURL) throws JDOMException, IOException, MalformedURLException {

        // true turns validation on; the property below switches it from DTD to XML Schema
        SAXBuilder builder = new SAXBuilder(true);
        builder.setProperty("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
                            "http://www.w3.org/2001/XMLSchema");
        builder.setErrorHandler( new SimpleErrorHandler() );

        Document document = builder.build( new URL(xmlURL) );

        Element rootNode = document.getRootElement();
        System.out.println("Root node: "+ rootNode.getName() );
    }
}

To compile and run this example, you need JDOM on the classpath. Also, note that JDOM takes its input as
a URL, so, for a local file, you need a URL like file:./test.xml.

One API that doesn't seem to provide a way for setting the schemaLanguage feature on the underlying
parser is dom4j. If you use dom4j, you will need to create your own SAXParser with validation turned on and
pass that to dom4j.

If you use Xerces with dom4j and don't mind losing the ability to swap parsers, you can use the
Xerces-specific feature:

org.dom4j.io.SAXReader reader = new org.dom4j.io.SAXReader(true);
reader.setFeature("http://apache.org/xml/features/validation/schema", true);

More schema
Now we're going to explore a range of things XML Schema can do for us:

• Schema validation
• Grouping
• Schema separation
• Adding uniqueness
• Implementing primary keys and foreign keys

Having the XML file, schema, and a program to validate one with the other will prove useful in this section. If
you haven't already, save them locally and try the validation.

Validating the schema


A schema is itself a well-formed and valid XML document. In fact, we can validate our schema against its
own schema. Change the root element to:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/XMLSchema
http://www.w3.org/2001/XMLSchema.xsd"
xmlns="urn:nonstandard:test"
elementFormDefault="qualified"
targetNamespace="urn:nonstandard:test">

Note the two new attributes: one maps the xsi prefix to the XML Schema instance namespace, and the other
points to the schema's schema. You should now be able to use the XmlTester class from earlier to validate this
schema by running java XmlTester test.xsd. Validating this schema should take a little longer than validating
the XML example earlier, as the parser has to fetch the schema from the W3C site.
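Fetching the schema's schema over the network may be inconvenient. Compiling the schema with a JAXP 1.3 SchemaFactory is another way to catch mistakes, because the factory rejects a schema document that breaks the XML Schema rules. With the default settings, such errors are thrown as SAXException. This sketch assumes J2SE 5 or later. The class name and the misspelled attribute in the bad schema are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Illustrative class: checks that a schema document compiles.
final class SchemaCheck {

    private SchemaCheck() { }

    // Returns null if the schema compiles, otherwise the first error message.
    static String firstSchemaError(String xsd) {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        try {
            // No ErrorHandler is set, so the first schema error is thrown
            factory.newSchema(new StreamSource(new StringReader(xsd)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String good = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                + "<xsd:element name='order' type='xsd:string'/></xsd:schema>";
        // 'nam' is a typo for 'name', so this element declaration is invalid
        String bad = "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
                + "<xsd:element nam='order' type='xsd:string'/></xsd:schema>";
        System.out.println(firstSchemaError(good)); // null
        System.out.println(firstSchemaError(bad));  // an error message from the parser
    }
}
```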

Grouping
A complexType has three ways of grouping its child elements: all, sequence, or choice.

All indicates that each of the elements listed can appear zero or one time, in any order. In this example, user
and products must either both appear, once each and in any order, or neither can appear:

<xsd:complexType name="Order">
  <xsd:all minOccurs="0">
    <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
    <xsd:element name="products" type="Products" minOccurs="1" maxOccurs="1" />
  </xsd:all>
</xsd:complexType>

Here, user is compulsory, but products is optional:

<xsd:complexType name="Order">
  <xsd:all minOccurs="1">
    <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
    <xsd:element name="products" type="Products" minOccurs="0" maxOccurs="1" />
  </xsd:all>
</xsd:complexType>

In a sequence, the ordering of elements is strictly enforced, and each element can appear several times if its
maxOccurs allows it. This example says user must always come before products, and several products entries
can appear:

<xsd:complexType name="Order">
  <xsd:sequence>
    <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
    <xsd:element name="products" type="Products" minOccurs="1" maxOccurs="unbounded" />
  </xsd:sequence>
</xsd:complexType>

The choice element only allows one of its children to be used, but the child element used can appear
multiple times (if its maxOccurs allows it). This example allows either one user element or one or more
products elements:

<xsd:complexType name="Order">
  <xsd:choice>
    <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
    <xsd:element name="products" type="Products" minOccurs="1" maxOccurs="unbounded" />
  </xsd:choice>
</xsd:complexType>

Separating the schema into several files


If another application used a different schema that also referred to our products, we might want to split the
Products and Product types into their own file and include that file in both schemas.

To split the types, the common schema would be:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            xmlns="urn:nonstandard:test"
            elementFormDefault="qualified"
            targetNamespace="urn:nonstandard:test">

  <xsd:complexType name="Products">
    <xsd:sequence>
      <xsd:element name="product" type="Product" minOccurs="1" maxOccurs="unbounded" />
    </xsd:sequence>
  </xsd:complexType>

  <xsd:complexType name="Product">
    <xsd:attribute name="id" type="xsd:long" use="required" />
    <xsd:attribute name="quantity" type="xsd:positiveInteger" use="required" />
  </xsd:complexType>

</xsd:schema>

Save this code as products.xsd. Then remove the Products and Product types from test.xsd and insert this
line right after the root element's start tag and before the declaration of the order element:

<xsd:include schemaLocation="file:./products.xsd" />

This tells our schema to include the types defined in products.xsd (which are the types we have just
removed from test.xsd).

In products.xsd, we use the same target namespace as the file it is included in, which is a requirement for
using the include element. To use types from a different namespace, we would use the import element. See
Resources for more information.

Database style constraints: Uniqueness


Say we want each product ID to be unique: if the user orders several units of a product, the product's
quantity attribute reflects that, so the same product is never listed twice. To add uniqueness to our schema,
first reverse the changes from the previous section's example (i.e., don't split the types out). For reasons of
XPath namespace resolution (explained later), we next need to add this attribute to the schema's root element
(xsd:schema), because the prefixes used in the constraint's XPath expressions are resolved against the schema
document: xmlns:test="urn:nonstandard:test". Then, edit the schema and change the order element (just the
element, not the complexType declaration) to:

<xsd:element name="order" type="Order">

<xsd:unique name="productIdUnique">
<xsd:selector xpath="test:products/test:product" />
<xsd:field xpath="@id" />
</xsd:unique>

</xsd:element>

We have added a unique element, which uses XPath to find the elements that must be unique. Because the
constraint is declared inside the order element, its selector can only reach elements within order. The
test:products/test:product expression selects every product element that is a child of a products element,
which is in turn a child of the current node (order). The @id expression then selects the id attribute of each
product element just selected.
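The following sketch checks the constraint with javax.xml.validation. It assumes a simplified, hypothetical Order type that holds one or more products elements (the article's full Order type is not repeated here), and it declares the test prefix on xsd:schema, because that is where the selector's prefixes are resolved. The duplicate case puts the same id in two separate products elements to show that the constraint applies across the whole order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class UniqueDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n"
        + "    xmlns='urn:nonstandard:test' xmlns:test='urn:nonstandard:test'\n"
        + "    elementFormDefault='qualified' targetNamespace='urn:nonstandard:test'>\n"
        + "  <xsd:element name='order' type='Order'>\n"
        + "    <xsd:unique name='productIdUnique'>\n"
        + "      <xsd:selector xpath='test:products/test:product'/>\n"
        + "      <xsd:field xpath='@id'/>\n"
        + "    </xsd:unique>\n"
        + "  </xsd:element>\n"
        + "  <xsd:complexType name='Order'><xsd:sequence>\n"
        + "    <xsd:element name='products' type='Products' maxOccurs='unbounded'/>\n"
        + "  </xsd:sequence></xsd:complexType>\n"
        + "  <xsd:complexType name='Products'><xsd:sequence>\n"
        + "    <xsd:element name='product' type='Product' maxOccurs='unbounded'/>\n"
        + "  </xsd:sequence></xsd:complexType>\n"
        + "  <xsd:complexType name='Product'>\n"
        + "    <xsd:attribute name='id' type='xsd:long' use='required'/>\n"
        + "    <xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>\n"
        + "  </xsd:complexType>\n"
        + "</xsd:schema>\n";

    static final String DISTINCT_IDS =
        "<order xmlns='urn:nonstandard:test'><products>"
      + "<product id='12345' quantity='1'/><product id='3232' quantity='2'/>"
      + "</products></order>";

    // Same id in two different products elements: still a duplicate within order
    static final String DUPLICATE_IDS =
        "<order xmlns='urn:nonstandard:test'>"
      + "<products><product id='12345' quantity='1'/></products>"
      + "<products><product id='12345' quantity='2'/></products>"
      + "</order>";

    static Schema compile(String xsd) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            return factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
    }

    static boolean isValid(String xml) {
        try {
            compile(XSD).newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("distinct ids valid: " + isValid(DISTINCT_IDS));
        System.out.println("duplicate ids valid: " + isValid(DUPLICATE_IDS));
    }
}
```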

See Resources for more details on the XPath syntax and for a good XPath tutorial.

XPath 1.0, on which these expressions are based, has no way to refer to the default namespace: an
unprefixed name always matches an element in no namespace. If our document had no namespace (and we
used xsi:noNamespaceSchemaLocation), we could select the product elements simply with the XPath
expression products/product.

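Our elements, however, are in the urn:nonstandard:test namespace, which we declare as the default
namespace. Since XPath cannot refer to that namespace without a prefix, we must declare the same
namespace again with a prefix (test) and use that prefix in the expressions.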

Neither Xerces nor Oracle XDK warns you if the XPath selector matches nothing, so, if you don't receive the
expected results, check the expression carefully and remember to take namespaces into account.
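A quick way to see this pitfall is to compile the same schema twice, once with an unprefixed selector and once with the prefixed one, and validate a document that contains a duplicate id. This sketch (hypothetical class name, simplified schema) uses javax.xml.validation with the JDK's built-in Xerces-based implementation. With the unprefixed selector the duplicate is accepted without any warning, because products/product looks for elements in no namespace and therefore matches nothing.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class SelectorPitfallDemo {
    // Simplified schema; the selector expression is filled in via %s
    static final String XSD_TEMPLATE =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n"
        + "    xmlns='urn:nonstandard:test' xmlns:test='urn:nonstandard:test'\n"
        + "    elementFormDefault='qualified' targetNamespace='urn:nonstandard:test'>\n"
        + "  <xsd:element name='order'>\n"
        + "    <xsd:complexType><xsd:sequence>\n"
        + "      <xsd:element name='products' maxOccurs='unbounded'>\n"
        + "        <xsd:complexType><xsd:sequence>\n"
        + "          <xsd:element name='product' maxOccurs='unbounded'>\n"
        + "            <xsd:complexType>\n"
        + "              <xsd:attribute name='id' type='xsd:long' use='required'/>\n"
        + "              <xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>\n"
        + "            </xsd:complexType>\n"
        + "          </xsd:element>\n"
        + "        </xsd:sequence></xsd:complexType>\n"
        + "      </xsd:element>\n"
        + "    </xsd:sequence></xsd:complexType>\n"
        + "    <xsd:unique name='productIdUnique'>\n"
        + "      <xsd:selector xpath='%s'/>\n"
        + "      <xsd:field xpath='@id'/>\n"
        + "    </xsd:unique>\n"
        + "  </xsd:element>\n"
        + "</xsd:schema>\n";

    static final String DUPLICATE_IDS =
        "<order xmlns='urn:nonstandard:test'><products>"
      + "<product id='12345' quantity='1'/><product id='12345' quantity='2'/>"
      + "</products></order>";

    /** Returns true if the document validates against the schema built with this selector. */
    static boolean accepts(String selector, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(String.format(XSD_TEMPLATE, selector))));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Unprefixed names mean "no namespace": the selector matches nothing, silently
        System.out.println("products/product accepts duplicate: "
            + accepts("products/product", DUPLICATE_IDS));
        System.out.println("test:products/test:product accepts duplicate: "
            + accepts("test:products/test:product", DUPLICATE_IDS));
    }
}
```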

Database-style constraints: Primary keys and foreign keys


Let's say we want to start charging for our products, but, because prices are open to negotiation, we want
the client to submit the price in the XML. To add this, put these prices at the end of the XML file, just before
the closing </order> tag:

<prices>
<price productId="12345">$34.99</price>
<price productId="3232">$4.99</price>
</prices>

Next, add a prices element into the Order complexType declaration, after products:

<xsd:element name="prices" type="Prices" minOccurs="1" maxOccurs="1" />

Then, add the Prices and Price types at the bottom of the file, just before the closing </xsd:schema> tag:

<xsd:complexType name="Prices">
<xsd:sequence>
<xsd:element name="price" type="Price" minOccurs="1" maxOccurs="unbounded" />
</xsd:sequence>
</xsd:complexType>

<xsd:complexType name="Price">
<xsd:simpleContent>
<xsd:extension base="xsd:string">
<xsd:attribute name="productId" type="xsd:long" use="required" />
</xsd:extension>
</xsd:simpleContent>
</xsd:complexType>

Note that Price is defined by extending the built-in xsd:string type with an attribute: the element's text
content is a string, and the productId attribute is added on top of it.

Now we need to define the product's id attribute as a primary key. Simply change the unique element we
added to order earlier to read:

<xsd:key name="productIdKey">
<xsd:selector xpath="test:products/test:product" />
<xsd:field xpath="@id" />
</xsd:key>

The XPath expressions are the same as before. Like a unique constraint, a key requires the values of its
field(s) to be unique; in addition, every selected element must have a value for each field, whereas unique
simply ignores elements that lack one.
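The difference between key and unique only shows up when a field is missing, and in our schema id is already required. The sketch below therefore uses a separate, hypothetical minimal schema in which id is deliberately optional, and compiles it once with xsd:unique and once with xsd:key (again via javax.xml.validation; the class name is illustrative).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class KeyVsUniqueDemo {
    // Hypothetical schema: id is optional; %1$s becomes "unique" or "key"
    static final String XSD_TEMPLATE =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n"
        + "    xmlns:test='urn:nonstandard:test'\n"
        + "    elementFormDefault='qualified' targetNamespace='urn:nonstandard:test'>\n"
        + "  <xsd:element name='order'>\n"
        + "    <xsd:complexType><xsd:sequence>\n"
        + "      <xsd:element name='product' maxOccurs='unbounded'>\n"
        + "        <xsd:complexType>\n"
        + "          <xsd:attribute name='id' type='xsd:long'/>\n"
        + "        </xsd:complexType>\n"
        + "      </xsd:element>\n"
        + "    </xsd:sequence></xsd:complexType>\n"
        + "    <xsd:%1$s name='productIdConstraint'>\n"
        + "      <xsd:selector xpath='test:product'/>\n"
        + "      <xsd:field xpath='@id'/>\n"
        + "    </xsd:%1$s>\n"
        + "  </xsd:element>\n"
        + "</xsd:schema>\n";

    // The second product has no id at all
    static final String MISSING_ID =
        "<order xmlns='urn:nonstandard:test'><product id='12345'/><product/></order>";

    static boolean accepts(String constraintKind, String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(String.format(XSD_TEMPLATE, constraintKind))));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println(constraintKind + ": " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unique accepts missing id: " + accepts("unique", MISSING_ID));
        System.out.println("key accepts missing id: " + accepts("key", MISSING_ID));
    }
}
```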

Finally, we make the price's productId attribute a foreign key to the product's id attribute. Add a keyref so that
the order element becomes:

<xsd:element name="order" type="Order">

<xsd:key name="productIdKey">
<xsd:selector xpath="test:products/test:product" />
<xsd:field xpath="@id" />
</xsd:key>

<xsd:keyref name="productIdRef" refer="productIdKey">
<xsd:selector xpath="test:prices/test:price" />
<xsd:field xpath="@productId" />
</xsd:keyref>

</xsd:element>

The syntax should be familiar by now. With these constraints in place, validation fails if a price element's
productId does not match the id of any product in the order.
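Here is a sketch that validates two orders against the key and keyref, again via javax.xml.validation. It assumes a simplified, hypothetical Order type (a sequence of products followed by one prices element) and reuses the article's Product, Prices, and Price types; both id and productId are declared as xsd:long, as in the article. The class and constant names are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

public class KeyrefDemo {
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n"
        + "    xmlns='urn:nonstandard:test' xmlns:test='urn:nonstandard:test'\n"
        + "    elementFormDefault='qualified' targetNamespace='urn:nonstandard:test'>\n"
        + "  <xsd:element name='order' type='Order'>\n"
        + "    <xsd:key name='productIdKey'>\n"
        + "      <xsd:selector xpath='test:products/test:product'/>\n"
        + "      <xsd:field xpath='@id'/>\n"
        + "    </xsd:key>\n"
        + "    <xsd:keyref name='productIdRef' refer='productIdKey'>\n"
        + "      <xsd:selector xpath='test:prices/test:price'/>\n"
        + "      <xsd:field xpath='@productId'/>\n"
        + "    </xsd:keyref>\n"
        + "  </xsd:element>\n"
        + "  <xsd:complexType name='Order'><xsd:sequence>\n"
        + "    <xsd:element name='products' type='Products' maxOccurs='unbounded'/>\n"
        + "    <xsd:element name='prices' type='Prices'/>\n"
        + "  </xsd:sequence></xsd:complexType>\n"
        + "  <xsd:complexType name='Products'><xsd:sequence>\n"
        + "    <xsd:element name='product' type='Product' maxOccurs='unbounded'/>\n"
        + "  </xsd:sequence></xsd:complexType>\n"
        + "  <xsd:complexType name='Product'>\n"
        + "    <xsd:attribute name='id' type='xsd:long' use='required'/>\n"
        + "    <xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>\n"
        + "  </xsd:complexType>\n"
        + "  <xsd:complexType name='Prices'><xsd:sequence>\n"
        + "    <xsd:element name='price' type='Price' maxOccurs='unbounded'/>\n"
        + "  </xsd:sequence></xsd:complexType>\n"
        + "  <xsd:complexType name='Price'><xsd:simpleContent>\n"
        + "    <xsd:extension base='xsd:string'>\n"
        + "      <xsd:attribute name='productId' type='xsd:long' use='required'/>\n"
        + "    </xsd:extension>\n"
        + "  </xsd:simpleContent></xsd:complexType>\n"
        + "</xsd:schema>\n";

    static final String MATCHING_PRICES =
        "<order xmlns='urn:nonstandard:test'>"
      + "<products><product id='12345' quantity='1'/><product id='3232' quantity='2'/></products>"
      + "<prices><price productId='12345'>$34.99</price><price productId='3232'>$4.99</price></prices>"
      + "</order>";
    // 9999 is not the id of any product in the order
    static final String DANGLING_PRICE =
        MATCHING_PRICES.replace("productId='3232'", "productId='9999'");

    static boolean isValid(String xml) {
        Schema schema;
        try {
            schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile", e);
        }
        try {
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            System.out.println("Invalid: " + e.getMessage());
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("matching prices valid: " + isValid(MATCHING_PRICES));
        System.out.println("dangling price valid: " + isValid(DANGLING_PRICE));
    }
}
```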

As we have seen, XML Schema provides a powerful way of putting constraints on data. However, there are
other ways of ensuring data integrity, and schema constraints are not always the best choice.

Limits of schema enforcement


Data constraints are appropriate for enforcing a contract on data that reaches our application from
persistent storage or over the wire, for example from a supplier or an automated service. They are not
appropriate for validating data received from a user. Schema validation error messages are not intended for
end users, and even if we chose to display them in a GUI, we would have no straightforward way of relating
an error back to the field that caused it. Schema validation is no substitute for application validation.
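To illustrate what schema validation actually reports, this sketch installs an org.xml.sax.ErrorHandler that collects every error instead of stopping at the first one, then prints the line, column, and message of each. The schema and class name are hypothetical. The output is parser-oriented (positions in the document plus messages that typically start with cvc- codes), which is why such messages are hard to map back onto the fields of a form.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class ErrorReportDemo {
    // Hypothetical minimal schema: an order is a list of products
    static final String XSD =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'\n"
        + "    elementFormDefault='qualified' targetNamespace='urn:nonstandard:test'>\n"
        + "  <xsd:element name='order'>\n"
        + "    <xsd:complexType><xsd:sequence>\n"
        + "      <xsd:element name='product' maxOccurs='unbounded'>\n"
        + "        <xsd:complexType>\n"
        + "          <xsd:attribute name='id' type='xsd:long' use='required'/>\n"
        + "          <xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>\n"
        + "        </xsd:complexType>\n"
        + "      </xsd:element>\n"
        + "    </xsd:sequence></xsd:complexType>\n"
        + "  </xsd:element>\n"
        + "</xsd:schema>\n";

    // Two bad products: quantity 0 on line 2, a non-numeric id on line 3
    static final String SAMPLE =
          "<order xmlns='urn:nonstandard:test'>\n"
        + "  <product id='12345' quantity='0'/>\n"
        + "  <product id='abc' quantity='1'/>\n"
        + "</order>\n";

    static List<SAXParseException> collectErrors(String xml) {
        List<SAXParseException> errors = new ArrayList<>();
        try {
            Validator validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)))
                .newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
                // Returning normally from error() lets validation continue
                @Override public void error(SAXParseException e) { errors.add(e); }
                // Well-formedness errors are fatal: stop by rethrowing
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            throw new IllegalStateException("schema did not compile or document is not well-formed", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        for (SAXParseException e : collectErrors(SAMPLE)) {
            // A position and a parser message are all we get; no application field name
            System.out.printf("line %d, column %d: %s%n",
                e.getLineNumber(), e.getColumnNumber(), e.getMessage());
        }
    }
}
```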

Conclusion


XML Schema is an easy-to-learn and powerful way of describing XML data. The next time you need an XML
document, sketch how it should look, then write a schema and have your code validate documents against it.
If you run into difficulties with XML Schema, they are more likely to involve XML Namespaces than the
schema syntax itself.

To learn more about XML Schema, the XML Schema Recommendation is a good place to start: it is a
complete yet digestible reference.