XML

System Integration
Topic 3
Module SE-C-03
XML
Supported by:
Joint MSc curriculum in software engineering European Union TEMPUS Project CD_JEP-18035-2003
Version: April 28, 2006
A bit of History
1989 - "Information Management: A Proposal" is written and circulated by Tim Berners-Lee of CERN (European Laboratory for Particle Physics). He proposes a hypertext system including HTML and HTTP.
1990 - Berners-Lee's proposal is reformulated and the name World Wide Web (Web, WWW) is coined.
1993 - Marc Andreessen releases the alpha version of Mosaic.
1996 - The Web becomes widely used.
2002 - More than 30,000,000 web servers.
SE-C-05 System Integration
Pre-XML: HTML
Problems with HTML
Primarily concerned with presentation
Hard to derive meaning from the markup
Fixed tag set
Static
Web browsers were being viewed as potential application platforms
Pre-XML: SGML
SGML - Standard Generalized Markup Language
Working standards draft: 1980
Allows text editing, formatting, and information retrieval systems to share documents
Did not become widely used
  General consensus: too heavyweight
HTML is an application of SGML; XML is a simplified subset of SGML
What Is XML?
XML stands for Extensible Markup Language (often written as eXtensible Markup Language to justify the acronym).
Goal: combine the power of SGML (extensibility) with the simplicity of HTML.
1998: XML 1.0 standard published.
XML is a set of rules for defining semantic tags that break a document into parts and identify the different parts of the document.
It is a meta-markup language that defines a syntax used to define other domain-specific, semantic, structured markup languages.
Its value as a data interchange language quickly became evident.
HTML - example
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
  <HEAD>
    <TITLE>Beginning ASP 3.0</TITLE>
  </HEAD>
  <BODY>
    <B>Beginning ASP 3.0</B>
    <H3>ISBN 1-861003-38-2</H3>
    <H4>Authors</H4>
    <H4>Brian Francis, Chris Ullman, Dave Sussman, John Kauffman</H4>
    <P>US $49.99<BR>
    <P>ASP is an advanced technique for dynamically creating Web site content.</P>
  </BODY>
</HTML>
XML - example
<?xml version="1.0"?>
<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <ISBN>ISBN 1-861003-38-2</ISBN>
    <authors>
      <author_name>Brian Francis</author_name>
      <author_name>Chris Ullman</author_name>
      <author_name>Dave Sussman</author_name>
      <author_name>John Kauffman</author_name>
    </authors>
    <description>Server side scripting technologies</description>
    <price>US $49.99</price>
  </book>
</books>
XML is a Meta-Markup Language

The first thing you need to understand about XML is that it isn't just another markup language like the Hypertext Markup Language (HTML). Languages such as HTML define a fixed set of tags that describe a fixed number of elements.
XML, however, is a meta-markup language. It's a language in which you make up the tags you need as you go along.
These tags must be organized according to certain general principles, but they're quite flexible in their meaning. You don't have to force your data to fit into paragraphs, list items, strong emphasis, or other very general categories.
XML Describes Structure and Meaning, Not Formatting

XML markup describes a document's structure and meaning. It does not describe the formatting of the elements on the page. Formatting can be added to a document with a style sheet. The document itself only contains tags that say what is in the document, not what the document looks like. By contrast, HTML encompasses formatting, structural, and semantic markup. <B> is a formatting tag that makes its content bold. <AUTHORS> is a semantic tag that means its contents are the names of the authors of the book. <P> is a structural tag that indicates the beginning of a new paragraph. In fact, some tags can have all three kinds of meaning. An <H1> tag can simultaneously mean 20 point Helvetica bold, a level-1 heading, and the title of the page.
Advantages of XML
Instead of generic tags like <dt> and <li>, the XML listing uses meaningful tags like <books>, <book>, <authors>, and <ISBN>. This has a number of advantages, including that it's easier for a human to read the source code and determine what the author intended. XML markup also makes it easier for automated robots to locate all of the books in the document. In HTML, robots can't tell more than that an element is a dt. They cannot determine whether that dt represents a song title, a definition, or just some designer's favorite means of indenting text. In fact, a single document may well contain dt elements with all three meanings.
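The point about robots locating books can be sketched in a few lines of Python. This is only an illustration using the standard xml.etree.ElementTree module; the CATALOG text and the book_titles helper are hypothetical and not part of the original slides.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog modeled on the book listing; the second entry is made up.
CATALOG = """<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <price>US $49.99</price>
  </book>
  <book>
    <title>Another Book</title>
    <price>US $10.00</price>
  </book>
</books>"""

def book_titles(xml_text):
    """Return the title of every <book> element found anywhere in the document."""
    root = ET.fromstring(xml_text)
    # iter("book") visits every element named "book", at any depth.
    return [book.findtext("title").strip() for book in root.iter("book")]

print(book_titles(CATALOG))
```

Because the tag names carry the meaning, the program needs no guesswork about which elements are books.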
Design of Domain-Specific Markup Languages

XML allows various professions (e.g., music, chemistry, math) to develop their own domain-specific markup languages. This allows individuals in the field to trade notes, data, and information without worrying about whether or not the person on the receiving end has the particular proprietary payware that was used to create the data. You may not be interested in electrical engineering diagrams, but electrical engineers are. You may not need to include sheet music in your Web pages, but composers do. XML lets the electrical engineers describe their circuits and the composers notate their scores, mostly without stepping on each other's toes. Neither field will need special support from the browser manufacturers or complicated plug-ins, as is true today.
Self-Describing Data
At a higher level, XML is self-describing. Suppose you're an information archaeologist in the 23rd century and you encounter this chunk of XML code on an old floppy disk that has survived the ravages of time:
<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
Plain Text
Since XML is not a binary format, you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs, and makes it useful for storing small amounts of data. An XML front end to a database makes it possible to efficiently store large amounts of XML data as well. So XML provides scalability for anything from small configuration files to a company-wide data repository.
Interchange of Data Among Applications

Since XML is non-proprietary and easy to read and write, it's an excellent format for the interchange of data among different applications. It has been designed to be extremely powerful, while at the same time being easy for both human beings and computer programs to read and write. Thus it's an obvious choice for exchange languages. XML is an incredibly simple, well-documented, straightforward data format. Not just the data, but also the markup is text, and it's present right there in the XML file as tags. You can read the tag names directly to find out exactly what is in the document. All the important details about the structure of the document are explicit. You don't have to reverse-engineer the format or rely on incomplete and often unavailable documentation.
Data Identification
XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. Because the different parts of the information have been identified, they can be used in different ways by different applications.
Structured and Integrated Data

XML is ideal for large and complex documents because the data is structured. It not only lets you specify a vocabulary that defines the elements in the document; it also lets you specify the relations between elements. XML also provides a client-side include mechanism that integrates data from multiple sources and displays it as a single document. The data can even be rearranged on the fly. Parts of it can be shown or hidden depending on user actions. This is extremely useful when you're working with large information repositories like relational databases.
Hierarchical
XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents. They are also easier to rearrange, because each piece is delimited. In a document, for example, you could move a heading to a new location and drag everything under it along with the heading, instead of having to page down to make a selection, cut, and then paste the selection into a new location.
How XML Works

<?xml version="1.0"?>
<library>
  <cd>
    <title>Just Singin' Along</title>
    <artist>The Happy Guys</artist>
    <description>
      A lovely collection of songs that the whole family can sing right along with.
    </description>
    <song><title>I'm Really Fine</title></song>
    <song><title>Can't Stop Grinnin'</title></song>
    <song><title>I'm Really Fine</title></song>
    <purchase_date>2/23/1959</purchase_date>
  </cd>
</library>
How XML Works

This document is text and might well be stored in a text file. You can edit this file with any standard text editor. You do not need a special XML editor. Programs that actually try to understand the contents of the XML document (that is, do more than merely treat it as any other text file) will use an XML parser to read the document. The parser is responsible for dividing the document into individual elements, attributes, and other pieces. It passes the contents of the XML document to an application piece by piece. If at any point the parser detects a violation of the well-formedness rules of XML, then it reports the error to the application and stops parsing.
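A minimal sketch of the parser/application split, assuming Python's standard xml.etree.ElementTree parser. The parse_or_report helper and the two sample strings are hypothetical; the exact wording of the error message depends on the underlying parser.

```python
import xml.etree.ElementTree as ET

def parse_or_report(xml_text):
    """Return (root, None) on success, or (None, message) on a well-formedness error."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        # The parser stops at the first violation and reports it to the caller.
        return None, str(err)

good = "<cd><title>Just Singin' Along</title><artist>The Happy Guys</artist></cd>"
bad = "<cd><title>Just Singin' Along</cd>"  # <title> is never closed

root, error = parse_or_report(good)
print([child.tag for child in root])

root, error = parse_or_report(bad)
print("parser reported:", error)
```

The application only ever sees elements and text handed over by the parser, or an error; it never has to scan angle brackets itself.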
How XML Works

Individual XML applications normally dictate more precise rules about exactly which elements and attributes are allowed where. Some of these rules can be precisely specified with a schema written in any of several languages including the W3C XML Schema Language, RELAX NG, and DTDs. A document may contain a URI indicating where the schema can be found. Some XML parsers will notice this and compare the document to its schema as they read it to see if the document satisfies the constraints specified there. Such a parser is called a validating parser.
How XML Works

A violation of those constraints is called a validity error, and the whole process of checking a document against a schema is called validation. If a validating parser finds a validity error, it will report it to the application on whose behalf it's parsing the document. This application can then decide whether it wishes to continue parsing the document. However, validity errors are not necessarily fatal (unlike well-formedness errors), and an application may choose to ignore them. Not all parsers are validating parsers. Some merely check for well-formedness.
The application that receives data from the parser

A web browser such as Internet Explorer or Mozilla Firefox that displays the document to a reader.
A database such as Microsoft SQL Server that stores the XML data in a new record.
A program that you yourself wrote in Java, C, Python or some other language that does exactly what you want it to do.
A drawing program such as Adobe Illustrator that interprets the XML as two-dimensional coordinates for the contents of a picture.
A word processor such as StarOffice Writer that loads the XML document for editing.
The application that receives data from the parser

A spreadsheet such as Gnumeric that parses the XML to find numbers and functions used in a calculation.
A personal finance program such as Microsoft Money that sees the XML as a bank statement.
A syndication program that reads the XML document and extracts the headlines for today's news.
Almost anything else.
Usage of XML
Earlier ways of querying
[Figure labels: XMLHTTP; XML text file; transformation into pure HTML; ADO 2.1 file; transformation into HTML with data islands; transformation into an arbitrary format; ADO 2.1 ODBC call]
Usage of XML
[Figure labels: HTML view #1; HTML view #2; XML; Server; XML received from other application; XML Transport (HTTP); Web server; MF; DB]
Basic XML technologies

DTD (Document Type Definition)
XML Schemas
XSLT
XML DOM (XML Document Object Model)
SAX (Simple API for XML)
Fundamental concepts
XML Documents
The precise meaning of "XML document" is defined by the XML 1.0 specification published by the World Wide Web Consortium (W3C). This specification provides a detailed BNF grammar defining exactly what is and is not an XML document. Anything that satisfies the document production in that BNF grammar and adheres to the fifteen well-formedness constraints is an XML document. Anything that does not is not an XML document. Well-formedness is the minimum requirement for an XML document. A document that is not well-formed is not an XML document. Parsers cannot read it.
Elements and attributes

Element: <tag>content</tag>
  Basic unit
  Tag name defines what the content is
  Opening and closing tags enclose content
Attribute: information about the data
  Attribute names are usually adjectives
  Stored as attribute="value" pairs:
  <tag attribute="value"> content </tag>
Simple XML document

<person> Alan Turing </person>
More complex XML document

<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
Rules for well-formed XML

Elements that contain data must have start and end tags
Empty tags must be closed
  <br/> or <br></br>
Elements must not overlap
  Bad nesting:
  <trunk> <branch> </trunk> </branch>
All attribute values must be wrapped in quotes
  <a href="newpage.html">
XML is case sensitive: <TAG> and <Tag> are treated differently. (Convention: use lower case.)
More Rules
A document typically begins with:
  an XML declaration: <?xml version="1.0" encoding="UTF-8"?>
  and a document type declaration: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
Both are optional in XML 1.0.
The root element follows and encloses the entire content of the document:
  <book> everything else </book>
An XML document is a tree

There are roughly five different kinds of nodes in an XML tree:
root: Also known as the document node, this is the abstract node that contains the entire XML document. Its children include comments, processing instructions, and the root element of the document.
element: An XML element with a name, a list of attributes, a list of in-scope namespaces, and a list of children.
text: The parsed character data between two tags (or any other kind of non-text node).
comment: An XML comment such as <!-- This is a comment -->. The contents of the comment are its data. A comment does not have any children.
processing instruction: A processing instruction such as <?xml-stylesheet type="text/css" href="order.css"?>. A processing instruction has a target and a value. It does not have any children.
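The node kinds listed above can be observed with Python's standard xml.dom.minidom module. This is a sketch only; the DOC sample and the describe helper are hypothetical names introduced for illustration.

```python
from xml.dom import minidom, Node

# Hypothetical document containing one node of each kind discussed above.
DOC = """<?xml-stylesheet type="text/css" href="order.css"?>
<!-- an order -->
<order id="42">Widget</order>"""

KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}

def describe(node, depth=0, out=None):
    """Walk the tree depth-first and collect (depth, kind) pairs."""
    out = [] if out is None else out
    out.append((depth, KIND.get(node.nodeType, "other")))
    for child in node.childNodes:
        describe(child, depth + 1, out)
    return out

for depth, kind in describe(minidom.parseString(DOC)):
    print("  " * depth + kind)
```

Note that the id attribute does not appear in the walk: in the DOM, attributes belong to their element but are not its children.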
A tree diagram
person
  name
    first_name: Alan
    last_name: Turing
  profession: computer scientist
  profession: mathematician
  profession: cryptographer
Narrative documents
XML can also be used for more free-form, narrative documents such as business reports, magazine articles, student essays, short stories, web pages, and so forth, as shown by the following example:
<biography>
  <name> <first_name>Alan</first_name> <last_name>Turing</last_name> </name>
  was one of the first people to truly deserve the name
  <emphasize>computer scientist</emphasize>. Although his contributions to the
  field are too numerous to list, his best-known are the eponymous
  <emphasize>Turing Test</emphasize> and <emphasize>Turing Machine</emphasize>.
  <definition>The <term>Turing Test</term> is to this day the standard test for
  determining whether a computer is truly intelligent. This test has yet to be
  passed. </definition>
  <definition>The <term>Turing Machine</term> is an abstract finite state
  automaton with infinite memory that can be proven equivalent to any other
  finite state automaton with arbitrarily large memory. Thus what is true for a
  Turing machine is true for all equivalent machines no matter how implemented.
  </definition>
  <name> <last_name>Turing</last_name> </name> was also an accomplished
  <profession>mathematician</profession> and
  <profession>cryptographer</profession>. His assistance was crucial in helping
  the Allies decode the German Enigma machine. He committed suicide on
  <date> <month>June</month> <day>7</day>, <year>1954</year> </date>
  after being convicted of homosexuality and forced to take female hormone
  injections.
</biography>
XML Applications
XML applications limit the very flexible rules of XML to a finite set of elements of certain types. For example, DocBook is an XML application designed for technical manuscripts. Elements it defines include book, chapter, para, sect1, sect2, programlisting, and several hundred others. When writing a DocBook document, you have to use these elements, and you have to use them in certain ways. For instance, a sect2 element can be a child of a sect1 but not a child of a sect3 or a chapter.
Scalable Vector Graphics (SVG) is an XML application for line art. Elements it defines include line, circle, ellipse, polygon, polyline, and so forth. All SVG documents are XML documents, but not all XML documents are SVG documents.
An XML application can have a schema that defines what is and is not a legal document for that application. Schemas can be written in a variety of languages including Document Type Definitions (DTDs), the W3C XML Schema Language, RELAX NG, Schematron, and numerous others.
Elements and Tags

The fundamental unit of XML is the element.
Logically, every element has four key pieces:
  A name
  The attributes of the element
  The namespaces in scope on the element
  The content of the element
Elements and Tags

Syntactically, in the text form of an XML document, elements are delimited by tags. Everything in between the two tags is the content of the element.
<Quantity>12</Quantity>
An element can also contain one or more child elements:
<ShipTo>
  <Street>135 Airline Highway</Street>
  <City>Narragansett</City>
  <State>RI</State>
  <Zip>02882</Zip>
</ShipTo>
Elements and Tags

An element can also have mixed content:
<ShipTo>
  Chez Fred
  <Street>135 Airline Highway</Street>
  Apt. 17D
  <City>Narragansett</City>
  <State>RI</State>
  <Zip>02882</Zip>
</ShipTo>
Attributes
Attributes are name-value pairs associated with elements:
<Subtotal currency='USD'> 393.85 </Subtotal>
Attributes are unordered. There is no difference between these two elements:
<Tax rate="7.0" currency="USD">27.57</Tax>
<Tax currency="USD" rate="7.0">27.57</Tax>
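A quick check of the "attributes are unordered" rule, sketched with Python's standard xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

first = ET.fromstring('<Tax rate="7.0" currency="USD">27.57</Tax>')
second = ET.fromstring('<Tax currency="USD" rate="7.0">27.57</Tax>')

# Attributes are exposed as a mapping from name to value, so the order
# in which they were written in the source text plays no role in comparison.
same = (
    first.tag == second.tag
    and first.attrib == second.attrib
    and first.text == second.text
)
print(same)
print(first.get("rate"), first.get("currency"))
```

Applications should therefore look attributes up by name and never rely on their position in the start-tag.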
Attributes
<person>
<name first="Alan" last="Turing"/> <profession value="computer scientist"/> <profession value="mathematician"/> <profession value="cryptographer"/>
</person>
XML-data model
[Figure labels: Document; Attributes; Element; Value of the element]
An XML document is transformed into a tree with elements as nodes and the values of the elements as leaves.
XML Example
<car type="auto" year="2001">
  <producer> Opel </producer>
  <model> Astra </model>
  <price/>
</car>
XML Example - Tree
car
  type: auto (attribute)
  year: 2001 (attribute)
  producer: Opel
  model: Astra
  price
Attributes in narrative documents

<biography xmlns:xlink="http://www.w3.org/1999/xlink/namespace/">
  <image source="http://www.turing.org.uk/turing/pi1/bus.jpg" width="152" height="345"/>
  <person born='1912-06-23' died='1954-06-07'>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </person>
  was one of the first people to truly deserve the name
  <emphasize>computer scientist</emphasize>. Although his contributions to the
  field were too numerous to list, his best-known are the eponymous
  <emphasize xlink:type="simple" xlink:href="http://cogsci.ucsd.edu/~asaygin/tt/ttest.html">Turing Test</emphasize>
  and
  <emphasize xlink:type="simple" xlink:href="http://mathworld.wolfram.com/TuringMachine.html">Turing Machine</emphasize>.
  <last_name>Turing</last_name> was also an accomplished
  <profession>mathematician</profession> and
  <profession>cryptographer</profession>. His assistance was crucial in ...
XML Declaration
Most XML documents begin with an XML declaration
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
Comments

Processing Instructions
Processing instructions are used to tell particular software how it should handle an XML document after the document has been parsed. Generally, processing instructions are used for metainformation that may apply to documents from many different domains and XML vocabularies. For instance, the most common processing instruction, xml-stylesheet, tells a browser or other formatter where it can find the stylesheet it should apply to the document.
<?xml-stylesheet type="text/xml" href="limited.xsl"?>
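How an application might read such a processing instruction can be sketched with Python's standard xml.dom.minidom module. The <catalog/> root element and the processing_instructions helper are hypothetical; only the xml-stylesheet instruction itself comes from the slide.

```python
from xml.dom import minidom, Node

# The processing instruction from the slide, followed by a made-up root element.
DOC = '<?xml-stylesheet type="text/xml" href="limited.xsl"?><catalog/>'

def processing_instructions(xml_text):
    """Return (target, data) for each processing instruction before or after the root."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]

print(processing_instructions(DOC))
```

The parser hands over the target and an uninterpreted data string; making sense of type and href inside that string is up to the software the instruction is addressed to.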

Processing Instructions
<?php
mysql_connect("database.unc.edu", "clerk", "password");
$result = mysql("HR", "SELECT LastName, FirstName FROM Employees ORDER BY LastName, FirstName");
$i = 0;
while ($i < mysql_numrows($result)) {
    $fields = mysql_fetch_row($result);
    echo "<person>$fields[1] $fields[0] </person>\r\n";
    $i++;
}
mysql_close();
?>
Checking Documents for Well-Formedness

Every start-tag must have a matching end-tag.
Elements may nest, but may not overlap.
There must be exactly one root element.
Attribute values must be quoted.
An element may not have two attributes with the same name.
Comments and processing instructions may not appear inside tags.
No unescaped < or & signs may occur in the character data of an element or attribute.
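Most of the rules above can be exercised directly against a parser. The following is a sketch using Python's standard xml.etree.ElementTree module; the is_well_formed helper and the VIOLATIONS samples are hypothetical and written only for this illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on any well-formedness error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# One deliberately broken sample per rule.
VIOLATIONS = {
    "missing end-tag": "<a><b></a>",
    "overlapping elements": "<a><b><c></b></c></a>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a x=1/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>Tom & Jerry</a>",
    "unescaped less-than": "<a>1 < 2</a>",
}

for rule, sample in VIOLATIONS.items():
    print(rule, "->", is_well_formed(sample))

# The escaped form of the same character data is accepted.
print(is_well_formed("<a>Tom &amp; Jerry, 1 &lt; 2</a>"))
```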
DTD (Document Type Definition)
DTD and validation

DTDs are written in a formal syntax that explains precisely which elements and entities may appear where in the document and what the elements' contents and attributes are. A validating parser compares a document to its DTD and lists any places where the document differs from the constraints specified in the DTD. The program can then decide what it wants to do about any violations. Some programs may reject the document. Others may try to fix the document or reject just the invalid element. Validation is an optional step in processing XML. A validity error is not necessarily a fatal error like a well-formedness error, though some applications may choose to treat it as one.
A Simple DTD Example

<!ELEMENT person (name, profession*)>
<!ELEMENT name (first_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
Valid person element

<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
</person>
Not valid person element

<person>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
Not valid because the required name child element is missing.
Not valid person element

<person>
  <profession>computer scientist</profession>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
Not valid because a profession element appears before the name element.
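The content model person (name, profession*) reads much like a regular expression over child element names, and it can be imitated that way. This is a sketch only and not a real DTD validator: it checks just the direct children of person, and PERSON_MODEL and matches_person_model are hypothetical names.

```python
import re
import xml.etree.ElementTree as ET

# (name, profession*) expressed over a space-terminated list of child names.
PERSON_MODEL = re.compile(r"(name )(profession )*")

def matches_person_model(xml_text):
    """Check only the order and count of the direct children of <person>."""
    person = ET.fromstring(xml_text)
    child_names = "".join(child.tag + " " for child in person)
    return PERSON_MODEL.fullmatch(child_names) is not None

VALID = "<person><name/></person>"
NO_NAME = "<person><profession/><profession/></person>"
WRONG_ORDER = "<person><profession/><name/><profession/></person>"

print(matches_person_model(VALID))
print(matches_person_model(NO_NAME))
print(matches_person_model(WRONG_ORDER))
```

A validating parser applies the same idea to every declared element, and additionally checks attributes and entity references.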
An alternate DTD for the person element

<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT name (first_name, last_name)>
<!ELEMENT person (name, profession*)>
The Document Type Declaration

<!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd">
A valid person document

<?xml version="1.0" standalone="no"?>
<!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd">
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
A valid person document with an internal DTD

<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT name (first_name, last_name)>
  <!ELEMENT person (name, profession*)>
]>
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
Internal DTD subset

<!DOCTYPE person SYSTEM "name.dtd" [
<!ELEMENT profession (#PCDATA)>

<!ELEMENT person (name, profession*)>
]>
Element Declarations
Basic form of an element declaration:
  <!ELEMENT element_name content_specification>
An element that contains parsed character data but no child elements of any type:
  <!ELEMENT phone_number (#PCDATA)>
An element with one child element:
  <!ELEMENT fax (phone_number)>
An element with a sequence of child elements:
  <!ELEMENT name (first_name, last_name)>
The Number of Children

Not all instances of a given element necessarily have exactly the same children.
Possible suffixes:
  ? - Zero or one of the element is allowed.
  * - Zero or more of the element is allowed.
  + - One or more of the element is required.
Example:
  <!ELEMENT name (first_name, middle_name?, last_name?)>
This declaration says that a name element must contain a first_name, may or may not contain a middle_name, and may or may not contain a last_name.
Choices
Sometimes one instance of an element may contain one kind of child, and another instance may contain a different child. This can be indicated with a choice.
Examples:
<!ELEMENT methodResponse (params | fault)>
<!ELEMENT digit (zero | one | two | three | four | five | six | seven | eight | nine)>
<!ELEMENT circle (center, (radius | diameter))>
<!ELEMENT name (last_name | (first_name, ((middle_name+, last_name) | (last_name?))))>
Mixed Content
Examples:
<!ELEMENT definition (#PCDATA | term)*>
<!ELEMENT paragraph (#PCDATA | name | profession | footnote | emphasize | date)*>
Empty Element
<!ELEMENT image EMPTY>
Valid examples:
  <image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus" />
  <image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus"></image>
Not valid example (the whitespace between the tags counts as content):
  <image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus"> </image>
ANY
<!ELEMENT page ANY>
This declaration says that a page element can contain any content including mixed content, child elements, and even other page elements. The children that actually appear in the page elements' content in the document must still be declared in element declarations of their own. ANY does not allow you to use undeclared elements.
Attribute Declarations
As well as declaring its elements, a valid document must declare all the elements' attributes. This is done with ATTLIST declarations. A single ATTLIST can declare multiple attributes for a single element type.
Examples:
<!ATTLIST image source CDATA #REQUIRED>
<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED>
Attribute Types
CDATA
NMTOKEN
NMTOKENS
Enumeration
ENTITY
ENTITIES
ID
IDREF
IDREFS
NOTATION
Default declaration for that attribute

#IMPLIED: The attribute is optional. Each instance of the element may or may not provide a value for the attribute. No default value is provided.
#REQUIRED: The attribute is required. Each instance of the element must provide a value for the attribute. No default value is provided.
#FIXED: The attribute value is constant and immutable. This attribute has the specified value regardless of whether the attribute is explicitly noted on an individual instance of the element. If it is included, though, it must have the specified value.
Literal: The actual default value is given as a quoted string.
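The four kinds of default can be modeled in a few lines. The sketch below is a simplified, hypothetical model in Python and not a DTD processor: the DECLARED table, the version and units attributes, and the apply_defaults function are all invented for illustration.

```python
REQUIRED, IMPLIED = "#REQUIRED", "#IMPLIED"

# Hypothetical declarations: name -> REQUIRED, IMPLIED, ("#FIXED", value) or a literal default.
DECLARED = {
    "source": REQUIRED,
    "alt": IMPLIED,
    "version": ("#FIXED", "1.0"),
    "units": "px",
}

def apply_defaults(declared, given):
    """Return the attributes an application would see after defaulting."""
    result = dict(given)
    for name, default in declared.items():
        if default == REQUIRED:
            if name not in given:
                raise ValueError(f"missing required attribute {name!r}")
        elif default == IMPLIED:
            continue  # optional, and no default value is supplied
        elif isinstance(default, tuple):  # ("#FIXED", value)
            fixed = default[1]
            if given.get(name, fixed) != fixed:
                raise ValueError(f"attribute {name!r} must be {fixed!r}")
            result[name] = fixed
        else:
            result.setdefault(name, default)  # literal default value
    return result

print(apply_defaults(DECLARED, {"source": "bus.jpg"}))
```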
XML schema
XML schema
An XML schema is an XML document containing a formal description of what comprises a valid XML document. A W3C XML Schema Language schema is an XML schema written in the particular syntax recommended by the W3C.
Schemas Versus DTDs

DTDs provide the capability to do basic validation of the following items in XML documents:
  Element nesting
  Element occurrence constraints
  Permitted attributes
  Attribute types and default values
However, DTDs do not provide fine control over the format and data types of element and attribute values.
Schemas Versus DTDs

The W3C XML Schema standard includes the following features:
  Simple and complex data types
  Type derivation and inheritance
  Element occurrence constraints
  Namespace-aware element and attribute declarations
Schema Basics
The following example shows a very simple well-formed XML document.
Example: addressdoc.xml
<?xml version="1.0"?>
<fullName>Scott Means</fullName>
Assuming that the fullName element can only contain a simple string value, the schema for this document would look like:
Example: address-schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string"/>
</xs:schema>
Example with schema reference

It is also common to associate the sample instance document explicitly with the schema document. Since the fullName element is not in any namespace, the xsi:noNamespaceSchemaLocation attribute is used as:
<?xml version="1.0"?>
<fullName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="address-schema.xsd">
  Scott Means
</fullName>
Processing of XML documents
Common XML Processing Models

XML's structured and labeled text can be processed by developers in several ways. Programs can look at XML as:
  text,
  a stream of events,
  a tree,
  a serialization of some other structure.
Tools supporting all of these options are widely available.
Treating XML as Text

At their foundation, XML documents are text. The content and markup are both represented as text, and text-editing tools can be extremely useful for XML document inspection, creation, and modification. Textual tools are a key part of the XML toolset. Many developers use text editors such as vi, Emacs, NotePad, WordPad, BBEdit, and UltraEdit to create or modify XML documents. Regular expressions in environments such as sed, grep, Perl, and Python can be used for search and replace or for tweaking documents prior to XML parsing or XSLT processing. These tools can also be very useful for searching and querying the information in XML documents, even without an understanding of the surrounding structure.
Treating XML as Events

As an XML parser reads a document, it moves from the beginning of the document to the end. Event-based parsers report this reading as it happens, in a stream of events representing the information in the document. The "events" are, for example, the start of an element, the content of an element, and the end of an element. For example, given this document:
<name>
  <given>Keith</given>
  <family>Johnson</family>
</name>
An event-based parser might report events such as this:
startElement:name
startElement:given
content:Keith
endElement:given
startElement:family
content:Johnson
endElement:family
endElement:name
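The event stream above can be reproduced with Python's standard xml.sax module. This is a sketch: EventLogger is a hypothetical handler written for this illustration, and it skips whitespace-only character data so that the log matches the listing.

```python
import xml.sax

class EventLogger(xml.sax.ContentHandler):
    """Record each parsing event as a string, in the order it is reported."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("startElement:" + name)

    def characters(self, content):
        # Ignore whitespace-only text between elements.
        if content.strip():
            self.events.append("content:" + content.strip())

    def endElement(self, name):
        self.events.append("endElement:" + name)

DOC = b"<name><given>Keith</given><family>Johnson</family></name>"

logger = EventLogger()
xml.sax.parseString(DOC, logger)
for event in logger.events:
    print(event)
```

The handler never holds the whole document; it sees one event at a time, which is what keeps memory use low for large inputs.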
Treating XML as Events

Event-based parsers are very useful for a wide variety of tasks.
Filters can process and modify events before passing them to another processor, efficiently performing a wide range of transformations.
Filters can be stacked, providing a relatively simple means of building XML processing pipelines, where the information from one processor flows directly into another.
Applications that want to feed information directly from XML documents into their own internal structures may find events to be the most efficient means of doing that.
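A filter of the kind described above can be sketched with XMLFilterBase from Python's standard xml.sax.saxutils module. RenameFilter, TagCollector, and renamed_tags are hypothetical names; the renaming itself is just one example of a transformation.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLFilterBase

class RenameFilter(XMLFilterBase):
    """Rename elements according to a mapping, then pass the event downstream."""

    def __init__(self, parent, mapping):
        super().__init__(parent)
        self._mapping = mapping

    def startElement(self, name, attrs):
        super().startElement(self._mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self._mapping.get(name, name))

class TagCollector(xml.sax.ContentHandler):
    """Downstream consumer: remembers the element names it is told about."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)

def renamed_tags(xml_bytes, mapping):
    collector = TagCollector()
    # Pipeline: parser -> RenameFilter -> TagCollector
    pipeline = RenameFilter(xml.sax.make_parser(), mapping)
    pipeline.setContentHandler(collector)
    pipeline.parse(io.BytesIO(xml_bytes))
    return collector.tags

DOC = b"<name><given>Keith</given><family>Johnson</family></name>"
print(renamed_tags(DOC, {"given": "first_name", "family": "last_name"}))
```

Because a filter is itself a reader, another filter can take it as its parent, which is how longer pipelines are stacked.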
Treating XML as Tree Models

XML documents, because of the requirements for wellformedness, describe tree structures. Documents typically contain an element that then contains text, attributes, and other elements, and these may contain elements, text, and attributes, and so on. Declarations, comments, and processing instructions enrich the mix, but all basically hold positions in the overall tree.
The Document Object Model (DOM), is the most common treebased API. JDOM and DOM4J are Java-only alternatives.
Treating XML as Tree Models

Working with a tree model of a document isn't very different conceptually from working with a document as text. The entire document is always available, and moving around well-formed portions of a document or modifying them is fairly easy. The complete set of context for any given part of the document is always available. Developers can use XPath expressions to locate content and make decisions based on content anywhere in the document where APIs support XPath. (DOM Level 3 adds formal support for XPath, and various implementations provide their own support.)
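A minimal sketch of locating content with XPath over a DOM tree, using the javax.xml.xpath package of the JDK. The class name XPathLookup and its method are hypothetical; the sample document is the name example used earlier.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathLookup {
    /** Builds the whole tree in memory, then evaluates an XPath expression against it. */
    static String evaluate(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc); // string value of the first match
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<name><given>Keith</given><family>Johnson</family></name>";
        System.out.println(evaluate(xml, "/name/family"));
    }
}
```

Because the full tree is available, the expression can refer to any part of the document, not only to what has been read so far.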
XML APIs
XML processors make the structure and contents of XML documents available to applications through APIs
Event-based APIs
  notify the application through parsing events
  e.g., the SAX call-back interfaces
Object-model (or tree) based APIs
  provide a full parse tree
  e.g., DOM, a W3C Recommendation
  more convenient, but may require too many resources with the largest documents
Major parsers support both SAX and DOM

DOM Document Object Model
DOM: What is it?

An object-based, language-neutral API for XML and HTML documents
It allows programs and scripts to build documents, navigate their structure, and add, modify or delete elements and content. It provides a foundation for developing querying, filtering, transformation, rendering and similar applications on top of DOM implementations.
In contrast to SAX ("Serial Access XML"), DOM can be thought of as "Directly Obtainable in Memory"

Document Object Model (DOM)

How to provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)?
Overview of the W3C DOM Specification
  the second one in the XML family of recommendations
  Level 1, W3C Rec, Oct. 1998
  Level 2, W3C Rec, Nov. 2000
  Level 3, W3C Working Draft (January 2002); Level 3 Core became a W3C Rec in Apr. 2004
What does DOM specify, and how to use it?
The Document Object Model (DOM) is a language- and platform-independent object framework for manipulating structured documents. The DOM structures a document as a hierarchy of Node objects.
The Node interface is the base interface for every member of a DOM document tree. It exposes attributes common to every type of document object and provides a few simple methods to retrieve type-specific information.
This interface also exposes all methods used to query, insert, and remove objects from the document hierarchy.
The Node interface makes it easier to build general-purpose tree-manipulation routines that are not dependent on specific document element types.
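As an illustration of such a general-purpose routine, the sketch below counts nodes of a given type using only methods of the Node interface. The class TreeWalk and its methods are hypothetical names, not part of the DOM API.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TreeWalk {
    /** Counts the nodes of the given type in the subtree rooted at node. */
    static int count(Node node, short nodeType) {
        int total = (node.getNodeType() == nodeType) ? 1 : 0;
        // Only Node methods are used, so this works for any kind of subtree.
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            total += count(child, nodeType);
        }
        return total;
    }

    /** Helper that parses XML text into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<a><b>x</b><b/></a>");
        System.out.println("elements: " + count(doc, Node.ELEMENT_NODE));
        System.out.println("text nodes: " + count(doc, Node.TEXT_NODE));
    }
}
```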
DOM structure model

Based on O-O concepts:
  methods (to access or change an object's state)
  interfaces (declaration of a set of methods)
  objects (encapsulation of data and methods)
A parse tree
  tree-like structure implied by the abstract relationships defined by the programming interfaces
<invoice>
  <invoicepage form="00" type="estimatedbill">
    <addressee>
      <addressdata>
        <name>Tijana Petrovic</name>
        <address>
          <streetaddress>Beogradska 14</streetaddress>
          <postoffice>18000 NIS</postoffice>
        </address>
      </addressdata>
    </addressee>
    ...
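DOM structure model

[Figure: DOM tree of the invoice document. The Document node contains the Element invoice, whose descendants are the Elements invoicepage (Attributes form="00", type="estimatedbill"), addressee, addressdata, name (Text "Tijana Petrovic") and address, which contains streetaddress (Text "Beogradska 14") and postoffice (Text "18000 NIS").]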
Structure of DOM Level 1

I: DOM Core Interfaces
Fundamental interfaces
  basic interfaces to structured documents
Extended interfaces
  XML specific: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction
II: DOM HTML Interfaces

more convenient to access HTML documents
DOM Level 2
Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD)
DOM Level 2 adds
  support for namespaces
  accessing elements by ID attribute values
  optional features
    interfaces to document views and style sheets
    an event model (for, say, user actions on elements)
    methods for traversing the document tree and manipulating regions of a document (e.g., selected by the user of an editor)
  Loading and writing of documents not specified (-> Level 3)
DOM Language Bindings

Language-independence:
DOM interfaces are defined using the OMG Interface Definition Language (IDL; defined in the CORBA Specification)
Language bindings (implementations of DOM interfaces) are defined in the Recommendation for
  Java (implemented, for example, by JAXP)
  ECMAScript (standardised JavaScript)
Core Interfaces: Node

Node
  Document
  DocumentFragment
  Element
  Attr
  CharacterData
    Comment
    Text
      CDATASection
  Extended interfaces:
    DocumentType
    EntityReference
    Notation
    Entity
    ProcessingInstruction
Node Interface - methods
getNodeType, getNodeValue, getOwnerDocument
getParentNode, hasChildNodes, getChildNodes
getFirstChild, getLastChild, getPreviousSibling, getNextSibling
hasAttributes, getAttributes
appendChild(newChild), insertBefore(newChild, refChild)
replaceChild(newChild, oldChild), removeChild(oldChild)
http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/Node.html
Object Creation in DOM

Objects implementing interfaces are created by factory methods D.create*(), where D is a Document object. E.g.: createElement("A"), createAttribute("href"), createTextNode("Hello!")
Creation and persistent saving of Documents are left to be specified by implementations.
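The factory methods named above can be combined as in the following sketch, which assumes JAXP for obtaining the empty Document. The class CreateNodes and the URL used as the attribute value are illustrative assumptions.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CreateNodes {
    /** Builds <A href="...">Hello!</A> using only Document factory methods. */
    static Element buildLink() {
        Document d;
        try {
            // How an empty Document is obtained is implementation-specific; JAXP is used here.
            d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element a = d.createElement("A");
        Attr href = d.createAttribute("href");
        href.setValue("http://www.w3.org/DOM/"); // example value only
        a.setAttributeNode(href);
        a.appendChild(d.createTextNode("Hello!"));
        d.appendChild(a); // created nodes are not part of the tree until attached
        return a;
    }
}
```

Nodes created this way belong to the Document that created them, but they stay outside the tree until appended.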
Document Interface - Methods

Node
  Document
    getDocumentElement()
    createAttribute(name)
    createElement(tagName)
    createTextNode(data)
    getDoctype()
    getElementById(idVal)

http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/Document.html
Accessing properties of a Node

Node.getNodeName()
  for an Element = getTagName()
  for an Attr: the name of the attribute
  for Text = "#text" etc.
Node.getNodeValue()
  content of a text node, value of an attribute, ...; null for an Element (!!) (in XSLT/XPath: the full textual content)
Node.getNodeType():
  numeric constants (1, 2, 3, ..., 12) for ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, ..., NOTATION_NODE
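The three accessors can be compared side by side on a small tree. This is a sketch: the class NodeProperties, the element name given and the attribute id are assumptions made for the example.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

class NodeProperties {
    /** Returns "name|value|type" for any node. */
    static String describe(Node n) {
        return n.getNodeName() + "|" + n.getNodeValue() + "|" + n.getNodeType();
    }

    /** Builds <given id="7">Keith</given> for the demonstration. */
    static Element sample() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element given = d.createElement("given");
            given.setAttribute("id", "7");
            given.appendChild(d.createTextNode("Keith"));
            d.appendChild(given);
            return given;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element e = sample();
        System.out.println(describe(e));                       // element: value is null
        System.out.println(describe(e.getAttributeNode("id"))); // attribute
        System.out.println(describe(e.getFirstChild()));        // text node
    }
}
```

Note that the element's own value is null; its text lives in the child Text node.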
Content and element manipulation

Manipulating CharacterData D:
D.substringData(offset, count) D.appendData(string) D.insertData(offset, string) D.deleteData(offset, count) D.replaceData(offset, count, string) (= delete + insert)
Accessing attributes of an Element object E:

E.getAttribute(name) E.setAttribute(name, value) E.removeAttribute(name)
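A short sketch of these calls in sequence. The class ContentEdit, the sample text and the attribute name lang are illustrative assumptions; the comments show the text after each step.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class ContentEdit {
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Applies the CharacterData editing methods to a Text node. */
    static String editText() {
        Text t = newDocument().createTextNode("Hello world");
        t.insertData(5, ",");         // "Hello, world"
        t.appendData("!");            // "Hello, world!"
        t.replaceData(7, 5, "DOM");   // "Hello, DOM!"
        t.deleteData(5, 1);           // "Hello DOM!"
        return t.getData();
    }

    /** Sets, reads and removes an attribute; returns the value seen before removal. */
    static String editAttribute(Element e) {
        e.setAttribute("lang", "en");
        String value = e.getAttribute("lang");
        e.removeAttribute("lang");
        return value;
    }
}
```

After removeAttribute, getAttribute returns an empty string rather than null, so hasAttribute is the safer existence check.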
Additional Core Interfaces

NodeList for ordered lists of nodes
e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
all descendant elements of type "name" in document order (wild-card "*"matches any element type)
Accessing a specific node, or iterating over all nodes of a NodeList. E.g. Java code to process all children:
NodeList children = node.getChildNodes();
for (int i = 0; i < children.getLength(); i++)
    process(children.item(i));
http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/package-summary.html
DOM: Implementations
Java-based parsers e.g. IBM XML4J, Apache Xerces, Apache Crimson
MS IE5 browser: COM programming interfaces for C/C++ and MS Visual Basic, ActiveX object programming interfaces for script languages XML::DOM (Perl implementation of DOM Level 1)
Others? Non-parser-implementations? (Participation of vendors of different kinds of systems in DOM WG has been active.)
A Java-DOM Example
A stand-alone toy application BuildXml
either creates a new db document with two person elements, or adds them to an existing db document
Technical basis
DOM support in Sun JAXP
native XML document initialisation and storage methods of the JAXP 1.1 default parser (Apache Crimson)
Example Code
Begin by importing necessary packages:

import java.io.*;
import org.w3c.dom.*;
import org.xml.sax.*;
import javax.xml.parsers.*;
// Native (parse and write) methods of the
// JAXP 1.1 default parser (Apache Crimson):
import org.apache.crimson.tree.XmlDocument;
Class for modifying the document in file fileName:

public class BuildXml {
    private Document document;

    public BuildXml(String fileName) {
        File docFile = new File(fileName);
        Element root = null; // doc root element
        // Obtain a factory for DOM parsers (document builders):
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        try { // to get a new DocumentBuilder:
            DocumentBuilder builder = factory.newDocumentBuilder();
            if (!docFile.exists()) { // create new doc
                document = builder.newDocument();
                // add a comment:
                Comment comment = document.createComment("A simple personnel list");
                document.appendChild(comment);
                // Create the root element:
                root = document.createElement("db");
                document.appendChild(root);
or if docFile already exists:
            } else { // access an existing doc
                try { // to parse docFile
                    document = builder.parse(docFile);
                    root = document.getDocumentElement();
                } catch (SAXException se) {
                    System.err.println("Error: " + se.getMessage());
                    System.exit(1);
                }
                /* A similar catch for a possible IOException */
Subroutine to create person elements

public Node createPersonNode(Document document, String idNum, String fName, String lName) {
    Element person = document.createElement("person");
    person.setAttribute("idnum", idNum);
    Element firstName = document.createElement("first");
    person.appendChild(firstName);
    firstName.appendChild(document.createTextNode(fName));
    /* similarly for a lastName */
    return person;
}
Create and add two child elements to root:

Node personNode = createPersonNode(document, "1234", "Pekka", "Kilpeläinen");
root.appendChild(personNode);
personNode = createPersonNode(document, "5678", "Irma", "Könönen");
root.appendChild(personNode);
Finally, store the result document:

try { // to write the XML document to file fileName
    ((XmlDocument) document).write(new FileOutputStream(fileName));
} catch (IOException ioe) {
    ioe.printStackTrace();
}
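The write method above is specific to Apache Crimson. Where that class is not available, a parser-independent alternative is an identity transformation from the standard javax.xml.transform package. The sketch below is an assumption about how one might do this, not the approach of the original example; the class DomWriter and its sample document are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class DomWriter {
    /** Serializes a DOM Document to text with an identity transformation. */
    static String serialize(Document doc) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** A small db document similar to the one built by BuildXml. */
    static Document sample() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("db");
            doc.appendChild(root);
            Element person = doc.createElement("person");
            person.setAttribute("idnum", "1234");
            Element first = doc.createElement("first");
            first.appendChild(doc.createTextNode("Pekka"));
            person.appendChild(first);
            root.appendChild(person);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(sample()));
    }
}
```

To store the result in a file instead, a StreamResult constructed from a java.io.File can be passed to transform.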
The main routine

public static void main(String args[]) {
    if (args.length > 0) {
        String fileName = args[0];
        BuildXml buildXml = new BuildXml(fileName);
    } else {
        System.err.println("Give filename as argument");
    }
} // main
SAX Simple API for XML
http://www.brics.dk/~amoeller/XML/programming/saxapi.html
What is SAX?
Simple API for XML
Originally developed through the xml-dev mailing list after Peter Murray-Rust got bored of working with numerous non-interchangeable XML parsers
Primarily a Java API, but there are implementations in most languages
  Unfortunately they differ quite a lot
  So you will need to get a feeling for your particular implementation
The full specification is not so 'simple'
  But a useful application usually only requires a small subset of SAX
Currently at version 2.0
  Version 2.0 was needed to provide support for namespaces
How does SAX work?

An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.
Each event triggers a subroutine call or callback procedure
An XML tree can be built in response, but it is not required to construct a data structure
This is sometimes much more efficient:
  the document can be piped through the application
  the only real option for very large documents
  good for local processing, not for random access
The kinds of events are:

The start of the document is encountered
The end of the document is encountered
The start tag of an element is encountered
The end tag of an element is encountered
Character data is encountered
A processing instruction is encountered
Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes.
What are Callbacks?

Callbacks are just procedures/subroutines
  That the user supplies to the program
You may be familiar with writing a program that uses someone else's routines
  Here someone else writes the program and you write the subroutines
They allow you to modify the behaviour of a program from the outside
The parser calls the subroutines
  Every time it encounters an event
  Passing arguments if necessary
There has to be a mechanism for registering your routines with the program
Events and Callbacks

<cml>
  <metadataList>
    <metadata name="age" value="27"/>
    <metadata name="colour" value="blue"/>
  </metadataList>
  <property title="bigness">
    <scalar units="cubic feet">2304</scalar>
  </property>
</cml>
Events and Callbacks

                                   ---> startDocument
<?xml version="1.0" ?>
<cml>                              ---> startElement
  <property title="dim">           ---> startElement
    <array>                        ---> startElement
      0.0000 12.35000 6.45550      ---> characters
    </array>                       ---> endElement
  </property>                      ---> endElement
</cml>                             ---> endElement
                                   ---> endDocument
Example?
import java.io.*;
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import org.apache.xerces.parsers.SAXParser;

public class Flour extends DefaultHandler {
    float amount = 0;

    // Called by the parser for every start tag
    public void startElement(String namespaceURI, String localName,
                             String qName, Attributes atts) {
        if (namespaceURI.equals("http://recipes.org") && localName.equals("ingredient")) {
            String n = atts.getValue("", "name");
            if ("flour".equals(n)) { // null-safe: 'name' may be absent
                String a = atts.getValue("", "amount"); // assume 'amount' exists
                amount = amount + Float.valueOf(a).floatValue();
            }
        }
    }

    public static void main(String[] args) {
        Flour f = new Flour();
        SAXParser p = new SAXParser();
        p.setContentHandler(f); // register the callbacks
        try {
            p.parse(args[0]);
        } catch (Exception e) {
            e.printStackTrace();
        }
        System.out.println(f.amount);
    }
}
Events in example
start document
processing instruction: dsd
starting element: collection
-character data, length 3
-starting element: description
--character data, length 47
-end element: description
-character data, length 3
-starting element: recipe
--character data, length 5
...
-end element: recipe
-character data, length 1
end element: collection
end document
SAX 2 Interfaces
Defines interfaces for standard routines and callbacks
ContentHandler: the most important interface
Attributes: the second most important interface
DTDHandler
EntityResolver
ErrorHandler
Locator
XMLFilter
XMLReader
SAXException

ContentHandler Interface
This is the bit that handles the most important events
The methods that handle the events are referred to as callback routines
The parser fires events according to what it finds in the XML file. Every time it encounters an event it calls the appropriate callback routine
The ContentHandler specifies 11 methods
  No methods for dealing with comments or XML declarations
  Attributes are not considered to be events
ContentHandler Interface
The most important piece of SAX
startDocument()
endDocument()
startElement(uri, localName, qName, attrs)
endElement(uri, localName, qName)
characters(text, start, length)
ignorableWhitespace(text, start, length)
startPrefixMapping(prefix, uri)
endPrefixMapping(prefix)
processingInstruction(target, data)
setDocumentLocator(locator)
skippedEntity(name)
Attributes Interface
Specifies methods for accessing individual attributes
An Attributes object is passed to the startElement routine
The order of the attributes is unimportant and need not be the same as in the XML document
  However, we can refer to attributes by their index for convenience
Uses overloaded functions, allowing us to refer to an attribute
  by its qualified name
  or by its URI and its local name
  or by an index (for convenience)
Attributes Interface
getLength()
getQName(index), getLocalName(index), getURI(index)
getIndex(qualifiedName), getIndex(uri, localPart)
getType(index), getType(qualifiedName), getType(uri, localName)
getValue(index), getValue(qualifiedName), getValue(uri, localName)
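A sketch of reading attributes inside startElement, both by index and by qualified name. The class AttributesDemo and its method are hypothetical; the parser is the JDK default, which is not namespace-aware unless configured, so qualified names are used throughout.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributesDemo {
    /** Collects the attributes of the first element with the given name. */
    static Map<String, String> attributesOf(String xml, String elementName) {
        Map<String, String> found = new TreeMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!qName.equals(elementName) || !found.isEmpty()) {
                    return;
                }
                // Access by index: the order is not significant, so store by name.
                for (int i = 0; i < atts.getLength(); i++) {
                    found.put(atts.getQName(i), atts.getValue(i));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<metadataList><metadata name=\"age\" value=\"27\"/></metadataList>";
        System.out.println(attributesOf(xml, "metadata"));
    }
}
```

The Attributes object is only valid during the callback, which is why the values are copied into a map here.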
ErrorHandler Interface
ErrorHandler
  Allows you to catch errors and deal with them appropriately
  Again you have to write these functions
  The ErrorHandler only specifies the interface
warning(exception): ambiguities/non-XML errors
error(exception): non-fatal errors (invalid documents)
fatalError(exception): fatal errors (not well-formed)
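The three callbacks can be written and registered as in the sketch below. The class ErrorReport and the report format are assumptions for illustration; the example relies on the JDK's default SAX parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ErrorReport {
    /** Parses the text and returns one line per problem reported to the ErrorHandler. */
    static List<String> check(String xml) {
        List<String> report = new ArrayList<>();
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    report.add("warning: line " + e.getLineNumber());
                }

                @Override
                public void error(SAXParseException e) {
                    report.add("error: line " + e.getLineNumber());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    report.add("fatalError: line " + e.getLineNumber());
                    throw e; // a document that is not well-formed cannot be processed further
                }
            });
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // Parsing stopped; the problem has already been recorded by the handler.
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return report;
    }
}
```

Calling check on a document with a missing end tag yields a fatalError entry, while a well-formed document yields an empty report.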
SAX Pros & Cons

Pros
SAX has a very small memory footprint and is ideal for large documents
It is also useful when the tree-like model of DOM is not the most appropriate one for storing your data, particularly if the structure is very flat
Cons
A document is intuitive; an event is less so
There is no default storage model
  Because SAX only stores a small part of the document in memory at any given time, it is up to you to keep track of where you are in the document
  If the document has a lot of structure, and later events need to know about earlier events, you can find yourself storing a lot of data in memory
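Keeping track of the current position is commonly done with a stack of open elements. The following is a sketch of that idea under stated assumptions: the class PathTracker and its output format are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PathTracker extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>(); // currently open elements
    private final List<String> found = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        open.addLast(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.removeLast();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            // The stack tells us where in the document this text occurred.
            found.add("/" + String.join("/", open) + "=" + text);
        }
    }

    /** Returns "path=text" for every non-blank piece of character data. */
    static List<String> paths(String xml) {
        PathTracker tracker = new PathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), tracker);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return tracker.found;
    }
}
```

The stack holds only the ancestors of the current node, so memory use grows with document depth rather than document size.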
XML and Namespaces
Namespaces?
Since element names in XML are not predefined, a name conflict will occur when two different documents use the same element names. Namespaces are a simple and straightforward way to distinguish names used in XML documents, no matter where they come from.
<table>
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
<table>
  <name>Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>

Solving Name Conflicts Using a Prefix

<h:table>
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table>
  <f:name>Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
Using Namespaces
<h:table xmlns:h="http://www.w3.org/TR/html4/">
  <h:tr>
    <h:td>Apples</h:td>
    <h:td>Bananas</h:td>
  </h:tr>
</h:table>
<f:table xmlns:f="http://www.w3schools.com/furniture">
  <f:name>Coffee Table</f:name>
  <f:width>80</f:width>
  <f:length>120</f:length>
</f:table>
Default Namespaces
<table xmlns="http://www.w3.org/TR/html4/">
  <tr>
    <td>Apples</td>
    <td>Bananas</td>
  </tr>
</table>
<table xmlns="http://www.w3schools.com/furniture">
  <name>Coffee Table</name>
  <width>80</width>
  <length>120</length>
</table>
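To a namespace-aware parser, the prefixed form and the default-namespace form name the same element. A minimal sketch using a namespace-aware JAXP DocumentBuilder; the class NamespaceCheck and the "{uri}local" output format are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class NamespaceCheck {
    /** Returns the root element's name as "{namespace URI}localName". */
    static String qualifiedRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // off by default in JAXP
            Element root = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
            return "{" + root.getNamespaceURI() + "}" + root.getLocalName();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String prefixed = "<f:table xmlns:f=\"http://www.w3schools.com/furniture\">"
                + "<f:name>Coffee Table</f:name></f:table>";
        String defaulted = "<table xmlns=\"http://www.w3schools.com/furniture\">"
                + "<name>Coffee Table</name></table>";
        System.out.println(qualifiedRoot(prefixed));
        System.out.println(qualifiedRoot(defaulted));
    }
}
```

The prefix itself carries no meaning; only the URI it is bound to and the local name identify the element.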

<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
      <h:table>
        <h:tr align="center">
          <h:td>Author</h:td> <h:td>Price</h:td>
          <h:td>Pages</h:td> <h:td>Date</h:td>
        </h:tr>
        <h:tr align="left">
          <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
          <h:td><xdc:price>31.98</xdc:price></h:td>
          <h:td><xdc:pages>352</xdc:pages></h:td>
          <h:td><xdc:date>1998/01</xdc:date></h:td>
        </h:tr>
      </h:table>
    </xdc:bookreview>
  </h:body>
</h:html>
Example
<h:html xmlns:xdc="http://www.xml.com/books"
        xmlns:h="http://www.w3.org/HTML/1998/html4">
  <h:head><h:title>Book Review</h:title></h:head>
  <h:body>
    <xdc:bookreview>
      <xdc:title h:style="font-family: sans-serif;">XML: A Primer</xdc:title>
      <h:table>
        <h:tr align="center">
          <h:td>Author</h:td> <h:td>Price</h:td>
          <h:td>Pages</h:td> <h:td>Date</h:td>
        </h:tr>
        <h:tr align="left">
          <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
          <h:td><xdc:price>31.98</xdc:price></h:td>
          <h:td><xdc:pages>352</xdc:pages></h:td>
          <h:td><xdc:date>1998/01</xdc:date></h:td>
        </h:tr>
      </h:table>
    </xdc:bookreview>
  </h:body>
</h:html>
<html xmlns="http://www.w3.org/HTML/1998/html4"
      xmlns:xdc="http://www.xml.com/books">
  <head><title>Book Review</title></head>
  <body>
    <xdc:bookreview>
      <xdc:title>XML: A Primer</xdc:title>
      <table>
        <tr align="center">
          <td>Author</td> <td>Price</td>
          <td>Pages</td> <td>Date</td>
        </tr>
        <tr align="left">
          <td><xdc:author>Simon St. Laurent</xdc:author></td>
          <td><xdc:price>31.98</xdc:price></td>
          <td><xdc:pages>352</xdc:pages></td>
          <td><xdc:date>1998/01</xdc:date></td>
        </tr>
      </table>
    </xdc:bookreview>
  </body>
</html>
XSL: Formatting XML data
XSL: eXtensible Stylesheet Language

Not a mark-up in the sense of HTML
Built on the idea of templates, which are themselves XML documents
XSL templates provide the mechanism for transforming data, and applying formatting information to data
Applying the Style

Take an XML document representing a hierarchy of nodes
Apply one of several possible, possibly independent, XSL stylesheets
Browser produces something that looks like HTML
XSL: example
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
DOCTYPE declaration
</xsl:stylesheet>
XSL: example
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"> <xsl:template match="/"> <xsl:apply-templates/> </xsl:template>
Find root of DOM tree and apply templates
</xsl:stylesheet>
XSL: example
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl"> <xsl:template match="/"> <xsl:apply-templates/>
</xsl:template>
<xsl:template match="clrcstructures"> <xsl:apply-templates/> </xsl:template>

Find clrcstructures node and apply templates
</xsl:stylesheet>
XSL: example
<xsl:template match="clrcstructures"> <xsl:apply-templates/> </xsl:template>
Find department node and output details
<xsl:template match="department"> <P><xsl:value-of select="deptabbrev"/> (<xsl:value-of select="deptname"/>)</P> </xsl:template>
XSL: example
... <xsl:template match="department"> <P><xsl:value-of select="deptabbrev"/> (<xsl:value-of select="deptname"/>)</P>
<UL><xsl:apply-templates/></UL>
</xsl:template>
Find department nodes, output details and apply templates
<xsl:template match="division"> <LI><xsl:value-of select="divnabbrev"/> (<xsl:value-of select="divnname"/>) </LI> </xsl:template>
Find division nodes and output details
XSL: example
...
<xsl:template match="group">
  <P>
    <xsl:choose>
      <xsl:when test="structureID[. = 'ITDISEW3G']">
        <B><xsl:value-of select="grpname"/></B>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="grpname"/>
      </xsl:otherwise>
    </xsl:choose>
  </P>
</xsl:template>
Match W3G and display differently
Example
XML file: people.xml
XSLT file: people.xsl
Formatted XML: peoplexsl.xml
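Outside the browser, a stylesheet can be applied with the javax.xml.transform API. The sketch below makes several assumptions: it uses the XSLT 1.0 namespace (http://www.w3.org/1999/XSL/Transform) expected by the JDK's processor rather than the older WD-xsl namespace shown in the slides, and the class ApplyStylesheet and the sample department data are invented for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ApplyStylesheet {
    // The department template from the example, in XSLT 1.0 form.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='department'>"
        + "<P><xsl:value-of select='deptabbrev'/> (<xsl:value-of select='deptname'/>)</P>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Made-up input data; only the element names come from the example.
    static final String SAMPLE =
          "<clrcstructures><department><deptabbrev>ITD</deptabbrev>"
        + "<deptname>Information Technology</deptname></department></clrcstructures>";

    /** Applies the stylesheet text to the XML text and returns the result. */
    static String transform(String xml, String xslt) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(SAMPLE, STYLESHEET));
    }
}
```

No template for the root or for clrcstructures is needed here, because XSLT's built-in rules descend into child elements until a matching template is found.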
Converting Relational Database to XML

Example: Export the following data into XML and group books by store
Relational Database:
  Store (sid, name, phone)
  Book (bid, title, authors)
  StoreBook (sid, bid, price, stock)
[Figure: entity-relationship diagram. Store (sid, name, phone) and Book (bid, title, authors) are linked by the relationship StoreBook (price, stock).]
Converting Relational Database to XML

XML:
<store>
  <name> </name>
  <phone> </phone>
  <book>
    <title> </title>
    <authors> </authors>
    <price> </price>
  </book>
  <book></book>
</store>
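The grouping step can be sketched with in-memory rows standing in for the three tables. Everything specific here is an assumption for illustration: the class StoreExport, the wrapper element stores, the use of string arrays as rows, and the sample data. A real export would read the rows from the database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StoreExport {
    /** Appends <tag>text</tag> to parent. */
    static void addText(Element parent, String tag, String text) {
        Document doc = parent.getOwnerDocument();
        Element child = doc.createElement(tag);
        child.appendChild(doc.createTextNode(text));
        parent.appendChild(child);
    }

    /** Rows: stores {sid, name, phone}, books {bid, title, authors}, storeBooks {sid, bid, price, stock}. */
    static Document export(List<String[]> stores, List<String[]> books, List<String[]> storeBooks) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("stores"); // wrapper element (assumed)
        doc.appendChild(root);
        Map<String, String[]> bookById = new HashMap<>();
        for (String[] b : books) {
            bookById.put(b[0], b);
        }
        for (String[] s : stores) {
            Element store = doc.createElement("store");
            root.appendChild(store);
            addText(store, "name", s[1]);
            addText(store, "phone", s[2]);
            // Group: nest every book sold by this store inside its element.
            for (String[] sb : storeBooks) {
                String[] b = bookById.get(sb[1]);
                if (!sb[0].equals(s[0]) || b == null) {
                    continue;
                }
                Element book = doc.createElement("book");
                store.appendChild(book);
                addText(book, "title", b[1]);
                addText(book, "authors", b[2]);
                addText(book, "price", sb[2]);
            }
        }
        return doc;
    }

    /** Made-up sample data. */
    static Document sample() {
        List<String[]> stores = new ArrayList<>();
        stores.add(new String[] {"s1", "Store A", "555-0100"});
        stores.add(new String[] {"s2", "Store B", "555-0101"});
        List<String[]> books = new ArrayList<>();
        books.add(new String[] {"b1", "Book One", "Author X"});
        books.add(new String[] {"b2", "Book Two", "Author Y"});
        List<String[]> storeBooks = new ArrayList<>();
        storeBooks.add(new String[] {"s1", "b1", "10.00", "3"});
        storeBooks.add(new String[] {"s1", "b2", "12.50", "1"});
        storeBooks.add(new String[] {"s2", "b1", "9.50", "7"});
        return export(stores, books, storeBooks);
    }
}
```

The price moves from the StoreBook relationship into the nested book element, which is why the same book can appear under two stores with different prices.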
Extracting data as XML

Most databases now have the ability to return the results of a query in XML format. For example, in SQL Server you can enter the query: SELECT * FROM EMPLOYEE FOR XML RAW which will give you a dump of the entire EMPLOYEE table in "raw" XML format: this means you get no control over the representation. Each row in the result is output as an element, with the column values represented as attributes. You can then put it through an XSLT transformation to turn it into something else.
Extracting data as XML

For Oracle the equivalent is the XML SQL utility. Using XSU, you can enter a standard SQL query such as SELECT * FROM EMPLOYEE WHERE EMPLOYEE_NR='517541', and get back the answer in the form of an XML document such as:
<ROWSET>
  <ROW num="1">
    <EMPLOYEE_NR>517541</EMPLOYEE_NR>
    <NAME>Michael Kay</NAME>
  </ROW>
</ROWSET>
Oracle also has a utility, called XSQL pages, that allows you to embed SQL statements in a skeletal XML document. A request from a browser to this document is directed to a servlet, which executes the SQL statements and enters the results into the page before delivering it back to the browser. Formatting of the page can then be controlled on the client side using either CSS or client-side XSLT.
Some other XML formats

MathML for Mathematics
Chemical Markup Language (CML) for Chemistry
Astronomical Markup Language (AML) for Astronomy
Bioinformatic Sequence Markup Language (BSML) for the human genome project
Extensible Scientific Interchange Language (XSIL)
DDI for Social Science Data
CONCLUSIONS
XML in Data Management

Integration of Heterogeneous Data
common interface for exchange delivered across a common medium different data formats into the same XML format web based metadata for management, searching and control widely available economic tools client-side processing for presentation and analysis
XML...
Can be pre-generated or created on-the-fly at the server
Provides an easily parsable, platform- and vendor-neutral format for transmitting data
Needs no network etc. support beyond the Web browser (or other transport)
Provides the means to validate and transform the data at the desktop
Data validation
Even in a perfect world there can be problems in:
  Generation
  Transmission
  Editing/processing after reception
XML provides a means to declare the data structure to the desktop
Metadata - internal
Basic provided by Document Type Definitions (DTDs)
  Simplified from SGML version
  Provides basic structure and cardinality
Several proposals, including MS version shipped in IE5, extending capabilities
  Data typing
  Extended value and structure constraints
Semistructured Data and Mediators

Semistructured data is often encountered in data exchange and integration
At the sources the data may be structured (e.g. from relational databases)
We model the data as semistructured to facilitate exchange and integration
Users see an integrated semistructured view that they can query
Queries are eventually reformulated into queries over the structured resources (e.g. SQL)
Only results need to be materialized
Schema archiving - a future?

XML Schema under development
  Support for hierarchic and OO views
  Usable for relational, but lacks proper key support
Problem is document-driven approach
  Concentrates on instances of data rather than The Big Picture
What is a mediator ?
A complex software component that integrates and transforms data from one or several sources using a declarative specification Two main contexts: Data conversion: converts data between two different models
e.g. by translating data from a relational database into XML

Data integration: integrates data from different sources into a common view
CONCLUSION
XML is now achieving momentum
The scientific data management community should be at the forefront of its use.
  users will demand it
  advantages of widely available tools
  advantages in integration
  advantages in information management
Sources
http://sax.sourceforge.net/ - Official SAX Web site
http://www.xml.com/pub/a/1999/01/namespaces.html - Site with tutorial related to namespaces
