
System Integration

Topic 3

Module SE-C-03

XML

Supported by:

Joint MSc curriculum in software engineering European Union TEMPUS Project CD_JEP-18035-2003
Version: April 28, 2006

A bit of History
1989 - "Information Management: A Proposal" is written and circulated by Tim Berners-Lee of CERN (European Laboratory for Particle Physics). He proposed a hypertext system including HTML and HTTP.

1990 - Berners-Lee's proposal is reformulated and the name World Wide Web (Web, WWW) is coined.
1993 - Marc Andreessen releases the alpha version of Mosaic.

1996 - The Web becomes widely used.


2002 - more than 30,000,000 web servers

SE-C-05 System Integration

Pre-XML: HTML
Problems with HTML
- primarily presentation
- hard to derive meaning from the markup
- fixed tag set
- static

Web browsers were being viewed as potential application platforms


Pre-XML: SGML
SGML - Standard Generalized Markup Language
- Working draft of the standard: 1980
- Allows text editing, formatting, and information retrieval systems to share documents

Did not become widely used
- general consensus: too heavyweight

HTML is an application of SGML; XML is a simplified subset of SGML


What Is XML?
XML stands for Extensible Markup Language (often written as eXtensible Markup Language to justify the acronym).
Goal: combine the power of SGML (extensibility) with the simplicity of HTML.

1998: XML 1.0 standard published.
XML is a set of rules for defining semantic tags that break a document into parts and identify the different parts of the document.
It is a meta-markup language: it defines a syntax used to define other domain-specific, semantic, structured markup languages.
Its value as a data interchange language quickly became evident.

HTML - example
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN">
<HTML>
  <HEAD>
    <TITLE>Beginning ASP 3.0</TITLE>
  </HEAD>
  <BODY>
    <B>Beginning ASP 3.0</B>
    <H3>ISBN 1-861003-38-2</H3>
    <H4>Authors</H4>
    <H4>Brian Francis, Chris Ullman, Dave Sussman, John Kauffman</H4>
    <P>US $49.99</P>
    <P>ASP is an advanced technique for dynamically creating Web site content.</P>
  </BODY>
</HTML>

XML - example
<?xml version="1.0"?>
<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <ISBN>ISBN 1-861003-38-2</ISBN>
    <authors>
      <author_name>Brian Francis</author_name>
      <author_name>Chris Ullman</author_name>
      <author_name>Dave Sussman</author_name>
      <author_name>John Kauffman</author_name>
    </authors>
    <description>Server side scripting technologies</description>
    <price>US $49.99</price>
  </book>
</books>

XML is a Meta-Markup Language


The first thing you need to understand about XML is that it isn't just another markup language like the Hypertext Markup Language (HTML). Languages like HTML define a fixed set of tags that describe a fixed number of elements.

XML, however, is a meta-markup language. It's a language in which you make up the tags you need as you go along.
These tags must be organized according to certain general principles, but they're quite flexible in their meaning. You don't have to force your data to fit into paragraphs, list items, strong emphasis, or other very general categories.


XML Describes Structure and Meaning, Not Formatting


XML markup describes a document's structure and meaning. It does not describe the formatting of the elements on the page. Formatting can be added to a document with a style sheet. The document itself only contains tags that say what is in the document, not what the document looks like.

By contrast, HTML encompasses formatting, structural, and semantic markup:
- <B> is a formatting tag that makes its content bold.
- <P> is a structural tag that indicates the beginning of a new paragraph.
- A semantic tag says what its content means, as <authors> does in the XML example, where it marks the names of the authors of the book.

In fact, some tags can have all three kinds of meaning. An <H1> tag can simultaneously mean 20-point Helvetica bold, a level-1 heading, and the title of the page.

Advantages of XML
Instead of generic tags like <dt> and <li>, the XML listing uses meaningful tags like <books>, <book>, <authors>, and <ISBN>. This has a number of advantages, including that it's easier for a human to read the source code to determine what the author intended.

XML markup also makes it easier for automated robots to locate all of the books in the document. In HTML, robots can't tell more than that an element is a dt. They cannot determine whether that dt represents a song title, a definition, or just some designer's favorite means of indenting text. In fact, a single document may well contain dt elements with all three meanings.
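
As a rough illustration of how a program can locate every book by tag name, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The CATALOG string is a hypothetical, trimmed version of the book listing, and list_books is a name invented for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed catalog modeled on the book example in these slides.
CATALOG = """<books>
  <book>
    <title>Beginning ASP 3.0</title>
    <authors>
      <author_name>Brian Francis</author_name>
      <author_name>Chris Ullman</author_name>
    </authors>
  </book>
</books>"""


def list_books(xml_text):
    """Return (title, [authors]) for every <book> element in the document."""
    root = ET.fromstring(xml_text)
    books = []
    for book in root.iter("book"):
        # The tag names carry the meaning, so no guessing is needed.
        title = (book.findtext("title") or "").strip()
        authors = [(a.text or "").strip() for a in book.iter("author_name")]
        books.append((title, authors))
    return books


if __name__ == "__main__":
    for title, authors in list_books(CATALOG):
        print(title, "by", ", ".join(authors))
```

The same lookup against the HTML version would have to guess which <H4> holds the authors.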

Design of Domain-Specific Markup Languages


XML allows various professions (e.g., music, chemistry, math) to develop their own domain-specific markup languages. This allows individuals in the field to trade notes, data, and information without worrying about whether or not the person on the receiving end has the particular proprietary payware that was used to create the data.

You may not be interested in electrical engineering diagrams, but electrical engineers are. You may not need to include sheet music in your Web pages, but composers do. XML lets the electrical engineers describe their circuits and the composers notate their scores, mostly without stepping on each other's toes. Neither field will need the special support from browser manufacturers or the complicated plug-ins that are required today.

Self-Describing Data
At a higher level, XML is self-describing. Suppose you're an information archaeologist in the 23rd century and you encounter this chunk of XML code on an old floppy disk that has survived the ravages of time:

<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>


Plain Text
Since XML is not a binary format, you can create and edit files with anything from a standard text editor to a visual development environment. That makes it easy to debug your programs, and makes it useful for storing small amounts of data. An XML front end to a database makes it possible to efficiently store large amounts of XML data as well. So XML provides scalability for anything from small configuration files to a company-wide data repository.


Interchange of Data Among Applications


Since XML is non-proprietary and easy to read and write, it's an excellent format for the interchange of data among different applications. It has been designed to be extremely powerful, while at the same time being easy for both human beings and computer programs to read and write. Thus it's an obvious choice for exchange languages.

XML is an incredibly simple, well-documented, straightforward data format. Not just the data, but also the markup is text, and it's present right there in the XML file as tags. You can read the tag names directly to find out exactly what is in the document. All the important details about the structure of the document are explicit. You don't have to reverse-engineer the format or rely on incomplete and often unavailable documentation.

Data Identification
XML tells you what kind of data you have, not how to display it. Because the markup tags identify the information and break up the data into parts, an email program can process it, a search program can look for messages sent to particular people, and an address book can extract the address information from the rest of the message. Because the different parts of the information have been identified, they can be used in different ways by different applications.


Structured and Integrated Data


XML is ideal for large and complex documents because the data is structured. It not only lets you specify a vocabulary that defines the elements in the document; it also lets you specify the relations between elements.

XML also provides a client-side include mechanism that integrates data from multiple sources and displays it as a single document. The data can even be rearranged on the fly. Parts of it can be shown or hidden depending on user actions. This is extremely useful when you're working with large information repositories like relational databases.


Hierarchical
XML documents benefit from their hierarchical structure. Hierarchical document structures are, in general, faster to access because you can drill down to the part you need, like stepping through a table of contents. They are also easier to rearrange, because each piece is delimited. In a document, for example, you could move a heading to a new location and drag everything under it along with the heading, instead of having to page down to make a selection, cut, and then paste the selection into a new location.

How XML Works


<?xml version="1.0"?>
<library>
  <cd>
    <title>Just Singin' Along</title>
    <artist>The Happy Guys</artist>
    <description>
      A lovely collection of songs that the whole family can sing right along with.
    </description>
    <song><title>I'm Really Fine</title></song>
    <song><title>Can't Stop Grinnin'</title></song>
    <song><title>I'm Really Fine</title></song>
    <purchase_date>2/23/1959</purchase_date>
  </cd>
</library>


How XML Works


This document is text and might well be stored in a text file. You can edit this file with any standard text editor. You do not need a special XML editor.

Programs that actually try to understand the contents of the XML document (that is, do more than merely treat it as any other text file) will use an XML parser to read the document. The parser is responsible for dividing the document into individual elements, attributes, and other pieces. It passes the contents of the XML document to an application piece by piece. If at any point the parser detects a violation of the well-formedness rules of XML, then it reports the error to the application and stops parsing.
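
The piece-by-piece hand-off and the stop on a well-formedness error can be sketched with Python's standard xml.etree.ElementTree.iterparse. This is only an illustration: read_elements is a name invented here, and the two sample documents in the usage section are assumptions, not part of the slides.

```python
import io
import xml.etree.ElementTree as ET


def read_elements(xml_text):
    """Return (names of elements that were completed, error message or None)."""
    completed = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    try:
        # The parser hands over one element at a time as each end-tag is read.
        for _event, elem in ET.iterparse(source, events=("end",)):
            completed.append(elem.tag)
    except ET.ParseError as err:
        # A well-formedness violation ends parsing; nothing after it is reported.
        return completed, str(err)
    return completed, None


if __name__ == "__main__":
    print(read_elements("<cd><title>A</title><artist>B</artist></cd>"))
    print(read_elements("<cd><title>A</title><artist>B</cd>"))
```

In the second call the missing </artist> end-tag makes the document malformed, so the enclosing cd element is never delivered.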

How XML Works


Individual XML applications normally dictate more precise rules about exactly which elements and attributes are allowed where. Some of these rules can be precisely specified with a schema written in any of several languages including the W3C XML Schema Language, RELAX NG, and DTDs. A document may contain a URI indicating where the schema can be found. Some XML parsers will notice this and compare the document to its schema as they read it to see if the document satisfies the constraints specified there. Such a parser is called a validating parser.


How XML Works


A violation of those constraints is called a validity error, and the whole process of checking a document against a schema is called validation. If a validating parser finds a validity error, it will report it to the application on whose behalf it's parsing the document. This application can then decide whether it wishes to continue parsing the document. However, validity errors are not necessarily fatal (unlike well-formedness errors), and an application may choose to ignore them.

Not all parsers are validating parsers. Some merely check for well-formedness.

The application that receives data from the parser


- A web browser such as Internet Explorer or Mozilla Firefox that displays the document to a reader.
- A database such as Microsoft SQL Server that stores the XML data in a new record.
- A program that you yourself wrote in Java, C, Python, or some other language that does exactly what you want it to do.
- A drawing program such as Adobe Illustrator that interprets the XML as two-dimensional coordinates for the contents of a picture.
- A word processor such as StarOffice Writer that loads the XML document for editing.

The application that receives data from the parser


- A spreadsheet such as Gnumeric that parses the XML to find numbers and functions used in a calculation.
- A personal finance program such as Microsoft Money that sees the XML as a bank statement.
- A syndication program that reads the XML document and extracts the headlines for today's news.
- Almost anything else.


Usage of XML

[Figure: the earlier way of querying. Labels recovered from the diagram: XMLHTTP; XML text file; transformation into plain HTML; ADO 2.1 file; transformation into HTML with data islands; transformation into an arbitrary format; ADO 2.1 ODBC call]


Usage of XML
[Figure: usage of XML. Labels recovered from the diagram: HTML view #1; HTML view #2; XML; server; XML received from other application; XML transport (HTTP); Web server; MF; DB]


Basic XML technologies


- DTD (Document Type Definition)
- XML Schemas
- XSLT
- XML DOM (XML Document Object Model)
- SAX (Simple API for XML)


Fundamental concepts


XML Documents
The precise meaning of "XML document" is defined by the XML 1.0 specification published by the World Wide Web Consortium (W3C). This specification provides a detailed BNF grammar defining exactly what is and is not an XML document. Anything that satisfies the document production in that BNF grammar and adheres to the fifteen well-formedness constraints is an XML document. Anything that does not is not an XML document.

Well-formedness is the minimum requirement for an XML document. A document that is not well-formed is not an XML document. Parsers cannot read it.


Elements and attributes


Element: <tag>content</tag>
- basic unit
- tag name defines what the content is
- opening and closing tags enclose content

Attribute: information about the data
- Attribute names are usually adjectives
- Stored as attribute="value" pairs:
  <tag attribute="value"> content </tag>


Simple XML document


<person> Alan Turing </person>


More complex XML document


<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>


Rules for well-formed XML


- Elements that contain data must have start and end tags
- Empty tags must be closed:
  <br/> or <br></br>
- Elements must not overlap. Bad nesting:
  <trunk> <branch> </trunk> </branch>
- All attribute values must be wrapped in quotes:
  <a href="newpage.html">
- XML is case sensitive: <TAG> and <Tag> are treated differently. (Common convention: use lower case.)

More Rules
A document typically begins with:
- an XML declaration:
  <?xml version="1.0" encoding="UTF-8"?>
- and, optionally, a document type declaration, for example for an XHTML document:
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

The root element follows and encloses the entire content of the document:
  <book> everything else </book>

An XML document is a tree


There are roughly five different kinds of nodes in an XML tree:

root
  Also known as the document node, this is the abstract node that contains the entire XML document. Its children include comments, processing instructions, and the root element of the document.
element
  An XML element with a name, a list of attributes, a list of in-scope namespaces, and a list of children.
text
  The parsed character data between two tags (or any other kind of non-text node).
comment
  An XML comment such as <!-- This needs to be fixed. -->. The contents of the comment are its data. A comment does not have any children.
processing instruction
  A processing instruction such as <?xml-stylesheet type="text/css" href="order.css"?>. A processing instruction has a target and a value. It does not have any children.
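
To make the node kinds concrete, the following Python sketch walks a small document with the standard xml.dom.minidom module and labels each node. The sample document DOC and the helper names are assumptions made for this illustration; the DOM API is only one of several possible tree models.

```python
from xml.dom import minidom, Node

# Hypothetical sample with a processing instruction, a comment, and one element.
DOC = """<?xml-stylesheet type="text/css" href="order.css"?>
<!-- This needs to be fixed. -->
<person id="p1">Alan Turing</person>"""

# Map DOM node-type constants to the names used on the slide.
KIND = {
    Node.DOCUMENT_NODE: "root",
    Node.ELEMENT_NODE: "element",
    Node.TEXT_NODE: "text",
    Node.COMMENT_NODE: "comment",
    Node.PROCESSING_INSTRUCTION_NODE: "processing instruction",
}


def walk(node, depth=0, out=None):
    """Collect (depth, kind) pairs in document order."""
    out = [] if out is None else out
    out.append((depth, KIND.get(node.nodeType, "other")))
    for child in node.childNodes:
        walk(child, depth + 1, out)
    return out


def node_kinds(xml_text):
    return walk(minidom.parseString(xml_text))


if __name__ == "__main__":
    for depth, kind in node_kinds(DOC):
        print("  " * depth + kind)
```

Note that the id attribute does not show up as a child: in this model attributes belong to their element rather than sitting in the tree below it.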

A tree diagram
person
  name
    first_name: Alan
    last_name: Turing
  profession: computer scientist
  profession: mathematician
  profession: cryptographer


Narrative documents
XML can also be used for more free-form, narrative documents such as business reports, magazine articles, student essays, short stories, web pages, and so forth, as shown by the following example:

<biography>
  <name><first_name>Alan</first_name> <last_name>Turing</last_name></name>
  was one of the first people to truly deserve the name
  <emphasize>computer scientist</emphasize>. Although his contributions
  to the field are too numerous to list, his best-known are the eponymous
  <emphasize>Turing Test</emphasize> and <emphasize>Turing Machine</emphasize>.

  <definition>The <term>Turing Test</term> is to this day the standard
  test for determining whether a computer is truly intelligent. This test
  has yet to be passed.</definition>

  <definition>The <term>Turing Machine</term> is an abstract finite state
  automaton with infinite memory that can be proven equivalent to any
  other finite state automaton with arbitrarily large memory. Thus what is
  true for a Turing machine is true for all equivalent machines no matter
  how implemented.</definition>

  <name><last_name>Turing</last_name></name> was also an accomplished
  <profession>mathematician</profession> and
  <profession>cryptographer</profession>. His assistance was crucial in
  helping the Allies decode the German Enigma machine. He committed
  suicide on <date><month>June</month> <day>7</day>, <year>1954</year></date>
  after being convicted of homosexuality and forced to take female
  hormone injections.
</biography>


XML Applications
XML applications limit the very flexible rules of XML to a finite set of elements of certain types.

For example, DocBook is an XML application designed for technical manuscripts. Elements it defines include book, chapter, para, sect1, sect2, programlisting, and several hundred others. When writing a DocBook document, you have to use these elements, and you have to use them in certain ways. For instance, a sect2 element can be a child of a sect1 but not a child of a sect3 or a chapter.

Scalable Vector Graphics (SVG) is an XML application for line art. Elements it defines include line, circle, ellipse, polygon, polyline, and so forth. All SVG documents are XML documents, but not all XML documents are SVG documents.

An XML application can have a schema that defines what is and is not a legal document for that application. Schemas can be written in a variety of languages including Document Type Definitions (DTDs), the W3C XML Schema Language, RELAX NG, Schematron, and numerous others.

Elements and Tags


The fundamental unit of XML is the element. Logically, every element has four key pieces:
- A name
- The attributes of the element
- The namespaces in scope on the element
- The content of the element


Elements and Tags


Syntactically, in the text form of an XML document, elements are delimited by tags. Everything in between the two tags is the content of the element.

<Quantity>12</Quantity>

An element can also contain one or more child elements:

<ShipTo>
  <Street>135 Airline Highway</Street>
  <City>Narragansett</City>
  <State>RI</State>
  <Zip>02882</Zip>
</ShipTo>

Elements and Tags


An element can also have mixed content:

<ShipTo>
  Chez Fred
  <Street>135 Airline Highway</Street>
  Apt. 17D
  <City>Narragansett</City>
  <State>RI</State>
  <Zip>02882</Zip>
</ShipTo>
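
Mixed content means that text and child elements are interleaved, so a program has to collect both. A minimal Python sketch with the standard xml.etree.ElementTree is shown here; in that library, text before the first child is stored in .text and text following a child is stored in that child's .tail. The function name mixed_parts and the constant SHIP_TO are invented for this example.

```python
import xml.etree.ElementTree as ET

SHIP_TO = """<ShipTo>
  Chez Fred
  <Street>135 Airline Highway</Street>
  Apt. 17D
  <City>Narragansett</City>
  <State>RI</State>
  <Zip>02882</Zip>
</ShipTo>"""


def mixed_parts(xml_text):
    """Return the content of the root element as an ordered list of parts."""
    root = ET.fromstring(xml_text)
    parts = []
    if root.text and root.text.strip():
        parts.append(("text", root.text.strip()))
    for child in root:
        parts.append((child.tag, (child.text or "").strip()))
        # Text that follows a child element lives in the child's tail.
        if child.tail and child.tail.strip():
            parts.append(("text", child.tail.strip()))
    return parts


if __name__ == "__main__":
    for kind, value in mixed_parts(SHIP_TO):
        print(kind, "->", value)
```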


Attributes
Attributes are name-value pairs associated with elements:

<Subtotal currency='USD'>393.85</Subtotal>

Attributes are unordered. There is no difference between these two elements:

<Tax rate="7.0" currency="USD">27.57</Tax>
<Tax currency="USD" rate="7.0">27.57</Tax>
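
That attribute order carries no meaning can be checked with a short Python sketch. It uses the standard xml.etree.ElementTree, which exposes attributes as a dictionary; same_element is a helper name made up for this illustration.

```python
import xml.etree.ElementTree as ET


def same_element(xml_a, xml_b):
    """Compare name, attributes, and text of two single-element documents."""
    a, b = ET.fromstring(xml_a), ET.fromstring(xml_b)
    # Dictionary comparison ignores the order in which attributes were written.
    return (a.tag, a.attrib, a.text) == (b.tag, b.attrib, b.text)


if __name__ == "__main__":
    print(same_element('<Tax rate="7.0" currency="USD">27.57</Tax>',
                       '<Tax currency="USD" rate="7.0">27.57</Tax>'))
```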


Attributes
<person>
  <name first="Alan" last="Turing"/>
  <profession value="computer scientist"/>
  <profession value="mathematician"/>
  <profession value="cryptographer"/>
</person>


XML-data model

[Figure: XML data model. Labels recovered from the diagram: document; attributes; element; value of the element]

An XML document is transformed into a tree with elements as nodes and the values of the elements as leaves.

XML Example

<car type="auto" year="2001">
  <producer>Opel</producer>
  <model>Astra</model>
  <price/>
</car>


XML Example - Tree

car
  attribute type: auto
  attribute year: 2001
  producer: Opel
  model: Astra
  price

Attributes in narrative documents


<biography xmlns:xlink="http://www.w3.org/1999/xlink">
  <image source="http://www.turing.org.uk/turing/pi1/bus.jpg" width="152" height="345"/>
  <person born='1912-06-23' died='1954-06-07'>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </person>
  was one of the first people to truly deserve the name
  <emphasize>computer scientist</emphasize>. Although his contributions
  to the field were too numerous to list, his best-known are the eponymous
  <emphasize xlink:type="simple" xlink:href="http://cogsci.ucsd.edu/~asaygin/tt/ttest.html">Turing Test</emphasize>
  and
  <emphasize xlink:type="simple" xlink:href="http://mathworld.wolfram.com/TuringMachine.html">Turing Machine</emphasize>.
  <last_name>Turing</last_name> was also an accomplished
  <profession>mathematician</profession> and
  <profession>cryptographer</profession>. His assistance was crucial in ...

XML Declaration
Most XML documents begin with an XML declaration

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>


Comments
<!-- Please make sure this order goes out ASAP! -->


Processing Instructions
Processing instructions are used to tell particular software how it should handle an XML document after the document has been parsed. Generally, processing instructions are used for metainformation that may apply to documents from many different domains and XML vocabularies. For instance, the most common processing instruction, xml-stylesheet, tells a browser or other formatter where it can find the stylesheet it should apply to the document.

<?xml-stylesheet type="text/xml" href="limited.xsl"?>
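
An application can read such an instruction after parsing, as in this minimal Python sketch based on the standard xml.dom.minidom module. The sample document ORDER (including its order root element) and the function name are assumptions for illustration only.

```python
from xml.dom import minidom, Node

# Hypothetical document that carries the stylesheet instruction from the slide.
ORDER = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="limited.xsl"?>
<order/>"""


def processing_instructions(xml_text):
    """Return (target, data) for each processing instruction before the root."""
    doc = minidom.parseString(xml_text)
    return [
        (node.target, node.data)
        for node in doc.childNodes
        if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    ]


if __name__ == "__main__":
    print(processing_instructions(ORDER))
```

The XML declaration on the first line looks similar but is not reported as a processing instruction.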



Processing Instructions
A processing instruction can carry arbitrary text for its target application, such as this PHP code (which uses the legacy mysql_* functions):

<?php
  mysql_connect("database.unc.edu", "clerk", "password");
  $result = mysql("HR", "SELECT LastName, FirstName FROM Employees ORDER BY LastName, FirstName");
  $i = 0;
  while ($i < mysql_numrows($result)) {
    $fields = mysql_fetch_row($result);
    echo "<person>$fields[1] $fields[0] </person>\r\n";
    $i++;
  }
  mysql_close();
?>


Checking Documents for Well-Formedness


- Every start-tag must have a matching end-tag.
- Elements may nest, but may not overlap.
- There must be exactly one root element.
- Attribute values must be quoted.
- An element may not have two attributes with the same name.
- Comments and processing instructions may not appear inside tags.
- No unescaped < or & signs may occur in the character data of an element or attribute.
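
Several of the rules above can be tried out with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree and simply reports whether parsing succeeds; the SAMPLES are small hypothetical documents written for this illustration, one per rule.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a non-validating parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# One hypothetical sample per rule, plus a well-formed document for contrast.
SAMPLES = {
    "missing end-tag": "<a><b></a>",
    "overlapping elements": "<trunk><branch></trunk></branch>",
    "two root elements": "<a/><b/>",
    "unquoted attribute value": "<a href=newpage.html/>",
    "duplicate attribute": '<a x="1" x="2"/>',
    "unescaped ampersand": "<a>Q & A</a>",
    "well-formed": '<a href="newpage.html">Q &amp; A</a>',
}

if __name__ == "__main__":
    for label, text in SAMPLES.items():
        print(label, "->", is_well_formed(text))
```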


DTD (Document Type Definition)


DTD and validation


DTDs are written in a formal syntax that explains precisely which elements and entities may appear where in the document and what the elements' contents and attributes are.

A validating parser compares a document to its DTD and lists any places where the document differs from the constraints specified in the DTD. The program can then decide what it wants to do about any violations. Some programs may reject the document. Others may try to fix the document or reject just the invalid element.

Validation is an optional step in processing XML. A validity error is not necessarily a fatal error like a well-formedness error, though some applications may choose to treat it as one.

A Simple DTD Example


<!ELEMENT person     (name, profession*)>
<!ELEMENT name       (first_name, last_name)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>
<!ELEMENT profession (#PCDATA)>


Valid person element


<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
</person>


Not valid person element


The required name child is missing:

<person>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>


Not valid person element


A profession element appears before the name element:

<person>
  <profession>computer scientist</profession>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>
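
The valid and invalid person elements can be told apart mechanically. The following Python sketch is not a validating parser; it is a hand-written check, assuming the person DTD from the earlier slide, in which each content model has been translated by hand into a regular expression over the names of the child elements. It ignores text content and attributes, and the names CONTENT_MODELS, PCDATA_ONLY, and is_valid are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Content models of the person DTD, hand-translated into regular expressions
# over a space-terminated list of child element names (illustrative encoding).
CONTENT_MODELS = {
    "person": r"name (profession )*",   # (name, profession*)
    "name": r"first_name last_name ",   # (first_name, last_name)
}
PCDATA_ONLY = {"first_name", "last_name", "profession"}


def is_valid(elem):
    """Check elem and its descendants against the hand-translated models."""
    if elem.tag in PCDATA_ONLY:
        return len(elem) == 0            # text only, no child elements
    model = CONTENT_MODELS.get(elem.tag)
    if model is None:
        return False                     # element not declared in the DTD
    children = "".join(child.tag + " " for child in elem)
    return bool(re.fullmatch(model, children)) and all(is_valid(c) for c in elem)


NAME = "<name><first_name>Alan</first_name><last_name>Turing</last_name></name>"
VALID = "<person>" + NAME + "</person>"
NO_NAME = "<person><profession>mathematician</profession></person>"
WRONG_ORDER = "<person><profession>mathematician</profession>" + NAME + "</person>"

if __name__ == "__main__":
    for doc in (VALID, NO_NAME, WRONG_ORDER):
        print(is_valid(ET.fromstring(doc)))
```

A real application would normally leave this work to a validating parser; the sketch only shows what "comparing a document to its DTD" amounts to for element content.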


An alternate DTD for the person element


<!ELEMENT first_name (#PCDATA)>
<!ELEMENT last_name  (#PCDATA)>
<!ELEMENT profession (#PCDATA)>
<!ELEMENT name       (first_name, last_name)>
<!ELEMENT person     (name, profession*)>


The Document Type Declaration


<!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd">


A valid person document


<?xml version="1.0" standalone="no"?>
<!DOCTYPE person SYSTEM "http://www.cafeconleche.org/dtds/person.dtd">
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>

A valid person document with an internal DTD


<?xml version="1.0"?>
<!DOCTYPE person [
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name  (#PCDATA)>
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT name       (first_name, last_name)>
  <!ELEMENT person     (name, profession*)>
]>
<person>
  <name>
    <first_name>Alan</first_name>
    <last_name>Turing</last_name>
  </name>
  <profession>computer scientist</profession>
  <profession>mathematician</profession>
  <profession>cryptographer</profession>
</person>

Internal DTD subset


<!DOCTYPE person SYSTEM "name.dtd" [
  <!ELEMENT profession (#PCDATA)>
  <!ELEMENT person (name, profession*)>
]>


Element Declarations
Basic form of an element declaration:
  <!ELEMENT element_name content_specification>

Example of an element that contains parsed character data but no child elements of any type:
  <!ELEMENT phone_number (#PCDATA)>

Element with one child element:
  <!ELEMENT fax (phone_number)>

Element with a sequence of child elements:
  <!ELEMENT name (first_name, last_name)>

The Number of Children


Not all instances of a given element necessarily have exactly the same children. Possible suffixes:
? - Zero or one of the element is allowed.
* - Zero or more of the element is allowed.
+ - One or more of the element is required.

Example:
<!ELEMENT name (first_name, middle_name?, last_name?)>

This declaration says that a name element must contain a first_name, may or may not contain a middle_name, and may or may not contain a last_name.

Choices
Sometimes one instance of an element may contain one kind of child, and another instance may contain a different child. This can be indicated with a choice.

Examples:
<!ELEMENT methodResponse (params | fault)>
<!ELEMENT digit (zero | one | two | three | four | five | six | seven | eight | nine)>
<!ELEMENT circle (center, (radius | diameter))>
<!ELEMENT name (last_name | (first_name, ((middle_name+, last_name) | (last_name?))))>
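
Content models built from sequences, choices, and the ?, *, and + suffixes behave much like regular expressions over child element names. The Python sketch below makes that analogy explicit by translating a content model into a regular expression. It is an illustration only: it handles element content (not #PCDATA, EMPTY, or ANY), and model_to_regex and children_match are names invented here.

```python
import re

# An XML name, simplified for this sketch (no namespaces or non-ASCII letters).
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model):
    """Translate a DTD element-content model into a regular expression.

    Each element name becomes a group matching "name;". Parentheses, "|",
    "?", "*", and "+" already mean the same thing in both notations, and the
    "," sequence operator is plain concatenation, so it is dropped.
    """
    with_names = NAME.sub(lambda m: "(?:" + re.escape(m.group()) + ";)", model)
    return re.sub(r"[\s,]+", "", with_names)


def children_match(model, child_names):
    """Return True if the list of child element names satisfies the model."""
    sequence = "".join(name + ";" for name in child_names)
    return re.fullmatch(model_to_regex(model), sequence) is not None


if __name__ == "__main__":
    circle = "(center, (radius | diameter))"
    print(children_match(circle, ["center", "radius"]))
    print(children_match(circle, ["center", "radius", "diameter"]))
```

A circle must therefore contain a center followed by exactly one of radius or diameter, which is what the second call rejects.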


Mixed Content
Examples:
<!ELEMENT definition (#PCDATA | term)*>
<!ELEMENT paragraph (#PCDATA | name | profession | footnote | emphasize | date)*>


Empty Element
<!ELEMENT image EMPTY>

Valid examples:
<image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus" />
<image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus"></image>

Not valid example (the element contains whitespace, so it is not empty):
<image source="bus.jpg" width="152" height="345" alt="Alan Turing standing in front of a bus"> </image>

ANY
<!ELEMENT page ANY>
This declaration says that a page element can contain any content including mixed content, child elements, and even other page elements. The children that actually appear in the page elements' content in the document must still be declared in element declarations of their own. ANY does not allow you to use undeclared elements.


Attribute Declarations
As well as declaring its elements, a valid document must declare all the elements' attributes. This is done with ATTLIST declarations. A single ATTLIST can declare multiple attributes for a single element type.

Examples:

<!ATTLIST image source CDATA #REQUIRED>

<!ATTLIST image source CDATA #REQUIRED
                width  CDATA #REQUIRED
                height CDATA #REQUIRED
                alt    CDATA #IMPLIED>


Attribute Types
- CDATA
- NMTOKEN
- NMTOKENS
- Enumeration
- ENTITY
- ENTITIES
- ID
- IDREF
- IDREFS
- NOTATION

Default declaration for that attribute


#IMPLIED
  The attribute is optional. Each instance of the element may or may not provide a value for the attribute. No default value is provided.
#REQUIRED
  The attribute is required. Each instance of the element must provide a value for the attribute. No default value is provided.
#FIXED
  The attribute value is constant and immutable. This attribute has the specified value regardless of whether the attribute is explicitly noted on an individual instance of the element. If it is included, though, it must have the specified value.
Literal
  The actual default value is given as a quoted string.

XML schema


XML schema
An XML schema is an XML document containing a formal description of what comprises a valid XML document. A W3C XML Schema Language schema is an XML schema written in the particular syntax recommended by the W3C.


Schemas Versus DTDs


DTDs provide the capability to do basic validation of the following items in XML documents:
- Element nesting
- Element occurrence constraints
- Permitted attributes
- Attribute types and default values

However, DTDs do not provide fine control over the format and data types of element and attribute values.


Schemas Versus DTDs


The W3C XML Schema standard includes the following features:
- Simple and complex data types
- Type derivation and inheritance
- Element occurrence constraints
- Namespace-aware element and attribute declarations


Schema Basics
The following example shows a very simple well-formed XML document.

Example: addressdoc.xml
<?xml version="1.0"?>
<fullName>Scott Means</fullName>

Assuming that the fullName element can only contain a simple string value, the schema for this document would look like:

Example: address-schema.xsd
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="fullName" type="xs:string"/>
</xs:schema>

Example with schema reference


It is also common to associate the sample instance document explicitly with the schema document. Since the fullName element is not in any namespace, the xsi:noNamespaceSchemaLocation attribute is used as:

<?xml version="1.0"?>
<fullName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="address-schema.xsd">
  Scott Means
</fullName>


Processing of XML documents


Common XML Processing Models


XML's structured and labeled text can be processed by developers in several ways. Programs can look at XML as:
text, a stream of events, a tree, or a serialization of some other structure.

Tools supporting all of these options are widely available.


Treating XML as Text


At their foundation, XML documents are text. The content and markup are both represented as text, and text-editing tools can be extremely useful for XML document inspection, creation, and modification. Textual tools are therefore a key part of the XML toolset. Many developers use text editors such as vi, Emacs, NotePad, WordPad, BBEdit, and UltraEdit to create or modify XML documents. Regular expressions in environments such as sed, grep, Perl, and Python can be used for search and replace or for tweaking documents prior to XML parsing or XSLT processing. These tools can also be very useful for searching and querying the information in XML documents, even without an understanding of the surrounding structure.

Treating XML as Events


As an XML parser reads a document, it moves from the beginning of the document to the end. Event-based parsers report this reading as it happens, in a stream of events representing the information in the document. The "events" are, for example, the start of an element, the content of an element, and the end of an element. For example, given this document:
    <name>
      <given>Keith</given>
      <family>Johnson</family>
    </name>
An event-based parser might report events such as this:
    startElement:name
    startElement:given
    content:Keith
    endElement:given
    startElement:family
    content:Johnson
    endElement:family
    endElement:name
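A minimal sketch of how such an event trace could be produced with the SAX support in the JDK. The class name EventTraceSketch and the "content:" labels are illustrative; real parsers may deliver character data in several chunks, so the sketch buffers text and reports it when the next tag is seen, and it skips whitespace-only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventTraceSketch extends DefaultHandler {
    final List<String> events = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    // Report buffered character data, ignoring whitespace-only runs.
    private void flushText() {
        String t = text.toString().trim();
        if (!t.isEmpty()) events.add("content:" + t);
        text.setLength(0);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flushText();
        events.add("startElement:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flushText();
        events.add("endElement:" + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    /** Parses xml and returns the list of reported events. */
    static List<String> trace(String xml) throws Exception {
        EventTraceSketch handler = new EventTraceSketch();
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return handler.events;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<name> <given>Keith</given> <family>Johnson</family> </name>";
        for (String event : trace(xml)) System.out.println(event);
    }
}
```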

Treating XML as Events


Event-based parsers are very useful for a wide variety of tasks.
Filters can process and modify events before passing them to another processor, efficiently performing a wide range of transformations.
Filters can be stacked, providing a relatively simple means of building XML processing pipelines, where the information from one processor flows directly into another.
Applications that want to feed information directly from XML documents into their own internal structures may find events to be the most efficient means of doing that.
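A minimal sketch of a stacked filter pipeline, assuming the SAX helper class org.xml.sax.helpers.XMLFilterImpl shipped with the JDK. The two filters (DropElementFilter and UpperCaseFilter), the class FilterPipelineSketch, and the choice of dropping the family element are illustrative only.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

class FilterPipelineSketch {
    /** Drops every element with the given name, including its content. */
    static class DropElementFilter extends XMLFilterImpl {
        private final String dropped;
        private int depth = 0; // > 0 while inside a dropped element

        DropElementFilter(XMLReader parent, String dropped) {
            super(parent);
            this.dropped = dropped;
        }
        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (depth > 0 || qName.equals(dropped)) { depth++; return; }
            super.startElement(uri, local, qName, atts);
        }
        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (depth > 0) { depth--; return; }
            super.endElement(uri, local, qName);
        }
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (depth == 0) super.characters(ch, start, length);
        }
    }

    /** Upper-cases all character data that passes through. */
    static class UpperCaseFilter extends XMLFilterImpl {
        UpperCaseFilter(XMLReader parent) { super(parent); }
        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            char[] upper = new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
            super.characters(upper, 0, upper.length);
        }
    }

    /** Events flow: parser -> DropElementFilter -> UpperCaseFilter -> final handler. */
    static String run(String xml) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        XMLReader pipeline = new UpperCaseFilter(new DropElementFilter(parser, "family"));
        final StringBuilder out = new StringBuilder();
        pipeline.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                out.append('<').append(qName).append('>');
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                out.append("</").append(qName).append('>');
            }
            @Override
            public void characters(char[] ch, int start, int length) {
                out.append(ch, start, length);
            }
        });
        pipeline.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<name><given>Keith</given><family>Johnson</family></name>"));
    }
}
```

Each filter only overrides the callbacks it cares about; everything else is passed on unchanged by XMLFilterImpl, which is what makes stacking cheap.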

Treating XML as Tree Models


XML documents, because of the requirements for well-formedness, describe tree structures. Documents typically contain an element that then contains text, attributes, and other elements, and these may contain elements, text, and attributes, and so on. Declarations, comments, and processing instructions enrich the mix, but all basically hold positions in the overall tree.

The Document Object Model (DOM) is the most common tree-based API. JDOM and DOM4J are Java-only alternatives.

Treating XML as Tree Models


Working with a tree model of a document isn't very different conceptually from working with a document as text. The entire document is always available, and moving around well-formed portions of a document or modifying them is fairly easy. The complete set of context for any given part of the document is always available. Developers can use XPath expressions to locate content and make decisions based on content anywhere in the document where APIs support XPath. (DOM Level 3 adds formal support for XPath, and various implementations provide their own support.)
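A minimal sketch of locating content in a DOM tree with XPath, assuming the javax.xml.xpath API of the JDK. The class XPathSketch and its methods are illustrative; the db/person/first element names and the idnum attribute follow the BuildXml example used later in these slides.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    static final String XML =
        "<db>"
      + "<person idnum=\"1234\"><first>Pekka</first></person>"
      + "<person idnum=\"5678\"><first>Irma</first></person>"
      + "</db>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the first name of the person with the given idnum, or "" if none. */
    static String firstNameOf(Document doc, String idNum) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // String concatenation is acceptable for a sketch, not for untrusted input.
        return xpath.evaluate("/db/person[@idnum='" + idNum + "']/first", doc);
    }

    /** Counts person elements anywhere in the document. */
    static int countPersons(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList persons = (NodeList) xpath.evaluate("//person", doc, XPathConstants.NODESET);
        return persons.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(firstNameOf(doc, "5678")); // Irma
        System.out.println(countPersons(doc));        // 2
    }
}
```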

XML APIs
XML processors make the structure and contents of XML documents available to applications through APIs

Event-based APIs
notify application through parsing events
e.g., the SAX call-back interfaces

Object-model (or tree) based APIs
provide a full parse tree
e.g., DOM, W3C Recommendation
more convenient, but may require too many resources with the largest documents

Major parsers support both SAX and DOM

DOM Document Object Model


DOM: What is it?


An object-based, language-neutral API for XML and HTML documents
allows programs and scripts to build documents, navigate their structure, add, modify or delete elements and content. Provides a foundation for developing querying, filtering, transformation, rendering etc. applications on top of DOM implementations.

In contrast to "Serial Access XML" (SAX), one could think of DOM as "Directly Obtainable in Memory"

Document Object Model (DOM)


How to provide uniform access to structured documents in diverse applications (parsers, browsers, editors, databases)?
Overview of W3C DOM Specification
second one in the XML-family of recommendations
Level 1, W3C Rec, Oct. 1998
Level 2, W3C Rec, Nov. 2000
Level 3 (Core, Load and Save), W3C Rec, Apr. 2004

What does DOM specify, and how to use it?

The Document Object Model (DOM) is a language- and platform-independent object framework for manipulating structured documents. The DOM structures a document as a hierarchy of Node objects.

The Node interface is the base interface for every member of a DOM document tree. It exposes attributes common to every type of document object and provides a few simple methods to retrieve type-specific information.
This interface also exposes all methods used to query, insert, and remove objects from the document hierarchy.

The Node interface makes it easier to build general-purpose tree-manipulation routines that are not dependent on specific document element types.

DOM structure model


Based on O-O concepts:
methods (to access or change an object's state)
interfaces (declaration of a set of methods)
objects (encapsulation of data and methods)
A parse tree: a tree-like structure implied by the abstract relationships defined by the programming interfaces

DOM structure model

    <invoice>
      <invoicepage form="00" type="estimatedbill">
        <addressee>
          <addressdata>
            <name>Tijana Petrovic</name>
            <address>
              <streetaddress>Beogradska 14</streetaddress>
              <postoffice>18000 NIS</postoffice>
            </address>
          </addressdata>
        </addressee>
        ...

[Figure: DOM tree of the invoice fragment. The Document node contains the Element nodes invoice, invoicepage, addressee, addressdata, name, address, streetaddress, and postoffice. The invoicepage element carries the attributes form="00" and type="estimatedbill". The Text nodes "Tijana Petrovic", "Beogradska 14", and "18000 NIS" are children of name, streetaddress, and postoffice.]

Structure of DOM Level 1


I: DOM Core Interfaces
Fundamental interfaces
basic interfaces to structured documents
Extended interfaces
XML specific: CDATASection, DocumentType, Notation, Entity, EntityReference, ProcessingInstruction

II: DOM HTML Interfaces
more convenient to access HTML documents

DOM Level 2
Level 1: basic representation and manipulation of document structure and content (no access to the contents of a DTD)

DOM Level 2 adds
support for namespaces
accessing elements by ID attribute values
optional features:
  interfaces to document views and style sheets
  an event model (for, say, user actions on elements)
  methods for traversing the document tree and manipulating regions of document (e.g., selected by the user of an editor)
Loading and writing of docs not specified (-> Level 3)

DOM Language Bindings


Language-independence:
DOM interfaces are defined using OMG Interface Definition Language (IDL; defined in the CORBA Specification)

Language bindings (implementations of DOM interfaces) defined in the Recommendation for
Java (implemented, e.g., by JAXP) and ECMAScript (standardised JavaScript)

Core Interfaces: Node


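[Figure: inheritance hierarchy of the DOM Core interfaces. Node is the base interface of Document, DocumentFragment, Element, Attr, CharacterData, DocumentType, EntityReference, Notation, Entity, and ProcessingInstruction. CharacterData is specialised by Comment and Text, and Text by CDATASection. The extended interfaces are CDATASection, DocumentType, EntityReference, Notation, Entity, and ProcessingInstruction.]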

Node Interface - methods

getNodeType, getNodeValue, getOwnerDocument
getParentNode, hasChildNodes, getChildNodes, getFirstChild, getLastChild, getPreviousSibling, getNextSibling
hasAttributes, getAttributes
appendChild(newChild), insertBefore(newChild, refChild), replaceChild(newChild, oldChild), removeChild(oldChild)

http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/Node.html

Object Creation in DOM


Objects implementing interfaces are created by factory methods D.create*(), where D is a Document object.
E.g.: createElement("A"), createAttribute("href"), createTextNode("Hello!")
Creation and persistent saving of Documents left to be specified by implementations.

Document Interface - Methods


Document (extends Node)
getDocumentElement()
createAttribute(name)
createElement(tagName)
createTextNode(data)
getDoctype()
getElementById(idVal)

http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/Document.html

Accessing properties of a Node


Node.getNodeName()
for an Element = getTagName()
for an Attr: the name of the attribute
for Text = "#text" etc.

Node.getNodeValue()
content of a text node, value of attribute, ...; null for an Element (!!) (in XSLT/XPath: the full textual content)

Node.getNodeType():
numeric constants (1, 2, 3, ..., 12) for ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, ..., NOTATION_NODE
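A small sketch of these three accessors on an element, an attribute, and a text node, using the DOM implementation bundled with the JDK. The class NodePropertiesSketch and the helper describe are illustrative; getTextContent (added in DOM Level 3) is shown as the closest DOM counterpart to the XPath string value of an element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NodePropertiesSketch {
    /** Formats a node as "type name value". */
    static String describe(Node n) {
        return n.getNodeType() + " " + n.getNodeName() + " " + n.getNodeValue();
    }

    static Element parseRoot(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement();
    }

    public static void main(String[] args) throws Exception {
        Element person = parseRoot("<person idnum=\"1234\">Pekka</person>");
        System.out.println(describe(person));                           // 1 person null
        System.out.println(describe(person.getAttributeNode("idnum"))); // 2 idnum 1234
        System.out.println(describe(person.getFirstChild()));           // 3 #text Pekka
        System.out.println(person.getTextContent());                    // Pekka
    }
}
```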


Content and element manipulation


Manipulating CharacterData D:
D.substringData(offset, count)
D.appendData(string)
D.insertData(offset, string)
D.deleteData(offset, count)
D.replaceData(offset, count, string) (= delete + insert)

Accessing attributes of an Element object E:
E.getAttribute(name)
E.setAttribute(name, value)
E.removeAttribute(name)
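A short sketch that exercises these CharacterData and attribute methods on a freshly created document. The class CharacterDataSketch, the element name greeting, and the sample strings are illustrative only.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Text;

class CharacterDataSketch {
    static Document newDocument() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }

    /** Applies a sequence of CharacterData edits and returns the final text. */
    static String editText() throws Exception {
        Text t = newDocument().createTextNode("Hello!");
        t.insertData(5, ", world");     // Hello, world!
        t.replaceData(0, 5, "Goodbye"); // Goodbye, world!
        t.deleteData(7, 7);             // Goodbye!
        t.appendData(" Bye.");          // Goodbye! Bye.
        return t.getData();
    }

    /** Sets, reads, and removes an attribute; returns "before|after". */
    static String editAttribute() throws Exception {
        Element e = newDocument().createElement("greeting");
        e.setAttribute("lang", "en");
        String before = e.getAttribute("lang");
        e.removeAttribute("lang");
        String after = e.getAttribute("lang"); // empty string when the attribute is absent
        return before + "|" + after;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(editText());      // Goodbye! Bye.
        System.out.println(editAttribute()); // en|
    }
}
```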

Additional Core Interfaces


NodeList for ordered lists of nodes
e.g. from Node.getChildNodes() or Element.getElementsByTagName("name")
all descendant elements of type "name" in document order (wild-card "*" matches any element type)

Accessing a specific node, or iterating over all nodes of a NodeList: E.g. Java code to process all children:
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++)
        process(children.item(i));

http://java.sun.com/webservices/jaxp/dist/1.1/docs/api/org/w3c/dom/package-summary.html

DOM: Implementations
Java-based parsers, e.g. IBM XML4J, Apache Xerces, Apache Crimson
MS IE5 browser: COM programming interfaces for C/C++ and MS Visual Basic, ActiveX object programming interfaces for script languages
XML::DOM (Perl implementation of DOM Level 1)
Others? Non-parser-implementations? (Participation of vendors of different kinds of systems in DOM WG has been active.)

A Java-DOM Example
A stand-alone toy application BuildXml
either creates a new db document with two person elements, or adds them to an existing db document

Technical basis
DOM support in Sun JAXP
native XML document initialisation and storage methods of the JAXP 1.1 default parser (Apache Crimson)

Example Code
Begin by importing necessary packages:

    import java.io.*;
    import org.w3c.dom.*;
    import org.xml.sax.*;
    import javax.xml.parsers.*;
    // Native (parse and write) methods of the
    // JAXP 1.1 default parser (Apache Crimson):
    import org.apache.crimson.tree.XmlDocument;

Class for modifying the document in file fileName:

    public class BuildXml {
        private Document document;

        public BuildXml(String fileName) {
            File docFile = new File(fileName);
            Element root = null; // doc root element
            // Obtain a factory for DOM parsers (DocumentBuilders):
            DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();

            try { // to get a new DocumentBuilder:
                DocumentBuilder builder = factory.newDocumentBuilder();
                if (!docFile.exists()) { // create new doc
                    document = builder.newDocument();
                    // add a comment:
                    Comment comment =
                        document.createComment("A simple personnel list");
                    document.appendChild(comment);
                    // Create the root element:
                    root = document.createElement("db");
                    document.appendChild(root);

or if docFile already exists:

                } else { // access an existing doc
                    try { // to parse docFile
                        document = builder.parse(docFile);
                        root = document.getDocumentElement();
                    } catch (SAXException se) {
                        System.err.println("Error: " + se.getMessage());
                        System.exit(1);
                    }
                    /* A similar catch for a possible IOException */

Subroutine to create person elements

    public Node createPersonNode(Document document, String idNum,
                                 String fName, String lName) {
        Element person = document.createElement("person");
        person.setAttribute("idnum", idNum);
        Element firstName = document.createElement("first");
        person.appendChild(firstName);
        firstName.appendChild(document.createTextNode(fName));
        /* similarly for a lastName */
        return person;
    }

Create and add two child elements to root:

    Node personNode =
        createPersonNode(document, "1234", "Pekka", "Kilpeläinen");
    root.appendChild(personNode);
    personNode =
        createPersonNode(document, "5678", "Irma", "Könönen");
    root.appendChild(personNode);

Finally, store the result document:

    try { // to write the XML document to file fileName
        ((XmlDocument) document).write(
            new FileOutputStream(fileName));
    } catch (IOException ioe) {
        ioe.printStackTrace();
    }


The main routine

    public static void main(String args[]) {
        if (args.length > 0) {
            String fileName = args[0];
            BuildXml buildXml = new BuildXml(fileName);
        } else {
            System.err.println("Give filename as argument");
        }
    } // main

SAX Simple API for XML


http://www.brics.dk/~amoeller/XML/programming/saxapi.html

What is SAX?
Simple API for XML
Originally developed through the xml-dev mailing list after Peter Murray-Rust got tired of working with numerous non-interchangeable XML parsers
Primarily a Java API, but there are implementations in most languages
Unfortunately they differ quite a lot
So you will need to get a feeling for your particular implementation
The full specification is not so 'simple'
But a useful application usually only requires a small subset of SAX
Currently at version 2.0
Version 2.0 was needed to provide support for namespaces

How does SAX work?


An XML tree is not viewed as a data structure, but as a stream of events generated by the parser.
Each event triggers a subroutine call or callback procedure
An XML tree can be built in response, but it is not required to construct a data structure
This is sometimes much more efficient:
the document can be piped through the application
the only real option for very large documents
good for local processing, not for random access

The kinds of events are:


The start of the document is encountered
The end of the document is encountered
The start tag of an element is encountered
The end tag of an element is encountered
Character data is encountered
A processing instruction is encountered
Scanning the XML file from start to end, each event invokes a corresponding callback method that the programmer writes.

What are Callbacks?


Callbacks are just procedures/subroutines
That the user supplies to the program
You may be familiar with writing a program that uses someone else's routines
Here someone else writes the program and you write the subroutines
They allow you to modify the behaviour of a program from the outside
The parser calls the subroutines
Every time it encounters an event
Passing arguments if necessary
There has to be a mechanism for registering your routines with the program

Events and Callbacks


    <cml>
      <metadataList>
        <metadata name="age" value="27"/>
        <metadata name="colour" value="blue"/>
      </metadataList>
      <property title="bigness">
        <scalar units="cubic feet">2304</scalar>
      </property>
    </cml>

Events and Callbacks


                                  ----> startDocument
    <?xml version="1.0"?>
    <cml>                         ----> startElement
      <property title="dim">      ----> startElement
        <array>                   ----> startElement
          0.0000 12.35000 6.45550 ----> characters
        </array>                  ----> endElement
      </property>                 ----> endElement
    </cml>                        ----> endElement
                                  ----> endDocument

saxexample.html xmlfile

Example?

    import java.io.*;
    import org.xml.sax.*;
    import org.xml.sax.helpers.*;
    import org.apache.xerces.parsers.SAXParser;

    public class Flour extends DefaultHandler {
        float amount = 0;

        public void startElement(String namespaceURI, String localName,
                                 String qName, Attributes atts) {
            if (namespaceURI.equals("http://recipes.org")
                    && localName.equals("ingredient")) {
                String n = atts.getValue("", "name");
                if ("flour".equals(n)) { // also safe when 'name' is missing
                    String a = atts.getValue("", "amount"); // assume 'amount' exists
                    amount = amount + Float.valueOf(a).floatValue();
                }
            }
        }

        public static void main(String[] args) {
            Flour f = new Flour();
            SAXParser p = new SAXParser();
            p.setContentHandler(f);
            try { p.parse(args[0]); }
            catch (Exception e) { e.printStackTrace(); }
            System.out.println(f.amount);
        }
    }

Saxevents.htm

Events in example

    start document
    processing instruction: dsd
    starting element: collection
    -character data, length 3
    -starting element: description
    --character data, length 47
    -end element: description
    -character data, length 3
    -starting element: recipe
    --character data, length 5
    ...
    -end element: recipe
    -character data, length 1
    end element: collection
    end document

SAX 2 Interfaces
Defines interfaces for standard routines and callbacks
ContentHandler: the most important interface
Attributes: the second most important interface
DTDHandler
EntityResolver
ErrorHandler
Locator
XMLFilter
XMLReader
SAXException

ContentHandler Interface
This is the bit that handles the most important events
The methods that handle the events are referred to as callback routines
The parser fires events according to what it finds in the XML file. Every time it encounters an event it calls the appropriate callback routine

The ContentHandler specifies 11 methods
No methods for dealing with comments or XML declarations
Attributes are not considered to be events

ContentHandler Interface
The most important piece of SAX
startDocument()
endDocument()
startElement(uri, localName, qName, attrs)
endElement(uri, localName, qName)
characters(text, start, length)
ignorableWhitespace(text, start, length)
startPrefixMapping(prefix, uri)
endPrefixMapping(prefix)
processingInstruction(target, data)
setDocumentLocator(locator)
skippedEntity(name)

Attributes Interface
Specifies methods for accessing individual attributes
An Attributes object is passed to the startElement routine
The order of the attributes is unimportant and need not be the same as in the XML document. However, we can refer to attributes by their index for convenience
Uses overloaded functions allowing us to refer to an attribute by its qualified name
Or by its URI and its local name
Or by an index (for convenience)

Attributes Interface
getLength()
getQName(index), getLocalName(index), getURI(index)
getType(uri, localName), getType(qualifiedName), getType(index)
getIndex(uri, localPart), getIndex(qualifiedName)
getValue(uri, localName), getValue(qualifiedName), getValue(index)
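A minimal sketch of the three lookup styles (by qualified name, by namespace URI plus local name, and by index) inside a startElement callback. The class AttributesSketch, the namespace http://example.org/units, and the u:precision attribute are made up for illustration; the scalar element with its units attribute follows the CML example shown earlier. The parser must be namespace-aware for the URI-based lookup to work, and the SAX specification allows parsers to omit qualified names in that mode, although the parser bundled with the JDK reports them.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AttributesSketch extends DefaultHandler {
    static final String UNITS_NS = "http://example.org/units"; // hypothetical namespace
    final Map<String, String> found = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (!localName.equals("scalar")) return;
        found.put("byQName", atts.getValue("u:precision"));
        found.put("byUriAndLocalName", atts.getValue(UNITS_NS, "precision"));
        found.put("noNamespace", atts.getValue("", "units"));
        int index = atts.getIndex("units");           // position of the attribute
        found.put("byIndex", atts.getValue(index));
        found.put("missing", String.valueOf(atts.getValue("colour"))); // null -> "null"
    }

    static Map<String, String> lookUp(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        AttributesSketch handler = new AttributesSketch();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<scalar xmlns:u=\"http://example.org/units\" "
                   + "units=\"cubic feet\" u:precision=\"3\">2304</scalar>";
        System.out.println(lookUp(xml));
    }
}
```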


ErrorHandler Interface
ErrorHandler
Allows you to catch errors and deal with them appropriately
Again you have to write these functions
The ErrorHandler only specifies the interface
warning(exception): ambiguities/non-XML errors
error(exception): non-fatal errors (invalid documents)
fatalError(exception): fatal errors (not well-formed)
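A minimal sketch of an ErrorHandler that records what the parser reports, assuming a validating SAX parser from the JDK and a DTD given in the internal subset. The classes ErrorHandlerSketch and Collector and the message prefixes are illustrative; the exact error texts depend on the parser.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

class ErrorHandlerSketch {
    /** Collects every report instead of printing it. */
    static class Collector implements ErrorHandler {
        final List<String> messages = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            messages.add("warning: " + e.getMessage());
        }
        @Override
        public void error(SAXParseException e) {
            // Validity errors are recoverable: record them and let parsing continue.
            messages.add("error: " + e.getMessage());
        }
        @Override
        public void fatalError(SAXParseException e) throws SAXException {
            // Well-formedness errors end the parse.
            messages.add("fatalError: " + e.getMessage());
            throw e;
        }
    }

    static List<String> check(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(true); // DTD validation
        XMLReader reader = factory.newSAXParser().getXMLReader();
        Collector collector = new Collector();
        reader.setErrorHandler(collector);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            // A fatal error has already been recorded by the collector.
        }
        return collector.messages;
    }

    public static void main(String[] args) throws Exception {
        String dtd = "<!DOCTYPE name [<!ELEMENT name (#PCDATA)>]>";
        System.out.println(check(dtd + "<name>Keith</name>"));
        System.out.println(check(dtd + "<name><given>Keith</given></name>"));
        System.out.println(check(dtd + "<name>Keith"));
    }
}
```

The first document produces no reports, the second produces validity errors, and the third ends with a fatal error because it is not well-formed.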


SAX Pros & Cons


Pros
SAX has a very small memory footprint and is ideal for large documents
It is also useful when the tree-like model of DOM is not the most appropriate one for storing your data, particularly if the structure is very flat

Cons
A document is intuitive; an event is less so
There is no default storage model
Because SAX only stores a small part of the document in memory at any given time, it is up to you to keep track of where you are in the document
If the document has a lot of structure, and later events need to know about earlier events, you can find yourself storing a lot of data in memory

XML and Namespaces


Namespaces?
Since element names in XML are not predefined, a name conflict will occur when two different documents use the same element names. Namespaces are a simple and straightforward way to distinguish names used in XML documents, no matter where they come from.

    <table>
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </table>

    <table>
      <name>Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>

Solving Name Conflicts Using a Prefix


    <h:table>
      <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
      </h:tr>
    </h:table>

    <f:table>
      <f:name>Coffee Table</f:name>
      <f:width>80</f:width>
      <f:length>120</f:length>
    </f:table>

Using Namespaces
    <h:table xmlns:h="http://www.w3.org/TR/html4/">
      <h:tr>
        <h:td>Apples</h:td>
        <h:td>Bananas</h:td>
      </h:tr>
    </h:table>

    <f:table xmlns:f="http://www.w3schools.com/furniture">
      <f:name>Coffee Table</f:name>
      <f:width>80</f:width>
      <f:length>120</f:length>
    </f:table>

Default Namespaces
    <table xmlns="http://www.w3.org/TR/html4/">
      <tr>
        <td>Apples</td>
        <td>Bananas</td>
      </tr>
    </table>

    <table xmlns="http://www.w3schools.com/furniture">
      <name>Coffee Table</name>
      <width>80</width>
      <length>120</length>
    </table>
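A minimal sketch showing that a namespace-aware parser tells the two kinds of table apart by namespace URI, whether a prefix or a default namespace is used. It relies on the DOM Level 2 method getElementsByTagNameNS from the JDK; the class NamespaceSketch and the wrapper element doc are illustrative only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceSketch {
    static final String XML =
        "<doc>"                                                   // hypothetical wrapper
      + "<h:table xmlns:h=\"http://www.w3.org/TR/html4/\">"
      + "<h:tr><h:td>Apples</h:td><h:td>Bananas</h:td></h:tr>"
      + "</h:table>"
      + "<table xmlns=\"http://www.w3schools.com/furniture\">"    // default namespace
      + "<name>Coffee Table</name>"
      + "</table>"
      + "</doc>";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default in JAXP
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Returns the namespace URI of every element whose local name is "table". */
    static List<String> tableNamespaces(Document doc) {
        NodeList tables = doc.getElementsByTagNameNS("*", "table");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tables.getLength(); i++) {
            result.add(((Element) tables.item(i)).getNamespaceURI());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(tableNamespaces(doc));
        // Only the furniture table matches when the namespace is given explicitly:
        System.out.println(doc.getElementsByTagNameNS(
            "http://www.w3schools.com/furniture", "table").getLength());
    }
}
```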



    <h:html xmlns:xdc="http://www.xml.com/books"
            xmlns:h="http://www.w3.org/HTML/1998/html4">
      <h:head><h:title>Book Review</h:title></h:head>
      <h:body>
        <xdc:bookreview>
          <xdc:title>XML: A Primer</xdc:title>
          <h:table>
            <h:tr align="center">
              <h:td>Author</h:td> <h:td>Price</h:td>
              <h:td>Pages</h:td> <h:td>Date</h:td>
            </h:tr>
            <h:tr align="left">
              <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
              <h:td><xdc:price>31.98</xdc:price></h:td>
              <h:td><xdc:pages>352</xdc:pages></h:td>
              <h:td><xdc:date>1998/01</xdc:date></h:td>
            </h:tr>
          </h:table>
        </xdc:bookreview>
      </h:body>
    </h:html>

Example

    <h:html xmlns:xdc="http://www.xml.com/books"
            xmlns:h="http://www.w3.org/HTML/1998/html4">
      <h:head><h:title>Book Review</h:title></h:head>
      <h:body>
        <xdc:bookreview>
          <xdc:title h:style="font-family: sans-serif;">XML: A Primer</xdc:title>
          <h:table>
            <h:tr align="center">
              <h:td>Author</h:td> <h:td>Price</h:td>
              <h:td>Pages</h:td> <h:td>Date</h:td>
            </h:tr>
            <h:tr align="left">
              <h:td><xdc:author>Simon St. Laurent</xdc:author></h:td>
              <h:td><xdc:price>31.98</xdc:price></h:td>
              <h:td><xdc:pages>352</xdc:pages></h:td>
              <h:td><xdc:date>1998/01</xdc:date></h:td>
            </h:tr>
          </h:table>
        </xdc:bookreview>
      </h:body>
    </h:html>

    <html xmlns="http://www.w3.org/HTML/1998/html4"
          xmlns:xdc="http://www.xml.com/books">
      <head><title>Book Review</title></head>
      <body>
        <xdc:bookreview>
          <xdc:title>XML: A Primer</xdc:title>
          <table>
            <tr align="center">
              <td>Author</td> <td>Price</td> <td>Pages</td> <td>Date</td>
            </tr>
            <tr align="left">
              <td><xdc:author>Simon St. Laurent</xdc:author></td>
              <td><xdc:price>31.98</xdc:price></td>
              <td><xdc:pages>352</xdc:pages></td>
              <td><xdc:date>1998/01</xdc:date></td>
            </tr>
          </table>
        </xdc:bookreview>
      </body>
    </html>

XSL: Formatting XML data


XSL: eXtensible Stylesheet Language


Not a mark-up in the sense of HTML
Built on the idea of templates, which are themselves XML documents.
XSL templates provide the mechanism for transforming data, and applying formatting information to data

Applying the Style


Take an XML document representing a hierarchy of nodes
Apply one of several possible, possibly independent, XSL stylesheets
Browser produces something that looks like HTML

XSL: example
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">

DOCTYPE declaration

</xsl:stylesheet>

XSL: example
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>
    </xsl:stylesheet>

Find root of DOM tree and apply templates

XSL: example
    <xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
      <xsl:template match="/">
        <xsl:apply-templates/>
      </xsl:template>
      <xsl:template match="clrcstructures">
        <xsl:apply-templates/>
      </xsl:template>
    </xsl:stylesheet>

Find clrcstructures node and apply templates

XSL: example
    <xsl:template match="clrcstructures">
      <xsl:apply-templates/>
    </xsl:template>
    <xsl:template match="department">
      <P><xsl:value-of select="deptabbrev"/> (<xsl:value-of select="deptname"/>)</P>
    </xsl:template>

Find department node and output details

XSL: example
    ...
    <xsl:template match="department">
      <P><xsl:value-of select="deptabbrev"/> (<xsl:value-of select="deptname"/>)</P>
      <UL><xsl:apply-templates/></UL>
    </xsl:template>

Find department nodes, output details and apply templates

    <xsl:template match="division">
      <LI><xsl:value-of select="divnabbrev"/> (<xsl:value-of select="divnname"/>)</LI>
    </xsl:template>

Find division nodes and output details

XSL: example
    ...
    <xsl:template match="group">
      <P>
        <xsl:choose>
          <xsl:when test="structureID[. = 'ITDISEW3G']">
            <B><xsl:value-of select="grpname"/></B>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="grpname"/>
          </xsl:otherwise>
        </xsl:choose>
      </P>
    </xsl:template>

Match W3G and display differently


Example
XML file: people.xml
XSLT file: people.xsl
Formatted XML: peoplexsl.xml

Converting Relational Database to XML


Example: Export the following data into XML and group books by store
Relational Database:
Store (sid, name, phone)
Book (bid, title, authors)
StoreBook (sid, bid, price, stock)

[Figure: entity-relationship diagram. The entities Store (sid, name, phone) and Book (bid, title, authors) are linked by the relationship StoreBook, which carries the attributes price and stock.]

Converting Relational Database to XML


XML:
    <store>
      <name> </name>
      <phone> </phone>
      <book>
        <title> </title>
        <authors> </authors>
        <price> </price>
      </book>
      <book></book>
    </store>
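A minimal sketch of this export, with the three tables held in memory as arrays instead of being read from a database. The class StoreExportSketch, the wrapper element stores, and the sample rows are made up for illustration; the nested loops stand in for the join of Store, StoreBook, and Book, and the result is serialised with an identity transformation from the JDK.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class StoreExportSketch {
    // Made-up sample rows; column order follows the relational schema.
    static final String[][] STORES = {            // sid, name, phone
        {"1", "Store A", "555-0100"}, {"2", "Store B", "555-0101"} };
    static final String[][] BOOKS = {             // bid, title, authors
        {"10", "XML: A Primer", "Simon St. Laurent"} };
    static final String[][] STORE_BOOKS = {       // sid, bid, price, stock
        {"1", "10", "31.98", "3"}, {"2", "10", "29.50", "1"} };

    /** Appends <name>text</name> to parent and returns the new element. */
    static Element child(Document doc, Element parent, String name, String text) {
        Element e = doc.createElement(name);
        if (text != null) e.appendChild(doc.createTextNode(text));
        parent.appendChild(e);
        return e;
    }

    static String export() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("stores"); // hypothetical wrapper element
        doc.appendChild(root);
        for (String[] store : STORES) {
            Element s = child(doc, root, "store", null);
            child(doc, s, "name", store[1]);
            child(doc, s, "phone", store[2]);
            for (String[] sb : STORE_BOOKS) {      // group books by store
                if (!sb[0].equals(store[0])) continue;
                for (String[] book : BOOKS) {
                    if (!book[0].equals(sb[1])) continue;
                    Element b = child(doc, s, "book", null);
                    child(doc, b, "title", book[1]);
                    child(doc, b, "authors", book[2]);
                    child(doc, b, "price", sb[2]);
                }
            }
        }
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(export());
    }
}
```

With a real database the nested loops would normally be replaced by one SQL join ordered by sid, starting a new store element whenever sid changes.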


Extracting data as XML


Most databases now have the ability to return the results of a query in XML format. For example, in SQL Server you can enter the query:
    SELECT * FROM EMPLOYEE FOR XML RAW
which will give you a dump of the entire EMPLOYEE table in "raw" XML format: this means you get no control over the representation. Each row in the result is output as an element, with the column values represented as attributes. You can then put it through an XSLT transformation to turn it into something else.

Extracting data as XML


For Oracle the equivalent is the XML SQL Utility (XSU). Using XSU, you can enter a standard SQL query such as
    SELECT * FROM EMPLOYEE WHERE EMPLOYEE_NR='517541'
and get back the answer in the form of an XML document such as:
    <ROWSET>
      <ROW num="1">
        <EMPLOYEE_NR>517541</EMPLOYEE_NR>
        <NAME>Michael Kay</NAME>
      </ROW>
    </ROWSET>

Oracle also has a utility, called XSQL pages, that allows you to embed SQL statements in a skeletal XML document. A request from a browser to this document is directed to a servlet, which executes the SQL statements and enters the results into the page before delivering it back to the browser. Formatting of the page can then be controlled on the client side using either CSS or client-side XSLT.

Some other XML formats


MathML for Mathematics
Chemical Markup Language (CML) for Chemistry
Astronomical Markup Language (AML) for Astronomy
Bioinformatic Sequence Markup Language (BSML) for the human genome project
Extensible Scientific Interchange Language (XSIL)
DDI for Social Science Data

CONCLUSIONS


XML in Data Management


Integration of Heterogeneous Data
common interface for exchange
delivered across a common medium
different data formats into the same XML format
web based
metadata for management, searching and control
widely available economic tools
client-side processing for presentation and analysis

XML...
Can be pre-generated or created on-the-fly at the server
Provides an easily parsable, platform and vendor neutral format for transmitting data
Needs no network etc. support beyond the Web browser (or other transport)
Provides the means to validate and transform the data at the desktop

Data validation
Even in a perfect world there can be problems in:
Generation
Transmission
Editing/processing after reception

XML provides a means to declare the data structure to the desktop


Metadata - internal
Basic provided by Document Type Definitions (DTDs)
Simplified from SGML version
Provides basic structure and cardinality

Several proposals, including MS version shipped in IE5, extending capabilities
Data typing
Extended value and structure constraints

Semistructured Data and Mediators


Semistructured data is often encountered in data exchange and integration
At the sources the data may be structured (e.g. from relational databases)
We model the data as semistructured to facilitate exchange and integration
Users see an integrated semistructured view that they can query
Queries are eventually reformulated into queries over the structured resources (e.g. SQL)
Only results need to be materialized

Schema archiving - a future?


XML Schema under development
Support for hierarchic and OO views
Usable for relational, but lacks proper key support

Problem is document-driven approach
Concentrates on instances of data rather than The Big Picture

What is a mediator ?
A complex software component that integrates and transforms data from one or several sources using a declarative specification
Two main contexts:
Data conversion: converts data between two different models, e.g. by translating data from a relational database into XML
Data integration: integrates data from different sources into a common view

CONCLUSION
XML is now achieving momentum
The scientific data management community should be at the forefront of its use.
users will demand it
advantages of widely available tools
advantages in integration
advantages in information management

Sources
http://sax.sourceforge.net/ - Official SAX Web site
http://www.xml.com/pub/a/1999/01/namespaces.html - Site with tutorial related to namespaces