Mark Colan

e-business evangelist
mcolan@us.ibm.com
http://ibm.com/developerworks/speakers/colan
An Overview of XML
and Related Technologies
Mark Colan An Overview of XML Technologies IBM Corporation
Page 1
Agenda
The motivation for XML
What is XML?
XML Standards
DTDs
XML Schema
XSL
XML Programming, DOM, and SAX
XML - Best Practices (and pitfalls)
XML Tools and Resources
IBM and XML
Perspective: XML Today and in the Future
B2B exchange: 1970's, 1980's
1970's (and before): The mainframe era!
Data exchange through magnetic tape reels or punched cards
shipped by mail or courier
required hardware compatibility
required common data format
Rigid file formats
COBOL file descriptions with fixed length fields
Specified record and block size
Often based on 80-column punch cards
XML would have been possible, but not practical
Personal computers:
PC's are in the hobbyist domain: 1975-1980
IBM PC causes a revolt against mainframe mentality
...but causes headaches for MIS administrators:
Enterprise Application Integration: 1990
Not much had changed...
A variety of network types are in use
File formats still compiled into programs
Enterprise Application Integration is difficult, time-consuming, expensive!
The culprits: the legacy of rigid and proprietary file formats, and dependence on platform, programming language, and operating system
ERP systems rise to solve problems with integrating business process software applications
Companies revise business processes to fit the software
But isn't this backwards?
Software should model the business process, not the other way around!
EDI based on rigid message formats, private networking
B2B Integration: 1990
EDI is the only standard...
Based on private networking (not the internet)
Expensive, time-consuming to build and integrate
Rigid message formats (XML doesn't exist yet) are one culprit
Is EDI a success?
Used primarily by the largest companies in the most industrialized nations
Widespread use? 5% of businesses
what about small- and mid-sized companies?
what about developing nations?
Early B2B integration
months or years of development
once you have integrated with a supplier, you're dependent on them
what do you do if they take advantage of you?
B2B and EAI with XML
Flexible file formats
systems less likely to break as software evolves
easier integration
Information: messages and documents
Interoperability: sharing data across applications and platforms
Integration: bringing together data from multiple sources
XML data is independent of:
hardware platform
operating system
programming language
object model
delivery device
XML uses Unicode, so it is international
XML defines the data format for content
but what about the exchange standards?
attend "An Introduction to Web Services" to find out
XML and e-business Integration
Enterprise Application Integration
XML makes it easy to integrate applications from
different vendors, different hardware, different
programming languages
End-to-end integration leverages your systems and
people for better efficiency
Business Partner Integration
Common XML data formats allow different companies to
integrate quickly for e-business
Solves more difficult BP integration problem
New model: find business partners dynamically, and
begin doing business immediately
Business directories, marketplaces, auctions
IBM's Vision: Dynamic e-business
Software designed as flexible components
Software is modelled after your business processes...
not the other way around
remodel your business processes to study
a new business process idea for feasibility
your ideas for improving business processes
can provide a competitive edge
Locate / change business partners as needed
price changes
availability
The vision is realized by:
New internet standards (TCP/IP, XML, ...)
Web Services standards (SOAP, WSDL, UDDI, ...)
Read all about it! http://ibm.com/webservices
Resources there include many useful whitepapers on dynamic e-business
Part 2: What is XML?
Tags in text to mark-up meaning
A text-based tag language, similar in style to HTML,
but lets you define your own tags
A standard way of sharing structured data
A key technology to enable e-business
A simplified subset of SGML
A language for defining other markup languages,
interchange formats and message sets
How is XML used?
Documents
purchase order, employee record, electronic Trading
Partner Agreements, structured text documents, ...
common import/export format
great for integrating heterogeneous applications
Messages
service request
(example: "please verify this credit card #")
SOAP and Web Services
Sample XML code
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Sheboygan</city>
  <state>WI</state>
  <zip>38472</zip>
  <country>USA</country>
</address>
Compared to HTML, XML
labels the data - says what it is
does NOT say how it should be presented
"Well-formed" XML vs "valid" XML
Well-formed XML: XML has a small set of rules that define the basic syntax every XML document must follow to be accepted by an XML parser
recommended first line (the XML declaration): <?xml version="1.0"?>
syntax of tags: <tag>data</tag>
tag attributes: <tag attribute="x"/>
nesting of tags:
<employee>
  <name>Mark Colan</name>
  <id>X04913</id>
</employee>
A "valid" XML document is well-formed AND complies with a specific DTD or XML Schema
XML rules: closing tags required
Legal HTML, not legal in XML:
<p>Explain this!
<br>
XHTML: legal for both XML and HTML:
<p>Explain this!</p>
Legal XML shortcut for a tag with no text data:
<br />
Note the slash before closing bracket.
XML rules: correct nesting
HTML can be written to be conformant XML
This isn't legal XML:
<i>
<b>
</i>
</b>
This is legal (XML and HTML):
<i>
<b>
</b>
</i>
This is legal in HTML, but illegal in XML:
<p>Explain <i><b>this</i></b>!
Legal for both HTML and XML:
<p>Explain <i><b>this</b></i>!</p> OR
<p/>Explain <i><b>this</b></i>!
XML rules: outer tag set
The XML document must be enclosed
in one set of tags.
This is legal:
<colors>
<color>red</color>
<color>green</color>
</colors>
This isn't:
<color>red</color>
<color>green</color>
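The well-formedness rules covered so far (closing tags, correct nesting, one outer tag set) are what any XML parser enforces. The following is a minimal sketch in Python, assuming the standard library's non-validating, expat-based parser as a stand-in for whichever parser you actually use; the helper name `is_well_formed` is made up for this illustration.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        parseString(text)
        return True
    except ExpatError:
        # The parser rejects missing end tags, bad nesting, multiple roots, etc.
        return False


SAMPLES = [
    "<p>Explain this!<br>",                                     # missing closing tags
    "<p>Explain this!<br /></p>",                               # closed, with empty-tag shortcut
    "<p>Explain <i><b>this</i></b>!</p>",                       # incorrectly nested
    "<p>Explain <i><b>this</b></i>!</p>",                       # correctly nested
    "<colors><color>red</color><color>green</color></colors>",  # one outer tag set
    "<color>red</color><color>green</color>",                   # no outer tag set
]

for sample in SAMPLES:
    print(is_well_formed(sample), sample)
```

Note that this only checks well-formedness. Checking validity against a DTD or XML Schema needs a validating parser.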
XML attributes
You can use attributes within a tag:
<paper color="red">
Print this on red paper.
</paper>
Maybe there's no data. Use either form:
<paper color="red"></paper>
<paper color="red"/>
The second form is a short-cut for the first; the two
are equivalent.
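To see that the two empty-element forms really are equivalent, one can parse both and compare what the parser reports. This is a small sketch using Python's standard `xml.dom.minidom`; the variable names are illustrative only.

```python
from xml.dom.minidom import parseString

# Parse the long form and the short-cut form of the same empty element.
full = parseString('<paper color="red"></paper>').documentElement
short = parseString('<paper color="red"/>').documentElement

# Both yield an element named "paper" with the same attribute and no children.
print(full.tagName, full.getAttribute("color"), full.hasChildNodes())
print(short.tagName, short.getAttribute("color"), short.hasChildNodes())
```

An application reading the parsed document cannot tell which form the author typed.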
XML for specific data sets
What we have seen so far are the rules that
every XML document must follow to be
"well-formed".
Now... how do we specify:
the names of tags we allow?
which tags can nest other tags?
required vs optional for each tag?
one occurrence, or any number?
default value of attributes?
XML Vocabularies
A particular XML markup language is called a
"vocabulary", expressed in either or both:
DTD (Document Type Definition)
part of the XML 1.0 specification
comes from SGML definition
XML Schema
an improved XML definition language
new spec from W3C
"Recommendation" status in May 2001
We'll look at DTDs and Schemas later in this
presentation.
Part 3: XML Standards
Core Technology Standards
IBM participation includes XML, Schema, DOM, XSL, Namespaces, Linking, XHTML, RDF, XML Protocol (SOAP), and XML Query.
The XML Industry Portal
Sponsored by IBM, Sun, Oracle, SAP, ...
A vendor-neutral XML schema clearinghouse.
Info on how to apply XML in industrial and commercial settings.
Accelerating the adoption of industry standards
oasis-open.org
100+ member companies including IBM, Sun, Microsoft, Corel, Software AG, and Oracle.
Enabling a Global Electronic Market
ebxml.org
United Nations Centre for the Facilitation of Procedures and Practices for Administration, Commerce and Transport
www.unece.org/cefact/
IBM alphaWorks
Providing early access to emerging technologies to developers.
W3C XML technologies
"Recommended" by W3C:
XML Specification 1.0: syntax, DTDs
DOM Specification 2.0: API of parsed objects
XSLT Specification 1.0: transforming XML
XPath Specification 1.0: queries, addressing XML docs
XHTML Specification 1.0: HTML in XML form
XML Schema: big improvements over DTDs
Works in progress:
XSL Formatting Objects
DOM 3.0
XML Query: a more powerful query mechanism
XPointer, XLink
XML Signature, XML Encryption
XML Protocol (SOAP 1.2)
WSDL
Other standards:
SAX 2.0 (de facto standard, not from W3C)
SOAP 1.1 (de facto standard, now under development at W3C)
XML 1.0 Specification
Originally published: February 1998
In only 50 pages:
complete XML syntax details
complete Document Type Definition (DTD)
XML 1.0 Specification, 2nd Edition: 6 October 2000
errata applied to original spec, not a new version
now a "recommendation" (replaces Feb 1998 edition)
http://www.w3.org/TR/2000/REC-xml-20001006
Supplementary specs:
Namespaces in XML (January, 1999)
Stylesheet linking (June, 1999)
others in progress (XBase, XInclude, Canonical, ...):
see http://www.w3.org/XML/Activity.html#future
XML Schema Specs
A greatly improved vocabulary definition language
replaces DTDs (superset of DTDs)
XML syntax
rich type support
http://www.w3.org/XML/Schema
XML Schema Part 0: Primer
edited by David C. Fallside of IBM
XML Schema Part 1: Structures
XML Schema Part 2: Datatypes
W3C "Recommendation": May 2001
Xerces-J is a complete implementation
"beta" release
a few documented restrictions and bugs
to be resolved in forthcoming final release
xml.apache.org, XML4J on www.alphaworks.ibm.com
DOM 2.0 Specification
Models a tree representation of an XML document
tree is created as a result of parsing a document
supports both XML and HTML
A language-independent object definition and API
Bindings for Java in Appendix
DOM 1.0 W3C Recommendation: October, 1998
spec: http://www.w3.org/TR/REC-DOM-Level-1/
DOM 2.0: now a W3C Recommendation
new methods, types, interfaces
traversals, namespaces, event model, stylesheets
DOM 3.0 is in the requirements-gathering phase
see http://www.w3.org/DOM
SAX 2.0 Specification
"Simple API for XML" - for event-based parsing
instead of getting a complete DOM tree,
you get notifications of the arrival of each piece
essential when parsing very large documents
A de-facto "standard" by Dave Megginson
not a W3C Recommendation
Bindings available for Java, C++, COM, Perl, Python
History:
Version 1.0 published May, 1998
Version 2.0 published May, 2000
SAX 2.0 support is available in Xerces parsers
see http://www.megginson.com/SAX/index.html
XSL: Extensible Stylesheet Language
see http://www.w3.org/Style/XSL/
XSL has three parts:
XSLT: transformation language
XPath: a language for addressing parts of an XML document
Formatting Objects: an XML vocabulary for specifying formatting semantics
XSLT 1.0 Specification
A transformation language for XML documents
designed as an XML vocabulary
styling (rendering to visual form, like HTML)
transformation (vocabulary translation)
can emit XML, HTML, even non-XML formats
XSLT stylesheets are well-formed XML
W3C Recommendation: November, 1999
spec: http://www.w3.org/TR/xslt
XSLT 1.0 implementations from IBM:
Apache Xalan xml.apache.org
LotusXSL www.alphaworks.ibm.com
XSLT 2.0 planned, in requirements phase
XSL Formatting Objects
Layout-oriented XML vocabulary
rich representation of documents for printing, various
device screens, etc
usually created as output of XSLT
using an appropriate stylesheet
XSL Specification defines FO's, refers to XSLT
Currently W3C Candidate Recommendation
see http://www.w3.org/TR/xsl/
FOP open source FO processor implementation
(creates PDF) available at xml.apache.org
See also: "XSL by Example" (on my web site)
examples, more resources
XPath 1.0 Specification
Language for addressing parts of an XML document
used by XSLT and XPointer
basic facilities for manipulation of strings, numbers and
booleans
can be used as simple query language
compact, non-XML syntax for use in URIs
W3C Recommendation: November, 1999
see http://www.w3.org/TR/xpath
XPath 2.0 is planned
will be used by XML Query
XPath implementation: part of Xalan / LotusXSL,
xml.apache.org / www.alphaworks.ibm.com
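To give a feel for XPath-style addressing, here is a minimal sketch in Python. It assumes the standard library's `xml.etree.ElementTree`, which implements only a limited subset of XPath 1.0 (it is not Xalan or LotusXSL), and it reuses the address sample from earlier in this presentation.

```python
import xml.etree.ElementTree as ET

ADDRESS = """<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Sheboygan</city>
  <state>WI</state>
  <zip>38472</zip>
  <country>USA</country>
</address>"""

# fromstring() returns the <address> element; paths are relative to it.
address = ET.fromstring(ADDRESS)

# A child path: the <last-name> inside <name>.
last_name = address.findtext("name/last-name")

# A wildcard step: every child element of <name>, in document order.
name_parts = [child.tag for child in address.findall("name/*")]

# A descendant search: any <zip> anywhere below <address>.
zip_code = address.find(".//zip").text

print(last_name, name_parts, zip_code)
```

A full XPath 1.0 processor adds functions, axes, and predicates beyond what this subset offers.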
XML Query
Query facilities to extract data from real and virtual
XML documents
A work in progress:
XML Query Requirements, 16 February 2001
XML Query Use Cases, 8 June 2001
XQuery 1.0 and XPath 2.0 Data Model, 7 June 2001
XQuery 1.0 Formal Semantics, 7 June 2001
XQuery 1.0: An XML Query Language, June 2001
XML Syntax for XQuery 1.0 (XQueryX), June 2001
NEW: XQuery 1.0 and XPath 2.0 Functions and
Operators Version 1.0, first release 27 August 2001
"...many of the powerful and structured facilities of XML
Query have been recognized as so fundamental that
they are going to be incorporated into the next version
of XPath, namely XPath 2.0."
XLink 1.0 Specification
Purpose:
elements can be inserted into XML documents to create
and describe links between resources
XML syntax to create structures that describe the simple
unidirectional hyperlinks of today's HTML, as well as
more sophisticated links
Status:
W3C Recommendation: 21 June 2001
see http://www.w3.org/TR/xlink/
XBase 1.0 Specification
Purpose:
XML syntax providing the equivalent of HTML BASE
functionality generically by defining an XML attribute
named xml:base
Status:
W3C Recommendation, 27 June 2001
http://www.w3.org/TR/xmlbase/
XPointer Specification
Fragment identifier for URI-references that locate XML Resources
based on XPath
supports addressing into the internal structures of XML documents
allows traversals of a document tree and choice of its internal parts based on element types, attribute values, character content, and relative position
Status:
W3C Candidate Recommendation 11 September 2001
http://www.w3.org/TR/xptr/
XHTML 1.0 Specification
Reformulation of HTML 4.01 as XML
documents must be "well-formed" XML
elements and attributes are lower-case only
for non-empty elements, end tags are required
empty elements (<br />) allowed
attribute values must always be quoted
no attribute "minimization"
W3C Recommendation: January 2000
spec: http://www.w3.org/TR/xhtml1/
second edition in "working draft"
not a new version; incorporates corrections based on
feedback
VoiceXML
Designed for creating audio dialogs that feature
synthesized speech, digitized audio
recognition of spoken and DTMF key input
recording of spoken input
telephony
mixed-initiative conversations
...to make Internet content and information accessible
via voice and phone
VoiceXML Forum is an industry organization
founded by AT&T, IBM, Lucent and Motorola
Submitted for consideration by W3C as standard
http://www.w3.org/TR/voicexml/
Some tools available on www.alphaworks.ibm.com
Web services specifications
Web services is a new model of data exchange
based on SOAP, an XML message protocol
See "Technical Overview of Web Services"
presentation on my web site
SOAP 1.1 / SOAP 1.2
UDDI 2.0
WSDL 1.1
and others under development
OASIS / XML.ORG: promoting industry vocabulary standards
Consortium of 100+ industry players
Some vocabularies:
FpML - Financial Products Markup Language
IFX - Interactive Financial Exchange - Banking
RosettaNet - IT Supply Chain
OMG XMI - XML Metadata Interchange
Open Travel Alliance
Health Level Seven
OTP - Open Trading Protocol
WML - Wireless Markup Language
XML/EDI - Electronic Data Interchange
and 100+ others
The Electronic Business XML initiative
Established by UN/CEFACT and OASIS
Goal:
enable enterprises of any size, in any location, to meet
and conduct business through the exchange of XML
messages
provide an open technical framework using XML in a
consistent and uniform manner for the exchange of
electronic business data for B2B, B2C, and C2B
Lower the barrier of entry to electronic business
faster/cheaper development of e-business apps
facilitate trade
small- and medium-sized enterprises
developing nations
Part 4: DTDs (Document Type Definitions)
The structure of an XML document
is defined by its DTD. DTDs define:
the tags that can or must appear
how often the tags can appear
how the tags can be nested
allowable, required and default attributes
...but not the type of data
But note: the use of DTDs is optional!
DTD allows a validating parser to detect deviations from
a vocabulary
Can parse well-formed XML without a DTD
DTD is defined by the XML 1.0 specification
Sample XML code
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city>Sheboygan</city>
  <state>WI</state>
  <zip>38472</zip>
  <country>USA</country>
</address>
Sample DTD
<!ELEMENT address (name, street*, city, state, zip?, country)>
<!ELEMENT name (title?, first-name, last-name)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT first-name (#PCDATA)>
<!ELEMENT last-name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zip (#PCDATA)>
<!ELEMENT country (#PCDATA)>
Notes:
<address> is the outer group
Must have one each: <name>, <city>, <state>, <country>
<street> may appear any number of times, including zero (the * suffix)
<title> and <zip> are optional; may appear at most once (the ? suffix)
#PCDATA is any text: alpha, numeric, punctuation, whitespace
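A DTD content model is essentially a regular expression over child element names, which is why a validating parser can check it mechanically. The sketch below illustrates that idea in Python; it is not how Xerces or any real validating parser is implemented, and the helpers `model_to_regex` and `children_match` are hypothetical names invented here. Python's standard parsers do not validate against DTDs, so the check is done by hand for illustration only.

```python
import re
import xml.etree.ElementTree as ET

# Element-content models copied from the sample DTD (not the #PCDATA ones).
ADDRESS_MODEL = "(name, street*, city, state, zip?, country)"
NAME_MODEL = "(title?, first-name, last-name)"


def model_to_regex(model):
    """Turn a DTD element-content model into a regex over 'name/' tokens."""
    # Wrap each element name in a group that matches "name/".
    pattern = re.sub(r"[\w-]+",
                     lambda m: "(?:" + re.escape(m.group(0)) + "/)",
                     model)
    # Commas mean "followed by", which in a regex is plain concatenation.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)


def children_match(element, model):
    """Check the sequence of child element names against a content model."""
    names = "".join(child.tag + "/" for child in element)
    return model_to_regex(model).fullmatch(names) is not None


good = ET.fromstring(
    "<address><name><first-name>Mary</first-name>"
    "<last-name>McGoon</last-name></name>"
    "<street>1401 Main Street</street><city>Sheboygan</city>"
    "<state>WI</state><country>USA</country></address>")
bad = ET.fromstring(
    "<address><name/><city>Sheboygan</city><state>WI</state></address>")

print(children_match(good, ADDRESS_MODEL))             # zip omitted: allowed by zip?
print(children_match(good.find("name"), NAME_MODEL))   # title omitted: allowed by title?
print(children_match(bad, ADDRESS_MODEL))              # country missing: rejected
```

In practice, leave this work to a validating parser; the point is only that `*` and `?` in a DTD carry the same meaning as in a regular expression.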
Developing DTDs
Study the XML spec; use any text editor
Use a tool like Visual DTD*
editing environment with syntax-directed help
Write sample XML that illustrates all the ways
you'll use the data, then use a tool like Data
Descriptor by Example to generate the DTD.
These tools are available for free from
www.alphaworks.ibm.com
* Visual DTD is found in the package called
"Visual XML Tools"
What's wrong with DTDs?
No type support - #PCDATA can be any string of
characters (except tags)
DTD syntax is different from XML syntax
<!ELEMENT zip (#PCDATA)>
DTDs cannot express specific constraints:
element x can occur from 4 to 17 times
if the type of element y is "decimal", the y element must
contain an x element
But, as Yoda said, "There is... another."
Part 5: XML Schema
An improved XML vocabulary definition language:
written in XML syntax (DTDs are not)
superset of DTD functionality
Schema composition
namespace support
Biggest improvement is specification of types
Built-in Simple Types
Derived types ("subclass" of built-in Simple Types)
Complex types: definition of an item with sub-items
Built-in Simple Types
string, boolean
float, double, decimal
integer, nonPositiveInteger, positiveInteger, nonNegativeInteger, negativeInteger, long, int, short, byte, unsignedLong, unsignedInt, unsignedShort, unsignedByte
timeInstant, timePeriod, timeDuration
date, time, month, year, century
recurringDate, recurringDay, recurringDuration
uriReference, language
Name, QName, NCName
ID, IDREF, IDREFS, ENTITY, ENTITIES, NOTATION, NMTOKEN, NMTOKENS
Simple and Complex types
<shoeOrder ship='2000-05-16'>
  <size>9</size>
</shoeOrder>
<size> is a Simple Type
integer type is "built-in" to schema
no sub-elements or attributes
<shoeOrder> is a Complex Type
has an element
has an attribute
designed by a schema author
Defining new Simple Types
New simple types may be defined from built-in types, adding constraints via "facets"
Some facet usage examples:
A string that has a minimum and maximum length
An integer that has minimum and maximum values
A string with an enumerated list of allowed values
A type based on patterns
Example: US Postal ("zip") code
Only these two exact forms are legal:
five digits, or
five digits followed by a dash then four digits
Examples:
02155
02155-2153
Here's a schema definition for US zip codes:
<simpleType name="USZipcode">
  <restriction base="string">
    <pattern value="[0-9]{5}(-[0-9]{4})?"/>
  </restriction>
</simpleType>
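The pattern facet can be tried out with an ordinary regular-expression engine. This is a rough sketch in Python, assuming that `re.fullmatch` is a fair stand-in for the way a schema pattern must match the whole value; the name `is_us_zipcode` is invented for this example. The pattern here places the `?` after the closing parenthesis so that the entire dash-plus-four-digits suffix is optional.

```python
import re

# Five digits, optionally followed by a dash and four more digits.
US_ZIPCODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def is_us_zipcode(text):
    """Return True if the whole string matches the zip code pattern."""
    return US_ZIPCODE.fullmatch(text) is not None


for candidate in ["02155", "02155-2153", "2155", "02155-21", "02155-"]:
    print(candidate, is_us_zipcode(candidate))
```

Only the first two candidates have one of the two legal forms; the rest are rejected.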
Example: Complex Type
<shoeOrder> has
a <size> element
a ship= attribute
<shoeOrder ship='1999-05-21'>
  <size>9</size>
</shoeOrder>
Here is a schema definition for <shoeOrder>:
<complexType name='ShoeOrder'>
  <sequence>
    <element name='size' type='integer'/>
  </sequence>
  <attribute name='ship' type='date'/>
</complexType>
Developing XML Schemas
You can:
Study the XML Schema spec; use a text editor
Use Visual DTD*, an editing environment with
syntax-directed help
Write sample XML that illustrates all the ways you'll use
the data, then use Data Descriptor by Example to
generate the Schema.
These tools are available for free from
http://www.alphaworks.ibm.com
*Visual DTD is found in the package called
"Visual XML Tools"
Tools for XML Schemas
Apache Xerces-J / IBM XML4J
beta release of complete implementation of final
XML Schema 1.0 Specification
use to validate Schemas
http://xml.apache.org or
http://www.alphaworks.ibm.com
Visual DTD
includes support for XML Schema data types
see http://www.alphaworks.ibm.com
Convert DTD to XML Schema using Perl
see http://www.w3.org/2000/04/schema_hack/
Learn more about XML Schema
A schema tutorial can be found on IBM's
developerWorks at:
ibm.com/developerworks/library/xml-schema/?dwzone=xml
XML Schema Part 0: Primer, for a very readable
introduction by IBM's David Fallside
w3.org/TR/xmlschema-0/
W3C Schema Tools list:
w3.org/XML/Schema
Robin Cover's index of XML Schema materials
oasis-open.org/cover/schemas.html
Part 6: XSL - eXtensible Stylesheet Language
eXtensible Stylesheet Language
W3C-defined language for expressing stylesheets and transformations
XSLT processor
provides a mechanism for transforming and formatting XML data, either at the client or on the server
e.g. Xalan from xml.apache.org
Typical uses:
rendering XML to HTML, PDF, or plain text
XML vocabulary translation
XSL: Three parts
XSLT: transformation language
XPath: a language for addressing parts of an XML document
Formatting Objects: an XML vocabulary for specifying formatting semantics
<term>Transformation</term>
XML is pointless without transformation!
"Styling":
prepare data for presentation
e.g. render in html
really a transformation from one XML form to another
(HTML)
"Transformation":
convert from one XML form to another
e.g. vocabulary translation
XSL Processing Overview
[Diagram: the source XML and an XSL style sheet (one or more templates in the XSL syntax) are fed to the XSLT engine (Apache Xalan, LotusXSL, ...), which produces a result tree and a result XML stream for application processing]
How is XSLT used?
[Diagram: client, middle-tier server (IBM WebSphere, Lotus Domino, Apache, etc.), and legacy data store; labels: XSL style sheet, HTML stream, SQL, SQL translator, DB2 XML Extender, DB2 Universal Database]
XSLT: A Simple Example
(...maybe a little too simple!)
Problem: Display the title (only) in HTML as an <h1> headline
XML Input File: Book0.xml
<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>
XSL Style Sheet Input File: Book0.xsl
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="title">
    <h1>
      <xsl:apply-templates/>
    </h1>
  </xsl:template>
  <xsl:template match="author"/>
  <xsl:template match="price"/>
</xsl:stylesheet>
HTML Output (from the XSL processor)
<h1>The Right Stuff</h1>
Part 7: XML Programming, DOM, and SAX
Typical processing steps:
Parse the document (using XML parser)
Parser builds a DOM tree (DOM case)
Parser sends events (SAX case).
Process the document:
Using the DOM API, or
By handling SAX events.
Interpret the data/Do something with the data.
Display it.
Produce a report.
Transcode it, etc...
XML Processing Model (DOM)
[Diagram: the XML parser parses the XML file (with its DTD), reports errors, and hands a DOM tree to the application, which processes it]
XML Processing Model (SAX)
[Diagram: the XML parser parses the XML file (with its DTD) and sends events and errors to the application]
Parsing The XML Document
Invoke the parser
Parser invocation/control API may vary
JAXP is a new Java standard API
to invoke an XML parser
Parser helps with
Validation
Well-formedness checking
Building a document tree (DOM)
Notifying the application of errors
Most parsers can handle files and streams
NOTE: Many parsers allow some/all of the error
checking features to be disabled
The DOM Tree
DOM is a set of API definitions. DOM Specification
gives abstract APIs
Language bindings exist for popular languages
C++ and Java bindings are defined as classes and
methods
Classes:
Root - always the top-level node of any DOM tree
Element nodes - correspond to <element>s
Attribute nodes - correspond to <element> attributes
example: <element name="value">
Text nodes - text found between <element> tags
there can be several text nodes for a given element
for example, when text is placed between sub-elements
A sub-element is just an element node whose parent is
an element... giving us a tree representation
DOM Tree Example
<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>
[Tree diagram]
ROOT (the tree has an implicit root node)
  <book> (the document element)
    ws
    <author>
      Tom Wolfe
    ws
    <title>
      The Right Stuff
    ws
    <price>
      $6.00
    ws
<book>, <author>, <title>, and <price> are elements
The children of <author>, <title>, and <price> are text nodes
ws marks a whitespace text node between elements
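The tree above can be reproduced with any DOM implementation. Here is a minimal sketch in Python, assuming the standard library's `xml.dom.minidom` rather than Xerces; the variable names are illustrative. It walks the children of `<book>` and labels whitespace-only text nodes as "ws", as in the diagram.

```python
from xml.dom.minidom import parseString

BOOK = """<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>"""

document = parseString(BOOK)        # the implicit root (a Document node)
book = document.documentElement     # <book> is the document element

# Label each child of <book>: element name, or "ws" for whitespace text.
kinds = []
for node in book.childNodes:
    if node.nodeType == node.TEXT_NODE and not node.data.strip():
        kinds.append("ws")
    else:
        kinds.append(node.nodeName)
print(kinds)

# The text of an element is itself a child text node.
title = book.getElementsByTagName("title")[0]
title_text = title.firstChild.data
print(title_text)
```

The whitespace nodes come from the line breaks and indentation between the elements; they would disappear if the document were written on a single line.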
How is SAX different from DOM?
SAX is an event model
register an event listener, and you will be called
for each node found as the parser reads the document
SAX doesn't create a data structure in memory
only sends events
DOM parsers typically process SAX events and build a tree
of DOM nodes according to what they get
Why use DOM?
convenience, or for editing the DOM tree as a whole
need for random access of content of XML document
only use DOM when you have enough memory to store
the entire document (or are willing to go virtual)
Why use SAX?
when you need to process really big XML documents
when you can work with nodes in the order they are received
when you need a subset of the XML document
e.g. Xalan builds a smaller tree representation as DOM subset
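For comparison with DOM, here is a minimal SAX sketch in Python using the standard `xml.sax` module (an assumption made for illustration; the slides refer to the SAX support in Xerces). The handler class `TitleHandler` is a made-up name. It keeps only the `<title>` text and ignores every other event, so no tree is ever built.

```python
import xml.sax

BOOK = """<book>
  <author>Tom Wolfe</author>
  <title>The Right Stuff</title>
  <price>$6.00</price>
</book>"""


class TitleHandler(xml.sax.ContentHandler):
    """Collect the text of every <title> element; drop everything else."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several pieces, so accumulate it.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._in_title = False


handler = TitleHandler()
xml.sax.parseString(BOOK.encode("utf-8"), handler)
print(handler.titles)
```

Memory use here depends on what the handler chooses to keep, not on the size of the document.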
Part 8: XML - Best Practices (and pitfalls)
Avoid over-structuring XML
Use simplest arrangement that represents data and
all required relationships
Don't try to save space by avoiding redundant data
Example:
employee list should be a flat file containing department,
division, and other classification fields
don't structure like an org chart
Why?
compression takes care of space problem
easier coding, easier stylesheets
stylesheets can transform to any required arrangement
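As a rough illustration of why a flat list is enough, the sketch below regroups a flat employee list by any classification field. It is written in Python only because the Python standard library has no XSLT processor; in practice a stylesheet would do this job. The element names (`employees`, `employee`, `name`, `department`, `division`), the data, and the helper `group_by` are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# A flat list: every employee carries its own classification fields.
FLAT = """<employees>
  <employee><name>Ann</name><department>Sales</department><division>East</division></employee>
  <employee><name>Bob</name><department>Research</department><division>East</division></employee>
  <employee><name>Cy</name><department>Sales</department><division>West</division></employee>
</employees>"""


def group_by(root, field):
    """Arrange employee names under the value of one classification field."""
    groups = defaultdict(list)
    for employee in root.findall("employee"):
        groups[employee.findtext(field)].append(employee.findtext("name"))
    return dict(groups)


employees = ET.fromstring(FLAT)
print(group_by(employees, "department"))
print(group_by(employees, "division"))
```

The same flat data yields either arrangement on demand, which an org-chart-shaped document would not allow without restructuring.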
Spell it out!
Avoid abbreviations that obscure the meaning.
Compression gives similar results whether
abbreviated or spelled out
Easier for developers
Fewer errors from guessing the meaning
Use meaningful element names
Use off-the-shelf tools
Never write code to parse XML: use Xerces
Never transform XML via hand-written code: use Xalan
less code to write
fewer bugs to find and fix
can change faster when needed
gain new features easily by incorporating a new version (e.g. parser gains schema support)
Xerces is written optimally
Xalan is being reworked for better efficiency
Use standard vocabularies
Always look for a standard vocabulary before
inventing your own
easier to exchange data with business partners
standard vocabularies mature to avoid problems, and
benefit from several people's insight
reusable DTDs, Schema, even stylesheets
...or work with competitors AND partners to develop
one specific to your industry
"cooperate on standards, compete on implementations"
lower costs for all
speeds the growth of your industry
Write your own vocabularies
Model internal XML data after your internal business
processes
internal processes should be your competitive edge
Use XSLT to combine, transform XML data to
standard forms for business exchange
Structure your company as dot-coms
Use software standards to provide loose coupling
between departments
Well-designed software and data interfaces minimize
the impact of change; change is inevitable
minimize inter-dependence of different systems
Allows easier out-sourcing when cheaper than in-house
Focus on your core competencies, outsource the rest
Know when to use DOM vs SAX
DOM is the easiest to use
parser builds complete data representation in memory
standard API, or data-specific access beans make
programming easy
allows referencing between any part of the data
Use SAX for really big data
may not be practical to build a tree representation of
10GB of XML data (how much RAM do you have?)
SAX events are generated by the parser
handle the ones you are interested in, drop others
process data as it comes along
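The difference between the two models can be sketched with Python's standard-library DOM and SAX modules, used here as stand-ins for the Xerces APIs. The `orders` document and the `total` attribute are invented for illustration.

```python
import io
import xml.sax
from xml.dom import minidom

# Invented sample data; a real SAX candidate would be far larger.
DOCUMENT = b"<orders><order total='10'/><order total='32'/><note>rush</note></orders>"


def total_with_dom(data):
    """DOM: build the whole tree in memory, then navigate it freely."""
    document = minidom.parseString(data)
    orders = document.getElementsByTagName("order")
    return sum(int(node.getAttribute("total")) for node in orders)


class OrderTotalHandler(xml.sax.ContentHandler):
    """SAX: react to the events of interest and ignore everything else."""

    def __init__(self):
        super().__init__()
        self.total = 0

    def startElement(self, name, attrs):
        if name == "order":
            self.total += int(attrs.getValue("total"))


def total_with_sax(data):
    """Stream the input through the handler without building a tree."""
    handler = OrderTotalHandler()
    xml.sax.parse(io.BytesIO(data), handler)
    return handler.total


if __name__ == "__main__":
    print(total_with_dom(DOCUMENT), total_with_sax(DOCUMENT))
```

Both functions compute the same sum. The DOM version keeps every node available for later reference, while the SAX version holds only the running total, which is what makes it workable for very large inputs.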
Page 72
Avoid defaults in DTDs
Parsing builds a tree with default values filled in for
omitted attributes (DTD defaults apply to attributes)
can REALLY bloat your DOM tree
Can externalize default values
store in preamble of XML, or separate stream
design software to look up the default value when an
optional value is missing
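One way to externalize defaults is a lookup table consulted only when a value is absent. This is a minimal sketch under stated assumptions: the `DEFAULTS` table, the attribute names, and the helper `attribute_or_default` are all hypothetical, and the table could equally be loaded from a preamble or a separate stream.

```python
import xml.etree.ElementTree as ET

# Hypothetical defaults, kept outside the DTD so the parser never
# copies them into every node of the tree.
DEFAULTS = {"currency": "USD", "priority": "normal"}


def attribute_or_default(element, name):
    """Return the attribute value, falling back to the external table."""
    value = element.get(name)
    if value is not None:
        return value
    return DEFAULTS[name]


if __name__ == "__main__":
    order = ET.fromstring('<order currency="EUR"/>')
    print(attribute_or_default(order, "currency"))
    print(attribute_or_default(order, "priority"))
```

The tree stores only what the document actually contains; the default is resolved at the moment it is needed.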
Page 73
Use XML Schema, not DTD
XML Schema is a W3C recommended spec (May
2001)
Many tools now support most of Schema working drafts
Xerces/Java is 100% compliant (current release is beta,
has some documented limitations)
Tools will quickly evolve to support finished spec
Validation comes for free!
no tedious, bug-ridden validation code
ideal for accepting data from many sources
(e.g. Web Services messages)
Page 74
Part 9: XML Resources
Page 75
XML Parser (Apache Xerces)
XML4J, XPK4J, XML4C
XML for RPG on AS/400
LotusXSL (Apache Xalan)
SOAP for Java
TexML
DataCraft
P3P Parser
XML Security Suite
XML Translator Generator
VoiceXML
Voice Server SDK (beta)
XML Viewer
Dynamic XML
PatML
TaskGuide Viewer (wizards)
XML Master (XML Bean generator)
XML TreeDiff
XML Diff & Merge
XML Lightweight Extractor
XMI Toolkit
Bean Markup Language
SVGView
XML Enabler
Data Descriptors by Example
SVG Viewer
XML Productivity Kit
Visual XML Builder
Visual DTD
Visual XML Query
Visual Transformation
XML Generator
XSL Editor v2.0
Free XML Tools and Technologies
(also Java, C++, multimedia, pervasive computing, voice, and many others)
Page 76
IBM and Apache Software Foundation
IBM contributed key XML technologies to Apache
Xerces: XML parsers (Java and C++)
Xalan: XSL processors (Java and C++)
SOAP4J: platform-neutral SOAP 1.1 implementation
Open source accelerates vendor-neutral standards
Non-proprietary implementation of W3C specs
Public participation through code contribution
Ongoing demonstration of IBM's dedicated
support of industry standards
Page 77
developerWorks
ibm.com/developerworks/
New portal to make developers' jobs easier and faster
Focused on open standards
and cross-platform compatibility
Technical resources across
Linux, Java, XML, open source
and more
Combines IBM expertise with
third-party content
Page 78
Page 79
Page 80
Patterns for e-business
What they are:
Repository of successfully implemented designs
architecture patterns, design patterns, runtime patterns,
guidelines and best practices, code
source of information for architects and developers
distillation of the collected wisdom of IBM, partners,
and customer IT architects
code samples and representative implementations
What they aren't:
a complete road map to build applications
architectural imperatives for success
vision of what might be in the future
Free, at http://ibm.com/framework/patterns
Under revision to use XML and Web Services in
designs
Page 81
Part 10: IBM and XML
Mission: Deliver XML-based solutions that will help
our customers and business partners build, deploy,
and manage e-business applications.
We are doing this by
Ensuring strong, open standards
Enabling our entire product line for XML
Building e-business solutions
Page 82
Enabling the IBM product line
Build (Dev Tools and Components): WebSphere Studio, VisualAge
Deploy (Application Servers and Integration Software): WebSphere, Domino, DB2, MQSeries
Manage (Secure Network and Management Software): Tivoli, SecureWay
Platforms: NT, OS/2, AIX, HP-UX, Solaris, OS/400, OS/390, Linux, NetWare
Page 83
First Union
[Figure: diagram with the labels Legacy Systems, Java and XML Interfaces, Mobile Banking, and Online Banking]
Page 84
jStart Engagement Model
Phase I, Business Qualification: Evangelism; Business Value Proposition; Determine Readiness; Identify Candidate Projects; Gain Executive Commitment
Phase II, Project Definition: Business Objectives; Requirements Gathering; Prioritization and Selection; Project Scope and Definition
Phase III, Project Readiness: Project Design; Use Case Analysis; Produce Project Plan / Sizing; Develop Business Proposal
Phase IV, Customer Commitment: Proposal Presentation; Contractual Agreement
Phase V, Solution Building: Implementation and Deployment; IBM Reference Activity
jstart@us.ibm.com ibm.com/software/jstart
We use a "jumpstart" approach
to help customers successfully
build e-business solutions using
XML and Web Services,
starting with education and
ending with services.
Page 85
Industrial revolution: 30 years to accomplish
Internet standards revolution:
we're now in year 5
Part 11: Perspective
Page 86
XML in 2001
Most important base technical standards
are now W3C recommendations (http://w3.org)
XML Schema is now a W3C recommendation
announced last week at WWW10 Hong Kong
OASIS has established XML.ORG as the registry
and repository for industry-specific vocabulary
standards
xml.apache.org has robust, mature implementations
of XML parser, XSL processor, and SOAP4J as
open-source implementations
countless products using XML from all vendors
Page 87
The e-business future: Are we there yet?
We all know XML is the way to exchange data
between platforms
regardless of programming language
on different devices
with different middleware
Isn't there something missing here?
We need standards for exchanging data.
What's the model and mechanism for exchanging
documents to conduct e-business?
ebXML.org
Web Services
Page 88
Will XML catch on?
[Figure: Gartner Group "Hype Curve", 4Q2000, plotting Visibility against Maturity]
Curve stages labeled: Technology Trigger, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity
Technologies plotted: ASPs, Speech Recognition, Smart Cards, Micropayments, Digital Ink, Synthetic Characters, Biometrics, Enterprise Portals, Audio Mining, Jini, xDSL/Cable Modems, 3D Web, WAP/Wireless Web, Voice Portals, XML, Quantum Computing, Webtops, Voice over IP, Bluetooth, Java Language
Legend: Less than two years, Two to five years, Five to 10 years, More than 10 years
Annotations: XML: 4Q99; one year!; XML: 4Q01??
Note: XML 4Q01 is my projection, not Gartner's
Page 89
Web Services: the next phase for XML
Web Services defines a model for data exchange
built on XML
Web Services technologies are also XML
technologies
SOAP - XML Message Protocol
WSDL - Web Services Description Language (an XML
vocabulary)
UDDI - Universal Description, Discovery, and Integration
...and some new ones: WSFL, WSEL, ...
As such, Web Services can be seen as the next step
for XML technologies. If you are interested in XML,
you probably want to learn about Web Services too.
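To make "SOAP is an XML message protocol" concrete, here is a minimal sketch that builds a SOAP 1.1 envelope with Python's standard library. The envelope namespace is the SOAP 1.1 one; the `GetQuote` request, its `Symbol` element, and the `urn:example:stock-quotes` namespace are hypothetical and do not describe any real service.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
QUOTE_NS = "urn:example:stock-quotes"  # hypothetical application namespace


def build_request(symbol):
    """Build a SOAP 1.1 envelope carrying a hypothetical GetQuote request."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{QUOTE_NS}}}GetQuote")
    ET.SubElement(request, f"{{{QUOTE_NS}}}Symbol").text = symbol
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    print(build_request("IBM"))
```

The message is ordinary namespaced XML, so the same parsers, schemas, and stylesheets used elsewhere apply to it unchanged.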
Page 90
Questions?
ibm.com/alphaworks
site for free emerging tools and technologies from IBM
ibm.com/developerworks/xml
XML Zone on developerWorks - resources for customers
and developers on the use of XML
xml.apache.org
open source XML tools from Apache Software Foundation
w3.org
XML base technical standards
xml.org
XML standard vocabularies repository
xml.org/xmlorg_news/index.shtml
new and news (the Cover pages)
ebxml.org
electronic business in XML initiative
Mark Colan - mcolan@us.ibm.com
ibm.com/developerworks/speakers/colan
copies of this and
other presentations
are available
on this site
Page 91
