
Tim Landgrave’s XML series www.techrepublic.com

XML: Explained by Tim Landgrave


As one of TechRepublic’s most popular writers, Tim Landgrave draws on his years of technical and
business experience to offer advice each week to CIOs, CTOs, and other IT professionals.
Tim founded KiZAN Corporation, one of the first companies in the world to install Microsoft
Windows NT in 1993. Two years later, in 1995, KiZAN was chosen by Microsoft as its first International
Solution Provider of the Year.
Tim was also an architect and author of Microsoft’s MSDN Development Architecture Training as well
as a MS Press book on developing multitier applications on the Microsoft platform.
After selling KiZAN to Panurgy, Tim launched Vobix (http://www.vobix.com/careers.htm), an application
services provider focused on providing tools to small and medium-size businesses that are usually
available only to larger companies.
After we ran Tim’s four-part series on XML, we received such a response that we thought TechRepublic
members would appreciate a download of all four articles. In these columns, Tim outlines:
• XML’s potential to change the Web.
• How XML can change development.
• Using XML to ease application integration.
• How XML resolves the debate between the use of COM or CORBA.
If you would like to comment on this download, send us an e-mail
(mailto:cio@techrepublic.com?subject=XML%20download).

Table of Contents
From HTML to XML: A language lesson on the future of the Web
Prepare for the new Web formats: Be XML savvy
Use XML to make your application integration go smoothly
How XML will resolve the COM versus CORBA debate and end world hunger


From HTML to XML: A language lesson on the future of the Web

I remember attending Comdex in Las Vegas in the early days of the computer revolution (1982-1985).
Given the size of the show, the main focus was to get as much information as you could, in as short a
time as possible. There was limited time for interaction with folks in the booths (and generally limited
knowledge there as well).
As shows like this continued to grow, their effectiveness began to wane. Unless you knew exactly what
you wanted and took the time to plan properly, you never got to see the right people or gadgets. Most of
my interactions with these companies have been replaced with direct mail, magazine reviews, and direct
interaction with company employees.
Well, the Internet of today has a lot in common with those early Comdex shows. In the sea of Web
pages, it’s difficult to ferret out the information that you need. More importantly, other than the big guys
like Amazon, who focus on the Web as a medium for transactions, most companies still think of the Web
as a literature stand placed in front of their electronic Comdex booths. Why? Because the nature of the
Web and the language it’s built on, HTML, is to provide a display of rendered data, not a source of
usable, well-defined, interactive data.

Isn’t HTML good enough?


In the early days of the Internet, Tim Berners-Lee began working on a way to pass formatted documents
between machines that were running different operating systems. Given its widespread use in industry,
Standard Generalized Markup Language (SGML) looked like an obvious choice. SGML had been used
since the mid '70s by companies in certain industries, like automotive and healthcare, where it was
necessary to handle large, complex documents that had to pass between multiple platforms. SGML is
a robust but very complex markup language that’s difficult to master, with tools that are expensive to buy
and a community of experts focused largely on text-based applications.

DEFINING MARKUP LANGUAGE


Markup languages describe how text is structured within a given document. A markup language
focuses on defining the structure and content of a document and its sections, not on the final formatting
of the document for display or printing. Sections of a document are described using “tags” with a
process called “tagging.” Once a section is tagged, changes to all sections with similar tags can be
made simultaneously. Formatting for display or printing is usually accomplished with a special set of
external instructions called “style sheets” and an application called a “renderer” that applies the style
sheet instructions to the tagged sections of a document. Since markup languages focus on defining the
content but not the presentation, different style sheets or applications can process the same markup
document but provide results formatted differently.

But supporting a markup language like SGML (whose standards document had grown to over 600
pages) meant developing more advanced browsers and forcing Web developers to use expensive SGML
tools. The standard that those early pioneers developed instead is what we now know as HTML, the
HyperText Markup Language. HTML is an application of SGML: a specific set of tags and processing
instructions designed to define and create a specific document type, a Web page. HTML was designed to
allow content and formatting in a single document to be passed from the server where it resided to the
client browser that requested it.


There it was rendered based on the client’s ability to interpret the tags. If a browser didn’t support a
particular tag, then the information simply wasn’t displayed. This allowed the same document to be
rendered on different machines, while gracefully degrading if features weren’t supported. It also led to
less stringent requirements on the validation of the document’s format. All of this validation took place at
display time, based on the capabilities of the client’s renderer.
This legacy has continued even as newer versions of HTML have been standardized, first by the
Internet Engineering Task Force (IETF) and later by the World Wide Web Consortium (W3C). Even
though features like programmability (JavaScript), object models (Dynamic HTML), and style sheets
(CSS) have been added around the core markup language, HTML has a client-centric focus that no
amount of language tweaking can overcome. The real value of HTML going forward lies in its strengths,
the formatting and display of data, while XML bolsters its weakness, document processing.

XML: SGML on the Atkins Diet


Where HTML is an SGML application, XML is a subset of SGML. Contrary to popular opinion, XML is not
intended to replace HTML, but instead will give Web developers a more robust document definition and
processing capability.
HTML remains necessary as a rendering format for those devices that know how to process it. Since
XML separates display instructions (called style sheets) from content definition (XML documents), a site
designer can alter the look and feel of the site simply by applying a different style sheet to the same XML
document.
More importantly, this allows you to use the same content for other systems or devices that don’t use
HTML for their display processing.
XML documents are highly structured, well-defined objects that can be represented by a common
object model, called the Document Object Model (DOM). This is important because the ability to
represent any XML document within an object model makes interoperability possible. If an external
system wants to process an HTML file, the system must do the equivalent of mainframe screen scraping
to attempt to discern the content of the document. Even then, a string like “1/1/2000” can mean either a
date or a financial formula. It’s the ability of XML to store the value and attributes of a piece of data that
makes it different from HTML, and it’s the existence of this DOM that provides the secret sauce for XML
developers: it’s now possible to intelligently traverse or process a set of data stored in XML. For
example, suppose you have an XML document like the one below:

<!-- Representative portfolio document; element names other than
     <Ticker> are illustrative -->
<Portfolio>
   <Stock>
      <Ticker>MSFT</Ticker>
      <Shares>100</Shares>
      <PurchasePrice>85.00</PurchasePrice>
   </Stock>
   <Stock>
      <Ticker>IBM</Ticker>
      <Shares>50</Shares>
      <PurchasePrice>110.00</PurchasePrice>
   </Stock>
</Portfolio>

From this set of data, I can use an application or a style sheet to display all my holdings and their
purchased value, just one stock of interest or just the total value. More importantly, I can write an
application that can take the input from this XML document, locate the <Ticker> element, use an external
process to look up the current value, and then use a style sheet or custom program to display the current
value of my portfolio. As an XML document, any particular element is directly accessible.
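Traversing that DOM programmatically is straightforward. Here is a minimal sketch in Python using the standard library's xml.dom.minidom; the portfolio element names (Stock, Shares, PurchasePrice) are illustrative assumptions, since only the <Ticker> element is named in the article:

```python
import xml.dom.minidom

# A small portfolio document. Element names other than <Ticker>
# are illustrative assumptions.
PORTFOLIO = """\
<Portfolio>
   <Stock>
      <Ticker>MSFT</Ticker>
      <Shares>100</Shares>
      <PurchasePrice>85.00</PurchasePrice>
   </Stock>
   <Stock>
      <Ticker>IBM</Ticker>
      <Shares>50</Shares>
      <PurchasePrice>110.00</PurchasePrice>
   </Stock>
</Portfolio>
"""

def element_text(parent, tag):
    """Return the text content of the first child element with this tag."""
    return parent.getElementsByTagName(tag)[0].firstChild.data

def purchased_value(xml_source):
    """Walk the DOM and total shares times purchase price per stock."""
    doc = xml.dom.minidom.parseString(xml_source)
    total = 0.0
    for stock in doc.getElementsByTagName("Stock"):
        total += (float(element_text(stock, "Shares")) *
                  float(element_text(stock, "PurchasePrice")))
    return total

print(purchased_value(PORTFOLIO))  # -> 14000.0
```

The same DOM walk could just as easily pull out a single <Ticker> element and hand it to an external quote-lookup process, which is exactly the kind of intelligent traversal that screen scraping an HTML page cannot offer.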

Using XML to create robust Web systems


Next week, we’ll look at how this technology provides big benefits for companies that want to support
new browsers and devices. The ability to adapt your Web applications to new platforms dynamically will
help you begin converting your Web pages from online brochures into systems with which your
customers and trading partners can interact.


Prepare for the new Web formats: Be XML savvy


Most CIOs have been around long enough to remember the sense of wonder in using a graphical
interface for the first time. Even though the interface was slow and confined to accessing only local
applications and resources, it was a quantum leap beyond a command line.
Then the Web expanded our vision of what resources we should be able to reach with the click of a
mouse. But with the advent of cell phones and wireless networking, it’s somehow no longer good enough
to be able to access the Web only from your desktop.
Our customers and employees are beginning to expect that they will be able to access our Web
services from common mobile devices like pagers and cell phones. Personal digital assistants like the
Palm VII now allow you to stay connected to the Web wherever you go.
As the number of these Web devices grows, we need to be able to design our Web applications and
services in such a way that people can use these devices to access our sites, or we’ll be left at a
competitive disadvantage. Redesigning your systems to use a mix of structured and unstructured stores
will enable you to meet the demand for supporting these next-generation Web devices. The combination
of XML and XSL technologies will allow you to meet this important business objective. Here’s how it
works:

Using Web devices with style


To see how XML and its companion technology XSL (the eXtensible Stylesheet Language) will help us
address these problems, we first need a little Web development history refresher. When Web developers
first began posting pages, they had to develop them to conform to a specific version of HTML.
For example, HTML 2.0 browsers would not support frames. Therefore it became necessary not only to
detect the version of HTML that the browser would support, but also either to provide pages that
conformed to the least common denominator or to create separate sets of pages for each version of
HTML to be supported.
This problem has been compounded by the fact that we’ve had to deal not only with multiple versions of
HTML (2.0, 3.2, 4.0) but also with client scripting languages (JavaScript, JScript, and VBScript) and
different manufacturers’ support of HTML standards (Mosaic, Microsoft, Netscape).
Now, to make matters interesting, multiply these combinations by the number of potential new devices
that could use Web protocols in the future (watches, car radios, televisions, refrigerators, etc.). There’s
just no way that the existing pool of available developers can create the sets of pages required to support
all of these different device types. So how do we solve the problem? Enter XSL.
Think of XSL as a set of processing directives that tells a system how to render a particular XML data
set. For example, suppose you have a site that stores your stock portfolio and you need to check its value
from anywhere. Your portfolio could be represented by the following XML document:

<!-- Representative portfolio document; element names other than
     <Ticker> are illustrative -->
<Portfolio>
   <Stock>
      <Ticker>MSFT</Ticker>
      <Shares>100</Shares>
      <PurchasePrice>85.00</PurchasePrice>
   </Stock>
   <Stock>
      <Ticker>IBM</Ticker>
      <Shares>50</Shares>
      <PurchasePrice>110.00</PurchasePrice>
   </Stock>
</Portfolio>

When you’re at home or in the office where you have a powerful PC and an intelligent browser like
Internet Explorer 5.0, you’ll want the information displayed in a rich, graphical format. This would include
portfolio valuations, links to company Web sites automatically displayed, and the latest stock prices
automatically downloaded.
To accomplish this, the site would send the XML document down to your browser as well as an XSL
style sheet that describes how the page should be rendered. Since IE5 supports XML and XSL, the
browser will create and display the page using the processing power of the PC. Moreover, if you decide
you want to see the data displayed in a different format, then you can download additional XSL style
sheets without having to download the data again.
The ability to process and render XML and XSL documents locally allows the Web developer to create
rich, interactive user interfaces that can be consumed by any system that understands these W3C
standards.
But what about your cellular phone, PDA, or “Web watch?”
Well, it’s unlikely that any of these devices (at least in their first generation) will support XML and XSL
natively. So how do you take advantage of the technology? Simple: Allow the server to do the work! Web
servers designed to support XML can make site development for these new devices much easier.
Suppose you want to display information from the same stock portfolio on your cell phone. First, create
an XSL style sheet that can process the same XML file. But instead of producing the HTML that would
normally be displayed by a desktop browser, it produces files conforming to WAP (the Wireless
Application Protocol, supported by most cell phone manufacturers). Now the cell phone can display the
same portfolio information even though it has a much more limited display. Creating XSL style sheets
that support the display standards of a PDA, a “Web watch,” or a “Web refrigerator” allows the data to be
displayed on those respective devices as well.
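In the absence of a full XSLT processor, the idea can be sketched in a few lines of Python: one XML data set rendered two different ways, as an HTML table for a desktop browser and as a compact card for a WAP-style device. The element names and output markup are illustrative assumptions:

```python
import xml.etree.ElementTree as ET

# One XML data set, two renderings (element names are illustrative).
PORTFOLIO = """<Portfolio>
   <Stock><Ticker>MSFT</Ticker><Shares>100</Shares></Stock>
   <Stock><Ticker>IBM</Ticker><Shares>50</Shares></Stock>
</Portfolio>"""

def render_html(xml_source):
    """Rich rendering for a desktop browser: an HTML table."""
    root = ET.fromstring(xml_source)
    rows = "".join(
        f"<tr><td>{s.findtext('Ticker')}</td><td>{s.findtext('Shares')}</td></tr>"
        for s in root.findall("Stock"))
    return f"<table>{rows}</table>"

def render_wap(xml_source):
    """Minimal rendering for a small wireless display: one line per stock."""
    root = ET.fromstring(xml_source)
    lines = "<br/>".join(f"{s.findtext('Ticker')}: {s.findtext('Shares')}"
                         for s in root.findall("Stock"))
    return f"<wml><card><p>{lines}</p></card></wml>"
```

An XSL style sheet plays exactly the role of render_html or render_wap here: a declarative recipe for turning the one data set into whatever markup the target device understands.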


How XML changes Web development


If you ask your development staff how they currently handle the issues of supporting multiple browsers,
they will tell you one of two things:
1. “We don’t. We just support the least common denominator (HTML 3.2).”
2. “We detect the browser type and then just point the user’s browser toward a totally separate set of
HTML files.”
If they’re using the first method, then your site is not taking advantage of the rich formatting available to
most Web browsers. If they’re using the second method, then you’re paying to maintain two virtually
identical source trees with different sets of rendering commands.
If you want to see them scream and run for cover, show them your Palm VII or your cell phone and ask
to see a version of the Web site for one of those devices. Then tell them to sit down, take a deep breath,
and consider redesigning future versions of the site using the XML/XSL method.
This method basically involves separating the site data into two piles of stuff. First, a single set of well-
defined XML documents containing all of the data from the site. Second, a set of XSL templates
containing all of the formatting directions, one for each platform to be supported (HTML 3.2, DHTML,
IE5, WAP, “Web watch,” etc.).
Next, write the browser detection and scripting code necessary to figure out what device is being used
to access the site and then to either download the appropriate pairs of XML/XSL files (for IE5) or to direct
the Web server to use XSL to transform the XML into the appropriate client format. Once they’ve gone
through this exercise, future content can be placed in the XML documents and rendered on any device.
New devices can be supported by adding the appropriate XSL templates.
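The detection-and-dispatch step can be sketched as a simple ordered lookup. The user-agent fragments and template file names below are hypothetical; a production site would drive this from configuration:

```python
# Map user-agent fragments to XSL templates, checked in order.
# Fragments and template names are hypothetical examples.
TEMPLATES = [
    ("MSIE 5", "portfolio-ie5.xsl"),      # rich client: ship XML + XSL down
    ("WAP", "portfolio-wap.xsl"),         # transform to WAP markup on the server
    ("Mozilla", "portfolio-html32.xsl"),  # transform to HTML 3.2 on the server
]
DEFAULT_TEMPLATE = "portfolio-html32.xsl"  # least common denominator

def pick_template(user_agent):
    """Choose the XSL template for the device behind this user agent."""
    for fragment, template in TEMPLATES:
        if fragment in user_agent:
            return template
    return DEFAULT_TEMPLATE
```

Supporting a brand-new device then means adding one line to the table and writing one new template; the XML documents themselves never change.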

Will this get any easier?


As XML becomes a more standard method for passing data between applications and devices, more
back-end data repositories will begin consuming and emitting XML natively. For example, Microsoft’s
next release of SQL Server (SQL Server 2000) will be able to return XML as the response to a query,
which can then be passed to the appropriate XSL template and rendered for the appropriate device
automatically.
The upcoming Exchange 2000 release supports a new data store (called a “Web store”), in which all
individual elements (mail messages, contact records, appointments) are stored as XML documents and
can be displayed in the new Exchange Web client using standard XSL templates. This native support for
XML and XSL will make it even easier to develop future Web applications that support multiple client
devices.


Use XML to make your application integration go smoothly

There has been plenty of talk in the trade rags over the last year about enterprise application integration
(EAI). Basically, EAI is the process of creating standard bridges between the different systems that make
up an organization’s computing environment.
A lot of the emphasis on EAI is the result of the increasing number of mergers and acquisitions, which
require companies to keep multiple similar systems operating while developing their long-term integration
strategy.
Unfortunately, many of the resulting EAI initiatives have resulted in ill-conceived and shortsighted
strategies designed only to solve the near-term problem of application integration within a company and
not the longer-term issue of how applications between companies will be able to integrate. With an
understanding of the role that XML can play in solving these problems, you can position your company to
solve both the “within” and “between” application integration issues at the same time.

When integration problems are like rabbits


Let’s suppose you need to pass invoice data bi-directionally between two systems, where System A
stores invoice data in Format A and System B stores invoice data in Format B. To solve your data transfer
issues, you just need to take one of your entry-level COBOL or VB programmers and have him or her
write two programs: one to move data from Format A to Format B, and one to move from Format B to
Format A. Simple, right?
Well, let’s suppose you add System/Format C to the mix. Now you don’t need to write just two
programs, but six. Add System/Format D and you need 12. And this is just one set of data (invoices)
being moved around. This can quickly turn into a “rabbit farm” problem (or “Tribble” dilemma—if you’re
from a more technical background).
A more reasonable approach is to decide up front on the format and structure for the common invoice
elements of all the systems. Then develop a single in/out conversion program for each system to and
from the common format. Now imagine that you can get a large number of commercial software
developers, integration firms, and industry groups to standardize on the same formats for data transfer
even if their internal structures are different. All of a sudden, we’re not just solving our own EAI problem,
but working toward a solution in which systems between multiple companies can now exchange data
more easily.
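The arithmetic behind that "rabbit farm" is worth making explicit: with point-to-point converters, n formats need n * (n - 1) programs, while a common intermediate format needs only two per system. A quick sketch:

```python
def pairwise_converters(n):
    """Point-to-point: one program per ordered pair of distinct formats."""
    return n * (n - 1)

def hub_converters(n):
    """Common format: one program in and one out per system."""
    return 2 * n

# The counts from the examples above: 2, 3, and 4 systems.
print([pairwise_converters(n) for n in (2, 3, 4)])  # -> [2, 6, 12]
print([hub_converters(n) for n in (2, 3, 4)])       # -> [4, 6, 8]
```

Past three systems the common-format approach wins, and the gap widens from there, since one count grows linearly and the other quadratically.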

When EDI isn’t enough


Isn’t this what electronic data interchange (EDI) was designed to do? In a simplistic way, the answer is
“yes.” In the late 1970s, users and vendors pooled their requirements to create a set of national EDI
standards. They were working to develop data formats that could reduce the labor-intensive task of
exchanging data while being:
1. Hardware independent.
2. Clearly defined so all trading partners could understand them.
3. Structured in a way that allowed the originator of the transaction to control the exchange and
verify if and when the recipient processed the transaction.
Today there are two widely recognized EDI standards: X12 and the Electronic Data Interchange For
Administration, Commerce, and Transport (EDIFACT).
Although the X12 and EDIFACT standards provide a great deal of flexibility regarding how application
data is represented by EDI data, they both suffer from the same problem: They’re rigid, externally
defined data structures designed to pass data between systems, not to allow systems to interoperate.


Implementing EDI also requires the purchase and maintenance of expensive software and systems as
well as a contract with one of 20 or so value added network providers (or VANs). None of these
requirements makes EDI conducive to use in an EAI environment.
XML, on the other hand, provides a dynamic, self-describing or externally described data format. The
flexibility of defining unique data structures for each vertical industry and document type versus the
rigidity of conforming to the X12 or EDIFACT standard makes XML an ideal tool for enabling business-to-
business transactions. With XML’s ascension to the throne as the next “on the wire” data format for Web
applications, it’s natural that software developers and their customers are looking for ways to leverage
XML as a data transport between applications.

When will XML be ready for prime time?


There are a few groups in the industry working to define the document structures (also called XML
schemas) for specific vertical industries and promoting their use to software developers, industry
consultants, and companies in the affected industries. One of these is the BizTalk consortium.
BizTalk is not a standards body. It’s composed of potential users of the standards who are committed to
sharing their work in defining schemas in order to accelerate the adoption of XML-enabled, business-to-
business e-commerce and EAI. The end result of this effort is called the BizTalk Framework. The
framework defines a set of guidelines for how to publish schemas in XML and how to use XML messages
to easily integrate software programs.
It will take a couple of years for the standards defined by the BizTalk group and other competing efforts
to shake out, but during the shakeout it’s well worth your organization’s time to investigate standardizing
your internal application integration efforts on one of the industry schemas. I expect the competing groups
to provide an XML- or XSL-based migration path from their industry schemas to the mutually agreed-upon
industry schema.
In the interim, you can derive a lot of value from getting your internal systems to communicate using a
common, XML-based schema. The BizTalk site also has resources like mapping tools, sample code, and
white papers to help you begin the process of defining your business using a common schema.


How XML will resolve the COM versus CORBA debate and end world hunger

With all the wild claims being made about XML, I thought it was appropriate to make one of my own and
state that world hunger will be a thing of the past thanks to XML.
OK, so XML won’t really end world hunger, but it will ultimately resolve the debate over which object
architecture makes the most sense, COM or CORBA.
In this article, we’ll review how new multitier systems are created using these competing object
technologies and how XML will become the new interface for both.

Interfaces and plumbing


Object models like COM and CORBA have two basic sets of functionality: interfaces and plumbing.
At an interface level, they each define how an object can be queried for its available methods, and then
define the mechanism for calling those methods and returning the results. The plumbing level deals with
the internal workings of the object and its interaction with the underlying operating system that hosts it.
Building systems with either of these object models involves building application server farms that sit
between a set of user interface machines (either Web servers or PCs that can make direct calls to the
application servers) and a set of database servers (databases like Oracle and SQL Server, systems like
AS/400s, or mainframes using SNA as a transport protocol).
The user interface machines use the object model’s native query interface and method-calling services
to gain access to the services offered by the application servers. But how can we allow any Web server
or other external program to make calls to the application servers without having to know anything about
the object model’s native query interface or method-calling services? If we can do that, we can leverage
all the work we’ve done developing our internal systems and allow external systems to use our
application servers without resorting to custom programs or costly custom interfaces.

Web sites as big objects


This isn’t as far-fetched as it sounds. I’ve helped companies use Web technology to hook together their
current systems. For example, a catalog company collects their orders on a Web site, but then uses
HTTP to call a CGI script on their AS/400 to actually place the order.
The CGI script on the AS/400 simply calls the existing COBOL program that places orders and passes
it the query string information to place the order. Unfortunately, these system interfaces are hard coded
and require intimate knowledge of both systems in order to make them work together properly. But by
using XML to define the interfaces and any custom object types, we can make the process more
universal and accessible.
Let’s imagine a large, international (and, of course, totally fictitious) bookseller Web site called Piranha.
You order books from Piranha.com by navigating through their Web site, selecting books, putting them in
the shopping cart, and then pressing the “Buy” button. But Piranha wants to enable other Web designers,
corporate developers, and B2B partners to find and purchase books programmatically. In other words,
they want to expose their site’s object model. How could this work?
First, they place a certificate-protected file called Piranha.xml on their root Web site. This XML file
defines the methods and properties exposed by the Piranha Web site. These methods and custom types
could include the following:


<!-- Representative sketch of Piranha.xml; element and type names are
     illustrative, based on the methods and structures described in the
     text -->
<PiranhaInterface>
   <Method name="ListBooksByAuthor">
      <Parameter name="AuthorName" type="string"/>
      <Returns type="BooksReturned"/>
   </Method>
   <Method name="PlaceOrder">
      <Parameter name="Cart" type="ShoppingCart"/>
      <Returns type="PurchaseConfirmation"/>
   </Method>
   <Type name="BooksReturned">
      <Book>
         <Title/>
         <Author/>
         <ISBN/>
         <Price/>
      </Book>
   </Type>
   <Type name="ShoppingCart">
      <eMailAddress/>
      <Item>
         <ISBN/>
         <Quantity/>
      </Item>
   </Type>
   <Type name="PurchaseConfirmation">
      <OrderNumber/>
      <OrderTotal/>
   </Type>
</PiranhaInterface>

Making it all work together


So how does this work? The calling program first issues a GET request to:
http://www.piranha.com/piranha.xml
If the appropriate security challenge (certificate, Kerberos password, etc.) is met, then this XML schema
is returned. Next, using the local XML parser, the program determines what methods can be called on the
Piranha site and what parameters need to be passed. To place the order, the calling system first issues
the ListBooksByAuthor command, passing the appropriate parameters. Let’s suppose you want to see all
books by Tom Peters. The request URL would look like this:
http://www.piranha.com/?method=ListBooksByAuthor&AuthorName=Peters
Now the back-end Piranha systems can call their ListBooksByAuthor method, pass in the name
“Peters,” create a text stream that represents all of the books by authors with “Peters” in their name, and
format the text stream into an XML structure that matches that of the BooksReturned type defined in the
Piranha.xml file. The calling program can retrieve the XML structure from the command line (or some
other method like cookies, streams, etc.) and then use the XML parser to decompose the text stream into
the elements defined by the BooksReturned type.
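The calling program's side of this exchange can be sketched in Python. Since Piranha is fictitious, the URL, parameter names, and response elements below are illustrative assumptions; the query string uses standard name=value&name=value syntax:

```python
import urllib.parse
import xml.etree.ElementTree as ET

def build_request(base_url, method, **params):
    """Compose the method-call URL described in the text."""
    query = urllib.parse.urlencode({"method": method, **params})
    return f"{base_url}?{query}"

# A BooksReturned response as it might look on the wire; element names
# are illustrative, since Piranha and its schema are fictitious.
RESPONSE = """<BooksReturned>
   <Book><Title>In Search of Excellence</Title><Author>Tom Peters</Author></Book>
   <Book><Title>Thriving on Chaos</Title><Author>Tom Peters</Author></Book>
</BooksReturned>"""

def titles(xml_source):
    """Decompose the response stream into the list of book titles."""
    root = ET.fromstring(xml_source)
    return [book.findtext("Title") for book in root.findall("Book")]

url = build_request("http://www.piranha.com/", "ListBooksByAuthor",
                    AuthorName="Peters")
print(url)
print(titles(RESPONSE))  # -> ['In Search of Excellence', 'Thriving on Chaos']
```

The point is that nothing on either side of the wire needs to know whether the other end is COM, CORBA, or a COBOL program on an AS/400: both sides speak only HTTP and XML.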
Now the list of books can be presented locally based on the calling programmer’s taste or the
information can be used to programmatically search the BooksReturned structure and select books to
order. Finally, the calling system creates a ShoppingCart structure and passes an XML stream back to
the Piranha Web site that conforms to the ShoppingCart and eMailAddress syntax rules and the Piranha
Web site returns a PurchaseConfirmation structure that can be consumed by the calling system.
So what just happened? A programmer for a company that needed access to the Piranha internal
functions was able to query the Web site for its exposed interfaces and then use the methods defined by
the interfaces to effect a transaction between the two systems. It didn’t matter whether the systems on
either side were based on COM, CORBA, or even AS/400s. The glue between the systems was HTTP
and XML.

So what’s the right answer in the COM versus CORBA debate?


The answer is: It doesn’t matter. Back-end systems, or NetServices, should be built on the object
architecture that makes the most sense for your company. This should be based on several factors,
including your existing or planned infrastructure, the skills of your current development or engineering
staff, and the demands of your user community. What’s important is that you expose the services you
build using XML schemas to define their interfaces and the custom structures the services may use. By
doing this, you’ll not only open up the objects you create to the other systems in your business, but you
will also be creating a framework from which future systems will be built with integration in mind from the
beginning.

This document is provided for informational purposes only and TechRepublic makes no warranties, either expressed or
implied, in this document. Information in this document is subject to change without notice. The entire risk of the use or
the results of the use of this document remains with the user. The example companies, organizations, products, people
and events depicted herein are fictitious. No association with any real company, organization, product, person or event is
intended or should be inferred. Complying with all applicable copyright laws is the responsibility of the user. Without
limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval
system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for
any purpose, without the express written permission of TechRepublic.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

