
Document Type Definition

DTDs
What is a DTD

• It defines the rules and attributes for using tags in an XML document.
• It defines the structure of an XML document.
• Only the elements defined in a DTD can be used in a valid XML document.
• A DTD can be internal or external.
Defining DTD for a Single Element

An element declaration tells the parser that the XML document contains elements of the type defined in the DTD. In this example the element holds child elements and is not designed to contain text.
<?xml version='1.0' encoding='UTF-8'?>
<!--
DTD for a list of toys
-->
<!ELEMENT PRODUCTDATA (PRODUCT+)>
The PRODUCTDATA element contains one or more elements of type <PRODUCT>.
Occurrence indicators:
? - Optional (zero or one)
* - Zero or more
+ - One or more
An internal DTD
<?xml version="1.0"?>

<!DOCTYPE invoice [
<!ELEMENT invoice (sku, qty, desc, price) >
<!ELEMENT sku (#PCDATA) >
<!ELEMENT qty (#PCDATA) >
<!ELEMENT desc (#PCDATA) >
<!ELEMENT price (#PCDATA) >
]>

<invoice>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>
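
As a rough illustration of what the content model (sku, qty, desc, price) demands, the sketch below reads the invoice with Python's standard xml.etree.ElementTree module. That parser accepts the DOCTYPE but does not validate against it, so the helper matches_invoice_model is a hypothetical hand-written check, not a real DTD validator.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """<?xml version="1.0"?>
<!DOCTYPE invoice [
<!ELEMENT invoice (sku, qty, desc, price)>
<!ELEMENT sku (#PCDATA)>
<!ELEMENT qty (#PCDATA)>
<!ELEMENT desc (#PCDATA)>
<!ELEMENT price (#PCDATA)>
]>
<invoice>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>"""

# ElementTree reads past the internal DTD but does NOT validate against it.
root = ET.fromstring(INVOICE_XML)

# Hand-written stand-in for the content model (sku, qty, desc, price).
EXPECTED_CHILDREN = ["sku", "qty", "desc", "price"]


def matches_invoice_model(element):
    """Return True if the children appear exactly in the declared order."""
    return [child.tag for child in element] == EXPECTED_CHILDREN


print(root.tag, matches_invoice_model(root))
```

A validating parser would perform this kind of check for every declared element automatically.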
A referenced external DTD

<?xml version="1.0"?>

<!DOCTYPE invoice SYSTEM "invoice.dtd">

<invoice>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>
An external DTD (invoice.dtd)

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT invoice (sku, qty, desc, price) >
<!ELEMENT sku (#PCDATA) >
<!ELEMENT qty (#PCDATA) >
<!ELEMENT desc (#PCDATA) >
<!ELEMENT price (#PCDATA) >
Attributes

• Attributes provide extra information about elements
• They are placed inside the start tag of an element
• Attributes come in name/value pairs
• E.g. <img src="computer.gif" />
• The attribute is "src".
• The value of the attribute is "computer.gif".
• Since the element itself is empty, it is closed by a "/"
Attributes in DTD
When working with a validating parser, the DTD must declare the valid attributes for the different elements.
The basic syntax of a DTD attribute declaration is
<!ATTLIST element-name attribute-name attribute-type attribute-value>
Example :
<?xml version = "1.0"?>
<!DOCTYPE address [
<!ELEMENT address ( name )>
<!ELEMENT name ( #PCDATA )>
<!ATTLIST name id CDATA #REQUIRED> ]>
<address>
<name id = "123">Tanmay Patil</name>
</address>
PCDATA

• PCDATA means parsed character data


• character data - the text found between the start tag and the end tag
of an XML element.
• PCDATA is text that will be parsed by a parser
• Tags inside the text will be treated as markup and entities will be
expanded.
CDATA

• CDATA also means character data


• CDATA is text that will NOT be parsed by a parser
• Tags inside the text will NOT be treated as markup and entities will
not be expanded.
Entities
• Entities are variables used to define common text
• Entity references are references to entities
• Example: the HTML entity reference "&nbsp;"
• Entities are expanded when a document is parsed by an XML parser.

The following entities are predefined in XML:
&lt;    <
&gt;    >
&amp;   &
&quot;  "
&apos;  '
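
To see the expansion of the predefined entities in practice, here is a small sketch using Python's standard library. The element name note and the sample text are made up for the illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The parser expands the five predefined entity references while reading.
element = ET.fromstring(
    "<note>&lt;b&gt; Tom &amp; Jerry &quot;quoted&quot; &apos;x&apos;</note>"
)
expanded = element.text

# Going the other way: escape() replaces &, < and > with entity references.
escaped = escape("a < b & c")

print(expanded)
print(escaped)
```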
Declaring an Element

• Elements are declared with an element declaration:
• <!ELEMENT element-name (element-content)>
• Empty elements
• <!ELEMENT img EMPTY>
• Elements with data
• <!ELEMENT element-name (#PCDATA)>
• or <!ELEMENT element-name ANY>
• Example: <!ELEMENT note (#PCDATA)>
Elements with children (sequences)

<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
Wrapping

<?xml version="1.0"?>
<!DOCTYPE note [
<!ELEMENT note (to,from,heading,body)>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT heading (#PCDATA)>
<!ELEMENT body (#PCDATA)>
]>

<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend</body>
</note>
TV Schedule DTD

<!DOCTYPE TVSCHEDULE [
<!ELEMENT TVSCHEDULE (CHANNEL+)>
<!ELEMENT CHANNEL (BANNER, DAY+)>
<!ELEMENT BANNER (#PCDATA)>
<!ELEMENT DAY ((DATE, HOLIDAY) | (DATE, PROGRAMSLOT+))+>
<!ELEMENT HOLIDAY (#PCDATA)>
<!ELEMENT DATE (#PCDATA)>
<!ELEMENT PROGRAMSLOT (TIME, TITLE, DESCRIPTION?)>
<!ELEMENT TIME (#PCDATA)>
<!ELEMENT TITLE (#PCDATA)>
<!ELEMENT DESCRIPTION (#PCDATA)>
<!ATTLIST TVSCHEDULE NAME CDATA #REQUIRED>
<!ATTLIST CHANNEL CHAN CDATA #REQUIRED>
<!ATTLIST PROGRAMSLOT VTR CDATA #IMPLIED>
<!ATTLIST TITLE RATING CDATA #IMPLIED>
<!ATTLIST TITLE LANGUAGE CDATA #IMPLIED>
]>
XML Schema

What is an XML Schema?


An XML Schema describes the structure of an
XML document.
The XML Schema language is also referred to as
XML Schema Definition (XSD).
XML Schemas

• http://www.w3.org/TR/xmlschema-1/ (10/2000)
• generalizes DTDs
• uses XML syntax
• two documents: structure and datatypes
• http://www.w3.org/TR/xmlschema-1
• http://www.w3.org/TR/xmlschema-2
• XML-Schema is very complex
• often criticized
• some alternative proposals
Example

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
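
Because a schema is itself an XML document, any XML parser can read it. The sketch below uses Python's standard xml.etree.ElementTree module only to list the element names the schema declares; it does not validate anything, and the variable names are chosen for this illustration.

```python
import xml.etree.ElementTree as ET

SCHEMA_XML = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="note">
<xs:complexType>
<xs:sequence>
<xs:element name="to" type="xs:string"/>
<xs:element name="from" type="xs:string"/>
<xs:element name="heading" type="xs:string"/>
<xs:element name="body" type="xs:string"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>"""

# ElementTree spells namespaced tags as "{namespace-uri}localname".
XS = "{http://www.w3.org/2001/XMLSchema}"

schema = ET.fromstring(SCHEMA_XML)

# Walk the schema in document order and collect every declared element name.
declared_names = [el.get("name") for el in schema.iter(XS + "element")]

print(declared_names)
```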
Purpose of an XML Schema

• The purpose of an XML Schema is to define the legal building blocks of an XML document:
• the elements and attributes that can appear in a
document
• the number of (and order of) child elements
• data types for elements and attributes
• default and fixed values for elements and attributes
Why XML Schema?

• Hundreds of standardized XML formats are in daily use.
• Many of these XML standards are defined by XML
Schemas.
• XML Schema is an XML-based (and more powerful)
alternative to DTD.
XML Schemas Support Data Types

• One of the greatest strengths of XML Schemas is the support for data types. With data types:
• It is easier to describe allowable document content
• It is easier to validate the correctness of data
• It is easier to define data facets (restrictions on data)
• It is easier to define data patterns (data formats)
• It is easier to convert data between different data types
XML Schemas Secure Data Communication

When sending data from a sender to a receiver, it is essential that both parties have the same "expectations" about the content.
With XML Schemas, the sender can describe the data in a way that the receiver will understand.
A date like "03-11-2004" will, in some countries, be interpreted as 3 November and in other countries as 11 March.
However, an XML element with a data type like this:
<date type="date">2004-03-11</date>
ensures a mutual understanding of the content, because the XML data type "date" requires the format "YYYY-MM-DD".
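
The ambiguity is easy to reproduce. This sketch uses Python's standard datetime module, with variable names chosen for the illustration, to read the same string under both conventions and compare it with the fixed "YYYY-MM-DD" form.

```python
from datetime import date, datetime

ambiguous = "03-11-2004"

# The same text read under two national conventions gives two different dates.
as_day_first = datetime.strptime(ambiguous, "%d-%m-%Y").date()
as_month_first = datetime.strptime(ambiguous, "%m-%d-%Y").date()

# The YYYY-MM-DD form has only one reading.
unambiguous = date.fromisoformat("2004-03-11")

print(as_day_first, as_month_first, unambiguous)
```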
XML Schemas

<xsd:element name="paper" type="papertype"/>

<xsd:complexType name="papertype">
<xsd:sequence>
<xsd:element name="title" type="xsd:string"/>
<xsd:element name="author" minOccurs="0" maxOccurs="unbounded"/>
<xsd:element name="year"/>
<xsd:choice>
<xsd:element name="journal"/>
<xsd:element name="conference"/>
</xsd:choice>
</xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT paper (title,author*,year, (journal|conference))>


Elements vs. Types in XML Schema

Element with an inline type:
<xsd:element name="person">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="address" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>

Element referring to a named type:
<xsd:element name="person" type="ttt"/>
<xsd:complexType name="ttt">
<xsd:sequence>
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="address" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>

DTD: <!ELEMENT person (name,address)>


Elements vs. Types in XML Schema
• Types:
• Simple types (integers, strings, ...)
• Complex types (regular expressions, like in DTDs)
• Element-type-element alternation:
• Root element has a complex type
• That type is a regular expression of elements
• Those elements have their complex types...
• ...
• On the leaves we have simple types
Local and Global Types in XML Schema
• Local type:
<xsd:element name="person">
[define locally the person’s type]
</xsd:element>
• Global type:
<xsd:element name="person" type="ttt"/>

<xsd:complexType name="ttt">
[define here the type ttt]
</xsd:complexType>
Global types: can be reused in other elements
Local vs. Global Elements in XML Schema
• Local element:
<xsd:complexType name="ttt">
<xsd:sequence>
<xsd:element name="address" type="..."/>...
</xsd:sequence>
</xsd:complexType>
• Global element:
<xsd:element name="address" type="..."/>

<xsd:complexType name="ttt">
<xsd:sequence>
<xsd:element ref="address"/> ...
</xsd:sequence>
</xsd:complexType>

Global elements: like in DTDs


Document Object Model (DOM)

Introduction
• The DOM presents an XML document as a tree structure, with the root node as the parent element and the elements, attributes and text defined as child nodes.

• The DOM defines a standard for accessing and manipulating documents.

• XML is often used to represent data that is not traditionally thought of as a document. XML nevertheless presents this data as documents, and the DOM may be used to manage it.
DOM

With the Document Object Model, programmers can create and build
documents, navigate their structure, and add, modify, or delete
elements and content.

Anything found in an HTML or XML document can be accessed, changed, deleted, or added using the Document Object Model, with a few exceptions - in particular, the DOM interfaces for the internal subset and external subset have not yet been specified.
DOM Tree Structure of BookStore.xml
Example XML Program
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<author>Giada De Laurentiis</author>
<year>2005</year>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
<book category="web">
<title lang="en">Learning XML</title>
<author>Erik T. Ray</author>
<year>2003</year>
<price>39.95</price>
</book>
</bookstore>
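
The operations listed earlier (navigate, modify, add, delete) can be sketched with xml.dom.minidom, the DOM implementation in Python's standard library. The document below is a shortened copy of the bookstore example, and the variable names are chosen for this illustration.

```python
from xml.dom import minidom

BOOKSTORE_XML = """<bookstore>
<book category="cooking">
<title lang="en">Everyday Italian</title>
<price>30.00</price>
</book>
<book category="children">
<title lang="en">Harry Potter</title>
<price>29.99</price>
</book>
</bookstore>"""

doc = minidom.parseString(BOOKSTORE_XML)

# Navigate: collect the text of every <title> element.
titles = [node.firstChild.data for node in doc.getElementsByTagName("title")]

# Modify: change the text node inside the first <price>.
first_price = doc.getElementsByTagName("price")[0]
first_price.firstChild.data = "25.00"

# Add: build a new <book> subtree and attach it to the root element.
new_book = doc.createElement("book")
new_book.setAttribute("category", "web")
new_title = doc.createElement("title")
new_title.appendChild(doc.createTextNode("Learning XML"))
new_book.appendChild(new_title)
doc.documentElement.appendChild(new_book)

# Delete: remove the second <book> (category "children").
children_book = doc.getElementsByTagName("book")[1]
doc.documentElement.removeChild(children_book)

remaining = [b.getAttribute("category") for b in doc.getElementsByTagName("book")]
print(titles, remaining)
```

The whole tree is held in memory, which is what makes this kind of random access and editing possible.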
SAX (Simple API for XML)

• There are two main models for reading an XML document: the event-based model and the tree-based model. The Simple API for XML (SAX) is a serial-access parser API for reading XML documents.

• The event-based model for reading XML works as follows. There are two interacting actors in the process: the parser, a program that reads the XML document, and the client application, which invokes the parser and waits for the information collected by the parser.

• A SAX parser is an event-driven parser.


List of SAX Events

• XML Text Nodes

• XML Element Nodes

• XML Processing Instructions

• XML Comments
These events are fired at the start and end of each XML node, processing instruction or comment as the parser encounters it.
Example

<?xml version="1.0" encoding="UTF-8"?>
<PRODUCTDATA>
<PRODUCT PRODUCTID="P001">
<PRODUCTNAME>WEBTECHNOLOGY</PRODUCTNAME>
<QTY>200</QTY>
<PRICE>500.00</PRICE>
</PRODUCT>
</PRODUCTDATA>
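
To make the event flow concrete, the sketch below feeds this document to Python's standard xml.sax module. EventRecorder is a hypothetical handler written for this illustration; it simply records each callback the parser makes.

```python
import xml.sax

PRODUCT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<PRODUCTDATA>
<PRODUCT PRODUCTID="P001">
<PRODUCTNAME>WEBTECHNOLOGY</PRODUCTNAME>
<QTY>200</QTY>
<PRICE>500.00</PRICE>
</PRODUCT>
</PRODUCTDATA>"""


class EventRecorder(xml.sax.ContentHandler):
    """Client application: collects the events reported by the parser."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        attributes = {key: attrs.getValue(key) for key in attrs.getNames()}
        self.events.append(("start", name, attributes))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        # Skip the whitespace-only text between elements.
        if content.strip():
            self.events.append(("text", content.strip()))


recorder = EventRecorder()
xml.sax.parseString(PRODUCT_XML.encode("utf-8"), recorder)

for event in recorder.events:
    print(event)
```

Unlike a DOM parser, nothing is kept in memory beyond what the handler chooses to store.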
XSLT
Extensible Stylesheet Language Transformations
Introduction
• Used for transforming the structure and content of an XML document into the required output.
• Used to transform XML documents into other XML documents.
• XSLT processors parse the input XML document as well as the XSLT style sheet, and then process the instructions found in the style sheet, using elements from the input XML document.
• The main purpose of XSLT is to convert XML data into a human-readable format. XSLT is used to display XML data in other formats such as HTML, or PDF (via XSL-FO).
XSLT – Style sheet

• A style sheet is also an XML document.

• An XSL style sheet contains many XSLT elements and XSLT functions.

• Its root element is either <xsl:stylesheet> or <xsl:transform>.

• The most important element is the template element.


Commonly used elements

• <xsl:template> - defines a template for producing output.
• <xsl:apply-templates> - defines a set of nodes to be processed.
• <xsl:import> - imports the contents of one stylesheet into another.
• <xsl:call-template> - calls a named template.
• <xsl:apply-imports> - applies a template rule from an imported stylesheet.
• <xsl:stylesheet> or <xsl:transform> - the root element of a stylesheet.
• <xsl:include> - includes the contents of one stylesheet in another.
• <xsl:element> - creates an element node.
• <xsl:attribute> - creates an attribute node and adds it to an element.
• <xsl:attribute-set> - defines a named set of attribute names and values.
Example
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My CD Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
<th>Artist</th>
</tr>
<xsl:for-each select="catalog/cd">
<tr>
<td><xsl:value-of select="title"/></td>
<td><xsl:value-of select="artist"/></td>
</tr>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
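
Python's standard library does not include an XSLT processor, so the sketch below is not XSLT. It reproduces by hand, with xml.etree.ElementTree, what the xsl:for-each over catalog/cd does: one table row per cd, with the title and artist in separate cells. The catalog document and its two entries are sample data made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document with the catalog/cd/title/artist shape
# that the style sheet's select expressions expect.
CATALOG_XML = """<catalog>
<cd><title>Empire Burlesque</title><artist>Bob Dylan</artist></cd>
<cd><title>Hide your heart</title><artist>Bonnie Tyler</artist></cd>
</catalog>"""

catalog = ET.fromstring(CATALOG_XML)

# Literal result elements of the template: the table and its header row.
table = ET.Element("table", border="1")
header = ET.SubElement(table, "tr", bgcolor="#9acd32")
ET.SubElement(header, "th").text = "Title"
ET.SubElement(header, "th").text = "Artist"

# Counterpart of <xsl:for-each select="catalog/cd">.
for cd in catalog.findall("cd"):
    row = ET.SubElement(table, "tr")
    # Counterparts of <xsl:value-of select="title"/> and select="artist".
    ET.SubElement(row, "td").text = cd.findtext("title")
    ET.SubElement(row, "td").text = cd.findtext("artist")

html_table = ET.tostring(table, encoding="unicode")
print(html_table)
```

An XSLT processor would produce the same rows from the style sheet alone, with no hand-written loop.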
XSLT built-in Functions

There are over 100 built-in functions in XSLT.

• current() - returns the current node
• document() - accesses an external XML document
• element-available() - checks whether the processor supports a particular element
• format-number() - formats the specified number
• key() - finds the nodes with a given value for a named key
JSON
Overview and parsing
What is JSON

• JavaScript Object Notation
• Used to format data
• Commonly used on the Web as a vehicle to describe data being sent between systems
JSON example

• “JSON” stands for “JavaScript Object Notation”


• Despite the name, JSON is a (mostly) language-independent way of
specifying objects as name-value pairs
• Example (http://secretgeek.net/json_3mins.asp):
{"skillz": {
"web":[
{ "name": "html",
"years": 5
},
{ "name": "css",
"years": 3
}],
"database":[
{ "name": "sql",
"years": 7
}]
}}
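
Since JSON is language-independent, the same text can be read outside JavaScript. This sketch uses Python's standard json module on the example above, assuming a comma between the "web" and "database" members as JSON requires; the variable names are chosen for the illustration.

```python
import json

SKILLZ_JSON = """{"skillz": {
  "web": [
    {"name": "html", "years": 5},
    {"name": "css", "years": 3}
  ],
  "database": [
    {"name": "sql", "years": 7}
  ]
}}"""

# Objects become dicts and arrays become lists.
data = json.loads(SKILLZ_JSON)

web_names = [skill["name"] for skill in data["skillz"]["web"]]
total_years = sum(
    skill["years"] for group in data["skillz"].values() for skill in group
)

print(web_names, total_years)
```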
JSON syntax

• An object is an unordered set of name/value pairs


• The pairs are enclosed within braces, { }
• There is a colon between the name and the value
• Pairs are separated by commas
• Example: { "name": "html", "years": 5 }
• An array is an ordered collection of values
• The values are enclosed within brackets, [ ]
• Values are separated by commas
• Example: [ "html", "xml", "css" ]
JSON syntax

• A value can be: a string, a number, true, false, null, an object, or an array
• Values can be nested
• Strings are enclosed in double quotes, and can contain the
usual assortment of escaped characters
• Numbers have the usual C/C++/Java syntax, including
exponential (E) notation
• All numbers are decimal--no octal or hexadecimal
• Whitespace can be used between any pair of tokens
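
These rules can be checked against a real parser. The sketch below uses Python's standard json module; is_valid_json is a hypothetical helper written for this illustration.

```python
import json

# One array holding every kind of JSON value, including nesting.
values = json.loads('["text", 1.5e3, -7, true, false, null, {"nested": [1, 2]}]')


def is_valid_json(text):
    """Return True if the text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(values)
print(is_valid_json("0x1F"))        # hexadecimal is not allowed
print(is_valid_json("017"))         # neither is octal (leading zero)
print(is_valid_json("'single'"))    # strings need double quotes
print(is_valid_json(' { "a" : 1 } '))  # whitespace between tokens is fine
```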
How to turn JSON into a JavaScript object: eval()
• The JavaScript eval(string) method compiles and executes the given string
• The string can be an expression, a statement, or a sequence of statements
• Expressions can include variables and object properties
• eval returns the value of the last expression evaluated
• When applied to JSON text wrapped in parentheses, eval returns the described object
• Because eval executes any code in the string, JSON.parse is the safer choice for data from an untrusted source
JSON and—methods?

• In addition to instance variables, objects typically have methods
• There is nothing in the JSON specification about
methods
• However, a method can be represented as a string,
and (when received by the client) evaluated with
eval
• Obviously, this breaks language-independence
• Also, JavaScript is rarely used on the server side
Comparison of JSON and XML

• Similarities:
• Both are human readable
• Both have very simple syntax
• Both are hierarchical
• Both are language independent
• Both can be used by Ajax
• Both supported in APIs of many programming languages
• Differences:
• Syntax is different
• JSON is less verbose
• JSON can be parsed by JavaScript’s eval method
• JSON includes arrays
• Names in JSON must not be JavaScript reserved words
• XML can be validated
Content Model

• Identify the name of the element and the nature of that element’s content
• The example declares an element that then describes the document’s content model

<!ELEMENT note (to, from, subject, body)>

In this element definition, "note" is the name and "(to, from, subject, body)" is the content model.
Declaring Attributes

• attributes are declared with an ATTLIST declaration


• <!ATTLIST element-name attribute-name attribute-type default-value>
• DTD example:
• <!ELEMENT square EMPTY>
• <!ATTLIST square width CDATA "0">

• XML example:
• <square width="100"></square>
Document Type Declarations

• There are four types of declarations:


• Element type declarations
• http://www.w3.org/TR/REC-xml#elemdecls
• Attribute List Declarations
• http://www.w3.org/TR/REC-xml#attdecls
• Entity declarations
• http://www.w3.org/TR/REC-xml#sec-entity-decl
• Notation declarations
• http://www.w3.org/TR/REC-xml#Notations
Element Type Declarations

• Three types of elements


• EMPTY elements
• ANY elements
• MIXED elements
Empty Elements

• An element that can not contain any content


• The html image tag in xml would typically be empty, such as
<image></image> or <image/>
• empty elements are more useful with the use of attributes

<!ELEMENT test EMPTY>


<!ELEMENT image EMPTY>
<!ELEMENT br EMPTY>
ANY Element

• An element that can contain any content


• it is recommended not to get into the habit declaring
elements with the ANY keyword
• useful when transferring a lot of mixed or unknown data

<!ELEMENT test ANY >


Mixed Element

• Elements that can contain a set of content alternatives


• Separate the options with the “or” symbol “|”

<!ELEMENT test (#PCDATA | name)*>


Data Types

• Parsed Character Data


• #PCDATA
• <!ELEMENT firstname (#PCDATA)>
• <!ELEMENT lastname (#PCDATA)>

• Unparsed Character Data


• CDATA
• <firstname><![CDATA[<b>Jim</b>]]></firstname>
• <lastname><![CDATA[<b>Peters</b>]]></lastname>
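
The difference is visible as soon as the text is parsed. This sketch uses Python's standard xml.etree.ElementTree module to read the same characters once inside a CDATA section and once as ordinary parsed content.

```python
import xml.etree.ElementTree as ET

# Inside a CDATA section the <b> tags are plain characters.
with_cdata = ET.fromstring("<firstname><![CDATA[<b>Jim</b>]]></firstname>")

# Without CDATA the same characters are parsed as a child element.
without_cdata = ET.fromstring("<firstname><b>Jim</b></firstname>")

print(repr(with_cdata.text), len(with_cdata))
print(repr(without_cdata.text), without_cdata[0].tag, without_cdata[0].text)
```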
Structure Symbols

• Parenthesis (samp1, samp2) - The element must contain the sequence samp1 and samp2

• Comma (samp1,samp2,samp3) - The element must contain samp1,samp2 and samp3 in that order

• Or (samp1|samp2|samp3) - The element can contain samp1, samp2 or samp3

• ? samp1? - Element might contain samp1, if it does it can only do it once

• * samp1* - Element can contain samp1 zero or more times

• + samp1+ - Element must contain samp1 at least once

• none samp1 - Element must contain samp1


Elements with more structure

<!ELEMENT email (to+ , from , subject? , body)>

to: is required and can appear more than once
from: must appear exactly once
subject: optional, but if included can only appear once
body: must appear exactly once
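
A content model behaves like a regular expression over the sequence of child element names. The sketch below makes that literal by translating (to+, from, subject?, body) into a Python regular expression. It is a hand-written stand-in for a validating parser, and EMAIL_MODEL and matches_email_model are names invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# (to+, from, subject?, body) written as a regex over "name " tokens.
EMAIL_MODEL = re.compile(r"(to )+from (subject )?body ")


def matches_email_model(xml_text):
    """Check the children of the root element against the content model."""
    root = ET.fromstring(xml_text)
    child_sequence = "".join(child.tag + " " for child in root)
    return EMAIL_MODEL.fullmatch(child_sequence) is not None


print(matches_email_model("<email><to/><to/><from/><body/></email>"))
print(matches_email_model("<email><from/><body/></email>"))
print(matches_email_model("<email><to/><from/><subject/><subject/><body/></email>"))
```

The first document satisfies the model, the second lacks the required to element, and the third repeats subject.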
XML Element Attributes

• XML tags can contain attributes, similar to attributes in HTML tags

HTML Examples:
<h1 align="center">An XML Example</h1>
<table width=page> </table>

• Attributes are usually used to provide processing information to the XML application (the application that is going to consume the XML)
Attribute Rules

• attribute values must be placed in quotes (" " or ' ')
• in HTML this is only required if the attribute value contains a space character
• attribute values are not processed by the XML parser
• this means the values can’t be automatically checked by the parser
Attributes or Elements?

• Is it better to use attributes or to just make additional XML elements?
• there are no set rules when to use one over the other
• experience is best teacher
• but to help you decide:
• attribute values are not parsed
• can contain special characters that aren’t allowed in elements
• drawback - they cannot be validated by the parser
• must be validated by additional code in the application
An Example
With the date as child elements (this can be validated):
<?xml version="1.0" ?>
<invoice>
<date>
<month>12</month>
<day>22</day>
<year>2002</year>
</date>
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>

With the date as an attribute (this can’t):
<?xml version="1.0" ?>
<invoice date="7/22/2002">
<sku>12345</sku>
<qty>55</qty>
<desc>Left handed monkey wrench</desc>
<price>14.95</price>
</invoice>


Attribute Declarations

Employee Element Declaration:

<?xml version="1.0" encoding="UTF-8"?>
<!ELEMENT employee (#PCDATA)>

<!ATTLIST ElementName AttributeName Type Default >

<!ATTLIST employee type (FullTime | PartTime) "FullTime" >

Usage in XML file:

<?xml version="1.0" ?>
<employee type="PartTime"/>
Other Attribute Declarations
• CDATA
• CDATA attributes are strings; any text is allowed.
• ID
• The value of an ID attribute must be a name. All of the ID values used in a document must be unique. IDs uniquely identify individual elements in a document. An element can only have a single ID attribute.
• IDREF or IDREFS
• An IDREF attribute's value must be the value of a single ID attribute on some element in the document. The value of an IDREFS attribute may contain multiple IDREF values separated by white space.
• ENTITY or ENTITIES
• An ENTITY attribute's value must be the name of a single entity. The value of an ENTITIES attribute may contain multiple entity names separated by white space.
• NMTOKEN or NMTOKENS
• Name token attributes are a restricted form of string attribute. An NMTOKEN value must consist of a single word, but there are no other restrictions on the word. An NMTOKENS value may contain multiple words separated by white space.
• List of Names (Enumerated)
• You can specify that the value of an attribute must be taken from a specific list of names. This is frequently called an enumerated type because each of the possible values must be explicitly enumerated in the declaration.
Attribute Defaults

• #REQUIRED
• The attribute must have an explicitly specified value for every occurrence of the element in the document.
• #IMPLIED
• The attribute value is not required and no default value is provided. If a value is not specified, the XML processor must proceed without one.
• "value"
• An attribute can be given any legal value as a default. The attribute value is not required on each element of the document, and if it is not present it will appear to be the specified default.
• #FIXED "value"
• An attribute declaration may specify that an attribute has a fixed value. In this case, the attribute is not required, but if it occurs, it must have the specified value. If it is not present, it will appear to be the specified default.
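
The "will appear to be the specified default" behaviour can be observed with a parser that reads the internal DTD subset. This sketch assumes Python's standard xml.etree.ElementTree module, whose underlying expat parser reports attributes defaulted in the internal subset unless told otherwise. The small email documents are made up for the illustration.

```python
import xml.etree.ElementTree as ET

DTD = """<!DOCTYPE email [
<!ELEMENT email (#PCDATA)>
<!ATTLIST email language (english | french | spanish) "english">
]>
"""

# No language attribute in the document: the declared default is supplied.
defaulted = ET.fromstring(DTD + "<email>Hello</email>")

# An explicit value in the document takes precedence over the default.
explicit = ET.fromstring(DTD + '<email language="spanish">Hola</email>')

print(defaulted.get("language"), explicit.get("language"))
```

Defaults declared in an external DTD are a different matter, since a non-validating parser is not obliged to read it.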
A Code sample

<?xml version="1.0" ?>
<!DOCTYPE email [
<!ELEMENT email (to, from, subject, message)>
<!ATTLIST email
language (english | french | spanish) "english"
priority (normal | high | low) "normal" >
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA) >
<!ELEMENT subject (#PCDATA) >
<!ELEMENT message (#PCDATA) >
]>
<email language="spanish" priority="high">
<to>Peter Brenner</to>
<from>Dick Steflik</from>
<subject>Test Reminder</subject>
<message>The exam is a week from today</message>
</email>
Attribute Summary

• Attributes
• cannot contain multiple values
• cannot be validated
• cannot describe structures like child elements can
• It is recommended to use attributes sparingly
• The following code would not be good form:

<?xml version="1.0" ?>
<email language="english" priority="high"
to="you" from="me" subject="Reminder"
message="The test is a week from today !" />
