
XML Parsing with SAX

Prof. Dr. Ch. Reich
rch@fh-furtwangen.de
http://www.informatik.fh-furtwangen.de/~reich
What is SAX?
● SAX (Simple API for XML) is not a formal standard, but a
de facto one; www.saxproject.org, www.megginson.com/SAX
● It is event oriented:
– The parser runs through the whole document and notifies
the application whenever it encounters a construct such as
an element start tag, an end tag, or character data.
– The user defines event-handler classes in the application
and implements the actions in their callback methods.
– An object of the event-handler class has to be registered
with the parser, so that the parser can call that object
during parsing.
– Once the parser is correctly configured, it reads the XML
document sequentially from start to end. The parser cannot
go back to parts of the document that are already parsed.
● SAX defines just the interfaces.
SAX Parser Properties
● Advantages:
– XML files of any size can be parsed, because the document
is never held in memory as a whole.
– Helpful if no data structure has to be built up in the
application.
– Useful if only specific information from the XML document
is needed.
– Simple and fast.
● Disadvantages:
– No random access to parts of the document.
– Complex search operations are difficult to implement.
– Very limited access to the DTD through the core interfaces.
– SAX is read-only; the document cannot be modified.
– SAX is not supported directly by browsers.
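The advantage "only specific information is needed" can be sketched as follows. This is a minimal sketch, not part of the original slides: the class name LastnameExtractor, the helper extract(), and the sample document are assumptions made for illustration. The handler keeps only the text of <Lastname> elements and ignores everything else, so no document tree is built.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <Lastname> element.
class LastnameExtractor extends DefaultHandler {
    private final List<String> lastnames = new ArrayList<>();
    private StringBuilder current; // non-null only while inside <Lastname>

    @Override
    public void startElement(String uri, String sName, String qName, Attributes attrs) {
        if ("Lastname".equals(qName)) {
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String sName, String qName) {
        if ("Lastname".equals(qName)) {
            lastnames.add(current.toString().trim());
            current = null;
        }
    }

    // Parses the given XML text and returns all last names in document order.
    static List<String> extract(String xml) {
        try {
            LastnameExtractor handler = new LastnameExtractor();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lastnames;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Addresses>"
            + "<Address><Firstname>Peter</Firstname><Lastname>Mayer</Lastname></Address>"
            + "<Address><Firstname>Anna</Firstname><Lastname>Schmidt</Lastname></Address>"
            + "</Addresses>";
        System.out.println(extract(xml));
    }
}
```

Memory use depends only on the collected values, not on the size of the document.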
Initialisation of the SAX Parser
public class MyHandler extends DefaultHandler
{
    ...
    // Typically inside main(): create the handler and pass it to the parser.
    File file = new File("XMLDocument.xml");
    MyHandler myHandler = new MyHandler();
    SAXParserFactory factory = SAXParserFactory.newInstance();
    SAXParser saxParser = factory.newSAXParser();
    saxParser.parse(file, myHandler);   // the callbacks run during this call
    ...
}

[Figure: saxParser (SAXParser object) reads XMLDocument.xml and sends the events startDocument(), endDocument(), startElement(), endElement(), and characters() to myHandler (MyHandler object).]
The Handler Object
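● The programmer must implement the
interface org.xml.sax.ContentHandler:
class MyHandler implements ContentHandler {
    public void startElement(String namespaceURI, String sName,
                             String qName, Attributes attrs) {
        // here the implementation
    }
    // ... all other ContentHandler methods must be implemented as well
}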

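● To simplify this, you can extend the base class
org.xml.sax.helpers.DefaultHandler, which provides empty
default implementations, and override only what you need:
class MyHandler extends DefaultHandler {
    public void startElement(String namespaceURI, String sName,
                             String qName, Attributes attrs) {
        // here the implementation
    }
}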
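To make the difference between the two approaches concrete, the following sketch implements ContentHandler directly, which means that every method of the interface has to be written out. The class name ElementCounter and the helper countElements() are assumptions for illustration. Because SAXParser.parse() expects a DefaultHandler, the sketch registers the handler with the underlying XMLReader instead.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.XMLReader;

// Hypothetical handler: counts element start tags.
class ElementCounter implements ContentHandler {
    int count;

    @Override
    public void startElement(String uri, String sName, String qName, Attributes attrs) {
        count++;
    }

    // The remaining interface methods are not needed here, but must exist.
    @Override public void setDocumentLocator(Locator locator) { }
    @Override public void startDocument() { }
    @Override public void endDocument() { }
    @Override public void startPrefixMapping(String prefix, String uri) { }
    @Override public void endPrefixMapping(String prefix) { }
    @Override public void endElement(String uri, String sName, String qName) { }
    @Override public void characters(char[] ch, int start, int length) { }
    @Override public void ignorableWhitespace(char[] ch, int start, int length) { }
    @Override public void processingInstruction(String target, String data) { }
    @Override public void skippedEntity(String name) { }

    // Parses the XML text and returns the number of elements found.
    static int countElements(String xml) {
        try {
            ElementCounter counter = new ElementCounter();
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            reader.setContentHandler(counter);
            reader.parse(new InputSource(new StringReader(xml)));
            return counter.count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countElements(
            "<Address><Firstname>Peter</Firstname><Lastname>Mayer</Lastname></Address>"));
    }
}
```

With DefaultHandler as base class, only startElement() would remain.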
Parser Library javax.xml.parsers.*
java.lang.Object
    SAXParserFactory (SAX)
    SAXParser (SAX)
    DocumentBuilderFactory (DOM)
    DocumentBuilder (DOM)
java.lang.Exception
    ParserConfigurationException
java.lang.Error
    FactoryConfigurationError

● SAXParserFactory.newInstance() returns a SAX parser
factory. With the factory you create a SAX parser.
● DocumentBuilderFactory.newInstance() returns a DOM parser
factory. With the factory you create a DOM parser.
● If a factory can create a parser, the parser is guaranteed
to satisfy the requested configuration; otherwise a
ParserConfigurationException is thrown.
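The factory step can be sketched as follows. The class name ParserSetup and the helper createParser() are assumptions for illustration; the factory and parser calls are standard JAXP. The configuration is set on the factory first, and the parser that comes back reports the same settings.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper class around the JAXP factory.
class ParserSetup {

    // Configures a factory and asks it for a matching SAX parser.
    static SAXParser createParser(boolean namespaceAware, boolean validating) {
        // newInstance() throws FactoryConfigurationError if no implementation is available.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        factory.setValidating(validating);
        try {
            return factory.newSAXParser();
        } catch (ParserConfigurationException | SAXException e) {
            // The requested configuration cannot be satisfied.
            throw new IllegalStateException("No SAX parser for this configuration", e);
        }
    }

    public static void main(String[] args) {
        SAXParser parser = createParser(true, false);
        System.out.println("namespace-aware: " + parser.isNamespaceAware());
        System.out.println("validating: " + parser.isValidating());
    }
}
```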
SAX Event: Start and End of an
XML File
● public void startDocument()
Is called once, when the parser starts reading the
XML document, before any element event.
● public void endDocument()
Is called once, when the parsing of the XML
document is finished; it is the last event.
SAX Event: Element Begin and
End
● public void startElement(String namespaceURI,
String sName, String qName, Attributes attrs)
Is called when an element start tag is reached.
– namespaceURI: The namespace URI of the element; empty
if there is none or the parser is not namespace-aware.
– sName: The local name (without prefix); empty if the
parser is not namespace-aware.
– qName: The qualified name (with prefix), as written in
the document.
– attrs: Contains the element's attributes.
● public void endElement(String namespaceURI,
String sName, String qName)
Is called when an element end tag is reached.
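The three name parameters are easiest to tell apart on a document that uses a namespace prefix. The sketch below is an illustration under assumptions: the class NameParts, the helper describe(), and the namespace URI urn:example:address are invented for this example. The handler records what the parser passes to startElement().

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the name parts of every element.
class NameParts extends DefaultHandler {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String namespaceURI, String sName, String qName, Attributes attrs) {
        seen.add(namespaceURI + " | " + sName + " | " + qName);
    }

    // Parses the XML text with or without namespace processing.
    static List<String> describe(String xml, boolean namespaceAware) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            NameParts handler = new NameParts();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a:Address xmlns:a='urn:example:address'/>";
        System.out.println(describe(xml, true));
        System.out.println(describe(xml, false));
    }
}
```

A factory obtained from SAXParserFactory.newInstance() is not namespace-aware by default, so handlers that rely on sName need setNamespaceAware(true).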
SAX Event: Element Content

● public void characters(char[] ch, int start, int length)
Is called when character data inside an element is
reached. The parser may deliver the content of one
element in several calls.
– ch: Character array containing the characters that
the parser has found.
– start: Index of the first relevant character in the
array. Do not assume that the value is 0.
– length: Number of characters to read from the array,
beginning at start.
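Since the text of one element may arrive in several characters() calls, a handler usually appends each chunk to a buffer instead of treating one call as the complete content. A minimal sketch, assuming the hypothetical class TextCollector and helper textOf():

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: concatenates all character data of a document.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    int calls; // how often the parser called characters()

    @Override
    public void characters(char[] ch, int start, int length) {
        calls++;
        // Use only the slice [start, start + length) of the array.
        text.append(ch, start, length);
    }

    // Parses the XML text and returns the accumulated character data.
    static String textOf(String xml) {
        try {
            TextCollector handler = new TextCollector();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.text.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(textOf("<Firstname>Peter &amp; Paul</Firstname>"));
    }
}
```

How many chunks the parser delivers is implementation specific, which is why the sketch does not rely on the value of calls.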
Example: SAX Parser Run
XML document:
<?xml version="1.0"?>
<Address>
<Firstname>
Peter
</Firstname>
<Lastname>
Mayer
</Lastname>
<Telephon>
023 12
</Telephon>
...
</Address>

Handler methods called by the parser:
startDocument(){
// Implementation
}
endDocument(){
// Implementation
}
startElement(){
// Implementation
}
characters(){
// Implementation
}
endElement(){
// Implementation
}
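The order in which the parser calls these methods can be made visible by logging every event. The sketch below is an illustration, with EventTrace and trace() as assumed names. It uses a compact version of the address document without line breaks, so that no whitespace-only characters() events appear, and it merges consecutive characters() chunks into one log entry.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: logs each event in the order the parser reports it.
class EventTrace extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override public void startDocument() { events.add("startDocument"); }
    @Override public void endDocument() { events.add("endDocument"); }

    @Override
    public void startElement(String uri, String sName, String qName, Attributes attrs) {
        events.add("startElement " + qName);
    }

    @Override
    public void endElement(String uri, String sName, String qName) {
        events.add("endElement " + qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        int last = events.size() - 1;
        // Merge consecutive chunks, because one text may arrive in several calls.
        if (last >= 0 && events.get(last).startsWith("characters ")) {
            events.set(last, events.get(last) + chunk);
        } else {
            events.add("characters " + chunk);
        }
    }

    // Parses the XML text and returns the event log.
    static List<String> trace(String xml) {
        try {
            EventTrace handler = new EventTrace();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        trace("<Address><Firstname>Peter</Firstname><Lastname>Mayer</Lastname></Address>")
            .forEach(System.out::println);
    }
}
```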
Example: MySAXParserOnlyElement
import java.io.File;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.parsers.SAXParser;

public class MySAXParsingOnlyElements extends DefaultHandler {

    public static void main(String param[]) {
        if (param.length != 1) {
            System.out.println("Missing argument (XML file)!");
            System.exit(1);
        }
        File datei = new File(param[0]);
        DefaultHandler handler = new MySAXParsingOnlyElements();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();
            saxParser.parse(datei, handler);
        }
        catch (Throwable t) { t.printStackTrace(); }
    }
Cont.: MySAXParserOnlyElement
    public void startDocument() {
        System.out.println("Parsing started; beginning of document");
        System.out.println("<?xml version='1.0'?>" +
            " (note: not read from the XML document)");
    }
    public void endDocument() {
        System.out.println("");
        System.out.println("Parsing finished; end of document reached");
    }
    public void startElement(String namespaceURI, String sName,
                             String qName, Attributes attrs) {
        System.out.print("<" + qName + ">");
    }
    public void endElement(String namespaceURI, String sName,
                           String qName) {
        System.out.print("</" + qName + ">");
    }
    public void characters(char[] ch, int start, int length) {
        String str = new String(ch, start, length);
        System.out.print(str);
    }
}
SAX Event: Attribute Access
● attrs.getQName(int index)
Returns the name of the attribute at the given index.
● attrs.getValue(int index)
Returns the value of the attribute at the given index.
public void startElement(String nsURI, String sName, String qName,
                         Attributes attrs) {
    String attName = ""; String att = ""; String tagElement = "";
    if (attrs.getLength() >= 1) {
        for (int i = 0; i < attrs.getLength(); i++) {
            attName = attrs.getQName(i);
            // append, so that every attribute is printed, not only the last one
            att += " " + attName + "='" + attrs.getValue(i) + "'";
        }
    }
    tagElement = "<" + qName + att + ">";
    System.out.print(tagElement);
}
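Instead of printing the attributes, a handler can also copy them into a map for later use. A minimal sketch under assumptions: the class AttributeReader, the helper read(), and the attributes type and country on <Telephon> are invented for this example.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: copies all attributes it sees into a map.
class AttributeReader extends DefaultHandler {
    final Map<String, String> attributes = new LinkedHashMap<>();

    @Override
    public void startElement(String nsURI, String sName, String qName, Attributes attrs) {
        // Attributes are addressed by index from 0 to getLength() - 1.
        for (int i = 0; i < attrs.getLength(); i++) {
            attributes.put(attrs.getQName(i), attrs.getValue(i));
        }
    }

    // Parses the XML text and returns attribute names mapped to their values.
    static Map<String, String> read(String xml) {
        try {
            AttributeReader handler = new AttributeReader();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.attributes;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(read("<Telephon type='home' country='DE'>023 12</Telephon>"));
    }
}
```

The Attributes object is only guaranteed to be valid during the startElement() call, which is why the values are copied.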
Find out the position of the parser!
● If there is a problem parsing an XML
document, troubleshooting is easier if
the parser reports the line number.
● The interface org.xml.sax.Locator allows you
to retrieve the line and column number at
which the parser raised an event.
● Example:
import org.xml.sax.Locator;
private Locator locator;
public void setDocumentLocator(Locator loc) { locator = loc; }
public void startDocument() {
    System.out.println("Parsing starts in line: " +
        locator.getLineNumber());
}
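The locator is more useful inside element events than in startDocument(). The following sketch records the line of each element; LinePositions and locate() are assumed names. SAX parsers are encouraged but not required to supply a locator, so the sketch falls back to -1 if none was set.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers the line on which each element starts.
class LinePositions extends DefaultHandler {
    private Locator locator;
    final Map<String, Integer> lines = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator loc) {
        locator = loc; // called by the parser before startDocument()
    }

    @Override
    public void startElement(String uri, String sName, String qName, Attributes attrs) {
        lines.put(qName, locator != null ? locator.getLineNumber() : -1);
    }

    // Parses the XML text and returns element names mapped to line numbers.
    static Map<String, Integer> locate(String xml) {
        try {
            LinePositions handler = new LinePositions();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(locate("<Address>\n<Firstname>Peter</Firstname>\n</Address>"));
    }
}
```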
org.xml.sax.SAXParseException
● Contains an XML parsing error or warning.
● There are 3 levels in the error hierarchy:
– Fatal errors: Non-recoverable errors, e.g. the
document is not well-formed.
– Errors: Recoverable errors, e.g. the document
does not conform to the DTD.
– Warnings: The parser can continue. Warnings
are reported if the parser considers
something worth mentioning.
● throw new SAXParseException("text",
locator)
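A fatal error can be observed by parsing a document that is not well-formed. In the sketch below, ParseErrorDemo and errorLine() are assumed names. The default fatalError() of DefaultHandler rethrows the exception, so parse() ends with a SAXParseException that carries the position of the problem.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: reports where a document stops being well-formed.
class ParseErrorDemo {

    // Returns the line of the first fatal error, or -1 if the text is well-formed.
    static int errorLine(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return -1;
        } catch (SAXParseException e) {
            return e.getLineNumber();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The end tag on line 2 does not match the start tag.
        String broken = "<Address>\n<Firstname>Peter</Lastname>\n</Address>";
        System.out.println("fatal error in line " + errorLine(broken));
    }
}
```

SAXParseException also offers getColumnNumber() and getSystemId() for more detailed messages.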
Library: org.xml.sax.helpers
java.lang.Object
    DefaultHandler
    NamespaceSupport
    AttributesImpl
    XMLReaderFactory
    etc.

● DefaultHandler: Base class for handling SAX events.
● AttributesImpl: Attributes of elements can be
read and manipulated.
● NamespaceSupport: Keeps track of namespace
declarations and resolves prefixed names.
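The two helper classes can be tried out without a parser. This is a small sketch with assumed names (HelpersDemo, attributeValue(), resolve()) and an invented namespace URI: AttributesImpl builds and changes an attribute list, and NamespaceSupport splits a prefixed name into namespace URI, local name, and qualified name.

```java
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo class for two classes from org.xml.sax.helpers.
class HelpersDemo {

    // Builds an attribute list, changes one value, and reads it back by name.
    static String attributeValue() {
        AttributesImpl attrs = new AttributesImpl();
        // Arguments: namespace URI, local name, qualified name, type, value.
        attrs.addAttribute("", "type", "type", "CDATA", "home");
        attrs.setValue(0, "office");
        return attrs.getValue("type");
    }

    // Resolves a prefixed name into { namespace URI, local name, qualified name }.
    static String[] resolve(String qName) {
        NamespaceSupport ns = new NamespaceSupport();
        ns.pushContext(); // new scope, as at the start of an element
        ns.declarePrefix("a", "urn:example:address");
        return ns.processName(qName, new String[3], false);
    }

    public static void main(String[] args) {
        System.out.println(attributeValue());
        System.out.println(String.join(" | ", resolve("a:Address")));
    }
}
```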
class: DefaultHandler
● The DefaultHandler implements the
following interfaces:
– ContentHandler: Handles messages from the
parser about events (e.g. startDocument()).
– DTDHandler: Handles messages from the
parser about DTD related events.
– ErrorHandler: Is called, if errors occur.
– EntityResolver: Resolves external entities.
● To write your own handler, you extend
the class DefaultHandler and override
the handler methods.
DefaultHandler Methods
● Methods for error handling:
– fatalError(SAXParseException e)
– error(SAXParseException e)
– warning(SAXParseException e)
● Further notifications:
– InputSource resolveEntity(String publicID,
String systemID)
Resolves external entities.
– void startPrefixMapping(String prefix, String uri)
– void endPrefixMapping(String prefix)
Report the start and end of the scope of a
namespace prefix mapping.
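A common use of resolveEntity() is to keep the parser from fetching an external DTD. The sketch below is an illustration under assumptions: the class OfflineDtdHandler, the helper parse(), and the URL http://example.invalid/address.dtd are invented. The handler records which system identifier the parser asks for and answers with an empty replacement, so no network access is needed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: answers every external entity request with empty content.
class OfflineDtdHandler extends DefaultHandler {
    final List<String> requested = new ArrayList<>();

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        requested.add(systemId);
        // Returning an InputSource replaces the external resource.
        return new InputSource(new StringReader(""));
    }

    // Parses the XML text and returns the system identifiers the parser asked for.
    static List<String> parse(String xml) {
        try {
            OfflineDtdHandler handler = new OfflineDtdHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.requested;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE Address SYSTEM 'http://example.invalid/address.dtd'>"
            + "<Address/>";
        System.out.println(parse(xml));
    }
}
```

Returning null from resolveEntity() instead tells the parser to open the system identifier itself.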
SAX Parser Has to Use a DTD
● Default: non-validating parser; the parser checks
only the well-formedness of a document.
● saxParserFactory.setValidating(true)
– The method setValidating() has to be called
before the SAXParserFactory creates the
SAX parser.
● Implement public void error
(SAXParseException e)
– There you can handle the errors that are
reported when the document is inconsistent
with its DTD.
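The two steps, setValidating(true) and an overridden error(), can be sketched as follows. The class DtdValidation, the helper validate(), and the small internal DTD are assumptions for illustration. The default error() of DefaultHandler does nothing, so validity errors would otherwise go unnoticed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects validity errors reported against a DTD.
class DtdValidation extends DefaultHandler {
    // Internal DTD subset, invented for this example.
    static final String DTD = "<!DOCTYPE Address [\n"
        + "<!ELEMENT Address (Firstname, Lastname)>\n"
        + "<!ELEMENT Firstname (#PCDATA)>\n"
        + "<!ELEMENT Lastname (#PCDATA)>\n"
        + "]>\n";

    final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        // Recoverable error: record it and let the parser continue.
        errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
    }

    // Parses the XML text with a validating parser and returns the errors.
    static List<String> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true); // must happen before newSAXParser()
            DtdValidation handler = new DtdValidation();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <Lastname> is missing, which violates the content model of <Address>.
        validate(DTD + "<Address><Firstname>Peter</Firstname></Address>")
            .forEach(System.out::println);
    }
}
```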
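SAX Parser Has to Use XSD

● As before (setValidating(true)); in addition, the factory
has to be namespace-aware (setNamespaceAware(true)).
– Additionally you have to tell the parser that XSD
has to be used:
parser.setProperty
("http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
– You can also drive the parse through the XMLReader
interface; the handler object (here called handler) then
has to be registered with the reader:
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(handler);
reader.setErrorHandler(handler);
reader.parse(new InputSource("filename"));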
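The XSD setup can be sketched in a self-contained way as follows. Several parts are assumptions that do not come from the slides: the class XsdValidation, the helper validate(), the inline schema, and the use of the JAXP property http://java.sun.com/xml/jaxp/properties/schemaSource to hand the schema to the parser as a stream instead of referencing a schema file from the document.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects validity errors reported against an XSD.
class XsdValidation extends DefaultHandler {
    static final String SCHEMA_LANGUAGE =
        "http://java.sun.com/xml/jaxp/properties/schemaLanguage";
    // Assumption: the schema is supplied through this JAXP property.
    static final String SCHEMA_SOURCE =
        "http://java.sun.com/xml/jaxp/properties/schemaSource";
    // Inline schema, invented for this example.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='Address'><xs:complexType><xs:sequence>"
        + "<xs:element name='Firstname' type='xs:string'/>"
        + "<xs:element name='Lastname' type='xs:string'/>"
        + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    final List<String> errors = new ArrayList<>();

    @Override
    public void error(SAXParseException e) {
        errors.add(e.getMessage());
    }

    // Parses the XML text with XSD validation and returns the errors.
    static List<String> validate(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true); // required for XSD validation
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            // The schema language has to be set before the schema source.
            parser.setProperty(SCHEMA_LANGUAGE, "http://www.w3.org/2001/XMLSchema");
            parser.setProperty(SCHEMA_SOURCE,
                new ByteArrayInputStream(XSD.getBytes(StandardCharsets.UTF_8)));
            XsdValidation handler = new XsdValidation();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.errors;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <Lastname> is missing, which violates the sequence in the schema.
        validate("<Address><Firstname>Peter</Firstname></Address>")
            .forEach(System.out::println);
    }
}
```

The sketch calls parser.parse() directly; going through getXMLReader() as on the slide works the same way once the handler is registered with the reader.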
