3D PLM Enterprise Architecture |
Middleware Abstraction |
Reading XML from a Custom Source |
Parsing XML from a user-defined source with DOM or SAX |
Use Case |
Abstract: This article shows how to create your own XML source to feed XML directly to a DOM or a SAX parser. |
The XML parsers provided in the XMLParser framework know how to parse XML from various sources: files, URLs and memory buffers. This use case shows how to create your own XML sources and use them with a DOM or a SAX parser. This use case also shows you how to save your DOM tree as XML in a memory buffer.
The CAAXMLCustomStream Use Case is a use case of the CAAXMLParser.edu framework that illustrates XMLParser framework capabilities.
This use case provides an implementation of a new XML source. The XML source is capable of fetching a large XML document split over several files (each file contains a chunk of the complete XML document). The use case instantiates a DOM parser, instantiates the XML source with the files passed as an argument in the command line, parses the contents of the supplied source to build a DOM tree, and finally dumps the content of the DOM tree in the console.
To launch CAAXMLCustomStream, you will need to set up the build time environment, then compile CAAXMLCustomStream along with its prerequisites, set up the run time environment, and then execute the use case [1].
The use case should be launched as follows from the command line:
CAAXMLCustomStream <file1> ... <fileN>
where <file1> is the path of the file containing the first XML chunk and <fileN> is the path of the file containing the last XML chunk.
A sample XML file split in three chunks is provided with the use case. To use it, launch the following command from the command line:
Windows | cd InstallRoot\OS\resources\xml\CAAXMLCustomStream |
Unix | cd InstallRoot/OS/resources/xml/CAAXMLCustomStream ;
CAAXMLCustomStream caaxmlchunk1.xml caaxmlchunk2.xml caaxmlchunk3.xml |
where:
InstallRoot is the directory in which you have installed the run time part or the product line
OS is the directory containing the installed code:
    aix_a for 32-bit AIX
    hpux_b for HP-UX
    solaris_a for Solaris
    intel_a for 32-bit Windows
    win_b64 for 64-bit Windows
The CAAXMLCustomStream use case is made of several classes located in the CAAXMLCustomStream.m module of the CAAXMLParser.edu framework:
Windows | InstallRootDirectory\CAAXMLParser.edu\CAAXMLCustomStream.m\ |
Unix | InstallRootDirectory/CAAXMLParser.edu/CAAXMLCustomStream.m/ |
where InstallRootDirectory is the directory where the CAA CD-ROM is installed.
To create your own XML source and use it to parse XML, there are six main steps:
# | Step |
---|---|
1 | Implement an XML Custom Stream |
2 | Create a V5 DOM Component and a V5 SAX Component |
3 | Use the V5 SAX Component to Create a SAX Input Source Based on Your XML Stream |
4 | Parse the Custom Source Using DOM |
5 | Dump Your DOM Tree in the Console |
6 | Manage Errors |
Note that most of the APIs in the XMLParser framework return an HRESULT. To avoid deeply nested code, which would hurt readability, the following coding style is used: all the code is placed in a do {} while(0) loop; if one of the APIs returns a failing HRESULT, execution leaves the loop with a break and the error handler is invoked.
HRESULT hr = E_FAIL;
do {
    hr = XMLParserAPI_1();
    if (FAILED(hr)) { break; }
    hr = XMLParserAPI_2();
    if (FAILED(hr)) { break; }
    ...
    hr = XMLParserAPI_N();
    if (FAILED(hr)) { break; }
} while(0);
if (FAILED(hr)) {
    // Error handling code.
}
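The short-circuit behavior of this style can be checked with a small standalone sketch. It does not use the XMLParser framework: the HRESULT, S_OK, E_FAIL and FAILED definitions below are stand-ins so that the code builds without the CAA headers, and FakeApi and RunSequence are hypothetical names introduced only for illustration.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Stand-ins (assumptions) for the platform definitions, so the sketch
// builds without the CAA or Windows headers.
typedef std::int32_t HRESULT;
const HRESULT S_OK = 0;
const HRESULT E_FAIL = -2147467259;  // 0x80004005
#define FAILED(hr) ((hr) < 0)

// Hypothetical API: records its name, then succeeds or fails as requested.
HRESULT FakeApi(const std::string& name, bool succeed,
                std::vector<std::string>& log) {
    log.push_back(name);
    return succeed ? S_OK : E_FAIL;
}

// Runs three calls in sequence; the first failure skips the remaining
// calls and falls through to the single error handler.
HRESULT RunSequence(bool secondSucceeds, std::vector<std::string>& log) {
    HRESULT hr = E_FAIL;
    do {
        hr = FakeApi("step1", true, log);
        if (FAILED(hr)) { break; }
        hr = FakeApi("step2", secondSucceeds, log);
        if (FAILED(hr)) { break; }
        hr = FakeApi("step3", true, log);
        if (FAILED(hr)) { break; }
    } while (0);
    if (FAILED(hr)) {
        log.push_back("error handler");  // error handling code
    }
    return hr;
}
```

When the second call fails, the log contains step1, step2 and the error handler entry, and step3 is never invoked.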
Custom streams let you read XML stored in a location, such as a relational database or an encrypted file, that the default input source types of the XMLParser framework cannot access. In theory, the same result can be achieved by first fetching the whole XML into a memory buffer and then parsing the XML from that memory buffer. However, such a solution creates memory peaks and does not perform as well as the custom stream approach. To create a custom stream, you need to declare and define a V5 component that implements the CATIXMLInputStream interface.
// CAAXMLMultiFileStream.h
...
class CAAXMLMultiFileStream : public CATBaseUnknown {
    CATDeclareClass;
public:
    ...
    // Implement the CATIXMLInputStream interface.
    virtual HRESULT Read(
        unsigned char* ioByteArray,
        unsigned int iByteArrayCapacity,
        unsigned int& oSizeRead);
};
// CAAXMLMultiFileStream.cpp
#include "CAAXMLMultiFileStream.h" // Import the definition of the component

// Declare the class as a V5 component
CATImplementClass(
    CAAXMLMultiFileStream,
    Implementation,
    CATBaseUnknown,
    CATnull );

// Implement the CATIXMLInputStream interface
#include "TIE_CATIXMLInputStream.h"
TIE_CATIXMLInputStream(CAAXMLMultiFileStream);
The CATIXMLInputStream interface contains just one method, called Read. You use this method to return fragments of XML to the parser. The parser calls this method repeatedly, each time it has finished analyzing the current XML fragment and needs the next one to be fetched. You never call this method directly: you pass your implementation of CATIXMLInputStream to the parser, and the parser calls it automatically when it needs more input.
// CAAXMLMultiFileStream.cpp
HRESULT CAAXMLMultiFileStream::Read(
    unsigned char* ioByteArray,
    unsigned int iByteArrayCapacity,
    unsigned int& oSizeRead)
{
    ...
}
The method accepts the following parameters:
ioByteArray | A buffer where you must put the XML fragment. |
iByteArrayCapacity | The size of the ioByteArray buffer. |
oSizeRead | The size of the fragment returned to the parser. Returning a zero-length read size signals the end of the XML input to the parser. |
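The contract described above can be sketched as follows. This is not the code of CAAXMLMultiFileStream, whose body is elided in this article. It is a standalone illustration in which in-memory strings stand in for the chunk files, HRESULT and S_OK are stand-in definitions, and MultiChunkStream and DrainStream are hypothetical names. Only the Read signature is taken from the article.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>
#include <vector>

// Stand-ins (assumptions) for definitions that normally come from CAA headers.
typedef std::int32_t HRESULT;
const HRESULT S_OK = 0;

// Hypothetical stream over several chunks; each string stands in for the
// contents of one chunk file.
class MultiChunkStream {
public:
    explicit MultiChunkStream(std::vector<std::string> chunks)
        : _chunks(std::move(chunks)), _index(0), _offset(0) {}

    // Same parameter list as the Read method shown in the article.
    HRESULT Read(unsigned char* ioByteArray,
                 unsigned int iByteArrayCapacity,
                 unsigned int& oSizeRead) {
        assert(ioByteArray != nullptr || iByteArrayCapacity == 0);
        oSizeRead = 0;
        // Skip chunks that are exhausted or empty.
        while (_index < _chunks.size() && _offset >= _chunks[_index].size()) {
            ++_index;
            _offset = 0;
        }
        if (_index == _chunks.size() || iByteArrayCapacity == 0) {
            return S_OK;  // zero bytes read: end of input
        }
        // Copy at most one buffer's worth from the current chunk.
        const std::string& chunk = _chunks[_index];
        const std::size_t remaining = chunk.size() - _offset;
        const std::size_t count =
            std::min<std::size_t>(iByteArrayCapacity, remaining);
        std::memcpy(ioByteArray, chunk.data() + _offset, count);
        _offset += count;
        oSizeRead = static_cast<unsigned int>(count);
        return S_OK;
    }

private:
    std::vector<std::string> _chunks;
    std::size_t _index;   // chunk currently being read
    std::size_t _offset;  // read position inside that chunk
};

// Mimics the parser side: calls Read until it reports zero bytes.
std::string DrainStream(MultiChunkStream& stream, unsigned int capacity) {
    std::vector<unsigned char> buffer(capacity);
    std::string result;
    unsigned int sizeRead = 0;
    while (stream.Read(buffer.data(), capacity, sizeRead) == S_OK &&
           sizeRead > 0) {
        result.append(reinterpret_cast<const char*>(buffer.data()), sizeRead);
    }
    return result;
}
```

A call may return fewer bytes than the buffer capacity, for example at a chunk boundary. In this sketch that is harmless because the caller keeps calling Read until it receives a zero-length result.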
...
CATIXMLDOMDocumentBuilder_var builder;
hr = ::CreateCATIXMLDOMDocumentBuilder(builder);
...
To parse the XML, you will use a DOM parser, so the next step is to instantiate a V5 DOM component. The V5 DOM component can be created by calling the CreateCATIXMLDOMDocumentBuilder global function. This function returns a V5 handler on the CATIXMLDOMDocumentBuilder interface, which is the main interface for the V5 DOM component. Using this interface you will be able to create documents (either by parsing an XML input source, as here, or from scratch) and save existing documents to disk.
...
CATIXMLSAXFactory_var factory;
hr = ::CreateCATIXMLSAXFactory(factory);
...
To provide the XML, you will need to pass a custom input source to the DOM parser. Custom input sources are created by the CATIXMLSAXFactory interface, so you will also need a V5 SAX component. The V5 SAX component can be created by calling the CreateCATIXMLSAXFactory global function. This function returns a V5 handler on the CATIXMLSAXFactory interface, which is the main interface for the V5 SAX component. Using this interface you will be able to create SAX1 and SAX2 parsers and to create input sources that feed XML to the parser.
Note that the above code does not specify the CLSID of the component to use, so the default DOM and SAX components (XML4C3) will be used. See [3] and [4] if you want to use another V5 DOM or SAX component.
...
CAAXMLMultiFileStream* customStreamImpl = new CAAXMLMultiFileStream(files);
CATIXMLInputStream_var customStream = customStreamImpl;
customStreamImpl->Release();
customStreamImpl = NULL;
...
CATISAXInputSource_var source;
hr = factory->CreateInputSourceFromStream(customStream, "MyCustomSource", source);
...
To create a custom XML source, you first need to instantiate your custom XML stream component by doing a new of its main implementation class and getting its CATIXMLInputStream handle. Then, you use the CreateInputSourceFromStream method of the CATIXMLSAXFactory interface to create the custom XML source, which is returned as a CATISAXInputSource. The method takes as a parameter your custom implementation of the CATIXMLInputStream interface. It uses this implementation to obtain the XML content to parse.
The lifecycle of your CATIXMLInputStream implementation depends on the lifecycle of the CATISAXInputSource object. As soon as the CATISAXInputSource goes out of scope, the destructor of the CATIXMLInputStream implementation will be called, provided that you do not hold any other references on it. You can then perform cleanup and release resources in this destructor.
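The ownership rule above can be illustrated with a standalone reference-counting sketch. It does not use the CAA classes: RefCounted stands in for the reference-counted base class, Handle for the _var handlers, and FakeStream and FakeSource for the stream implementation and the input source. All of these names are hypothetical, and the sketch only assumes that the input source keeps one reference on the stream.

```cpp
#include <utility>

// Minimal stand-in for a reference-counted base class (assumption).
class RefCounted {
public:
    RefCounted() : _count(1) {}
    RefCounted(const RefCounted&) = delete;
    RefCounted& operator=(const RefCounted&) = delete;
    void AddRef() { ++_count; }
    void Release() { if (--_count == 0) { delete this; } }
protected:
    virtual ~RefCounted() {}
private:
    int _count;
};

// Minimal stand-in for a handler that holds one reference on its target.
template <class T>
class Handle {
public:
    Handle() : _p(nullptr) {}
    Handle(T* p) : _p(p) { if (_p) { _p->AddRef(); } }
    Handle(const Handle& other) : _p(other._p) { if (_p) { _p->AddRef(); } }
    Handle& operator=(Handle other) { std::swap(_p, other._p); return *this; }
    ~Handle() { if (_p) { _p->Release(); } }
private:
    T* _p;
};

// Hypothetical stream: its destructor is where cleanup would happen.
class FakeStream : public RefCounted {
public:
    explicit FakeStream(bool& closed) : _closed(closed) {}
protected:
    ~FakeStream() override { _closed = true; }
private:
    bool& _closed;
};

// Hypothetical input source: keeps the stream alive while it exists.
class FakeSource : public RefCounted {
public:
    explicit FakeSource(const Handle<FakeStream>& stream) : _stream(stream) {}
private:
    Handle<FakeStream> _stream;
};

struct LifecycleResult {
    bool closedWhileSourceAlive;
    bool closedAfterSourceGone;
};

LifecycleResult RunLifecycleDemo() {
    LifecycleResult result = { false, false };
    bool closed = false;
    {
        FakeStream* impl = new FakeStream(closed);
        Handle<FakeStream> stream(impl);
        impl->Release();  // the handle now owns the only reference
        impl = nullptr;

        FakeSource* sourceImpl = new FakeSource(stream);
        Handle<FakeSource> source(sourceImpl);
        sourceImpl->Release();

        stream = Handle<FakeStream>();  // drop the caller's own reference
        result.closedWhileSourceAlive = closed;
    }  // source goes out of scope here and releases the stream
    result.closedAfterSourceGone = closed;
    return result;
}
```

In this sketch the stream destructor runs only once the source handle leaves its scope, because by then no other reference remains.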
...
CATListOfCATUnicodeString readOptions;
readOptions.Append("CATDoValidation");
CATListOfCATUnicodeString readOptionValues;
readOptionValues.Append("false");
CATIDOMDocument_var document;
hr = builder->Parse(source, document, readOptions, readOptionValues);
...
To parse the custom input source, you need to invoke the Parse method of the CATIXMLDOMDocumentBuilder interface. If you want to parse using SAX, you can just as well pass the input to the Parse method of a CATISAXParser (SAX1) or a CATISAXXMLReader (SAX2).
The DOM parser can run in two modes: non-validating and validating. You determine what mode is used in the Parse method using the "CATDoValidation" option. Options are passed to the parser using two CATListOfCATUnicodeStrings. The first one contains the option names, the second one contains the option values. For a discussion of non-validating parsers versus validating parsers and how to choose which parser to instantiate, please see [3] and [4].
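Because option names and option values travel in two separate lists, the two lists must stay aligned index by index. The following standalone sketch shows one way to keep them in step. It uses std::vector<std::string> as a stand-in for CATListOfCATUnicodeString, and ParseOptions, Set and Lookup are hypothetical helpers that are not part of the XMLParser framework.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper keeping the two parallel lists the same length.
struct ParseOptions {
    std::vector<std::string> names;   // stand-in for the option name list
    std::vector<std::string> values;  // stand-in for the option value list

    // Adds an option, or overwrites its value if the name is already present.
    void Set(const std::string& name, const std::string& value) {
        for (std::size_t i = 0; i < names.size(); ++i) {
            if (names[i] == name) {
                values[i] = value;
                return;
            }
        }
        names.push_back(name);
        values.push_back(value);
    }
};

// Returns the value stored at the same index as the name, or the fallback.
std::string Lookup(const ParseOptions& options,
                   const std::string& name,
                   const std::string& fallback) {
    for (std::size_t i = 0; i < options.names.size(); ++i) {
        if (options.names[i] == name) {
            return options.values[i];
        }
    }
    return fallback;
}
```

With such a helper, Set("CATDoValidation", "false") appends one entry to each list, and calling Set again with the same name changes the value without adding a second entry.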
...
CATUnicodeString rawOutput;
hr = builder->Write(document, rawOutput);
...
cout << rawOutput.ConvertToChar() << endl;
...
To obtain an XML representation of your DOM tree, call the Write method of the CATIXMLDOMDocumentBuilder interface. The resulting XML is returned in a CATUnicodeString. For a discussion of supported encodings and write options, see [3] and [4].
The XMLParser framework uses the HRESULT / CATError mechanism to manage errors. Make sure to call CATError::CATGetLastError to obtain all the available error diagnostics when using XMLParser. More information about V5 error management is available in [2] and [4].
This use case shows you how to create your own XML sources and use them with a DOM or a SAX parser.
[1] | Building and Launching a CAA V5 Use Case |
[2] | Managing Errors Using HRESULT |
[3] | Using XML in V5 |
[4] | XML Tips and Tricks |
Version: 1 [May 2005] | Document created |
Copyright © 2005, Dassault Systèmes. All rights reserved.