XML::Struct::Reader(3) Read XML streams into XML data structures

SYNOPSIS


my $reader = XML::Struct::Reader->new( from => "file.xml" );
my $data = $reader->read;

DESCRIPTION

This module reads an XML stream (via XML::LibXML::Reader) into XML::Struct/MicroXML data structures.

METHODS

read = readNext ( [ $stream ] [, $path ] )

Read the next XML element from a stream. If no path argument is given, the reader's path option is used ("*" by default, first matching the root, then every other element).
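As an illustration of repeated reads, the following sketch streams over matching elements one at a time. The sample XML and the variable names are assumptions made for this example; it relies on "readNext" returning a false value once the stream is exhausted.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# Sample input (an assumption for illustration), passed as a string reference.
my $xml = '<list><item id="1">a</item><item id="2">b</item></list>';

# The path option set here is used by every readNext call
# that is not given a path argument of its own.
my $reader = XML::Struct::Reader->new( from => \$xml, path => '/list/item' );

my @ids;
while ( my $item = $reader->readNext ) {
    # Each $item has the form [ $name => \%attributes, \@children ]
    push @ids, $item->[1]{id};
}

print join( ',', @ids ), "\n";
```

Reading element by element keeps memory use bounded by the size of a single fragment rather than the whole document.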

readDocument( [ $stream ] [, $path ] )

Read an entire XML document. In contrast to "read"/"readNext", this method always reads the entire stream. The return value is the first element (the root element by default) in scalar context and a list of elements in list context. Multiple elements can be returned, for instance, when a path was specified to select document fragments.
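The difference between the two calling contexts can be sketched as follows. The sample document and variable names are assumptions for illustration only.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# Sample input (an assumption for illustration).
my $xml = '<list><item>a</item><item>b</item></list>';

# Scalar context: the first element, which is the root with the default path.
my $root = XML::Struct::Reader->new( from => \$xml )->readDocument;

# List context with a path: every fragment selected by the path.
my @items = XML::Struct::Reader->new(
    from => \$xml,
    path => '/list/item',
)->readDocument;

printf "%s contains %d items\n", $root->[0], scalar @items;
```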

readElement( [ $stream ] )

Read an XML element from a stream and return it as an array reference with element name, attributes, and child elements. In contrast to method "read", this method expects the stream to be at an element node ("$stream->nodeType == 1") or bad things might happen.
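A minimal sketch of calling "readElement" on a stream that was positioned by hand. It assumes the first "read" on a fresh XML::LibXML::Reader lands on the root start tag, which holds for the sample input below because it has no prolog or leading comments; the explicit node type check guards that assumption.

```perl
use strict;
use warnings;
use XML::LibXML::Reader;
use XML::Struct::Reader;

# Sample input (an assumption for illustration).
my $stream = XML::LibXML::Reader->new( string => '<a x="1"><b>text</b></a>' );

# Advance to the first node and make sure it is an element node
# before handing the stream to readElement.
$stream->read;
die "stream is not at an element node\n" unless $stream->nodeType == 1;

my $reader  = XML::Struct::Reader->new( stream => $stream );
my $element = $reader->readElement;

print "$element->[0] has attribute x=$element->[1]{x}\n";
```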

readAttributes( [ $stream ] )

Read all XML attributes from a stream and return a (possibly empty) hash reference.

readContent( [ $stream ] )

Read all child elements of an XML element and return the result as a (possibly empty) array reference. Significant whitespace is only included if option "whitespace" is enabled.

CONFIGURATION

from
A source to read from. Possible values include a string or string reference with XML data, a filename, a URL, a file handle, instances of XML::LibXML::Document or XML::LibXML::Element, and a hash reference with options passed to XML::LibXML::Reader.
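Two of the source types listed above can be sketched as follows. The sample XML and variable names are assumptions for illustration. A plain string is presumably told apart from a filename by its content, so a string reference is the less ambiguous way to pass XML data.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# Sample input (an assumption for illustration).
my $xml = '<greeting lang="en">hello</greeting>';

# XML data passed as a plain string ...
my $from_string = XML::Struct::Reader->new( from => $xml )->read;

# ... and as a string reference.
my $from_ref = XML::Struct::Reader->new( from => \$xml )->read;

print "$from_string->[0]: $from_ref->[2][0]\n";
```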
stream
An XML::LibXML::Reader to read from. If no stream has been defined, one must pass a stream parameter to the "read..." methods. Setting a source with option "from" automatically sets a stream.
attributes
Include attributes (enabled by default). If disabled, the representation of an XML element will be

   [ $name => \@children ]

instead of

   [ $name => \%attributes, \@children ]
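
The two representations can be compared with a short sketch. The sample XML and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# Sample input (an assumption for illustration).
my $xml = '<a x="1"><b>t</b></a>';

# Default: [ $name => \%attributes, \@children ]
my $with = XML::Struct::Reader->new( from => \$xml )->read;

# Attributes disabled: [ $name => \@children ]
my $without = XML::Struct::Reader->new( from => \$xml, attributes => 0 )->read;

print ref( $with->[1] ), " vs ", ref( $without->[1] ), "\n";
```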
path
Optional path expression to be used as default value when calling "read". Paths must either be absolute (starting with "/") or consist of a single element name. The special name "*" matches all element names.

A path is a very reduced form of an XPath expression (no axes, no "..", no node tests, "//" only at the start...). Namespaces are not supported yet.

whitespace
Include ignorable whitespace as text elements (disabled by default).
ns
Define how XML namespaces should be processed. By default (value "keep"), this document:

    <doc>
      <x:foo xmlns:x="http://example.org/" bar="doz" />
    </doc>

is transformed to this structure, keeping namespace prefixes and declarations as unprocessed element names and attributes:

    [ 'doc', {}, [
        [
          'x:foo', {
              'bar' => 'doz',
              'xmlns:x' => 'http://example.org/'
          }
        ]
    ] ]

Setting this option to "strip" will remove all namespace prefixes and namespace declaration attributes, so the result would be:

    [ 'doc', {}, [
        [
          'foo', {
              'bar' => 'doz'
          }
        ]
    ] ]

Setting this option to "disallow" results in an error when namespace prefixes or declarations are read.

Expanding namespace URIs ("expand") is not supported yet.
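
The "keep" and "strip" values can be compared on the sample document with a short sketch. The variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# The sample document from this section, on a single line.
my $xml = '<doc><x:foo xmlns:x="http://example.org/" bar="doz"/></doc>';

# Default value "keep": prefixes and declarations are left untouched.
my $kept = XML::Struct::Reader->new( from => \$xml )->read;

# Value "strip": prefixes and namespace declaration attributes are removed.
my $stripped = XML::Struct::Reader->new( from => \$xml, ns => 'strip' )->read;

# The first child of <doc> is at index [2][0]; its name is at [2][0][0].
print "$kept->[2][0][0] becomes $stripped->[2][0][0]\n";
```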

simple
Convert XML to a simple key-value structure (SimpleXML) with XML::Struct::Simple.
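
A minimal sketch of the key-value form, assuming the default setting of option "root" so that the root element name is not included. The sample XML and variable names are assumptions for illustration.

```perl
use strict;
use warnings;
use XML::Struct::Reader;

# Sample input (an assumption for illustration).
my $xml = '<record><name>alice</name><id>7</id></record>';

# With simple enabled, read returns a hash reference keyed by
# child element name instead of a nested array structure.
my $simple = XML::Struct::Reader->new( from => \$xml, simple => 1 )->read;

print "$simple->{name} has id $simple->{id}\n";
```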
depth
Only transform to a given depth, starting at 0 for the root node. Negative values, non-numeric values or "undef" are ignored (unlimited depth as default).

XML elements below the depth are converted to SimpleXML by default or to MicroXML if option "simple" is enabled. This can be configured with option "deep".

This option is useful, for instance, to access document-oriented XML embedded in data-oriented XML.

deep
How to transform elements below the given "depth". This option is experimental.
root
Include root element when converting to SimpleXML. Disabled by default.
content
Name of text content when converting to SimpleXML.