XML-Parser reviews


XML-Parser (2.44) *****

Highly useful because of its speed, simplicity and adherence to standard Perl paradigms.

XML-Parser (2.44)

I sometimes need to parse the horrible format known as XML. What to do? The obvious solution in Perl seems to be XML::Parser.

However, XML::Parser has been a miserable experience. Yes, it works, but it is not nice. The problem is that the nature of the callback structure forces you either to use untidy global variables, or jam some foreign object into the parser object itself in order to keep your data somewhere as you parse the XML. In other words, there is no sensible place in XML::Parser to keep your data as you parse your XML file. It's as if it was designed only to parse the data but not ever do anything else with it.

This leads to a recurring problem: every time I use XML::Parser and want to do something slightly different, I end up copying and pasting the entire parsing code, even if I am reading exactly the same file. It's very difficult to bundle everything into a module that reads the file, so I tend to end up with hundreds of similar yet annoyingly incompatible parsers for exactly the same data.

However, today I came to CPAN ratings, looked at the other ratings for XML::Parser, and found a recommendation for XML::SAX. I went and looked at its documentation, and found that the solution to my problem is to use a closure:

"The only way currently with XML::Parser to safely maintain state is to use a closure:

my $state = MyState->new();

$parser->setHandlers(Start => sub { handle_start($state, @_) });"

That's useful to know.
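
For anyone with the same problem, here is a minimal sketch of how that closure idea can wrap the whole parse in a reusable sub. The sub name count_elements and the sample XML are made up for illustration; only XML::Parser->new, the Handlers option and parse() come from the module itself.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not part of XML::Parser): parses an XML string and
# returns a hash ref mapping each element name to how often it occurred.
sub count_elements {
    my ($xml) = @_;
    my %count;    # per-call state, captured by the closure below

    my $parser = XML::Parser->new(
        Handlers => {
            # Start handlers are called with the Expat object, the element
            # name, and then the attribute name/value pairs.
            Start => sub {
                my ($expat, $element) = @_;
                $count{$element}++;
            },
        },
    );

    $parser->parse($xml);
    return \%count;
}

my $counts = count_elements('<list><item/><item/></list>');
print "$_: $counts->{$_}\n" for sort keys %$counts;
# prints:
# item: 2
# list: 1
```

Because %count lives inside the sub, nothing global is needed and nothing has to be stuffed into the parser object, so the sub could go into a module and be reused as it is.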

XML-Parser (2.34) ***

I do wish there were examples in the documentation (*), but I was able to write a simple parser to do what I needed in about 15 minutes.

The Tree, Subs and Object modules look like they might make my life easier by putting everything into hashes, objects, etc, but they are so negligibly documented that it's just easier to write a Parser as I know what it should do, versus hacking with dumps of data structures to interpret what is going on.

(*) at least as viewed in search.cpan.org, I didn't poke around very much or look closely at the samples subdirectory in the distro (shouldn't it be called "eg"?)

XML-Parser (2.34) ****

XML::Parser was the first decent Perl interface to XML. It links to James Clark's Expat, the original XML-parsing C library, which handles most of the heavy lifting.

XML::Parser uses the event-driven parsing model as opposed to the tree-based model: rather than parsing XML into a big, treelike data structure, with XML::Parser you define event handlers for certain events (like the start of the document, encountering the start of a tag, encountering an XML comment) that get called as those particular events occur.
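
To make the event model concrete, here is a small sketch that records the order in which handlers fire. The sub name trace_events and the sample document are illustrative assumptions; Init, Start, End, Comment and Final are handler names that XML::Parser documents.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: returns the list of events seen while parsing $xml.
sub trace_events {
    my ($xml) = @_;
    my @events;

    my $parser = XML::Parser->new(
        Handlers => {
            Init    => sub { push @events, 'init' },       # before parsing starts
            Start   => sub { my ($expat, $el) = @_; push @events, "start $el" },
            End     => sub { my ($expat, $el) = @_; push @events, "end $el" },
            Comment => sub { push @events, 'comment' },     # an XML comment was seen
            Final   => sub { push @events, 'final' },       # after parsing finishes
        },
    );

    $parser->parse($xml);
    return @events;
}

print join(', ', trace_events('<doc><!-- note --><p>hi</p></doc>')), "\n";
# prints: init, start doc, comment, start p, end p, end doc, final
```

No Char handler is registered here, so the text "hi" produces no event; with XML::Parser you only hear about the events you ask for.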

XML::Parser works; it's quite mature, as is the underlying C library. XML::Parser is very, very fast. XML::Parser is very well documented.

The only gripes I have with XML::Parser are over its API. It appears that a naming convention is nonexistent. If there is indeed a naming convention, then it's an extremely confusing one. For example, some methods are named with the first letter of every subsequent word uppercased, like setHandlers(). Most have an underscore separating each word, a la perlstyle, like current_element(), eq_name(), new_ns_prefixes(), or default_current(). Still others have nothing separating words, like parsestring() or parsefile(). The named parameter syntax suffers from similar problems. Maybe this is attributable to the fact that there were two different authors (three if you count Clark himself) or maybe there really is some big, complicated naming convention that makes perfect sense to the current maintainer, but either way I find all of this extremely unintuitive.

As mentioned by another reviewer, the SAX modules are another, more standardized option, one with an API devoid of any of the problems enumerated above. XML::Parser still has much wider distribution, though, and there are many SAX modules to choose from (XML::SAX::PurePerl, XML::Parser::PerlSAX, XML::LibXML::SAX, for example), which is either a very good thing or a very bad thing depending on your preference.

The momentum now in the XML community is definitely away from specialized tools like Expat and towards more standardized APIs like SAX.

XML-Parser (2.34) *****

Forget what everyone else says about XML::Parser. Those SAX nuts are just trying to kill it. Use it while you are still able!

XML-Parser (2.34) ****

This is a good solid module but you should not use it for any new code you are writing.

If you want an event based parser, the SAX API is superior to XML::Parser's API and is supported by a number of different parser libraries - write your code to the SAX API and you can use whichever parser happens to be installed.

If you want a tree style parser then you should look at either XML::Twig (for a strong Perl flavour) or a DOM module with XPath support such as XML::LibXML, XML::XPath or XML::GDOME.

See also the Perl-XML FAQ: perl-xml.sourceforge.net/faq/

XML-Parser (2.33) *****

Seems that there is a dependency upon an expat.h file not included in the distribution - suspect it has something to do with the Apache Server.