When I wanted to convert a series of HTML table files to CSV, this class made the HTML parsing beautifully simple.
The module represents an HTML table as an array of arrays. By happy coincidence, Text::CSV_XS represents a CSV file in exactly the same way. To convert HTML to CSV, you just pass the table object to the csv function. It's that easy!
As Graham Stead said, the documentation would benefit from more detail. For example, it's not clear how the superclass methods are overridden and what they return.
I'm more comfortable with Python, but I couldn't find an equivalent module that makes it so easy. Sebastien Sauvage's html2csv in Python is even simpler to use, but it's not in a package or actively maintained.
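For readers who also think in Python, the array-of-arrays idea can be sketched with the standard library alone. This is not the Perl module's API, only an illustration of the same shape of data: the class name `TableRows` and the function `table_to_csv` are made up for this example, and it assumes a single, non-nested table in reasonably well-formed HTML.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cells of every <tr> as a list of strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows: a list of lists
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def _flush_cell(self):
        # Close the open cell, if any, and append its text to the current row.
        if self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._flush_cell()  # tolerate a missing </td>
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag == "tr" and self._row is not None:
            self._flush_cell()
            self.rows.append(self._row)
            self._row = None


def table_to_csv(html):
    """Parse the table rows out of html and return them as CSV text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    # The list of lists goes straight into the CSV writer, one row per list.
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()


sample = (
    "<table><tr><th>Name</th><th>Qty</th></tr>"
    "<tr><td>Apple</td><td>3</td></tr></table>"
)
print(table_to_csv(sample))
```

It takes far more lines than the Perl route described above, which is rather the point of the review.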
An excellent module to handle the fairly common task of extracting data from an HTML table; no need for ugly scraping code, you can just tell this module "Here's some HTML; find me a table with headings named $headers, then get me the data." If the page changes, no problem - as long as the table still has the same column headings (even if their order changes), it'll still be found with no issues.
You can also identify the table you want by name/id or various attributes, if you need to.
An essential part of your toolkit for screen-scraping.
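The header-driven lookup described above can be illustrated with a small Python sketch. It is not the module's implementation: `extract_by_headers` is a hypothetical function, it works on tables that are already parsed into lists of rows, it assumes the first row of each table holds the headings and that rows are rectangular, and it compares headings by exact string match as a simplification.

```python
def extract_by_headers(tables, headers):
    """Return the data rows of the first table whose heading row contains
    every requested heading, with columns in the requested order.
    Returns None when no table matches."""
    for table in tables:
        if not table:
            continue
        heading_row = table[0]
        if all(h in heading_row for h in headers):
            # Look positions up by name, so column order on the page is irrelevant.
            positions = [heading_row.index(h) for h in headers]
            return [[row[i] for i in positions] for row in table[1:]]
    return None


# Two already-parsed tables; only the second has the headings we ask for.
tables = [
    [["Menu", "Link"], ["Home", "/"]],
    [["Price", "Item", "Stock"], ["1.20", "Apple", "40"], ["0.50", "Pear", "12"]],
]
rows = extract_by_headers(tables, ["Item", "Price"])
print(rows)
```

Because columns are found by name rather than position, swapping "Price" and "Item" on the page would not change the result.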
This module is very handy for getting the entries out of tables quickly. However, it has some flaws. For example, it's not possible to get the attributes of the <td> and other tags which form the table, so if you need to extract only the elements which have a certain name or class, you'll be stuck.
There is a way around the problem but it's complicated.
The other big problem with this module is that it's broken on Cygwin and Windows.
Excellent module, much easier than Template::Extract and HTML::TreeBuilder for extracting data from web pages in many cases, and you don't even have to look into the HTML source being processed.
My only complaint is the encoding problem. When dealing with pages in non-ASCII, non-UTF-8 encodings like GB2312, it just refuses to match headers. I have to convert the HTML input to UTF-8 manually every time. I think it may be a problem on the HTML::Parser side... So UTF-8 is always my best friend. :)
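The manual conversion step mentioned above amounts to decoding the bytes from the page's encoding and re-encoding them as UTF-8. A minimal Python sketch follows; the helper name `to_utf8` is hypothetical, and it assumes the source encoding is already known (here GB2312) rather than detected.

```python
def to_utf8(raw, source_encoding):
    """Decode bytes from source_encoding and re-encode them as UTF-8."""
    return raw.decode(source_encoding).encode("utf-8")


# A header cell as it might arrive from a GB2312-encoded page.
gb_bytes = "<th>名称</th>".encode("gb2312")
utf8_bytes = to_utf8(gb_bytes, "gb2312")
print(utf8_bytes.decode("utf-8"))
```

The two byte strings differ even though they spell the same text, which is consistent with header matching failing on input that was never converted.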
This module helped me create a parser that I was struggling to build any other way. The headers feature is *very* handy and provides great basic functionality. If you need to go beyond this, be prepared to spend a bit of time understanding how things work; I found Matt's examples (www.mojotoad.com/sisk/projects/HTML-T...) to be helpful (and necessary). I give this module an overall high score because its great functionality trumps everything else. It would have been even better if it were more intuitive (granted, this is highly subjective), or if the off-line examples were referenced in the POD. Kudos!