HTML-TagFilter reviews

cpanratings

HTML-TagFilter (1.03)

Although it's quite old, the module installed without problems.

The default behaviour seems quite zany: out of the box, it converts comments in the HTML into escaped entities, like this:

& lt;!-- comment --& gt;

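If I passed this option when creating the object:

use HTML::TagFilter;

my $html = '<p>Some text</p><!-- comment -->';
my $htf = HTML::TagFilter->new (
    strip_comments => 1,    # remove comments instead of escaping them
);
my $clean = $htf->filter ($html);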

then it correctly stripped out the comments. So it must be recognising them as comments, and yet by default it performs this zany conversion, which makes text that was meant to be a hidden HTML comment visible on the page. I can't find anything in the documentation explaining the rationale for this, and it looks like a bug to me.

Like HTML::Detoxifier, it removes the <script> and </script> tags but leaves the script contents between them in place, converting quotation marks in the JavaScript into & quot; entities. It also converted the HTML5 doctype into entities, like this:

& lt; !DOCTYPE HTML& gt;

This module probably didn't work correctly even at the time of its most recent release in 2005, and it cannot be recommended in 2017. I suggest trying HTML::Restrict, HTML::Scrubber, or HTML::Strip instead.

(There is a bug in CPAN Ratings which fails to convert & amp; into & correctly, so please excuse me if this text becomes incomprehensible after repeated edits. To work around it, I have put a space after the & in the entities above.)

For a list of similar modules and links to other reviews, please see my page at www.lemoda.net/perl/html-cleanup-modu...