This module correctly removes unwanted HTML tags from text.
The output is nearly identical to that of HTML::Restrict. Differences include:
* Scrubber converts > and < into HTML entities, whereas Restrict doesn't. However, Scrubber doesn't convert & into an HTML entity, as it should if it is going to convert < and >. This bug will bite you if your input contains both & gt; (ampersand, lower-case g, lower-case t, semicolon) and a bare >: both come out as the same entity, so they can no longer be told apart. (Incidentally, CPAN Ratings has the same bug: github.com/perlorg/perlweb/issues/213)
* Scrubber leaves all whitespace at the beginning and end of the text intact, whereas Restrict removes it (even the final newline, which seems like a bug to me). If you use the following:
my $hr = HTML::Restrict->new(
    trim => 0,
);
you get identical output from both.
* In a test using File::Slurper to read and write 300 HTML files, Scrubber was about 33% faster than Restrict, and about 1/3 as fast as Strip.
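The first two points come down to simple string handling. Below is a minimal core-Perl sketch, not code taken from either module: escape_html_text and trim_whitespace are hypothetical helper names, and treating Restrict's default trim as plain leading/trailing whitespace removal is an assumption. It shows why & has to be escaped along with < and >, and how you might trim Scrubber's output yourself if you prefer Restrict's default behaviour to setting trim => 0.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Scrubber or HTML::Restrict):
# escape HTML-special characters in text. & is replaced first so that a
# literal "&gt;" in the input stays distinguishable from an escaped ">".
sub escape_html_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;   # must run before the < and > substitutions
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: remove leading and trailing whitespace, including
# a final newline. Assumed to approximate HTML::Restrict's default trim.
sub trim_whitespace {
    my ($text) = @_;
    $text =~ s/\A\s+//;   # leading whitespace
    $text =~ s/\s+\z//;   # trailing whitespace, final "\n" included
    return $text;
}

print escape_html_text('&gt; and >'), "\n";           # &amp;gt; and &gt;
print '[', trim_whitespace("  hello world\n"), "]\n"; # [hello world]
```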
Very nice module.
It works very well and is far more accurate than HTML::Strip at recognizing HTML tags.
The documentation could be a little bit clearer though.
Another thing I would like is an option to switch off HTML entity encoding.