HTML-Scrubber reviews


HTML-Scrubber (0.15) *****

This module works correctly to remove unwanted HTML tags from text.

The output is nearly identical to that of HTML::Restrict. Differences include:

* Scrubber converts > and < into HTML entities, whereas Restrict doesn't. Scrubber doesn't, however, convert & into an HTML entity, as it should if it is going to convert < and >. This bug will bite you if you have both &gt; (ampersand, lower case g, lower case t, semicolon) and > in your input, because the two then come out as the same text. (Incidentally, CPAN ratings has the same bug.)

* Scrubber leaves all the whitespace at the beginning and end of the text, whereas Restrict removes it (including even the final newline, which seems like a bug to me). If you use the following:

my $hr = HTML::Restrict->new (
    trim => 0,
);

you get identical outputs from either one.

* In the following test, using File::Slurper to read and write 300 HTML files, Scrubber takes about a third less wall-clock time than Restrict, and about three times as long as Strip:

With Scrubber 0.15:

$ time ./

real 0m2.921s
user 0m1.145s
sys 0m0.008s

With Restrict 2.2.3:

$ time ./

real 0m4.442s
user 0m2.323s
sys 0m0.008s

With HTML::Strip 2.10:

$ time ./

real 0m0.900s
user 0m0.111s
sys 0m0.055s
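
To make the entity problem from the first point concrete, here is a minimal sketch in plain Perl. It does not call HTML::Scrubber or HTML::Restrict; the two helper functions (escape_angle_only and escape_full) are hypothetical names made up for illustration, and they only mimic the escaping behaviour described above, treating the input as plain text.

```perl
use strict;
use warnings;

# Hypothetical helper: escapes only < and >, leaving & untouched.
# This mimics the behaviour described above; it is not Scrubber's code.
sub escape_angle_only {
    my ($text) = @_;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: escapes & first, then < and >.
sub escape_full {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

my $typed_entity = 'a &gt; b';   # the writer typed the entity itself
my $typed_angle  = 'a > b';      # the writer typed a bare angle bracket

# Partial escaping: two different inputs collapse into the same output.
print escape_angle_only($typed_entity), "\n";
print escape_angle_only($typed_angle),  "\n";

# Full escaping: the two inputs remain distinguishable.
print escape_full($typed_entity), "\n";
print escape_full($typed_angle),  "\n";
```

With partial escaping there is no way to tell afterwards which of the two inputs was the original, which is why escaping & needs to go together with escaping < and >.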

For a list of similar modules and links to other reviews, please see my page at

HTML-Scrubber (0.09) ****

This is a great module for sanitizing HTML code. It supports the removal of tags and of attributes within tags, and it's very configurable.

The documentation could be improved, although a look at the example is enough for most use cases.

HTML-Scrubber (0.08) *****

Very nice module.
It works very well and it is immensely more accurate in recognizing HTML tags than HTML::Strip.
The documentation could be a little bit clearer though.
Another thing that I would like to have is the ability to switch the HTML entity encoding off.