HTML-Restrict reviews


HTML-Restrict (2.2.4) *****

Of the HTML tag-restricting modules on CPAN, this one and HTML::Scrubber are the two that are reliable and can be recommended. Like HTML::Scrubber, it's based on HTML::Parser, and its output is very similar to HTML::Scrubber's.

* It is significantly slower to load and about 1/3 slower to run than HTML::Scrubber. It is about five times slower than HTML::Strip.

* Both modules have a nearly identical bug related to the <br> tag. Like HTML::Scrubber, HTML::Restrict doesn't convert tags into a reasonable whitespace equivalent, so text containing a <br> tag like

I wandered lonely as a cloud<br>That floats on high o'er vales and hills,

will be converted into

I wandered lonely as a cloudThat floats on high o'er vales and hills,

HTML::Strip doesn't have this bug, but it has another one related to adding whitespace where it isn't necessary.

* Unlike HTML::Scrubber, it doesn't turn > and < into HTML entities, which means it's more useful for converting HTML into text.

In the end, this was the module I chose to use, because:
* I can easily preprocess around the <br> issue;
* Failing to process Unicode is a huge nuisance for me, so HTML::Laundry is out;
* Pre- and post-processing the HTML to remove duplicate text takes far longer than the running time of HTML::Restrict. I'm also not doing this in response to user input but to make static text versions of HTML pages for searching within them, so the speed of HTML::Strip gains me little;
* Not having to remove the HTML entities from the output text as I would do with HTML::Scrubber is a minor convenience which tips the balance in favour of HTML::Restrict.
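
The <br> preprocessing mentioned in the list above could look something like the following minimal sketch. The helper name br_to_newline and its regex are assumptions made for illustration and are not part of HTML::Restrict; only new() and process() come from the module. A constructor call without rules strips every tag.

```perl
use strict;
use warnings;
use HTML::Restrict;

# Hypothetical helper (not part of HTML::Restrict): replace every <br>,
# <br/>, <br /> or <br ...attributes...> tag with a newline, so the line
# break survives once the tags are stripped.
sub br_to_newline {
    my ($html) = @_;
    $html =~ s{<br\b[^>]*>}{\n}gi;
    return $html;
}

my $html = q{I wandered lonely as a cloud<br>That floats on high o'er vales and hills,};

# No rules given, so all tags are removed.
my $hr   = HTML::Restrict->new;
my $text = $hr->process( br_to_newline($html) );

print "$text\n";
```

The same idea extends to other block-level tags such as <p> or <li>, at the cost of a longer regex.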

For a list of similar modules and links to other reviews, please see my page at

HTML-Restrict (2.1.0)

GREAT module. Exactly what I was looking for. It's so simple to use and does exactly what it says. And the new feature to perform regex on attributes is awesome!
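
For readers who haven't seen the attribute regex feature, here is a small sketch modelled on the style of rule shown in the module's documentation. The tag, the path prefix and the sample input are illustrative assumptions, not taken from this review.

```perl
use strict;
use warnings;
use HTML::Restrict;

# Allow <img> with any alt attribute, but keep src only when its value
# matches the regex (here: images under an assumed /my/images/ path).
my $hr = HTML::Restrict->new(
    rules => {
        img => [
            'alt',
            { src => qr{^/my/images/} },
        ],
    },
);

my $out = $hr->process(
      '<img src="/my/images/a.png" alt="ok">'
    . '<img src="http://example.com/x.png" alt="other">'
);

print "$out\n";
```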

HTML-Restrict (1.0.4) *****

While I haven't had any direct need for it yet, this is exactly what I was looking for not too long ago. Personally, I think this is a much better way of accepting formatted user input, for example in a comment form on a website. HTML is much more flexible than BBCode, which is why I much prefer it.

Good distribution. I will certainly be using this in the future.

HTML-Restrict (1.0.4) ***

Nice module. I used it to clean out the style elements from 100+ HTML files and it worked like a charm: just add rules in %rules and you're done :)
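
A rough sketch of what such a %rules hash might look like follows. The particular tags allowed and the sample input are assumptions, not the reviewer's actual rules. Tags and attributes not listed in the rules are dropped, and by default the module also discards the content of <style> and <script> elements.

```perl
use strict;
use warnings;
use HTML::Restrict;

# Illustrative rules: each key is an allowed tag, each value lists the
# attributes that tag may keep (an empty list means no attributes).
my %rules = (
    p => [],
    b => [],
    i => [],
    a => [qw( href )],
);

my $hr = HTML::Restrict->new( rules => \%rules );

my $clean = $hr->process(
    '<style>p { color: red }</style><p style="color:red">Hello <b>world</b></p>'
);

print "$clean\n";
```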