Of the HTML tag-restricting modules on CPAN, HTML::Restrict, along with HTML::Scrubber, is reliable and can be recommended. Like HTML::Scrubber, it is based on HTML::Parser, and its output is very similar to HTML::Scrubber's.
* It is significantly slower to load than HTML::Scrubber and about one-third slower to run. It is about five times slower than HTML::Strip.
* Both modules have a nearly identical bug related to the <br> tag:
* Like HTML::Scrubber, it doesn't convert tags into a reasonable whitespace equivalent, so text containing a <br> tag, like
I wandered lonely as a cloud<br>That floats on high o'er vales and hills,
will be converted into
I wandered lonely as a cloudThat floats on high o'er vales and hills,
HTML::Strip doesn't have this bug, but it has another one related to adding whitespace where it isn't necessary.
* Unlike HTML::Scrubber, it doesn't turn > and < into HTML entities, which means it's more useful for converting HTML into text.
In the end, this was the module I chose to use, because:
* I can easily preprocess around the <br> issue;
* Failing to process Unicode is a huge nuisance for me, so HTML::Laundry is out;
* Pre- and post-processing of the HTML to remove duplicate text takes far longer than the running time of HTML::Restrict, and I'm not doing it in response to user input but to make static text versions of HTML pages for searching within them, so the speed of HTML::Strip gains me little;
* Not having to remove the HTML entities from the output text as I would do with HTML::Scrubber is a minor convenience which tips the balance in favour of HTML::Restrict.
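The <br> preprocessing mentioned above could look something like the following. This is only a minimal sketch, not the reviewer's actual code: `br_to_newline` is a hypothetical helper name, and the choice of a newline as the replacement is an assumption. HTML::Restrict is loaded only if it happens to be installed, so the preprocessing step can be tried on its own.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Restrict): replace every <br>
# variant (<br>, <BR>, <br/>, <br />) with a newline, so that the line
# break survives once the tags are stripped.
sub br_to_newline {
    my ($html) = @_;
    $html =~ s{<br\b[^>]*>}{\n}gi;
    return $html;
}

my $html = "I wandered lonely as a cloud<br>"
         . "That floats on high o'er vales and hills,";
my $prepared = br_to_newline($html);

# Strip any remaining tags with HTML::Restrict when it is available;
# with no rules given, its default is to remove all tags.
if ( eval { require HTML::Restrict; 1 } ) {
    my $hr = HTML::Restrict->new;
    print $hr->process($prepared), "\n";
}
else {
    print $prepared, "\n";
}
```

The `\b` after `br` keeps the pattern from touching other tags whose names merely start with "br".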
For a list of similar modules and links to other reviews, please see my page at www.lemoda.net/perl/html-cleanup-modu...
GREAT module. Exactly what I was looking for. It's so simple to use and does exactly what it says. And the new feature to perform regex on attributes is awesome!
While I haven't had any direct need for it yet, this is exactly what I was looking for not too long ago. I, personally, think this is a much better way of accepting formatted user input, in a comment form on a website, for example.
Much more flexible than BBCode, which is why I much prefer HTML.
Good distribution, I will certainly be using this in the future.
Nice module. I used it to clean out the style elements from 100+ HTML files and it worked like a charm: just add rules in %rules and you're done :)
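For readers unfamiliar with the rules mentioned in the reviews above, here is a rough sketch of what a %rules hash might look like. The particular tags, attributes, and the `/images/` path are illustrative assumptions, not taken from any reviewer's code. Each key is a permitted tag, and its value lists the attributes allowed on that tag; a hash reference inside the list maps an attribute to a regex its value must match.

```perl
use strict;
use warnings;

# Illustrative whitelist (assumed for this example):
# tag name => list of permitted attributes.
my %rules = (
    p   => [],                       # allow <p>, with no attributes
    b   => [],                       # allow <b>, with no attributes
    a   => ['href'],                 # allow <a href="...">
    img => [
        'alt',
        { src => qr{^/images/} },    # src must match this regex
    ],
);

my $html = '<style>p { color: red }</style>'
         . '<p style="color: red">Hello <b>world</b></p>';

# Run the example only when HTML::Restrict is installed.
if ( eval { require HTML::Restrict; 1 } ) {
    my $hr = HTML::Restrict->new( rules => \%rules );
    print $hr->process($html), "\n";
}
else {
    warn "HTML::Restrict is not installed; skipping the demonstration\n";
}
```

Tags and attributes not named in the rules, such as the style attribute here, are the ones that get removed.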