Lingua::StopWords provides lists, for several languages, of short words such as "to" or "and" which should be ignored in searches. I installed it for use in a web log parser, to remove uninteresting words from its list of search keywords. As far as I can see it works pretty well. Some words it doesn't eliminate include "us" and "can", but since these could mean "United States" or "can of Coca-Cola", perhaps they are borderline cases.
The module's documentation gives an example that uses "grep" to remove the stopwords from a list, which I copied into my program. Although this is very simple, it would be preferable if the module provided it as a function of its own.
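For anyone curious, the grep approach looks roughly like the sketch below. getStopWords('en') is the module's documented call and returns a hash reference keyed by stopword. The remove_stopwords wrapper is my own hypothetical helper, not something the module offers, and lowercasing each word before the lookup is my assumption about what suits mixed-case search keywords.

```perl
use strict;
use warnings;
use Lingua::StopWords qw( getStopWords );

# Hypothetical helper, NOT part of Lingua::StopWords: wraps the
# documented grep idiom so it can be reused like a function.
sub remove_stopwords {
    my ($stopwords, @words) = @_;
    # Keep a word only if its lowercased form is not a key in the
    # stopword hash (the lc is an assumption for mixed-case input).
    return grep { !$stopwords->{ lc $_ } } @words;
}

# getStopWords returns a hash reference whose keys are the stopwords.
my $stopwords = getStopWords('en');

# Example search keywords as they might come out of a web log.
my @keywords = remove_stopwords( $stopwords, qw( how to install perl and apache ) );
print "@keywords\n";
```

Passing the hash reference in as an argument means the same helper should work with any of the other language lists, although I have only tried it with English.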
Disclaimer: I have only used the English words part of this module. It provides lists for a lot of other languages but I didn't use them, so please consider this a review of the English-language parts only.