Algorithm-AhoCorasick reviews

Algorithm-AhoCorasick (0.03)

Compared to "list2re" from Data::Munge or my own "make_regex" from Convert::Moji, this module is quite slow. My example code searched the complete works of Shakespeare for 1000 random dictionary words, once with each module.

Here are two typical output runs:

Data::Munge::list2re time: 0.198212146759033 #matches: 3844
Algorithm::AhoCorasick time: 29.4100480079651 #matches: 3866

Data::Munge::list2re time: 0.205715894699097 #matches: 4799
Algorithm::AhoCorasick time: 29.3638379573822 #matches: 4800

Algorithm::AhoCorasick finds a few extra substring matches, so if you absolutely need every possible match, it may be worth using. For almost every normal use case, though, something like Data::Munge::list2re will do the same job in about 1/100 of the total time.
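
One likely reason for the extra matches is overlap: a regex scan with /g resumes after the end of each match, so keywords that overlap an earlier match are never reported. The sketch below illustrates this using only core Perl. The keywords, the text, and the hand-built alternation (a stand-in for a list2re-style pattern, not Data::Munge's actual output) are all assumptions chosen for illustration.

```perl
use strict;
use warnings;

my @keywords = qw(he she hers);   # illustrative keywords
my $text     = 'ushers';          # illustrative text

# Hand-built alternation, longest keyword first (stand-in for a list2re-style regex).
my $alt = join '|', map { quotemeta($_) } sort { length $b <=> length $a } @keywords;
my $re  = qr/$alt/;

# A /g scan continues after the end of each match, so overlapping keywords are skipped.
my @regex_hits;
while ($text =~ /($re)/g) {
    push @regex_hits, $1;
}

# Exhaustive scan: report every keyword at every start position, overlaps included.
my @all_hits;
for my $kw (@keywords) {
    my $pos = 0;
    while (($pos = index($text, $kw, $pos)) >= 0) {
        push @all_hits, "$kw at $pos";
        $pos++;
    }
}

printf "regex /g: %d match(es); exhaustive: %d match(es)\n",
    scalar @regex_hits, scalar @all_hits;
```

Here the regex scan reports only "she", while the exhaustive scan also reports "he" and "hers", which both start inside it.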

Note also that the list2re search gets much faster the second time around.

Algorithm-AhoCorasick (0.02) *****

Nice product and very easy to use.

However, it can cause memory problems if you need to call the find_all function many times. In my case I need to run find_all about 800 times against a few keyword lists of 300-8000 members.

We need to be able to cache the result of
my $m = Algorithm::AhoCorasick::SearchMachine->new(@xxxxx);
so that we don't need to regenerate the internal structures of the search machine every time.
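
A minimal sketch of such caching, assuming the search machine's feed method takes the text and a callback that receives (position, keyword) and keeps searching while the callback returns false, as I understand the module's documentation. The helper names machine_for and matches_in and the %cache hash are hypothetical and not part of the module.

```perl
use strict;
use warnings;
use 5.010;    # for the //= operator

my %cache;    # hypothetical cache: list name => search machine

# Build the machine for a named keyword list once; later calls reuse it.
sub machine_for {
    my ($name, $keywords) = @_;
    return $cache{$name} //= do {
        require Algorithm::AhoCorasick::SearchMachine;
        Algorithm::AhoCorasick::SearchMachine->new(@$keywords);
    };
}

# Collect [position, keyword] pairs from one text with an existing machine.
sub matches_in {
    my ($machine, $text) = @_;
    my @found;
    $machine->feed($text, sub {
        my ($pos, $keyword) = @_;
        push @found, [$pos, $keyword];
        return;    # false value: keep searching
    });
    return \@found;
}
```

With this, a loop over 800 texts would call machine_for('mylist', \@xxxxx) and then matches_in on each text, so the internal structures are built once per keyword list rather than once per search.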

Good job.