File-Find-Duplicates reviews

RSS | Module Info

File-Find-Duplicates (1.00) ****

A valuable resource; my thanks to the author. As a bonus, all empty files are listed as duplicates of one another, provided there are at least two of them. So it may be worth creating an empty file beforehand to flush out a lone one.

Windows people will sometimes have a backslash at the end of a path. In a single-quoted Perl string a trailing backslash escapes the closing quote, so they will need to double it up

my @dupes = find_duplicate_files('H:\\');

or be presented with a puzzling error.

In a line like

my @dupes = find_duplicate_files('/basedir1', '/basedir2');

one of the directories might not exist, perhaps because of a typo. The module gives no warning; the missing directory is silently ignored. Maybe this is something that could be looked at.
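Until then, a small defensive wrapper works around it. This sketch assumes find_duplicate_files is imported from File::Find::Duplicates as in the snippets above; the up-front check is my own addition, not part of the module:

    use strict;
    use warnings;
    use File::Find::Duplicates;

    # Fail loudly on a mistyped path instead of silently searching nothing.
    my @dirs = ('/basedir1', '/basedir2');
    -d $_ or die "Not a directory: $_\n" for @dirs;

    my @dupes = find_duplicate_files(@dirs);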

File-Find-Duplicates (1.00) ****

This module is a tremendous help. I've collected a lot of image and video files over the years, and this module made it a snap to compare thousands of files across multiple directories and identify the duplicates. Big thumbs up to the author!

File-Find-Duplicates (1.00) ****

My original message follows, because deleting it would just be dishonest.

I was looking for something like this.

Find identical files under some directories.
Needless to say I was very happy to find this module on CPAN, so I immediately tested it.

It took forever compared with a tiny script I had written, yet produced an identical result and no additional information.

Turns out, the author calculates the digest of all files in the directories. In most cases that is just plain unnecessary and a waste of resources.

In mid-September I wrote an email to the author saying that I thought his program could be improved, suggesting a possible way to improve it, and asking whether he was planning to actually change the implementation.

Three months later I still haven't received a response.

In real-life scenarios this module is simply inadequate; you are better off writing your own implementation, unless you don't mind waiting several minutes for the same result you can get in about 10 seconds. (This happened to me.)


After having posted this, reading the author's answer on this site, retesting, and reading the source code once more, I must admit I was incorrect about the module's implementation. It does indeed calculate MD5 digests only for files whose size is shared by more than one file.

The reason it took so long with this module turns out to have nothing to do with the module itself; it was a particular problem with my machine at the time I tested it.

I have updated my rating accordingly and apologise for any wrongdoing.

However, I maintain that I never received an answer to my email. True, I didn't use rt.cpan; I emailed the author directly.

Given that I don't have any spam filters or anything of that kind, I must assume an answer was never sent. If it was (and I can't know for sure either way), I never received it.

File-Find-Duplicates (1.00)

The previous rating is misleading and incorrect.

The module does not calculate an MD5 for every file, as that would indeed be hugely wasteful. It does a first pass in which it discards every file whose size is unique. Only the files that remain are checksummed for a more detailed comparison. (I am certainly open to any patches that can make that process even quicker.)
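The two-pass strategy described above, group by size first and digest only the collisions, can be sketched roughly like this. The code is my own illustration of the technique, not the module's actual internals:

    use strict;
    use warnings;
    use File::Find;
    use Digest::MD5;

    # Pass 1: bucket every file under the given directories by size.
    my %by_size;
    find(sub { push @{ $by_size{-s $_} }, $File::Find::name if -f }, @ARGV);

    # Pass 2: only files sharing a size are actually read and checksummed.
    my %by_md5;
    for my $files (grep { @$_ > 1 } values %by_size) {
        for my $file (@$files) {
            open my $fh, '<', $file or next;
            binmode $fh;
            push @{ $by_md5{ Digest::MD5->new->addfile($fh)->hexdigest } }, $file;
        }
    }

    # Any digest bucket holding more than one file is a set of duplicates.
    print join("\n", @$_), "\n\n" for grep { @$_ > 1 } values %by_md5;

Since most files in a tree have a unique size, the expensive read-and-hash step usually touches only a small fraction of them.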

This was actually pointed out in my response to Cláudio's email, but as he didn't send his email via rt.cpan there is no public record of that.