File-DirWalk reviews

File-DirWalk (0.3) **

I recently had to solve a directory recursion problem, and naturally I first searched CPAN to find a directory walking module that would allow me to avoid writing my own scaffolding code. I found File::DirWalk which seemed to be just what I needed, but unfortunately, I found out that appearances can be deceiving. Here are some of the caveats that the documentation does not discuss but that may limit the usefulness of this module a great deal.

First, the interface is simple, which is not necessarily bad, but in this case it's just too simple. The callbacks you specify are invoked with exactly one parameter, which is the full name of the file or directory in question. You typically want to have this name, but very often, you want just the filename relative to its enclosing directory, and when using File::DirWalk, the simplest way to get this name is to use a regex to extract it from the parameter. This is ironic because File::DirWalk has just assembled the full name from a list of components, and now you need to undo this operation yourself. This is inefficient, inelegant and most certainly inconvenient. The same goes for other parameters like the nesting depth. You have to use regexes or keep track of it yourself.
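As an alternative to a hand-written regex, the leaf name and the nesting depth can be recovered from the full path with core modules. This is only a sketch: the helper name_and_depth and the example paths are made up for illustration and are not part of File::DirWalk, and the depth shown assumes a Unix-like system with absolute paths.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper: given the root of the walk and the full path that a
# callback receives, return the leaf name and the nesting depth below the root.
sub name_and_depth {
    my ($root, $full_path) = @_;
    my $name  = basename($full_path);                   # last path component
    my $rel   = File::Spec->abs2rel($full_path, $root); # path relative to root
    my @parts = File::Spec->splitdir($rel);             # one element per level
    return ($name, scalar @parts);
}

my ($name, $depth) = name_and_depth('/tmp/tree', '/tmp/tree/a/b/file.txt');
print "$name at depth $depth\n";
```

It still undoes work the module has already done, which is the point of the complaint, but it avoids fragile pattern matching on path separators.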

Second, there is the self-contradicting documentation. The description of all callbacks says "Function must return true.", but here is an excerpt of a paragraph that follows later: "The module provides the following constants: SUCCESS, FAILED, ABORTED and PRUNE (1, 0, -1, -10) which you can use within your callback code.".

So instead of "true" (a non-zero value?), success is indicated by returning SUCCESS, and you may also return one of the other constants, flat-out contradicting the claim that the callback functions must return true (i.e. SUCCESS).

Third, this module is unsuitable for even relatively simple use cases. I quickly learned that if you don't want to process particular files, you still have to return SUCCESS to make File::DirWalk continue walking the tree. This is not particularly elegant, but things could be worse. If you want to skip individual directories, however, then prepare for unpleasant surprises. The documentation does not even attempt to explain the difference between the constants SUCCESS, FAILED, ABORTED and PRUNE, except that PRUNE will skip out of the current directory. Here's the catch: There is *no simple way* to skip an individual directory. If you bother to read the only reliable documentation, the source code, you will see that the only constants that it actually cares to check are SUCCESS and PRUNE. But if you return PRUNE from onDirEnter, File::DirWalk will not only skip this directory, but also *all other children of its parent directory* that were not already processed. And if you return FAILED, ABORTED or the number of inhabitants of the autonomous republic of Tuwa, it will quit walking the entire tree altogether! How would you actually skip just that directory? Here's the only way I can think of:

1. In onDirEnter, set a flag "$skip" (shared with the other handlers) and return SUCCESS.
2. In onFile, test whether the flag is set and, if so, return PRUNE.
3. In onDirEnter, do the same test for nested directories: if the flag is already set, return PRUNE.
4. In onDirLeave, reset the flag.

This is a horrible workaround, but it's necessary even for this simple use case. This means the interface is broken, not unfixable, but still broken.

All in all, I cannot recommend using this module, although I don't know much about possible alternatives (e.g. File::Find which has been mentioned by other reviewers). The range of use cases conveniently covered by this module is simply too narrow.
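For comparison, here is a minimal sketch of skipping a single directory with File::Find, using its documented $File::Find::prune variable. The temporary tree and the directory names keep, skip and later are made up for the illustration, and this has not been checked against every Perl version.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Build a throwaway tree: keep/, skip/ and later/, each holding one file.
my $root = tempdir(CLEANUP => 1);
for my $dir (qw(keep skip later)) {
    make_path("$root/$dir");
    open my $fh, '>', "$root/$dir/file.txt" or die "Cannot create file: $!";
    close $fh;
}

my @seen;
find(sub {
    # $_ is the leaf name, $File::Find::name is the full path.
    if (-d $_ && $_ eq 'skip') {
        $File::Find::prune = 1;   # do not descend into this one directory
        return;
    }
    push @seen, $File::Find::name if -f $_;
}, $root);

print "$_\n" for sort @seen;
```

Only the contents of skip/ are left out; the sibling directories keep/ and later/ are still visited, and no flag has to be shared between handlers.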

File-DirWalk (0.2) **

In addition to the issues in Robert’s review, it’s also missing any useful tests.

Note that any remotely recent version of File::Find actually does offer extra hooks for entering and leaving directories etc. The bottom line is that using this module will add a dependency to your code without providing anything not otherwise doable – not even a better interface.
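A rough sketch of those hooks, assuming the preprocess and postprocess options described in the File::Find documentation (both are ignored when symlink following is enabled). The temporary tree a/b is made up for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Throwaway tree with two nested directories.
my $root = tempdir(CLEANUP => 1);
make_path("$root/a/b");

my @events;
find({
    wanted      => sub { },   # per-entry callback, not needed here
    preprocess  => sub {
        # Called once per directory, after its entries have been read.
        push @events, "enter $File::Find::dir";
        return @_;            # pass the directory listing through unchanged
    },
    postprocess => sub {
        # Called just before File::Find leaves the directory.
        push @events, "leave $File::Find::dir";
    },
}, $root);

print "$_\n" for @events;
```

Because preprocess receives the directory listing and returns the list to be processed, the same hook can also filter or reorder entries.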

A quick glance reveals that it at least doesn’t run afoul of the classic directory recursing trap, looping infinitely in circularly symlinked trees.

File-DirWalk (0.1) **

How is this better than File::Find (which is a core module)? Maybe in that it gives various callbacks for entering/leaving a directory, but otherwise it has fewer features. There's no way to control whether to process files or subdirectories first, nor does it provide for filtering etc.
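To illustrate the kind of control meant here, this is a small sketch using File::Find's bydepth and preprocess options. The tree and the rule "ignore *.bak files" are invented for the example.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);
use File::Path qw(make_path);

# Throwaway tree: d/ contains one regular file and one backup file.
my $root = tempdir(CLEANUP => 1);
make_path("$root/d");
for my $name (qw(f.txt f.bak)) {
    open my $fh, '>', "$root/d/$name" or die "Cannot create file: $!";
    close $fh;
}

my @seen;
find({
    bydepth    => 1,                                # report a directory after its contents
    preprocess => sub { grep { !/\.bak\z/ } @_ },   # drop backup files before they are visited
    wanted     => sub { push @seen, $File::Find::name },
}, $root);

print "$_\n" for @seen;
```

With bydepth set, d/f.txt is reported before d itself, and the filtered f.bak never reaches the wanted callback.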

It uses global symbols in opendir(DIR, ...) instead of scoped variables, and does not use File::Spec for portable filename handling, so I'd be leery about using this in production code.
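For reference, a minimal sketch of the scoped, portable style, using a lexical directory handle and File::Spec. The helper list_dir and the file names are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: return the entries of one directory as full paths.
sub list_dir {
    my ($dir) = @_;
    # A lexical handle is private to this call, so a recursive call cannot
    # reopen it the way it would a shared global DIR handle.
    opendir(my $dh, $dir) or die "Cannot open $dir: $!";
    my @names = File::Spec->no_upwards(readdir $dh);   # drop '.' and '..'
    closedir $dh;
    # catfile joins components with the separator of the current platform.
    return map { File::Spec->catfile($dir, $_) } sort @names;
}

# Throwaway directory with two files to list.
my $root = tempdir(CLEANUP => 1);
for my $name (qw(b.txt a.txt)) {
    open my $fh, '>', File::Spec->catfile($root, $name)
        or die "Cannot create file: $!";
    close $fh;
}

my @paths = list_dir($root);
print "$_\n" for @paths;
```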