File-Type reviews


File-Type (0.22)

This one more or less works for the basic cases, as long as you only need file types that were common up to 2004. Some interesting failures: it labelled an old-format Excel file as an msword file and a pure ASCII file as "application/octet-stream", and it likes adding "x-" in front of things, so you get "image/x-png" for PNG images or "image/x-bmp" for BMP images. Interesting successes: unlike File::MMagic, it managed to spot non-ASCII bytes in a text file, and it managed to distinguish an executable from a stream of binary data. It has no idea about XML or SVG, and it puts anything remotely Unicode into "application/octet-stream".

For a full comparison including source code and results on various files, see

File-Type (0.22) *

Could not correctly determine the MIME type of a file that the 'file' command could. Switched to using File::LibMagic.

File-Type (0.22) *

As another reviewer has said, this module tends to conclude "application/octet-stream" for many kinds of files, making it not very useful.

The currently recommended module in this area seems to be File::LibMagic. Other alternatives include File::MMagic (slow, has quite a few bugs, no longer maintained), File::MMagic::XS (apparently also not actively maintained; long-standing bugs, such as failure to parse the system magic file, still persist), and Media::Type::Simple (only maps MIME types to and from file extensions).

File::Type / File-Type (0.22) **

Before I begin, let me note that I use this module myself and in functionality terms it should probably score 4 out of 5.

As a campaigner against module bloat I find this module rather offensive. The author has taken 128 magic entries (compared to nearly 5000 in /etc/magic), converted them to rather verbose Perl, and keeps all of it loaded for the entire time the program is running.

The result? 1.5 meg of RAM to check 128 file types. I would imagine this would mean that to check to the same depth as /etc/magic you would need 50 meg of RAM in order to load it all.
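The extrapolation above can be checked with a few lines of arithmetic. This is only a sketch, and it assumes memory grows linearly with the number of magic entries, which the review does not establish. The variable names are made up for illustration; the figures are the ones quoted in the review.

```python
# Figures quoted in the review.
measured_mb = 1.5        # RAM used by the module
entries_loaded = 128     # magic entries the module ships with
entries_system = 5000    # approximate entries in /etc/magic

# Assumption: memory scales linearly with the number of entries.
per_entry_kb = measured_mb * 1024 / entries_loaded
projected_mb = measured_mb * entries_system / entries_loaded

print(f"{per_entry_kb:.1f} KB per entry, about {projected_mb:.1f} MB for {entries_system} entries")
```

Under that assumption each entry costs about 12 KB, and the full database would need somewhat under 60 MB, which is in line with the rough 50 meg estimate.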

In addition, the tests are in no particular order. I'm not sure whether or not order is important in the source database, but I imagine it would get quite slow when you need to get to entry 50 or 100 just to detect a common file like a .gif.

What the author perhaps needs to do is start with inline code for the 10 or 20 most common file types (gif, jpg, mp3, Word doc, etc.) and then load the database and process it in interpreted form. To help with the load issues, he could compile the code for any file type found and add it to the list of inline tests.
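A minimal sketch of that two-tier design, written in Python only for illustration and not taken from File::Type or any other module. The names `INLINE_TESTS`, `MAGIC_DB`, `compile_rule` and `detect` are hypothetical, and the tiny in-memory list stands in for a magic database that would really be loaded from disk.

```python
from typing import Callable, List, Tuple

# Tier 1: hand-written checks for a few very common types (hypothetical selection).
INLINE_TESTS: List[Tuple[str, Callable[[bytes], bool]]] = [
    ("image/gif", lambda d: d.startswith((b"GIF87a", b"GIF89a"))),
    ("image/jpeg", lambda d: d.startswith(b"\xff\xd8\xff")),
    ("image/png", lambda d: d.startswith(b"\x89PNG\r\n\x1a\n")),
]

# Tier 2: stand-in for the interpreted database: (offset, magic bytes, MIME type).
MAGIC_DB: List[Tuple[int, bytes, str]] = [
    (0, b"%PDF-", "application/pdf"),
    (0, b"PK\x03\x04", "application/zip"),
    (257, b"ustar", "application/x-tar"),
]


def compile_rule(offset: int, magic: bytes) -> Callable[[bytes], bool]:
    """Turn one database entry into a ready-to-call test."""
    end = offset + len(magic)
    return lambda data: data[offset:end] == magic


def detect(data: bytes) -> str:
    # Fast path: try the inline tests first.
    for mime, test in INLINE_TESTS:
        if test(data):
            return mime
    # Slow path: interpret the database entry by entry.
    for i, (offset, magic, mime) in enumerate(MAGIC_DB):
        if data[offset:offset + len(magic)] == magic:
            # Promote the rule so the next file of this type hits the fast path.
            INLINE_TESTS.append((mime, compile_rule(offset, magic)))
            del MAGIC_DB[i]
            return mime
    return "application/octet-stream"
```

Calling `detect(b"%PDF-1.4\n")` the first time walks the database; after that the PDF rule sits in `INLINE_TESTS`, so a program that sees only a handful of file types does most of its checks in the fast path.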

So, a good idea, executed with no respect for system resources, and the score is given on that basis.