Once again, this is a review of "Encode::Guess", not "Encode" (cpanratings needs work).
Encode::Guess is one scary module... in my first trial with it, I asked it to distinguish between latin-1 and utf-8, and it guessed that the latin-1 file was utf8, even though I'd intentionally put some octets in there that wouldn't be valid as utf8. Peeking at it internally, I see it's using the Encode::is_utf8 function, which the perl porters swear up and down is not at all a reliable way of identifying utf8 data (it's apparently very badly named).
What it's actually trying to do is a little difficult to understand from the documentation, and I suspect you need to know a fair amount about character encodings for it to make sense at all (e.g. there is no way for it to tell the difference between many encodings, and it's your problem to avoid giving it an ambiguous choice in the list of suspects).
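The ambiguity problem can be seen with a small sketch. The sample strings are made up for illustration, and the exact wording of the failure message is not something to rely on. Every byte sequence is valid ISO-8859-1, so listing it as a suspect next to the default utf8 suspect only gives a single answer when the data is not valid UTF-8:

```perl
use strict;
use warnings;
use Encode::Guess;

# Sample octets (assumed for illustration).
my $latin1_only = "caf\xE9 au lait";      # 0xE9 followed by a space is not valid UTF-8
my $valid_utf8  = "caf\xC3\xA9 au lait";  # the same text as UTF-8 octets

for my $octets ($latin1_only, $valid_utf8) {
    # ascii and utf8 are suspects by default; iso-8859-1 is added here.
    my $enc = guess_encoding($octets, 'iso-8859-1');

    # A reference means one suspect survived; a plain string means
    # the guess failed or more than one suspect still fits.
    my $msg = ref($enc)
        ? "guessed: " . $enc->name
        : "no single answer: $enc";
    print "$msg\n";
}
```

The second string decodes cleanly under both suspects, so the module has no basis for choosing and hands back a message instead of an encoding object.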
I maintain that Perl could not have spread so widely in Asian countries without Encode. This module was originally written by the late Nick Ing-Simmons, but many authors have since contributed to it. I offer my condolences on the passing of Nick Ing-Simmons, and I, as a user, am indebted to Kogai Dan-san, representing the authors, for their endeavours.
This is a review of "Encode::Guess", not "Encode" (blame cpanratings for this).
Encode::Guess is a Perl module which guesses the kind of character encoding of some data. It's mostly intended for distinguishing between the dozens of confusing encoding systems used in Japanese and Chinese and other such languages, and doesn't make many promises about being able to guess the encodings of European character sets.
My main interest is in Japanese encodings, and I used Encode::Guess successfully to decode some files written in either CP932 (Shift JIS) or UTF-8.
Although the module worked, and did its job well, I still have to say that this implementation is counterintuitive. First of all, the documentation is very bizarre. The very first thing it shows you, the example in the "Synopsis" section that uses Encode::decode ("Guess",...), is horrible, and you will be miserable if you try to use it. I guess most people will do the same thing as me and start with the example in the synopsis before they read the whole of the documentation. It was only after struggling with the decode ("Guess",...) method that I came across another way to access this module's functions, buried down in the middle of the documentation. This is to make a $decoder object using Encode::Guess->guess.
The interface is idiosyncratic, but this worked very much more smoothly than using decode ("Guess",...). For example, decode ("Guess",...) just dies if it encounters an error, and it seems to be quite buggy. The object method doesn't die on any kind of error, but just returns a string (instead of a ref) if it fails. [So you have to check whether you got a string or a ref back to find out whether it failed, which is odd, but never mind.] Using the decoder turned out to accomplish what I had wanted to do very well.
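A minimal sketch of the object-style usage described above, with made-up sample data. It relies only on the default suspects and on the string-or-reference return convention:

```perl
use strict;
use warnings;
use Encode;
use Encode::Guess;

# Sample data (assumed for illustration): three Japanese characters
# encoded as UTF-8 octets.
my $octets = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E";

# guess() returns an encoding object on success and a plain string
# (an error message) on failure, so ref() tells the two apart.
my $decoder = Encode::Guess->guess($octets);

if (ref $decoder) {
    my $text = $decoder->decode($octets);
    printf "guessed %s, %d characters\n", $decoder->name, length $text;
}
else {
    warn "could not guess: $decoder\n";
}
```

Because nothing dies here, the caller decides what a failed guess means, which is the main practical difference from decode ("Guess",...).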
As a suggestion for improvement, one thing I think a lot of people would find useful is a way to ask the module to guess the encoding of a file without my having to read the file, catch errors, close the file again, and then decode the contents. A simple routine which takes a file name as an argument and returns a string to send to "encoding", as in
my $XYZ = tell_me_the_file_encoding_of($weirdfile);
die "Unknown encoding" if !$XYZ;
open my $input, "<:encoding($XYZ)", $weirdfile or die $!;
would be handy.
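Something along those lines can be sketched on top of guess_encoding. The helper below is hypothetical: it is not part of Encode::Guess, its name is simply the one used above, and the optional suspects argument is an assumption. It reads the whole file into memory, which may not suit very large files.

```perl
use strict;
use warnings;
use Encode::Guess;

# Hypothetical helper, not part of Encode::Guess.
# Returns an encoding name, or undef if the file cannot be read
# or the guess fails or is ambiguous.
sub tell_me_the_file_encoding_of {
    my ($file, @suspects) = @_;

    open my $fh, '<:raw', $file or return undef;
    my $octets = do { local $/; <$fh> };    # slurp the raw bytes
    close $fh;

    # guess_encoding returns an object on success, a string on failure.
    my $enc = guess_encoding($octets, @suspects);
    return ref($enc) ? $enc->name : undef;
}
```

The returned name can then be interpolated into the "<:encoding(...)" layer as in the lines above, with extra suspects such as 'cp932' passed after the file name when the defaults are not enough.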
In general this is a highly useful module, but I can't give five stars since the interface and documentation are fairly iffy, and the decode ("Guess",...) thing even seems quite buggy.
I use this module to convert documents from utf8 to a user defined encoding.
Installation: I could not find it on YAST, so I used 'make' and so on. It is clearly a big module, because it generated a lot of files, but it worked flawlessly.
Usage: simple and straightforward. With 3 lines of code I had all I needed:
my @list = Encode->encodings(':all');
Encode::from_to($string, 'utf8', $selected_encoding);
The @list contains all available encodings which I feed into a selection list. The from_to function does the transformation. I checked it with German, Czech, Cyrillic and Hebrew characters and it all worked fine.
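For reference, a self-contained sketch of that usage. The sample string and the choice of 'iso-8859-1' are assumptions for illustration; in the real program the encoding would come from the selection list.

```perl
use strict;
use warnings;
use Encode;

# Names of every encoding Encode knows about, for the selection list.
my @list = Encode->encodings(':all');

# Sample input (assumed): "Grüße" as UTF-8 octets.
my $string            = "Gr\xC3\xBC\xC3\x9Fe";
my $selected_encoding = 'iso-8859-1';

# from_to() converts the octets in place and returns the new length
# in octets, or undef on failure.
my $length = Encode::from_to($string, 'utf8', $selected_encoding);
defined $length or die "conversion to $selected_encoding failed";

printf "%d encodings available, converted string is %d octets\n",
    scalar @list, $length;
```

Note that from_to changes $string itself rather than returning the converted text, and that with the default settings characters missing from the target encoding are replaced with a substitution character rather than causing an error.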
This module works very well. Though I am deeply mystified by Unicode, charsets, and so on (isn't everyone?), this module comes through for me.