Unicode-Unihan reviews

Unicode-Unihan (0.04)

As it stands, this module is not very useful.

If you have a copy of Unihan.txt, you can do the following to get the same results as the example given in this module, assuming $c contains the character you want to look up:

system ("grep '^U+" . sprintf ("%X", ord ($c)) . "\t' Unihan.txt | grep Mandarin");

So, since it's possible to replace the only function of this module with "grep", I'm not really sure what the point of the module is. Perhaps Unicode::Unihan has a speed advantage. But that is the only advantage that I can imagine.

Also, the AUTOLOAD-based interface is not very practical. If I want to look up the kMandarin key of the Unihan database, I have to access it via a named method like

$uh->Mandarin ($c)

That makes it messy to access an arbitrary lookup key at run time: unless I want to run around eval'ing, the lookup keys have to be hard-coded at compile time. That in turn makes it hard to use this module as the backend for a dictionary front end where the user picks the key. It would be much cleaner to allow access via something like

$uh->lookup ('Mandarin', $c)

Using AUTOLOAD to create a method to look up the key is an example of someone doing something which is "too clever" and not actually useful.
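For what it's worth, Perl will call a method whose name is held in a scalar, so a thin wrapper could provide the lookup-style call without eval. The following is only a sketch of that idea: My::FakeUnihan is a stand-in class I made up to imitate an AUTOLOAD interface, lookup() is a hypothetical helper, and the stored value is a placeholder. I have not checked how this behaves against Unicode::Unihan itself.

```perl
use strict;
use warnings;

# Stand-in for an AUTOLOAD-based interface (hypothetical; not Unicode::Unihan).
package My::FakeUnihan;

our $AUTOLOAD;

# Placeholder data: key name => { character => value }.
my %DATA = (Mandarin => { 'X' => 'PLACEHOLDER1' });

sub new { return bless {}, shift }

sub AUTOLOAD {
    my ($self, $char) = @_;
    (my $key = $AUTOLOAD) =~ s/.*:://;   # strip the package name
    return if $key eq 'DESTROY';         # ignore object destruction
    return $DATA{$key}{$char};           # undef if key or character is unknown
}

package main;

# lookup(): hypothetical wrapper taking the key as an ordinary string.
sub lookup {
    my ($uh, $key, $char) = @_;
    return $uh->$key($char);   # method name resolved at run time, no eval
}

my $uh = My::FakeUnihan->new;
print lookup($uh, 'Mandarin', 'X'), "\n";
```

With something like this, the key can come straight from user input, although a real front end would want to validate it against the list of known keys first.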

Another deficiency of the module is that it doesn't have a method to grab all the keys associated with a particular character, like

my $hash_ref_containing_everything = $uh->all_info ($c)

That seems like obvious functionality, and it's the first thing I'd implement if I were writing this module. We should also be able to go from a value, like the Mandarin pronunciation above, back to the characters that have it, so I can write

my @list_of_chars = $uh->lookup_values ('Mandarin', 'TUAN');

not just from a specific character to its specific value of one key.
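To make the two requests concrete, here is a rough sketch of how both directions could be served from one pass over data in the tab-separated layout of Unihan.txt (code point, key, value). The names all_info and lookup_values are the hypothetical ones used above, the sample records are made-up placeholders rather than real Unihan entries, and a real implementation would read the file instead of an in-line list.

```perl
use strict;
use warnings;

# Made-up records in the three-column, tab-separated layout of Unihan.txt.
# The values are placeholders, not real Unihan data.
my @lines = (
    "U+4E00\tkMandarin\tAAA1",
    "U+4E00\tkDefinition\tfirst placeholder",
    "U+4E8C\tkMandarin\tBBB2 AAA1",
);

my (%by_char, %by_value);
for my $line (@lines) {
    next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
    my ($cp, $key, $value) = split /\t/, $line, 3;
    $cp  =~ s/^U\+//;                         # "U+4E00" -> "4E00"
    $key =~ s/^k//;                           # "kMandarin" -> "Mandarin"
    my $char = chr hex $cp;
    $by_char{$char}{$key} = $value;
    # Index each space-separated token so multi-valued fields can be searched.
    push @{ $by_value{$key}{$_} }, $char for split ' ', $value;
}

# Everything known about one character, as a hash reference (key => value).
sub all_info {
    my ($char) = @_;
    return $by_char{$char};
}

# All characters whose value for $key contains the token $value.
sub lookup_values {
    my ($key, $value) = @_;
    return @{ $by_value{$key}{$value} || [] };
}

my $info = all_info("\x{4E00}");
print "$_ => $info->{$_}\n" for sort keys %$info;
printf "U+%04X\n", ord $_ for lookup_values('Mandarin', 'AAA1');
```

The reverse index costs memory, so a real module might build it lazily or only for the keys where a reverse lookup makes sense.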

Finally, for the sake of convenience, it would be nice to have a way to list all the keys in the database without having to go and look at Unicode's web page.
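A key listing could be as simple as collecting the second column of the data file. The sketch below assumes the same tab-separated layout; list_keys is a hypothetical name, and the sample text is made-up placeholder data read through an in-memory filehandle so the example runs without Unihan.txt.

```perl
use strict;
use warnings;

# list_keys(): hypothetical helper; expects an open filehandle on
# tab-separated "code point, key, value" lines and returns the sorted keys.
sub list_keys {
    my ($fh) = @_;
    my %seen;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip comments and blank lines
        my (undef, $key) = split /\t/, $line;
        $seen{$key}++ if defined $key;
    }
    my @keys = sort keys %seen;
    return @keys;
}

# Placeholder data standing in for the real file.
my $sample = "# comment line\n"
           . "U+4E00\tkMandarin\tAAA1\n"
           . "U+4E00\tkDefinition\tplaceholder\n"
           . "U+4E8C\tkMandarin\tBBB2\n";

open my $fh, '<', \$sample or die "cannot open in-memory sample: $!";
print "$_\n" for list_keys($fh);
close $fh;
```

To run it against the real file, open Unihan.txt in place of the in-memory string.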