I found a nice interface and great documentation here, but unfortunately I couldn't get even the most basic US address format to parse (with country_code set to 'US', as specified):
use Lingua::EN::AddressParse;
my $address = Lingua::EN::AddressParse->new();
my $error = $address->parse("1 17th Street, Denver, CO USA");
Instead of returning an error, as I would expect, this code emits several warnings and then crashes:
Use of uninitialized value $country_or_code in numeric eq (==) at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 537.
Use of uninitialized value $country_or_code in hash element at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 554.
Use of uninitialized value $country_or_code in concatenation (.) or string at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.
Invalid country name: chosen, names must be in title case at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.
Can't call method "country_code" on an undefined value at /usr/local/share/perl/5.18.2/Lingua/EN/AddressParse/Grammar.pm line 912.
It may be excellent for other use cases, but it fails at the kind of address parsing I need.
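Until the crash itself is fixed, a caller can at least stop one bad address from aborting a whole batch. The following is a minimal sketch, assuming only that the parser object has a `parse` method as in the example above: it wraps the call in `eval` so that a fatal error comes back as a string instead of killing the program. The helper name `safe_parse` is hypothetical and is not part of Lingua::EN::AddressParse.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::EN::AddressParse): call parse()
# on a parser object and turn a fatal error, such as the
# "Can't call method ... on an undefined value" crash, into a return value.
sub safe_parse {
    my ($parser, $text) = @_;
    my $result;
    # The "eval { ...; 1 }" idiom: $ok is true only if the block finished.
    my $ok = eval {
        $result = $parser->parse($text);
        1;
    };
    # On success pass through whatever parse() returned; on failure
    # report the exception text held in $@.
    return $ok ? $result : "parse died: $@";
}

# Usage, assuming $address was built as in the review above:
# my $error = safe_parse($address, "1 17th Street, Denver, CO USA");
```

With the Denver address above, this should return a string starting with `parse died:` instead of crashing (the warnings are still printed). Returning the exception as a string keeps the sketch short; a real batch job might prefer to return a success flag and the error separately.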
Very useful module. It is a bit slow to start because it uses Parse::RecDescent, and if you have a lot of addresses from a messy source, you will have to find another way to parse them (e.g. with Regexp::Assemble). You can report failing addresses to the author via RT so he can add them to his corpus.
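As a rough illustration of that kind of alternative, here is a minimal sketch in core Perl. It uses a single hand-written regex rather than Regexp::Assemble, and it handles only the simplest US shape (house number and street, city, two-letter state, ZIP or ZIP+4), returning undef for anything else. The `fallback_parse` name and its hash keys are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical fallback for one simple US shape only:
#   "<number> <street>, <city>, <ST> <zip>"
# Returns a hashref of the parts, or undef if the line doesn't fit.
sub fallback_parse {
    my ($line) = @_;
    return unless $line =~ m{
        ^ \s* (\d+[A-Za-z]?) \s+ ([^,]+?) \s* ,        # house number, street
        \s* ([^,]+?) \s* ,                             # city
        \s* ([A-Z]{2}) \s+ (\d{5}(?:-\d{4})?) \s* $    # state, ZIP or ZIP+4
    }x;
    return { number => $1, street => $2, city => $3, state => $4, zip => $5 };
}

# Usage:
# my $parts = fallback_parse("123 Maple Street, Anytown, ST 12345");
# print "$parts->{city}\n" if $parts;
```

A regex like this is fast and predictable, but it shares the limitation described elsewhere on this page: apartment numbers, post office boxes and other variations simply fall through.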
Parsing addresses like this is a hard problem, and given that, this module is an excellent resource. The 4 stars are really down to how hard the problem is.
Note: the author tells me that the module has been completely rewritten and many of the problems fixed. However, I am no longer working on a project that involves parsing addresses, so I cannot verify this.
Below are my previous comments on v1.11 (released in 2002). I do not know whether they still apply.
One major problem is that (for US addresses, anyway) it doesn't work unless addresses are very simply formatted:
123 Maple Street, Anytown, ST 12345
Add anything beyond that (apartment numbers, post office boxes, squares, cross streets, etc.) and it returns nothing.
Addresses often don't fit a simple pattern. (If they all matched that pattern, I wouldn't bother looking for a module...)
Worse: it uses Parse::RecDescent, so it takes a few seconds to parse a simple address.
If it worked for most addresses, I'd say the "interface" and "ease of use" were great. Alas...
One note: parsing addresses is a HARD problem.