Lingua-EN-AddressParse reviews


Lingua-EN-AddressParse (1.19) **

I found a nice interface and great documentation here, but unfortunately, I couldn't get the most basic of US address formats to parse (with country_code set to 'US', as specified):

use Lingua::EN::AddressParse;

my $address = new Lingua::EN::AddressParse();

my $error = $address->parse("1 17th Street, Denver, CO USA");

Instead of returning an error as I would expect, this code emits several warnings and then crashes:

Use of uninitialized value $country_or_code in numeric eq (==) at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 537.

Use of uninitialized value $country_or_code in hash element at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 554.

Use of uninitialized value $country_or_code in concatenation (.) or string at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Invalid country name: chosen, names must be in title case at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Can't call method "country_code" on an undefined value at /usr/local/share/perl/5.18.2/Lingua/EN/AddressParse/Grammar.pm line 912.

Perhaps it is excellent for other kinds of use cases, but it fails at the kind of address parsing that I need.

Lingua-EN-AddressParse (1.15) ****

Very useful module. A bit slow to start as it uses Parse::RecDescent, and if you have a lot of addresses from an unclean source then you will have to find an alternative way to parse them (e.g. with Regexp::Assemble). You can report failing addresses to the author via RT for his corpus.

Parsing addresses like this is a hard problem, and given that, this module is an excellent resource. 4 stars due to the hardness of the problem really.

Lingua-EN-AddressParse (1.14) ***

Note: I am told by the author that the module has been completely rewritten, and many of the problems fixed. However, I am no longer working on a project that involved parsing addresses, so I cannot verify this.

Below are my previous comments for v1.11 (which was released in 2002). I do not know if they still apply.

One major problem is that (for US addresses, anyway) it doesn't work unless addresses are very simply formatted:

123 Maple Street, Anytown, ST 12345

Given anything beyond that (apartment numbers, post boxes, squares and cross streets, etc.), it returns nothing.

Often addresses don't fit a simple pattern. (If they all matched that pattern, I wouldn't care about finding a module...)

Worse: it uses Parse::RecDescent, so it takes a few seconds to parse a simple address.

If it worked for most addresses, I'd say the "interface" and "ease of use" were great. Alas...

One note: parsing addresses is a HARD problem.