Lingua-EN-AddressParse reviews

cpanratings
 


Lingua-EN-AddressParse (1.20)

The review for version 1.19 is based on incorrect usage of the module.

The 'new' method is called without the argument that specifies which country's address format to use. The reviewer states "with country_code set to 'US', as specified", but the code in the review does not do this.

This argument is described at the very start of the module synopsis:
------
use Lingua::EN::AddressParse;

my %args =
(
    country => 'Australia',
    auto_clean => 1,
    ....
);

my $address = new Lingua::EN::AddressParse(%args);
------

Even so, I have released version 1.20, which gives a more helpful error message when a mandatory argument such as country is omitted.
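As a rough illustration of that kind of early check, here is a minimal sketch. It is not the module's actual code: the helper name check_args and the wording of the message are assumptions made for this example. It only shows how a missing country can be rejected up front with a single clear error instead of a chain of warnings from deeper in the call stack.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper, not part of Lingua::EN::AddressParse: reject a
# missing or empty 'country' argument before any parsing is attempted.
sub check_args {
    my (%args) = @_;
    croak "Missing mandatory argument 'country'"
        unless defined $args{country} && length $args{country};
    return \%args;
}

# Omitting the argument fails early with one clear message.
my $ok = eval { check_args(auto_clean => 1); 1 };
print "Error: $@" unless $ok;

# Supplying the argument passes the check.
my $args = check_args(country => 'US', auto_clean => 1);
print "country = $args->{country}\n";
```

Using croak rather than die reports the error at the caller's line, which points the user at their own 'new' call.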

The following code and output show the reviewer's sample data being parsed correctly:

my %args =
(
    country => 'US',
    auto_clean => 1,
    force_case => 1,
    force_post_code => 0,
    abbreviate_subcountry => 0,
    abbreviated_subcountry_only => 1
);

my $address = new Lingua::EN::AddressParse(%args);
my $address_input = "1 17th Street, Denver, CO USA";
my $error = $address->parse($address_input);

Original Input : 1 17th Street, Denver, CO USA
Cleaned Input : 1 17th Street Denver CO USA
Country address format : US
Address type : suburban
Non matching part :
Error : 0
Error descriptions :
Case all : 1 17th Street Denver CO USA
COMPONENTS :
country : USA
post_box :
post_code :
pre_cursor :
property_identifier : 1
property_name :
road_box :
street : 17th
street_direction :
street_type : Street
sub_property_identifier :
subcountry : CO
suburb : Denver

Lingua-EN-AddressParse (1.19) **

I found a nice interface and great documentation here, but unfortunately, I couldn't get the most basic of US address formats to parse (with country_code set to 'US', as specified):

use Lingua::EN::AddressParse;

my $address = new Lingua::EN::AddressParse();

my $error = $address->parse("1 17th Street, Denver, CO USA");

Instead of returning an error as I would expect, this code throws several warnings and then crashes:

Use of uninitialized value $country_or_code in numeric eq (==) at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 537.

Use of uninitialized value $country_or_code in hash element at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 554.

Use of uninitialized value $country_or_code in concatenation (.) or string at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Invalid country name: chosen, names must be in title case at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Can't call method "country_code" on an undefined value at /usr/local/share/perl/5.18.2/Lingua/EN/AddressParse/Grammar.pm line 912.

Perhaps it is excellent for other kinds of use cases, but it fails at the kind of address parsing that I need.

Lingua-EN-AddressParse (1.15) ****

Very useful module. A bit slow to start as it uses Parse::RecDescent, and if you have a lot of addresses from an unclean source then you will have to find an alternative way to parse them (e.g. with Regexp::Assemble). You can report failing addresses to the author via RT for his corpus.

Parsing addresses like this is a hard problem, and given that, this module is an excellent resource. 4 stars due to the hardness of the problem really.

Lingua-EN-AddressParse (1.14) ***

Note: I am told by the author that the module has been completely rewritten, and many of the problems fixed. However, I am no longer working on a project that involved parsing addresses, so I cannot verify this.

Below are my previous comments for v1.11 (which was released in 2002). I do not know if they still apply.

One major problem is that (for US addresses, anyway) it doesn't work unless addresses are very simply formatted:

123 Maple Street, Anytown, ST 12345

Anything beyond that (apartment numbers, post boxes, squares and cross streets, etc.) and it returns nothing.

Often addresses don't fit a simple pattern. (If they all matched that pattern, I wouldn't care about finding a module...)

Worse: it uses Parse::RecDescent, so it takes a few seconds to parse a simple address.

If it worked for most addresses, I'd say the "interface" and "ease of use" were great. Alas...

One note: parsing addresses is a HARD problem.