Lingua-EN-AddressParse reviews


Lingua-EN-AddressParse (1.20) *****

I've found this a very good address parsing module. In my testing it seems to do slightly better than Geo::StreetAddress::US (v1.04) on US addresses, and it also supports CA, UK, and AU address formats.

It has some support for more 'advanced' address elements like sub-properties ("Unit 2A", "STE 209"), street directionality, and property names ("Attenborough House", but only if quoted).

I also really like that address normalisation is optional, since for some of my use cases I want to be able to reassemble partial addresses from their parsed components, without any normalisation.
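As a rough illustration of that kind of reassembly, here is a minimal sketch that works on a plain hash of components. It does not call the module; the component names are the ones shown in the module's report output, while the `reassemble_address` helper and the ordering in `@ORDER` are assumptions made for this example, not part of the module's API.

```perl
use strict;
use warnings;

# Component names as they appear in the module's report output. The ordering
# below is an assumption for illustration, not something the module defines.
my @ORDER = qw(
    pre_cursor property_name sub_property_identifier property_identifier
    street street_type street_direction post_box road_box
    suburb subcountry post_code country
);

# Hypothetical helper: join whichever components are present and non-empty,
# leaving their values exactly as they were parsed (no case or abbreviation changes).
sub reassemble_address {
    my (%components) = @_;
    return join ' ',
        grep { defined($_) && length($_) }
        map  { $components{$_} } @ORDER;
}

# Sample components, copied from the parsed example further down this page.
my %parsed = (
    property_identifier => '1',
    street              => '17th',
    street_type         => 'Street',
    suburb              => 'Denver',
    subcountry          => 'CO',
    country             => 'USA',
    post_code           => '',
);

print reassemble_address(%parsed), "\n";   # prints "1 17th Street Denver CO USA"
```

Because empty and missing components are simply skipped, the same helper also handles a partial set such as suburb and subcountry only.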

The main limitations I've experienced so far are these:
- it doesn't seem to support intersection addresses, which seem to be fairly common in the US (Geo::StreetAddress::US::parse_intersection works well though)
- it won't parse incomplete addresses (e.g. city/state/postcode without a street), so you have to make sure you have complete addresses each time
- you have to know the address country to pass into the constructor, which means you can't just pass in an address in an unknown format and have it auto-detected

I also have a few minor quibbles with the interface, but the details are all well documented, at least.

I'd also love more coverage than just US/CA/UK/AU, but I also understand that real-world address parsing is *hard*. I'm grateful enough for a module that does a decent job of parsing some non-US addresses to give it 5 stars.

(Not sure why Mark Stosberg was having problems with 1.19; his examples all work fine for me on 1.20.)

Lingua-EN-AddressParse (1.20)

The review for version 1.19 is based on incorrect usage of the module.

Firstly, the 'new' method is called without supplying the argument to specify the country that the address format belongs to (the reviewer states "with country_code set to 'US', as specified", but this is not reflected in his code).

This argument is described at the very start of the module synopsis:
------
use Lingua::EN::AddressParse;

my %args =
(
    country => 'Australia',
    auto_clean => 1,
    ....
);

my $address = new Lingua::EN::AddressParse(%args);
------

However, I have also released a version 1.20 that gives a more helpful error message when a mandatory argument such as country is omitted.

The following output shows the reviewer's sample data being correctly parsed:

my %args =
(
    country => 'US',
    auto_clean => 1,
    force_case => 1,
    force_post_code => 0,
    abbreviate_subcountry => 0,
    abbreviated_subcountry_only => 1
);

my $address = new Lingua::EN::AddressParse(%args);
my $address_input = "1 17th Street, Denver, CO USA";
my $error = $address->parse($address_input);

Original Input : 1 17th Street, Denver, CO USA
Cleaned Input : 1 17th Street Denver CO USA
Country address format : US
Address type : suburban
Non matching part :
Error : 0
Error descriptions :
Case all : 1 17th Street Denver CO USA
COMPONENTS :
country : USA
post_box :
post_code :
pre_cursor :
property_identifier : 1
property_name :
road_box :
street : 17th
street_direction :
street_type : Street
sub_property_identifier :
subcountry : CO
suburb : Denver

Lingua-EN-AddressParse (1.19) **

I found a nice interface and great documentation here, but unfortunately, I couldn't get the most basic of US address formats to parse (with country_code set to 'US', as specified):

use Lingua::EN::AddressParse;

my $address = new Lingua::EN::AddressParse();

my $error = $address->parse("1 17th Street, Denver, CO USA");

Instead of returning an error as I would expect, this code throws several warnings and then crashes:

Use of uninitialized value $country_or_code in numeric eq (==) at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 537.

Use of uninitialized value $country_or_code in hash element at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 554.

Use of uninitialized value $country_or_code in concatenation (.) or string at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Invalid country name: chosen, names must be in title case at /usr/local/share/perl/5.18.2/Locale/SubCountry.pm line 561.

Can't call method "country_code" on an undefined value at /usr/local/share/perl/5.18.2/Lingua/EN/AddressParse/Grammar.pm line 912.

Perhaps it is excellent for other kinds of use cases, but it fails at the kind of address parsing that I need.

Lingua-EN-AddressParse (1.15) ****

Very useful module. A bit slow to start as it uses Parse::RecDescent, and if you have a lot of addresses from an unclean source then you will have to find an alternative way to parse them (e.g. with Regexp::Assemble). You can report failing addresses to the author via RT for his corpus.

Parsing addresses like this is a hard problem, and given that, this module is an excellent resource. 4 stars due to the hardness of the problem really.

Lingua-EN-AddressParse (1.14) ***

Note: I am told by the author that the module has been completely rewritten, and many of the problems fixed. However, I am no longer working on a project that involved parsing addresses, so I cannot verify this.

Below are previous comments for v1.11 (which was released in 2002). I do not know if they still apply.

One major problem is that (for US addresses, anyway) it doesn't work unless addresses are very simply formatted:

123 Maple Street, Anytown, ST 12345

Anything beyond that (apartment numbers, post boxes, squares and cross streets, etc.) and it returns nothing.

Often addresses don't fit a simple pattern. (If they all matched that pattern, I wouldn't care about finding a module...)

Worse: it uses Parse::RecDescent, so it takes a few seconds to parse a simple address.

If it worked for most addresses, I'd say the "interface" and "ease of use" were great. Alas...

One note: parsing addresses is a HARD problem.