Street and Postal Address Parsing: How It Works
Are you a developer, or are you working on a project with location-related requirements? If so, you may need to parse addresses from strings, standardize them, and more.
But wait…what is address parsing?
Users enter addresses as strings. The components, such as the house number, street, city, and state, must be separated into categories before the address can be checked for correctness. Each piece of the address is then tested and either verified or rejected.
Address parsing is a crucial part of completing such a development job successfully. To learn more about address parsing, how it works, and the best address parser for your project, keep reading:
What is Address Parsing?
When you use the phrase Address Parsing or Address Parsing API, you are referring to the process of breaking down a string of text into separate address elements. Let us parse the following address for you:
1 A Queen St Apt 4 York PA 17404-1442
Although there are many others, here is one way to structure a parsed address.
The address is parsed, or broken down, into the required data fields.
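In code, a parsed address is usually held as a small record with one field per component. The sketch below is illustrative only: the field names are assumptions rather than any particular parser's schema, and treating "A" as a house number suffix is just one plausible reading of the example address above.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class ParsedAddress:
    # Field names are hypothetical; real parsers define their own schemas.
    house_number: str
    house_number_suffix: Optional[str]
    street_name: str
    street_type: str
    unit_type: Optional[str]
    unit_number: Optional[str]
    city: str
    state: str
    postal_code: str
    postal_code_extension: Optional[str]


# One plausible hand-written breakdown of "1 A Queen St Apt 4 York PA 17404-1442".
example = ParsedAddress(
    house_number="1",
    house_number_suffix="A",
    street_name="Queen",
    street_type="St",
    unit_type="Apt",
    unit_number="4",
    city="York",
    state="PA",
    postal_code="17404",
    postal_code_extension="1442",
)
print(asdict(example))
```

Keeping each component in its own field is what makes the later steps (matching, standardizing, verifying) possible.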
Why is Address Parsing Difficult for the Developers?
Everything starts with the method you use to write down an address. There are various ways to enter mailing addresses, and different programs handle it differently. And let us not forget about the countries and their postal standards! This is why regular expressions, the simplest solution, usually do not work.
Below are examples of different types of address formats:
Monsieur Aisha-Pierre Rochefort
102, Boulevard Saint-Jean.
Montréal (Québec) QC H9S 4Z1, CA

Jeremy Brown Martinson, Jr.
1 A Queen St Apt 4 York PA 17404-1442

Ms Anne Williams
Finance and Accounting
2 Cavill Avenue
Surfers Paradise QLD 4217, Australia

Mr Alex Smith
3B High Street
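To see why a single regular expression struggles with formats like these, consider the minimal sketch below. The pattern is a deliberately naive one written for this illustration (it is not a recommended parser); it assumes a one-line US-style layout of number, street, optional unit, one-word city, two-letter state, and ZIP.

```python
import re

# Naive pattern for "number street [unit] city ST ZIP" on a single line.
# Written only for this illustration; not a recommended parser.
US_LINE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?P<street>.+?)\s+"
    r"(?:(?P<unit>(?:Apt|Suite|Unit)\s+\w+)\s+)?"
    r"(?P<city>[A-Za-z]+)\s+"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5}(?:-\d{4})?)$"
)

samples = [
    "1 A Queen St Apt 4 York PA 17404-1442",
    "102, Boulevard Saint-Jean. Montréal (Québec) QC H9S 4Z1, CA",
    "2 Cavill Avenue Surfers Paradise QLD 4217, Australia",
]

for line in samples:
    match = US_LINE.match(line)
    print(line, "->", match.groupdict() if match else "no match")
```

The US sample matches, while the Canadian and Australian samples do not match at all. Even within the US the pattern is fragile: for "5 Main St New York NY 10001" it matches but puts "New" into the street and reports the city as "York".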
To overcome such obstacles, many companies opt for expensive, specialized technologies such as address parsers. However, there are simpler techniques that work well for smaller projects.
Why is Address Parsing Only One Step in the Process?
Address parsing is not a standalone process. It is one step within address verification, and it shapes the outcome. The steps are as follows:
- The user enters the address, which initiates data capture.
- The address is parsed, that is, broken down into its components.
- Corrected spelling, abbreviations, and casing are applied to the parsed address components.
- The parsed components are returned along with the standardized address.
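The four steps above can be sketched as a small pipeline. This is a toy under stated assumptions: the lookup tables and function names are hypothetical, the parser only handles a single street line of the form "number, street name, street type, optional unit", and spelling correction is left out because it needs postal reference data.

```python
# Hypothetical lookup tables; a real system would use postal reference data.
STREET_TYPES = {"street": "St", "st": "St", "avenue": "Ave", "ave": "Ave"}
UNIT_TYPES = {"apartment": "Apt", "apt": "Apt", "suite": "Ste", "ste": "Ste"}


def capture(raw: str) -> str:
    """Step 1: take the user's input and collapse stray whitespace."""
    return " ".join(raw.split())


def parse(line: str) -> dict:
    """Step 2: split '<number> <name> <type> [<unit>]' into components."""
    tokens = line.split()
    parts = {"number": tokens[0], "street_name": "", "street_type": "", "unit": ""}
    for i, tok in enumerate(tokens[1:], start=1):
        if tok.lower().rstrip(".") in STREET_TYPES:
            parts["street_name"] = " ".join(tokens[1:i])
            parts["street_type"] = tok
            parts["unit"] = " ".join(tokens[i + 1:])
            break
    else:
        # No known street type found: keep everything after the number as the name.
        parts["street_name"] = " ".join(tokens[1:])
    return parts


def standardize(parts: dict) -> dict:
    """Step 3: apply casing and abbreviation rules to each component."""
    out = dict(parts)
    out["street_name"] = parts["street_name"].title()
    st_key = parts["street_type"].lower().rstrip(".")
    out["street_type"] = STREET_TYPES.get(st_key, parts["street_type"])
    unit_tokens = parts["unit"].split()
    if unit_tokens:
        u_key = unit_tokens[0].lower().rstrip(".")
        unit_tokens[0] = UNIT_TYPES.get(u_key, unit_tokens[0])
    out["unit"] = " ".join(unit_tokens)
    return out


def process(raw: str) -> dict:
    """Step 4: return the components together with the standardized line."""
    parts = standardize(parse(capture(raw)))
    standard = " ".join(value for value in parts.values() if value)
    return {"components": parts, "standardized": standard}


print(process("  12 queen   STREET apartment 4 "))
```

The point of the sketch is the ordering: standardization works on parsed components, not on the raw string, which is why parsing has to come first.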
Uses of Address Parser
While parsed address data is only a small part of the address verification and geocoding process, the information has many uses. Address parsing enables:
- Better address matching across datasets
- The creation of persistent unique identifiers
- More precise location data analysis
- Easier answers to questions such as: how many customer addresses in a city have secondary units, such as apartments, suites, or subunits?
- Conversion of unstructured data into usable information.
- Better address management and better address storage.
- Normalization or standardization of addresses.
- De-duplication of redundant addresses within a system.
- Storage of address information as separate components. Businesses can store parsed addresses rather than saving the raw address string.
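As a minimal sketch of the matching and de-duplication uses listed above, the code below builds a comparison key from parsed components. The component names are hypothetical and are assumed to come from some upstream parser; a production system would standardize abbreviations before comparing.

```python
from typing import Dict, List, Tuple

# Parsed components are assumed to come from an upstream parser; keys are hypothetical.
Record = Dict[str, str]

FIELDS = ("number", "street", "unit", "city", "state", "postal_code")


def match_key(addr: Record) -> Tuple[str, ...]:
    """Build a comparison key that ignores letter case and extra spaces."""
    return tuple(" ".join(addr.get(field, "").upper().split()) for field in FIELDS)


def deduplicate(addresses: List[Record]) -> List[Record]:
    """Keep the first record seen for each key."""
    seen = set()
    unique = []
    for addr in addresses:
        key = match_key(addr)
        if key not in seen:
            seen.add(key)
            unique.append(addr)
    return unique


records = [
    {"number": "1", "street": "Queen St", "unit": "Apt 4",
     "city": "York", "state": "PA", "postal_code": "17404"},
    {"number": "1", "street": "queen  st", "unit": "APT 4",
     "city": "york", "state": "pa", "postal_code": "17404"},
    {"number": "3", "street": "Queen St", "unit": "",
     "city": "York", "state": "PA", "postal_code": "17404"},
]

unique = deduplicate(records)
with_units = sum(1 for addr in unique if addr["unit"])
print(len(unique), with_units)
```

Because the unit is its own field, counting addresses with secondary units becomes a one-line filter rather than a string search.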
What are the Typical Approaches for a Successful Address Parsing?
As a developer, you may be worried about parsing, normalizing, and standardizing addresses in real time, and the first thing that comes to mind is probably regular expressions. This can be a case of the Dunning-Kruger effect, in which you overestimate your own ability compared with those who are truly competent in the field.
The Dunning-Kruger effect is common here because addresses do not follow a regular pattern. Any regular expression, however complex, will miss some of the thousands of edge cases found in address data.
Street addresses also vary in format from one area to another. Even if you have solved most edge cases for one town, a regular expression may fail when parsing addresses in the next town over.
As far as address parsing is concerned, there are two approaches you can follow.
- Your user or customer provides the address in components, with a separate field for each value. The user parses it for you, so you won’t have to. In my opinion, this creates more consistent address data but doesn’t provide the best user experience.
- The user enters the address in a freeform style, which means the full address is on one line.
Because freeform entry has no uniform control over the address string, there is a much higher chance of variation between addresses. One user may enter the apartment number first, and another may enter it after the street. How deeply you parse an address depends on your requirements and the level of detail you need.
NPM Method for Address Parsing
Address parsing and standardization can also be done with NPM packages. These usually target specific formats or countries. The following NPM libraries are examples:
- parse-address-string: an address parser that targets US and Canadian addresses.
- australia-address-parser: performs in-depth analysis and parsing of Australian addresses.
- uk-clear-addressing: an address parser for UK street addresses that extracts the house number, city, and more.
Libpostal: An NLP-Trained Address Parser Using Open Data
Mapzen developed Libpostal, a lightweight C library for international address parsing. Machine learning distinguishes Libpostal from other address parsers: it is trained on millions of real-world addresses.
You can use the library directly or through bindings. Bindings are available for Python, Go, Java, NodeJS, Ruby, and other stacks. Libpostal is open source and released under the MIT license.
- It is smart, practical, and effective.
- Libpostal parses addresses from location strings. It also understands expressions such as “restaurants”, “nearby”, and “in”.
- It is an open-source solution that comes with a permissive license.
- On the downside, the C library must be installed and maintained.
- The trained data model for Libpostal must be held in memory, so the application consumes about 4GB.
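As a rough sketch of the Python binding: the `postal` package (pypostal) exposes `parse_address`, which returns a list of (value, label) pairs. The helper `to_components` below is a hypothetical convenience written for this example, and the sample pairs are hand-written to show the shape of the result; they are not captured from a real run. The import is guarded because the binding only works once the C library and its data files are installed.

```python
from typing import Dict, List, Tuple

try:
    # pypostal binding; needs the libpostal C library and its data files.
    from postal.parser import parse_address
except ImportError:
    parse_address = None


def to_components(pairs: List[Tuple[str, str]]) -> Dict[str, str]:
    """Turn (value, label) pairs into a dict keyed by label.

    If a label appears more than once, its values are joined with a space.
    """
    components: Dict[str, str] = {}
    for value, label in pairs:
        if label in components:
            components[label] = f"{components[label]} {value}"
        else:
            components[label] = value
    return components


# Hand-written pairs in the shape the parser returns; not real parser output.
sample_pairs = [
    ("1", "house_number"),
    ("queen st", "road"),
    ("apt 4", "unit"),
    ("york", "city"),
    ("pa", "state"),
    ("17404-1442", "postcode"),
]
print(to_components(sample_pairs))

if parse_address is not None:
    # Runs only when the binding is installed.
    print(to_components(parse_address("1 A Queen St Apt 4 York PA 17404-1442")))
```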
Online Validation or Address Parsing APIs
Verification of street addresses and postal addresses is known as address validation. There are two ways to verify an address: upfront, by checking for incorrect or incomplete information, or by parsing, matching, formatting, and cleansing the addresses in a database against authoritative postal data.
There are, however, differences between address parsing and related services. Address verification services often match against a database using rule-based approaches. For example, a service may record that Main Street in ZIP 98765 starts at number 1 and ends at number 150. By that logic, a house number inside the range counts as a valid residential address, but it may or may not be verified for delivery.
Some services also provide latitude and longitude as part of address parsing. In many of these systems, latitude and longitude are computed by interpolating addresses along a block. Using such lat/long values for verified delivery is problematic for retailers, restaurants, and delivery companies. With approximate data, a driver may not be able to locate you halfway down the block.
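The range rule and the block-level coordinate estimate described above can be sketched as follows. The `StreetSegment` record, its field names, and the coordinates are all made up for illustration; the code shows plain linear interpolation, which is one common way such estimates are produced, not any specific vendor's method.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class StreetSegment:
    # Hypothetical database row: one street block with its address range.
    name: str
    zip_code: str
    low: int
    high: int
    start: Tuple[float, float]  # (lat, lon) at house number `low`
    end: Tuple[float, float]    # (lat, lon) at house number `high`


def in_range(seg: StreetSegment, number: int) -> bool:
    """Rule-based check: the number falls inside the block's range."""
    return seg.low <= number <= seg.high


def interpolate(seg: StreetSegment, number: int) -> Optional[Tuple[float, float]]:
    """Estimate coordinates by linear interpolation along the block."""
    if not in_range(seg, number):
        return None
    if seg.high == seg.low:
        return seg.start
    t = (number - seg.low) / (seg.high - seg.low)
    lat = seg.start[0] + t * (seg.end[0] - seg.start[0])
    lon = seg.start[1] + t * (seg.end[1] - seg.start[1])
    return (lat, lon)


# Made-up coordinates for a Main Street block numbered 1 to 150.
main_st = StreetSegment("Main St", "98765", 1, 150, (40.0, -75.0), (40.0, -74.99))
print(in_range(main_st, 75), interpolate(main_st, 75))
print(in_range(main_st, 151), interpolate(main_st, 151))
```

Note what the check does not prove: number 75 passes the range rule and gets coordinates even if no building with that number exists, which is exactly why an interpolated position is only approximate.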
Data Capturing: For Optimal Address Parsing
At PostGrid, we work with many printers and delivery service providers for our clients. Every day, customers enter address information on these companies’ websites to initiate deliveries, and every day hundreds of addresses are marked undeliverable and must be corrected within the system. This is a waste of time, given that many address parsing APIs and solutions can handle it efficiently.
Our team is optimizing the system APIs to standardize, verify, and parse addresses during entry. Doing so keeps your data clean. Have the consumer confirm the correct delivery address on entry by presenting a standardized, verified address.
There are a few standards that you would want PostGrid to meet:
- CASS Certification (For US): The Coding Accuracy Support System (CASS) evaluates software applications that correct and match street addresses. CASS certification is available to all mailers, service providers, and third-party vendors who want to improve the accuracy of their five-digit coding, ZIP+4, carrier route, and address-matching software.
- SERP Certification (For Canada): Canada Post issues this postal certification under its Software Evaluation and Recognition Program. SERP-certified software lets you test whether your mailing addresses are valid and correct.
You might find this online parser and address validator tool useful if address parsing isn’t a daily need for your project. Using the PostGrid tool to parse addresses looks like this:
- Simply copy and paste the addresses into the text field, or upload a CSV, Excel, or text file.
- Tap “Verify”; the verification results can then be downloaded.
- The results are provided as a CSV table of verified and parsed addresses.
This method is only suitable for small numbers of addresses and will not work for large volumes. Developers must cite the sources of the results when using them: OpenStreetMap, OpenAddresses, and more.
Address Parsing Through Geocoding API
A Geocoding API is one of the most powerful options, yet also one of the most complicated to use. It handles several tasks at once, including address parsing, standardization, and validation, better than other solutions.
This method makes the data clearer and more accurate, and it also checks whether the address actually exists. Consequently, you won’t have to deal with an address that doesn’t exist. Additional useful information is also available.
PostGrid’s Geocoding API assigns a confidence level to each location. Several levels are checked, and the API shows where the mistake is: in the street name, the house number, or the city name.
The following data is generally returned from PostGrid’s International Address Verification API:
- The longitude and latitude coordinates of a suitable location;
- Normalization of postal addresses;
- The full address is standardized, including the address components;
- Each component’s level of confidence;
- The tested and parsed address.
Developers should keep a few specifics in mind when using a Geocoding API. Non-confirmed addresses require additional logic. Additionally, APIs for large amounts of data are usually expensive.
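The additional logic for non-confirmed addresses might look like the sketch below. The response shape, the field names, the coordinates, and the 0.8 threshold are all assumptions made for this example; they do not describe PostGrid’s actual API, so check the provider’s documentation for the real format.

```python
from typing import Dict, List

# Hypothetical response; field names and values are illustrative assumptions,
# not the actual output of any provider's API.
response = {
    "formatted": "1 Queen St, York, PA 17404",
    "lat": 39.96,
    "lon": -76.73,
    "confidence": {"house_number": 0.4, "street": 0.95, "city": 1.0},
}


def weak_components(confidence: Dict[str, float], threshold: float = 0.8) -> List[str]:
    """Return the components scoring below the threshold, sorted by name."""
    return sorted(name for name, score in confidence.items() if score < threshold)


def needs_review(resp: dict, threshold: float = 0.8) -> bool:
    """Treat an address as non-confirmed if any component is below the threshold."""
    return bool(weak_components(resp["confidence"], threshold))


if needs_review(response):
    # For example, ask the user to re-check only the weak parts.
    print("Please confirm:", weak_components(response["confidence"]))
```

Flagging the specific weak component lets the application ask the user about just the house number rather than rejecting the whole address.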
Which is the Best Method to Choose for Address Parsing?
When working with address strings, we recommend the following algorithm:
- Use Geocoding APIs if you are trying to get the location of an address or normalize it.
- Use RegEx if all the addresses are regular and share an identical format.
- You can also use Libpostal to parse addresses if you can’t find a suitable NPM library.
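The recommendations above can be written down as a small decision function. This is only a restatement of the list in code; the parameter names are hypothetical, and the order of the checks simply follows the order of the bullets.

```python
def choose_method(needs_location: bool, needs_normalization: bool,
                  uniform_format: bool, has_npm_library: bool) -> str:
    """Return the approach suggested by the recommendations above."""
    if needs_location or needs_normalization:
        return "geocoding API"
    if uniform_format:
        return "regex"
    if has_npm_library:
        return "NPM library"
    return "Libpostal"


# Mixed-format addresses, no location needed, no suitable NPM package.
print(choose_method(False, False, False, False))
```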
These are some of the most common methods for parsing addresses. To settle on the most suitable address parsing API, start by identifying your goals precisely, and try integrating PostGrid’s address verification APIs.
However, when you want to parse addresses in bulk, or addresses from several cities or localities, it is best to go for the most versatile solution. Address parsing with geocoding can also help you deal with bulk datasets.
Frequently Asked Questions
If you are a developer, try using regex if the addresses are regular or share similar formatting. If you need to parse complex addresses, use an NPM library or Libpostal. Lastly, you can choose PostGrid to parse addresses and use autocomplete to locate them.
As mentioned above, countless NPM libraries can be used for address parsing. However, always verify the license before adding a package to a commercial project.
Geocoding helps you find the delivery destination that corresponds to the provided data, and it does much more than address parsing. Geocoding tries to understand the street, city, and state of an address, so you can easily get standardized, normalized, and verified addresses as your outcome.
Address normalization, or standardization, is the process of formatting unstructured addresses according to a country’s mailing standards. This process also replaces abbreviations with their standard forms.
Address parsing is a small but crucial step within address verification or validation.
It also helps clear up redundant addresses or information across datasets, and it helps you create unique identifiers, data analysis filters, and more. It enables address standardization and can help you clean entire datasets through de-duplication.