Talk given at the 2015 Fall Regional in Oshkosh WI.
"An Approach to Address Parsing and Data Standardization"
Abstract:
Maintaining fully parsed address elements in your database can be one of the most beneficial steps toward
achieving quality and consistency in addressing. Parsed address elements also serve as a preparatory step in
modeling an address toward NG9-1-1-supporting formats such as the FGDC address standard. In this talk,
we’ll take a look at the approach we’ve used for parsing site addresses for the V1 Statewide Parcel Map, the
role regular expressions played in this approach, and will unveil a suite of (free) ArcPy tools that can help you
parse addresses, standardize field values, and achieve other tasks.
Presenters:
Codie See
David Vogel
1. An Approach to
Address Parsing
and Data
Standardization
Codie See
David Vogel
WLIA Fall Regional
Conference – Oshkosh, WI
October 2015
3. A short history of parsing
Wisconsin addresses at SCO…
• LinkWISCONSIN Address Point and Parcel Mapping Project
- Built understanding of FGDC address standard
- Built understanding of Wisconsin Addresses
- Built a tool to handle this as flexibly as possible
• V1 Statewide Parcel Project
- Improved understandings
- Improved upon our parsing tool
…So we had a Wisconsin parsing tool, but it was at its tipping
point….
4. … and then one day on
GitHub
Parserator – a Python toolkit for making domain-specific
probabilistic parsers.
• Tendency-Based Parsing, not Rule-Based Parsing
• Trainable to a specific domain
• A flexible framework to build your own parser –
not just for addresses, but anything really!
5. Parserator - usaddress
usaddress - a child project built on Parserator:
https://github.com/datamade/usaddress
• Impressive out of the box performance
• Embraces the FGDC-endorsed US Postal
Address Data Standard
• Which is well suited for NG9-1-1 and
adopted by the parcel initiative schema.
6. Rules … Tendencies
A typical parser is often rigid, adhering to
very discrete & specific classifications…
…how do we anticipate deviations from the norm?
Statistically-driven educated guesses, based on 3 concepts:
- Tokenizing the input: 2554 | CTH | J
- Relative order of tokens: 2554 | CTH | J
- Content of tokens: 2554 | CTH | J
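The three concepts above can be sketched in plain Python. This is a minimal illustration of the kinds of evidence a probabilistic parser weighs, not the usaddress implementation:

```python
def tokenize(address):
    """Split an address string into whitespace-delimited tokens."""
    return address.split()

def token_features(tokens):
    """For each token, record its relative order and simple content cues:
    the raw evidence a statistically-driven parser can weigh."""
    features = []
    for i, tok in enumerate(tokens):
        features.append({
            "token": tok,
            "position": i,                  # relative order of the token
            "is_numeric": tok.isdigit(),    # content: all digits?
            "is_alpha": tok.isalpha(),      # content: all letters?
            "length": len(tok),
        })
    return features

feats = token_features(tokenize("2554 CTH J"))
# "2554" is numeric and first, so it tends to be an address number;
# "CTH" and "J" are alphabetic and follow it, so they tend to be street name parts.
```

A trained model turns tendencies like these into label probabilities rather than hard rules, which is what lets it absorb deviations from the norm.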
9. Training: Process Overview
Address Parsing Tool Uses:
• Trained CRFsuite file (the statistical portion of the parse) – consumed by
usaddress.py
• Hard coded expressions
• Regex for grid addresses
• Directionals
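The hard-coded portion can be sketched as follows. The grid-number pattern and the directional set here are illustrative assumptions; the talk does not show the tool's actual expressions:

```python
import re

# Matches grid-style address numbers such as "W204N11912" or "N89W16758":
# a directional letter, digits, a second directional letter, more digits.
# Illustrative only; the tool's actual regex may differ.
GRID_NUMBER = re.compile(r"^[NSEW]\d+[NSEW]\d+$", re.IGNORECASE)

# Common directional abbreviations, checked after stripping periods.
DIRECTIONALS = {"N", "S", "E", "W", "NE", "NW", "SE", "SW"}

def is_grid_number(token):
    return bool(GRID_NUMBER.match(token))

def is_directional(token):
    return token.upper().strip(".") in DIRECTIONALS
```

Handling these cases with fixed expressions keeps them out of the statistical model, where rare but perfectly regular patterns are hard to learn.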
10. The tool is based on ~2,000 addresses
(the number of records in the training data)
GOAL:
• Produce the best results with the least amount
of training data
We focused on selecting addresses for the training data that accounted
for the greatest number of addresses across the state,
then shifted our focus to more specific addresses and special
cases where we noticed issues occurring.
11. Element Focused Training
Created training files specific to
particular elements
Street Types
Unit Types & Unit IDs
Address Number Suffixes
Uncaught Street Names
12. Workflow of Training Process
After initially adding our
state specific training data,
we went through the data
provided with the library
and corrected issues that
were resulting in incorrect
parses.
**This was the most
time-consuming part of
developing this tool.
13. Wisconsin has 2.28+ million site
addresses associated with parcels!
-The tool does an impressive job flexibly parsing these addresses
-BUT: it is not feasible to accommodate all potential address
options
-We built four additional flag fields into the output to help identify where errors or incorrect
parses may have occurred & what the issue may be
Flags include:
1. Parse Error Flag (identifies addresses the parser was unable to parse)
2. Extraneous Data Flag (identifies data not commonly found in address elements)
3. Character Flag (identifies improper or uncommon special characters)
4. Incomplete Data Flag (identifies addresses that appear to be missing elements)
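A sketch of how flag fields like these could be computed. The field names and rules below are illustrative stand-ins, not the tool's actual logic:

```python
import re

def flag_address(parsed, raw):
    """Return a dict of flag fields for one address.
    `parsed` maps element names to parsed values; `raw` is the input string.
    The checks below are simplified examples of the four flag categories."""
    return {
        # Parse error: the parser produced no elements at all
        "PARSE_ERROR": not parsed,
        # Character flag: special characters uncommon in addresses
        "CHAR_FLAG": bool(re.search(r"[^A-Za-z0-9\s\.\-#/&']", raw)),
        # Incomplete data: missing an address number or street name
        "INCOMPLETE": not (parsed.get("AddressNumber") and parsed.get("StreetName")),
    }

flags = flag_address({"AddressNumber": "2554", "StreetName": "CTH J"}, "2554 CTH J")
```

Flagging rather than rejecting lets the 2.28+ million records flow through in bulk while the problem cases surface for review.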
14. Other Tools:
XML PARSING TOOL
• Input: Directory of County DOR XML Files
• Converts DOR validated data to .dbf format
• Note: FMKV still needs to be joined after dbf creation
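The core of that conversion step can be sketched with the standard library. The element names in the sample are assumptions for illustration; the real DOR schema and the .dbf write (via ArcPy) are not shown:

```python
import xml.etree.ElementTree as ET

def xml_records(xml_text, record_tag, fields):
    """Yield one dict per record element, pulling the listed child fields.
    Missing fields come back as empty strings, as a .dbf column would."""
    root = ET.fromstring(xml_text)
    for rec in root.iter(record_tag):
        yield {f: (rec.findtext(f) or "") for f in fields}

# Hypothetical sample; actual DOR element names may differ.
sample = """<Parcels>
  <Parcel><PIN>123</PIN><SiteAddress>2554 CTH J</SiteAddress></Parcel>
</Parcels>"""

rows = list(xml_records(sample, "Parcel", ["PIN", "SiteAddress"]))
```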
STANDARDIZE TOOL
• Efficient method for standardizing various attributes
• Leverages the InMemory workspace to perform the standardization quickly
• Developed for use with 1) Prefix, 2) Street Type, 3) Suffix
• Other Uses: School Districts, Class of Property, etc…
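The dictionary-lookup core of such a standardization pass might look like this. The mapping follows common USPS-style suffix abbreviations, but the table and function are illustrative, not the tool's own:

```python
# Illustrative lookup table mapping raw street-type values to a standard form.
STREET_TYPE_MAP = {
    "AVENUE": "AVE", "AV": "AVE", "AVE": "AVE",
    "STREET": "ST", "STR": "ST", "ST": "ST",
    "ROAD": "RD", "RD": "RD",
    "DRIVE": "DR", "DR": "DR",
}

def standardize(value, mapping):
    """Return the standardized form of a field value, or the cleaned
    original if it is not in the lookup table."""
    key = value.strip().upper().rstrip(".")
    return mapping.get(key, key)

out = standardize(" Avenue. ", STREET_TYPE_MAP)
```

Because it is just a table lookup, the same pass works for any coded attribute (school districts, class of property, and so on) by swapping in a different mapping.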
COMING SOON!!
• Condo Stack Tool
-Stacks relationally related condos using common PINs/join keys
-Estimated release: mid-November