Validating your data is a critical step in almost every workflow. Learn how to build FME workspaces to automatically detect and repair problems with attributes, geometry, and more, and how to build a portal to let end users perform data validation on demand. Plus, learn about new functionality in FME Server for detecting workspace failures.
2. AGENDA
1. Why do we validate data?
2. Indoor Mapping standards compliance
3. Validating CAD data
4. Validating topology
5. Automating validation workflows
3. Data validation means checking ...
● Single objects (geometry and attributes)
● Relationships between objects
● Completeness
● Correctness
● Standards compliance
4. Data validation means checking ...
● Schema or data model
● Attribute values and domains
● Geometry
● Topology and spatial relationships
● Networks
● And more
5. Indoor Mapping
Venues worldwide are generating indoor maps of their spaces for:
○ Space management / planning
○ Geolocating assets
○ Helping patrons navigate
6. Indoor Mapping Challenges
● Must integrate multiple sources to produce an indoor map.
○ GeoJSON, Revit, IFC, CAD (Autodesk, Bentley), Civil 3D, Esri Geodatabase, databases, CityGML …
● Must transform inconsistent data.
● Must comply with specifications of the indoor format, e.g. IMDF, HERE, ArcGIS Indoors, IndoorGML.
○ Strict data models and explicit spatial relationships.
● Venues constantly change, so maps need to be updated automatically.
7. Tips for Validating Attributes
● Phone Numbers / UUID / Business Names:
○ AttributeValidator and regular expressions, e.g. (^\+[0-9-]{10,15}$|^$)
● Hours of Opening – OSM Standard:
○ “24/7”, “Mo-Fr 08:30-20:00”
● Websites:
○ Regular expressions
^http://|^https://
○ HTTPCaller & HTTP Status Code
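The phone-number expression above is meant for an optional field: the value is either empty, or a literal "+" followed by 10 to 15 digits or hyphens (the "+" has to be escaped as `\+`, otherwise it is read as a quantifier). A minimal sketch of the same test outside FME, assuming Python's standard `re` module; the function name is illustrative only:

```python
import re

# A literal "+" followed by 10 to 15 digits or hyphens.
PHONE = re.compile(r"\+[0-9-]{10,15}")


def is_valid_phone(value: str) -> bool:
    """True for an empty value, or a '+'-prefixed number of 10-15 digits/hyphens."""
    # fullmatch anchors both ends, like ^...$ in the slide's expression;
    # the empty-string branch mirrors the |^$ alternative.
    return value == "" or PHONE.fullmatch(value) is not None
```

In a workspace the equivalent test is a regular-expression rule in AttributeValidator; the sketch only shows what the expression accepts and rejects.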
8. Useful Transformers for Validating Geometries
● GeometryValidator – pass only valid geometries.
● GeometryFilter – filter by geometry type and pass only valid ones.
● SpatialFilter or SpatialRelator – ensure valid spatial relationships.
○ Choosing the right spatial join transformer: see the article fme.ly/byu
9. Automated IMDF Validation
A. Upload your IMDF data and get your validation report: safe.com/imdf
or
B. Add an IMDFValidator transformer to your workspace (available from FME Hub: hub.safe.com)
11. CAD Data
Key source of data updates for many GIS departments.
● Very loose schemas or data models.
● Hard to impose a drawing standard on contractors.
● Often more detail than is needed in GIS.
12. City of Kitchener
● Digital Submission Compliance
● Contractor CAD data added to GIS
● CAD standard
○ Standards Checker
○ Attribute Checker
○ Topology Checker
19. Transformers for Validating Connectivity
● NetworkTopologyCalculator for building geometric networks (lines & junctions).
● SpatialFilter for identifying objects that are supposed to connect, e.g. devices on lines.
● TopologyBuilder and PointOnLineOverlayer for building connected features and identifying missing junctions/devices.
20. Validating Automatically
Tip: set up your data validation workflows to run automatically.
● On a schedule, e.g. daily quality control.
● In response to an event.
○ “Watch” a directory, FTP, Amazon S3 bucket …
○ Email.
○ Database triggers.
● As a web service.
● Self-serve drag-and-drop webpage (or mobile app) that anyone on the team can use.
24. Data Validation Resources
● Improving Data Compliancy Using FME (City of Kitchener)
● CAD Data Validation using FME (Colonial Pipeline)
● Data Validation Victories: Tips for Better Data Quality (Safe Webinar)
● FME Extensive Usage Inside the Mapping Production System (Natural Resources Canada)
● Creating & Validating IMDF (Knowledge Center)
● Ultimate Geospatial Data Validation Checklist (Safe Blog)
● IMDF Validator
Grabber: Technology is an amplifier or multiplier. It can amplify both good and poor processes. In the context of data quality, technology can amplify the benefits of working with high-quality data, like increased productivity. Or it can amplify the effects of poor data quality: frustrated users and poor decisions based on incorrect data.
Subject: I’m going to discuss some ideas on data validation and how FME can help with different validation tasks.
Message: Why are we talking about data validation and compliance again? Because it’s important.
We’ve talked about data validation quite a bit in the past. Working with high-quality, accurate data is more fun: you get to spend your time solving real problems rather than just cleaning data. FME has many tools to ensure your data is valid and complies with the data standards used in your organisation.
Why is compliance important? Because:
Garbage in garbage out
Bad data wastes time and resources
No one has fun working with bad data
FME is a great tool for helping you to validate your data
We’ve had great customer presentations from FME users such as the City of Kitchener and Natural Resources Canada on what they are doing with FME for data validation. We’ll present a summary of some of their ideas here.
Data validation takes on many forms:
Are we validating a single object (self-intersection, attribute validation), or are we validating the relationship between objects (spatial contains, parent-child)?
Completeness – Is the data complete, with no missing mandatory fields? For example, if you’re creating indoor mapping data, you need entrances.
Correctness – Do the data values meet the standard or data model that has been agreed on?
But there are also feature relationships:
Data Model: Is there a table relationship that has to be met, e.g. parent-child?
Topology: Is the topology correct? Does an island in a lake touch the lake edge? Do county boundaries touch or overlap? Do devices sit on the lines of a utility network?
Topology: Does a line form part of a network, or is it disconnected? Do the network attributes also conform to the network? For example, for an electrical circuit, do all the circuit IDs match for a given circuit? Does an 8” pipe connect to a 6” pipe?
Network Topology: Are there junctions at the same location (duplicates) creating topology errors?
Why:
Indoor mapping is driven by “space management” and “indoor navigation”.
The “blue dot” disappears the moment you go inside; we need to move the blue dot inside venues.
Navigation: allowing people to find their gate, determining pedestrian choke points, and finding the best locations for revenue sources.
90% of our time is spent indoors.
Who’s interested:
Airports/aviation
Conference centers
Sports arenas, shopping malls / retail
Train stations
Finance, real estate, property management (big on TRIRIGA)
Hospitals
Campuses, higher education
There are challenges around creating indoor data that we don’t generally see in other mapping applications:
Integrate: Often multiple data sources need to be combined.
Transform: Data suppliers are not used to sharing their data.
Revit is used for design, but as-built models are rare.
CAD data has very loose data schemas.
Architectural data requirements do not align with indoor mapping data requirements,
e.g. in CAD, walls have inside and outside edges; indoor mapping just needs the centerline.
Indoor mapping needs explicit ‘entrances’, which are often not modeled in architectural data.
Comply: Indoor formats have strict data models that often don’t match architectural models. These formats also define explicit spatial relationships between objects.
Automate: Many venues change constantly:
Airports have gates shared between domestic and international flights.
Conference centers have dynamic layouts
So timeliness of data can be critical.
Keep your indoor maps in sync by setting up your workflow to run as new data arrives (FME Server).
*** You can use FME to do all of this! Integrate sources, transform it to meet requirements, automatically keep indoor maps synchronized. ***
Here are some examples of validating attributes typical of indoor mapping scenarios. Note that this is non-spatial data: FME isn’t restricted to validating spatial data.
You can demo these in: 1.AttributeValidation.fmw
AttributeValidator is your go-to transformer for validating attributes, whether against a domain list or a regular expression match, e.g. phone numbers, UUIDs, valid names. There are also external tools you can call. For example, there is a Python function for validating the version of a UUID.
Hours of Opening: Who’s been in one of the Exit Rooms? If you don’t check the closing hours carefully, it’s easy to get left in there all night. OSM has a standard for the opening hours of businesses. These need to be validated to ensure that the likes of Apple and Google Maps show the correct opening hours. There’s a great site for building example opening hours strings, and also an API for working with opening hours that can be used to validate the strings. At Safe we’re not proud: if there is a great tool available that users can use in a workflow, we’ll give access to it!
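For the UUID version check, Python's standard `uuid` module parses a UUID string and exposes its version number. A minimal sketch, assuming your data model requires a particular version (version 4 is used here purely as an example); the helper names are hypothetical:

```python
import uuid
from typing import Optional


def uuid_version(value: str) -> Optional[int]:
    """Return the version of a UUID string, or None if it does not parse."""
    try:
        parsed = uuid.UUID(value)
    except ValueError:
        return None  # badly formed hexadecimal UUID string
    # .version is None when the UUID is not an RFC 4122 variant
    return parsed.version


def is_uuid_v4(value: str) -> bool:
    """Example rule: accept only version-4 (random) UUIDs."""
    return uuid_version(value) == 4
```

A function like this could be called from a PythonCaller, or the result could be checked with an AttributeValidator rule.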
Creating Opening Hours webpage
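To show the kind of check involved, here is a regular expression for a deliberately small subset of the OSM opening hours syntax: "24/7", or rules such as "Mo-Fr 08:30-20:00" or "Su off" separated by "; ". The real OSM specification covers much more (public holidays, months, week numbers, comments), so this sketch is an assumption-laden illustration, not a replacement for a proper opening hours parser:

```python
import re

DAY = r"(?:Mo|Tu|We|Th|Fr|Sa|Su)"
DAYS = rf"{DAY}(?:-{DAY})?(?:,{DAY}(?:-{DAY})?)*"     # e.g. Mo-Fr or Mo,We,Fr
TIME = r"(?:(?:[01]\d|2[0-3]):[0-5]\d|24:00)"         # 00:00 to 23:59, or 24:00
SPAN = rf"{TIME}-{TIME}"                                # e.g. 08:30-20:00
RULE = rf"(?:{DAYS} {SPAN}(?:,{SPAN})*|{DAYS} off)"     # e.g. Sa 10:00-14:00,15:00-18:00
HOURS = re.compile(rf"24/7|{RULE}(?:; {RULE})*")


def is_valid_hours(value: str) -> bool:
    """True if the whole string matches the simplified opening hours subset."""
    return HOURS.fullmatch(value) is not None
```

Anything outside the subset (for example "Monday 9-5") is rejected, which errs on the side of flagging values for review.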
We can validate a URL in two ways.
Ensure that the URL is basically valid using regex - great website here.
Test to see if the website returns a result.
FME can do this with regular expressions and HTTPCaller
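The two-step URL check (a cheap regex test first, then a request to see whether the site answers) can be sketched with the Python standard library. In FME the second step is the HTTPCaller and its status code attribute; the function names and the "2xx or 3xx passes" rule here are assumptions for illustration:

```python
import re
import urllib.error
import urllib.request
from typing import Optional

# Step 1 pattern, equivalent to ^http://|^https://
URL_SCHEME = re.compile(r"^https?://")


def has_web_scheme(url: str) -> bool:
    """Step 1: cheap syntactic check on the URL scheme."""
    return URL_SCHEME.match(url) is not None


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Step 2: send a HEAD request; return the status code, or None if unreachable."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status  # redirects are followed, so this is the final status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with an error status, e.g. 404
    except (urllib.error.URLError, ValueError, OSError):
        return None  # DNS failure, refused connection, timeout, malformed URL


def check_website(url: str) -> bool:
    """Pass only URLs with an http(s) scheme that answer with a 2xx/3xx status."""
    if not has_web_scheme(url):
        return False  # no network request for syntactically bad values
    status = http_status(url)
    return status is not None and 200 <= status < 400
```

Running the regex first keeps obviously bad values from generating network traffic. Some servers reject HEAD requests (405), so a GET fallback may be worth adding.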
We have found that creating custom transformers that each cover a single, explicit validation test is the cleanest way to build validation workflows. We learnt this tip from our colleagues at NRCan, as we’ll see a little later.
GeometryValidator
Many options for single object geometry data validation such as self-intersection, duplicate points etc.
GeometryFilter
Simply validate the geometry type. You can’t throw a point at a line feature class in a geodatabase.
SpatialFilter / SpatialRelator and the *Overlayer series of transformers, e.g. PointOnLineOverlayer
Validate the spatial relations between different objects - more on this later.
Choosing the appropriate spatial relationship tool in FME - great article here
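Two of the single-object checks mentioned for GeometryValidator, duplicate consecutive points and self-intersection, can be sketched for a polyline in plain Python. This is a brute-force O(n²) illustration using exact coordinate comparison, not GeometryValidator's algorithm; the function names are hypothetical:

```python
from typing import List, Tuple

Point = Tuple[float, float]


def duplicate_vertices(coords: List[Point]) -> List[int]:
    """Indexes of vertices that exactly repeat the previous vertex."""
    return [i for i in range(1, len(coords)) if coords[i] == coords[i - 1]]


def _orient(p: Point, q: Point, r: Point) -> float:
    # Cross product: > 0 counter-clockwise, < 0 clockwise, 0 collinear.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _on_segment(p: Point, q: Point, r: Point) -> bool:
    # r is collinear with p-q; check that it lies within the segment's bounding box.
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def segments_intersect(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segment a-b and segment c-d cross or touch."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True  # proper crossing
    # Touching cases: an endpoint lies on the other segment.
    return ((o1 == 0 and _on_segment(a, b, c)) or (o2 == 0 and _on_segment(a, b, d))
            or (o3 == 0 and _on_segment(c, d, a)) or (o4 == 0 and _on_segment(c, d, b)))


def self_intersects(coords: List[Point]) -> bool:
    """Check every pair of non-adjacent segments (run duplicate_vertices first)."""
    n_seg = len(coords) - 1
    closed = n_seg >= 3 and coords[0] == coords[-1]
    for i in range(n_seg):
        for j in range(i + 2, n_seg):  # adjacent segments always share a vertex
            if closed and i == 0 and j == n_seg - 1:
                continue  # first and last segments share the closing vertex
            if segments_intersect(coords[i], coords[i + 1], coords[j], coords[j + 1]):
                return True
    return False
```

Duplicate points produce zero-length segments that can be reported as false self-intersections, which is one reason to run single, explicit tests in a fixed order.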
Safe has made an IMDF Validator available to make it easy to validate your indoor IMDF datasets before they are submitted to Apple. This is an example of automating your data validation workflows. Go to link to show the tool. Just drag and drop your file.
There is also an IMDFValidator custom transformer you can include in your own FME workflows (available on FME Hub).
Demonstrate the validator using the source dataset victoria.zip. These are the results of the Esri to IMDF tutorial; you can download and run the tutorial if you wish. The report results are in ./results/IMDFReport.html if you don’t want to wait for the email with the link. No need to explain all the results.
The IMDF Validator has been used by 40+ organisations to validate their IMDF datasets.
The IMDF specification requires conformance to about 240 rules. We have created a series of custom transformers, like ValidateHours and CheckWebsite, that are used to validate different objects; there are about 140 unique tests in the IMDF Validator.
Processing digital submissions of CAD data is often a key part of the data processing workflow in GIS departments.
Drawing compliance (Colonial Pipeline)
Data compliance (City of Kitchener)
MMCD (Master Municipal Construction Documents Association) in BC is an organisation working to establish data standards for municipalities to improve the efficiency and accuracy of processing CAD data.
Problems when taking delivery of CAD data:
Very loose standards & data models
Data suppliers:
Small contractors and architects with limited knowledge of GIS and more structured database data models
A lot of detail - example
There’s more to a CAD drawing than data - the frame and title blocks can also be validated - this is done by Colonial Pipeline
Quality GIS data is critical for CoK. Their Esri enterprise geodatabase is linked to other information systems such as AMANDA, Cityworks, SAP, and Stormwater Rate.
Most GIS data is updated through contractor data. To improve efficiency and accuracy, CoK developed a CAD standard that all contractors need to conform to for data delivery. CoK defined a standard DWG template for AutoCAD Map 3D.
All the validation is driven by excel spreadsheets that define the rules for the different validation steps.
The Kitchener logo has a link to the full presentation by David.
Planned updates to the digital submission compliance include:
Updating the attribute and topology checkers (they were created in 2013!) to use new and improved FME functionality
Perhaps more use of FME Server
DEMO: Based on the CoK data validation
The Attribute Checker uses a csv file produced in Excel that identifies what object data field names should be attached to the entities on specific AutoCAD layers. Additional checks on each field can be performed – minimum and maximum character lengths, field data types, minimum and maximum number values, pick list restrictions, and whether the field is required to have a value or not.
Here the key transformer is the Joiner. Each entity on a layer is joined to each layer attribute in the CSV. Then the entity’s attributes and their values are compared against the list of ‘valid’ attributes and values retrieved from the CSV.
Quick demo here - ..\demos\2 AttributeChecker
Concept is very simple - match the feature to a record in the spreadsheet. If there is a Join then the record is valid for that test. If there is no join then the feature is invalid.
You can see that there is a custom transformer specifically for domain tests, in this case testing the MATERIAL domain. This is an example of building custom transformers for specific tests.
You could do this in AttributeValidator, but using a spreadsheet makes it a little easier to maintain the schema in the long run. For example, if you wanted to change the MATERIAL domain, you would just edit the spreadsheet.
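The spreadsheet-driven pattern (join each feature to the rule rows for its layer, then test each field) can be sketched in Python. The CSV columns (`layer`, `field`, `required`, `max_len`, `picklist`), the layer names, and the pick list values are hypothetical, chosen only to mirror a few of the checks described above; a failed join is reported as an error, as in the Attribute Checker:

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List

# Hypothetical rules table, one row per (layer, field), as exported from Excel.
RULES_CSV = """layer,field,required,max_len,picklist
SAN-PIPE,MATERIAL,yes,10,PVC|VC|CONC
SAN-PIPE,DIAMETER,yes,4,
SAN-MH,RIM_ELEV,no,8,
"""


def load_rules(text: str) -> Dict[str, List[dict]]:
    """Group rule rows by layer: the layer name acts as the join key."""
    rules: Dict[str, List[dict]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        rules[row["layer"]].append(row)
    return dict(rules)


def check_feature(feature: Dict[str, str], rules: Dict[str, List[dict]]) -> List[str]:
    """Return a list of error messages; an empty list means the feature passed."""
    layer = feature.get("layer")
    if layer not in rules:
        return [f"layer {layer!r}: no matching rule (join failed)"]
    errors = []
    for rule in rules[layer]:
        field = rule["field"]
        raw = feature.get(field)
        value = "" if raw is None else str(raw).strip()
        if value == "":
            if rule["required"] == "yes":
                errors.append(f"{field}: required value missing")
            continue
        if rule["max_len"] and len(value) > int(rule["max_len"]):
            errors.append(f"{field}: longer than {rule['max_len']} characters")
        if rule["picklist"] and value not in rule["picklist"].split("|"):
            errors.append(f"{field}: {value!r} not in pick list")
    return errors
```

Changing the MATERIAL domain then means editing one cell of the rules table, not the validation logic.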
Colonial Pipeline took this one step further and validated the entire CAD drawing including both the drawing space and the paper space (Frame, Titles etc.)
When we talk about topology there are three primary relationships we can look for:
Connectivity
Adjacency
Enclosure
Most utility networks have connectivity rules:
A water main must connect to a smaller water main through a reducer
The high side of a transformer can only connect to the primary conductors
Some networks, like hydrographic networks, also include enclosure rules
An island in a lake must be inside a lake boundary, but can’t touch the boundary
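The enclosure rule for islands can be expressed as two tests: no island edge may cross or touch a lake edge, and (given no contact) one island vertex must lie inside the lake. A minimal sketch in plain Python, assuming simple closed rings (first point equals last point), exact arithmetic, and a lake without other holes; the names are hypothetical:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Ring = List[Point]  # closed ring: first point == last point


def _orient(p: Point, q: Point, r: Point) -> float:
    # Cross product: > 0 counter-clockwise, < 0 clockwise, 0 collinear.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])


def _within(p: Point, q: Point, r: Point) -> bool:
    # r is collinear with p-q: is it inside the segment's bounding box?
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0])
            and min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))


def _touch_or_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments a-b and c-d share any point."""
    o1, o2 = _orient(a, b, c), _orient(a, b, d)
    o3, o4 = _orient(c, d, a), _orient(c, d, b)
    if o1 * o2 < 0 and o3 * o4 < 0:
        return True
    return ((o1 == 0 and _within(a, b, c)) or (o2 == 0 and _within(a, b, d))
            or (o3 == 0 and _within(c, d, a)) or (o4 == 0 and _within(c, d, b)))


def _point_in_ring(pt: Point, ring: Ring) -> bool:
    """Ray casting (even-odd rule); boundary points are caught by the edge test."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray, so y1 != y2
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def island_rule_ok(island: Ring, lake: Ring) -> bool:
    """Island must lie inside the lake and must not touch the lake boundary."""
    lake_edges = list(zip(lake, lake[1:]))
    for a, b in zip(island, island[1:]):
        if any(_touch_or_cross(a, b, c, d) for c, d in lake_edges):
            return False  # touches or crosses the shoreline
    # No contact, so the island is either entirely inside or entirely outside.
    return _point_in_ring(island[0], lake)
```

In FME the same rule would typically be a SpatialFilter or SpatialRelator test; the sketch only spells out what "inside but not touching" means geometrically.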
Topology rules can be formalized in a feature catalog, e.g. an ISO 19110 feature catalog, or defined as a rule set in a database, such as the ArcGIS Pro Utility Network rules.
NRCan
Esri Utility Networks
NRCan (Natural Resources Canada): the CCCOT/CCMEO division is responsible for Canada’s national map.
NRCan uses FME in a wide range of data production and validation tasks. All their data is described with an ISO 19110-compliant feature catalog, and NRCan uses these feature catalogs to drive the validation process.
Data validation uses the catalog to ensure attribute and topological compliance.
A bit more on the ISO 19110 feature catalog here
FME reads the feature catalog and then validates the data against the catalog rules. DatabaseJoiner is the key transformer for grabbing the correct rule for each feature being validated. They have built a series of custom transformers for each specific test. Catalog validations include:
Spatial relations validation
Domain attribute validation
Proximity validation
Minimal dimension validation
Segmentation validation
Data clipping validation
You can see the similar pattern here. A catalog of your rules either in a spreadsheet or database, and then a specific custom transformer to validate that rule. This makes maintenance of your validation rules easier, if there is a comprehensive set of rules.
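The same catalog-plus-tests pattern can be sketched as a registry: each test is one small function (the equivalent of one custom transformer), and a catalog lists which tests and parameters apply to each feature class. The catalog entries, feature class name, and test names below are hypothetical; in NRCan's workflow the lookup is done with a DatabaseJoiner against the ISO 19110 catalog:

```python
import math
from typing import Callable, Dict, List


# One explicit test per function, like one custom transformer per rule.
def domain_test(feature: dict, field: str, allowed: List[str]) -> bool:
    """Domain attribute validation: the value must be in the allowed list."""
    return feature.get(field) in allowed


def min_length_test(feature: dict, min_len: float) -> bool:
    """Minimal dimension validation: total line length must reach min_len."""
    coords = feature["coords"]
    length = sum(math.dist(p, q) for p, q in zip(coords, coords[1:]))
    return length >= min_len


TESTS: Dict[str, Callable[..., bool]] = {
    "domain": domain_test,
    "min_length": min_length_test,
}

# Hypothetical catalog: feature class -> list of (test name, parameters).
CATALOG = {
    "road_segment": [
        ("domain", {"field": "surface", "allowed": ["paved", "gravel"]}),
        ("min_length", {"min_len": 5.0}),
    ],
}


def validate(feature: dict, catalog: dict = CATALOG) -> List[str]:
    """Return the names of failed tests for one feature."""
    rules = catalog.get(feature["class"])
    if rules is None:
        return ["no catalog entry for feature class"]
    return [name for name, params in rules if not TESTS[name](feature, **params)]
```

Adding a rule to the catalog needs no code change as long as the test it names already exists, which is what keeps a large rule set maintainable.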
We’ve been working with our colleagues at Esri to build migration workspaces from the ArcGIS Geometric Network to the Esri Utility Network Asset Package.
Migrating to Utility Networks involves creating high fidelity devices from ‘simple’ devices in the original ArcGIS Geometric Network.
Success in a migration like this, or any other data migration, depends on understanding the quality of the source data. Garbage in, garbage out. Validation will tell you:
Do you have to do clean-up before you can start the migration?
Can the migration workflows include some clean-up?
FME has tools to help with these decisions. We’ve already looked at AttributeValidator for assessing attribute values and domains, and at some topology validation. FME also has tools you can use to check geometric network connectivity. This might include:
Validate the connectivity of lines - water lines, conductors
Validate devices sit on vertices on lines
Detect missing junctions - such as T’s or Taps
Check for duplicate device locations and duplicate vertices on lines
You might also have to check database relationships:
Check relationships - device to device unit tables
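Two of the checks above, duplicate device locations and devices that do not sit on a line vertex, reduce to comparing coordinates. A minimal sketch, assuming devices keyed by a hypothetical ID and coordinates compared after rounding to a fixed number of decimals (points that straddle a rounding boundary can be missed; a proper tolerance search is more robust):

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _key(pt: Point, ndigits: int) -> Point:
    # Round coordinates so near-identical locations compare equal.
    return (round(pt[0], ndigits), round(pt[1], ndigits))


def duplicate_devices(devices: Dict[str, Point], ndigits: int = 3) -> List[List[str]]:
    """Groups of device IDs that share the same (rounded) location."""
    by_location: Dict[Point, List[str]] = defaultdict(list)
    for device_id, pt in devices.items():
        by_location[_key(pt, ndigits)].append(device_id)
    return [ids for ids in by_location.values() if len(ids) > 1]


def devices_off_vertices(devices: Dict[str, Point], line_vertices: List[Point],
                         ndigits: int = 3) -> List[str]:
    """IDs of devices that do not coincide with any line vertex."""
    vertex_set = {_key(v, ndigits) for v in line_vertices}
    return [d for d, pt in devices.items() if _key(pt, ndigits) not in vertex_set]
```

In a workspace these tests map roughly onto a Matcher for duplicates and a PointOnLineOverlayer or SpatialFilter for device placement; the sketch only shows the comparisons.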
NetworkTopologyCalculator - builds a connected network and gives each network an ID. Very good for visualizing network inconsistencies
SpatialFilter - great for identifying objects that are supposed to connect but do not.
TopologyBuilder and PointOnLineOverlayer can build lists of connected features at nodes/junctions, which can be analysed to detect missing junctions and check the junction type. A good example is a T-connector: if a water pipe has a lateral line from a wMain, there should be a T-connector at that junction. Similarly, there should be a reducer at a node where the pipe diameter changes, and a tap at an intersection of three conductors.
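The node analysis can be sketched by grouping pipes by their end nodes (roughly what the node lists from TopologyBuilder provide) and applying two example rules: three pipes meeting need a tee, and a diameter change between two pipes needs a reducer. The rule set, fitting type names, and data layout are hypothetical simplifications; nodes are matched on exact coordinates:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Pipe = Tuple[str, Point, Point, int]  # (pipe_id, start, end, diameter)


def junction_errors(pipes: List[Pipe], fittings: Dict[Point, str]) -> List[str]:
    """Flag nodes that are missing an expected fitting."""
    diameters_at: Dict[Point, List[int]] = defaultdict(list)
    for _pipe_id, start, end, diameter in pipes:
        diameters_at[start].append(diameter)
        diameters_at[end].append(diameter)

    errors = []
    for node, diameters in diameters_at.items():
        fitting = fittings.get(node)
        if len(diameters) == 3 and fitting != "tee":
            errors.append(f"{node}: 3 pipes meet but no tee")
        elif len(diameters) == 2 and diameters[0] != diameters[1] and fitting != "reducer":
            errors.append(f"{node}: diameter changes {diameters[0]} -> {diameters[1]} but no reducer")
    return errors
```

For example, with an 8” main feeding a 6” main and a tee already recorded at a three-way node, only the missing reducer is reported.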
DEMO: TopologyValidation.fmw
Building a workspace that validates a dataset is one way of automating your validation process. AutoCAD Drawing Standards (DWS) files, for example, include tools for validating layers, attribution, etc., but processing drawings with DWS can still be a very manual process (Colonial Pipeline talks about this in one of their FME presentations). An FME workspace can encapsulate all your validation rules in one workflow. In addition, you can automate how those validation tasks are triggered using FME Server: directory or FTP site watchers, emailing data for validation, or drag-and-drop.
Opportunity to mention Automations in terms of event-based workflows.
Opportunity to mention FME Data Express in terms of self-serve options (anyone can run your validation workspace on their mobile device, all they need to do is pass in the file they want to validate!)
Simple example here on the FME Server demos (link on image)
In conclusion:
Clean data is key to working with data in today’s world, where we have highly integrated data systems
Validating your data should be a key part of your data processes
FME can help you validate attributes, geometries, and topologies
Message repeat:
Why is compliance important? Because:
Garbage in garbage out
Bad data wastes time and resources
No one has fun working with bad data
FME is a great tool for helping you to validate your data
FME has all the tools you need to check every part of your datasets, no matter what the format. There are also tools for repairing your data, but that is a topic for another day!
QA should be a part of EVERY WORKFLOW.
Call to action: Talk to the experts team for ideas or review some of these materials to find ideas on data validation workflows that work for you
Here are some references to the data validation stories that I’ve mentioned in this presentation + other resources
Kitchener: https://www.safe.com/presentation/improving-data-compliancy-using-fme/
Colonial: https://www.safe.com/presentation/cad-data-validation/
Safe: https://www.safe.com/webinars/data-validation-webinar/
NRCan: https://www.safe.com/presentation/nrcan-map-production-system/
IMDF: https://knowledge.safe.com/articles/73930/creating-and-validating-imdf-format-datasets.html
Safe: https://blog.safe.com/2014/11/data-quality-checklist
IMDF Validator: https://www.safe.com/free-tools/imdf-validator/
Other resources
Consortech https://www.safe.com/presentation/fme-validate-cad-file-submittals-utilities/
Consortech https://www.safe.com/presentation/improving-data-integration-and-quality-through-digital-uploads/
MCE https://www.safe.com/presentation/reporting-summary-information-of-spatial-datasets-and-non-compliance-issues-using-fme-workspaces/
Metria https://www.safe.com/presentation/automated-quality-controls-with-fme/
Whitestar https://www.safe.com/presentation/using-fme-to-compile-validate-and-maintain-a-4-million-oil-and-gas-well-database/