Geophy CTO Sander Mulders presented the company's metadata platform at our March meetup at Skills Matter's CodeNode. The talk covered how Geophy use Linked Data approaches to accelerate, and improve the accuracy of, real estate tasks such as valuations.
Sander described the thousands of data sources they consume, how they use RDF for data integration, and how they construct features and metadata-driven services using components such as Apache Kafka and Stardog.
3. GEOPHY
Automated Data Intake: a framework to consume thousands of public and proprietary sources.
Unified Semantic Database: one unified global ontology to link and integrate every dataset.
Powerful Enrichment Models: predictive models and forecasting for new insights.
The Geophy Data Platform
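The "one unified global ontology" idea can be shown with a tiny sketch. All URIs and predicates below are made up, and triples are modelled as plain Python tuples rather than real RDF: two independent sources describe the same building, and because both map to the same URI and vocabulary, their facts merge into a single queryable record.

```python
# Hypothetical sketch: RDF triples as (subject, predicate, object) tuples.
# Two sources, one shared URI and vocabulary.
public_register = [
    ("building:42", "rdf:type", "geo:Building"),
    ("building:42", "geo:address", "1 Main St"),
]
proprietary_feed = [
    ("building:42", "geo:valuation", 1250000),
]

# Merging RDF graphs is just a set union: shared URIs make the facts line up.
graph = set(public_register) | set(proprietary_feed)

def describe(subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in graph if s == subject}

record = describe("building:42")
print(record)
```

The same union works for thousands of sources, which is why the hard part is mapping each source into the shared ontology, not combining the graphs afterwards.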
4. GEOPHY Our Products
DATA FUSION: from semi- & unstructured to fully integrated
DATA ENRICHMENT: geospatial, semantic & temporal matching & enrichment
VALUATIONS: automated valuations using machine learning for accuracy & speed
Products: US CRE, EU CRE, EU Resi, Location Quality, Market Quality, Asset Quality, Global REIT Asset Data, US & EU Property Data, Document Structuring
10. GEOPHY
How do we get from source to feature?
How to construct the feature
11. GEOPHY
We would need some kind of service (or services) to construct the feature
How to construct the feature
12. GEOPHY
Depending on the feature, we need a combination of services, all operating in a specific way
How to construct the feature
13. GEOPHY
Now imagine doing this for thousands of features…
● Each feature would have its own engineering lifecycle, including testing, development and maintenance
● Most features might be discarded after reviewing modelling results (feature reduction)
Feature * 1000
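The shift this motivates, from one hand-written pipeline per feature to a single metadata-driven runner, can be sketched as follows. All feature names, inputs and formulas are hypothetical; the point is that adding or discarding a feature means editing data, not code.

```python
# Hypothetical sketch: a single generic runner parameterised by
# declarative feature definitions, instead of one pipeline per feature.
FEATURE_DEFINITIONS = {
    "university_high_quality": {
        "inputs": ["university_count", "quality_average"],
        "formula": lambda count, avg: count * avg,
    },
    "parking_plot": {
        "inputs": ["parking_spaces", "plot_area"],
        "formula": lambda spaces, area: spaces / area if area else 0.0,
    },
}

def run_feature(name, record):
    """Generic runner: look up the definition, gather inputs, apply the formula."""
    definition = FEATURE_DEFINITIONS[name]
    args = [record[key] for key in definition["inputs"]]
    return definition["formula"](*args)

building = {"university_count": 3, "quality_average": 0.5,
            "parking_spaces": 40, "plot_area": 200.0}
print(run_feature("university_high_quality", building))  # 1.5
```

Discarding a feature after feature reduction is then a one-line deletion from the definitions, with no pipeline code to retire.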
14. GEOPHY
We describe the way the services should run in the ontology itself:
it lives where the data lives!
Ontology to the rescue
15. GEOPHY
Service Definition
services:university_high_quality
  rdf:type config:service ;
  rdfs:comment "Service calculating a feature for the high quality universities near a building" ;
  config:query """
    # prefix block #
    DELETE { ?building ?definition_key ?oldvalue }
    INSERT { ?building ?definition_key ?value }
    WHERE {
      GRAPH <Metadata> {
        ?component
          meta:service services:university_high_quality ;
          meta:formula ?formula ;
          meta:key ?definition_key .
        # …. #
      }
      # ….. #
      # filter out the universities with a high score #
      # aggregate the scores to an average #
      BIND(f:component(?formula, ?university_count, ?quality_average_aggregated) AS ?value) .
    }
  """^^<http://geophy.io/ontologies/datatype#SPARQL> ;
Service Metadata
meta:parking_plot
  rdf:type meta:component ;
  meta:service services:university_high_quality ;
  meta:key component:university_high_quality ;
  meta:formula """
    function component(university_count, quality_average_aggregated) {
      /* javascript code calculating high quality university score */
      switch (university_count) {
        case 0:
          return 0;
        case 1:
          return … ;
        default:
          return … ;
      }
    }"""^^<http://geophy.io/ontologies/datatype#Javascript> ;
.
Example Ontology
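To make the pattern concrete, here is a plain-Python sketch of what a generic worker does with a definition like the one above: read the component's metadata, apply its formula to the building's inputs, then delete the old feature value and insert the new one. In the real platform Stardog evaluates the SPARQL and the JavaScript formula; every name and value below is hypothetical.

```python
# Hypothetical sketch: the meta:component triples reduced to a dict, the
# database to a set of (subject, predicate, object) tuples.
component = {
    "key": "component:university_high_quality",
    "formula": lambda university_count, quality_average: (
        0 if university_count == 0 else university_count * quality_average
    ),
}

store = {
    ("building:1", "geo:university_count", 4),
    ("building:1", "geo:quality_average", 0.75),
}

def run_service(meta, store, building):
    """Apply a component's formula to a building and upsert the result."""
    def lookup(predicate):
        return next(o for s, p, o in store if s == building and p == predicate)

    value = meta["formula"](lookup("geo:university_count"),
                            lookup("geo:quality_average"))
    # DELETE the old value / INSERT the new one, as in the SPARQL template.
    store = {t for t in store if not (t[0] == building and t[1] == meta["key"])}
    store.add((building, meta["key"], value))
    return store

store = run_service(component, store, "building:1")
```

Because the formula and key live in the metadata graph, the worker itself never changes when a new feature is defined.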
16. GEOPHY
Since we don’t have control over the data sources, new data can come in at any time.
Data is updating continuously
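One way to cope with sources that update at any time, sketched with hypothetical names: record which features read which predicates, and when a new fact arrives mark only the dependent features for recomputation instead of rebuilding everything.

```python
# Hypothetical dependency map: predicate -> features that read it.
DEPENDS_ON = {
    "geo:university_count": ["university_high_quality"],
    "geo:parking_spaces": ["parking_plot"],
}

def on_new_fact(predicate, dirty):
    """Mark every feature that depends on the changed predicate as dirty."""
    for feature in DEPENDS_ON.get(predicate, []):
        dirty.add(feature)

dirty = set()
on_new_fact("geo:university_count", dirty)     # a source pushed a new count
on_new_fact("geo:unrelated_predicate", dirty)  # no feature reads this one
```

A message bus such as the Kafka setup mentioned in the talk could carry these change events; the point here is only that updates drive targeted recomputation.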
21. GEOPHY
We have thousands of sources outside our control, and data scientists asking for hundreds of features
Scaling Out
22. GEOPHY
We have thousands of sources outside our control, and data scientists asking for hundreds of features
3 core principles
23. GEOPHY Got you thinking?
We are looking for people to join our team in
Delft, New York, London (or remote)
Software Engineers {Kafka - Java/Scala - Graph}
Ontologists
Data Scientists
Data Engineers