© 2017 CoreLogic, Inc. [NYSE:CLGX] All Rights Reserved. Proprietary.
Powering Real Estate Property Analytics with MongoDB + Spark
Gheni Abla, for DART, CoreLogic
June 21, 2017
Learning Objectives
- Managing and storing data for real estate properties in MongoDB at CoreLogic®
- Distributing large-scale analytics processing using Spark
- Using MongoDB replication to implement high availability between two geographically dispersed data centers
CoreLogic: Provider of Property Data, Financial Data, Analytics and Services
- Market cap: $3.3 billion*
- Operations: 8 countries
- Employees: 6,000+ worldwide*
- Principal markets: U.S. & Australia
- Principal businesses: Property Intelligence; Risk Management & Workflow
- Headquarters: Irvine, CA
* as of Feb 27, 2017
Unique Insights and Reach Across the Housing Ecosystem
- Rental properties: the consumer rents an apartment; clients are property managers and property owners; CoreLogic reaches 1 of 3 rental properties
- Real estate: the consumer decides to buy a house; clients are realtors, property information services and contractors; CoreLogic reaches 70% of real estate agents
- Mortgage & capital markets: the consumer needs financing or refinancing; clients are lenders, servicers, capital markets and GSEs; CoreLogic reaches 3 out of every 4 loans
- Insurance: the consumer needs insurance and makes claims; clients are insurance carriers and re-insurance; CoreLogic reaches 70% of homeowner insurance policies
- Government: the consumer expects regulatory protection; clients are government and regulators; CoreLogic reaches almost every housing regulator
CoreLogic solutions across the ecosystem: Underwriting, Risk Management, Valuations, Market Intelligence
Property Information Differentiators
- Accuracy: 99.5% standard of accuracy, driven by automated keying processes
- Depth of detail: 5,000+ data fields
- Breadth of coverage: 99.9% of U.S. property records, across 3,100+ counties
- Deep property data: 4.5B+ records spanning more than 50 years
- Rapid daily data refresh
Complete. Current. Connected.
MongoDB: Repository for Multiple Data Sets
- A new addition, not the entire data warehouse
- Data for every real estate property in the US
  - Location, address, ZIP code, city, county, characteristics, owner…
  - Sale transaction history, loans, payments
- Computation results
  - Estimated values
  - Confidence scores
  - Statistical distributions
[Diagram: MongoDB data sets (Property Data, Property Transactions, MLS Data, AVM, Building Permits, Payments) organized by ZIP code, ZIP1 to ZIP4]
MongoDB as a Repository for Real Estate Property Data
- MongoDB over HDFS
  - Used both for batch computation and for serving data to real-time applications
  - Most applications are read-heavy
    - Data is updated daily, weekly or monthly
    - Analytics run more frequently
- Schema-less
  - Not all records are the same: good support for storing sparse data
  - Property information and disclosure rules differ state by state and county by county
  - Data sets are getting richer every day, but not at the same rate for every property
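Because documents in a MongoDB collection need not share a schema, each property can carry only the fields its county discloses. The sketch below uses plain Python dicts as stand-ins for documents; the field names (`apn`, `county`, `sale_price`, `pool`) and values are hypothetical, and `matches_exists` only mimics the semantics of MongoDB's `$exists` operator for top-level fields.

```python
from typing import Any, Dict, List

# Hypothetical property documents; each carries only the fields its county discloses.
PROPERTIES: List[Dict[str, Any]] = [
    {"apn": "001-01", "county": "A", "beds": 3, "baths": 2, "sale_price": 650000},
    {"apn": "002-07", "county": "B", "beds": 4, "baths": 3},  # sale price not disclosed
    {"apn": "003-12", "county": "C", "beds": 2, "baths": 1, "pool": True},
]


def matches_exists(doc: Dict[str, Any], field: str, should_exist: bool = True) -> bool:
    """Mimic {field: {"$exists": should_exist}} for a top-level field.

    Like $exists: true, this also matches a field that is present but set to None.
    """
    return (field in doc) == should_exist


def field_coverage(docs: List[Dict[str, Any]], field: str) -> float:
    """Fraction of documents that carry `field` at all."""
    if not docs:
        return 0.0
    return sum(field in d for d in docs) / len(docs)


with_price = [d["apn"] for d in PROPERTIES if matches_exists(d, "sale_price")]
print(with_price)                          # ['001-01']
print(field_coverage(PROPERTIES, "beds"))  # 1.0
```

With PyMongo, the equivalent server-side filter would be `collection.find({"sale_price": {"$exists": True}})`; absent fields cost no storage, which is what makes sparse property records cheap to keep.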
MongoDB as a Repository for Real Estate Property Data
- Support for replication
  - Data is replicated within data centers and across multiple data centers
  - Support for parallel reads
- Support for sharding
  - Used to separate data across storage media
  - SSD for frequently accessed data, rotational disks for less frequently accessed data
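The deck does not say which sharding mechanism separated data across storage media. One common approach is MongoDB zone sharding (tag-aware sharding in older releases): shard-key ranges are assigned to zones, and each zone is pinned to shards on a particular kind of hardware. The toy model below only illustrates the range-to-zone lookup; the shard key (ZIP code), the ranges and the zone names are all hypothetical, and in a real cluster the mapping is configured with the `sh.addShardToZone` and `sh.updateZoneKeyRange` shell helpers, not in application code.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneRange:
    min_key: str  # inclusive lower bound of the shard-key range
    max_key: str  # exclusive upper bound, as in MongoDB zone ranges
    zone: str     # hypothetical zone name tied to a storage tier


# Hypothetical layout, sorted by min_key: one ZIP range on SSD-backed shards, the rest on HDD.
RANGES = [
    ZoneRange("00000", "90000", "hdd"),
    ZoneRange("90000", "96200", "ssd"),
    ZoneRange("96200", "99999", "hdd"),
]
_LOWER_BOUNDS = [r.min_key for r in RANGES]


def zone_for(zip_code: str) -> Optional[str]:
    """Return the zone whose [min_key, max_key) range contains zip_code, if any."""
    i = bisect.bisect_right(_LOWER_BOUNDS, zip_code) - 1
    if i >= 0 and RANGES[i].min_key <= zip_code < RANGES[i].max_key:
        return RANGES[i].zone
    return None  # not covered by any zone range


print(zone_for("92618"))  # ssd
print(zone_for("10001"))  # hdd
```

Zone ranges are expressed in terms of the shard key, so this layout only helps if the shard key (or its prefix) is the attribute that separates frequently accessed data from the rest.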
Efficient Support for Geospatial Information
- Store latitude, longitude and other geo data
- Search by location or area (e.g., polygon, circle)
- Geospatial operators used:
  - $geoWithin
  - $geoIntersects
  - $near
- Specialized geospatial index for fast search
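A minimal sketch of the query documents behind these operators, assuming each property's position is stored as a GeoJSON point in a hypothetical `location` field covered by a `2dsphere` index. The helpers only build filter dictionaries, so no server is needed to run them; GeoJSON coordinates are ordered longitude first, then latitude.

```python
from typing import Any, Dict, List

EARTH_RADIUS_MILES = 3963.2  # value MongoDB's docs use to convert miles to radians


def point(lon: float, lat: float) -> Dict[str, Any]:
    """GeoJSON point; coordinates are [longitude, latitude]."""
    return {"type": "Point", "coordinates": [lon, lat]}


def near_filter(lon: float, lat: float, max_meters: float) -> Dict[str, Any]:
    """Nearest-first search; $maxDistance is in meters for GeoJSON geometries."""
    return {"location": {"$near": {"$geometry": point(lon, lat), "$maxDistance": max_meters}}}


def within_circle_filter(lon: float, lat: float, radius_miles: float) -> Dict[str, Any]:
    """Unsorted search inside a circle; $centerSphere takes its radius in radians."""
    radians = radius_miles / EARTH_RADIUS_MILES
    return {"location": {"$geoWithin": {"$centerSphere": [[lon, lat], radians]}}}


def within_polygon_filter(ring: List[List[float]]) -> Dict[str, Any]:
    """Search inside a polygon such as a neighborhood boundary."""
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # GeoJSON rings must be closed: first position == last
    return {"location": {"$geoWithin": {"$geometry": {"type": "Polygon", "coordinates": [ring]}}}}


def intersects_filter(geometry: Dict[str, Any]) -> Dict[str, Any]:
    """Match stored geometries (points or parcel shapes) that intersect `geometry`."""
    return {"location": {"$geoIntersects": {"$geometry": geometry}}}
```

With PyMongo these would be used as, for example, `collection.create_index([("location", "2dsphere")])` followed by `collection.find(near_filter(-117.8, 33.68, 1600))`. `$near` returns results sorted by distance, while `$geoWithin` does not sort, which makes it the cheaper choice when order does not matter.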
Computations and Analytics
- Multiple analytics processes depend on MongoDB
  - Examples: residential property appraisal, automatic home valuation, marketing and propensity models, etc.
  - Executed daily, weekly or monthly
- A Spark cluster is used for distributed computation
  - Computation is distributed by geographic entity, e.g., ZIP codes, counties, states
  - Computation results are also stored in MongoDB
[Diagram: a Spark cluster master assigns ZIP codes (ZIP1 to ZIP4) to workers; the worker handling ZIP2 reads the data for ZIP2 from each data set (Property Data, Property Transactions, MLS Data, AVM, Building Permits, Payments)]
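The production code is Scala on Spark; the sketch below only illustrates the partition-by-geography idea in plain Python, with a thread pool standing in for Spark workers. In Spark this roughly corresponds to repartitioning a DataFrame by a ZIP-code column (for example with `DataFrame.repartition`) and computing per partition. The record layout, the `zip` and `price` fields and the per-ZIP statistic are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import median
from typing import Any, Dict, List


def group_by_zip(records: List[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket records by ZIP code, the unit of work handed to each worker."""
    groups: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for r in records:
        groups[r["zip"]].append(r)
    return groups


def analyze_zip(zip_code: str, records: List[Dict[str, Any]]) -> Dict[str, Any]:
    # Hypothetical per-ZIP computation; the real jobs ran regressions and comp searches here.
    prices = [r["price"] for r in records if "price" in r]
    return {"zip": zip_code, "count": len(records),
            "median_price": median(prices) if prices else None}


def run_by_zip(records: List[Dict[str, Any]], max_workers: int = 4) -> List[Dict[str, Any]]:
    groups = group_by_zip(records)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(analyze_zip, z, rs) for z, rs in groups.items()]
        results = [f.result() for f in futures]
    # In the deck, results are written back to MongoDB; here they are returned sorted by ZIP.
    return sorted(results, key=lambda r: r["zip"])
```

One trade-off of partitioning by ZIP code is that comparables near a ZIP boundary may sit in a neighboring partition; the deck does not say how that case was handled.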
Example: Appraisal Adjustment Model
- A software tool that helps home appraisers quickly complete their assessments and produce higher-quality appraisals
- Two components:
  - Static Model (batch process)
    - Calculates the data required by the regression model (market price, tiers, etc.)
  - Dynamic Model (model service)
    - Comparable property search
    - Regression-based analytics
[Diagram: a weekly batch job runs the Static Model (comparable selection, market price, tier calculations, outlier detection) on the data feed and stores the Comps Stats, Property Data, HPI and ZIPs collections in the database. In the service tier, the Dynamic Model, a REST web service for comparable search and regression, reads those collections and serves a mobile app.]
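The slide lists market price, tier calculations and outlier detection as batch steps without giving formulas. Purely as an illustration, the sketch below drops sale prices outside Tukey's 1.5×IQR fences and then splits the remaining prices into three equal-count tiers at the terciles; both choices are assumptions, not the deck's documented method.

```python
from bisect import bisect_right
from statistics import quantiles
from typing import List


def remove_outliers(prices: List[float], k: float = 1.5) -> List[float]:
    """Drop values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(prices, n=4)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [p for p in prices if lo <= p <= hi]


def tier_cutoffs(prices: List[float], tiers: int = 3) -> List[float]:
    """Cut points that split prices into `tiers` equal-count groups."""
    return quantiles(prices, n=tiers)


def tier_of(price: float, cutoffs: List[float]) -> int:
    """0 = lowest tier, len(cutoffs) = highest tier."""
    return bisect_right(cutoffs, price)
```

In a per-ZIP batch job, `remove_outliers` would run on each ZIP code's recent sales (in thousands of dollars, say) before the tier cutoffs are computed and stored for the Dynamic Model to read.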
Static Model Runs via Spark
- Spark: cluster computing platform
  - Task scheduling
  - Memory management
  - Fault recovery
  - Support for database access (including MongoDB)
  - Support for Scala, Java, Python
- The model code is written in Scala
- MongoDB is accessed via the MongoDB Spark Connector
  - Good performance
  - Direct dataset inference: MongoDB data is loaded as Spark DataFrames
- Computation performed:
  - Hundreds of millions of regressions
  - Hundreds of millions of geo searches for comparable properties
  - ~100 million properties analyzed and provided with key statistics
  - ~10 hours to run for all houses in the US
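As a rough consistency check on the slide's approximate figures (not a measured result), ~100 million properties in ~10 hours works out to an average of a few thousand properties per second across the whole cluster, ignoring start-up time and uneven ZIP-code sizes:

```latex
\frac{\sim 10^{8}\ \text{properties}}{\sim 10\ \text{h} \times 3600\ \text{s/h}}
  = \frac{10^{8}\ \text{properties}}{3.6 \times 10^{4}\ \text{s}}
  \approx 2.8 \times 10^{3}\ \text{properties/s}
```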
Static Model: Batch Processing
[Diagram: in a single data center, an ETL server loads data from the data sources into the MongoDB primary, which replicates to two replicas; each member holds the Comps Stats, Property Data, HPI and ZIPs collections. A Spark cluster master assigns ZIP codes (ZIP1 to ZIP4) to workers, and each worker processes the addresses in its ZIP code.]
Dynamic Model: Service Tier
- RESTful API
- Request includes:
  - Standardized address
  - Neighborhood boundaries
- Regression with four independent variables to calculate the adjusted price:
  - Number of beds, number of baths, building square feet, land square feet
- Response returns:
  - Adjusted value of the house
  - Comparable houses used in the regression
  - Statistics of the adjustments, to measure confidence
- Model code is written in Scala; MongoDB is accessed via Casbah
- Configured for multi-data-center redundancy
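The slide names the four regressors but not the estimation method or how the adjustments are applied. The sketch below assumes ordinary least squares over the comparables, price = b0 + b1·beds + b2·baths + b3·building sq ft + b4·land sq ft, solved through the normal equations in pure Python, and then applies the usual sales-comparison adjustment: each comparable's price is shifted by the coefficient-weighted difference between the subject and that comparable. Field names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical field names for the four regressors named on the slide.
FEATURES = ("beds", "baths", "bldg_sqft", "land_sqft")


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: comparables do not vary enough")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(comps: Sequence[Dict[str, float]]) -> List[float]:
    """OLS fit; returns [intercept, b_beds, b_baths, b_bldg_sqft, b_land_sqft]."""
    if len(comps) <= len(FEATURES):
        raise ValueError("need more comparables than regressors")
    # Scale each feature by its largest magnitude to keep the normal equations well conditioned.
    scale = [max(abs(float(c[f])) for c in comps) or 1.0 for f in FEATURES]
    rows = [[1.0] + [float(c[f]) / s for f, s in zip(FEATURES, scale)] for c in comps]
    y = [float(c["price"]) for c in comps]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    # Undo the scaling so slopes are per bed, per bath and per square foot.
    return [beta[0]] + [bj / s for bj, s in zip(beta[1:], scale)]


def adjusted_value(subject: Dict[str, float], comps: Sequence[Dict[str, float]],
                   coef: List[float]) -> Tuple[float, List[float]]:
    """Shift each comp's price by coef * (subject - comp), then average the results."""
    adjusted = [
        c["price"] + sum(b * (subject[f] - c[f]) for b, f in zip(coef[1:], FEATURES))
        for c in comps
    ]
    return sum(adjusted) / len(adjusted), adjusted
```

With real comparables the adjusted prices spread out, and their dispersion is one natural candidate for the "statistics of the adjustments" the service returns (an assumption). A production version would more likely call a QR- or SVD-based least-squares routine such as `numpy.linalg.lstsq` than solve the normal equations by hand.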
Dynamic Model
[Diagram: a Global Traffic Manager routes requests to the service tier in Data Center 1 or Data Center 2, where Tomcat servers host the Dynamic Model. One MongoDB replica set (a primary and four replicas) spans both data centers, and every member holds the Comps Stats, Property Data, HPI and ZIPs collections. Requests are sent to the topologically nearest MongoDB server.]
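Sending each request to the topologically nearest MongoDB server corresponds to the `nearest` read preference. Under the MongoDB server-selection specification, a driver considers every eligible member (primary or secondary), keeps those whose round-trip time is within `localThresholdMS` (15 ms by default) of the fastest one, and picks one of them at random. The sketch below models only that rule; member names and latencies are hypothetical, and real drivers also filter by tag sets (and optionally staleness) first.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Member:
    host: str
    state: str     # e.g. "PRIMARY", "SECONDARY", "RECOVERING"
    rtt_ms: float  # driver's average round-trip time to this member


def nearest_candidates(members: List[Member], local_threshold_ms: float = 15.0) -> List[Member]:
    """Members eligible under read preference 'nearest', after the latency window."""
    eligible = [m for m in members if m.state in ("PRIMARY", "SECONDARY")]
    if not eligible:
        return []
    fastest = min(m.rtt_ms for m in eligible)
    return [m for m in eligible if m.rtt_ms <= fastest + local_threshold_ms]


def select_nearest(members: List[Member], local_threshold_ms: float = 15.0,
                   rng: random.Random = random.Random()) -> Optional[Member]:
    """Pick one candidate at random, spreading load across the nearby members."""
    candidates = nearest_candidates(members, local_threshold_ms)
    return rng.choice(candidates) if candidates else None
```

From a Tomcat server in Data Center 1, the two local members fall inside the window and the remote ones do not. With PyMongo the same behavior is requested through the connection string, e.g. `MongoClient("mongodb://host1,host2/?replicaSet=rs0&readPreference=nearest")` (hosts hypothetical). Read preference affects reads only; writes from both data centers still go to the single primary.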
Conclusion
- MongoDB provided powerful support for storing and searching location-based real estate property data
- MongoDB's replication capability provided high availability across data centers
- MongoDB supports the data needs of both batch-oriented distributed computation and real-time web services
- The Scala language, the Casbah library and the MongoDB Spark Connector facilitated seamless integration between data access and analytics
