James Kerr
Senior Solutions Architect
james.kerr@mongodb.com
Conquering Data Proliferation
2
Part 2 in the Data Management Series
• Part 1: From Relational to MongoDB
• Part 2: Conquering Data Proliferation
• Part 3: Bulletproof Data Management
Topics: data integration, capturing data changes, engaging with your data
3
Agenda
• Today's Problem
• Systems of Engagement
• Single View of…
• Changing Data
• Summary
Today's Problem
5
6
Result
• Data walled off in "silos"
• Can't get a complete picture
• Have to "swivel chair" system to system
• Hard to find new avenues to add value
• Frustrated ops
• Frustrated customers
7
Example
• 20+ million Veterans in the US today
• 250,000+ employees at Veterans Affairs
• $3.9 billion for IT in 2015 budget
• What happens when a Veteran has to change their
address with the VA?
• How does a doctor see a single view of a Veteran's
health record?
Systems of Engagement
9
Next Big Wave of Change
Today's Systems of Record were
yesterday's Systems of Engagement!
Enterprise IT Transition From
• Systems of Record
To the Next Stage
• Systems of Engagement
10
Definition
• Incorporate technologies which encourage peer
interactions
• More decentralized
• More options for infrastructure especially cloud
• Enable new / faster interactions
11
Notional Architecture
[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; …]
12
Many Complexities to Tackle
• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
– Versioning
– Merging changes
• Security / Entitlements
13
Focus for Today
• Data Extraction (ETL)
• Change Data Capture (CDC)
• Data Governance
• Data Lineage
– Versioning
– Merging changes
• Security / Entitlements
Getting Started
15
Don't Boil the Ocean
• Information is often spread across multiple systems of
record
• Start with a read-only view of that information
• Target high value/impact data – "moments of
engagement"
16
Example – Single View of a Health Record
• Veteran's view
• Doctor's view
• Case worker's view
17
Single View Architecture
[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; …; ETL; record; record]
18
Single View – Why MongoDB?
• Dynamic schema
• Rich querying
• Aggregation framework
• High scale/performance
• Auto-sharding
• Map-reduce capability (Native MR or Hadoop Connector)
• Enterprise Security Features
19
Systems of Record Data Model
• Continuity of Care Record (CCR) XML docs
• Pulled some examples from
http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples
...
<Immunizations>
<Immunization>
<CCRDataObjectID>BB0022</CCRDataObjectID>
<DateTime>
<Type>
<Text>Start date</Text>
</Type>
<ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
</DateTime>
<Source>
<Actor>
<ActorID>Jane Smith</ActorID>
<ActorRole>
<Text>Ordering clinician</Text>
</ActorRole>
</Actor>
</Source>
...
20
Systems of Record Data Model
...
<Medications>
<Medication>
<CCRDataObjectID>52</CCRDataObjectID>
<DateTime>
<Type>
<Text>Prescription Date</Text>
</Type>
<ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime>
</DateTime>
<Type>
<Text>Medication</Text>
</Type>
<Source>
<Actor>
<ActorID>Rx History Supplier</ActorID>
</Actor>
</Source>
<Product>
<ProductName>
<Text>TIZANIDINE HCL 4 MG TABLET TEV</Text>
<Code>
<Value>-1</Value>
<CodingSystem>omi-coding</CodingSystem>
<Version>2005</Version>
...
21
Engagement Data Model
• Leverage dynamic schema / flexible data model
• Use an envelope/wrapper pattern
Source Data
Master Data /
Common Data Model
Metadata
Integrated Data
Metadata
22
Data Flow
1. Read most recent CCRs from each source system
2. Create a source document for each CCR in our system of engagement database
   a. Transform XML to JSON for the source data
   b. Record the system and date in the metadata
   c. Pull out the patient's identifying information to the common data
   d. Generate an Id for the raw file
3. Store the original CCR XML into GridFS
4. After each source document is created, update the integrated document for the patient
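Step 2 of the flow can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the deck's implementation: the helper names `element_to_dict` and `build_source_document`, the `patient_id` argument (standing in for whatever logic extracts the patient's identity from a real CCR), and the use of a random UUID as the raw file Id are all hypothetical. Writing to MongoDB and GridFS is left out.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed-down CCR fragment modeled on the immunization example in the deck.
SAMPLE_CCR = """
<Immunizations>
  <Immunization>
    <CCRDataObjectID>BB0022</CCRDataObjectID>
    <DateTime>
      <Type><Text>Start date</Text></Type>
      <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime>
    </DateTime>
  </Immunization>
</Immunizations>
"""


def element_to_dict(elem):
    """Recursively convert an XML element into nested dicts; leaf text becomes a string."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list so none are lost.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def build_source_document(ccr_xml, system, patient_id):
    """Wrap one CCR in the envelope: meta, common, source, raw_id."""
    root = ET.fromstring(ccr_xml.strip())
    return {
        # 2b: record the originating system and the load date
        "meta": {"system": system, "lastUpdate": datetime.now(timezone.utc)},
        # 2c: identifying information (assumed to be resolved by the caller)
        "common": {"patient": patient_id},
        # 2a: XML converted to a JSON-like structure
        "source": {root.tag: element_to_dict(root)},
        # 2d: hypothetical Id under which the raw XML would be stored
        "raw_id": str(uuid.uuid4()),
    }


doc = build_source_document(SAMPLE_CCR, "EHR1", "patient-123")
```

The resulting dictionary has the same envelope shape as the documents on the following slides, so it could be handed to a MongoDB driver's insert call as-is.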
23
Engagement Data Model - Metadata
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
meta : {
system : "EHR1",
lastUpdate : ISODate(...)
...
},
common : { ... },
source : { ... },
raw_id : "..."
}
24
Engagement Data Model - Source
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
...
source : {
...
Immunizations : {
Immunization : {
CCRDataObjectID :"BB0022",
DateTime : {
Type : {
Text :"Start date"
},
ExactDateTime :"1998-06-13T05:00:00Z"
},
Source : {
Actor : {
ActorID :"Jane Smith",
ActorRole : {
Text :"Ordering clinician"
}
}
},
...
},
...
}
25
Engagement Data Model - Common
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
...
common : {
patient : "D6E5D510-592D-C613-DB46..."
},
...
}
26
Engagement Data Model - Integrated
{
_id : ObjectId("556b92b83f7e775b8e92b30d"),
...
meta : {
lastUpdate : ISODate(...),
integrated : [
{ _id : ObjectId("...a") },
{ _id : ObjectId("...b") }
]
},
common : { ... }
...
}
27
Engagement Data Model - Integrated
{
_id : ObjectId("556b92b83f7e775b8e92b30d"),
...
common : {
patient : "D6E5D510-592D-C613-DB46...",
CCRs : [
{
...
Medication : {
Product : {
ProductName :
"TIZANIDINE HCL 4 MG TABLET TEV"
}
}
...
},
{
...
Immunizations : { ... },
...
}
]
}
...
}
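Step 4 of the data flow (updating the integrated document after each source document is created) might look like the following Python sketch. It works on plain dictionaries rather than a live MongoDB collection, and the function name `update_integrated` and the duplicate check are assumptions made for illustration.

```python
from datetime import datetime, timezone


def update_integrated(integrated, source_doc):
    """Fold one source document into the patient's integrated document."""
    if integrated is None:
        # First source document seen for this patient: start a new envelope.
        integrated = {
            "meta": {"integrated": []},
            "common": {"patient": source_doc["common"]["patient"], "CCRs": []},
        }
    ref = {"_id": source_doc["_id"]}
    # Only merge a source document once, so re-running the load is harmless.
    if ref not in integrated["meta"]["integrated"]:
        integrated["meta"]["integrated"].append(ref)
        integrated["common"]["CCRs"].append(source_doc["source"])
    integrated["meta"]["lastUpdate"] = datetime.now(timezone.utc)
    return integrated


# Two hypothetical source documents for the same patient.
medication = {
    "_id": "a",
    "common": {"patient": "patient-123"},
    "source": {"Medication": {"Product": {"ProductName": "TIZANIDINE HCL 4 MG TABLET TEV"}}},
}
immunization = {
    "_id": "b",
    "common": {"patient": "patient-123"},
    "source": {"Immunizations": {"Immunization": {"CCRDataObjectID": "BB0022"}}},
}

view = update_integrated(None, medication)
view = update_integrated(view, immunization)
```

Keeping the list of contributing `_id` values in `meta.integrated` is what lets a reader of the integrated document trace each entry back to its source document.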
Engage!
29
Single View Enables New Interactions
• Deliver faster
• Deliver to new applications (mobile, etc.)
• Improve services
• New analytics
30
Changing Data
• Now that data is easy to get to, users will want to make changes
• With a single view in place, changes can be applied back to the source systems of record
• Remember the change of address scenario?
31
Example – Change of Address
• Enter in different systems
• Call different parts of the organization
• What if you have dependents that
live with you?
32
Capture Data Changes
[Diagram labels: Systems of Engagement; Data Services; Data Processing (Integration, Analytics, etc.); Systems of Record; Master Data; Raw Data; Integrated Data; …; ETL; Bus (Apache Kafka); record; record; record]
33
Engagement Data Model - Metadata
{
_id : ObjectId("556c1122c9c8f48313553be5"),
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 2,
lineage : { ... },
...
},
common : { ... },
source : { ... }
}
34
Engagement Data Model - Source
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
...
source : {
patientId : "D6E5D510-592D-C613-DB46...",
address1 : "John Smith",
address2 : null,
city : "New York",
state : "NY",
zip : "10007"
},
...
}
35
Engagement Data Model - Common
{
_id : ObjectId("556b92b83f7e775b8e92b30a"),
...
common : {
patient : "D6E5D510-592D-C613-DB46...",
address : {
addr1 : "John Smith",
city : "New York",
state : "NY",
zip : "10007"
}
},
...
}
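The relationship between the source and common sections for an address can be sketched as a simple field mapping. This is a hypothetical Python illustration: the function name `to_common`, the `addr2` key, and the rule of dropping null fields are assumptions inferred from the two example documents, and the street value is made up.

```python
# Source field name -> common-model field name. "addr2" is an assumed name;
# the deck's common document only shows addr1, city, state, and zip.
FIELD_MAP = {
    "address1": "addr1",
    "address2": "addr2",
    "city": "city",
    "state": "state",
    "zip": "zip",
}


def to_common(source):
    """Build the common section from a PatientRecords-style source section."""
    address = {
        common_key: source[source_key]
        for source_key, common_key in FIELD_MAP.items()
        if source.get(source_key) is not None  # null fields are left out
    }
    return {"patient": source["patientId"], "address": address}


source = {
    "patientId": "patient-123",
    "address1": "1 Main St",
    "address2": None,
    "city": "New York",
    "state": "NY",
    "zip": "10007",
}
common = to_common(source)
```

Each system of record would get its own mapping table, which keeps the common model stable while source layouts differ.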
36
Systems of Record Data Model
• Address records can be in different systems
• Each system can be notified of the change to the record
37
Update Process
1. User accesses an application to change their address
2. User updates their address in the System of
Engagement
3. The address change is broadcast to any Systems of
Record that have registered
4. An adapter applies the address change to the System
of Record in an application-specific manner
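Steps 3 and 4 can be sketched as a small publish/subscribe arrangement. The Python below uses an in-process stand-in for the message bus (the architecture slide names Apache Kafka for that role); the `ChangeBus` class, both adapters, and the system name "Billing" are hypothetical and exist only to show each system of record receiving the same change in its own format.

```python
class ChangeBus:
    """In-process stand-in for the message bus; not a Kafka client."""

    def __init__(self):
        self._adapters = {}

    def register(self, system, adapter):
        # A system of record registers the adapter that knows its format.
        self._adapters[system] = adapter

    def broadcast(self, change):
        # Deliver the change to every registered system and collect results.
        return {name: adapter(change) for name, adapter in self._adapters.items()}


def patient_records_adapter(change):
    """Hypothetical system that stores the address as separate fields."""
    addr = change["address"]
    return {
        "patientId": change["patient"],
        "address1": addr["addr1"],
        "address2": addr.get("addr2"),
        "city": addr["city"],
        "state": addr["state"],
        "zip": addr["zip"],
    }


def billing_adapter(change):
    """Hypothetical system that stores the address as a single line."""
    addr = change["address"]
    return f'{addr["addr1"]}, {addr["city"]}, {addr["state"]} {addr["zip"]}'


bus = ChangeBus()
bus.register("PatientRecords", patient_records_adapter)
bus.register("Billing", billing_adapter)

change = {
    "patient": "patient-123",
    "address": {"addr1": "1 Main St", "city": "New York", "state": "NY", "zip": "10007"},
}
results = bus.broadcast(change)
```

The change is expressed once in the common model; only the adapters know how each system of record stores it.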
38
Tracking Changes
• Add basic document versioning to track what changed
when
• Prefer the separate "current" and "history" collections
approach
– current contains the last updated version
– history contains all previous versions
• Can query history to see the lineage
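The "current" and "history" approach can be sketched in Python with two dictionaries standing in for the two collections. The function name `save_new_version`, the tuple used as the history key, and the shallow update of the `common` section are assumptions for illustration. Against a real database the two writes would be separate operations, so a deployment has to decide how to handle a failure between them.

```python
import copy
from datetime import datetime, timezone


def save_new_version(current, history, doc_id, changes, event, source):
    """Archive the existing version in history, then write the new one to current."""
    old = current[doc_id]
    # History entries are keyed by (document id, version).
    history[(doc_id, old["meta"]["version"])] = copy.deepcopy(old)

    new = copy.deepcopy(old)
    new["common"].update(changes)
    new["meta"]["version"] = old["meta"]["version"] + 1
    new["meta"]["lastUpdate"] = datetime.now(timezone.utc)
    new["meta"]["lineage"] = {"event": event, "source": source}
    current[doc_id] = new
    return new


current = {
    "p1": {
        "meta": {
            "system": "PatientRecords",
            "version": 1,
            "lineage": {"event": "update", "source": "PatientRecords"},
        },
        "common": {"address": {"city": "New York", "zip": "10007"}},
    }
}
history = {}

updated = save_new_version(
    current, history, "p1",
    {"address": {"city": "Brooklyn", "zip": "11201"}},
    event="update", source="ProfileApp",
)
```

Reading the latest state only ever touches `current`; the lineage of a document is the set of `history` entries that share its id, ordered by version.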
39
Engagement Data Model – Current
{
_id : ObjectId("556c1122c9c8f48313553be5"),
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 2,
lineage : {
event : "update",
source : "ProfileApp",
},
...
},
...
}
40
Engagement Data Model - History
{
_id : { id : ObjectId(...), v : 1 },
meta : {
system : "PatientRecords",
lastUpdate : ISODate(...),
version : 1,
lineage : {
event : "update",
source : "PatientRecords",
},
...
},
...
}
41
Result – New Possibilities
• Change address in one place!
• Other value-add processes can be triggered by changes
• Example: Automated outreach
– health and benefits centers in new location
– help moving
• Extend address change to Veteran’s dependents
Next Steps
43
Keep going
• Keep adding valuable processes to improve or provide
new services
• Phase out legacy if desired
– Part 1 – From Relational to MongoDB
• Improve data governance
– Part 3 – Bulletproof Data Management
• Reduce costs
44
Summary
• Systems of Engagement give users new ways to interact with data
• You can start small and add value quickly
• MongoDB enables Systems of Engagement
– Dynamic schema
– Fast, flexible querying, analysis, & aggregation
– High performance
– Scalable
– Secure
45
References
• Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT
http://www.aiim.org/futurehistory
• Systems of Engagement & the Enterprise
http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/
• Geoffrey Moore - The Future of Enterprise IT
http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it
• Ask Asya
http://askasya.com/post/trackversions
http://askasya.com/post/revisitversions
Questions?
james.kerr@mongodb.com

Painting the Future of Big Data with Apache Spark and MongoDB

  • 1. James Kerr Senior Solutions Architect james.kerr@mongodb.com Conquering Data Proliferation
  • 2. 2 Part 2 In The Data Management Series Data integration Capture data changes Engaging with your data From Relational To MongoDB Conquering Data Proliferation Bulletproof Data Management ç Ω Part 1 Part 2 Part 3
  • 3. 3 Agenda • Today's Problem • Systems of Engagement • Single View of… • Changing Data • Summary
  • 5. 5
  • 6. 6 Result • Data walled off in "silos" • Can't get a complete picture • Have to "swivel chair" system to system • Hard to find new avenues to add value • Frustrated ops • Frustrated customers
  • 7. 7 Example • 20+ million Veterans in the US today • 250,000+ employees at Veterans Affairs • $3.9 billion for IT in 2015 budget • What happens when a Veteran has to change their address with the VA? • How does a doctor see a single view of a Veteran's health record?
  • 9. 9 Next Big Wave of Change Today's Systems of Record were yesterday's Systems of Engagement! Enterprise IT Transition From • Systems of Record To the Next Stage • Systems of Engagement
  • 10. 10 Definition • Incorporate technologies which encourage peer interactions • More decentralized • More options for infrastructure especially cloud • Enable new / faster interactions
  • 11. 11 Notional Architecture Systems of Engagement DataServices Data Processing Integration, Analytics, etc. Systems of Record Master Data Raw Data Integrated Data …
  • 12. 12 Many Complexities to Tackle • Data Extraction (ETL) • Change Data Capture (CDC) • Data Governance • Data Lineage – Versioning – Merging changes • Security / Entitlements
  • 13. 13 Focus for Today • Data Extraction (ETL) • Change Data Capture (CDC) • Data Governance • Data Lineage – Versioning – Merging changes • Security / Entitlements
  • 15. 15 Don't Boil the Ocean • Information is often spread across multiple systems of record • Start with a read-only view of that information • Target high value/impact data – "moments of engagement"
  • 16. 16 Example – Single View of a Health Record • Veteran's view • Doctor's view • Case worker's view
  • 17. 17 Single View Architecture Systems of Engagement DataServices Data Processing Integration, Analytics, etc. Systems of Record Master Data Raw Data Integrated Data … ETL record record
  • 18. 18 Single View – Why MongoDB? • Dynamic schema • Rich querying • Aggregation framework • High scale/performance • Auto-sharding • Map-reduce capability (Native MR or Hadoop Connector) • Enterprise Security Features
  • 19. 19 Systems of Record Data Model • Continuity of Care (CCR) XML docs • Pulled some examples from http://googlehealthsamples.googlecode.com/svn/trunk/CCR_samples ... <Immunizations> <Immunization> <CCRDataObjectID>BB0022</CCRDataObjectID> <DateTime> <Type> <Text>Start date</Text> </Type> <ExactDateTime>1998-06-13T05:00:00Z</ExactDateTime> </DateTime> <Source> <Actor> <ActorID>Jane Smith</ActorID> <ActorRole> <Text>Ordering clinician</Text> </ActorRole> </Actor> </Source> ...
  • 20. 20 Systems of Record Data Model ... <Medications> <Medication> <CCRDataObjectID>52</CCRDataObjectID> <DateTime> <Type> <Text>Prescription Date</Text> </Type> <ExactDateTime>2007-03-09T12:00:00Z</ExactDateTime> </DateTime> <Type> <Text>Medication</Text> </Type> <Source> <Actor> <ActorID>Rx History Supplier</ActorID> </Actor> </Source> <Product> <ProductName> <Text>TIZANIDINE HCL 4 MG TABLET TEV</Text> <Code> <Value>-1</Value> <CodingSystem>omi-coding</CodingSystem> <Version>2005</Version> ...
  • 21. 21 Engagement Data Model • Leverage dynamic schema / flexible data model • Use an envelope/wrapper pattern Source Data Master Data / Common Data Model Metadata Integrated Data Metadata
  • 22. 22 Data Flow 1. Read most recent CCRs from each source system 2. Create a source document for each CCR in our system of engagement database 1. Transform XML to JSON for the source data 2. Record the system and date in the metadata 3. Pull out the patient's identifying information to the common data 4. Generate an Id for the raw file 3. Store the original CCR XML into GridFS 4. After each source document is created, update the integrated document for the patient
  • 23. 23 Engagement Data Model - Metadata { _id : ObjectId("556b92b83f7e775b8e92b30a"), meta : { system : "EHR1", lastUpdate : ISODate(...), ... }, common : { ... }, source : { ... }, raw_id : "..." }
  • 24. 24 Engagement Data Model - Source { _id : ObjectId("556b92b83f7e775b8e92b30a"), ... source : { ... Immunizations : { Immunization : { CCRDataObjectID :"BB0022", DateTime : { Type : { Text :"Start date" }, ExactDateTime :"1998-06-13T05:00:00Z" }, Source : { Actor : { ActorID :"Jane Smith", ActorRole : { Text :"Ordering clinician" } } }, ... }, ... }
  • 25. 25 Engagement Data Model - Common { _id : ObjectId("556b92b83f7e775b8e92b30a"), ... common : { patient : "D6E5D510-592D-C613-DB46..." }, ... }
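The envelope shown on the three slides above (meta, common, source, raw_id) can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `buildSourceDocument` is a hypothetical helper, the CCR is assumed to have already been converted from XML to a JavaScript object, and the location of the patient identifier (`ccr.Patient.ActorID`) is an assumption rather than something the slides specify. No `_id` is set, on the assumption that the database would generate one on insert.

```javascript
// Hypothetical helper: wraps an already-parsed CCR in the envelope
// (meta / common / source / raw_id) shown on the slides.
function buildSourceDocument(system, ccr, rawId, now = new Date()) {
  return {
    // Metadata: which system the record came from and when it was loaded.
    meta: { system: system, lastUpdate: now },
    // Assumption: the parsed CCR exposes the patient id as ccr.Patient.ActorID.
    common: { patient: ccr.Patient.ActorID },
    // The roughly transformed source data, kept whole.
    source: ccr,
    // Pointer to the stored original XML (for example, a GridFS file id).
    raw_id: rawId,
  };
}

// Made-up sample input for illustration.
const ccr = {
  Patient: { ActorID: "patient-123" },
  Immunizations: { Immunization: { CCRDataObjectID: "BB0022" } },
};

const sourceDoc = buildSourceDocument("EHR1", ccr, "raw-file-1");
console.log(JSON.stringify(sourceDoc, null, 2));
```

Keeping the whole parsed CCR under `source` while copying only the identifying fields into `common` means the common model can be extended later without reloading the source data.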
  • 26. 26 Engagement Data Model - Integrated { _id : ObjectId("556b92b83f7e775b8e92b30d"), ... meta : { lastUpdate : ISODate(...), integrated : [ { _id : ObjectId("...a") }, { _id : ObjectId("...b") } ] }, common : { ... } ... }
  • 27. 27 Engagement Data Model - Integrated { _id : ObjectId("556b92b83f7e775b8e92b30d"), ... common : { patient : "D6E5D510-592D-C613-DB46...", CCRs : [ { ... Medication : { Product : { ProductName : "TIZANIDINE HCL 4 MG TABLET TEV" } } ... }, { ... Immunizations : { ... }, ... } ] } ... }
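The integrated document on the two slides above could be assembled from the per-CCR source documents roughly as follows. This is a minimal in-memory sketch, not the talk's actual ETL job: `buildIntegratedDocument` is a hypothetical name, the sample `_id` values are plain strings rather than ObjectIds, and carrying each CCR over whole is an assumption (the speaker notes say only the wanted fields might be pulled in).

```javascript
// Hypothetical helper: folds the source documents for one patient into a
// single integrated document, following the shape on the slides.
function buildIntegratedDocument(patientId, sourceDocs, now = new Date()) {
  const mine = sourceDocs.filter((d) => d.common.patient === patientId);
  return {
    meta: {
      lastUpdate: now,
      // References back to the source documents that were merged.
      integrated: mine.map((d) => ({ _id: d._id })),
    },
    common: {
      patient: patientId,
      // Assumption: each CCR is carried over whole; a real system would
      // probably select only the fields it needs.
      CCRs: mine.map((d) => d.source),
    },
  };
}

// Made-up sample source documents for two patients.
const sources = [
  {
    _id: "a",
    common: { patient: "p1" },
    source: { Medication: { Product: { ProductName: "TIZANIDINE HCL 4 MG TABLET TEV" } } },
  },
  { _id: "b", common: { patient: "p1" }, source: { Immunizations: {} } },
  { _id: "c", common: { patient: "p2" }, source: { Immunizations: {} } },
];

const integrated = buildIntegratedDocument("p1", sources);
console.log(JSON.stringify(integrated, null, 2));
```

Because `meta.integrated` keeps the ids of the contributing source documents, the integrated view can be rebuilt or audited from them later.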
  • 29. 29 Single View Enables New Interactions • Deliver faster • Deliver to new applications (mobile, etc.) • Improve services • New analytics
  • 30. 30 Changing Data • Now that data is easy to get to, users will want to make changes • With single view, can change data in the source systems of record • Remember the change of address scenario?
  • 31. 31 Example – Change of Address • Enter in different systems • Call different parts of the organization • What if you have dependents that live with you?
  • 32. 32 Capture Data Changes Systems of Engagement DataServices Data Processing Integration, Analytics, etc. Systems of Record Master Data Raw Data Integrated Data … ETL Bus Apache Kafka record record record
  • 33. 33 Engagement Data Model - Metadata { _id : ObjectId("556c1122c9c8f48313553be5"), meta : { system : "PatientRecords", lastUpdate : ISODate(...), version : 2, lineage : { ... }, ... }, common : { ... }, source : { ... } }
  • 34. 34 Engagement Data Model - Source { _id : ObjectId("556b92b83f7e775b8e92b30a"), ... source : { patientId : "D6E5D510-592D-C613-DB46...", address1 : "John Smith", address2 : null, city : "New York", state : "NY", zip : "10007" }, ... }
  • 35. 35 Engagement Data Model - Common { _id : ObjectId("556b92b83f7e775b8e92b30a"), ... common : { patient : "D6E5D510-592D-C613-DB46...", address : { addr1 : "John Smith", city : "New York", state : "NY", zip : "10007" } }, ... }
  • 36. 36 Systems of Record Data Model • Address records can be in different systems • Each system can be notified of the change to the record
  • 37. 37 Update Process 1. User accesses an application to change their address 2. User updates their address in the System of Engagement 3. The address change is broadcast to any Systems of Record that have registered 4. An adapter applies the address change to the System of Record in an application-specific manner
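The four update steps above can be illustrated with a small in-memory stand-in for the message bus. The talk uses Apache Kafka for this role; the sketch below does not use Kafka or any of its APIs. `ChangeBus`, its methods, the "Benefits" system, and the sample address are all hypothetical names and data chosen for illustration.

```javascript
// In-memory stand-in for the message bus (the talk uses Apache Kafka).
// ChangeBus and its methods are hypothetical names for illustration only.
class ChangeBus {
  constructor() {
    this.adapters = new Map(); // system name -> adapter function
  }

  // A System of Record registers an adapter to be told about changes.
  register(systemName, adapter) {
    this.adapters.set(systemName, adapter);
  }

  // Step 3: broadcast the change to every registered System of Record.
  publish(change) {
    for (const adapter of this.adapters.values()) {
      adapter(change);
    }
  }
}

// Two toy Systems of Record that store addresses in different shapes.
const patientRecords = {};
const benefits = {};

const bus = new ChangeBus();

// Step 4: each adapter applies the change in its own application-specific way.
bus.register("PatientRecords", (c) => {
  patientRecords[c.patient] = {
    address1: c.address.addr1,
    city: c.address.city,
    state: c.address.state,
    zip: c.address.zip,
  };
});
bus.register("Benefits", (c) => {
  const a = c.address;
  benefits[c.patient] = `${a.addr1}, ${a.city}, ${a.state} ${a.zip}`;
});

// Steps 1-2: the user changes their address in the System of Engagement.
bus.publish({
  patient: "p1",
  address: { addr1: "1 Main St", city: "New York", state: "NY", zip: "10007" },
});

console.log(patientRecords, benefits);
```

A real bus would also need to deal with delivery failures and retries, which the speaker notes list as open complexities; this sketch ignores them.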
  • 38. 38 Tracking Changes • Add basic document versioning to track what changed when • Prefer the separate "current" and "history" collections approach – current contains the last updated version – history contains all previous versions • Can query history to see the lineage
  • 39. 39 Engagement Data Model – Current { _id : ObjectId("556c1122c9c8f48313553be5"), meta : { system : "PatientRecords", lastUpdate : ISODate(...), version : 2, lineage : { event : "update", source : "ProfileApp", }, ... }, ... }
  • 40. 40 Engagement Data Model - History { _id : { id : ObjectId(...), v : 1 }, meta : { system : "PatientRecords", lastUpdate : ISODate(...), version : 1, lineage : { event : "update", source : "PatientRecords", }, ... }, ... }
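The "current" and "history" approach on the slides above can be sketched with two in-memory structures standing in for the two collections. This is an illustration under assumptions, not the talk's implementation: `updateWithHistory` is a hypothetical helper, the version check is one simple way to reject stale writes, and the field name `id` inside the history `_id` is an assumption. In a real database the two writes would not be atomic without extra care.

```javascript
// In-memory stand-ins for the "current" and "history" collections.
const current = new Map(); // id -> latest version of the document
const history = [];        // all previous versions

// Hypothetical helper: applies an update only if the caller saw the latest
// version, archives the old version, then bumps the version number.
function updateWithHistory(id, expectedVersion, changes, lineage, now = new Date()) {
  const doc = current.get(id);
  if (!doc || doc.meta.version !== expectedVersion) {
    return false; // stale or missing: the caller should re-read and retry
  }
  // The history _id combines the original id and the version number.
  history.push({ ...doc, _id: { id: id, v: doc.meta.version } });
  current.set(id, {
    ...doc,
    ...changes,
    meta: {
      ...doc.meta,
      lastUpdate: now,
      version: doc.meta.version + 1,
      lineage: lineage,
    },
  });
  return true;
}

// Made-up starting document at version 1.
current.set("rec1", {
  _id: "rec1",
  meta: {
    system: "PatientRecords",
    version: 1,
    lineage: { event: "update", source: "PatientRecords" },
  },
  common: { address: { city: "Boston" } },
});

const profileApp = { event: "update", source: "ProfileApp" };
// First update sees version 1 and is applied.
const applied = updateWithHistory("rec1", 1, { common: { address: { city: "New York" } } }, profileApp);
// Second update still expects version 1, so it is rejected as stale.
const stale = updateWithHistory("rec1", 1, { common: { address: { city: "Chicago" } } }, profileApp);

console.log(applied, stale, current.get("rec1").meta.version, history.length);
```

Querying `history` by `_id.id` then returns every earlier version of a record, which is how the lineage of a change can be traced.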
  • 41. 41 Result – New Possibilities • Change address in one place! • Other value-add processes can be triggered by changes • Example: Automated outreach – health and benefits centers in new location – help moving • Extend address change to Veteran’s dependents
  • 43. 43 Keep going • Keep adding valuable processes to improve or provide new services • Phase out legacy if desired – Part 1 – From Relational to MongoDB • Improve data governance – Part 3 – Bulletproof Data Management • Reduce costs
  • 44. 44 Summary • Systems of Engagement give users new ways to interact with data • You can start small and add value quickly • MongoDB enables Systems of Engagement – Dynamic schema – Fast, flexible querying, analysis, & aggregation – High performance – Scalable – Secure
  • 45. 45 References • Systems of Engagement and the Future of Enterprise IT: A Sea Change in Enterprise IT http://www.aiim.org/futurehistory • Systems of Engagement & the Enterprise http://www-01.ibm.com/software/ebusiness/jstart/systemsofengagement/ • Geoffrey Moore - The Future of Enterprise IT http://www.slideshare.net/SAPanalytics/geoffrey-moore-the-future-of-enterprise-it • Ask Asya http://askasya.com/post/trackversions http://askasya.com/post/revisitversions

Editor's notes

  1. Hello and welcome to Conquering Data Proliferation, the 2nd talk in our 3 part data management prototype to production series today. My name is James Kerr and I'm a Solutions Architect here at MongoDB. I've been with the company about 2 and a half years now and have been in the NoSQL space building large scale distributed databases for the last 9 years or so. I work primarily with US government agencies building things on MongoDB but I do work with our commercial customers now and again as well.
  2. As I said, this is part 2 in our 3-part data management path to production series. Hopefully you caught Jay's talk on migrating from relational to MongoDB. In this talk I'll cover one type of system you could possibly migrate your relational databases to. I'll talk about what's being called systems of engagement (as opposed to systems of record) and how to get your data to and from that system. Be sure to catch the 3rd part of the series, where Buzz will talk about some clever ways of tackling some of the data governance and quality issues we face when building these types of systems.
  3. The title of this talk is "Conquering Data Proliferation" which is a pretty far reaching topic. But the fact is that this is a problem that most enterprises, big and small, are facing today.
  4. Today's enterprise systems are large and complex. There are often a number of systems of record that are used to run a company's or agency's core business or mission. A large investment has been made in these systems of record, but most of them were built as monolithic applications, so they don't talk to each other and information is walled off in silos. Usually these systems were built or purchased to handle different aspects of a company's core business, so you end up with different parts of each business object (whether that be a customer, veteran, product or whatever your business is) spread across and duplicated by many systems for the different lines of business.
  5. Cannot link data together across Systems of Record Systems of Record not designed to have end users interact with them
  6. What happens when a Veteran has to change their address with the VA? The Veteran has to update different parts of the organization manually. The change may be propagated to other internal systems (or not). What happens if an address is not up to date? Benefits are mailed to the wrong address (a delay in service, or maybe worse). How does a doctor see a single view of a Veteran's health record? Next to impossible right now. The VA has efforts underway to address this (political issues are outside the scope of this talk). They need a way, and are in fact making efforts, to simplify basic functions such as this.
  7. So how do we tackle this? Systems of Engagement is a concept that was introduced by a fellow by the name of Geoffrey Moore a few years back. - NEXT
  8. He said that Systems of Engagement are the next big wave of change in Enterprise IT. This is essentially the transition from our existing, mostly passive, systems of record to connected systems of engagement that are more active and encourage/support peer interactions. Essentially, our customers and employees want to interact with business systems the way they do in their personal/social lives. By some accounts, industry spent over $1 trillion on systems of record and, though we continue to spend on them, we have reached a point of diminishing returns. Remember though that today's Systems of Record were yesterday's Systems of Engagement – the engagement model has changed. People need to be able to do things themselves. They can't wait for hours for things to happen.
  9. "systems of engagement refers to the transition from current enterprise systems designed around discrete pieces of information ("records") to systems which are more decentralized, incorporate technologies which encourage peer interactions, and which often leverage cloud technologies to provide the capabilities to enable those interactions." You have probably heard terms such as Data Lake, Operational Data Layer and Data Hub. These are all moves towards a system of engagement where data is central.
  10. We need to be able to do this transition while still leveraging our investment in Systems of Record * Analogy: Retrofitting old building with new “connectivity” and interfaces (maintain existing architecture)
  11. Today we'll touch on getting data out of source systems of record, pushing changes back to those systems and tracking the lineage of data through the system
  12. So how do we start to approach this? You have heard a lot about the 360 degree view of a customer, product or anything that is core to a business * popular concept that a lot of enterprises are putting solutions in place for This is a good starting point for a System of Engagement * You need a view across your core business before you can start to find new ways of interacting with it
  13. You have heard a lot about the 360 degree view or single view of a customer, product or anything that is core to a business. This is a popular concept that a lot of enterprises are putting solutions in place for. This is a good starting point for a System of Engagement. You often need a view across your core business before you can start to find new ways of interacting with it. Remember the retrofitting analogy? We are trying to build on top of our existing investment in systems of record, and the information about our core business objects is typically spread across these systems. Let's start by identifying "moments of engagement" that are of high value to our business and customers and for which just providing a view would make major improvements. These "moments" are the things that customers/users either expect the most from the business or feel the most pain about when they interact with the business.
  14. Let's go back to our examples at the VA. One of the more complex issues they face is providing a single view of a Veteran's health record. Providing this view is critical to a Veteran receiving quality care from doctors as well as other services provided by the VA. Right now health records are spread across systems (and agencies at that) and it is very difficult to see a Veteran's entire history. The VA has efforts under way to improve this and the approach is in line with the systems of engagement concept I have been talking about.
  15. So let's go back to our notional architecture… We have a central database fed and orchestrated by data ingestion and processing capabilities. This is fronted by data services that are consumed by new applications. Let's put some technologies in place to try it out: 1) For the central database, we'll use MongoDB. No big surprise here, and we'll talk more about why MongoDB is a great fit for this in a second. 2) For the ETL and data processing, we'll use Pentaho. There are other options for this, but Pentaho has a good integration with MongoDB and is fairly easy to use (see part 1 in this series for more details on migrating data from relational databases). 3) Lastly, we'll use Node.js to build our RESTful services to sit on top of the database. ** Describe the data flow: records in SOR; ETL pulls parts from SOR and updates SOE
  16. This is a picture you have seen many times over the years, so what's different? Let's talk about some of the features of MongoDB and what they enable. Dynamic schema: can handle vastly different data together and can keep improving and fixing issues over time easily (schema on read). Our example shows two systems, but think about the complexities of integrating data across 5, 10 or even more systems. Rich querying: supporting end users directly requires multiple ways of access, and key/value is not sufficient. Aggregation framework: database-supported roll-ups for analysis. High scale/performance: directly impacts customer and user experience, so every second counts. Auto-sharding: can automatically add processing power as data is added. Map-reduce capability (native MR or Hadoop Connector): batch analysis looking for patterns and opportunities in the single view. Enterprise security: provides the security controls necessary to protect the data.
  17. So let's jump back into our example… We have a couple of different electronic health record (EHR) systems that we are going to pull Continuity of Care, or CCR, records from They happen to be in XML and have a lot of fields so I just pulled a couple of snippets out CCRs are meant to track many different types of medical interactions and events Here we have some things about immunizations… CLICK
  18. And here we have some data about medications that were prescribed. So how are we going to put that together in one place so we can better interact with it and get a single view across the different CCRs for each patient? CLICK
  19. Leveraging the dynamic schema capabilities, we can readily create a document data model to encapsulate our original source data, common fields across systems as well as metadata CLICK We can start with the source data The source data can be a roughly transformed version of the data from the system of record so that we can interact with it as JSON/BSON CLICK The source data can be wrapped in an envelope document CLICK We can track metadata about the source data – source system, date, etc CLICK We can also extract any master data or the data that fits into a common enterprise data model out of the source If the raw data is required, it can be stored or maybe just cached as binary data stored directly in MongoDB and the wrapper document can contain a pointer to it CLICK, CLICK, CLICK Let's add a few more source data documents Now, in the process of creating these documents, we could also be creating an integrated view across them CLICK, CLICK There's a lot of flexibility here Maybe you want to keep the original source data objects as individual documents and then integrate them or maybe you just want to keep the integrated data objects
  20. ** add Pentaho screenshot if you get a chance. The actual process we go through isn't really that interesting. You've all seen or written basic ETL processes before, so I'll just cover it at a high level. For this example, we'll create source documents for each of the original CCRs as well as an integrated view for each patient. The last step can either be done incrementally or once you have completed loading the full batch of source documents for all patients, at which time you would create an integrated document for each patient that you updated.
  21. Here, the source data is transformed from XML into JSON so we can work with the structure in MongoDB. Otherwise, we have to just store it raw in binary or text form The topic of converting XML to JSON is a whole separate discussion but it can range from simple to complex depending on how general a solution you need. Also keep in mind that this is an optional step and can be done at later stages in the process of rolling out your system. It may be more beneficial to focus on the "common" fields and integrate them initially.
  22. Once again, there are a lot of ways we could structure this but for this example, I'll just go with a simple wrapper structure
  23. What level should we track lineage at? Per field?
  24. Can list each of the CCRs in our integrated document and we can just pull in the fields that we want.
  25. Now that we have a single view, what do we do next? – NEXT, NEXT
  26. Improve care in the case of our VA example
  27. Having a view is a great first step, but now that data is easy to engage with, users will want to be able to make changes to it as well. Worst case, they can go back to the systems of record directly and change the data there. Let's switch gears a bit and go back to the change of address example we talked about.
  28. So let's add a component that will propagate changes from the system of engagement back to the systems of record In addition to the previous components we put in place for the single view, we need some sort of message processing component to receive and publish data changes back to the source systems. For this example, we will use Apache Kafka as it is pretty commonly used these days. We'll show changing the integrated data in the system of engagement database and propagating that back to the systems of record
  29. Let's start with the data model in our system of engagement this time… The metadata looks the same as before, but let's now add a version field to track when changes are made. We'll talk more about this in a second.
  30. Once again, there are a lot of ways we could structure this but for this example, I'll just go with a simple wrapper structure
  31. At a high level, conceptually, this process is quite simple 1, 2, 3, 4 As we drill into it though, many complexities arise: TBD http://askasya.com/post/revisitversions http://docs.mongodb.org/manual/tutorial/perform-two-phase-commits/ http://docs.mongodb.org/manual/tutorial/update-if-current/ Show updated lineage/versioning Failure/Retry scenarios
  32. This is
  33. We add the version number to the _id field
  34. So what do we end up with? We have re-implemented a "moment of engagement" that used to be complicated for operations and frustrating to end users so that it is now simple and painless. We can now think about additional processes that we could launch in response to this change: Let's look for dependents in the Veteran's benefits and, if they live with them, update their address too. Let's do some automatic outreach to help the Veteran get settled in their new location. Combining this with the benefits of creating single views, we can see how truly powerful this can be.
  35. So what's next? - NEXT
  36. Just keep going…
  37. Systems of engagement are a new wave of change happening in enterprise IT. These new systems are transforming businesses and allowing both employees and end users to interact with systems in new ways. MongoDB enables Systems of Engagement Dynamic schema can handle data from numerous different systems all in one place Fast, flexible querying, analysis, & aggregation gets maximum value from the data High performance allows Systems of Engagement to handle load from a new class of users