Solving the Disconnected Data Problem in Healthcare Using MongoDB

Data diversity in healthcare and life sciences is exploding, and the market is fundamentally changing as a result of healthcare reform. The result is more and more data, but it is compartmentalized and disconnected. At Zephyr Health, we have developed a data platform that provides connectivity between thousands of healthcare data assets using an ontology-driven approach, storing data in MongoDB. This session will show how we break down this very challenging problem and how some of MongoDB's more recent features have been used to do so.

  1. Solving the Disconnected Data Problem in Healthcare Using MongoDB. A MongoSF talk – December 3rd 2014. Sven Junkergård, CTO
  2. ME
     I am a reformed consultant who used to do architecture consulting…
     • MSc Computer Science and Engineering – Chalmers University of Technology in Gothenburg
     • AMS, Capgemini
     • Cake Financial – aggregating retail investor portfolios and generating investment insights from the best of the best
     • Billfloat – novel financial credit product with highly differentiated underwriting method
     • Zephyr Health – built out technology and engineering team to deliver on a big vision: integrate disconnected data in healthcare and solve real problems. Now CTO.
  3. WHO I WORK FOR – ZEPHYR HEALTH
     Organizational expertise: Life Sciences, Brand Management, Big Data, Applied Mathematics, Algorithms, IaaS | SaaS | PaaS, Machine Learning, Artificial Intelligence, Statistics & Modeling, Data Science, Visualization, App Development
     Office locations: San Francisco, London, India
     Current clients include members of: Global Top 5 Biopharm, Global Top 5 Pharm, Global Top 5 Medical Devices
     Our focus:
     • Organize disconnected data in healthcare and life science
     • Visualize the combination of heterogeneous data sources in analytical problems
     • Solve important and challenging problems for our customers
  4. SOLVING THE VARIETY PROBLEM
     • Volume – healthcare example: genomic sequencing
     • Velocity – healthcare example: streaming device data
     • Variety – healthcare example: understanding healthcare landscape and treatment effectiveness (internal, vendor and public data)
     • Visualization – providing relevant and powerful visualizations that provide real insights (data trends)
     Image sources: illumina and iRhythm
  5. WHY HEALTHCARE DATA IS A DIFFERENT WORLD ENTIRELY
     • Loan application decision: applicant demographics, bank account, credit report, identity check and income verification, each keyed on SSN
     • Clinical trial investigator decision: investigator, site and patient data (research, published trials, current sponsored trials, prescriptions, claims, funding, network leadership, site profile, site certification, site statistics) with inconsistent or missing keys
  6. THE TYPES OF PROBLEMS THAT CAN BE SOLVED WITH INTEGRATED DISPARATE DATA
     Problem: What is it?
     • Site selection: Finding the right locations to house clinical trials
     • Trial outcomes: Visualizing data from different sources within clinical trials
     • Medical expertise communication: Identifying the healthcare professionals with the right expertise
     • Scoring and ranking: Finding the top ranking healthcare professionals or institutions for a particular purpose
     • Network leadership analysis: Understanding who is connected to whom and how information is disseminated
     • Care delivery effectiveness: Identifying areas of great or poor performance and the underlying reason
     • Patient outcomes: Relating patient outcomes to specific market activities
     • Health economics: Understanding the financial effectiveness of an intervention or of introducing a new standard of care
  7. DATA CATEGORIES AND EXAMPLES
     Creating a complete picture requires combining disconnected data from an enormous variety of sources.
     • Internal – examples: CRM, trials, payments, sales, partners, speakers; keys: controlled; formats: spreadsheets (structured)
     • Vendors – examples: Rx, claims, referral patterns, primary research, consulting; keys: vendor specific; formats: flat files
     • Public – examples: providers, grants, public trials, research; keys: anything and nothing; formats: anything
     Managing data variety is the key to solving the problem.
  8. A DIFFERENT PROBLEM REQUIRES A DIFFERENT SOLUTION
     Traditional pipeline (diagram): ETL, DW, DM, OLAP Cube, BI, Insight
     Instead…
     • A different data model based on descriptive meta data
     • A non-traditional data store
     • Something other than Informatica
     • Automated intelligent algorithms
     • A few special tricks
     • An API
     • Some really great applications...
  9. ENTITY CENTRIC DATA MODEL
     [Diagram comparing a traditional, relational model (entity table; data source 1, data source 2, … data source n) with an entity centric model (entities, each with attributes; meta data)]
  10. ONTOLOGY-BASED DEVELOPMENT
      Requirements
      • Flexible
      • Extensible and adaptive
      • Easy to maintain
      Solution
      • Ontology: used to formally represent knowledge within a domain
      • Vocabulary: collection of entities, attributes and relationships that provides context within the domain
      • Taxonomy (classification): a hierarchical collection of controlled terms from the vocabulary
  11. VOCABULARY
      • Entities: real world things or events, e.g. institution, patient, sales, potential, etc.
      • Organic attributes: data points coming from datasets, e.g. first_name, age, revenue, date, etc.
      • Entity relationships: relationships between different entities
      • Derived attributes: processed key-value pairs from existing organic and/or derived attributes
  12. WHY MONGODB?
      Our requirements
      • Extremely flexible data storage
      • Low cost of evolving schema
      • Highly performant for complex joins, recursive queries, etc.
      • Scalable to large volumes of connected information
      MongoDB
      • Document store is a great fit for storing arbitrary information
      • Key-value pairs in JSON format (allowed for both adding data traceability and cheap data evolution)
      • Secondary indexes and strict consistency
      • Map-reduce functionality
      Challenges
      • Queries are powerful but not easy to write
      • We needed complex joins across arbitrary information (how do you create an index on something when you do not know ahead of time what it is?)
  13. DATA ORGANIZATION
      [Diagram labels] dataset, dataset_; Full Profile; Main Profile; Entity Relationships; Attribute References; Identity Section; Attributes (Organic + Derived); File; Raw records; Info; Data; Geo locations
  14. DATA INTEGRATION
      Disparate sources of information become a structured profile, which becomes an application representation. All enabled through a series of data integration algorithms.
      Example structured profile:
      {
        first_name: Charles
        last_name: Morris
        street: 200 First St.
        city: Rochester
        state: MN
        zip: 55905
        phone: 802-555-1234
        email: cmorris@mayoclinic.com
        headshot: <AF6713…>
        thought_leader_score: 8
        pub_count: 203
      }
  15. ALGORITHM EXAMPLES
      • Record linkage: do these two records describe the same person?
        Record A: C Morris, Heart and Vascular Center, 123 Main St, Rochester, MN 55903, 802-555-9988
        Record B: Charles "Chuck" Morris, Cardiologist, 200 First St., Rochester, MN 55905, 802-555-1234, cmorris@mayoclinic.com
      • Disambiguation: automatically choosing the most authoritative version of an attribute
      • Dataset identification: maximizing re-use of meta data describing imported data sets
      • Clustering: pre-calculating clusters in weakly attributed data
  16. ILLUSTRATIVE MONGODB PROFILE
      Source record (NPI, FirstName, LastName, Specialty): 1, Tom, Smith, Cardiologist
      Resulting profile document:
      {
        "_id": "53bcf9cae4b03f352d4b47c7",
        "identity": {"npi": "1", "specialty": ["Cardiologist"], "first_name": "Tom", "last_name": "Smith"},
        "attributes": {
          "npi": {1},
          "first_name": {"Tom"},
          "last_name": {"Smith"},
          "specialty": {"Cardiologist"}
        }
      }
  17. ADDING ADDITIONAL ATTRIBUTES
      First source record (NPI, FirstName, LastName, Specialty): 1, Tom, Smith, Cardiologist
      Second source record (NPI, Institution, ClinicalTrial Name, Start Date, End Date): 1, UCSF Medical Center, Heart Valve Clinical Trial, 01/01/2011, 03/25/2013
      Resulting profile document:
      {
        "_id": "53bcf9cae4b03f352d4b47c7",
        "identity": {"npi": "1", "specialty": ["Cardiologist"], "first_name": "Tom", "last_name": "Smith"},
        "attributes": {
          "npi": {1},
          "first_name": {"Tom"},
          "last_name": {"Smith"},
          "specialty": {"Cardiologist"},
          "institution": {"UCSF Medical Center"},
          "clinical_trial": {"Heart Valve Clinical Trial"},
          "start_date": {"01/01/2011"},
          "end_date": {"03/25/2013"}
        }
      }
  18. TRICKS TO TAME THE WILD DATA
      • Ontology – how we keep track of all ingested information
      • Vocabulary – bringing structure to a large variety of information
      • Derived attributes – encapsulate complexity
      • GIS transformations – practical integration of geo data
      • Indexing – fast access to complex information in MongoDB
  19. DERIVED ATTRIBUTES
      What's the problem?
      • Data is rarely clean and business rules are complex
      What are we doing about it?
      • Use existing (organic) attributes and apply rules to generate new (derived) attributes
      • Derived attributes are generated through queries or map-reduce jobs
      Why it matters
      • Too complex and expensive to consider all business rules at run-time with every query
      • Hides the complexity and introduces uniformity
  20. GEOSPATIAL MAPPING APPROACH FOR AWKWARD GEO DATA
      Using traditional method: reporting unit, then postal codes (e.g. 70173), then mapping + calculations to district (Stuttgart), then mapping + calculation to state (Baden-Württemberg)
      • Additional challenges with mismatches between reporting unit postal codes and mapping postal codes
      • Have to compensate for missing postal codes
      • Split patients or metrics across multiple regions when a reporting unit spans multiple regions
      Using geospatial method: geocoded reporting unit, then district (Stuttgart), then state (Baden-Württemberg)
      • Requires determining a single central point for each reporting unit
      • Uses no mapping documents
      • No compensatory calculations required
      • Overall accuracy increases
  21. INDEXING
      Why MongoDB alone does not get it done
      • Cross collection queries required for a large number of scenarios
      • Indexing challenges when dealing with unknown information
      What we did
      • Graph based index
      • Entities and attributes are nodes
      • Entity-attribute ownership and entity-to-entity relationships are edges
      How we use it
      • zQueries allow us to do complex queries from web front ends
  22. THE ZEPHYR PLATFORM
      100,000,000+ data points ingested and indexed each year
      [Diagram labels] Disconnected data (internal, vendor, public); algorithm driven data ingestion; synchronization; ontology driven data tier; data organized in connected profile documents; graph based materialized query index; proprietary REST API; zQuery; apps for life sciences
  23. CONSUMING INTEGRATED DISPARATE DATA
      Analytical applications use the zAPI and the ontology to produce applications that adapt to changing data.
      • Zephyr Platform: ontology driven data store
      • API: the REST API exposes both data and the ontology; zQueries are a JSON based query language for queries against dynamic and connected data
      • Analytical apps: functional focus (solving specific business problems with focused apps) and design (single page apps with targeted data visualizations)
  24. TARGETED ANALYTICAL APPLICATIONS
      Apps for real business problems leveraged by everyday business users: Illuminate, Voyager, Kaleidoscope, Lighthouse
  25. A BRIEF DEMO
  26. LEARNINGS
      • There was no one technology or one database that provided a complete solution: embrace diversity
      • Create a generic platform, pour effort into specialized algorithms to populate data intelligently
      • Ontology driven development can be very powerful, but data organization is still a challenge
      • Indexing on a priori unknown attributes is challenging
      • Data modeling is always important; large profiles had to be broken down
  27. SUMMARY
      Wrapping it all up in five points
      1. Healthcare is different and has lots of critical data that is disconnected
      2. Generic, MongoDB-based data storage model using meta-data
      3. Data integration powered by algorithms
      4. Document profiles for facts, graph for querying
      5. Diverse set of end user analytical applications powered by the generic data platform
      Why this matters
      • Standards are really important, but slow to develop
      • Huge amount of change occurring in our healthcare system
      • We need to make decisions today based on available data sets despite existing challenges
  28. THANK YOU!
      • Brian Roy – Strategy and architecture
      • Mahesh Chaudhari – Database architecture
      • Cesar Arevalo – Data integration implementation
      The guys that made all of it come together!
  29. CONTACT INFORMATION
      Zephyr Health, 450 Mission St. Suite 201, San Francisco, California 94105, +1.415.529.7649, zephyrhealth.com
      Sven Junkergård, CTO, +1.415.503.7412, sven@zephyrhealth.com
  30. BACKUP SCREEN SHOTS
  31. ILLUMINATE – LANDING PAGE
  32. ILLUMINATE – ALL CASES VIEW
  33. ILLUMINATE – GRID VIEW
  34. ILLUMINATE – GRAPH VIEW
  35. ILLUMINATE – PROFILE VIEW
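
The profile documents on the "Illustrative MongoDB profile" and "Adding additional attributes" slides can be sketched in Python as plain dictionaries. This is only an illustration under stated assumptions: the function names `new_profile` and `add_attributes` are hypothetical, and attribute values are stored as lists (the slides write them as `{"Tom"}`, which is not valid JSON), so that several datasets can contribute values to the same attribute. It is not Zephyr Health's actual implementation.

```python
from copy import deepcopy


def new_profile(profile_id, identity):
    """Create a profile whose attributes start as copies of the identity fields."""
    return {
        "_id": profile_id,
        "identity": dict(identity),
        # Assumption: every attribute holds a list of values.
        "attributes": {
            key: list(value) if isinstance(value, list) else [value]
            for key, value in identity.items()
        },
    }


def add_attributes(profile, record, key_field="npi"):
    """Return a copy of the profile with the record's fields merged in.

    The record is matched to the profile on key_field (NPI on the slides).
    """
    if str(record[key_field]) != str(profile["identity"][key_field]):
        raise ValueError("record does not belong to this profile")
    merged = deepcopy(profile)
    for name, value in record.items():
        if name == key_field:
            continue
        values = merged["attributes"].setdefault(name, [])
        if value not in values:
            values.append(value)
    return merged


profile = new_profile(
    "53bcf9cae4b03f352d4b47c7",
    {"npi": "1", "specialty": ["Cardiologist"], "first_name": "Tom", "last_name": "Smith"},
)
trial_record = {
    "npi": "1",
    "institution": "UCSF Medical Center",
    "clinical_trial": "Heart Valve Clinical Trial",
    "start_date": "01/01/2011",
    "end_date": "03/25/2013",
}
profile = add_attributes(profile, trial_record)
```

Because new attributes are simply new keys under `attributes`, adding a dataset does not require a schema change, which is the "low cost of evolving schema" requirement listed on the "Why MongoDB?" slide.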

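The "Derived attributes" slide describes applying rules to existing (organic) attributes to generate new ones ahead of query time. A minimal sketch of that idea follows, assuming a hypothetical rule `trial_duration_days` computed from the `start_date` and `end_date` values shown on the "Adding additional attributes" slide. The slides say the real derivations run as queries or map-reduce jobs; this in-memory version only illustrates the principle.

```python
from datetime import datetime

DATE_FORMAT = "%m/%d/%Y"  # matches the 01/01/2011 style used on the slides


def trial_duration_days(attributes):
    """Hypothetical rule: number of days between trial start and end."""
    start = datetime.strptime(attributes["start_date"][0], DATE_FORMAT)
    end = datetime.strptime(attributes["end_date"][0], DATE_FORMAT)
    return (end - start).days


# Rule registry: derived attribute name -> function of the organic attributes.
DERIVATION_RULES = {"trial_duration_days": trial_duration_days}


def derive(attributes, rules=DERIVATION_RULES):
    """Apply every rule and collect the derived attributes it can compute."""
    derived = {}
    for name, rule in rules.items():
        try:
            derived[name] = [rule(attributes)]
        except (KeyError, IndexError, ValueError):
            # Inputs are missing or dirty, so skip this derived attribute.
            continue
    return derived


organic = {"start_date": ["01/01/2011"], "end_date": ["03/25/2013"]}
derived = derive(organic)
# derived == {"trial_duration_days": [814]}
```

Pre-computing the value once means queries can filter on `trial_duration_days` directly instead of re-applying the business rule on every read.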
×
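
The "Indexing" slide describes a graph based index in which entities and attributes are nodes, and entity-attribute ownership and entity-to-entity relationships are edges. The sketch below shows one way such an index could look in memory. The class `GraphIndex`, its method names and the sample identifiers (`hcp-1`, `ucsf`) are all hypothetical; the actual index and the zQuery language are not described in enough detail in the slides to reproduce.

```python
from collections import defaultdict


class GraphIndex:
    """Undirected graph over entity nodes and attribute nodes."""

    def __init__(self):
        self.edges = defaultdict(set)

    def _link(self, a, b):
        self.edges[a].add(b)
        self.edges[b].add(a)

    def add_attribute(self, entity_id, name, value):
        """Ownership edge between an entity and an attribute node."""
        self._link(("entity", entity_id), ("attribute", name, value))

    def add_relationship(self, entity_a, entity_b):
        """Relationship edge between two entities."""
        self._link(("entity", entity_a), ("entity", entity_b))

    def entities_with(self, name, value):
        """Ids of all entities that own the given attribute value."""
        return {node[1] for node in self.edges.get(("attribute", name, value), set())}

    def related_entities(self, entity_id):
        """Ids of all entities directly related to the given entity."""
        neighbours = self.edges.get(("entity", entity_id), set())
        return {node[1] for node in neighbours if node[0] == "entity"}


index = GraphIndex()
index.add_attribute("hcp-1", "specialty", "Cardiologist")
index.add_attribute("hcp-2", "specialty", "Cardiologist")
index.add_attribute("hcp-3", "specialty", "Oncologist")
index.add_relationship("hcp-1", "ucsf")
index.add_relationship("hcp-3", "ucsf")

# Cardiologists related to the institution "ucsf".
cardiologists_at_ucsf = (
    index.entities_with("specialty", "Cardiologist") & index.related_entities("ucsf")
)
```

Since an attribute node is created for whatever name and value arrive, no per-field index has to be declared ahead of time, which addresses the slide's question of how to index information that is not known in advance.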