SlideShare ist ein Scribd-Unternehmen logo
1 von 17
1
Data Café — A Platform For Creating
Biomedical Data Lakes
Pradeeban Kathiravelu1,2, Ameen Kazerouni2, Ashish Sharma2
1 Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal
2 Department of Biomedical Informatics, Emory University, Atlanta, USA
www.sharmalab.info
2
Data Landscape
for Precision Medicine
DATA
CHARACTERISTICS
• Large number of small datasets
• Structured…Semi-structured
…Unstructured…Ill formed
• Noisy and Fuzzy/Uncertain
• Spatial, Temporal relationships
DATA MANAGEMENT
• Variety in storage and messaging
protocols
• No shared interface
3
Illustrative Use Case
Execute a Radiogenomics workflow on the diffusion images of GBM
patients who received a TMZ + experimental regimen with an overall
survival of 18months or more.
Execute a Radiogenomics workflow on the diffusion images of GBM
patients who received a TMZ + experimental regimen with an overall
survival of 18months or more
PACS + EMR + AIM + RT + Molecular
4
Motivation
• Most current solutions require a DBA to initiate the migration of data into
a Data Warehousing environment
• to query and explore all the data at once.
• Costly to set up such warehouses.
• Unified warehouse with access to query and explore the data.
• Limitations
• Scalability and extensibility to incorporate new data sources
• A priori knowledge of the data models of the different data sources.
BIOMEDICAL DATA LAKES
• Cohort Discovery and Creation — Assembled per-study
• Heterogeneous data collected in a loosely structured fashion.
• Agile and easy to create.
• Integrate with data exploration/visualization via REST APIs.
• Problem or hypothesis specific virtual data set.
• Powered by Drill + HDFS, Data Sources via APIs.
6
Data Café
• An agile approach to creating and extending the concept of a star
schema
• to model a problem/hypothesis specific dataset.
• by leveraging Apache Drill to easily query the data.
• Tackles the limitations in the existing approaches.
• Provides researchers the ability to add new data models and sources.
7
Core Concepts
Step 1. Given a set of data sources,
create a graphical representation of
the join attributes.
This graph represents how data is
connected across the various data
sources
8
Core Concepts
Step 2. Run a set of parallel queries on
the data sources that include the
attributes that are present in the
query graph.
In the top figure, our query is of type:
{id1: A1 > x and B2 == y}
We run similar queries across C, D and
E and retrieve the set of relevant id’s
(join attributes).
9
Core Concepts
Step 3. Compute intersection across
the various id’s (join attributes). The
data of interest can now be obtained
using the id’s in this intersection.
A subsequent query will allow us to
stream, in parallel, data from
individual sources, given the relevant
ids (join attributes)
10
Data Café Architecture
11
Apache Drill
• Variety – Query a range of non-relational data sources.
• Flexibility.
• Agility – Faster Insights.
• Scalability.
12
Evaluation Environment
• Data Café was deployed along with the data sources and Drill in Amazon
EC2.
• MongoDB instantiated in EC2 instances.
• Hive on Amazon EMR (Elastic MapReduce).
• EMR HDFS was configured with 3 nodes.
• Various datasets for evaluation
• Two synthetic datasets.
• Clinical Data from the TCGA BRCA collection
13
Results
• Quick creation of data lakes
• without prior knowledge of the data schema.
• Very fast execution of large queries
• with Apache Drill.
• Data Café can be an efficient platform for exploring an integrated data
source.
• Integrated data source construction process may be time consuming.
• Less critical path.
• Done less frequently than the data queries from HDFS/Hive using Drill.
14
Conclusion
• A novel platform for integrating multiple data sources.
• Without a priori knowledge of the data models of the sources that are being
integrated.
• Indices to do the actual integration
• Enables parallelizing the push of the actual data into HDFS.
• Apache Drill as a fast query execution engine that supports SQL.
• Currently ingesting data from TCGA.
15
Current State and Future Plans
• Ongoing efforts to evaluate the platform with diverse and heterogeneous data
sources.
• Expanding to a larger multi-node distributed cluster.
• Integration with DataScope.
• Multiple data stores and larger data sets.
• Integration with imaging clients such as caMicroscope, as well as archives such
as The Cancer Imaging Archive (TCIA).
Acknowledgements
Google Summer of Code 2015
NCIP/Leidos 14X138, caMicroscope
— A Digital Pathology Integrative
Query System; Ashish Sharma PI
Emory/WUSTL/Stony Brook
NCI U01 [1U01CA187013-01],
Resources for development and
validation of Radiomic Analyses &
Adaptive Therapy, Fred Prior, Ashish
Sharma (UAMS, Emory)
The results published here are in part
based upon data generated by the
TCGA Research Network:
http://cancergenome.nih.gov/
For more information
including recent updates
please visit:
www.sharmalab.info
ashish.sharma@emory.edu

Weitere ähnliche Inhalte

Was ist angesagt?

From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...Databricks
 
Data cloud lab version v.001.2020
Data cloud lab version v.001.2020Data cloud lab version v.001.2020
Data cloud lab version v.001.2020mdcdwh
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Matteo Manca
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...OpenAIRE
 
EDI Training Module 12: An Introduction to Metadata and Data Repositories
EDI Training Module 12:  An Introduction to Metadata and Data RepositoriesEDI Training Module 12:  An Introduction to Metadata and Data Repositories
EDI Training Module 12: An Introduction to Metadata and Data RepositoriesEnvironmental Data Initiative
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksOpenAIRE
 
New PID developments
New PID developmentsNew PID developments
New PID developmentsOpenAIRE
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databasestusharjadhav2611
 
9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solution9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solutionStatice
 
data warehousing and data mining
data warehousing and data mining data warehousing and data mining
data warehousing and data mining E2MATRIX
 
CRM - Data Collection, Storage and Acces.
CRM - Data Collection, Storage and Acces.CRM - Data Collection, Storage and Acces.
CRM - Data Collection, Storage and Acces.Vishwas Sankhe
 
Lambda Architecture The Hive
Lambda Architecture The HiveLambda Architecture The Hive
Lambda Architecture The HiveAltan Khendup
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technologyDataminingTools Inc
 
It Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsIt Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsOntotext
 
5 data preparation and processing2
5 data preparation and processing25 data preparation and processing2
5 data preparation and processing2Mahmoud Alfarra
 
Role of Data Cleaning in Data Warehouse
Role of Data Cleaning in Data WarehouseRole of Data Cleaning in Data Warehouse
Role of Data Cleaning in Data WarehouseRamakant Soni
 
ORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE IndonesiaORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE IndonesiaCrossref
 

Was ist angesagt? (20)

From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
From Vaccine Management to ICU Planning: How CRISP Unlocked the Power of Data...
 
Data mining
Data miningData mining
Data mining
 
Data cloud lab version v.001.2020
Data cloud lab version v.001.2020Data cloud lab version v.001.2020
Data cloud lab version v.001.2020
 
Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning Introduction to data pre-processing and cleaning
Introduction to data pre-processing and cleaning
 
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
How OpenAIRE uses persistent identifiers for discovery, enrichment, and linki...
 
EDI Training Module 12: An Introduction to Metadata and Data Repositories
EDI Training Module 12:  An Introduction to Metadata and Data RepositoriesEDI Training Module 12:  An Introduction to Metadata and Data Repositories
EDI Training Module 12: An Introduction to Metadata and Data Repositories
 
Role of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly worksRole of PIDs in connecting scholarly works
Role of PIDs in connecting scholarly works
 
New PID developments
New PID developmentsNew PID developments
New PID developments
 
Record matching over query results from Web Databases
Record matching over query results from Web DatabasesRecord matching over query results from Web Databases
Record matching over query results from Web Databases
 
9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solution9 facts about statice's data anonymization solution
9 facts about statice's data anonymization solution
 
Data mining
Data miningData mining
Data mining
 
data warehousing and data mining
data warehousing and data mining data warehousing and data mining
data warehousing and data mining
 
CRM - Data Collection, Storage and Acces.
CRM - Data Collection, Storage and Acces.CRM - Data Collection, Storage and Acces.
CRM - Data Collection, Storage and Acces.
 
The Big Metadata
The Big MetadataThe Big Metadata
The Big Metadata
 
Lambda Architecture The Hive
Lambda Architecture The HiveLambda Architecture The Hive
Lambda Architecture The Hive
 
Data warehouse and olap technology
Data warehouse and olap technologyData warehouse and olap technology
Data warehouse and olap technology
 
It Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got SemanticsIt Don’t Mean a Thing If It Ain’t Got Semantics
It Don’t Mean a Thing If It Ain’t Got Semantics
 
5 data preparation and processing2
5 data preparation and processing25 data preparation and processing2
5 data preparation and processing2
 
Role of Data Cleaning in Data Warehouse
Role of Data Cleaning in Data WarehouseRole of Data Cleaning in Data Warehouse
Role of Data Cleaning in Data Warehouse
 
ORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE IndonesiaORCID at Crossref LIVE Indonesia
ORCID at Crossref LIVE Indonesia
 

Andere mochten auch

EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data CommonsVivien Bonazzi
 
Entrance test for teacher 2013...
Entrance test for teacher 2013...Entrance test for teacher 2013...
Entrance test for teacher 2013...Ashish Sharma
 
From protein interaction networks to human phenotypes
From protein  interaction networks to human phenotypesFrom protein  interaction networks to human phenotypes
From protein interaction networks to human phenotypesbiocs
 
Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Neil Saunders
 
Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations
Leveraging Wikipedia-based Features for Entity Relatedness and RecommendationsLeveraging Wikipedia-based Features for Entity Relatedness and Recommendations
Leveraging Wikipedia-based Features for Entity Relatedness and RecommendationsNitish Aggarwal
 
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionTowards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionAmrapali Zaveri, PhD
 
Towards Social semantic journalism
Towards Social semantic journalismTowards Social semantic journalism
Towards Social semantic journalismBahareh Heravi
 
Linked data in the digital humanities skills workshop for realising the oppo...
Linked data in the digital humanities  skills workshop for realising the oppo...Linked data in the digital humanities  skills workshop for realising the oppo...
Linked data in the digital humanities skills workshop for realising the oppo...jodischneider
 
Beyond Journalism Chicago
Beyond Journalism ChicagoBeyond Journalism Chicago
Beyond Journalism ChicagoMark Deuze
 
Harrower Heravi RDA P4 Social media
Harrower Heravi RDA P4 Social mediaHarrower Heravi RDA P4 Social media
Harrower Heravi RDA P4 Social mediadri_ireland
 
Specificity and Evolvability in Eukaryotic Protein Interaction Networks
Specificity and Evolvability in Eukaryotic Protein Interaction NetworksSpecificity and Evolvability in Eukaryotic Protein Interaction Networks
Specificity and Evolvability in Eukaryotic Protein Interaction Networkspedrobeltrao
 
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...Ronak Shah
 
Identifying, annotating, and filtering arguments and opinions on the social w...
Identifying, annotating, and filtering arguments and opinions on the social w...Identifying, annotating, and filtering arguments and opinions on the social w...
Identifying, annotating, and filtering arguments and opinions on the social w...jodischneider
 
Combining sequence motifs and protein interactions to unravel complex phospho...
Combining sequence motifs and protein interactions to unravel complex phospho...Combining sequence motifs and protein interactions to unravel complex phospho...
Combining sequence motifs and protein interactions to unravel complex phospho...Lars Juhl Jensen
 
PhD viva - 11th November 2015
PhD viva - 11th November 2015PhD viva - 11th November 2015
PhD viva - 11th November 2015Kevin Keraudren
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD VivaAidan Hogan
 
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...Pradeeban Kathiravelu, Ph.D.
 
2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavisSean Davis
 
Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane
 
Industry Report: The State of Customer Data Integration in 2013
Industry Report: The State of Customer Data Integration in 2013Industry Report: The State of Customer Data Integration in 2013
Industry Report: The State of Customer Data Integration in 2013Scribe Software Corp.
 

Andere mochten auch (20)

EMBL Australian Bioinformatics Resource AHM - Data Commons
EMBL Australian Bioinformatics Resource AHM   - Data CommonsEMBL Australian Bioinformatics Resource AHM   - Data Commons
EMBL Australian Bioinformatics Resource AHM - Data Commons
 
Entrance test for teacher 2013...
Entrance test for teacher 2013...Entrance test for teacher 2013...
Entrance test for teacher 2013...
 
From protein interaction networks to human phenotypes
From protein  interaction networks to human phenotypesFrom protein  interaction networks to human phenotypes
From protein interaction networks to human phenotypes
 
Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...Using structural information to predict protein-protein interaction and enyzm...
Using structural information to predict protein-protein interaction and enyzm...
 
Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations
Leveraging Wikipedia-based Features for Entity Relatedness and RecommendationsLeveraging Wikipedia-based Features for Entity Relatedness and Recommendations
Leveraging Wikipedia-based Features for Entity Relatedness and Recommendations
 
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of CognitionTowards Biomedical Data Integration for Analyzing the Evolution of Cognition
Towards Biomedical Data Integration for Analyzing the Evolution of Cognition
 
Towards Social semantic journalism
Towards Social semantic journalismTowards Social semantic journalism
Towards Social semantic journalism
 
Linked data in the digital humanities skills workshop for realising the oppo...
Linked data in the digital humanities  skills workshop for realising the oppo...Linked data in the digital humanities  skills workshop for realising the oppo...
Linked data in the digital humanities skills workshop for realising the oppo...
 
Beyond Journalism Chicago
Beyond Journalism ChicagoBeyond Journalism Chicago
Beyond Journalism Chicago
 
Harrower Heravi RDA P4 Social media
Harrower Heravi RDA P4 Social mediaHarrower Heravi RDA P4 Social media
Harrower Heravi RDA P4 Social media
 
Specificity and Evolvability in Eukaryotic Protein Interaction Networks
Specificity and Evolvability in Eukaryotic Protein Interaction NetworksSpecificity and Evolvability in Eukaryotic Protein Interaction Networks
Specificity and Evolvability in Eukaryotic Protein Interaction Networks
 
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...
Protein-Protein Interaction using SVM based kernel,Jacob Coefficient and Gene...
 
Identifying, annotating, and filtering arguments and opinions on the social w...
Identifying, annotating, and filtering arguments and opinions on the social w...Identifying, annotating, and filtering arguments and opinions on the social w...
Identifying, annotating, and filtering arguments and opinions on the social w...
 
Combining sequence motifs and protein interactions to unravel complex phospho...
Combining sequence motifs and protein interactions to unravel complex phospho...Combining sequence motifs and protein interactions to unravel complex phospho...
Combining sequence motifs and protein interactions to unravel complex phospho...
 
PhD viva - 11th November 2015
PhD viva - 11th November 2015PhD viva - 11th November 2015
PhD viva - 11th November 2015
 
Aidan's PhD Viva
Aidan's PhD VivaAidan's PhD Viva
Aidan's PhD Viva
 
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
ViTeNA: An SDN-Based Virtual Network Embedding Algorithm for Multi-Tenant Dat...
 
2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis2016 07 12_purdue_bigdatainomics_seandavis
2016 07 12_purdue_bigdatainomics_seandavis
 
Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation Sabrina Kirrane INSIGHT Viva Presentation
Sabrina Kirrane INSIGHT Viva Presentation
 
Industry Report: The State of Customer Data Integration in 2013
Industry Report: The State of Customer Data Integration in 2013Industry Report: The State of Customer Data Integration in 2013
Industry Report: The State of Customer Data Integration in 2013
 

Ähnlich wie Data Café — A Platform For Creating Biomedical Data Lakes

Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesUri Laserson
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeGeoffrey Fox
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amirydatastack
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeDataWorks Summit
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsMerce Crosas
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachSoftServe
 
Green Shoots: Research Data Management Pilot at Imperial College London
Green Shoots:Research Data Management Pilot at Imperial College LondonGreen Shoots:Research Data Management Pilot at Imperial College London
Green Shoots: Research Data Management Pilot at Imperial College LondonTorsten Reimer
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3Simon Ambridge
 
data analytics lecture3.ppt
data analytics lecture3.pptdata analytics lecture3.ppt
data analytics lecture3.pptNamrataBhatt8
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...DataWorks Summit
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoalarsgeorge
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...AKSHAY BHAGAT
 
Using The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationUsing The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationDan Wellisch
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataRobert Grossman
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceGlobus
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Geoffrey Fox
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreUri Laserson
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesGeoffrey Fox
 

Ähnlich wie Data Café — A Platform For Creating Biomedical Data Lakes (20)

Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
 
High Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run TimeHigh Performance Data Analytics and a Java Grande Run Time
High Performance Data Analytics and a Java Grande Run Time
 
Data lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiryData lake-itweekend-sharif university-vahid amiry
Data lake-itweekend-sharif university-vahid amiry
 
Big Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short TimeBig Data at Geisinger Health System: Big Wins in a Short Time
Big Data at Geisinger Health System: Big Wins in a Short Time
 
Dataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTagsDataverse, Cloud Dataverse, and DataTags
Dataverse, Cloud Dataverse, and DataTags
 
Agile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric ApproachAgile Big Data Analytics Development: An Architecture-Centric Approach
Agile Big Data Analytics Development: An Architecture-Centric Approach
 
Green Shoots: Research Data Management Pilot at Imperial College London
Green Shoots:Research Data Management Pilot at Imperial College LondonGreen Shoots:Research Data Management Pilot at Imperial College London
Green Shoots: Research Data Management Pilot at Imperial College London
 
20160331 sa introduction to big data pipelining berlin meetup 0.3
20160331 sa introduction to big data pipelining berlin meetup   0.320160331 sa introduction to big data pipelining berlin meetup   0.3
20160331 sa introduction to big data pipelining berlin meetup 0.3
 
data analytics lecture3.ppt
data analytics lecture3.pptdata analytics lecture3.ppt
data analytics lecture3.ppt
 
JOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big DataJOSA TechTalk: Metadata Management
in Big Data
JOSA TechTalk: Metadata Management
in Big Data
 
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
From Insights to Value - Building a Modern Logical Data Lake to Drive User Ad...
 
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 GenoaHadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
Hadoop is dead - long live Hadoop | BiDaTA 2013 Genoa
 
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...Big Data (SOCIOMETRIC METHODS FOR  RELEVANCY ANALYSIS OF LONG TAIL  SCIENCE D...
Big Data (SOCIOMETRIC METHODS FOR RELEVANCY ANALYSIS OF LONG TAIL SCIENCE D...
 
Using The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare InnovationUsing The Hadoop Ecosystem to Drive Healthcare Innovation
Using The Hadoop Ecosystem to Drive Healthcare Innovation
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
A Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials ScienceA Data Ecosystem to Support Machine Learning in Materials Science
A Data Ecosystem to Support Machine Learning in Materials Science
 
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
Multi-faceted Classification of Big Data Use Cases and Proposed Architecture ...
 
Big Data
Big Data Big Data
Big Data
 
Hadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant StoreHadoop for Bioinformatics: Building a Scalable Variant Store
Hadoop for Bioinformatics: Building a Scalable Variant Store
 
Matching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software ArchitecturesMatching Data Intensive Applications and Hardware/Software Architectures
Matching Data Intensive Applications and Hardware/Software Architectures
 

Mehr von Pradeeban Kathiravelu, Ph.D.

Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Pradeeban Kathiravelu, Ph.D.
 
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...Pradeeban Kathiravelu, Ph.D.
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesPradeeban Kathiravelu, Ph.D.
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreePradeeban Kathiravelu, Ph.D.
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos... My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...Pradeeban Kathiravelu, Ph.D.
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...Pradeeban Kathiravelu, Ph.D.
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Pradeeban Kathiravelu, Ph.D.
 
Moving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routersMoving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routersPradeeban Kathiravelu, Ph.D.
 
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...Pradeeban Kathiravelu, Ph.D.
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Pradeeban Kathiravelu, Ph.D.
 
Software-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big ServicesSoftware-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big ServicesPradeeban Kathiravelu, Ph.D.
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Pradeeban Kathiravelu, Ph.D.
 

Mehr von Pradeeban Kathiravelu, Ph.D. (20)

Google Summer of Code_2023.pdf
Google Summer of Code_2023.pdfGoogle Summer of Code_2023.pdf
Google Summer of Code_2023.pdf
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022Google Summer of Code (GSoC) 2022
Google Summer of Code (GSoC) 2022
 
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
Niffler: A DICOM Framework for Machine Learning and Processing Pipelines.
 
Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021Google summer of code (GSoC) 2021
Google summer of code (GSoC) 2021
 
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
A DICOM Framework for Machine Learning Pipelines against Real-Time Radiology ...
 
Google Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentorsGoogle Summer of Code (GSoC) 2020 for mentors
Google Summer of Code (GSoC) 2020 for mentors
 
Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020Google Summer of Code (GSoC) 2020
Google Summer of Code (GSoC) 2020
 
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data SourcesData Services with Bindaas: RESTful Interfaces for Diverse Data Sources
Data Services with Bindaas: RESTful Interfaces for Diverse Data Sources
 
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degreeThe UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
The UCLouvain Public Defense of my EMJD-DC Double Doctorate Ph.D. degree
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos... My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Compos...
 
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
My Ph.D. Defense - Software-Defined Systems for Network-Aware Service Composi...
 
UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018UCL Ph.D. Confirmation 2018
UCL Ph.D. Confirmation 2018
 
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...Software-Defined Systems for Network-Aware Service Composition and Workflow P...
Software-Defined Systems for Network-Aware Service Composition and Workflow P...
 
Moving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routersMoving bits with a fleet of shared virtual routers
Moving bits with a fleet of shared virtual routers
 
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
Software-Defined Data Services: Interoperable and Network-Aware Big Data Exec...
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
 
Software-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big ServicesSoftware-Defined Inter-Cloud Composition of Big Services
Software-Defined Inter-Cloud Composition of Big Services
 
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
Scalability and Resilience of Multi-Tenant Distributed Clouds in the Big Serv...
 
Componentizing Big Services in the Internet
Componentizing Big Services in the InternetComponentizing Big Services in the Internet
Componentizing Big Services in the Internet
 

Kürzlich hochgeladen

Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetChandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meetpriyashah722354
 
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591adityaroy0215
 
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetOzhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012Call Girls Service Gurgaon
 
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR Call G...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR   Call G...❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR   Call G...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR Call G...Gfnyt.com
 
Call Girls Service In Goa 💋 9316020077💋 Goa Call Girls By Russian Call Girl...
Call Girls Service In Goa  💋 9316020077💋 Goa Call Girls  By Russian Call Girl...Call Girls Service In Goa  💋 9316020077💋 Goa Call Girls  By Russian Call Girl...
Call Girls Service In Goa 💋 9316020077💋 Goa Call Girls By Russian Call Girl...russian goa call girl and escorts service
 
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meetpriyashah722354
 
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near MeVIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Memriyagarg453
 
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetHubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋Sheetaleventcompany
 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthanindiancallgirl4rent
 
Bangalore call girl 👯‍♀️@ Simran Independent Call Girls in Bangalore GIUXUZ...
Bangalore call girl  👯‍♀️@ Simran Independent Call Girls in Bangalore  GIUXUZ...Bangalore call girl  👯‍♀️@ Simran Independent Call Girls in Bangalore  GIUXUZ...
Bangalore call girl 👯‍♀️@ Simran Independent Call Girls in Bangalore GIUXUZ...Gfnyt
 
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Sheetaleventcompany
 
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetCall Girls Service
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipurgragmanisha42
 
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...indiancallgirl4rent
 
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...indiancallgirl4rent
 
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Sheetaleventcompany
 
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in Chandigarh
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in ChandigarhChandigarh Escorts, 😋9988299661 😋50% off at Escort Service in Chandigarh
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in ChandigarhSheetaleventcompany
 

Kürzlich hochgeladen (20)

Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetChandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Chandigarh Call Girls 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
VIP Call Girl Sector 25 Gurgaon Just Call Me 9899900591
 
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetOzhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Ozhukarai Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Thane Just Call 9907093804 Top Class Call Girl Service Available
 
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
VIP Call Girls Sector 67 Gurgaon Just Call Me 9711199012
 
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR Call G...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR   Call G...❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR   Call G...
❤️♀️@ Jaipur Call Girls ❤️♀️@ Meghna Jaipur Call Girls Number CRTHNR Call G...
 
Call Girls Service In Goa 💋 9316020077💋 Goa Call Girls By Russian Call Girl...
Call Girls Service In Goa  💋 9316020077💋 Goa Call Girls  By Russian Call Girl...Call Girls Service In Goa  💋 9316020077💋 Goa Call Girls  By Russian Call Girl...
Call Girls Service In Goa 💋 9316020077💋 Goa Call Girls By Russian Call Girl...
 
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real MeetCall Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
Call Girls Chandigarh 👙 7001035870 👙 Genuine WhatsApp Number for Real Meet
 
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near MeVIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
VIP Call Girls Noida Jhanvi 9711199171 Best VIP Call Girls Near Me
 
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetHubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Hubli Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
💚😋Mumbai Escort Service Call Girls, ₹5000 To 25K With AC💚😋
 
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur RajasthanJaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
Jaipur Call Girls 9257276172 Call Girl in Jaipur Rajasthan
 
Bangalore call girl 👯‍♀️@ Simran Independent Call Girls in Bangalore GIUXUZ...
Bangalore call girl  👯‍♀️@ Simran Independent Call Girls in Bangalore  GIUXUZ...Bangalore call girl  👯‍♀️@ Simran Independent Call Girls in Bangalore  GIUXUZ...
Bangalore call girl 👯‍♀️@ Simran Independent Call Girls in Bangalore GIUXUZ...
 
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
Call Girl In Zirakpur ❤️♀️@ 9988299661 Zirakpur Call Girls Near Me ❤️♀️@ Sexy...
 
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real MeetMangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
Mangalore Call Girls 👙 6297143586 👙 Genuine WhatsApp Number for Real Meet
 
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In RaipurCall Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
Call Girl Raipur 📲 9999965857 ヅ10k NiGhT Call Girls In Raipur
 
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
(Sonam Bajaj) Call Girl in Jaipur- 09257276172 Escorts Service 50% Off with C...
 
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
(Ajay) Call Girls in Dehradun- 8854095900 Escorts Service 50% Off with Cash O...
 
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
Punjab❤️Call girls in Mohali ☎️7435815124☎️ Call Girl service in Mohali☎️ Moh...
 
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in Chandigarh
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in ChandigarhChandigarh Escorts, 😋9988299661 😋50% off at Escort Service in Chandigarh
Chandigarh Escorts, 😋9988299661 😋50% off at Escort Service in Chandigarh
 

Data Café — A Platform For Creating Biomedical Data Lakes

  • 1. 1 Data Café — A Platform For Creating Biomedical Data Lakes Pradeeban Kathiravelu1,2, Ameen Kazerouni2, Ashish Sharma2 1 Instituto Superior Técnico, Universidade de Lisboa, Lisboa, Portugal 2 Department of Biomedical Informatics, Emory University, Atlanta, USA www.sharmalab.info
  • 2. 2 Data Landscape for Precision Medicine DATA CHARACTERISTICS • Large number of small datasets • Structured…Semi-structured …Unstructured…Ill formed • Noisy and Fuzzy/Uncertain • Spatial, Temporal relationships DATA MANAGEMENT • Variety in storage and messaging protocols • No shared interface
  • 3. 3 Illustrative Use Case Execute a Radiogenomics workflow on the diffusion images of GBM patients who received a TMZ + experimental regimen with an overall survival of 18months or more. Execute a Radiogenomics workflow on the diffusion images of GBM patients who received a TMZ + experimental regimen with an overall survival of 18months or more PACS + EMR + AIM + RT + Molecular
  • 4. 4 Motivation • Most current solutions require a DBA to initiate the migration of data into a Data Warehousing environment • to query and explore all the data at once. • Costly to set up such warehouses. • Unified warehouse with access to query and explore the data. • Limitations • Scalability and extensibility to incorporate new data sources • A priori knowledge of the data models of the different data sources.
  • 5. BIOMEDICAL DATA LAKES • Cohort Discovery and Creation — Assembled per-study • Heterogeneous data collected in a loosely structured fashion. • Agile and easy to create. • Integrate with data exploration/visualization via REST APIs. • Problem or hypothesis specific virtual data set. • Powered by Drill + HDFS, Data Sources via APIs.
  • 6. 6 Data Café • An agile approach to creating and extending the concept of a star schema • to model a problem/hypothesis specific dataset. • by leveraging Apache Drill to easily query the data. • Tackles the limitations in the existing approaches. • Provides researchers the ability to add new data models and sources.
  • 7. 7 Core Concepts Step 1. Given a set of data sources, create a graphical representation of the join attributes. This graph represents how data is connected across the various data sources
  • 8. 8 Core Concepts Step 2. Run a set of parallel queries on the data sources that include the attributes that are present in the query graph. In the top figure, our query is of type: {id1: A1 > x and B2 == y} We run similar queries across C, D and E and retrieve the set of relevant id’s (join attributes).
  • 9. 9 Core Concepts Step 3. Compute intersection across the various id’s (join attributes). The data of interest can now be obtained using the id’s in this intersection. A subsequent query will allow us to stream, in parallel, data from individual sources, given the relevant ids (join attributes)
  • 11. 11 Apache Drill • Variety – Query a range of non-relational data sources. • Flexibility. • Agility – Faster Insights. • Scalability.
  • 12. 12 Evaluation Environment • Data Café was deployed along with the data sources and Drill in Amazon EC2. • MongoDB instantiated in EC2 instances. • Hive on Amazon EMR (Elastic MapReduce). • EMR HDFS was configured with 3 nodes. • Various datasets for evaluation • Two synthetic datasets. • Clinical Data from the TCGA BRCA collection
  • 13. 13 Results • Quick creation of data lakes • without prior knowledge of the data schema. • Very fast execution of large queries • with Apache Drill. • Data Café can be an efficient platform for exploring an integrated data source. • Integrated data source construction process may be time consuming. • Less critical path. • Done less frequently than the data queries from HDFS/Hive using Drill.
  • 14. 14 Conclusion • A novel platform for integrating multiple data sources. • Without a priori knowledge of the data models of the sources that are being integrated. • Indices to do the actual integration • Enables parallelizing the push of the actual data into HDFS. • Apache Drill as a fast query execution engine that supports SQL. • Currently ingesting data from TCGA.
  • 15. 15 Current State and Future Plans • Ongoing efforts to evaluate the platform with diverse and heterogeneous data sources. • Expanding to a larger multi-node distributed cluster. • Integration with DataScope. • Multiple data stores and larger data sets. • Integration with imaging clients such as caMicroscope, as well as archives such as The Cancer Imaging Archive (TCIA).
  • 16. Acknowledgements Google Summer of Code 2015 NCIP/Leidos 14X138, caMicroscope — A Digital Pathology Integrative Query System; Ashish Sharma PI Emory/WUSTL/Stony Brook NCI U01 [1U01CA187013-01], Resources for development and validation of Radiomic Analyses & Adaptive Therapy, Fred Prior, Ashish Sharma (UAMS, Emory) The results published here are in part based upon data generated by the TCGA Research Network: http://cancergenome.nih.gov/
  • 17. For more information including recent updates please visit: www.sharmalab.info ashish.sharma@emory.edu