SlideShare ist ein Scribd-Unternehmen logo
1 von 14
Downloaden Sie, um offline zu lesen
Challenges of Big Data Processing in
Personalized Medicine
Trends and Concepts, Potsdam
Nov. 28, 2013

Dr. Matthieu Schapranow
Hasso Plattner Institute
Our Vision
Personalized Medicine
2

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
Our Vision
Personalized Medicine
3

Desirability
■  Leveraging directed customer services
■  Portfolio of integrated services for
clinicians, researchers, and patients
■  Include latest research results, e.g. most
effective therapies
Viability
Feasibility
■  Enable personalized medicine also in faroff regions and developing countries to
exceed customer base

■  HiSeq 2500 enables highcoverage whole genome
sequencing in ≈1d

■  Share data via the Internet to get
feedback from word-wide experts (costsaving)

■  IMDB enables allele frequency
determination of 12B records
within <1s

■  Combine research data (publications,
annotations, genome data) from
international databases in a single
knowledge base

■  Identification of relevant
annotations out of 80M <1s
■  Data preparation as a service
reduces TCO

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
Our Challenge
Data Challenges
4

Human genome/biological data
600GB per full genome
15PB+ in databases of leading institutes

Hospital information systems
Often more than 50GB

Cancer patient records
>160k records at NCT

Prescription data
1.5B records from 10,000 doctors
and 10M Patients (100 GB)

Human proteome
160M data points (2.4GB) per sample
>3TB raw proteome data in ProteomicsDB

PubMed database
>23M articles

Medical sensor data
Scan of a single organ in 1s
creates 10GB of raw data

Clinical trials
Currently more than 30k
recruiting on ClinicalTrials.gov

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
Our Approach
In-Memory Technology
●
●
●
●

Read Event
Read Event
Repositories
Repositories

Combined
column
and row store
Insert only
for time travel

+

+
+
++
A

up to 2.000
up to 2.000
requests
requests
per second
per second

Minimal
projections

Any attribute
as index

Bulk load

Discovery Service
Discovery Service

Multi-core/
parallelization

SAP HANA
SAP HANA

P

Active/passive
data store
Dynamic
multithreading
within nodes
No aggregate
tables

+++

up to 8.000 read
up to 8.000 read
event notifications
event notifications
per second
per second

On-the-fly
extensibility
Map
reduce

P
P

t

A
A

Lightweight
Compression

Partitioning

SQL

Analytics on
historical
data
Single and
multi-tenancy
Object to
relational
mapping
Group Key

SQL interface
on columns &
rows
Reduction of
layers

x
x

5

Verification
Verification
Services
Services

T

Text Retrieval
and Extraction
No disk

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project

Architectural Overview
6

Cohort
Analysis

Pathway
Finder

Analytical
Tools

Paper
Search

Access
Control

Clinical Trial
Finder

App Store

Extensions

Billing

Genome
Data

Papers
Pathways

Pipeline
...

Genome
Metadata

Data

Pipeline
Editor

Pipeline
Models
...

In-Memory Database
Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013

...
High-Performance In-Memory Genome Project

Selected Research Topics
7

Improving Analyses:
■  Medical Knowledge Cockpit, Oncolyzer
■  Genome Browser enables deep dive into the genome
■  Cohort Analysis, e.g. clustering of patient cohorts
■  Combined Search, e.g. in clinical trials and side-effect databases
■  Pathways Topology Analysis, e.g. to identify cause/effect
Improving Data Preparations:
■  Graphical modeling of Genome Data Processing (GDP) pipelines
■  Scheduling and execution of multiple GPD pipelines in parallel
■  App store for medical knowledge (bring algorithms to data)
■  Exchange of sensitive data, e.g. history-based access control
■  Billing processes for intellectual property and services
Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project

Medical Knowledge Cockpit

■  Search for affected genes in
distributed and heterogeneous data
sources

8

■  Immediate exploration of relevant
information, such as
□  Gene descriptions,
□  Molecular impact and related
pathways,
Unified access to
structured and
unstructured data
sources

Automatic
clinical trial
matching using
HANA text analysis
features

□  Scientific publications, and
□  Suitable clinical trials.
■  No manual searching for hours or
days – In-memory technology
translates it into interactive finding

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project

Genome Browser

■  Genome Browser: Comparison of
multiple genomes with reference

9

■  Combined knowledge base: latest
relevant annotations and literature,
e.g. NCBI, dbSNP, and UCSC
■  Detailed exploration of genome
locations and existing associations
Unified access
to multiple formerly
disjoint data sources

Matching of
variants with relevant
international annotations

■  Ranked variants, e.g. accordingly to
known diseases
■  Links to more detailed sources
enable fast identification of relevant
information while eliminating longlasting searches.

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project

Analysis of Patient Cohorts

■  In a patient cohort, a subset does
not respond to therapy – why?

10

■  Clustering using various statistical
algorithms, such as k-means or
hierarchical clustering
■  Calculation of all locus combinations
in which at least 5% of all TCGA
participants have mutations: 200ms
for top 20 combinations

Fast clustering
directly inside HANA

■  Individual clusters are calculated in
parallel directly within the database
■  K-means algorithm: 50ms (PAL) vs.
500ms (R)

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project
Search in Structured and Unstructured Medical Data

■  Extended text analysis feature by
medical terminology

11

□  Genes (122,975 + 186,771
synonyms)
□  medical terms and categories
(98,886 diseases + 48,561
synonyms, 47 categories)
Unified access to
structured and
unstructured data
sources

Clinical trial
matching using
text analysis features

□  pharmaceutical ingredients
(7,099 + 5,561 synonyms)
■  Imported clinicaltrials.gov database
(145k trials/30,138 recruiting)
■  Extracted, e.g., 320k genes, 161k
ingredients, 30k periods
■  Select all studies based on multiple
filters in <500ms

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
High-Performance In-Memory Genome Project

Pathway Analysis

■  Search in pathways is limited to “is
a certain element contained” today.

12

■  Integrated >1,5k pathways from
international sources, e.g. KEGG,
HumanCyc, and WikiPathways, into
HANA

Unified access
to multiple formerly
disjoint data sources

■  Implemented graph-based topology
exploration and ranking based on
patient specifics
■  Enables interactive identification of
possible dysfunctions affecting the
course of a therapy before its start

Pathway analysis
of genetic variations with
Graph Engine

Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
Our Vision
13

For researchers:
■  Enable real-time analysis of genome data
■  Automatic scan of pathways to identify cellular
impact of mutations
■  Free-text search in publications, diagnosis, and EMR
data (structured and unstructured data)
For clinicians:
■  Preventive diagnostics to identify risk patients
■  Indicate pharmacokinetic correlations
■  Scan for comparable patient cases
For patients:
■  Identify relevant clinical trials / experts
■  Start most appropriate therapy early based on all
evidences and latest knowledge
Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
Thank you for your interest!
Keep in contact with us.
14

Dr. Matthieu-P. Schapranow
schapranow@hpi.uni-potsdam.de
http://we.analyzegenomes.com/

Hasso Plattner Institute
Enterprise Platform & Integration Concepts
Dr. Matthieu-P. Schapranow
August-Bebel-Str. 88
14482 Potsdam, Germany
Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013

Weitere ähnliche Inhalte

Was ist angesagt?

5 data preparation and processing2
5 data preparation and processing25 data preparation and processing2
5 data preparation and processing2Mahmoud Alfarra
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2Mahmoud Alfarra
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processingMahmoud Alfarra
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & ApplicationsFazle Rabbi Ador
 
Data Mining and Data Warehouse
Data Mining and Data WarehouseData Mining and Data Warehouse
Data Mining and Data WarehouseAnupam Sharma
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introductionBasma Gamal
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introductionhktripathy
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsIJMER
 
Transparency in the Data Supply Chain
Transparency in the Data Supply ChainTransparency in the Data Supply Chain
Transparency in the Data Supply ChainPaul Groth
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB
 
Webinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance ImplicationsWebinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance ImplicationsMongoDB
 
Finding statistics2
Finding statistics2Finding statistics2
Finding statistics2lmk7
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Greg Landrum
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...Susanna-Assunta Sansone
 
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent SystemsFrom Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent SystemsMathieu d'Aquin
 
Datamining
DataminingDatamining
Dataminingsumit621
 

Was ist angesagt? (20)

5 data preparation and processing2
5 data preparation and processing25 data preparation and processing2
5 data preparation and processing2
 
Data preparation and processing chapter 2
Data preparation and processing chapter  2Data preparation and processing chapter  2
Data preparation and processing chapter 2
 
4 Data preparation and processing
4  Data preparation and processing4  Data preparation and processing
4 Data preparation and processing
 
Data Mining & Applications
Data Mining & ApplicationsData Mining & Applications
Data Mining & Applications
 
Data mining
Data miningData mining
Data mining
 
Data Mining and Data Warehouse
Data Mining and Data WarehouseData Mining and Data Warehouse
Data Mining and Data Warehouse
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Lect 1 introduction
Lect 1 introductionLect 1 introduction
Lect 1 introduction
 
Data Mining: Future Trends and Applications
Data Mining: Future Trends and ApplicationsData Mining: Future Trends and Applications
Data Mining: Future Trends and Applications
 
Transparency in the Data Supply Chain
Transparency in the Data Supply ChainTransparency in the Data Supply Chain
Transparency in the Data Supply Chain
 
MongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema DesignMongoDB Days UK: Jumpstart: Schema Design
MongoDB Days UK: Jumpstart: Schema Design
 
Webinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance ImplicationsWebinar: Schema Design and Performance Implications
Webinar: Schema Design and Performance Implications
 
Finding statistics2
Finding statistics2Finding statistics2
Finding statistics2
 
Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...Is that a scientific report or just some cool pictures from the lab? Reproduc...
Is that a scientific report or just some cool pictures from the lab? Reproduc...
 
3 classification
3  classification3  classification
3 classification
 
On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...On community-standards, data curation and scholarly communication" Stanford M...
On community-standards, data curation and scholarly communication" Stanford M...
 
Discovery informaticsstanton
Discovery informaticsstantonDiscovery informaticsstanton
Discovery informaticsstanton
 
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent SystemsFrom Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
From Knowledge Bases to Knowledge Infrastructures for Intelligent Systems
 
Data mining
Data miningData mining
Data mining
 
Datamining
DataminingDatamining
Datamining
 

Andere mochten auch

ABBYY Capture solutions
ABBYY Capture solutionsABBYY Capture solutions
ABBYY Capture solutionsDavinci
 
Cloud computing: Trends and Challenges
Cloud computing: Trends and ChallengesCloud computing: Trends and Challenges
Cloud computing: Trends and ChallengesBig Data Colombia
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges Experian_US
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunitiesMohammed Guller
 
Current Trends and Challenges in Big Data Benchmarking
Current Trends and Challenges in Big Data BenchmarkingCurrent Trends and Challenges in Big Data Benchmarking
Current Trends and Challenges in Big Data BenchmarkingeXascale Infolab
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune amrutupre
 

Andere mochten auch (7)

Big data
Big dataBig data
Big data
 
ABBYY Capture solutions
ABBYY Capture solutionsABBYY Capture solutions
ABBYY Capture solutions
 
Cloud computing: Trends and Challenges
Cloud computing: Trends and ChallengesCloud computing: Trends and Challenges
Cloud computing: Trends and Challenges
 
Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges   Trends in Big Data & Business Challenges
Trends in Big Data & Business Challenges
 
Big data trends challenges opportunities
Big data trends challenges opportunitiesBig data trends challenges opportunities
Big data trends challenges opportunities
 
Current Trends and Challenges in Big Data Benchmarking
Current Trends and Challenges in Big Data BenchmarkingCurrent Trends and Challenges in Big Data Benchmarking
Current Trends and Challenges in Big Data Benchmarking
 
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
Big-Data Hadoop Tutorials - MindScripts Technologies, Pune
 

Ähnlich wie Introduction to High-performance In-memory Genome Project at HPI

A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisMatthieu Schapranow
 
Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Matthieu Schapranow
 
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialProcessing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialMatthieu Schapranow
 
In-memory Applications for Oncology
In-memory Applications for OncologyIn-memory Applications for Oncology
In-memory Applications for OncologyMatthieu Schapranow
 
Enabling Real-time Genome Data Research with In-memory Database Technology (S...
Enabling Real-time Genome Data Research with In-memory Database Technology (S...Enabling Real-time Genome Data Research with In-memory Database Technology (S...
Enabling Real-time Genome Data Research with In-memory Database Technology (S...Matthieu Schapranow
 
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Matthieu Schapranow
 
In-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineIn-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineMatthieu Schapranow
 
Big Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesBig Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesMatthieu Schapranow
 
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...Matthieu Schapranow
 
BioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialBioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialMatthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...Matthieu Schapranow
 
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchAnalyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchMatthieu Schapranow
 
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthAnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthMatthieu Schapranow
 
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...Matthieu Schapranow
 
Gaining Time -- Real-time Analysis of Big Medical Data
Gaining Time -- Real-time Analysis of Big Medical DataGaining Time -- Real-time Analysis of Big Medical Data
Gaining Time -- Real-time Analysis of Big Medical DataMatthieu Schapranow
 
Turning Big Data into Precision Medicine
Turning Big Data into Precision MedicineTurning Big Data into Precision Medicine
Turning Big Data into Precision MedicineMatthieu Schapranow
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineMatthieu Schapranow
 
How Real-time Analysis turns Big Medical Data into Precision Medicine
How Real-time Analysis turns Big Medical Data into Precision MedicineHow Real-time Analysis turns Big Medical Data into Precision Medicine
How Real-time Analysis turns Big Medical Data into Precision MedicineMatthieu Schapranow
 
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Matthieu Schapranow
 

Ähnlich wie Introduction to High-performance In-memory Genome Project at HPI (20)

A Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data AnalysisA Platform for Integrated Genome Data Analysis
A Platform for Integrated Genome Data Analysis
 
Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?Big Medical Data – Challenge or Potential?
Big Medical Data – Challenge or Potential?
 
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or PotentialProcessing of Big Medical Data in Personalized Medicine: Challenge or Potential
Processing of Big Medical Data in Personalized Medicine: Challenge or Potential
 
In-memory Applications for Oncology
In-memory Applications for OncologyIn-memory Applications for Oncology
In-memory Applications for Oncology
 
Enabling Real-time Genome Data Research with In-memory Database Technology (S...
Enabling Real-time Genome Data Research with In-memory Database Technology (S...Enabling Real-time Genome Data Research with In-memory Database Technology (S...
Enabling Real-time Genome Data Research with In-memory Database Technology (S...
 
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
Enabling Real-Time Genome Data Research with In-Memory Database Technology (I...
 
In-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems MedicineIn-Memory Data Management for Systems Medicine
In-Memory Data Management for Systems Medicine
 
Big Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and ChallengesBig Data in Genomics: Opportunities and Challenges
Big Data in Genomics: Opportunities and Challenges
 
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
A Federated In-Memory Database Computing Platform Enabling Real-Time Analysis...
 
BioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or PotentialBioNRW: Big Medical Data: Challenge or Potential
BioNRW: Big Medical Data: Challenge or Potential
 
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
Festival of Genomics 2016 London: Analyze Genomes: A Federated In-Memory Comp...
 
"When time matters..."
"When time matters...""When time matters..."
"When time matters..."
 
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences ResearchAnalyze Genomes: In-memory Apps for Next-generation Life Sciences Research
Analyze Genomes: In-memory Apps for Next-generation Life Sciences Research
 
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital HealthAnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
AnalyzeGenomes.com: A Federated In-Memory Database Platform for Digital Health
 
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
The Driver of the Healthcare System in the 21st Century: Real-world Applicati...
 
Gaining Time -- Real-time Analysis of Big Medical Data
Gaining Time -- Real-time Analysis of Big Medical DataGaining Time -- Real-time Analysis of Big Medical Data
Gaining Time -- Real-time Analysis of Big Medical Data
 
Turning Big Data into Precision Medicine
Turning Big Data into Precision MedicineTurning Big Data into Precision Medicine
Turning Big Data into Precision Medicine
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision Medicine
 
How Real-time Analysis turns Big Medical Data into Precision Medicine
How Real-time Analysis turns Big Medical Data into Precision MedicineHow Real-time Analysis turns Big Medical Data into Precision Medicine
How Real-time Analysis turns Big Medical Data into Precision Medicine
 
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
Gesundheit geht uns alle an: Smart Data ermöglicht passendere Entscheidungen...
 

Mehr von Matthieu Schapranow

Patient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in PracticePatient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in PracticeMatthieu Schapranow
 
How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?Matthieu Schapranow
 
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...Matthieu Schapranow
 
In-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineIn-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineMatthieu Schapranow
 
ICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart FailureICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart FailureMatthieu Schapranow
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineMatthieu Schapranow
 
Analyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineAnalyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineMatthieu Schapranow
 
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...Matthieu Schapranow
 
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...Matthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Matthieu Schapranow
 
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesFestival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesMatthieu Schapranow
 
Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Matthieu Schapranow
 
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Matthieu Schapranow
 
Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?Matthieu Schapranow
 
Festival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaFestival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaMatthieu Schapranow
 
Analyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response AnalysisAnalyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response AnalysisMatthieu Schapranow
 
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesAnalyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesMatthieu Schapranow
 
A Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life SciencesA Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life SciencesMatthieu Schapranow
 

Mehr von Matthieu Schapranow (20)

Patient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in PracticePatient Journey in Oncology 2025: Molecular Tumour Boards in Practice
Patient Journey in Oncology 2025: Molecular Tumour Boards in Practice
 
How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?How will AI affect the patient journey of the future?
How will AI affect the patient journey of the future?
 
AI in Oncology
AI in OncologyAI in Oncology
AI in Oncology
 
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
Algorithmen statt Ärzte: Algorithmen statt Ärzte: Ersetzt Big Data künftig ...
 
In-Memory Apps for Precision Medicine
In-Memory Apps for Precision MedicineIn-Memory Apps for Precision Medicine
In-Memory Apps for Precision Medicine
 
ICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart FailureICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
ICT Platform to Enable Consortium Work for Systems Medicine of Heart Failure
 
Analyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision MedicineAnalyze Genomes Services for Precision Medicine
Analyze Genomes Services for Precision Medicine
 
Analyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision MedicineAnalyze Genomes: In-memory Apps supporting Precision Medicine
Analyze Genomes: In-memory Apps supporting Precision Medicine
 
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...
Analyze Genomes: A Federated In-memory Database Computing Platform enabling r...
 
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
Festival of Genomics 2016 London: Mining and Processing of Unstructured Medic...
 
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
Festival of Genomics 2016 London: Analyze Genomes: Modeling and Executing Gen...
 
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world ExamplesFestival of Genomics 2016 London: Analyze Genomes: Real-world Examples
Festival of Genomics 2016 London: Analyze Genomes: Real-world Examples
 
Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?Festival of Genomics 2016 London: Challenges of Big Medical Data?
Festival of Genomics 2016 London: Challenges of Big Medical Data?
 
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
Festival of Genomics 2016 London: Real-time Exploration of the Cancer Genome,...
 
Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?Festival of Genomics 2016 London: What to take home?
Festival of Genomics 2016 London: What to take home?
 
Festival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: AgendaFestival of Genomics 2016 London: Agenda
Festival of Genomics 2016 London: Agenda
 
Analyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response AnalysisAnalyze Genomes: Drug Response Analysis
Analyze Genomes: Drug Response Analysis
 
Big Data in Life Sciences
Big Data in Life SciencesBig Data in Life Sciences
Big Data in Life Sciences
 
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life SciencesAnalyze Genomes: A Federated In-Memory Database System For Life Sciences
Analyze Genomes: A Federated In-Memory Database System For Life Sciences
 
A Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life SciencesA Federated In-Memory Database System for Life Sciences
A Federated In-Memory Database System for Life Sciences
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Kürzlich hochgeladen (20)

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Introduction to High-performance In-memory Genome Project at HPI

  • 1. Challenges of Big Data Processing in Personalized Medicine Trends and Concepts, Potsdam Nov. 28, 2013 Dr. Matthieu Schapranow Hasso Plattner Institute
  • 2. Our Vision Personalized Medicine 2 Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 3. Our Vision Personalized Medicine 3 Desirability ■  Leveraging directed customer services ■  Portfolio of integrated services for clinicians, researchers, and patients ■  Include latest research results, e.g. most effective therapies Viability Feasibility ■  Enable personalized medicine also in faroff regions and developing countries to exceed customer base ■  HiSeq 2500 enables highcoverage whole genome sequencing in ≈1d ■  Share data via the Internet to get feedback from word-wide experts (costsaving) ■  IMDB enables allele frequency determination of 12B records within <1s ■  Combine research data (publications, annotations, genome data) from international databases in a single knowledge base ■  Identification of relevant annotations out of 80M <1s ■  Data preparation as a service reduces TCO Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 4. Our Challenge Data Challenges 4 Human genome/biological data 600GB per full genome 15PB+ in databases of leading institutes Hospital information systems Often more than 50GB Cancer patient records >160k records at NCT Prescription data 1.5B records from 10,000 doctors and 10M Patients (100 GB) Human proteome 160M data points (2.4GB) per sample >3TB raw proteome data in ProteomicsDB PubMed database >23M articles Medical sensor data Scan of a single organ in 1s creates 10GB of raw data Clinical trials Currently more than 30k recruiting on ClinicalTrials.gov Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 5. Our Approach In-Memory Technology ● ● ● ● Read Event Read Event Repositories Repositories Combined column and row store Insert only for time travel + + + ++ A up to 2.000 up to 2.000 requests requests per second per second Minimal projections Any attribute as index Bulk load Discovery Service Discovery Service Multi-core/ parallelization SAP HANA SAP HANA P Active/passive data store Dynamic multithreading within nodes No aggregate tables +++ up to 8.000 read up to 8.000 read event notifications event notifications per second per second On-the-fly extensibility Map reduce P P t A A Lightweight Compression Partitioning SQL Analytics on historical data Single and multi-tenancy Object to relational mapping Group Key SQL interface on columns & rows Reduction of layers x x 5 Verification Verification Services Services T Text Retrieval and Extraction No disk Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 6. High-Performance In-Memory Genome Project Architectural Overview 6 Cohort Analysis Pathway Finder Analytical Tools Paper Search Access Control Clinical Trial Finder App Store Extensions Billing Genome Data Papers Pathways Pipeline ... Genome Metadata Data Pipeline Editor Pipeline Models ... In-Memory Database Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013 ...
  • 7. High-Performance In-Memory Genome Project Selected Research Topics 7 Improving Analyses: ■  Medical Knowledge Cockpit, Oncolyzer ■  Genome Browser enables deep dive into the genome ■  Cohort Analysis, e.g. clustering of patient cohorts ■  Combined Search, e.g. in clinical trials and side-effect databases ■  Pathways Topology Analysis, e.g. to identify cause/effect Improving Data Preparations: ■  Graphical modeling of Genome Data Processing (GDP) pipelines ■  Scheduling and execution of multiple GPD pipelines in parallel ■  App store for medical knowledge (bring algorithms to data) ■  Exchange of sensitive data, e.g. history-based access control ■  Billing processes for intellectual property and services Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 8. High-Performance In-Memory Genome Project Medical Knowledge Cockpit ■  Search for affected genes in distributed and heterogeneous data sources 8 ■  Immediate exploration of relevant information, such as □  Gene descriptions, □  Molecular impact and related pathways, Unified access to structured and unstructured data sources Automatic clinical trial matching using HANA text analysis features □  Scientific publications, and □  Suitable clinical trials. ■  No manual searching for hours or days – In-memory technology translates it into interactive finding Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 9. High-Performance In-Memory Genome Project Genome Browser ■  Genome Browser: Comparison of multiple genomes with reference 9 ■  Combined knowledge base: latest relevant annotations and literature, e.g. NCBI, dbSNP, and UCSC ■  Detailed exploration of genome locations and existing associations Unified access to multiple formerly disjoint data sources Matching of variants with relevant international annotations ■  Ranked variants, e.g. accordingly to known diseases ■  Links to more detailed sources enable fast identification of relevant information while eliminating longlasting searches. Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 10. High-Performance In-Memory Genome Project Analysis of Patient Cohorts ■  In a patient cohort, a subset does not respond to therapy – why? 10 ■  Clustering using various statistical algorithms, such as k-means or hierarchical clustering ■  Calculation of all locus combinations in which at least 5% of all TCGA participants have mutations: 200ms for top 20 combinations Fast clustering directly inside HANA ■  Individual clusters are calculated in parallel directly within the database ■  K-means algorithm: 50ms (PAL) vs. 500ms (R) Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 11. High-Performance In-Memory Genome Project Search in Structured and Unstructured Medical Data ■  Extended text analysis feature by medical terminology 11 □  Genes (122,975 + 186,771 synonyms) □  medical terms and categories (98,886 diseases + 48,561 synonyms, 47 categories) Unified access to structured and unstructured data sources Clinical trial matching using text analysis features □  pharmaceutical ingredients (7,099 + 5,561 synonyms) ■  Imported clinicaltrials.gov database (145k trials/30,138 recruiting) ■  Extracted, e.g., 320k genes, 161k ingredients, 30k periods ■  Select all studies based on multiple filters in <500ms Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 12. High-Performance In-Memory Genome Project Pathway Analysis ■  Search in pathways is limited to “is a certain element contained” today. 12 ■  Integrated >1,5k pathways from international sources, e.g. KEGG, HumanCyc, and WikiPathways, into HANA Unified access to multiple formerly disjoint data sources ■  Implemented graph-based topology exploration and ranking based on patient specifics ■  Enables interactive identification of possible dysfunctions affecting the course of a therapy before its start Pathway analysis of genetic variations with Graph Engine Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 13. Our Vision 13 For researchers: ■  Enable real-time analysis of genome data ■  Automatic scan of pathways to identify cellular impact of mutations ■  Free-text search in publications, diagnosis, and EMR data (structured and unstructured data) For clinicians: ■  Preventive diagnostics to identify risk patients ■  Indicate pharmacokinetic correlations ■  Scan for comparable patient cases For patients: ■  Identify relevant clinical trials / experts ■  Start most appropriate therapy early based on all evidences and latest knowledge Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013
  • 14. Thank you for your interest! Keep in contact with us. 14 Dr. Matthieu-P. Schapranow schapranow@hpi.uni-potsdam.de http://we.analyzegenomes.com/ Hasso Plattner Institute Enterprise Platform & Integration Concepts Dr. Matthieu-P. Schapranow August-Bebel-Str. 88 14482 Potsdam, Germany Challenges of Big Data Processing in Personalized Medicine, Schapranow, Nov. 28, 2013