Presentation given by Appistry's Vice President of Product Strategy, Sultan Meghji, at the World Genome Data Analysis Summit. Meghji discussed the big data challenges facing labs as they strive to manage the flow of genetic data from the sequencer to the clinic.
Appistry WGDAS Presentation
1. From Sequencer to Clinic:
Managing Science and Scale
Sultan Meghji, Vice President of Product Strategy
World Genome Data Analysis Summit
November 28, 2012
2. Challenges Along the Path from Genomics Research to
Personalized Medicine
Implementing technology
Implementing science
Scaling from research to clinic
The Problem Restated…
What’s the most efficient, reliable and robust way to capture
my genetic data, analyze it and secure it for re-analysis and
deeper interpretation in a clinical setting?
Enabling Science at Scale
Platform for big data
Analytics framework for implementing science
Flexible deployment
AGENDA 1
3. Mega-scale data management and data analysis.
Complex Pipeline Development, Test & Deployment.
Infrastructure costs, complexity, security and compliance.
Target: Clinicians and Patients leveraging a dynamically expanding field of science.
Accelerating the Science of Genetic Discovery for Researchers, Bioinformatics Specialists & Tool Development.
Government Funding
3rd Party Payers
CUSTOMER NEEDS
4. “We can sequence the genome for dirt
cheap, but we don’t know how to deal
with the data.”
Eric Green, M.D., Ph.D.
Director, NHGRI
“How do we avoid the pitfall of having
cheap human genome sequencing but
complex and expensive manual analysis
to make clinical sense out of the data?”
Elaine Mardis, Ph.D.
Director of Technology Development
Source: WSJ, NYT, Genome Medicine
THE GENOMICS DATA PROBLEM 3
5. “Big Data” is essentially large amounts of data
Multiple sources or data formats
Unstructured or semi-structured
Difficult to put into databases and analyze
Seen in other industry areas:
Telecom
THE BIG DATA CHALLENGE IN GENOMICS 4
6. “Moving data around and storing the data is painful.
It’s a huge problem for us. We’re looking at the
cloud for processing options.”
- Carol Rohl, Ph.D., Director, Merck Research Labs
STORAGE
“Datasets are so large, you have to analyze them at the
same site where the data is or using mirrors. You do not
want to be writing it onto a remote hard drive and move
the data each time you want to analyze it.”
- John Monahan, Novartis Institutes for Biomedical Research
COMPUTATION
“Bioinformatics tools and reference datasets change
monthly, weekly and in some cases daily. This requires
easy to manage application and data management
platforms to keep up to date with all the changes.”
- Sultan Meghji, Appistry, GigaOM 2012
APPLICATIONS
Source: Appistry proprietary market research by CBT Advisors
THE BIG DATA CHALLENGE IN GENOMICS 5
10. Capabilities needed
Automated Data Management and Storage Tightly Coupled to Analysis
Private Cloud Genomics Services (HIPAA Compliance)
Industry Tools, Data Sets and YOUR Science
Massively Scalable/Reliable Fabric for Algorithms, Tools and Applications
Analytics Layer Simplifies the Build, Test and Deployment of Analytic Pipelines
APPISTRY’S GENOMIC SOLUTION 9
11. [Diagram: a workflow of ten numbered steps linking Data from Sequencers, the User, Storage, the Public Cloud, Open-Source Algorithms and Public Gene Databases. Transfer labels include FTP and FedEx; timing labels read 5+ Days and 3-8 Months. The individual step captions overlapped in extraction and are not recoverable.]
Source: Appistry survey
AYRRIS PRODUCT 10
12. ATCGTA TCGGCA CTAATC GCTCGG CTATAG
Data from Sequencers
SFTP Transfer
Appistry Courier Over HTTPS
HIPAA Compliant Genomics Cloud
Appistry Private Cloud
Ayrris Pipelines
Your Science
Annotated Results & Visualizations
SNPs, Indels, Rare Variants, etc.
Appistry Courier Over HTTPS
Consumption of Results by internal Bioinformaticians and Clinicians
Data Center and Researcher
CLOUD WORKFLOW 11
13. APPISTRY CLOUD
Cloud-based genomic data analysis and storage
Subscription to Appistry’s secure, HIPAA compliant cloud storage
APPISTRY APPLIANCE
On-site modular turn-key hardware and software
Enterprise-level implementation of private network HIPAA-enabled storage
INSTITUTION via INTERNET
Same access to pipeline analysis algorithms & annotations (Same Science)
Same underlying technology and efficiency
BUSINESS MODEL 12
14. ATCGTA TCGGCA CTAATC GCTCGG CTATAG
Data from Sequencers
Data from other instruments
Regulatory Compliant Genomics Cloud
Appistry Private Cloud
Ayrris Pipelines
Your Science
Annotated Results & Visualizations
Secured, Integrated Workflows, Data Management and Analysis
Integrated with Medical Data – EMR, Biller/Payer
Integrated with Research Data Systems (Genomics, Pharma)
CLOUD WORKFLOW 13
16. Genomic Information
Decisions for prevention
or early treatment
Breast cancer
Osteoporosis
Lung cancer
Heart disease
Autism
Leukemia
ADHD
Genetic disorders
17. Thanks for Your Attention
main: 314.450.5720
fax: 314.450.5722
sultan@appistry.com
appistry.com
1141 South 7th St., Suite 300
St. Louis, MO 63141
Editor's Notes
Voice over points:
Next Gen Sequencing (NGS) has provided a powerful new tool for the investigation of genomic information
The growing number of sequencers on the market is generating a huge demand for tools for data analysis
There is a lack of fast, affordable, easy-to-use, and comprehensive bioinformatics tools in the market
Another quote (backup): “Data handling is now the bottleneck. It costs more to analyze a genome than to sequence a genome.” - David Haussler, Director of the Center for Biomolecular Science & Engineering at the University of California, Santa Cruz, in NYT
Seen years ago in finance, logistics, geospatial & defense areas
5 universal issues when dealing with “big data”:
Storage
Computation
Network bandwidth (movement of the data)
Operational complexity
Complex programming tasks