1. 1
Big Data in Disease Management
Mohamood Adhil
InterpretOmics, Bangalore
9th Cloud Computing and Big Data Analytics, 17th March 2016
2. 2
Big Data?
Big data is not only about size
The term “Big Data” was coined when data began to grow
exponentially and became difficult to process with
conventional software tools to extract meaningful information
Big data analytics is used in many fields, such as e-commerce, health care,
astronomy, politics, weather, media and research
www.interpretomics.co
4. 4
Causal Relationship
The main objective of data analysis in any
field is to find the cause of an event.
For example:
What causes a company's stock value
to fall?
What causes crime to occur
frequently in a particular place?
Variables (v) are identified from the
data to find the cause of an effect,
which can then be used to alter
future events
Data: v1, v2, v3, ..., vn
5. 5
Interesting Causal Discoveries
A 2007 study by the US-based National Institute of
Neurological Disorders and Stroke found that, in
families affected by Parkinson's disease, those
who drank a lot of coffee were less likely to
develop the disease
In the early 1900s, the incidence of lung cancer was
on the rise but no one really knew why. German
physician Fritz Lickint published a paper in which
he showed that lung cancer patients were
particularly likely to have been smokers
Positive Correlation
http://archneur.jamanetwork.com/article.aspx?articleid=793724
http://www.statisticsviews.com/details/feature/7914611/A-Day-in-the-Life-of-Explanatory-Variables-and-Confounding-Factors.html
6. 6
Correlation Is Not Always Causation
In the US, the number of people eating ice cream
correlates positively with the number of deaths
by drowning.
Worldwide non-commercial space launches
correlate with sociology doctorates
awarded (US)
Japanese passenger cars sold in the US
correlate with suicides by crashing of
motor vehicle
Confounding Factors
http://www.dailymail.co.uk/sciencetech/article-2640550/Does-sour-cream-cause-bike-accidents-No-looks-like-does-Graphs-reveal-statistics-produce-false-connections.html
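The ice-cream and drowning example can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: daily temperature is taken as the hypothetical confounder, and the coefficients and noise levels are arbitrary illustrative values, not figures from any study. Ice-cream sales and drownings each depend only on temperature, yet they correlate strongly with each other; the partial correlation, which removes the linear effect of temperature, is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy, rxz, ryz = pearson(xs, ys), pearson(xs, zs), pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n_days=2000, seed=0):
    """Simulated data: both outcomes depend on temperature only (assumed)."""
    rng = random.Random(seed)
    temperature = [rng.uniform(0, 35) for _ in range(n_days)]
    # Hypothetical linear effects plus independent Gaussian noise.
    ice_cream = [5.0 * t + rng.gauss(0, 20) for t in temperature]
    drownings = [0.2 * t + rng.gauss(0, 1) for t in temperature]
    return temperature, ice_cream, drownings


temperature, ice_cream, drownings = simulate()
raw = pearson(ice_cream, drownings)
adjusted = partial_corr(ice_cream, drownings, temperature)
print(f"raw correlation:           {raw:.2f}")
print(f"adjusted for temperature:  {adjusted:.2f}")
```

Neither simulated variable influences the other, so the strong raw correlation comes entirely from the shared driver. Real confounders are rarely this simple or this easy to measure.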
7. 7
Key Statistical Points for Analyzing Big Data
Understand the sample size (data size and sample size are
different)
Visualize the data before and after analyzing it
Select the appropriate statistical model or tool based on the
problem to be addressed
Don't look for patterns; discover patterns from the data
(Exploratory Data Analysis)
Be aware of confounding factors
8. 8
Bangalore - Breast Cancer Capital of India
City                  New cases per lakh
Bangalore             36.6
Thiruvananthapuram    35.1
Chennai               32.6
Nagpur                32.5
Delhi                 32.2
Some of the proposed reasons, none backed by
proper evidence, are rapid urbanization,
late marriage, a declining trend in
breastfeeding, contraceptive pills and food
habits
Are these factors really causal?
9. 9
Genomics data (Big Omics Data)
The complete set of DNA (chromosomes),
including genic and non-genic
regions, is known as the genome
The entire genome contains 3 billion bases
The genome is sequenced (NGS) to identify
the variations responsible for a
phenotype (example: disease)
Sequence data for one sample is
approximately 10-20 GB, depending on the
type of sequencing
DNA will define you
13. 13
Seven Dimensions of Genomics Data
Volume
Velocity
Variety
Veracity
Vexing
Variability
Value
General for all big data
Specific to genomics data
14. 14
Application of Genomics Data
Genomics data plays a crucial role from
bench to bedside
Bench - Drug Discovery Process
Bedside - Genetic Testing for
Precision Medicine
The main difference between bench and
bedside is the number of samples:
the bench usually has cohort data
(N=n) and the bedside has a single
sample (N=1)
15. 15
Genetic Testing for Precision Medicine
Some popular genetic tests that use NGS
techniques (big omics data) are:
Genetic Predisposition - To learn about a person's
genetic makeup and the odds of developing a disease
Disease Diagnostic – To diagnose a particular disease
where diagnosis is otherwise difficult, as with rare
disorders such as psychiatric and metabolic disorders
Drug Response Prediction (Pharmacogenomics) –
Helps with drug selection based on
genomic variations
These results are produced using evidence-based
techniques
Figure labels: Analytical Engine; Genome-phenome Databases (10-1000 TB of data); 5-10 GB; Genetic Report With Valid Evidence
16. 16
Example - Genomics Data for Screening and Diagnosis
Applied Genetics Diagnostics is a next-generation
healthcare company based in Bangalore that offers genetic
diagnostic services to hospitals, physicians and healthcare
organizations
InterpretOmics is the scientific partner providing sequencing and
interpretation to Applied Genetics.
Some of the tests from AppGenDx include:
Single Gene Test
Multi Gene Test
Multi Disease Test
OncoScreen
CarrierScreen ....
To know more: http://www.appgendx.com/
17. 17
Case Study
Case 1
Patient: 33-year-old male with an ulcer in the buccal mucosa
Doctor Diagnosis: Oral Squamous Cell Carcinoma
Disease Causal Mutation using NGS: CDKN1A gene
c.93C>A; p.Ser32Arg, heterozygous
Disease Reported: Oral Squamous Cell Carcinoma
Case 2
Patient: 2-year-7-month-old female with an unsteady
walk, not diagnosed with a specific disease
Doctor Diagnosis: -
Disease Causal Mutation using NGS: AGRN gene
c.1072G>T; p.Gly358Trp, heterozygous
Disease Reported: Myasthenic syndrome, congenital, 8, with pre-
and postsynaptic defects
18. 18
Drug Development
Drug development is time consuming:
a drug takes approximately 15 years to
enter the market
It requires a huge amount of money
(~1 to 10 billion)
On average, 1 in 10 drugs in clinical
development is approved by the FDA
These three hurdles can be overcome by
targeted drugs developed using big omics data, giving
improved turnaround time, reduced cost
and an increased success rate
Currently 42% of all drugs and 73% of oncology drugs in development are targeted drugs. This market is worth
approximately $42 billion and should be worth over $60 billion by 2019.
(The Journal of Precision Medicine Vol1 Issue 2 Page no 31)
19. 19
iOMICS – Unified Genomics Software Solution
Multi-omics Multi-scale data management, analysis and interpretation
software.
Developed for composite analysis needs and tested with numerous real
data sets, this robust platform addresses the complexities of Life Sciences
“Big Data” for driving actionable insights with unprecedented ease.
Cloud and On-Premise Version
Intuitive Analysis
Dynamic Visualisation
Supports in-house and 3rd-party
software and databases
21. 21
iOMICS – Omnia (Knowledge Base)
Curation is based on data- and text-mining techniques, with
manual curation and manual validation pipelines run by PhD-level
biologists
Omnia contains 316 disease types in four disease groups:
Neurology, Metabolic, Pediatric and Oncology.
Currently, more than 200,000 variations, 100
genomic experiments and 5000 papers have been curated in Omnia
for genome-phenome relationships.
22. 22
Future Predictions
The computing resources needed to handle genome data will soon
exceed those of Twitter and YouTube
By 2025, between 100 million and 2 billion human genomes
could have been sequenced
Data storage could run to as much as 2–40 exabytes
Storage is the smaller problem: the computing needed to
acquire, distribute and analyse genomics data may be
even more demanding
http://www.nature.com/news/genome-researchers-raise-alarm-over-big-data-1.17912
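The storage and genome-count ranges above imply a per-genome figure that can be checked with simple arithmetic. This is a rough sketch under two assumptions that the slide does not state: the low storage estimate is paired with the low genome count (and high with high), and an exabyte is taken as 10^18 bytes (decimal units).

```python
EXABYTE = 10 ** 18   # bytes, decimal (SI) definition (assumption)
GIGABYTE = 10 ** 9   # bytes, decimal (SI) definition (assumption)


def gb_per_genome(total_exabytes, n_genomes):
    """Average storage per genome, in gigabytes."""
    return total_exabytes * EXABYTE / n_genomes / GIGABYTE


# Low scenario: 2 exabytes for 100 million genomes.
low = gb_per_genome(2, 100_000_000)
# High scenario: 40 exabytes for 2 billion genomes.
high = gb_per_genome(40, 2_000_000_000)

print(f"low scenario:  {low:.0f} GB per genome")   # 20 GB
print(f"high scenario: {high:.0f} GB per genome")  # 20 GB
```

Both scenarios work out to about 20 GB per genome, which sits at the top of the 10-20 GB per sample quoted earlier in the deck.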