INTRODUCTION

SIGNIFICANT MUTATIONS

VIRAL GENOME DETECTION

REPRODUCIBILITY

Pattern Recognition In Clinical Data
Saket Choudhary
Dual Degree Project
Guide: Prof. Santosh Noronha

C G C A T C G A G C T
C G C G T C G A G C T

October 30, 2013

CONCLUSIONS
INTRODUCTION
    Objective
SIGNIFICANT MUTATIONS
    Next Generation Sequencing
    Motivation
    Computational Methods for Driver Detection
VIRAL GENOME DETECTION
    Workflow
REPRODUCIBILITY
    Reproducibility
CONCLUSIONS
    Wrapping up

OBJECTIVE

Figure: objectives diagram. Nodes, in order of appearance:
BWA v/s BWAPSSM
Galaxy Workflow
Viral Genome Integration
Benchmarking Alignment tools
Tools for Biologists
Reproducible analysis
Driver & Passenger Mutation Detection
Galaxy Tools
Better/New Algorithms
Next Generation Sequencing & Cancer Research
Driver & Passenger Mutation Detection
Literature Survey
Ensembl Method
NEXT GENERATION SEQUENCING

C G C G T C G A G C T A G C A

G C G T*
C G A G*
C T A G*

A G C G C G T C G A G C T A G C A C A
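
The slide's picture of short reads being placed against a longer sequence can be sketched in a few lines. This is a minimal illustration only: it assumes the longer sequence acts as the reference and uses exact string matching, whereas real aligners (such as BWA, named in the objectives) tolerate mismatches and use indexed search. The function name `locate_reads` is hypothetical.

```python
def locate_reads(reference, reads):
    """Return every 0-based start position at which each read matches exactly."""
    positions = {}
    for read in reads:
        hits = []
        start = reference.find(read)
        while start != -1:
            hits.append(start)
            # Continue searching one base further to catch overlapping hits.
            start = reference.find(read, start + 1)
        positions[read] = hits
    return positions


# Sequences taken from the slide, with spaces removed.
reference = "AGCGCGTCGAGCTAGCACA"
reads = ["GCGT", "CGAG", "CTAG"]

placements = locate_reads(reference, reads)
for read, hits in placements.items():
    print(read, hits)
```

Each read lands at a single position here; with longer genomes and sequencing errors, approximate matching becomes necessary.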

NGS: WHY?

Molecular approach: study of variations at the 'base' level
Low cost: the $1000 genome
Faster: quicker than traditional sequencing techniques

NGS: WHERE?

Study variations and genotype-phenotype associations
Look for 'markers of disease'
Prognosis

NGS: MUTATIONS

3 × 10^9 base pairs
We are all 99.9% similar at the DNA level
More than 2 million SNPs
No particular pattern of SNPs
If a mutation causes a change in an amino acid, it is
referred to as non-synonymous (nsSNV)
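
As a small illustration of the synonymous/non-synonymous distinction, the sketch below translates a codon before and after a single-base change using the standard genetic code. The function name `classify_snv` and its arguments are hypothetical, and the example ignores strand, reading frame, and transcript choice, which a real annotation tool must handle.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snv(codon, offset, alt_base):
    """Classify a single-base change at `offset` (0, 1 or 2) within `codon`."""
    mutated = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[mutated]
    label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return label, ref_aa, alt_aa


print(classify_snv("GAA", 2, "G"))  # third-base change, Glu stays Glu
print(classify_snv("GAG", 1, "T"))  # second-base change, Glu becomes Val
```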
DRIVERS AND PASSENGERS I
Cancer is known to arise due to mutations
Not all mutations are equally important!

Somatic Mutations
Set of mutations acquired after zygote formation, over and
above the germline mutations

Driver Mutations
Mutations that confer a growth advantage on the cell and are
therefore positively selected in the tumor tissue

DRIVERS AND PASSENGERS

Drivers are NOT simply loss-of-function mutations; they include:
Loss of function: inactivates tumor suppressor proteins
Gain of function: activates normal genes, transforming
them into oncogenes
Drug resistance mutations: mutations that have evolved
to overcome the inhibitory effect of drugs
DRIVER MUTATIONS: WHY?
Identify driver mutations −→ better therapeutic targets
But how does one narrow down to the exact set? −→
experiments are too costly, and probably infeasible for 2 million+
SNPs −→ leverage computational analysis
The low cost of NGS comes with a heavier roadblock: data
analysis
Searching among 2 million+ SNPs is a non-trivial,
computationally intensive problem
Software tools show low consensus among themselves
←→ defining a driver computationally is non-trivial
However, there is no tool that lets one visualise the
results for a given input across the whole cohort of tools
MACHINE LEARNING I
Two datasets:
Training: labeled dataset, containing a table of features
with mutations labelled as "driver" or "passenger"
Test: having 'learned' from the training dataset, test the
prediction model on this set

Table: Training Dataset

Chromosome  Position  Ref  Alt  Type
1           27822     A    G    Driver
1           27832     T    G    Driver
2           47842     G    C    Passenger
...         ...       ...  ...  ...
MACHINE LEARNING II

Table: Test Dataset

Chromosome  Position  Ref  Alt  Type
1           27824     A    G    ?
1           47832     T    G    ?

MACHINE LEARNING: FEATURE SELECTION I
Machine learning relies on a set of features for training
Redundant features should be avoided
CHASM [1] makes use of mutual information for feature selection
p(X_i) represents the probability of occurrence of an event X_i
Considering a series of events X_1, X_2, X_3, ..., X_n, analogous to a 'series
of packets' in communication theory, the information received
at each step can be quantified on a log scale by:

$$\log_2 \frac{1}{p(X_i)} = -\log_2 p(X_i) \tag{1}$$

The expected value of information from a series of events is
called the Shannon entropy, H(X):

$$H(X) = -\sum_i p(X_i) \log_2 p(X_i) \tag{2}$$
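
A minimal sketch of the Shannon entropy formula, estimating each p(X_i) from observed frequencies. The function name `shannon_entropy` and the toy label lists are illustrative assumptions, not part of the original analysis.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Entropy in bits of the empirical distribution of `events`."""
    counts = Counter(events)
    total = len(events)
    # H(X) = -sum_i p(X_i) * log2 p(X_i), with p estimated as count / total.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


balanced = ["Driver", "Passenger", "Driver", "Passenger"]
skewed = ["Driver", "Driver", "Driver", "Passenger"]

print(shannon_entropy(balanced))  # maximal for two equally likely classes
print(shannon_entropy(skewed))    # lower, because one class dominates
```

A balanced two-class label carries one bit of entropy; the more one class dominates, the less information each new label provides.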
MACHINE LEARNING: FEATURE SELECTION II
Mutual information between two random variables X and Y is
defined as the amount of information gained about random
variable X from knowledge of the second, Y:

$$I(X, Y) = H(X) - H(X|Y) \tag{3}$$

Here:
X: class label [Driver/Passenger]
Y: predictive feature
and hence I(X, Y) represents how much information is
gained about the class label X from knowledge of a feature Y
Simplifying:

$$I(X, Y) = \sum_{x, y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)} \tag{4}$$
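
Equation (4) can be evaluated directly from paired observations of the class label and a discrete feature. The sketch below is illustrative only: `mutual_information` is a hypothetical helper, the probabilities are plain frequency estimates, and the toy features are invented to show the two extremes.

```python
import math
from collections import Counter


def mutual_information(labels, feature):
    """I(X, Y) in bits, with X the class labels and Y a discrete feature."""
    n = len(labels)
    joint = Counter(zip(labels, feature))
    count_x = Counter(labels)
    count_y = Counter(feature)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


labels = ["Driver", "Driver", "Passenger", "Passenger"]
informative = [1, 1, 0, 0]    # perfectly tracks the label
uninformative = [1, 0, 1, 0]  # independent of the label

print(mutual_information(labels, informative))
print(mutual_information(labels, uninformative))
```

Ranking candidate features by this quantity is one simple way to discard those that say little about the driver/passenger label.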
FUNCTIONAL IMPACT I

If a mutation confers an advantage to the cell in
terms of replication rate, it is likely to be selected,
while mutations that reduce its fitness have a
higher chance of being eliminated from the population.
Certain residues in a multiple sequence alignment (MSA) of
homologous sequences are more conserved than others. Mutating
a highly conserved residue is likely to be costly, since it
disturbs what has 'evolved'!
Scores can be assigned based on this "conservation"
parameter.
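
One simple way to turn "conservation" into a number is to score each alignment column by one minus its normalised Shannon entropy. This is a sketch under stated assumptions, not the scoring used by SIFT or any other named tool: the helper names are hypothetical, the toy alignment is invented, gaps are not handled, and a 20-letter amino acid alphabet is assumed for normalisation.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Return 1.0 for a fully conserved column, lower values for variable ones."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def conservation_profile(alignment):
    """Score every column of an alignment given as equal-length strings."""
    return [column_conservation(col) for col in zip(*alignment)]


# Toy alignment of four homologous sequences (invented for illustration).
alignment = ["MKTA", "MKSA", "MRTA", "MKTG"]
profile = conservation_profile(alignment)
for position, score in enumerate(profile):
    print(position, round(score, 3))
```

A mutation at a position scoring close to 1.0 would then be flagged as more likely to be damaging than one at a variable position.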
FUNCTIONAL IMPACT II
Figure: SIFT algorithm

Some of the common tools/algorithms used for driver
mutation prediction:
SIFT
PolyPhen
Mutation Assessor
TransFIC
Condel

FRAMEWORK FOR COMPARING VARIOUS TOOLS I
Different tools use different formats and give different outputs
for similar input
Running an analysis on multiple tools −→ repeatedly converting
between data formats
Concordance?

Polyphen2 Input
chr1:888659 T/C
chr1:1120431 G/A
chr1:1387764 G/A
chr1:1421991 G/A
chr1:1599812 C/T
chr1:1888193 C/A
chr1:1900186 T/C
FRAMEWORK FOR COMPARING VARIOUS TOOLS II

SIFT Input
1,888659,T,C
1,1120431,G,A
1,1387764,G,A
1,1421991,G,A
1,1599812,C,T
1,1888193,C,A
1,1900186,T,C
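
The format shuffling described above can be automated. The sketch below converts one record from the Polyphen2 layout shown on the previous slide to the SIFT layout shown here. It covers only the fields visible on these two slides; the function name `polyphen2_to_sift` is hypothetical, and any further columns the real tools accept are not handled.

```python
def polyphen2_to_sift(line):
    """Convert 'chr1:888659 T/C' into '1,888659,T,C'."""
    location, alleles = line.split()
    chrom, position = location.split(":")
    ref, alt = alleles.split("/")
    # The SIFT example on the slide drops the 'chr' prefix.
    if chrom.startswith("chr"):
        chrom = chrom[len("chr"):]
    return ",".join([chrom, position, ref, alt])


polyphen2_records = [
    "chr1:888659 T/C",
    "chr1:1120431 G/A",
    "chr1:1387764 G/A",
]

sift_records = [polyphen2_to_sift(record) for record in polyphen2_records]
for record in sift_records:
    print(record)
```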
DRIVER MUTATIONS: TOOLS DON'T AGREE

Figure: X axis: Condel score; Y axis: MA score

Solution?
Galaxy, an open-source web-based platform for
bioinformatics, makes it possible to represent the entire data
analysis pipeline in an intuitive graphical interface
Figure: Galaxy workflow for the Polyphen2 algorithm
Run all tools in one go:
Figure: Run all tools

Compare all tools:
Figure: Compare all tools

VIRAL GENOME DETECTION
Cervical cancers have been shown to be associated with
Human Papillomavirus (HPV)
Cervical cancer datasets from Indian women were analysed
to detect:
1. Any possible HPV integration
2. Sites of HPV integration
Who cares?
Whole genome sequencing could be replaced by targeted
sequencing at the sites where the virus has been
detected in a cohort of samples, thus speeding up the
whole process.
Workflow:
1. Raw data
2. Align data to human genome reference
3. Extract unmapped regions
4. Align unmapped regions to virus genome
5. Extract mapped regions
6. BLAST
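
The workflow steps can be written down as an explicit command plan. The sketch below is an assumption-laden illustration, not the pipeline actually used in the project: it assumes `bwa mem` for both alignments (with indices already built), `samtools` for filtering on the unmapped flag, and `blastn` for the final search. All file names, the `blast_db` argument, and the helper names are hypothetical.

```python
import subprocess


def build_pipeline(reads_fq, human_ref, virus_ref, blast_db, prefix="sample"):
    """Return a list of (argv, stdout_path) steps; stdout_path may be None."""
    human_sam = f"{prefix}.human.sam"
    unmapped_bam = f"{prefix}.unmapped.bam"
    unmapped_fq = f"{prefix}.unmapped.fq"
    virus_sam = f"{prefix}.virus.sam"
    viral_bam = f"{prefix}.viral.bam"
    viral_fa = f"{prefix}.viral.fa"
    hits_tsv = f"{prefix}.blast.tsv"
    return [
        # Align raw reads to the human reference.
        (["bwa", "mem", human_ref, reads_fq], human_sam),
        # Keep only reads with the 'unmapped' flag (0x4) set.
        (["samtools", "view", "-b", "-f", "4", "-o", unmapped_bam, human_sam], None),
        # Convert the unmapped reads back to FASTQ.
        (["samtools", "fastq", unmapped_bam], unmapped_fq),
        # Align the unmapped reads to the virus reference.
        (["bwa", "mem", virus_ref, unmapped_fq], virus_sam),
        # Keep only reads that did map to the virus (flag 0x4 not set).
        (["samtools", "view", "-b", "-F", "4", "-o", viral_bam, virus_sam], None),
        # Export the viral reads as FASTA for BLAST.
        (["samtools", "fasta", viral_bam], viral_fa),
        # Search the viral reads against a BLAST database, tabular output.
        (["blastn", "-query", viral_fa, "-db", blast_db,
          "-outfmt", "6", "-out", hits_tsv], None),
    ]


def run_pipeline(steps):
    """Execute each step in order, redirecting stdout where a path is given."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, stdout=handle, check=True)


steps = build_pipeline("reads.fq", "human.fa", "hpv.fa", "nt")
for argv, stdout_path in steps:
    print(" ".join(argv), ">" + stdout_path if stdout_path else "")
```

Keeping the plan as data, separate from `run_pipeline`, makes it easy to log the exact commands before anything is executed.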

GALAXY WORKFLOW

Figure: Aligned HPV genomes

REPRODUCIBILITY

In pursuit of novel 'discovery', standardizing the data
analysis pipeline is often ignored, leading to dubious
conclusions
Analysis should be reproducible and, above all, correct
Parameter values can change the results substantially;
they need to be documented/logged
Garbage in, garbage out
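
Documenting parameters can be as simple as writing a small record next to every result. The sketch below shows one possible shape for such a record; the field names, the tool name, and the parameter values are all hypothetical, and Galaxy keeps its own, richer history of tool runs.

```python
import hashlib
import json


def make_run_record(tool, version, parameters, input_bytes):
    """Describe one analysis step: what ran, with which settings, on which input."""
    return {
        "tool": tool,
        "version": version,
        "parameters": dict(sorted(parameters.items())),
        # A checksum ties the record to the exact input data.
        "input_sha256": hashlib.sha256(input_bytes).hexdigest(),
    }


def serialize(record):
    """Stable JSON text, so identical runs produce identical records."""
    return json.dumps(record, sort_keys=True, indent=2)


record = make_run_record(
    tool="example-aligner",      # hypothetical tool name
    version="0.0.1",             # hypothetical version
    parameters={"seed_length": 19, "threads": 4},
    input_bytes=b"@read1\nGCGT\n+\nIIII\n",
)
print(serialize(record))
```

Two runs can then be compared by comparing their records; any difference in parameters or input shows up immediately.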
CONCLUSIONS
With the Galaxy toolbox for identifying significant
mutations in place, and the science behind the methods studied, the
next steps would be to:
Open-source the toolbox to the community: a tool makes
little sense if it is not in a usable form; community
feedback will be used to add more tools and improve the
existing ones
A new method for driver mutation prediction: all the existing
methods have a low level of concordance. A new method
that takes into account the available data at all levels
(mutations, transcriptome and microarray data) is possible.
With the Galaxy toolbox in place, it would be possible to
integrate information at various levels
FUTURE WORK

Develop an algorithm that integrates the machine learning
approach with the functional approach by narrowing down to
only those attributes that are known to have an impact
The algorithm would also account for information at other
levels: RNA expression, clinical data
Integrating information at all levels would provide
deeper insight
The developed Galaxy toolbox will be used as the basic
framework for integrating information
REFERENCES I

[1] Hannah Carter, Sining Chen, Leyla Isik, Svitlana
Tyekucheva, Victor E Velculescu, Kenneth W Kinzler, Bert
Vogelstein, and Rachel Karchin.
Cancer-specific high-throughput annotation of somatic
mutations: computational prediction of driver missense
mutations.
Cancer Research, 69(16):6660–6667, 2009.

Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 

Kürzlich hochgeladen (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 

Pattern Recognition in Clinical Data

  • 1. INTRODUCTION | SIGNIFICANT MUTATIONS | VIRAL GENOME DETECTION | REPRODUCIBILITY | CONCLUSIONS. Pattern Recognition In Clinical Data. Saket Choudhary. Dual Degree Project. Guide: Prof. Santosh Noronha. C G C A T C G A G C T / C G C G T C G A G C T. October 30, 2013.
  • 2. INTRODUCTION: Objective. SIGNIFICANT MUTATIONS: Next Generation Sequencing; Motivation; Computational Methods for Driver Detection. VIRAL GENOME DETECTION: Workflow. REPRODUCIBILITY: Reproducibility. CONCLUSIONS: Wrapping up.
  • 3. OBJECTIVE (diagram labels): BWA v/s BWAPSSM; Galaxy Workflow; Viral Genome Integration; Benchmarking Alignment tools; Tools for Biologists; Reproducible analysis; Driver & Passenger Mutation Detection; Galaxy Tools; Better/New Algorithms; Next Generation Sequencing & Cancer Research; Lit. Survey; Ensembl Method.
  • 19. NEXT GENERATION SEQUENCING (figure: sequence with reads; ∗ marks highlighted bases): C G C G T C G A G C T A G C A G C G T∗ C G A G∗ C T A G∗ A G C G C G T C G A G C T A G C A C A
  • 20. NGS: WHY? Molecular approach: study of variations at the 'base' level. Low cost: the $1000 genome. Faster: quicker than traditional sequencing techniques.
  • 23. NGS: WHERE? Study variations and genotype-phenotype association. Look for 'markers of diseases'. Prognosis.
  • 26. NGS: MUTATIONS. 3 × 10^9 base pairs. We are all 99.9% similar at the DNA level. More than 2 million SNPs. No particular pattern of SNPs. If a mutation causes a change in an amino acid, it is referred to as non-synonymous (nsSNV).
  • 27. DRIVERS AND PASSENGERS I. Cancer is known to arise due to mutations. Not all mutations are equally important! Somatic mutations: the set of mutations acquired after zygote formation, over and above the germline mutations. Driver mutations: mutations that confer a growth advantage on the cell and are positively selected in the tumor tissue.
  • 28. DRIVERS AND PASSENGERS. Drivers are NOT simply loss-of-function mutations, but more than that. Loss of function: inactivates tumor suppressor proteins. Gain of function: activates normal genes, transforming them into oncogenes. Drug resistance mutations: mutations that have evolved to overcome the inhibitory effect of drugs.
  • 31. DRIVER MUTATIONS: WHY? Identifying driver mutations → better therapeutic targets. But how does one zero in on the exact set? → Experiments are too costly, probably infeasible for 2 million+ SNPs → leverage computational analysis. The low cost of NGS comes with a heavier roadblock of data analysis. Searching among 2 million+ SNPs is a non-trivial and computationally intensive problem. Software tools have low consensus among themselves ↔ defining a driver computationally is non-trivial. However, there is no tool that allows one to visualise the results for an input across the cohort of tools.
  • 35. MACHINE LEARNING I. Two datasets. Training: a labelled dataset containing a table of features, with mutations labelled as "driver" or "passenger". Test: after 'learning' from the training dataset, test the prediction model. Table: Training Dataset
      Chromosome  Position  Ref  Alt  Type
      1           27822     A    G    Driver
      1           27832     T    G    Driver
      2           47842     G    C    Passenger
      ...         ...       ...  ...  ...
  • 36. MACHINE LEARNING II. Table: Test Dataset
      Chromosome  Position  Ref  Alt  Type
      1           27824     A    G    ?
      1           47832     T    G    ?
  • 37. MACHINE LEARNING: FEATURE SELECTION I. Machine learning relies on a set of features for training. Redundant features should be avoided. CHASM [1] makes use of mutual information for feature selection. Let p(X_i) represent the probability of occurrence of an event X_i. Considering a series of events X_1, X_2, X_3, ..., X_n, analogous to a 'series of packets' in communication theory, the information received at each step can be quantified on a log scale by:
      $\log_2 \frac{1}{p(X_i)} = -\log_2 p(X_i)$  (1)
    The expected value of information from a series of events is called the Shannon entropy H(X):
      $H(X) = -\sum_i p(X_i) \log_2 p(X_i)$  (2)
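Equation (2) can be checked numerically with a minimal Python sketch. It assumes that p(X_i) is estimated from observed frequencies; the function name `shannon_entropy` and the example labels are illustrative and do not come from the slides.

```python
import math
from collections import Counter


def shannon_entropy(events):
    """Return H(X) in bits, estimating p(X_i) from observed frequencies."""
    counts = Counter(events)
    total = sum(counts.values())
    # H(X) = -sum_i p(X_i) * log2 p(X_i), as in equation (2)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical labels: an even driver/passenger split carries one bit per label.
balanced = ["Driver", "Driver", "Passenger", "Passenger"]
constant = ["Driver", "Driver", "Driver"]
print(shannon_entropy(balanced))
print(shannon_entropy(constant))
```

A label that never varies has zero entropy, which is why such a column carries no information for a classifier.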
  • 38. MACHINE LEARNING: FEATURE SELECTION II. The mutual information between two random variables X and Y is defined as the amount of information gained about X from knowledge of the second variable, Y:
      $I(X, Y) = H(X) - H(X \mid Y)$  (3)
    Here X is the class label [Driver/Passenger] and Y is a predictive feature, hence I(X, Y) represents how much information is gained about the class label X from knowledge of the feature Y. Simplifying:
      $I(X, Y) = \sum_{x} \sum_{y} p(x, y) \log_2 \frac{p(x, y)}{p(x)\,p(y)}$  (4)
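A small Python sketch of equation (4) follows, assuming discrete feature values and probabilities estimated from paired observations. The function name and the toy feature vectors are hypothetical, and this is not CHASM's implementation.

```python
import math
from collections import Counter


def mutual_information(labels, feature):
    """I(X, Y) in bits from paired observations, using empirical frequencies."""
    n = len(labels)
    joint = Counter(zip(labels, feature))  # counts for p(x, y)
    px = Counter(labels)                   # counts for p(x)
    py = Counter(feature)                  # counts for p(y)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical data: X is the class label, Y a binary predictive feature.
labels = ["Driver", "Driver", "Passenger", "Passenger"]
informative = [1, 1, 0, 0]    # separates the two classes perfectly
uninformative = [1, 0, 1, 0]  # independent of the class label
print(mutual_information(labels, informative))
print(mutual_information(labels, uninformative))
```

Ranking candidate features by this score is one way to find features that add nothing about the class label. Redundancy between two features can be checked the same way by passing both feature columns to the function.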
  • 39. FUNCTIONAL IMPACT I. If a mutation confers an advantage on the cell in terms of replication rate, it is likely to be selected, while mutations that reduce fitness have a higher chance of being eliminated from the population. Certain residues in a multiple sequence alignment (MSA) of homologous sequences are more conserved than others. Mutating a highly conserved residue is likely to be costly, since what had 'evolved' is disturbed! Scores can be assigned based on this "conservation" parameter.
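One generic way to turn "conservation" into a number is an entropy-based score per alignment column, sketched below in Python. This is an illustrative measure, not the scoring used by SIFT or any other tool named in the slides. The toy alignment, the function name and the 20-letter alphabet normalisation are assumptions, and gaps are not handled.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=20):
    """Score one alignment column in [0, 1]; 1.0 means fully conserved."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    # Normalise by the maximum entropy of a 20-letter amino acid alphabet.
    return 1.0 - entropy / math.log2(alphabet_size)


# Hypothetical toy MSA: four homologous sequences, already aligned.
msa = ["MKLV", "MKIV", "MRLV", "MKAV"]
scores = [column_conservation(col) for col in zip(*msa)]
print(scores)
```

Under this measure, a mutation at the first or last position of the toy alignment would be flagged as more likely to be damaging than one at the variable third position.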
  • 40. FUNCTIONAL IMPACT II. Figure: SIFT algorithm
  • 41. Some of the common tools/algorithms used for driver mutation prediction: SIFT, Polyphen, Mutation Assessor, TransFIC, Condel.
  • 42. FRAMEWORK FOR COMPARING VARIOUS TOOLS I. Different tools use different formats and give different outputs for similar input. Running an analysis on multiple tools → keep shifting data formats. Concordance? Polyphen2 input:
      chr1:888659 T/C
      chr1:1120431 G/A
      chr1:1387764 G/A
      chr1:1421991 G/A
      chr1:1599812 C/T
      chr1:1888193 C/A
      chr1:1900186 T/C
  • 43. FRAMEWORK FOR COMPARING VARIOUS TOOLS II. SIFT input:
      1,888659,T,C
      1,1120431,G,A
      1,1387764,G,A
      1,1421991,G,A
      1,1599812,C,T
      1,1888193,C,A
      1,1900186,T,C
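The two input styles shown on these slides differ only in punctuation, so the conversion can be sketched in a few lines of Python. The function names are hypothetical, and the sketch covers only the two forms shown above. The real tools may accept or require additional fields.

```python
def polyphen_to_sift(line):
    """Convert 'chr1:888659 T/C' to '1,888659,T,C'."""
    location, alleles = line.split()
    chrom, pos = location.split(":")
    ref, alt = alleles.split("/")
    if chrom.startswith("chr"):
        chrom = chrom[3:]  # drop the 'chr' prefix used in the first format
    return ",".join([chrom, pos, ref, alt])


def sift_to_polyphen(line):
    """Convert '1,888659,T,C' back to 'chr1:888659 T/C'."""
    chrom, pos, ref, alt = line.strip().split(",")
    return f"chr{chrom}:{pos} {ref}/{alt}"


variants = ["chr1:888659 T/C", "chr1:1120431 G/A"]
for v in variants:
    print(polyphen_to_sift(v))
```

Wrapping such converters as small tools is what lets a single variant list be fed to several predictors without manual reformatting.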
  • 44. DRIVER MUTATIONS: TOOLS DON'T AGREE. Figure: X axis: Condel score; Y axis: MA score.
  • 45. Solution? Galaxy, an open-source web-based platform for bioinformatics, makes it possible to represent the entire data analysis pipeline in an intuitive graphical interface. Figure: Galaxy workflow for the polyphen2 algorithm
  • 46. Run all tools in one go. Figure: Run all tools
  • 47. Compare all tools. Figure: Compare all tools
  • 48. VIRAL GENOME DETECTION. Cervical cancers have been proven to be associated with Human Papillomavirus (HPV). Cervical cancer datasets from Indian women were put through an analysis to detect: 1. any possible HPV integration; 2. the sites of HPV integration. Who cares? Whole genome sequencing could be replaced by targeted sequencing at the sites where the virus has been detected in a cohort of samples, thus speeding up the whole process.
  • 49. Workflow: Raw data → Align data to human genome reference → Extract unmapped regions → Align unmapped regions to virus genome → Extract mapped regions → BLAST
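The workflow steps can be sketched as a list of shell commands assembled in Python. The slides do not state which commands were run, so the choice of `bwa mem`, `samtools` and `blastn` here is an assumption, as are single-end reads, the `nt` database name and all file names. The function only builds the command strings and does not execute anything.

```python
def build_commands(reads, human_ref, virus_ref, prefix="sample"):
    """Return one shell command per workflow step, in order, without running them.

    Assumes both references were indexed beforehand with `bwa index`.
    """
    unmapped_bam = f"{prefix}.unmapped.bam"
    unmapped_fq = f"{prefix}.unmapped.fq"
    viral_bam = f"{prefix}.viral.bam"
    viral_fa = f"{prefix}.viral.fa"
    return [
        # Align to the human reference and keep only unmapped reads (flag 4 set).
        f"bwa mem {human_ref} {reads} | samtools view -b -f 4 -o {unmapped_bam} -",
        # Convert the unmapped reads back to FASTQ.
        f"samtools fastq {unmapped_bam} > {unmapped_fq}",
        # Align the unmapped reads to the virus genome and keep mapped reads (flag 4 unset).
        f"bwa mem {virus_ref} {unmapped_fq} | samtools view -b -F 4 -o {viral_bam} -",
        # Export the virus-mapped reads as FASTA for BLAST.
        f"samtools fasta {viral_bam} > {viral_fa}",
        # BLAST the virus-mapped reads; tabular output (format 6).
        f"blastn -query {viral_fa} -db nt -outfmt 6 -out {prefix}.blast.tsv",
    ]


# Hypothetical file names, for illustration only.
for cmd in build_commands("reads.fq", "human.fa", "hpv.fa"):
    print(cmd)
```

Keeping the steps as data rather than running them inline makes it easy to log exactly what was executed, which matters for reproducibility.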
  • 56. GALAXY WORKFLOW (figure)
  • 57. Figure: Aligned HPV genomes
  • 58. REPRODUCIBILITY. In pursuit of novel 'discovery', standardizing the data analysis pipeline is often ignored, leading to dubious conclusions. Analysis should be reproducible and, above all, correct. Parameter values can change the results by a big factor, so they need to be documented/logged. Garbage in, garbage out.
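A minimal Python sketch of parameter logging: each run is recorded with its tool name, parameters and a digest of both, so that two runs can be compared at a glance. The function name, record layout and example parameter names are hypothetical. Galaxy keeps its own record of tool parameters, and this sketch does not reproduce that mechanism.

```python
import hashlib
import json


def record_run(tool, params):
    """Return a log record whose digest depends only on the tool and parameter values."""
    # sort_keys makes the serialisation independent of dictionary ordering.
    canonical = json.dumps({"tool": tool, "params": params}, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {"tool": tool, "params": params, "sha256": digest}


# Hypothetical parameter names, for illustration only.
run_a = record_run("aligner", {"seed_length": 19, "mismatch_penalty": 4})
run_b = record_run("aligner", {"mismatch_penalty": 4, "seed_length": 19})
run_c = record_run("aligner", {"seed_length": 25, "mismatch_penalty": 4})
print(run_a["sha256"] == run_b["sha256"])  # same settings, same digest
print(run_a["sha256"] == run_c["sha256"])  # one changed value, different digest
print(json.dumps(run_a, indent=2))
```

Storing such a record next to every output file is a cheap way to make sure a result can later be traced back to the exact settings that produced it.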
  • 59. CONCLUSIONS. With the Galaxy toolbox for identification of significant mutations and the study of the science behind the methods in place, the next steps would be to: Open-source the toolbox to the community: a tool makes little sense if it is not in a usable form, and community feedback will be used to add more tools and improve the existing ones. Develop a new method for driver mutation prediction: all the existing methods have a low level of concordance. A new method that takes into account the available data at all levels (mutations, transcriptome and microarray data) is possible. With the Galaxy toolbox in place, it would be possible to integrate information at various levels.
  • 60. FUTURE WORK. Develop an algorithm that integrates the machine learning approach with the functional approach by zeroing in on only those attributes that are known to have an impact. The algorithm would also account for information at other levels: RNA expression and clinical data. Integrating information at all levels would provide deeper insight. The developed Galaxy toolbox will be used as the basic framework for integrating information.
  • 61. REFERENCES I. [1] Hannah Carter, Sining Chen, Leyla Isik, Svitlana Tyekucheva, Victor E Velculescu, Kenneth W Kinzler, Bert Vogelstein, and Rachel Karchin. Cancer-specific high-throughput annotation of somatic mutations: computational prediction of driver missense mutations. Cancer Research, 69(16):6660–6667, 2009.