Robust Pathway-based Multi-Omics Data Integration
using Directed Random Walk for Survival Prediction
in Multiple Cancer Studies
So Yeon Kim, Hyun-Hwan Jeong, Jaesik Kim,
Jeong-Hyeon Moon, Kyung-Ah Sohn
17TH ANNUAL INTERNATIONAL CONFERENCE ON CRITICAL
ASSESSMENT OF MASSIVE DATA ANALYSIS (CAMDA 2018)
CANCER DATA INTEGRATION CHALLENGE
CAMDA 2018 1
Outline
• Introduction
− Motivation
− Related works
• Methods
• Results
• Conclusion
Introduction
Motivation (1/3)
• The rich information in multi-omics
data provides opportunities for
better biological understanding
and improved clinical outcome
prediction
• Integrative analysis is
important to discover
interrelationships between
multiple different levels of data
Weinstein et al. Nature genetics, 2013
Motivation (2/3)
• Graph-based integration methods are effective at combining
multi-omics data to consider the interactions between different
types of genomic data
Kim et al. Journal of the American Medical Informatics Association, 2014
Motivation (3/3)
• Incorporating genomic
knowledge such as pathway
information on the
integrated graph can be
useful to increase prediction
power and find important
genes and pathways in
cancers
Liu et al. Bioinformatics, 2013
Related works (1/7)
• Pathway-based integrative methods
− These methods simply transform a single genomic profile into a pathway profile
using an activity scoring measure
− Pathway level analysis of gene expression (PLAGE) uses a singular
vector from the singular value decomposition of a given gene set
Tomfohr et al. Bioinformatics, 2005
Related works (2/7)
• Pathway-based integrative methods
− The z-score method converts a gene expression
profile into z-scores and combines the
z-scores of the genes in each pathway per
sample
− These methods treat a pathway as a plain set of genes
− It is better to also consider gene-gene interactions
Lee et al. PLoS Comput Biol, 2008
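The z-score approach described above can be sketched as follows. This is only an illustration: the slides do not say how the z-scores are combined, so the `sum / sqrt(n)` combination, the function names, and the toy values are assumptions.

```python
import math
from statistics import mean, pstdev

def gene_z_scores(expr):
    """Standardize each gene across samples: z = (x - mean) / sd."""
    z = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        z[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return z

def pathway_z_activity(z, pathway_genes):
    """Combine member-gene z-scores per sample as sum / sqrt(n) (assumed rule)."""
    members = [g for g in pathway_genes if g in z]
    if not members:
        raise ValueError("no pathway gene is present in the expression data")
    n_samples = len(z[members[0]])
    scale = math.sqrt(len(members))
    return [sum(z[g][s] for g in members) / scale for s in range(n_samples)]

# Toy data: 3 genes x 4 samples (hypothetical values).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 2.0, 4.0, 4.0],
    "G3": [5.0, 1.0, 5.0, 1.0],
}
z = gene_z_scores(expr)
activity = pathway_z_activity(z, ["G1", "G2"])  # one activity value per sample
```

Because the pathway is only a gene list here, no interaction between `G1` and `G2` enters the score, which is the limitation the slide points out.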
Related works (3/7)
• Some methods utilized gene-gene
interactions on a graph
− A denoising algorithm based on
relevance network topology (DART)
integrates pathways by deriving
perturbation signatures which reflect
gene contributions in each pathway
Jiao et al. Bioinformatics, 2011
Related works (4/7)
• Some methods utilized gene-gene
interactions on a graph
− A directed random walk-based pathway
activity inference method (DRW)
identifies topologically important genes
and pathways by weighting the genes in
the gene-gene network
Liu et al. Bioinformatics, 2013
Related works (5/7)
• Some methods utilized gene-gene
interactions on a graph
− Integrated extension on multi-omics
data (DRW-GM)
− Improved prediction performance
− Found many risk metabolite pathways
and topologically important genes for
cancer by a joint analysis of gene
expression and metabolite data
Liu, et al. Scientific reports, 2015
Related works (6/7)
• Integrative DRW (iDRW) incorporates the interactions between gene
expression and methylation features by building on DRW-based methods
Kim et al. BMC Medical Genomics, 2018 (to appear)
Related works (7/7)
• iDRW improved survival prediction power and jointly analyzed gene
expression and methylation data on an integrated gene-gene graph
Kim et al. BMC Medical Genomics, 2018 (to appear)
Overview
• Investigate the effectiveness of the iDRW method on other types of
genomic profiles for two different cancers
• Reflect the interactions between gene expression and copy
number data on an integrated graph
• Construct the graph with the updated pathway database
• Classify survival groups for breast cancer and neuroblastoma
patient samples
Methods
Integrated gene-gene graph construction (1/2)
• 327 human pathways and the
corresponding gene sets from the KEGG PATHWAY
database
• Interactions between genes were
defined using the R KEGGgraph package
• Integrated directed gene-gene graph
− 7,390 nodes and 58,426 edges
[Figure: genes A and B in the directed gene-gene graph built from the KEGG PATHWAY Database]
Integrated gene-gene graph construction (2/2)
• To reflect the impact of copy number variation on gene expression,
we assign directional edges to all the overlapping genes
[Figure: directed edges between the overlapping genes of the copy number alteration layer and the gene expression layer]
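A minimal sketch of how such an integrated graph could be assembled is shown below. It assumes that each data type keeps its own copy of the pathway edges and that the cross-layer edge for an overlapping gene points from copy number to expression; the slides do not spell out either detail, and the function name and layer labels are hypothetical.

```python
def build_integrated_graph(pathway_edges, expr_genes, cna_genes):
    """Return a set of directed edges over (layer, gene) nodes.

    pathway_edges: iterable of (source_gene, target_gene) pairs.
    Each layer keeps the pathway edges among its own measured genes, and
    every gene measured in both layers gets an edge CNA -> EXP
    (assumed direction: copy number affects expression).
    """
    edges = set()
    for layer, genes in (("EXP", expr_genes), ("CNA", cna_genes)):
        for src, dst in pathway_edges:
            if src in genes and dst in genes:
                edges.add(((layer, src), (layer, dst)))
    for gene in expr_genes & cna_genes:
        edges.add((("CNA", gene), ("EXP", gene)))
    return edges

# Toy example with hypothetical gene names.
pathway_edges = [("A", "B"), ("B", "C")]
graph = build_integrated_graph(pathway_edges, {"A", "B", "C"}, {"B", "C", "D"})
```

In this toy case the graph has two expression edges, one copy number edge, and two cross-layer edges for the overlapping genes `B` and `C`.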
Pathway activity inference
• The weight of the gene $w_g$ is
the p-value from
− DESeq2 analysis (RNA-Seq)
− Two-tailed t-test (Microarray)
− $\chi^2$-test of independence (Copy number data)
[Figure: gene expression and CNA matrices (genes × samples) with entries $z(g_i)$]
• Weight initialization: $W_0 = -\log(w_g + \epsilon)$
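The weight initialization can be illustrated with a few lines of code. The natural logarithm and the value of the small constant `eps` are assumptions; the slides give neither.

```python
import math

def initial_weights(p_values, eps=1e-10):
    """W_0(g) = -log(p_g + eps): smaller p-values give larger initial weights.

    eps keeps the logarithm finite when a p-value is exactly 0
    (its value here is a placeholder, not taken from the slides).
    """
    return {gene: -math.log(p + eps) for gene, p in p_values.items()}

# Hypothetical p-values for three genes.
w0 = initial_weights({"g1": 0.001, "g2": 0.5, "g3": 0.0})
```

A strongly differential gene such as `g1` starts the walk with a much larger weight than `g2`, and `g3` stays finite thanks to `eps`.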
[Figure: Integrative Directed Random Walk (iDRW): a random walker moves over the global directed gene-gene graph, which contains gene expression nodes, copy number alteration nodes, and a ground node]
• Random walk update, starting from $W_0$ and iterated until it converges to $W_\infty$:
$W_{t+1} = (1 - r) M^T W_t + r W_0$
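The update rule $W_{t+1} = (1 - r) M^T W_t + r W_0$ can be sketched as a simple fixed-point iteration. The sketch assumes that $M$ is the row-normalized adjacency matrix of the directed graph, that $r$ is a restart probability, and that $W_0$ is normalized to sum to 1; the default `r=0.7`, the tolerance, and the function name are placeholders rather than values from the slides.

```python
def directed_random_walk(edges, w0, r=0.7, tol=1e-10, max_iter=1000):
    """Iterate W_{t+1} = (1 - r) * M^T W_t + r * W_0 until the L1 change < tol.

    edges: iterable of (source, target) pairs; every node must be a key of w0.
    w0: dict node -> non-negative initial weight (normalized to sum to 1 here).
    M^T W_t moves each node's weight along its outgoing edges in equal shares.
    """
    nodes = list(w0)
    out = {n: [] for n in nodes}
    for src, dst in edges:
        out[src].append(dst)
    total = sum(w0.values())
    start = {n: w0[n] / total for n in nodes}
    w = dict(start)
    for _ in range(max_iter):
        nxt = {n: r * start[n] for n in nodes}  # restart term r * W_0
        for src in nodes:
            if out[src]:
                share = (1 - r) * w[src] / len(out[src])
                for dst in out[src]:
                    nxt[dst] += share
        delta = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if delta < tol:
            break
    return w

# Toy chain a -> b -> c with equal initial weights.
w_inf = directed_random_walk([("a", "b"), ("b", "c")], {"a": 1.0, "b": 1.0, "c": 1.0})
```

In the toy chain, weight accumulates downstream, so `c` ends with the largest converged weight even though all genes start equal.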
[Figure: pathway profile (pathways × samples) with entries $a(P_j)$, inferred from the converged weights $W_\infty$ of the iDRW on the global directed pathway graph]
• Pathway activity of pathway $P_j = \{g_1, g_2, \ldots, g_{n_j}\}$ with $n_j$ differential genes:
$a(P_j) = \frac{\sum_{i=1}^{n_j} W_\infty(g_i) \cdot score(g_i) \cdot z(g_i)}{\sqrt{\sum_{i=1}^{n_j} (W_\infty(g_i))^2}}$
• Score of gene $score(g_i)$ is
− $\log_2$ fold change from DESeq2 analysis (RNA-Seq)
− $sign(tscore(g_i))$ (Microarray)
− $mean(CNA(g_i)_{poor}) - mean(CNA(g_i)_{good})$ (Copy number data)
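The pathway activity for one sample can be sketched as a weighted sum of the member genes' values: each gene contributes its converged weight times its score times its z-value. The sketch assumes the sum is normalized by the square root of the summed squared weights; the function name, the dictionary layout, and the numbers are hypothetical.

```python
import math

def pathway_activity(genes, w_inf, score, z, sample):
    """a(P_j) for one sample: sum of W_inf * score * z over the pathway's
    differential genes, divided by sqrt(sum of W_inf^2) (assumed normalization)."""
    numerator = sum(w_inf[g] * score[g] * z[g][sample] for g in genes)
    denominator = math.sqrt(sum(w_inf[g] ** 2 for g in genes))
    return numerator / denominator if denominator > 0 else 0.0

# Toy pathway with two differential genes and two samples.
genes = ["g1", "g2"]
w_inf = {"g1": 3.0, "g2": 4.0}
score = {"g1": 1.0, "g2": -1.0}          # e.g. sign of a t-score
z = {"g1": [2.0, 0.0], "g2": [1.0, 1.0]}  # standardized values per sample
a0 = pathway_activity(genes, w_inf, score, z, sample=0)
a1 = pathway_activity(genes, w_inf, score, z, sample=1)
```

For sample 0 the numerator is 3·1·2 + 4·(−1)·1 = 2 and the denominator is 5, giving 0.4; for sample 1 the result is −0.8.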
Pathway feature selection and survival prediction
• Feature ranking strategy
− Pathways are ranked by the p-values from a t-test of their pathway
activities
− The activities of the top-k pathways across samples are
the input to the
classification model
[Figure: pathway profile (pathways × samples, entries $a(P_j)$) ranked by t-test p-value; the top k pathways are selected as features]
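The ranking step can be sketched without a statistics library. The sketch uses a pooled-variance two-sample t statistic and ranks by its absolute value; because every pathway is tested on the same samples, the degrees of freedom are identical, so this order matches the order by two-sided p-value. The choice of the pooled test, the function names, and the toy values are assumptions.

```python
import math
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample Student t statistic with pooled variance."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

def top_k_pathways(profile, labels, k):
    """profile: pathway -> activities per sample; labels: 'good'/'poor' per sample.

    Returns the k pathways with the largest |t| (smallest p-value)."""
    ranked = []
    for pathway, acts in profile.items():
        good = [x for x, y in zip(acts, labels) if y == "good"]
        poor = [x for x, y in zip(acts, labels) if y == "poor"]
        ranked.append((abs(pooled_t(good, poor)), pathway))
    ranked.sort(reverse=True)
    return [pathway for _, pathway in ranked[:k]]

# Toy pathway profile with hypothetical activities for six samples.
labels = ["good", "good", "good", "poor", "poor", "poor"]
profile = {
    "P1": [1.0, 1.1, 0.9, 3.0, 3.1, 2.9],
    "P2": [1.0, 2.0, 3.0, 1.1, 2.1, 2.9],
    "P3": [0.0, 0.5, 1.0, 1.0, 1.5, 2.0],
}
selected = top_k_pathways(profile, labels, k=2)
```

Here `P1` separates the groups most clearly and `P2` barely at all, so the top two features are `P1` and `P3`.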
Pathway feature selection and survival prediction
• Survival prediction
− A logistic regression model
classifies the samples into
good and poor survival groups
− The number k of top
pathway features is selected empirically
as the value that gives the best
classification performance
[Figure: ranked pathway profile (for example pathways 00410 and 00060) used for risk-active pathway identification and survival prediction]
Results
Challenge Dataset (1/2)
• Breast cancer patient data
from the METABRIC dataset
− mRNA expression profile of 24,368 genes
from the Illumina Human v3
microarray (log intensity levels)
− Putative copy-number alteration data
for 22,544 genes
− 1,648 patient samples divided
into 908 good (> 10 years) and 740
poor (≤ 10 years) samples
− Average survival: 10 years; average age at diagnosis: 62 years
Performance evaluation (1/2)
• Survival prediction: 1,648 patients, 908 in the good group (long-term survival) and 740 in the poor group (short-term survival)
• Confusion matrix
              Predicted good   Predicted poor
Actual good   TP               FN
Actual poor   FP               TN
• Classification accuracy: $Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$
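The accuracy definition translates directly into code. The sketch below treats the good group as the positive class, as in the confusion matrix; the function names and the toy labels are illustrative.

```python
def confusion_counts(actual, predicted, positive="good"):
    """Count TP, FN, FP, TN with the 'good' group as the positive class."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive:
            if p == positive:
                tp += 1
            else:
                fn += 1
        else:
            if p == positive:
                fp += 1
            else:
                tn += 1
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    """Accuracy = (TP + TN) / (TP + FN + FP + TN)."""
    return (tp + tn) / (tp + fn + fp + tn)

actual = ["good", "good", "poor", "poor", "good"]
predicted = ["good", "poor", "poor", "good", "good"]
counts = confusion_counts(actual, predicted)
acc = accuracy(*counts)

# Reference point: always predicting "good" on 908 good / 740 poor samples.
baseline = accuracy(908, 0, 740, 0)
```

With 908 good and 740 poor samples, a classifier that always predicts "good" already reaches about 0.551 accuracy, which is a useful reference when reading the reported results.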
• 5-fold cross-validation: the samples are split into folds 1 to 5; each fold is used once as the validation set and the remaining folds as the training set
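The 5-fold split can be sketched as follows. Shuffling with a fixed seed and the lack of stratification by survival group are assumptions; the slides do not say how the folds were drawn.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# 1,648 breast cancer samples split into 5 folds.
splits = list(k_fold_indices(1648, k=5))
```

Each of the five validation folds holds 329 or 330 samples, and the accuracy would be averaged over the five rounds.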
Challenge Dataset (2/2)
• Neuroblastoma dataset from
NCBI GSE49711
− RNA sequencing gene expression
profile of 60,586 genes
− DNA copy number
data for 22,692 genes
− 144 patient samples divided into
38 good and 105 poor samples
(binary class label for overall survival
days provided by the NCBI dataset)
− Average survival: < 1 year; average age at diagnosis: 16 months
[Figure: chart showing the values 88 and 56]
Performance evaluation (1/2)
• Survival prediction: 144 patients, 38 in the good group (long-term survival) and 105 in the poor group (short-term survival)
• Confusion matrix
              Predicted good   Predicted poor
Actual good   TP               FN
Actual poor   FP               TN
• Classification accuracy: $Accuracy = \frac{TP + TN}{TP + FN + FP + TN}$
• Leave-one-out cross-validation: with N samples there are N folds (fold 1, fold 2, ..., fold N); each sample is used once as the validation set and the remaining samples as the training set
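Leave-one-out cross-validation can be sketched as a loop that holds out one sample at a time. To keep the example dependency-free, a nearest-class-mean rule on a single feature stands in for the classifier; it is not the logistic regression model used in the slides, and all names and values are hypothetical.

```python
def leave_one_out_accuracy(features, labels, fit, predict):
    """Hold out each sample once, train on the rest, score the held-out prediction."""
    correct = 0
    for i in range(len(labels)):
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        correct += predict(model, features[i]) == labels[i]
    return correct / len(labels)

# Stand-in classifier (NOT logistic regression): nearest class mean on one feature.
def fit_nearest_mean(xs, ys):
    means = {}
    for label in set(ys):
        values = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(values) / len(values)
    return means

def predict_nearest_mean(means, x):
    return min(means, key=lambda label: abs(x - means[label]))

# Toy pathway activity for six samples.
features = [0.1, 0.2, 0.3, 1.1, 1.2, 1.3]
labels = ["good", "good", "good", "poor", "poor", "poor"]
loo_acc = leave_one_out_accuracy(features, labels, fit_nearest_mean, predict_nearest_mean)
```

Leave-one-out suits small cohorts such as the neuroblastoma set because almost all samples remain available for training in every round.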
Pathway-based methods
• For the gene expression data in each dataset, four pathway-based
methods were compared
− PLAGE [Tomfohr et al. Bioinformatics, 2005]
− Z-score [Lee et al. PLoS Comput Biol, 2008]
− DART [Jiao et al. Bioinformatics, 2011]
− DRW [Liu et al. Bioinformatics, 2013]
• Classification performance was evaluated in the same way as for the
proposed method
Integrative analysis on multi-omics data improves
survival prediction performance (1/2)
• Four pathway-based
methods on a single
gene expression profile
• iDRW method on the
gene expression profile
and copy number data in
breast cancer (A) or in
neuroblastoma patients
(B)
[Figure: classification performance, panels (A) Breast cancer and (B) Neuroblastoma]
Integrative analysis on multi-omics data improves
survival prediction performance (2/2)
• Performance improved
when the interactions
between genes on a graph were utilized
• In particular, DRW-based
methods contributed more
to the
performance improvement
• iDRW performed the best
on both cancer datasets
[Figure: classification performance, panels Breast cancer and Neuroblastoma]
iDRW identifies cancer-associated pathways and genes (1/5)
Dataset: Breast cancer (k = 25)
Pathway ID | Pathway name | Total genes | EXP | CNA
hsa04740 | Olfactory transduction | 419 | 54 | 268
hsa04014 | Ras signaling pathway | 232 | 68 | 164
hsa04015 | Rap1 signaling pathway | 206 | 64 | 142
hsa04916 | Melanogenesis | 101 | 37 | 73
hsa04722 | Neurotrophin signaling pathway | 119 | 38 | 84
hsa05200 | Pathways in cancer | 526 | 166 | 359
hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 99 | 37 | 67
hsa04530 | Tight junction | 170 | 53 | 107
hsa04510 | Focal adhesion | 199 | 76 | 125
hsa04080 | Neuroactive ligand-receptor interaction | 278 | 64 | 193
hsa05225 | Hepatocellular carcinoma | 168 | 56 | 112
hsa04020 | Calcium signaling pathway | 182 | 59 | 136
hsa04024 | cAMP signaling pathway | 198 | 58 | 139
Top-k pathways ranked by the iDRW method in breast cancer. For each pathway, the total number of genes and the number of significant
genes with p-value ($w_g$) < 0.05 from gene expression (EXP) or copy number data (CNA) are shown.
iDRW identifies cancer-associated pathways and genes (2/5)
Dataset: Breast cancer (k = 25), continued
Pathway ID | Pathway name | Total genes | EXP | CNA
hsa04217 | Necroptosis | 164 | 49 | 97
hsa04060 | Cytokine-cytokine receptor interaction | 270 | 70 | 192
hsa05152 | Tuberculosis | 179 | 58 | 112
hsa05165 | Human papillomavirus infection | 319 | 103 | 210
hsa04810 | Regulation of actin cytoskeleton | 208 | 64 | 132
hsa04151 | PI3K-Akt signaling pathway | 352 | 119 | 241
hsa04022 | cGMP-PKG signaling pathway | 163 | 58 | 109
hsa04630 | Jak-STAT signaling pathway | 162 | 43 | 112
hsa05167 | Kaposi's sarcoma-associated herpesvirus infection | 186 | 61 | 114
hsa04010 | MAPK signaling pathway | 295 | 87 | 209
hsa04371 | Apelin signaling pathway | 137 | 46 | 99
hsa04390 | Hippo signaling pathway | 154 | 58 | 100
Top-k pathways ranked by the iDRW method in breast cancer. For each pathway, the total number of genes and the number of significant
genes with p-value ($w_g$) < 0.05 from gene expression (EXP) or copy number data (CNA) are shown.
iDRW identifies cancer-associated pathways and genes (3/5)
Hanahan et al. Cell, 2011
• Six biological capabilities are acquired during tumor development
• Some of the top-ranked pathways (Ras signaling, Necroptosis, Regulation of actin cytoskeleton, and
PI3K-Akt signaling pathway) are related to at least one of these six capabilities
“…overexpression of 34 Olfactory Receptors genes has been reported
in patients bearing breast tumors caused by CHEK2 1100delC mutation…”
iDRW identifies cancer-associated pathways and genes (4/5)
Dataset: Neuroblastoma (k = 5)
Pathway ID | Pathway name | Total genes | EXP | CNA
hsa04976 | Bile secretion | 71 | 13 | 5
hsa05034 | Alcoholism | 180 | 22 | 7
hsa01100 | Metabolic pathways | 1273 | 43 | 93
hsa04080 | Neuroactive ligand-receptor interaction | 278 | 21 | 24
hsa04151 | PI3K-Akt signaling pathway | 352 | 19 | 31
Top-k pathways ranked by the iDRW method in neuroblastoma data. For each pathway, the total number of genes and the number of
significant genes with p-value ($w_g$) < 0.05 from gene expression (EXP) or copy number data (CNA) are shown.
iDRW identifies cancer-associated pathways and genes (5/5)
“… we propose a mechanism underlying a potent and
selective anti-tumor effect of LCA in cultured human neuroblastoma cells …”
“…the level of Urinary catecholamine metabolites which consist of vanillylmandelic
acid (VMA), homovanillic acid (HVA) and dopamine elevated in neuroblastoma
patients…”
Conclusions
• We showed the effectiveness of an integrative directed random
walk-based method utilizing pathway information (iDRW) on
different cancer datasets
• We benchmarked iDRW against several state-of-the-art pathway-based
methods for survival prediction
Conclusions
• Contributions
− Revamped the directed gene-gene graph to consider the interactions
between gene expression and copy number data
− Jointly identified cancer-related pathways and genes from gene
expression and copy number data for the breast cancer and
neuroblastoma datasets
Acknowledgements
All lab members of LAMDA lab
Kyung-Ah Sohn
Byungkon Kang
Yenewondim Biadgie
Garam Lee
Habtamu Minassie Aycheh
Sehee Wang
Jungryul Seo
Nam-Hyuk Ahn
Min-Soo Kim
Tae-rim Kim
Young-Bum Choi
Jun-hyung Yu
Jeong-hyun Moon
Jaesik Kim
Sijin Kim
Heejin Kim
Joon-seon Hwang
Hyun-Hwan Jeong, Ph.D.
Post-doctoral associate
Baylor College of Medicine
Texas Children’s Hospital
Kyung-Ah Sohn, Ph.D.
Associate Professor
Department of Software and Computer Engineering,
Ajou University
Jaesik Kim
Graduate student, Masters course
Department of Software and Computer Engineering,
Ajou University
Jeong-Hyeon Moon
Graduate student, Masters course
Department of Software and Computer Engineering,
Ajou University
Thank you !
Q & A
BigBuy dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 

Robust Pathway-based Multi-Omics Data Integration using Directed Random Walk for Survival Prediction in Multiple Cancer Studies

  • 1. Robust Pathway-based Multi-Omics Data Integration using Directed Random Walk for Survival Prediction in Multiple Cancer Studies. So Yeon Kim, Hyun-Hwan Jeong, Jaesik Kim, Jeong-Hyeon Moon, Kyung-Ah Sohn. 17th Annual International Conference on Critical Assessment of Massive Data Analysis (CAMDA 2018), Cancer Data Integration Challenge.
  • 2. Outline
      • Introduction
          − Motivation
          − Related works
      • Methods
      • Results
      • Conclusion
  • 4. Motivation (1/3)
      • The rich information in multi-omics data provides opportunities for better biological understanding and improved clinical outcome prediction
      • Integrative analysis is important for discovering interrelationships between multiple levels of data
      [Weinstein et al. Nature Genetics, 2013]
  • 5. Motivation (2/3)
      • Graph-based integration methods are effective at combining multi-omics data because they consider the interactions between different types of genomic data
      [Kim et al. Journal of the American Medical Informatics Association, 2014]
  • 6. Motivation (3/3)
      • Incorporating genomic knowledge such as pathway information into the integrated graph can increase prediction power and help find important genes and pathways in cancers
      [Liu et al. Bioinformatics, 2013]
  • 7. Related works (1/7)
      • Pathway-based integrative methods
          − They transform a single genomic profile into a pathway profile using an activity scoring measure
          − Pathway level analysis of gene expression (PLAGE) uses the singular vector from the singular value decomposition of a given gene set
      [Tomfohr et al. Bioinformatics, 2005]
  • 8. Related works (2/7)
      • Pathway-based integrative methods
          − The Z-score method converts the gene expression profile into z-scores and combines the z-scores of genes in each pathway per sample
          − These methods treat pathways as sets of genes
          − It is better to also consider gene-gene interactions
      [Lee et al. PLoS Comput Biol, 2008]
  • 9. Related works (3/7)
      • Some methods utilize gene-gene interactions on a graph
          − A denoising algorithm based on relevance network topology (DART) integrates pathways by deriving perturbation signatures that reflect gene contributions in each pathway
      [Jiao et al. Bioinformatics, 2011]
  • 10. Related works (4/7)
      • Some methods utilize gene-gene interactions on a graph
          − A directed random walk-based pathway activity inference method (DRW) identifies topologically important genes and pathways by weighting the genes in the gene-gene network
      [Liu et al. Bioinformatics, 2013]
  • 11. Related works (5/7)
      • Some methods utilize gene-gene interactions on a graph
          − DRW-GM is an integrated extension of DRW to multi-omics data
          − It improved prediction performance
          − It found many risk metabolite pathways and topologically important genes for cancer through a joint analysis of gene expression and metabolite data
      [Liu et al. Scientific Reports, 2015]
  • 12. Related works (6/7)
      • Integrative DRW (iDRW) incorporates interactions between gene expression and methylation features, building on DRW-based methods
      [Kim et al. BMC Medical Genomics, 2018 (to appear)]
  • 13. Related works (7/7)
      • iDRW improved survival prediction power and jointly analyzed gene expression and methylation data on an integrated gene-gene graph
      [Kim et al. BMC Medical Genomics, 2018 (to appear)]
  • 14. Overview
      • Investigate the effectiveness of the iDRW method on other types of genomic profiles for two different cancers
      • Reflect the interactions between gene expression and copy number data on an integrated graph
      • Construct the graph with the updated pathway database
      • Classify breast cancer and neuroblastoma patient samples into survival groups
  • 17. Integrated gene-gene graph construction (1/2)
      • 327 human pathways and their corresponding gene sets were taken from the KEGG database
      • Interactions between genes were defined using the R KEGGgraph package
      • Integrated directed gene-gene graph
          − 7,390 nodes and 58,426 edges
      [Figure: genes A and B in the KEGG PATHWAY database]
  • 18. Integrated gene-gene graph construction (2/2)
      • To reflect the impact of copy number variation on gene expression, we assign directional edges to all the overlapping genes
      [Figure: gene expression and copy number alteration layers linked through overlapping genes]
  • 19. Pathway activity inference
      • The weight of a gene, $w_g$, is the p-value from
          − DESeq2 analysis (RNA-Seq)
          − Two-tailed t-test (microarray)
          − $\chi^2$-test of independence (copy number data)
      • Weight initialization: $W_0 = -\log(w_g + \epsilon)$
      [Figure: gene-by-sample matrices of gene expression and CNA values $z(g_i)$]
  • 20. Pathway activity inference
      • Integrative Directed Random Walk (iDRW) on the global directed gene-gene graph: $W_{t+1} = (1 - r) M^T W_t + r W_0$, iterated until it converges to $W_\infty$
      [Figure: random walker on the global directed gene-gene graph spanning gene expression and copy number alteration nodes, with a ground node]
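The weight initialization and the update rule on slides 19 and 20 can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the toy graph, the restart probability `r = 0.7`, the scaling of $W_0$ to sum to 1, and the reading of $M$ as the row-normalized adjacency matrix are all assumptions, since the slides do not state them.

```python
import math

def initial_weights(p_values, eps=1e-10):
    # W_0 = -log(w_g + eps); scaling to sum to 1 is an assumption, not from the slides
    raw = [-math.log(p + eps) for p in p_values]
    total = sum(raw)
    return [x / total for x in raw]

def row_normalize(adj):
    # Assumed definition of M: each row of the adjacency matrix divided by its sum.
    # Rows without outgoing edges stay all-zero.
    normalized = []
    for row in adj:
        s = sum(row)
        normalized.append([v / s if s else 0.0 for v in row])
    return normalized

def directed_random_walk(adj, w0, r=0.7, tol=1e-10, max_iter=1000):
    # Iterates W_{t+1} = (1 - r) * M^T W_t + r * W_0 until the L1 change is below tol
    m = row_normalize(adj)
    n = len(w0)
    w = list(w0)
    for _ in range(max_iter):
        # (M^T W_t)[j] = sum over i of M[i][j] * W_t[i]
        nxt = [(1 - r) * sum(m[i][j] * w[i] for i in range(n)) + r * w0[j]
               for j in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w

# Hypothetical toy graph: gene 0 -> gene 1 -> gene 2, with made-up p-values
adj = [[0, 1, 0],
       [0, 0, 1],
       [0, 0, 0]]
w0 = initial_weights([0.01, 0.5, 0.5])
w_inf = directed_random_walk(adj, w0)
```

In this toy graph gene 0 has no incoming edges, so its converged weight is just `r` times its initial weight, while genes 1 and 2 additionally accumulate weight passed along the directed edges.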
  • 21. Pathway activity inference
      • Pathway $P_j = \{g_1, g_2, \ldots, g_{n_j}\}$ contains $n_j$ differential genes
      • Pathway activity: $a(P_j) = \dfrac{\sum_{i=1}^{n_j} W_\infty(g_i) \cdot score(g_i) \cdot z(g_i)}{\sqrt{\sum_{i=1}^{n_j} (W_\infty(g_i))^2}}$
      [Figure: iDRW on the global directed pathway graph (gene expression and copy number alteration nodes, ground node, random walker) yields $W_\infty$, which is used to build the pathway profile (pathways × samples) of activities $a(P_j)$]
  • 22. Pathway activity inference
      • The score of gene $g_i$, $score(g_i)$, is
          − the $\log_2$ fold change from DESeq2 analysis (RNA-Seq)
          − $sign(tscore(g_i))$ (microarray)
          − $mean(CNA(g_i)_{poor}) - mean(CNA(g_i)_{good})$ (copy number data)
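As a worked example of the pathway activity score for one sample, the sketch below combines the converged weights, the gene scores, and the gene values $z(g_i)$ of the genes in one pathway. It assumes the denominator is the square root of the summed squared weights, as in the DRW formulation; the gene names and numbers are hypothetical.

```python
import math

def pathway_activity(genes, w_inf, score, z):
    # a(P_j) = sum_i W_inf(g_i) * score(g_i) * z(g_i) / sqrt(sum_i W_inf(g_i)^2)
    numerator = sum(w_inf[g] * score[g] * z[g] for g in genes)
    denominator = math.sqrt(sum(w_inf[g] ** 2 for g in genes))
    # An all-zero weight vector carries no signal; return 0.0 rather than divide by zero
    return numerator / denominator if denominator else 0.0

# Hypothetical two-gene pathway for a single sample
w_inf = {"g1": 0.3, "g2": 0.4}    # converged random-walk weights
score = {"g1": 1.0, "g2": -1.0}   # e.g. sign of the t-score for microarray data
z = {"g1": 2.0, "g2": 1.0}        # gene values for this sample

activity = pathway_activity(["g1", "g2"], w_inf, score, z)
# numerator = 0.3*1*2 + 0.4*(-1)*1 = 0.2, denominator = sqrt(0.09 + 0.16) = 0.5
```

Repeating this for every pathway and every sample gives the pathway profile (pathways × samples) that the later feature selection step consumes.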
  • 23. Pathway feature selection and survival prediction
      • Feature ranking strategy
          − Pathways are ranked by the p-values from the t-test of their pathway activities
          − The top-k pathways across samples are the input to the classification model
      [Figure: pathway profile (pathways × samples) of activities $a(P_j)$, ranked by t-test p-value, followed by top-k pathway feature selection]
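The ranking step can be illustrated with a small standard-library sketch. It assumes a pooled-variance two-sample t-test between the good and poor groups (the slides do not say which t-test variant was used) and ranks by the absolute t-statistic; because every pathway is tested on the same samples, the degrees of freedom are identical, so this ordering matches ranking by ascending two-sided p-value. Pathway names and activity values are made up.

```python
import math
from statistics import mean, variance

def pooled_t_statistic(a, b):
    # Two-sample t-statistic with pooled variance (assumed variant).
    # Raises ZeroDivisionError if both groups have zero variance.
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))

def top_k_pathways(profile, labels, k):
    # profile maps pathway name -> list of activities, one per sample
    ranked = []
    for name, activities in profile.items():
        good = [x for x, y in zip(activities, labels) if y == "good"]
        poor = [x for x, y in zip(activities, labels) if y == "poor"]
        ranked.append((abs(pooled_t_statistic(good, poor)), name))
    ranked.sort(reverse=True)  # largest |t| first, i.e. smallest p-value first
    return [name for _, name in ranked[:k]]

# Hypothetical pathway profile for six samples
labels = ["good", "good", "good", "poor", "poor", "poor"]
profile = {
    "pathway_A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.8],  # clearly separated groups
    "pathway_B": [1.0, 2.0, 3.0, 1.5, 2.5, 2.9],  # heavily overlapping groups
    "pathway_C": [0.5, 0.7, 0.6, 1.0, 1.4, 1.1],  # moderately separated groups
}
selected = top_k_pathways(profile, labels, k=2)
```

The selected pathway rows would then be passed to the classifier, with k chosen empirically as described on the next slide.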
  • 24. Pathway feature selection and survival prediction
      • Survival prediction
          − A logistic regression model classifies the samples into good and poor groups
          − The top-k pathway features that showed the best classification performance are selected empirically
      [Figure: ranked pathway profile (e.g. pathways 00410 and 00060) used for risk-active pathway identification and survival prediction]
  • 26. Challenge Dataset (1/2)
      • Breast cancer patient data from the METABRIC dataset
      • mRNA expression profile of 24,368 genes from the Illumina Human v3 microarray, with log intensity levels
      • Putative copy-number alteration data for 22,544 genes
      • 1,648 patient samples are divided into 908 good (> 10 years) and 740 poor (≤ 10 years) samples
      • Average survival: 10 years; average age at diagnosis: 62
  • 27. Performance evaluation (1/2)
      • Survival prediction on 1,648 patients: 908 in the good group (long-term survival) and 740 in the poor group (short-term survival)
      • Confusion matrix
          Actual \ Predicted | good | poor
          good | TP | FN
          poor | FP | TN
      • Classification accuracy: $Accuracy = \dfrac{TP + TN}{TP + FN + FP + TN}$
  • 28. Performance evaluation (1/2)
      • 5-fold cross-validation: each of folds 1 to 5 serves once as the validation set while the remaining folds form the training set
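The accuracy formula and the fold layout above can be made concrete with a short sketch. The helper names are hypothetical, "good" is treated as the positive class to match the confusion matrix on the slide, and the split is a plain interleaved partition without shuffling or stratification, which the slides do not specify.

```python
def confusion_counts(actual, predicted, positive="good"):
    # Returns (TP, FN, FP, TN) with `positive` as the positive class
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    # Accuracy = (TP + TN) / (TP + FN + FP + TN)
    return (tp + tn) / (tp + fn + fp + tn)

def k_fold_indices(n_samples, k):
    # Yields (training, validation) index lists; each fold is the validation set once
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

# Hypothetical labels for five patients
actual = ["good", "good", "poor", "poor", "good"]
predicted = ["good", "poor", "poor", "good", "good"]
tp, fn, fp, tn = confusion_counts(actual, predicted)
acc = accuracy(tp, fn, fp, tn)

# 5-fold split of ten hypothetical samples
splits = list(k_fold_indices(10, 5))
```

Setting `k` equal to the number of samples turns the same generator into leave-one-out cross-validation, the scheme used for the smaller neuroblastoma dataset.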
  • 29. Challenge Dataset (2/2)
      • Neuroblastoma dataset from NCBI GSE49711
      • RNA sequencing gene expression profile of 60,586 genes
      • DNA copy number data for 22,692 genes
      • 144 patient samples are divided into 38 good and 105 poor samples (binary class label for overall survival days provided by the NCBI dataset)
      • Average survival: < 1 year; average age at diagnosis: 16 months
      [Figure: patient counts 88 and 56]
  • 30. Performance evaluation (1/2)
      • Survival prediction on 144 patients: 38 in the good group (long-term survival) and 105 in the poor group (short-term survival)
      • Confusion matrix
          Actual \ Predicted | good | poor
          good | TP | FN
          poor | FP | TN
      • Classification accuracy: $Accuracy = \dfrac{TP + TN}{TP + FN + FP + TN}$
  • 31. Performance evaluation (1/2)
      • Leave-one-out cross-validation: each of the N folds holds a single sample as the validation set while the remaining samples form the training set
  • 32. Pathway-based methods
      • For the gene expression data in each dataset, four pathway-based methods were compared
          − PLAGE [Tomfohr et al. Bioinformatics, 2005]
          − Z-score [Lee et al. PLoS Comput Biol, 2008]
          − DART [Jiao et al. Bioinformatics, 2011]
          − DRW [Liu et al. Bioinformatics, 2013]
      • Their classification performance was evaluated in the same way as for the proposed method
  • 33. Integrative analysis on multi-omics data improves survival prediction performance (1/2)
      • Four pathway-based methods were applied to a single gene expression profile
      • The iDRW method was applied to the gene expression profile and copy number data in breast cancer (A) and in neuroblastoma patients (B)
      [Figure: classification performance for (A) breast cancer and (B) neuroblastoma]
  • 34. Integrative analysis on multi-omics data improves survival prediction performance (2/2)
      • Performance improved when interactions between genes on a graph were utilized
      • DRW-based methods in particular contributed more to the performance improvement
      • iDRW performed the best in both cancer datasets
      [Figure: classification performance for breast cancer and neuroblastoma]
  • 35. iDRW identifies cancer-associated pathways and genes (1/5)
      Top-k pathways ranked by the iDRW method in breast cancer (k = 25). For each pathway, the total number of genes and the number of significant genes with p-value ($w_g$) < 0.05 from gene expression (EXP) or copy number data (CNA) are shown.
      Pathway ID | Pathway name | Total genes | EXP | CNA
      hsa04740 | Olfactory transduction | 419 | 54 | 268
      hsa04014 | Ras signaling pathway | 232 | 68 | 164
      hsa04015 | Rap1 signaling pathway | 206 | 64 | 142
      hsa04916 | Melanogenesis | 101 | 37 | 73
      hsa04722 | Neurotrophin signaling pathway | 119 | 38 | 84
      hsa05200 | Pathways in cancer | 526 | 166 | 359
      hsa04933 | AGE-RAGE signaling pathway in diabetic complications | 99 | 37 | 67
      hsa04530 | Tight junction | 170 | 53 | 107
      hsa04510 | Focal adhesion | 199 | 76 | 125
      hsa04080 | Neuroactive ligand-receptor interaction | 278 | 64 | 193
      hsa05225 | Hepatocellular carcinoma | 168 | 56 | 112
      hsa04020 | Calcium signaling pathway | 182 | 59 | 136
      hsa04024 | cAMP signaling pathway | 198 | 58 | 139
  • 36. iDRW identifies cancer-associated pathways and genes (2/5)
      Top-k pathways ranked by the iDRW method in breast cancer (k = 25), continued.
      Pathway ID | Pathway name | Total genes | EXP | CNA
      hsa04217 | Necroptosis | 164 | 49 | 97
      hsa04060 | Cytokine-cytokine receptor interaction | 270 | 70 | 192
      hsa05152 | Tuberculosis | 179 | 58 | 112
      hsa05165 | Human papillomavirus infection | 319 | 103 | 210
      hsa04810 | Regulation of actin cytoskeleton | 208 | 64 | 132
      hsa04151 | PI3K-Akt signaling pathway | 352 | 119 | 241
      hsa04022 | cGMP-PKG signaling pathway | 163 | 58 | 109
      hsa04630 | Jak-STAT signaling pathway | 162 | 43 | 112
      hsa05167 | Kaposi's sarcoma-associated herpesvirus infection | 186 | 61 | 114
      hsa04010 | MAPK signaling pathway | 295 | 87 | 209
      hsa04371 | Apelin signaling pathway | 137 | 46 | 99
      hsa04390 | Hippo signaling pathway | 154 | 58 | 100
  • 37. iDRW identifies cancer-associated pathways and genes (3/5)
      • Six biological capabilities are acquired during tumor generation [Hanahan et al. Cell, 2011]
      • Some of the top-ranked pathways (Ras signaling, Necroptosis, Regulation of actin cytoskeleton, and PI3K-Akt signaling pathway) are related to at least one of the six functions
      • “…overexpression of 34 Olfactory Receptors genes has been reported in patients bearing breast tumors caused by CHEK2 1100delC mutation…”
  • 38. iDRW identifies cancer-associated pathways and genes (4/5)
      Top-k pathways ranked by the iDRW method in neuroblastoma data (k = 5). For each pathway, the total number of genes and the number of significant genes with p-value ($w_g$) < 0.05 from gene expression (EXP) or copy number data (CNA) are shown.
      Pathway ID | Pathway name | Total genes | EXP | CNA
      hsa04976 | Bile secretion | 71 | 13 | 5
      hsa05034 | Alcoholism | 180 | 22 | 7
      hsa01100 | Metabolic pathways | 1273 | 43 | 93
      hsa04080 | Neuroactive ligand-receptor interaction | 278 | 21 | 24
      hsa04151 | PI3K-Akt signaling pathway | 352 | 19 | 31
  • 39. iDRW identifies cancer-associated pathways and genes (5/5)
      • “… we propose a mechanism underlying a potent and selective anti-tumor effect of LCA in cultured human neuroblastoma cells …”
      • “…the level of Urinary catecholamine metabolites which consist of vanillylmandelic acid (VMA), homovanillic acid (HVA) and dopamine elevated in neuroblastoma patients…”
  • 40. Conclusions
      • We showed the effectiveness of an integrative directed random walk-based method utilizing pathway information (iDRW) on different cancer datasets
      • We benchmarked iDRW against several state-of-the-art pathway-based methods for survival prediction
  • 41. Conclusions
      • Contributions
          − Revamped the directed gene-gene graph to consider the interactions between gene expression and copy number data
          − Jointly identified cancer-related pathways and genes from gene expression and copy number data for the breast cancer and neuroblastoma datasets
  • 42. Acknowledgements
      • All lab members of LAMDA lab: Kyung-Ah Sohn, Byungkon Kang, Yenewondim Biadgie, Garam Lee, Habtamu Minassie Aycheh, Sehee Wang, Jungryul Seo, Nam-Hyuk Ahn, Min-Soo Kim, Tae-rim Kim, Young-Bum Choi, Jun-hyung Yu, Jeong-hyun Moon, Jaesik Kim, Sijin Kim, Heejin Kim, Joon-seon Hwang
      • Hyun-Hwan Jeong, Ph.D., Post-doctoral associate, Baylor College of Medicine, Texas Children’s Hospital
      • Kyung-Ah Sohn, Ph.D., Associate Professor, Department of Software and Computer Engineering, Ajou University
      • Jaesik Kim, Graduate student (Masters course), Department of Software and Computer Engineering, Ajou University
      • Jeong-Hyeon Moon, Graduate student (Masters course), Department of Software and Computer Engineering, Ajou University
  • 43. Thank you! Q & A