Integrative analysis of transcriptomics and proteomics data with ArrayMining and TopoGSA

Integrative analysis of transcriptomics and proteomics data: implications to cancer biology
Enrico Glaab & Natalio Krasnogor
ASAP, Interdisciplinary Optimisation Laboratory, School of Computer Science
Centre for Integrative Plant Biology
Centre for Healthcare Associated Infections
Institute of Infection, Immunity and Inflammation
University of Nottingham

Outline
Gibson G (2003) Microarray Analysis. PLoS Biol 1(1): e15. doi:10.1371/journal.pbio.0000015

Introduction

Reference data set: Armstrong et al. leukemia data set
Heat map: 30 most differentially expressed genes vs. samples

Main data set: QMC breast cancer microarray data set
Heat map: 30 most differentially expressed genes vs. samples (grade 1 and grade 3)

Breast cancer data: difficulties
Breast cancer outcome is hard to predict: breast cancer microarray data show a large degree of class overlap, whereas decision boundaries in leukemia data are easy to find (Blazadonakis, 2009).
Figure panels: Van 't Veer et al.; Alon et al.; Golub et al.

Data Fusion
Other biological data sources used alongside the breast cancer microarray data:
- Protein interaction data
- Cellular pathway data: obtained from GO, BioCarta, Reactome, KEGG and InterPro; approx. 3000 pathways in total (size > 10)
- Cancer gene sets: mutated genes in different human cancer types (breast, liver, ...); 30 gene sets of size > 10 genes

Methods overview: ArrayMining & TopoGSA

Web-tool: ArrayMining.net
What is ArrayMining.net? ArrayMining.net is an online microarray analysis tool set that integrates multiple data sources and algorithms.
Six analysis modules:
1. Gene selection
2. Sample clustering
3. Sample classification
4. Gene Set Analysis
5. Gene Network Analysis
6. Cross-Study Normalization
Goal: a "Swiss Army knife" for microarray analysis tasks, covering both classical and new analysis types.
www.arraymining.org

ArrayMining.net: Gene selection
Legend: genes previously identified by Armstrong et al.; newly identified genes.

Affymetrix ID | Gene symbol | Gene description | F-statistic
32847_at | MYLK | myosin, light polypeptide kinase | 159.59
1389_at | MME | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase) | 137.53
35164_at | WFS1 | Wolfram syndrome 1 (wolframin) | 128
36239_at | POU2AF1 | POU domain, class 2, associating factor 1 | 116.75
1325_at | SMAD1 | SMAD, mothers against DPP homolog 1 (Drosophila) | 110.37
963_at | LIG4 | ligase IV, DNA, ATP-dependent | 89.77
34168_at | DNTT | deoxynucleotidyltransferase, terminal | 89.31
40570_at | FOXO1 | forkhead box O1A (rhabdomyosarcoma) | 86.89
33412_at | LGALS1 | lectin, galactoside-binding, soluble, 1 (galectin 1) | 81.31
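The table above ranks genes by an F-statistic. As a minimal sketch of how such a ranking can be computed, the block below implements the standard one-way ANOVA F-statistic per gene across sample classes. It is an illustration only: the function names, the toy expression values and the gene names `gene_a` and `gene_b` are hypothetical, and the slides do not state which implementation ArrayMining uses.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F-statistic for a single gene.

    `groups` holds one list of expression values per sample class.
    Assumes at least two classes and non-zero within-class variance.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    # Variation of the class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the samples around their own class mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expression, labels):
    """Sort genes by decreasing F-statistic.

    `expression` maps a gene name to its values (one per sample);
    `labels` gives the class of each sample in the same order.
    """
    classes = sorted(set(labels))
    scores = {}
    for gene, values in expression.items():
        groups = [[v for v, lab in zip(values, labels) if lab == c]
                  for c in classes]
        scores[gene] = f_statistic(groups)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: six samples from three leukemia classes.
labels = ["ALL", "ALL", "AML", "AML", "MLL", "MLL"]
expression = {
    "gene_a": [1.0, 1.2, 5.0, 5.2, 9.0, 9.2],  # separates the classes well
    "gene_b": [3.0, 5.0, 4.0, 6.0, 3.5, 5.5],  # overlaps across classes
}
ranking = rank_genes(expression, labels)
```

In this toy example `gene_a` ranks first because its class means differ strongly relative to the spread within each class.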

ArrayMining.net: Gene selection

ArrayMining.net: Examples
Further examples: Gene selection and Clustering modules. Automatic generation of heat maps and PCA cluster plots (Armstrong et al. dataset).

ArrayMining.net: Examples
Further examples: 3D-ICA and co-expression analysis. 3D Independent Component Analysis plot of the sample space (left) and the largest connected components of a gene co-expression network in the gene space (right) for the Armstrong et al. dataset. Sample classes: ALL, AML, MLL.

ArrayMining.net: In-house data
Applying the tools to new data: QMC breast cancer data.
Heat map: 50 most significant genes.
Box plot: expression levels of the 4 most significant genes (STK6, MYBL2, KIF2C, AURKB) across 3 tumour grades.

ArrayMining.net: QMC dataset

Gene name | PC (gene vs. outcome) | Fold change | Q-value (rank)
ESTROGEN RECEPTOR 1 | -0.75 | 0.16 | 1.6e-20 (1.)
RAS-LIKE, ESTROGEN-REGULATED, GROWTH INHIBITOR | -0.66 | 0.46 | 5.3e-14 (2.)
WD REPEAT DOMAIN 19 | -0.66 | 0.73 | 1.2e-13 (3.)
CARBONIC ANHYDRASE XII | -0.65 | 0.28 | 2.7e-13 (4.)
ARP3 ACTIN-RELATED PROTEIN 3 HOMOLOG (YEAST) | 0.64 | 1.37 | 9.6e-13 (5.)
TETRATRICOPEPTIDE REPEAT DOMAIN 8 | -0.63 | 0.82 | 2.2e-12 (6.)
BREAST CANCER MEMBRANE PROTEIN 11 | -0.62 | 0.24 | 7.1e-12 (7.)

ArrayMining.net: Example
ArrayMining Class Discovery Analysis module

ArrayMining.net: Consensus clustering
ArrayMining's consensus clustering approach:
Clustering agreement := number of times a pair of samples is assigned to the same cluster across all input clusterings.
Idea: reward placing objects in the same cluster if they have a high agreement.
Agreement matrix: A_ij := number of agreements across all clusterings for samples i and j.
Fitness function:  := (max(A)+min(A))/2
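The agreement matrix defined above can be sketched in a few lines. The fitness function on the slide is incomplete, so the second function is only one plausible reading and is labelled as an assumption: it treats (max(A)+min(A))/2 as a threshold and sums, over all pairs placed in the same consensus cluster, the agreement minus that threshold. The names `agreement_matrix`, `consensus_fitness` and `threshold` are hypothetical, and max and min are taken over off-diagonal entries, which is also an assumption.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """A[i][j] = number of input clusterings that put samples i and j
    in the same cluster. Each clustering is a list of cluster labels."""
    n = len(clusterings[0])
    A = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                A[i][j] += 1
                A[j][i] += 1
    return A


def consensus_fitness(A, labels):
    """ASSUMPTION (not confirmed by the slides): score a candidate
    consensus clustering by summing (A[i][j] - threshold) over all pairs
    sharing a cluster, with threshold = (max(A) + min(A)) / 2."""
    pairs = list(combinations(range(len(A)), 2))
    values = [A[i][j] for i, j in pairs]
    threshold = (max(values) + min(values)) / 2
    return sum(A[i][j] - threshold
               for i, j in pairs if labels[i] == labels[j])


# Three hypothetical input clusterings of four samples.
clusterings = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
A = agreement_matrix(clusterings)
score_majority = consensus_fitness(A, [0, 0, 1, 1])
score_minority = consensus_fitness(A, [0, 1, 1, 1])
```

Under this reading, pairs with above-threshold agreement raise the score when grouped together and low-agreement pairs lower it, so the candidate supported by two of the three inputs scores higher. The search over candidate clusterings (simulated annealing in the slides) is not shown.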

Simulated Annealing variants: Cauchy vs. Gaussian distribution

Clustering methods and validity indices
Example: Silhouette width
a(i) = average distance of object i to all other objects in the same cluster
b(i) = average distance of object i to all objects in the closest distinct cluster
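Given a(i) and b(i) as defined above, the standard silhouette width is s(i) = (b(i) - a(i)) / max(a(i), b(i)), which lies between -1 and 1. The sketch below computes it from a precomputed distance matrix. The function name and the toy data are hypothetical, and the code assumes at least two clusters and non-zero distances.

```python
def silhouette_widths(dist, labels):
    """Silhouette width s(i) for every object.

    `dist` is a symmetric distance matrix (list of lists) and `labels`
    the cluster label of each object. Assumes at least two clusters.
    """
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Common convention: a singleton cluster gets width 0.
            widths.append(0.0)
            continue
        # a(i): average distance to the other members of the same cluster.
        a = sum(dist[i][j] for j in own) / len(own)
        # b(i): smallest average distance to any other cluster.
        b = min(sum(dist[i][j] for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        widths.append((b - a) / max(a, b))
    return widths


# Hypothetical toy data: four points on a line forming two tight groups.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
widths = silhouette_widths(dist, [0, 0, 1, 1])
average_width = sum(widths) / len(widths)
```

Values close to 1 indicate that an object sits well inside its cluster. The average over all objects is commonly used as a confidence measure for a clustering.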

Consensus clustering: example
Example application: QMC breast cancer dataset.
Figure annotations: low confidence (silhouette widths); best separation for two clusters.

External validation
Measure the similarity of two clusterings with the Rand index R = (a + b) / (a + b + c + d), where a, b, c and d are the numbers of pairs of objects assigned to:
- the same cluster in both clusterings (a)
- different clusters in both clusterings (b)
- the same cluster in clustering 1 and different clusters in clustering 2 (c)
- different clusters in clustering 1 and the same cluster in clustering 2 (d)
Corrected for chance: adjusted Rand index.
Reference clustering: 3 tumour grades (low, medium, high).
Clustering results, external validation (tumour grades): random model (10000 random clusterings), single clustering, consensus.
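The pair counts a, b, c and d translate directly into code. The following sketch computes the Rand index and the adjusted Rand index in its usual pair-count form (Hubert and Arabie). The function names and the example labels are hypothetical; the slides do not show how ArrayMining implements this step.

```python
from itertools import combinations


def pair_counts(x, y):
    """Count object pairs by how two clusterings x and y treat them.

    a: same cluster in both, b: different clusters in both,
    c: same in x only, d: same in y only.
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(x)), 2):
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x and same_y:
            a += 1
        elif not same_x and not same_y:
            b += 1
        elif same_x:
            c += 1
        else:
            d += 1
    return a, b, c, d


def rand_index(x, y):
    a, b, c, d = pair_counts(x, y)
    return (a + b) / (a + b + c + d)


def adjusted_rand_index(x, y):
    """Rand index corrected for chance agreement.

    Undefined (division by zero) if both clusterings are trivial,
    e.g. every object in one cluster in both.
    """
    a, b, c, d = pair_counts(x, y)
    total = a + b + c + d
    expected = (a + c) * (a + d) / total   # expected value of a by chance
    maximum = ((a + c) + (a + d)) / 2      # largest attainable value of a
    return (a - expected) / (maximum - expected)


# Hypothetical example: reference grades vs. a clustering result.
reference = [0, 0, 1, 1, 2, 2]
result = [0, 0, 1, 1, 1, 2]
r = rand_index(reference, result)
ari = adjusted_rand_index(reference, result)
```

The adjusted index is 1 for identical clusterings and close to 0 for random assignments, which makes comparison against a baseline of random clusterings meaningful.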

ArrayMining.net: Gene set analysis
Figure axes: samples, pathways (example: Van Andel Institute cancer gene sets).
Reference: Subramanian et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS, October 25, 2005, vol. 102, no. 43, 15545–15550.

ArrayMining.net: Examples
Gene Set Analysis module, example analysis: heat map for the Armstrong et al. dataset based on pathway meta-genes.

Consensus clustering: example (2)
Combine consensus clustering with gene set analysis.
Figure annotations: ~3 times higher confidence; better separation.

External validation
Compared: 10000 random clusterings, single clustering, consensus clustering, consensus (PAM+SOTA).

Interim summary: ArrayMining integrative clustering

TopoGSA: Network topological analysis of gene sets
What is TopoGSA? TopoGSA is a web application that maps gene sets onto a comprehensive human protein interaction network and analyses their network topological properties.
Two types of analysis:
1. Compare genes within a gene set, e.g. up- vs. down-regulated genes
2. Compare a gene set against a database of known gene sets (e.g. KEGG, BioCarta, GO)
www.infobiotics.net/ TopoGSA

TopoGSA - Methods
TopoGSA computes topological properties for an uploaded gene set and for matched-size random gene sets.
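The comparison of a gene set with matched-size random gene sets can be sketched as follows. The list of properties on this slide did not survive, so the block uses two properties that are named elsewhere in the deck (clustering coefficient and shortest path length). The toy network, the function names and the number of random sets are hypothetical; this is an illustration of the idea, not TopoGSA's implementation.

```python
import random
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Undirected graph as a dict: node -> set of neighbours."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def clustering_coefficient(graph, node):
    """Fraction of neighbour pairs of `node` that are themselves linked."""
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in graph[u])
    return 2 * links / (k * (k - 1))


def mean_path_length(graph, node):
    """Average shortest path length from `node` to all reachable nodes
    (breadth-first search). Assumes at least one reachable node."""
    distance = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    others = [d for n, d in distance.items() if n != node]
    return sum(others) / len(others)


def mean_properties(graph, genes):
    """Mean of each property over the genes that map onto the network."""
    mapped = [g for g in genes if g in graph]
    return {
        "clustering": sum(clustering_coefficient(graph, g) for g in mapped) / len(mapped),
        "path_length": sum(mean_path_length(graph, g) for g in mapped) / len(mapped),
    }


def random_baseline(graph, set_size, n_random=200, seed=0):
    """Mean properties of random gene sets of the same size."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    return [mean_properties(graph, rng.sample(nodes, set_size))
            for _ in range(n_random)]


# Hypothetical toy network: a triangle (A, B, C) attached to a chain.
graph = build_graph([("A", "B"), ("A", "C"), ("B", "C"),
                     ("A", "D"), ("D", "E"), ("E", "F")])
observed = mean_properties(graph, ["A", "B", "C"])
baseline = random_baseline(graph, set_size=3)
# Fraction of random sets at least as clustered as the observed set.
fraction = sum(r["clustering"] >= observed["clustering"]
               for r in baseline) / len(baseline)
```

A small fraction suggests that the uploaded gene set is more densely interconnected than expected for random sets of the same size.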

KEGG-BRITE pathway colouring
Plot axes: mean node betweenness, mean clustering coefficient, mean shortest path length.

ArrayMining and TopoGSA

Real-world application of the tool sets
TMAs of invasive breast cancer show strong RERG expression.

RERG protein expression vs. BCSS & DMFI
Kaplan-Meier plot of RERG protein expression with respect to BCSS in the combined ER+ and ER- cohort.
Kaplan-Meier plot of RERG protein expression with respect to BCSS in the ER+ only cohort.
Panel labels: without adjuvant treatment; without Tamoxifen treatment.

Conclusions (1): Feature comparison with similar tools

ArrayMining & TopoGSA
- Pre-processing: Image analysis, single- and dimensionality reduction, gene name normalization, cross-study normalization, covariance-based filtering
- Analysis: Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Cons.
- Usability/features: PDF reports, sortable ranking tables, data annotation, 2D/3D plots, e-mail notification, video tutorials

GEPAS (Tarraga et al.)
- Pre-processing: Image analysis, missing value imputation, multiple single study normalization methods, dimensionality reduction, ID converter
- Analysis: Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining, Text mining, TF-binding site prediction
- Usability/features: special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics)

Expression Profiler (Kapushesky et al.)
- Pre-processing: Image analysis, single study normalization, missing value imputation, dimensionality reduction, advanced data selection
- Analysis: Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search
- Usability/features: Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)

Conclusions (2)

Outlook: PPI-based pathway enlargement
Figure legend: black = pathway nodes; red, blue, green = nodes added based on different criteria.

Pathway enlargement: added genes
Example case: BioCarta BTG family proteins and cell cycle regulation.
Figure legend: black = original pathway nodes; green = nodes added based on connectivity. Annotation: added cancer gene.

Pathway enlargement: Example 1
Example: Alzheimer disease pathway

Pathway enlargement: Example 2
Example: Interleukin signaling pathways

Pathway enlargement: conclusion

Acknowledgements
