These slides are part of a presentation I gave in March 2010 at the BioInformatics and Genome Research Open Club at the Weizmann Institute of Science, Israel.
In these slides my student and I describe two web-applications for microarray and gene/protein set analysis,
ArrayMining.net and TopoGSA. These tools use ensemble and consensus methods and allow
different analysis techniques to be combined in a modular fashion, giving an integrative view of
(microarray-based) gene sets and interlinking transcriptomics with proteomics data sources. This integrative process uses tools from different fields, e.g. statistics, optimisation and network
topological studies. As an example of these integrative techniques, we use a microarray
consensus-clustering approach based on Simulated Annealing, which is part of the ArrayMining.net
Class Discovery Analysis module, and show how this approach can be combined in a modular
fashion with a prior gene set analysis. The results show that merging the two methods improves cluster validity indices, and they point to distinct sub-classes within pre-defined tumour categories for a breast cancer dataset from the Queen's Medical Centre, Nottingham.
In the second part of the talk, I show how results from a supervised
microarray feature selection analysis on ArrayMining.net can be investigated in further detail with
TopoGSA, a new web-tool for network topological analysis of gene/protein sets mapped on a
comprehensive human protein-protein interaction network. I discuss results from a TopoGSA
analysis of the complete set of genes currently known to be mutated in cancer.
Integrative analysis of transcriptomics and proteomics data with ArrayMining and TopoGSA
1. Integrative analysis of transcriptomics and proteomics data (ArrayMining and TopoGSA). Integrative analysis of transcriptomics and proteomics data: implications to cancer biology. Enrico Glaab & Natalio Krasnogor. ASAP – Interdisciplinary Optimisation Laboratory, School of Computer Science; Centre for Integrative Plant Biology; Centre for Healthcare Associated Infections; Institute of Infection, Immunity and Inflammation; University of Nottingham.
6. Breast cancer data: difficulties. Breast cancer outcome is hard to predict: there is a large degree of class overlap in breast cancer microarray data, whereas leukemia decision boundaries are easy to find (Blazadonakis, 2009). Plot labels: van 't Veer et al.; Alon et al.; Golub et al.
9. Web-tool: ArrayMining.net. What is ArrayMining.net? ArrayMining.net is an online microarray analysis tool set integrating multiple data sources and algorithms. 6 analysis modules (labelled classical or new on the slide): 1. Gene selection 2. Sample clustering 3. Sample classification 4. Gene Set Analysis 5. Gene Network Analysis 6. Cross-Study Normalization. Goal: a “Swiss knife” for microarray analysis tasks. www.arraymining.org
13. ArrayMining.net: Examples. Further examples: Gene selection and Clustering module. Automatic generation of heatmaps and PCA cluster plots (Armstrong et al. dataset). Heatmap axes: samples, genes.
14. ArrayMining.net: Examples. Further examples: 3D-ICA and Co-Expression analysis. Sample space: 3D Independent Component Analysis plot (left; classes ALL, AML, MLL). Gene space: the largest connected components from a gene co-expression network (right). Both for the Armstrong et al. dataset.
15. ArrayMining.net: In-house data. Apply the tools on new data: QMC breast cancer data. Heat map: 50 most significant genes. Box plot: 4 most significant genes. Expression levels across 3 tumour grades: STK6, MYBL2, KIF2C, AURKB.
19. ArrayMining.net: Consensus clustering. ArrayMining's consensus clustering approach: Clustering agreement := number of times pairs of samples are assigned to the same cluster across all input clusterings. Idea: reward objects in the same cluster if they have a high agreement. Agreement matrix: A_ij := number of agreements across all clusterings for samples i and j (matrix axes: Sample 1, Sample 2). Fitness function: := (max(A)+min(A))/2
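The agreement matrix and the "reward high agreement" idea on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ArrayMining's implementation: the slide's fitness formula did not survive extraction, so the code assumes that the surviving term (max(A)+min(A))/2 acts as an agreement threshold and that the fitness sums A_ij minus that threshold over all pairs placed in the same cluster. The function names, the use of off-diagonal entries only, and the toy labels are likewise assumptions.

```python
from itertools import combinations


def agreement_matrix(clusterings):
    """A[i][j] = number of input clusterings that put samples i and j together."""
    n = len(clusterings[0])
    A = [[0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                A[i][j] += 1
                A[j][i] += 1
    return A


def fitness(candidate, A):
    """Score a candidate clustering against the agreement matrix.

    ASSUMPTION: the threshold is the midpoint (max(A) + min(A)) / 2 taken over
    off-diagonal entries, and each same-cluster pair contributes A_ij - threshold.
    """
    pairs = list(combinations(range(len(candidate)), 2))
    values = [A[i][j] for i, j in pairs]
    threshold = (max(values) + min(values)) / 2
    return sum(A[i][j] - threshold for i, j in pairs if candidate[i] == candidate[j])


# Three hypothetical input clusterings of four samples.
inputs = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]]
A = agreement_matrix(inputs)
print(fitness([0, 0, 1, 1], A))  # majority partition scores highest
print(fitness([0, 1, 0, 1], A))  # partition contradicting the inputs scores lowest
```

Under these assumptions, a search procedure such as the Simulated Annealing mentioned in the abstract would propose candidate label vectors and keep those that increase the fitness.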
23. External validation. Measure similarity of clusterings with the Rand index R = (a + b) / (a + b + c + d), where a, b, c and d are the numbers of pairs of objects assigned to: the same cluster in both clusterings (a); different clusters in both clusterings (b); the same cluster in clustering 1 but different clusters in clustering 2 (c), or vice versa (d). Corrected for chance: adjusted Rand index. Reference clustering: 3 tumour grades (low, medium, high). Clustering results, external validation (tumour grades): random model (10000 random clusterings), single clustering, consensus.
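As a worked illustration of the pair counts a, b, c and d, the sketch below computes the Rand index directly from pairs and the adjusted Rand index from the contingency table of the two labelings (the standard Hubert-Arabie form). The function names and toy labels are illustrative and not taken from the slides.

```python
from collections import Counter
from itertools import combinations
from math import comb


def rand_index(labels1, labels2):
    """Fraction of sample pairs on which the two clusterings agree: (a + b) / all pairs."""
    agree = total = 0
    for i, j in combinations(range(len(labels1)), 2):
        same1 = labels1[i] == labels1[j]
        same2 = labels2[i] == labels2[j]
        agree += same1 == same2  # counts a (both same) and b (both different)
        total += 1
    return agree / total


def adjusted_rand_index(labels1, labels2):
    """Rand index corrected for chance, computed from the contingency table."""
    n = len(labels1)
    index = sum(comb(c, 2) for c in Counter(zip(labels1, labels2)).values())
    sum1 = sum(comb(c, 2) for c in Counter(labels1).values())
    sum2 = sum(comb(c, 2) for c in Counter(labels2).values())
    expected = sum1 * sum2 / comb(n, 2)
    maximum = (sum1 + sum2) / 2
    if maximum == expected:  # degenerate case, e.g. both clusterings have one cluster
        return 1.0
    return (index - expected) / (maximum - expected)


reference = [0, 0, 1, 1]  # e.g. tumour grades as reference labels
print(rand_index(reference, [0, 1, 0, 1]))
print(adjusted_rand_index(reference, [0, 1, 0, 1]))
```

The adjusted index is 1 for identical partitions and close to 0 on average for random ones, which is why it suits a comparison against a random model.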
31. TopoGSA: Network topological analysis of gene sets. What is TopoGSA? TopoGSA is a web-application mapping gene sets onto a comprehensive human protein interaction network and analysing their network topological properties. Two types of analysis: 1. Compare genes within a gene set, e.g. up- vs. down-regulated genes. 2. Compare a gene set against a database of known gene sets (e.g. KEGG, BioCarta, GO). www.infobiotics.net/TopoGSA
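To make "mapping a gene set onto an interaction network and analysing its topological properties" concrete, here is a small sketch. It is not TopoGSA's code: the slide does not list which properties are computed, so node degree and the local clustering coefficient are used as two common examples, and the network, the gene names and the function names are all hypothetical.

```python
def clustering_coefficient(network, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    neighbours = network[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u in neighbours for v in neighbours if u < v and v in network[u])
    return 2 * links / (k * (k - 1))


def summarise(network, gene_set):
    """Map a gene set onto the network and average two topological properties."""
    mapped = [g for g in gene_set if g in network]  # genes absent from the network are dropped
    if not mapped:
        return {"mapped": 0, "mean_degree": 0.0, "mean_clustering": 0.0}
    return {
        "mapped": len(mapped),
        "mean_degree": sum(len(network[g]) for g in mapped) / len(mapped),
        "mean_clustering": sum(clustering_coefficient(network, g) for g in mapped) / len(mapped),
    }


# Hypothetical undirected interaction network as an adjacency map.
network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"D"},
}

# Analysis type 1 from the slide: compare two subsets of one gene set.
print(summarise(network, ["A", "C", "X"]))  # e.g. up-regulated genes ("X" is unmapped)
print(summarise(network, ["D", "E"]))       # e.g. down-regulated genes
```

The second analysis type would apply the same summary to each reference gene set (e.g. a pathway) and compare the resulting values with those of the uploaded set.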
36. RERG protein expression vs. BCSS & DMFI. Kaplan-Meier plot of RERG protein expression with respect to BCSS in the ER+ ∪ ER− cohort. Kaplan-Meier plot of RERG protein expression with respect to BCSS in ER+ only. Panel labels: without adjuvant treatment; without Tamoxifen treatment.
37. Conclusions (I): Feature comparison with similar tools
ArrayMining & TopoGSA
Pre-processing: Image analysis, single- and dimensionality reduction, gene name normalization, cross-study normalization, covariance-based filtering
Analysis: Classification, Clustering, Gene selection, GSEA, PCA, ICA, Co-expression analysis, PPI-topology analysis, Ensembles/Cons.
Usability/features: PDF-reports, sortable ranking tables, data annotation, 2D/3D plots, e-mail notification, video tutorials
GEPAS (Tarraga et al.)
Pre-processing: Image analysis, missing value imputation, multiple single study normalization methods, dimensionality reduction, ID converter
Analysis: Classification, Clustering, Gene selection, GSEA, PCA, CGH arrays, Tissue mining, Text mining, TF-binding site prediction
Usability/features: special tree visualization (Caat, SotaTree, Newick Trees), 2D plots, data annotation (Babelomics)
Expression Profiler (Kapushesky et al.)
Pre-processing: Image analysis, single study normalization, missing value imputation, dimensionality reduction, advanced data selection
Analysis: Clustering, Gene selection, PCA, Co-expression analysis (different from ArrayMining), COA, Similarity search
Usability/features: Excel export, XML queries, 2D plots, data annotation (GO, chromosome location)
40. Pathway enlargement: added genes. Example case: BioCarta BTG family proteins and cell cycle regulation. Black: original pathway nodes. Green: nodes added based on connectivity. Highlighted: added cancer gene.
Editor's notes
Now we combine the Class Discovery analysis with the Gene Set Analysis module, discussed on the next slides.