The amount of available oncogenomic data has grown steadily in recent years, and more is to come. The main challenges the scientific community faces, and will continue to face, are integrating these data to extract new knowledge and visualizing the results of the analyses intuitively. Two complementary but independent tools for the analysis of oncogenomic data are presented here: IntOGen and GiTools.
IntOGen is a framework that collects public oncogenomic data and integrates it in different ways. Its main purpose is to identify genes that are consistently altered (up- or down-regulated) across many samples in a specific experiment, and then to combine all experiments from the same cancer type to obtain a p-value for each gene and cancer type. The same principle can be applied to gene modules, or sets, which are groups of genes that share a biological property (module analysis). IntOGen has a web page where the user can explore the datasets included in the database, from individual genes across all cancer types to individual experiments, or gene modules (GO terms, KEGG pathways or user-defined groups of genes) across all the experiments.
GiTools is a desktop-based framework, also developed by the lab, for the analysis and visualization of genomic data. It supports several input formats (all plain text), and data can also be imported from BioMart, so everything stored in that database can be used directly in GiTools. There is also an IntOGen data importer, so users can download matrices or oncomodules at different levels (experiments or combined results) and use them directly. It currently offers a limited number of analyses (enrichment analysis, correlations, combination of results, ...), but it is built in a modular fashion and can easily be extended with more matrix-based statistical tests. It supports flexible exploration of the data and the creation of figures for papers directly from the tool, which can be exported in many different formats.
Two case studies are presented to illustrate the combined usefulness of these tools, aiming to answer two main questions: "What biological processes are enriched in genes significantly up-regulated in cancer?" and "What is the correlation between different tumour types in their pattern of up-regulated genes?". Several real applications of these tools are also presented, from both published and unpublished research, stressing that they can be used not only in oncogenomics projects but also in studies of evolution and global gene regulation.
In the near future, GiTools will incorporate new analyses, such as GSEA and clustering, as well as connections with the R statistical framework. IntOGen will soon have a BioMart-compatible interface, which will make the data even more easily available.
1. IntOGen & Gitools
integration, visualization and data-mining of multidimensional oncogenomic data
Christian Pérez-Llamas
Master's student
Biomedical Genomics
GRIB-UPF
April 2010
2. Outline
● Introduction
● Case study
● Real projects
● Conclusions
● Future work
5. Identification of cancer related genes
[Figure: two-step analysis for cancer type A, with experiments exp. 1, exp. 2, exp. 3, ... exp. n]
STEP 1: identification of driver alterations (experiment 1: genes × samples, each cell altered / not altered)
STEP 2: combination of experiments (genes scored by corrected p-value; colour scale 0, 0.05, 1)
International Classification of Diseases, from the World Health Organization
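The second step on this slide, turning several per-experiment p-values for one gene into a single p-value per cancer type, can be sketched as follows. The slides do not say which combination method IntOGen uses, so Fisher's method is used here purely as an illustrative stand-in; the function name and the example values are hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Assumption: Fisher's method is an illustrative choice only; the
    slides do not state how IntOGen combines experiments.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clamp tiny values so that log(0) cannot occur.
    clamped = [max(p, 1e-300) for p in p_values]
    # half_stat is X/2, where X = -2 * sum(ln p) follows a chi-square
    # distribution with 2k degrees of freedom under the null hypothesis.
    half_stat = -sum(math.log(p) for p in clamped)
    k = len(clamped)
    # For even degrees of freedom the chi-square survival function is
    # exp(-X/2) * sum_{j=0}^{k-1} (X/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_stat / j
        total += term
    return min(1.0, math.exp(-half_stat) * total)

# Hypothetical p-values for one gene in three experiments of one cancer type.
per_experiment = {"exp. 1": 0.01, "exp. 2": 0.02, "exp. 3": 0.03}
combined = fisher_combine(list(per_experiment.values()))
```

A multiple-testing correction would still be needed afterwards to obtain the corrected p-values shown on the colour scale; that step is not sketched here.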
15. Data Analysis Browse Export
Many File Formats Supported
TSV
CDM
BDM
GMX
GMT
TCM
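As an illustration of how simple these plain-text formats are to handle, here is a minimal sketch of a GMT reader. It assumes the GMT listed above follows the commonly used gene-set layout (one module per line, tab-separated: module name, description, then the member genes); the function name and sample data are hypothetical and this is not Gitools code.

```python
def parse_gmt(text):
    """Parse GMT-style text into {module_name: set_of_genes}.

    Assumed layout per line: name <TAB> description <TAB> gene1 <TAB> gene2 ...
    """
    modules = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name = fields[0]
        # fields[1] is the free-text description; genes start at index 2.
        modules[name] = {gene for gene in fields[2:] if gene}
    return modules

# Hypothetical two-module file content.
sample = "module_a\tdemo module\tTP53\tCCNB1\nmodule_b\tdemo module\tMYC\n"
gene_sets = parse_gmt(sample)
```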
16. Data Analysis Browse Export
Import data from: Marts
● International Cancer Genome Consortium
Data
● Genes significantly altered
● Modules of genes significantly altered
Levels
● Experiments
● Combinations
Alterations
● Upregulation
● Downregulation
● Gain
● Loss
21. Case study
● What biological processes are enriched in genes significantly up-regulated in cancer?
● What is the correlation between different tumour types in their pattern of up-regulated genes?
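The second question can be made concrete with a small sketch: each tumour type is represented as a vector over the same genes (1 if the gene is significantly up-regulated, 0 otherwise) and the vectors are compared pairwise. The slides do not name the correlation measure, so Pearson correlation is an assumption here, and the tumour-type vectors are made-up illustrations.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long numeric vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("vectors must have the same length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant vector
    return cov / math.sqrt(var_x * var_y)

# Hypothetical up-regulation flags for the same five genes in two tumour types.
tumour_a = [1, 0, 1, 1, 0]
tumour_b = [1, 0, 0, 1, 0]
similarity = pearson(tumour_a, tumour_b)
```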
39. Enrichment analysis
[Figure: genes × tumour types matrix (Tumor type i, ...) converted into a modules × tumour types matrix of p-values; colour scale 0, 0.05, 1]
STEP 1: transform to 1 the p-values < 0.05
STEP 2: enrichment analysis against biological modules (GO Biological processes)
Test on the annotated genes in module M: Xi ~ Bin(pi); H0: pm = pi; H1: pm > pi
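The two steps on this slide can be sketched as below. The reading of the notation is an assumption: pi is taken to be the proportion of flagged genes among all genes of tumour type i, pm the proportion within module M, and the one-sided test asks whether the module holds more flagged genes than the background rate predicts. Function names and data are hypothetical; this is not the Gitools implementation.

```python
import math

def binarize(p_values, threshold=0.05):
    """STEP 1: flag genes whose p-value is below the threshold as 1, else 0."""
    return {gene: 1 if p < threshold else 0 for gene, p in p_values.items()}

def binomial_enrichment(flags, module_genes):
    """STEP 2: one-sided binomial test, P(X >= k) with X ~ Bin(n, p_background).

    n = annotated genes of the module present in the data,
    k = how many of them are flagged,
    p_background = fraction of flagged genes over all genes (assumed meaning of pi).
    """
    if not flags:
        raise ValueError("no genes to test")
    annotated = [g for g in module_genes if g in flags]
    n = len(annotated)
    k = sum(flags[g] for g in annotated)
    p_bg = sum(flags.values()) / len(flags)
    return sum(
        math.comb(n, i) * p_bg ** i * (1 - p_bg) ** (n - i)
        for i in range(k, n + 1)
    )

# Hypothetical p-values for ten genes in one tumour type.
gene_p = {"g1": 0.01, "g2": 0.02, "g3": 0.03, "g4": 0.04, "g5": 0.20,
          "g6": 0.30, "g7": 0.50, "g8": 0.60, "g9": 0.80, "g10": 0.90}
flags = binarize(gene_p)
module_p = binomial_enrichment(flags, {"g1", "g2", "g3"})
```

Running this per module and per tumour type would fill a modules × tumour types matrix of p-values like the one in the figure; a multiple-testing correction would normally follow.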
54. Real projects
● RBP2 function
● Functional protein divergence
● Study of altered regulatory programs in cancer
● Stress response genes and transition into increased malignant states
● Comparison of alteration patterns among tumor types
RBP2: functional enrichment of RBP2 targets at different time points of differentiation (Lopez-Bigas et al., Molecular Cell 2008)
55. Real projects
● RBP2 function
● Functional protein divergence
● Study of altered regulatory programs in cancer
● Stress response genes and transition into increased malignant states
● Comparison of alteration patterns among tumor types
Lopez-Bigas et al., Genome Biology 2008
57. Conclusions
● IntOGen is a novel framework for oncogenomics data integration
● IntOGen.org is a discovery tool for cancer researchers
● Gitools main features are:
● Interactive heatmap
● Import from BioMart
● Import from IntOGen
● Command line option
58. Future work
● BioMart-compatible interface for IntOGen
● Implement more analyses:
● GSEA
● Clustering
● Module-hierarchy-aware enrichment, like GOstats
● Connection with R
● Implement more editors:
● Table and modules editor
59. Acknowledgements
Nuria López-Bigas
Gunes Gundem
Jordi Deu-Pons
Khademul Islam
Michael Schroeder
Alba Jené-Sanz
Xavier Rafael
Remember to visit
www.intogen.org
www.gitools.org