Anzeige
Anzeige

Más contenido relacionado

Presentaciones para ti(20)

Similar a Bioinformatics in dermato-oncology(20)

Anzeige
Anzeige

Bioinformatics in dermato-oncology

  1. Joaquín Dopazo Computational Genomics Department, Centro de Investigación Príncipe Felipe (CIPF), Functional Genomics Node, (INB), Bioinformatics Group (CIBERER) and Medical Genome Project, Spain. http://bioinfo.cipf.es http://www.medicalgenomeproject.com http://www.babelomics.org http://www.hpc4g.org @xdopazo VII International Symposium Advances in Dermato-oncology 24 October 2014 RESEARCH METHODS IN ONCOLOGIC DERMATOLOGY Bioinformatics
  2. Precision medicine relies on a better understanding of the relationships between genotype and phenotype •Precision medicine requires of better ways of defining diseases by introducing genomic technologies into the diagnostic procedures. •A more precise diagnostic of diseases, based on the description of their molecular mechanisms, is critical for creating innovative diagnostic, prognostic, and therapeutic strategies properly tailored to each patient’s necessities The transition to personalized genomic medicine (or precision medicine)
  3. While the cost fall down, the amount of data to manage and its complexity raise exponentially. Costs are already almost competitive enough to be used in clinic The problem is… are we ready to deal with this data? Exome sequencing successfully used. NGS is matching the cost of other clinical tests http://www.genome.gov/sequencingcosts/
  4. http://www.nih.gov/news/health/jun2014/nhgri-18.htm http://www.nejm.org/doi/full/10.1056/NEJMra1312543 More than 10,000 exomes will be ordered for diagnostic purposes Clinical application of exomes
  5. But.. what are big data? Wikipedia: collection of data sets so large and complex that it becomes difficult to process using traditional data processing applications Google: extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behaviour and interactions Genomic data are big data: •Are large and complex •Individual genome data harbour much more information than the one used in the experiment that generated them •The availability of thousands of genomes enables finding new associations and interactions Medicine has become a data-driven discipline. The use of genome sequencing offer the possibility of studying biological systems with an unprecedented level of detail. Because of the pace of data production, genomic data are big data
  6. From empirical to genomic medicine Test Therapy 1 Empirical medicine Test Therapy 1 Therapy 2 Therapy 3 ? Genomic medicine + Genomic analysis allows associating patients to therapies from the very beginning, saving time and costs and increasing the success of treatments. feedback Therapy 2    Therapy 3
  7. Personalized Genomic Medicine. Phase I: generating the knowledge database ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- ----- sequencing Patient List of variants Database. Query: variant/pathway Therapy Outcome System feedback Genetic variants are linked to therapies through the knowledge of their functional effects (systems biology) Initially the system will need much feedback: Knowledge generation phase. Growing knowledge database Genomic medicine Knowledge database
  8. Personalized Genomic Medicine. Phase II: applying the knowledge database Patient 1)Genomic sequencing 2)Database of markers 3)Therapy prediction Genomic core facility phase II Clinician receives hints on possible prescriptions and therapeutic interventions + Other factors (risk, cost, etc.) Prescription Pre-symptomatic: • Genetic predisposition of acquired diseases (>6000. some treatable) • Early diagnosis of genetic diseases Symptomatic analysis • Diagnostic of acquired diseases • Early cancer detection • Cancer treatment recommendation
  9. Phase I: Finding new biomarkers Feedback: treatment failures are reanalyzed to search for: 1)Biomarkers (of failure) 2)Subgroups (to search for new personalized and rational therapeutic interventions Treatables Failure treatment biomarkers Group A biomarkers Group A biomarkers Irrelevant Non treatables Signaling Protein interaction Regulation Variants are used as biomarkers to distinguish between responders and non-responders and to sub-classify non-responders Rationale design of therapies rely on Systems Biology concepts. Pathways are complex and must be understood with the proper bioinformatic tools Test Therapy 1 Therapy 2 Therapy 3 ? feedback
  10. 3-Methylglutaconic aciduria (3- MGA-uria) is a heterogeneous group of syndromes characterized by an increased excretion of 3-methylglutaconic and 3-methylglutaric acids. WES with a consecutive filter approach is enough to detect the new mutation in this case. How to deal with genomic data? Successive Filtering approach An example with 3-Methylglutaconic aciduria syndrome
  11. New variants and disease genes found with WES and successive filtering WES IRDs arRP (EYS) BBS arRP arRP (USH2) 3-MGA- uria (SERAC1) NBD (BCKDK )
  12. BiERapp: interactive web-based tool for easy candidate prioritization by successive filtering SEQUENCING CENTER Data preprocessing VCF FASTQ Genome Maps BAM BiERapp filters No-SQL (Mongo) VCF indexing Population frequencies Consequence types Experimental design BAM viewer and Genomic context ? Easy scale up
  13. NA19660 NA19661 NA19600 NA19685 BiERapp: the interactive filtering tool for easy candidate prioritization http://bierapp.babelomics.org Aleman et al., 2014 NAR
  14. Knowledge DB Freq. popul. MySeq IonTorrent IonProton Illumina NO Diagnostic Therapeutic decision New variants Disease All Candidate Prioritization Data preprocessing Sequence DB Sequences Freqs. Future technologies New knowledge for future diagnostic The final schema: diagnostic and discovery
  15. Diagnostic by targeted sequencing (panels of genes) Tool for defining panels New filter based on local population variant frequencies If no diagnostic variants appear, then secondary findings are studied Diagnostic mutations http://team.babelomics.org
  16. Is the single-gene approach realistic? Can we easily detect all types of disease-related variants? There are several problems: a)Interrogating 60Mb sites (3000 Mb in genomes) produces too many variants. A large number of these segregating with our experimental design b)There is a non-negligible amount of apparently deleterious variants that (apparently) has no pathologic effect c)In many cases we are not targeting rare but common variants (which occur in normal population) d)In many cases only one variant does not explain the disease but rather a combination of them (epistasis) e)Consequently, the few individual variants found associated to the disease usually account for a small portion of the trait heritability
  17. From gene-based to mechanism-based perspective Transforming gene expression values into another value that accounts for a function. Easiest example of modeling function: signaling pathways. Function: transmission of a signal from a receptor to an effector Activations and repressions occur Receptor Effector
  18. A B A B B D C E F A G P(A→G activated) = P(A)P(B)P(D)P(F)P(G) + P(A)P(C)P(E)P(F)P(G) - P(A)P(F)P(G)P(B)P(C)P(D)P(E) Prob. = [1-P(A activated)]P(B activated) Prob. = P(A activated)P(B activated) Activation Inhibition Sub-pathway Modeling pathways
  19. Obtaining probability distributions for ALL the probes in the microarray … Probe 1 Probe 2 Probe 3 : : : Probe 50,000 Using genomic big data A large dataset of Affymetrix microarrays (10,000) is used to adjust a mixture of distributions of gene activity for all the probes. …
  20. Pprobe 0,001 0.89 0.5 …… 0.4 … ON OFF Then, the activation state of any probe from a new microarray can be calculated as a probability: Finally, gene activation probabilities are summarized from their corresponding probes as the 90% percentil value (to avoid outliers) Using probability distributions to estimate gene activation probabilities
  21. Gene activation probabilities are transformed into signal transduction probabilities And the probability of being active for each circuit of each pathway can be calculated as well: We have transformed a physical genomic measure (gene expression) into a value that accounts for cell functionality
  22. What would you predict about the consequences of gene activity changes in the apoptosis pathway in a case control experiment of colorectal cancer? The figure shows the gene up-regulations (red) and down- regulations (blue) The effects of changes in gene activity are not obvious
  23. Apoptosis inhibition is not obvious from gene expression Two of the three possible sub- pathways leading to apoptosis are inhibited in colorectal cancer. Upper panel shows the inhibited sub-pathways in blue. Lower panel shows the actual gene up-regulations (red) and down-regulations (blue) that justify this change in the activity of the sub-pathways
  24. Different pathways cross-talk to deregulate programmed death in Fanconi anemia FA is a rare chromosome instability syndrome characterized by aplastic anemia and cancer and leukemia susceptibility. It has been proposed that disruption of the apoptotic control, a hallmark of FA, accounts for part of the phenotype of the disease. No proliferation No degradation Survival No degradation No apoptosis Activation apoptosis pathway
  25. In silico prediction of actionable genes Models enable the estimation of the effect of gene expression on signal transduction, therefore, KOs (or over-expressions) can easily be simulated Colorectal cancer activates a signaling circuit of VEGF pathway that produces PGI2. Virtual KO of COX2 interrupts the circuit (known therapeutic inhibitor in CRC) COX2 gene KO
  26. Patient’s omic data Biological knowledge Systems biology computational models Epigenomics Regulation Interaction Function Proteomics Genomics and transcriptomics Patient Metabolomics Diagnostic biomarkers Personalized medicine Therapeutic targets Cell culture Best combination Xenograft model Drug treatment Network drugs Personalized therapy Are individualized treatments a realistic option? Dopazo, 2003, Drug Discovery Today
  27. Future prospects We need to efficiently query all the information contained in the genome, including all the epigenomic signatures as well as the structural variation. This involves data integration and “epistatic” queries. We need to prepare our health systems to deal with all the genomic data flood Information about variations Processed Raw Genome variant information (VCF) 150 MB 250 GB Epigenome 150 MB 250 GB Each transcriptome 20 MB 80 GB Individual complete variability 400 MB 525 GB Hospital (100.000 patients) 40 TB 50 PB We are only starting to realize the dimension of the daunting challenges posed by genomic big data There are technical (data size) and conceptual problems (data analysis) in the way genomic information is managed that must be addressed.
  28. Software development See interactive map of for the last 24h use http://bioinfo.cipf.es/toolsusage Babelomics is the third most cited tool for functional analysis. Includes more than 30 tools for advanced, systems-biology based data analysis More than 150.000 experiments were analyzed in our tools during the last year HPC on CPU, SSE4, GPUs on NGS data processing Speedups up to 40X Genome maps is now part of the ICGC data portal Ultrafast genome viewer with google technology Mapping Visualization Functional analysis Variant annotation CellBase Knowledge database Variant prioritization NGS panels Signaling network Regulatory network Interaction network Diagnostic
  29. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the BiER (CIBERER Network of Centers for Rare Diseases) @xdopazo @bioinfocipf
Anzeige