
The server of the Spanish Population Variability

  1. CIBERER Exome Server (CES): The server of the Spanish Population Variability. Joaquín Dopazo, PhD, Department of Computational Genomics, CIPF, Valencia. Hospital Universitario La Paz, Madrid, 28 April 2014
  2. Why is it interesting to have a Spanish exome variant repository? • Rationale: local variability is more important than previously thought. The existence of numerous local rare variants, many of them (apparently) deleterious, hampers the prioritization of disease variants. • Data recycling: CIBERER has accumulated a large number of samples that can be used as (pseudo)controls for the normal population.
  3. Pipeline of data analysis • Primary processing and primary analysis: FASTQ file → initial QC → mapping → BAM file → variant calling → VCF file. • Secondary analysis (successive filtering): variant annotation; filtering by effect; filtering by MAF (1000 Genomes, EVS, local variants); filtering by family segregation. • Gene prioritization: knowledge-based prioritization (proximity to other known disease genes: functional proximity, network proximity); burden tests; other prioritization methods.
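The successive-filtering stage can be illustrated with a minimal Python sketch. It assumes variants have already been annotated with a consequence label, per-population MAFs and per-member genotypes. The effect vocabulary, the 1% MAF cutoff, the dominant segregation rule and all gene and sample names are illustrative assumptions, not the BiER pipeline itself.

```python
from dataclasses import dataclass, field

# Hypothetical consequence labels treated as potentially damaging.
DAMAGING_EFFECTS = {"missense", "stop_gained", "frameshift", "splice_site"}


@dataclass
class Variant:
    gene: str
    effect: str
    # MAF per reference population, e.g. {"1000g": 0.002, "EVS": 0.0, "local": 0.03}
    maf: dict = field(default_factory=dict)
    # Alternate-allele count (0, 1 or 2) per family member; an absent member is treated as 0
    genotypes: dict = field(default_factory=dict)


def filter_by_effect(variants):
    return [v for v in variants if v.effect in DAMAGING_EFFECTS]


def filter_by_maf(variants, max_maf=0.01):
    # Keep a variant only if it is rare in every population where it was seen;
    # a population in which the variant is absent imposes no constraint.
    return [v for v in variants if max(v.maf.values(), default=0.0) <= max_maf]


def filter_by_dominant_segregation(variants, affected, unaffected):
    # Dominant model: all affected members carry the variant, no unaffected one does.
    return [
        v for v in variants
        if all(v.genotypes.get(s, 0) >= 1 for s in affected)
        and all(v.genotypes.get(s, 0) == 0 for s in unaffected)
    ]


def candidate_genes(variants, affected, unaffected, max_maf=0.01):
    kept = filter_by_effect(variants)
    kept = filter_by_maf(kept, max_maf)
    kept = filter_by_dominant_segregation(kept, affected, unaffected)
    return sorted({v.gene for v in kept})


# Toy family: two affected members (p1, p2) and one unaffected member (u1).
family = {"p1": 1, "p2": 1, "u1": 0}
variants = [
    Variant("GENE_A", "missense", {"1000g": 0.0, "local": 0.0}, family),
    Variant("GENE_B", "missense", {"1000g": 0.0, "local": 0.05}, family),  # local polymorphism
    Variant("GENE_C", "synonymous", {}, family),
]
print(candidate_genes(variants, ["p1", "p2"], ["u1"]))  # ['GENE_A']
```

In this toy case, removing the `"local"` entry from GENE_B's MAFs would let it pass the MAF filter. That illustrates, on made-up data, the extra filtering step that local controls add.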
  4. Use known variants and their population frequencies to filter out candidates. • Typically, dbSNP, 1000 Genomes and the 6,515 exomes from the ESP are used as sources of population frequencies. • We selected 75 local controls to add an extra filtering step to the analysis pipeline. [Figure: comparison of Spanish controls to 1000 Genomes; Novembre et al., 2008. Genes mirror geography within Europe. Nature.] How important do you think local information is for detecting disease genes?
  5. Filtering with or without local variants. [Figure: number of genes as a function of the number of individuals in the study of a dominant disease, autosomal dominant retinitis pigmentosa.] The use of local variants makes an enormous difference.
  6. What do we know about the variability of the Spanish population?
  7. Using CIBERER families to create a first version of the database of local variability of the Spanish population • In each family we select two unrelated members (preferably the parents). • If there are no parents, then one of the children (unaffected, if possible) is selected. • A total of 75 out of the 136 samples available among the families analyzed in the BiER were initially selected. • Variant files (VCF) were obtained with the same pipeline (with missing values included) and merged. • Genotype proportions and MAFs were obtained for all the variable positions. ONLY this information is used in the web server.
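The trend on this slide can be reproduced in outline. Under a dominant model, a candidate gene must carry a qualifying variant in every affected individual. The candidate list is therefore the intersection of per-individual gene sets, and it shrinks as individuals are added. The sketch below assumes per-individual lists of (gene, variant id) pairs that have already passed the other filters, plus a hypothetical set of variant ids found to be frequent in local controls. The data are toy values, not the retinitis pigmentosa study.

```python
def genes_per_individual(calls, local_common=frozenset()):
    """calls: one list of (gene, variant_id) pairs per affected individual.
    Variants listed in local_common are discarded before collecting genes."""
    return [{gene for gene, vid in person if vid not in local_common} for person in calls]


def candidate_curve(gene_sets):
    """Number of genes shared by the first 1, 2, ..., n individuals."""
    counts, shared = [], None
    for genes in gene_sets:
        shared = set(genes) if shared is None else shared & genes
        counts.append(len(shared))
    return counts


calls = [
    [("G1", "v1"), ("G2", "v2"), ("G3", "v3")],
    [("G1", "v1b"), ("G2", "v2"), ("G3", "v3")],
    [("G1", "v1c"), ("G2", "v2"), ("G4", "v4")],
]
local_common = {"v2"}  # hypothetical variant that is frequent among local controls

print(candidate_curve(genes_per_individual(calls)))                # [3, 3, 2]
print(candidate_curve(genes_per_individual(calls, local_common)))  # [2, 2, 1]
```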
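The merge-and-aggregate step can be sketched as follows. The sketch makes three assumptions: the merged VCF has already been parsed into one GT string per sample for each biallelic record (multiallelic sites split beforehand), missing calls are kept as a separate count, and only positions where at least one individual carries the alternate allele are retained. The function names, input layout and coordinates are hypothetical.

```python
from collections import Counter

GENOTYPES = ("0/0", "0/1", "1/1")


def classify_gt(gt: str) -> str:
    """Map a diploid, biallelic VCF GT string (phased or unphased) to a genotype class."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or any(a not in ("0", "1") for a in alleles):
        return "missing"  # './.', haploid and multiallelic calls are not counted as genotypes
    return GENOTYPES[alleles.count("1")]


def aggregate(merged):
    """merged: {(chrom, pos, ref, alt): [GT string per sample]}.
    Returns genotype counts per variable position; no sample identities are kept."""
    table = {}
    for site, gts in merged.items():
        counts = Counter(classify_gt(gt) for gt in gts)
        row = {g: counts[g] for g in GENOTYPES + ("missing",)}
        if row["0/1"] + row["1/1"] > 0:
            table[site] = row
    return table


merged = {
    ("1", 1000, "A", "G"): ["0/0", "0/0", "0/1", "1|1", "./."],
    ("1", 2000, "C", "T"): ["0/0", "0/0", "0/0", "0/0", "0/0"],
}
print(aggregate(merged))
# {('1', 1000, 'A', 'G'): {'0/0': 2, '0/1': 1, '1/1': 1, 'missing': 1}}
```

Only the count rows leave this function, which matches the slide's point that genotype proportions and MAFs, not individual genotypes, feed the web server.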
  8. Samples used

     Unit    n      %
     U723   12   16.0
     U737   11   14.7
     U759    2    2.7
     U705   10   13.3
     U720   12   16.0
     U732    1    1.3
     U755    3    4.0
     U746    9   12.0
     U728    2    2.7
     U729    3    4.0
     U703    7    9.3
     U718    1    1.3
     U730    2    2.7
     Total  75  100.0

     Disease                                                n      %
     3-Methylglutaconic aciduria                           11   14.7
     Atypical fracture                                      4    5.3
     Autosomal DOMINANT non-syndromic hearing loss          1    1.3
     Autosomal RECESSIVE non-syndromic hearing loss         1    1.3
     BCKDK-deficiency disease                               2    2.7
     CMT                                                    1    1.3
     Congenital disorder of glycosylation types I and II    8   10.7
     CoQ disease                                            3    4.0
     CoQ10 deficiency and DNA depletion                     3    4.0
     CoQ10 deficiency                                       2    2.7
     Inherited Metabolic Disease                            2    2.7
     MMD (Multiple deletion of mitochondrial DNA)           4    5.3
     MSUD (Maple Syrup Urine Disease)                       1    1.3
     Opitz                                                  8   10.7
     Pelizaeus-like                                         2    2.7
     RCD (Respiratory complexes deficiency)                 8   10.7
     Retinitis pigmentosa                                  11   14.7
     Usher                                                  3    4.0
     Total                                                 75  100.0

     [Charts: samples by gender (man/woman) and by phenotype (affected/healthy).]
  9. Variability spectrum of the Spanish population. A total of 131,897 variant positions unique to the Spanish population were detected across the 75 samples, approximately 90,000 of them singletons. Of these, 51,295 are non-synonymous changes and 18,450 are synonymous changes (a singleton-driven pattern, the opposite of what is seen for variants shared with 1000g and EVS, which come from polymorphic positions).
  10. The CIBERER Exome Server (CES): the first repository of variability of the Spanish population (http://ciberer.es/bier/exome-server/). Only one other similar initiative exists: the GoNL (http://www.nlgenome.nl/).
  11. Information provided: genotypes in the different reference populations; genomic coordinates, variation and gene; SNP id, if any.
  12. Information provided: PolyPhen and SIFT pathogenicity indexes; phenotype, if available.
  13. Variants can also be seen in their genomic context: the GenomeMaps viewer (Medina et al., 2013, NAR) is embedded in the application. GenomeMaps is the official genome viewer of the ICGC (http://dcc.icgc.org/).
  14. Occurrence of pathological variants in the “normal” population: the reference genome is mutated; there are nine carriers in 1000 Genomes; and there are one affected individual and 73 carriers in EVS.
  15. Current usage options: query, configuration of the display, and genomic context.
  16. Spanish variability database. FAQ What is stored in the database? ONLY the frequencies of the genotypes observed at positions where a variant has been found in at least one individual. This information is obtained from unrelated Spanish individuals. What information does the database provide? Aggregated information on the genotype frequencies of the variable positions in the requested gene(s). Is it possible to know whether a particular individual is stored in the database? No, unless you sequence the individual and check whether their genotypes are compatible with the database, which would be pointless because you would already have the information you were after. Let's imagine that I did manage to find out that an individual is in the database: can I retrieve her/his genome? No, that is impossible from the aggregated information.
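The counts on this slide (novel positions and singletons) come from simple tallies over the aggregated table. The following is a minimal sketch. It assumes genotype-count rows keyed by (chrom, pos, ref, alt) and a set of keys already seen in 1000 Genomes or EVS. Here a singleton is taken to mean an alternate allele observed exactly once across all samples, which is one heterozygous carrier. The rows and the known-site set are made-up toy values.

```python
def alt_allele_count(row):
    """Alternate alleles observed: one per heterozygote, two per alternate homozygote."""
    return row["0/1"] + 2 * row["1/1"]


def spectrum(table, known_sites):
    """Return (number of novel sites, number of novel singletons)."""
    novel = {site: row for site, row in table.items() if site not in known_sites}
    singletons = [site for site, row in novel.items() if alt_allele_count(row) == 1]
    return len(novel), len(singletons)


table = {
    ("1", 1000, "A", "G"): {"0/0": 74, "0/1": 1, "1/1": 0},
    ("1", 2000, "C", "T"): {"0/0": 70, "0/1": 4, "1/1": 1},
    ("2", 3000, "G", "A"): {"0/0": 74, "0/1": 1, "1/1": 0},
}
known_sites = {("2", 3000, "G", "A")}  # hypothetical overlap with 1000g/EVS
print(spectrum(table, known_sites))  # (2, 1)
```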
  17. Spanish variability database. FAQ Who can contribute? Anyone (especially if you are sequencing with public resources). What do you need to submit? Anonymized variant files (VCF: Variant Call Format). Why VCFs? Because we need to check that your contribution contains no relatives of the individuals already in the database.
  18. What’s next? • Strategic steps: – Populating the database with contributions from CIBERER and external groups (future project: SPANEx). – Opening the database. • Technical steps: – Automatic access to the local variability data via web services. – Use in gene discovery pipelines. – Use for the interpretation of incidental findings in diagnostic panels.
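One possible way to run the relatedness check on submitted VCFs is a pairwise kinship estimate. The sketch below is modeled loosely on the KING kinship estimator. It compares sites where both samples are heterozygous against sites where they are opposite homozygotes, normalized by heterozygosity. It assumes genotypes coded as alternate-allele counts (0, 1, 2, or None when missing) at the same sites for both samples, and access to the genotypes of the stored samples. The degree cut-offs are the commonly used powers-of-two thresholds. The function names, the exact formula and the "unrelated only" admission rule are illustrative assumptions, not the CES procedure. Real use needs thousands of sites, not the handful in the toy vectors.

```python
def kinship(g1, g2):
    """Kinship-style estimate from alt-allele counts (0, 1, 2; None = missing)."""
    het1 = het2 = het_het = opposite_hom = 0
    for a, b in zip(g1, g2):
        if a is None or b is None:
            continue  # only sites genotyped in both samples are used
        het1 += int(a == 1)
        het2 += int(b == 1)
        het_het += int(a == 1 and b == 1)
        opposite_hom += int({a, b} == {0, 2})  # no allele shared by state
    if het1 + het2 == 0:
        raise ValueError("no heterozygous sites to compare")
    return (het_het - 2 * opposite_hom) / (het1 + het2)


def relationship(phi):
    # Approximate cut-offs halfway (on a log2 scale) between expected kinship values.
    if phi > 0.354:
        return "duplicate"
    if phi > 0.177:
        return "first-degree"
    if phi > 0.0884:
        return "second-degree"
    if phi > 0.0442:
        return "third-degree"
    return "unrelated"


def admissible(new_sample, stored_samples):
    """Hypothetical rule: accept a submission only if unrelated to every stored sample."""
    return all(relationship(kinship(new_sample, g)) == "unrelated" for g in stored_samples)


parent = [1, 1, 1, 1, 2, 0]
child = [1, 0, 2, 1, 1, 1]
stranger = [0, 2, 0, 0, 0, 2]
print(kinship(parent, child), relationship(kinship(parent, child)))  # 0.25 first-degree
print(admissible(stranger, [parent, child]))  # True
```

A duplicate sample scores 0.5 and a parent-offspring pair about 0.25, while unrelated pairs fall near or below 0. Detecting distant relatives reliably requires dedicated kinship software and many more markers.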
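  19. Table of Spanish Frequencies (TSF) and DB of Spanish variants (DBSV). The 0/0, 0/1 and 1/1 columns give the number of individuals with each genotype (homozygous reference, heterozygous, homozygous alternate).

     Chr  Position  Ref  Alt  0/0  0/1  1/1
     1    1365313   A    T     75    0    0
     1    1484884   G    A     70    4    1
     2    326252    T    C     25   35   15

     [Diagram: CES input flow. External VCFs are checked with the questions “Unrelated?” (DBSV) and “Spanish?” (TSF), each with YES/NO branches; other labels: Counts, Internal, Regional, CES use, Other countries.]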
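Genotype proportions and the MAF follow directly from the three count columns. The following is a minimal sketch using the three example rows of the table. It assumes diploid, biallelic sites and that the counts exclude missing calls, so each row sums to the number of genotyped individuals.

```python
def frequencies(hom_ref, het, hom_alt):
    """Genotype proportions, alternate-allele frequency and MAF from genotype counts."""
    n = hom_ref + het + hom_alt
    if n == 0:
        raise ValueError("no genotyped individuals")
    alt_af = (het + 2 * hom_alt) / (2 * n)  # two alleles per individual
    return {
        "0/0": hom_ref / n,
        "0/1": het / n,
        "1/1": hom_alt / n,
        "alt_af": alt_af,
        "maf": min(alt_af, 1 - alt_af),
    }


rows = [
    ("1", 1365313, "A", "T", 75, 0, 0),
    ("1", 1484884, "G", "A", 70, 4, 1),
    ("2", 326252, "T", "C", 25, 35, 15),
]
for chrom, pos, ref, alt, *counts in rows:
    f = frequencies(*counts)
    print(f"{chrom}:{pos} {ref}>{alt} MAF={f['maf']:.4f}")
# 1:1365313 A>T MAF=0.0000
# 1:1484884 G>A MAF=0.0400
# 2:326252 T>C MAF=0.4333
```

For the second row, 70 + 4 + 1 = 75 individuals carry 150 alleles, of which 4 + 2 × 1 = 6 are A, giving 6/150 = 0.04.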
  20. Future of the database of variation in the Spanish population: CIBERER contributions and SPANEx contributions.
  21. CIBERER exome server roadmap (timeline marks: 2014, June 2014, 2015) • Phase I: CES, CIBERER, 76 samples (unaffected). • Phase II: CES II, 76 + 269 + X samples (mixed), adding 269 healthy controls from the MGP and more CIBERER samples. • Phase III: 1000 + 76 + 269 + X samples (mixed), adding SPANEx: 1000 exomes.
  22. Future utilization: access via web services. Only aggregated data on variation and genotype frequencies are accessed; therefore, there are no associated confidentiality or privacy issues. Components: the Spanish variation database and CellBase (Bleda et al., 2012, NAR), our data server system, now at the EBI.
  23. BiERapp: the interactive filtering tool for easy candidate prioritization (http://bierapp.babelomics.org). [Screenshot showing samples NA19660, NA19661, NA19600 and NA19685.]
  24. Panel (real or virtual) manager: a tool for defining panels, with a new filter based on local population variant frequencies. Output: diagnostic mutations; if no diagnostic variants appear, then secondary findings can be studied. http://team.babelomics.org
  25. Take-home message • Local variability is critical for distinguishing truly pathogenic variants from local polymorphisms. • CES will be populated through the SPANEx project (see M.A. Moreno's talk). • CES is the starting point of a more ambitious crowdsourcing project that aims to construct a high-resolution map of Spanish population variation. • Contributions to CES comply with confidentiality requirements: no patient information is shared, only statistical information.
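The panel workflow on this slide can be sketched as a two-step triage. Rare variants are first restricted to the panel genes, and only if nothing remains are the other rare variants examined as secondary findings. The variant representation, the 1% local-MAF cutoff and the function name are hypothetical. This illustrates the idea rather than the implementation of the tool at http://team.babelomics.org.

```python
def panel_triage(variants, panel_genes, local_maf, max_local_maf=0.01):
    """variants: list of (gene, variant_id); local_maf: {variant_id: MAF in the local population}.
    Returns ("diagnostic", hits) or ("secondary", hits)."""
    # Variants absent from the local database are treated as MAF 0 and kept.
    rare = [(g, v) for g, v in variants if local_maf.get(v, 0.0) <= max_local_maf]
    in_panel = [(g, v) for g, v in rare if g in panel_genes]
    if in_panel:
        return "diagnostic", in_panel
    return "secondary", [(g, v) for g, v in rare if g not in panel_genes]


variants = [("GENE_A", "v1"), ("GENE_B", "v2"), ("GENE_C", "v3")]
print(panel_triage(variants, {"GENE_A"}, {"v1": 0.2}))
# ('secondary', [('GENE_B', 'v2'), ('GENE_C', 'v3')])
print(panel_triage(variants, {"GENE_A"}, {}))
# ('diagnostic', [('GENE_A', 'v1')])
```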
  26. The Computational Genomics Department at the Centro de Investigación Príncipe Felipe (CIPF), Valencia, Spain, and… ...the INB, National Institute of Bioinformatics (Functional Genomics Node) and the CIBERER Network of Centers for Rare Diseases, and… ...the Medical Genome Project (Sevilla) @xdopazo @bioinfocipf