RefSeq curation and annotation of the reference human genome GRCh38
Kim D. Pruitt
National Center for Biotechnology Information
National Library of Medicine
National Institutes of Health
www.ncbi.nlm.nih.gov/refseq/
RefSeq Background
• RefSeq provides:
• Human genome annotation
• Known transcripts & proteins (manually curated)
• Model transcripts & proteins (annotation pipeline)
• Collaborations:
• Genome Reference Consortium (GRC)
• HUGO Gene Nomenclature Committee (HGNC)
• Consensus CDS (CCDS) Collaboration (HAVANA curators)
• RefSeqGene/Locus Reference Genomic (LRG)/LSDB
RefSeq: www.ncbi.nlm.nih.gov/refseq/ Gene: www.ncbi.nlm.nih.gov/gene/
An NCBI project to provide reference sequence
standards that incorporate current knowledge.
Archaea, Bacteria, Eukaryotes, Viruses
Curation support of genic regions of
the reference human assembly
• RefSeqGene and LRG collaboration
• Genomic and cDNA standards for clinical reporting
• Report potential issues to the GRC
• Consensus CDS collaboration
• Stabilized human CDS annotation
• Report potential issues to the GRC
• RefSeq
• Curation of genes, transcript & protein records
• Report potential issues to the GRC
• Review GRC patch updates for gene annotation impact
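In its simplest form, reviewing a GRC patch for annotation impact means finding the genes whose annotated span overlaps the patched region. The sketch below assumes that gene and patch coordinates are already available as 1-based inclusive intervals on the same chromosome naming scheme. The `Interval` type, the function names and the coordinates are hypothetical placeholders, not RefSeq data or tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A 1-based, inclusive interval on a named chromosome (hypothetical record type)."""
    name: str
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two inclusive intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def genes_affected_by_patches(genes: list[Interval], patches: list[Interval]) -> dict[str, list[str]]:
    """Map each patch name to the sorted names of the genes whose span it overlaps."""
    return {
        patch.name: sorted(g.name for g in genes if overlaps(g, patch))
        for patch in patches
    }


# Toy example: made-up genes and one made-up patch on chr1
genes = [
    Interval("GENE_A", "chr1", 100, 500),
    Interval("GENE_B", "chr1", 900, 1200),
    Interval("GENE_C", "chr2", 100, 300),
]
patches = [Interval("PATCH_1", "chr1", 450, 950)]
print(genes_affected_by_patches(genes, patches))  # {'PATCH_1': ['GENE_A', 'GENE_B']}
```

This pairwise scan is O(genes × patches). At genome scale, an interval tree or a sorted sweep would be the usual replacement.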
Genome annotation leverages curation + computation
Manual Curation (sequence, literature)
Genes:
• Type, location, length
Sequence:
• Accuracy, length
• Alternate splice products
• Functional annotation
Evidence-based genome annotation pipeline
• Align curated RefSeqs
• Align transcripts, proteins
• Align RNA-Seq
• Filter best alignments
• Build model RefSeqs
• Assign accessions, GeneID
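The slide does not specify how the pipeline's "Filter best alignments" step chooses among alignments. As an illustration only, one simple policy keeps one alignment per query: among the alignments that pass a coverage cutoff, it takes the one with the highest identity. The `Alignment` fields, the 0.95 coverage cutoff and the tie-breaking rule are all assumptions, not the NCBI pipeline's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    query: str       # transcript or protein accession (hypothetical example data)
    target: str      # genomic sequence the query was aligned to
    coverage: float  # fraction of the query covered by the alignment, 0..1
    identity: float  # fraction of aligned positions that match, 0..1


def best_alignments(alignments: list[Alignment], min_coverage: float = 0.95) -> dict[str, Alignment]:
    """Keep one alignment per query: highest identity, then highest coverage, among those passing min_coverage."""
    best: dict[str, Alignment] = {}
    for aln in alignments:
        if aln.coverage < min_coverage:
            continue  # too little of the query aligned; ignore this hit
        current = best.get(aln.query)
        if current is None or (aln.identity, aln.coverage) > (current.identity, current.coverage):
            best[aln.query] = aln
    return best
```

For example, given two hits for one query (identity 0.995 versus 0.97, both at 0.99 coverage), the function keeps only the 0.995 hit. A query whose only hit covers 50% of its length is dropped entirely.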
                   Transcripts   Proteins
Known RefSeqs      50,540        39,363
Model RefSeqs      112,735       60,599

Annotated genes    Count
Protein-coding     20,576
Non-coding         18,037
Pseudogene         12,474
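Gene counts like those above can be tallied from an annotation release's GFF3 file. The sketch below assumes NCBI-style GFF3, in which `gene` and `pseudogene` feature lines carry a `gene_biotype=` attribute in column 9. The sample lines are invented for illustration. The exact biotype values in real files (for example `protein_coding`, `lncRNA`, `pseudogene`) should be checked against the release in use, since they are finer-grained than the three categories on the slide.

```python
from collections import Counter
from urllib.parse import unquote


def parse_attributes(column9: str) -> dict[str, str]:
    """Split a GFF3 attribute column ('key=value;key=value') into a dict, decoding %-escapes."""
    attrs: dict[str, str] = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs[key] = unquote(value)
    return attrs


def count_gene_biotypes(lines) -> Counter:
    """Count gene_biotype values on gene/pseudogene features, skipping comments and malformed lines."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("gene", "pseudogene"):
            continue  # transcripts, exons, CDS etc. are not counted
        counts[parse_attributes(cols[8]).get("gene_biotype", "unknown")] += 1
    return counts


# Invented sample lines; a real file would be read with gzip.open(path, "rt")
sample = [
    "##gff-version 3",
    "chr22\tRefSeq\tgene\t100\t900\t.\t+\t.\tID=gene-A;gene_biotype=protein_coding",
    "chr22\tRefSeq\tmRNA\t100\t900\t.\t+\t.\tID=rna-A;Parent=gene-A",
    "chr22\tRefSeq\tpseudogene\t1000\t1500\t.\t-\t.\tID=gene-B;gene_biotype=pseudogene",
    "chr22\tRefSeq\tgene\t2000\t2600\t.\t+\t.\tID=gene-C;gene_biotype=lncRNA",
]
print(count_gene_biotypes(sample))
```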
Transition from GRCh37 to GRCh38
• Identify gene/sequence differences vs. GRCh38
• Automatic updates where mismatches are synonymous
• Curation review of remainder
• >5,100 Known RefSeq transcripts updated since October 2013
• 47,031 Known RefSeqs identical to genome
• 2,916 intentionally retain a mismatch or indel
• ~600 pending
• ~132 genes merged
[Figure: number of Known RefSeq updates per quarter, 2013 Q1 to 2015 Q3 (y-axis 0 to 1,200); * marks the GRCh38 release, 12/24/2013]
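The triage described above (synonymous mismatches updated automatically, everything else reviewed by curators) can be sketched as follows. This is a simplified illustration, not NCBI's implementation. It assumes that the RefSeq CDS and the corresponding GRCh38 CDS have already been extracted as uppercase A/C/G/T strings in the same frame, and it uses the standard genetic code. Any length difference (an indel) is sent to review. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG x TCAG x TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate the complete codons of an uppercase DNA CDS; '*' marks stop codons."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def triage_cds_difference(refseq_cds: str, genome_cds: str) -> str:
    """Hypothetical triage rule: synonymous-only differences are auto-updated, the rest go to curators."""
    if refseq_cds == genome_cds:
        return "identical"
    if len(refseq_cds) != len(genome_cds):
        return "curation review"   # indel: frame and length effects need a curator
    if translate(refseq_cds) == translate(genome_cds):
        return "automatic update"  # every mismatch is synonymous
    return "curation review"       # at least one mismatch changes the protein


# Toy CDS: GCC -> GCG keeps alanine (synonymous); GCC -> GAC gives aspartate (non-synonymous)
print(triage_cds_difference("ATGGCCAAATAA", "ATGGCGAAATAA"))  # automatic update
print(triage_cds_difference("ATGGCCAAATAA", "ATGGACAAATAA"))  # curation review
```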
Updating RefSeq to match GRCh38
• Post-GRCh38 review:
• NM_173477 updated to match genome (NM_173477.4)
• Model RefSeq XM_005257026.1 promoted to Known RefSeq
[Figure: RefSeq alignments to GRCh38 and GRCh37]
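RefSeq accession prefixes encode the record type. N-prefixed transcript and protein accessions (NM_, NR_, NP_) are Known records; X-prefixed ones (XM_, XR_, XP_) are Model records. The number after the period is the sequence version, which increments when the sequence changes, as in NM_173477.4 above. The minimal parser below handles only these six prefixes. Other RefSeq prefixes exist and are deliberately rejected rather than guessed, and the function name is hypothetical.

```python
import re

# Subset of RefSeq prefixes: N* = Known (curated), X* = Model (annotation pipeline)
PREFIX_TYPES = {
    "NM": ("Known", "mRNA"),
    "NR": ("Known", "non-coding RNA"),
    "NP": ("Known", "protein"),
    "XM": ("Model", "mRNA"),
    "XR": ("Model", "non-coding RNA"),
    "XP": ("Model", "protein"),
}

ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq_accession(accession: str) -> dict:
    """Split e.g. 'NM_173477.4' into accession and version, and look up Known/Model status."""
    match = ACCESSION_RE.match(accession)
    if match is None:
        raise ValueError(f"not a RefSeq accession: {accession!r}")
    prefix, number, version = match.groups()
    if prefix not in PREFIX_TYPES:
        raise ValueError(f"prefix {prefix}_ is not handled by this sketch")
    status, molecule = PREFIX_TYPES[prefix]
    return {
        "accession": f"{prefix}_{number}",
        "version": int(version) if version else None,
        "status": status,
        "molecule": molecule,
    }


print(parse_refseq_accession("NM_173477.4"))
print(parse_refseq_accession("XM_005257026.1"))
```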
RefSeq curation & genome maintenance
• GRCh37 issue: SCX duplication; MROH1 split by a gap
• GRCh38 update: gap closed; MROH1 complete; one SCX gene
[Figure: GRCh37 and GRCh38 views of the SCX/MROH1 region]
RefSeq curation & genome maintenance
• POLR2A (GeneID:5430) NM_000937.4 has a 2 nt deletion vs. GRCh38
• This deletion maintains the correct reading frame
[Figure: NM_000937.4 alignment to GRCh38]
• RefSeq reported this sequence issue to the GRC
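The reasoning behind keeping the 2 nt deletion is codon arithmetic. An indel inside a CDS preserves the reading frame only when its length is a multiple of 3. The genome's 2 extra nucleotides therefore shift every downstream codon, while the RefSeq sequence stays in frame. The toy sequence below is made up and is not POLR2A sequence.

```python
def shifts_reading_frame(indel_length: int) -> bool:
    """An indel inside a CDS shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


def split_codons(cds: str) -> list[str]:
    """Split a CDS into complete codons, dropping any trailing partial codon."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


# Hypothetical toy CDS: ATG GCC AAA TAA (start, Ala, Lys, stop)
cds = "ATGGCCAAATAA"
# The same CDS with 2 extra nucleotides, mimicking a genome that carries 2 nt the RefSeq lacks
with_insertion = cds[:3] + "GG" + cds[3:]

print(shifts_reading_frame(2))       # True
print(split_codons(cds))             # ['ATG', 'GCC', 'AAA', 'TAA']
print(split_codons(with_insertion))  # ['ATG', 'GGG', 'CCA', 'AAT']
```

In the shifted version, every codon after the insertion changes and the original stop codon is no longer read in frame.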
GRCh38 ALT LOCI and PATCHES
• Pre-patch & ALT review: polymorphic pseudogenes; haplotype & CNV variation
• ALT-specific RefSeq records
• Curator-stored placement data
• Manual curation and the evidence-based genome annotation pipeline
• Assembly-ALT alignments
• Alignment quality reports
• Interim alignment updates
• Subsequent genome annotation build corrects the annotation
Polymorphic pseudogenes
• RefSeq provides different transcripts to represent the protein-
coding gene versus the pseudogene
• Curators store assembly placement information (chromosome
versus ALT) in a local database
• The annotation pipeline uses this information to ensure correct annotation
An example: GSTT cluster on chromosome 22
Assembly unit     GSTT1    GSTT2    GSTT2B   GSTTP1   GSTTP2
GRCh38 chr22      null     pseudo   coding   pseudo   null
ALT_REF_LOCI_1    coding   coding   coding   pseudo   pseudo
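One way to picture curator-stored placement data is as a lookup from (assembly unit, gene) to the expected annotation status. A pipeline can then check its own calls against that lookup. The minimal sketch below encodes only the GSTT table above. The dictionary layout and both helper functions are hypothetical and do not reflect how NCBI's local database is actually structured.

```python
# Expected GSTT annotation per assembly unit, transcribed from the table above.
# "null" = null allele on this unit, "pseudo" = pseudogene, "coding" = protein-coding.
PLACEMENT = {
    "GRCh38 chr22": {
        "GSTT1": "null", "GSTT2": "pseudo", "GSTT2B": "coding", "GSTTP1": "pseudo", "GSTTP2": "null",
    },
    "ALT_REF_LOCI_1": {
        "GSTT1": "coding", "GSTT2": "coding", "GSTT2B": "coding", "GSTTP1": "pseudo", "GSTTP2": "pseudo",
    },
}


def expected_status(assembly_unit: str, gene: str) -> str:
    """Return the curator-recorded status; raises KeyError if no placement was recorded."""
    return PLACEMENT[assembly_unit][gene]


def check_annotation(assembly_unit: str, predicted: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {gene: (predicted, expected)} for every gene whose predicted status disagrees with curation."""
    expected = PLACEMENT[assembly_unit]
    return {
        gene: (status, expected[gene])
        for gene, status in predicted.items()
        if gene in expected and status != expected[gene]
    }


# A hypothetical pipeline call that annotated GSTT2 as coding on chr22 would be flagged:
print(check_annotation("GRCh38 chr22", {"GSTT2": "coding", "GSTT2B": "coding"}))
```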
GSTT* variation, chromosome 22
• Copy number variation of the glutathione S-transferase theta genes is associated with digestive tract cancers and other conditions
• Accurate gene annotation is important to downstream users
[Figure: GSTT region on GRCh38 chr22 (null allele, pseudogene) vs. GRCh38 ALT (coding allele); conditions shown: ulcerative colitis, laryngeal cancer, esophageal cancer, colorectal cancer]
GSTT2 polymorphism
• GRCh38 chr22: AT splice donor and premature stop codon (GSTT2 pseudogene)
• GRCh38 ALT: GT splice donor and stop codon
[Figure: GSTT2 alignments on GRCh38 chr22 and GRCh38 ALT]
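The two GSTT2 differences above can be expressed as two simple checks: whether an intron starts with the canonical GT donor, and whether the CDS has an in-frame stop codon before its last codon. This is a toy sketch that assumes the intron and CDS strings have already been extracted. Real splice-site and pseudogene assessment weighs much more evidence; for example, some AT-initiated introns are spliced by the minor spliceosome. The sequences and function names are made up for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_canonical_donor(intron: str) -> bool:
    """Most introns begin with GT (the 5' splice donor); this flags anything else, such as AT."""
    return intron[:2].upper() == "GT"


def first_stop_index(cds: str) -> int:
    """Return the codon index of the first in-frame stop codon, or -1 if there is none."""
    for i in range(0, len(cds) - 2, 3):
        if cds[i:i + 3].upper() in STOP_CODONS:
            return i // 3
    return -1


def has_premature_stop(cds: str) -> bool:
    """True if an in-frame stop codon occurs before the final codon of the CDS."""
    stop = first_stop_index(cds)
    return stop != -1 and stop < len(cds) // 3 - 1


print(has_canonical_donor("GTAAGT"), has_canonical_donor("ATAAGT"))  # True False
print(has_premature_stop("ATGTAAGCCTAA"))  # True: TAA at codon 1 of 4
print(has_premature_stop("ATGGCCAAATAA"))  # False: stop only at the last codon
```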
Data access
• Genes:
• <…ncbi root url…>/gene/
• ftp://ftp.ncbi.nlm.nih.gov/gene/
• NCBI YouTube ‘Download genomic sequence for a gene’
• https://www.youtube.com/watch?v=RHz2nZbzjpA
• RefSeq transcripts and proteins:
• Links from NCBI Gene
• Nucleotide/protein query:
• Search human[organism], then use the facets to restrict results to RefSeq and to a molecule type
• ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/
• NCBI Genome Annotation
• Links from NCBI Assembly or Genome resources
• <ncbi>/assembly/ or <ncbi>/genome/
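The Nucleotide query above can also be run from a script through NCBI's E-utilities. The minimal sketch below only builds an ESearch URL and does not fetch anything. It assumes that the `refseq[filter]` and `biomol_mrna[prop]` search terms are the scripted equivalents of the RefSeq and molecule-type facets; confirm this against the current Entrez documentation before relying on it. The function name is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refseq_mrna_search_url(organism: str = "human", retmax: int = 20) -> str:
    """Build an ESearch URL for RefSeq mRNA records of one organism in the nuccore database."""
    term = f"{organism}[organism] AND refseq[filter] AND biomol_mrna[prop]"
    params = {"db": "nuccore", "term": term, "retmax": retmax}
    return f"{ESEARCH}?{urlencode(params)}"


print(refseq_mrna_search_url())
```

The URL can be fetched with `urllib.request.urlopen`. NCBI limits request rates for E-utilities, so batch jobs should pause between calls or register an API key.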
Data access to annotated genome
[Figure: NCBI Gene and Assembly details pages]
Genome FTP formats
• FASTA
• genome, transcripts, proteins
• GenBank file format
• genome, transcripts, proteins
• GFF genome annotation
• Feature table
• features and locations in tabular format
• AGP, Assembly details & statistics
• Repeat masker results
• MD5 checksums
• Documentation
• README files
• <ncbi>/genome/doc/ftpfaq/
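The genome FTP directories provide MD5 checksums alongside the data files, so downloads can be verified locally. This minimal sketch assumes the checksum file follows the common `md5sum` layout: one `<hex digest>  <relative path>` pair per line, with paths that may carry a leading `./`. The file name `md5checksums.txt` used in the usage note is an assumption to check against the actual directory listing.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's MD5 hex digest in chunks so large genome files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(checksum_file: Path) -> dict[str, bool]:
    """Return {relative path: matches?} for every listed file present next to the checksum file."""
    base = checksum_file.parent
    results: dict[str, bool] = {}
    for line in checksum_file.read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        expected = parts[0].lower()
        name = parts[1].strip().lstrip("*")  # md5sum marks binary mode with a leading '*'
        if name.startswith("./"):
            name = name[2:]
        target = base / name
        if target.exists():  # files listed but not downloaded are skipped
            results[name] = md5_of(target) == expected
    return results
```

Usage: after downloading a directory, `verify_checksums(Path("download_dir/md5checksums.txt"))` reports `False` for any listed file whose contents do not match.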
Acknowledgements
Annotation pipeline
Paul Kitts
Terence Murphy
Francoise Thibaud-Nissen
RefSeq Curators
Eric Cox
Catherine Farrell
Tamara Goldfarb
Tripti Gupta
Vinita Joardar
Vamsi Kodali
Kelly McGarvey
Mike Murphy
Nuala O'Leary
Shashi Pujar
Bhanu Rajput
Sanjida Rangwala
Lillian Riddick
Dave Webb
Matt Wright
Susan Hiatt
www.ncbi.nlm.nih.gov/refseq/
Collaborators
Elspeth Bruford (HGNC)
Jen Harrow (HAVANA)
Locus-Specific Databases
Expert databases
Individual scientists
NCBI Posters & Booth 2405

 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 
Clean In Place(CIP).pptx .
Clean In Place(CIP).pptx                 .Clean In Place(CIP).pptx                 .
Clean In Place(CIP).pptx .
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Dr. E. Muralinath_ Blood indices_clinical aspects
Dr. E. Muralinath_ Blood indices_clinical  aspectsDr. E. Muralinath_ Blood indices_clinical  aspects
Dr. E. Muralinath_ Blood indices_clinical aspects
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIACURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
CURRENT SCENARIO OF POULTRY PRODUCTION IN INDIA
 
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptxClimate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
Climate Change Impacts on Terrestrial and Aquatic Ecosystems.pptx
 
Exploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdfExploring Criminology and Criminal Behaviour.pdf
Exploring Criminology and Criminal Behaviour.pdf
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
development of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virusdevelopment of diagnostic enzyme assay to detect leuser virus
development of diagnostic enzyme assay to detect leuser virus
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 

Ashg2015 grc-pruitt

  • 1. RefSeq curation and annotation of the reference human genome GRCh38 Kim D. Pruitt National Center for Biotechnology Information National Library of Medicine National Institutes of Health www.ncbi.nlm.nih.gov/refseq/
  • 2. RefSeq Background • RefSeq provides - • Human genome annotation • Known transcripts & proteins (manually curated) • Model transcripts & proteins (annotation pipeline) • Collaborations - • Genome Reference Consortium (GRC) • HUGO Gene Nomenclature Committee (HGNC) • Consensus CDS (CCDS) Collaboration (HAVANA curators) • RefSeqGene/Locus Reference Genomic (LRG)/LSDB RefSeq: www.ncbi.nlm.nih.gov/refseq/ Gene: www.ncbi.nlm.nih.gov/gene/ An NCBI project to provide reference sequence standards that incorporate current knowledge. Archaea – Bacteria – Eukaryotes - Virus
  • 3. Curation support of genic regions of the reference human assembly • RefSeqGene and LRG collaboration • Genomic and cDNA standards for clinical reporting • Report potential issues to the GRC • Consensus CDS collaboration • Stabilized human CDS annotation • Report potential issues to the GRC • RefSeq • Curation of genes, transcript & protein records • Report potential issues to the GRC • Review GRC patch updates for gene annotation impact
  • 4. Genome annotation leverages curation + computation • Genes: type, location, length • Sequence: accuracy, length • Alternate splice products • Functional annotation • Evidence-based genome annotation pipeline: align curated RefSeqs; align transcripts, proteins; align RNA-Seq; filter best alignments; build model RefSeqs; assign accessions, GeneID • Manual curation: sequence, literature
                        Transcripts   Proteins
    Known RefSeqs            50,540     39,363
    Model RefSeqs           112,735     60,599
    Annotated genes    Count
    Protein-coding     20,576
    Non-coding         18,037
    Pseudogene         12,474
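The "filter best alignments" step can be illustrated with a minimal Python sketch. It assumes a hypothetical `Alignment` record carrying identity and coverage fractions and made-up thresholds; it is not NCBI's pipeline code, only one plausible way to keep a single best placement per transcript.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alignment:
    """One transcript-to-genome alignment (hypothetical record layout)."""
    query: str       # transcript identifier
    target: str      # genomic sequence the transcript aligned to
    identity: float  # fraction of aligned bases that match (0..1)
    coverage: float  # fraction of the transcript that is aligned (0..1)


def filter_best(alignments, min_identity=0.95, min_coverage=0.90):
    """Keep the single highest-scoring alignment per query.

    Alignments below either threshold are discarded; the rest are ranked
    by identity * coverage. The thresholds are illustrative only.
    """
    best = {}
    for aln in alignments:
        if aln.identity < min_identity or aln.coverage < min_coverage:
            continue
        score = aln.identity * aln.coverage
        current = best.get(aln.query)
        if current is None or score > current.identity * current.coverage:
            best[aln.query] = aln
    return best


alignments = [
    Alignment("tx1", "chr1", 0.99, 0.98),
    Alignment("tx1", "chr5", 0.97, 0.99),  # weaker second placement for tx1
    Alignment("tx2", "chr2", 0.90, 1.00),  # fails the identity threshold
]
for query, aln in filter_best(alignments).items():
    print(query, aln.target)
```

Ranking by identity times coverage is a deliberately simple choice; a production pipeline would likely weigh further evidence, such as splice-site consistency, before choosing a placement.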
  • 5. Transition from GRCh37 to GRCh38 • Identify gene/sequence differences vs. GRCh38 • Automatic update at synonymous mismatches • Curation review of remainder • >5,100 Known RefSeq transcripts updated since October 2013 • 47,031 Known RefSeqs identical to genome • 2,916 intentionally retain a mismatch or indel • ~600 pending • ~132 genes merged [Chart: number of updates over time, 2013 Q1 to 2015 Q3, with the GRCh38 release (12/24/2013) marked]
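The triage on this slide (automatic update when a mismatch is synonymous, curation review otherwise) can be sketched as follows. The function name and its inputs are hypothetical: it assumes the mismatch lies inside an uppercase CDS that is in frame from position 0, and it uses the standard genetic code (NCBI translation table 1).

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def triage_mismatch(cds: str, pos: int, genome_base: str) -> str:
    """Classify one transcript-vs-genome base mismatch inside a CDS.

    cds:         transcript coding sequence (uppercase DNA, in frame from 0)
    pos:         0-based position of the mismatch within the CDS
    genome_base: the base the genome has at that position
    Returns "automatic update" if the genome codon encodes the same amino
    acid (or stop) as the transcript codon, otherwise "curation review".
    """
    offset = pos % 3
    start = pos - offset
    transcript_codon = cds[start:start + 3]
    genome_codon = transcript_codon[:offset] + genome_base + transcript_codon[offset + 1:]
    if CODON_TABLE[transcript_codon] == CODON_TABLE[genome_codon]:
        return "automatic update"
    return "curation review"


print(triage_mismatch("ATGGCTAAATGA", 5, "C"))  # automatic update (GCT -> GCC, both Ala)
print(triage_mismatch("ATGGCTAAATGA", 3, "T"))  # curation review (GCT -> TCT, Ala -> Ser)
```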
  • 6. Updating RefSeq to match GRCh38 • Post GRCh38 review: • NM_173477 updated to match genome (NM_173477.4) • Model RefSeq XM_005257026.1 promoted to Known RefSeq [Figure panels: GRCh38 alignment; GRCh37 alignment]
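The accessions on this slide encode both the record class and the version, which increments when a record's sequence changes (as with NM_173477.4). A minimal Python sketch, assuming only the six transcript and protein prefixes (known: NM_, NR_, NP_; model: XM_, XR_, XP_); other RefSeq prefixes such as NC_ or NG_ are deliberately rejected, and the function name is hypothetical.

```python
import re

# Transcript/protein RefSeq prefixes and the record class each denotes.
PREFIX_CLASS = {
    "NM": ("known", "mRNA"),
    "NR": ("known", "non-coding RNA"),
    "NP": ("known", "protein"),
    "XM": ("model", "mRNA"),
    "XR": ("model", "non-coding RNA"),
    "XP": ("model", "protein"),
}

# Two-letter prefix, underscore, digits, optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_(\d+)(?:\.(\d+))?$")


def parse_refseq(accession: str) -> dict:
    """Split a RefSeq transcript/protein accession into its parts."""
    m = ACCESSION_RE.match(accession)
    if m is None or m.group(1) not in PREFIX_CLASS:
        raise ValueError(f"not a RefSeq transcript/protein accession: {accession}")
    prefix, number, version = m.groups()
    status, molecule = PREFIX_CLASS[prefix]
    return {
        "accession": f"{prefix}_{number}",
        "version": int(version) if version else None,
        "status": status,
        "molecule": molecule,
    }


print(parse_refseq("NM_173477.4"))
print(parse_refseq("XM_005257026.1")["status"])  # model
```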
  • 7. RefSeq curation & genome maintenance [Figure: GRCh37 issue: SCX duplication, MROH1 split, gap; GRCh38 update: gap closed, MROH1 complete, one SCX gene]
  • 8. RefSeq curation & genome maintenance • POLR2A (GeneID:5430) NM_000937.4 has a 2 nt deletion vs. GRCh38 • This maintains the correct reading frame [Figure: GRCh38 alignment]
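Why a 2 nt difference matters for the reading frame can be shown with toy sequences (hypothetical, not POLR2A): a net indel whose length is not a multiple of three shifts every downstream codon, so the stop codon falls out of frame. The helper names below are illustrative only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq: str) -> list:
    """Split seq into complete codons from position 0, dropping a trailing partial codon."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def is_intact_orf(cds: str) -> bool:
    """True if cds starts with ATG, has a length divisible by 3,
    ends with a stop codon and contains no internal stop codon."""
    if len(cds) % 3 != 0 or not cds.startswith("ATG"):
        return False
    cs = codons(cds)
    return cs[-1] in STOP_CODONS and not any(c in STOP_CODONS for c in cs[:-1])


def shifts_frame(net_indel: int) -> bool:
    """True if a net insertion (+) or deletion (-) of this many bases changes the frame."""
    return net_indel % 3 != 0


transcript = "ATGAAACCCGGGTAA"  # toy transcript CDS: ATG AAA CCC GGG TAA
genome = "ATGAAATTCCCGGGTAA"    # same region with 2 extra bases (TT) after AAA
print(is_intact_orf(transcript))  # True
print(is_intact_orf(genome))      # False
print(shifts_frame(-2))           # True
```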
  • 9. RefSeq curation & genome maintenance • RefSeq reported this sequence issue to the GRC
  • 10. GRCh38 ALT loci and patches [Diagram labels: pre-patch & ALT review; polymorphic pseudogenes; haplotype & CNV variation; ALT-specific RefSeq records; curator-stored placement data; evidence-based genome annotation pipeline; manual curation; assembly-ALT alignments; alignment quality reports; subsequent genome annotation build corrects the annotation; interim alignment updates]
  • 11. Polymorphic pseudogenes • RefSeq provides different transcripts to represent the protein-coding gene versus the pseudogene • Curators store assembly placement information (chromosome versus ALT) in a local database • This is used by the annotation pipeline to ensure correct annotation
    An example, the GSTT cluster on chromosome 22:
    Assembly unit     GSTT1    GSTT2    GSTT2B   GSTTP1   GSTTP2
    GRCh38 chr22      null     pseudo   coding   pseudo   null
    ALT_REF_LOCI_1    coding   coding   coding   pseudo   pseudo
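How curator-stored placement data could steer annotation per assembly unit can be sketched with the slide's GSTT cluster table. The dictionary layout and the `annotation_for` helper are hypothetical stand-ins for the curators' local database, not its actual schema.

```python
# Placement status per assembly unit, transcribed from the slide's GSTT cluster table.
PLACEMENT = {
    "GRCh38 chr22": {
        "GSTT1": "null", "GSTT2": "pseudo", "GSTT2B": "coding",
        "GSTTP1": "pseudo", "GSTTP2": "null",
    },
    "ALT_REF_LOCI_1": {
        "GSTT1": "coding", "GSTT2": "coding", "GSTT2B": "coding",
        "GSTTP1": "pseudo", "GSTTP2": "pseudo",
    },
}

# Feature type to annotate for each status; None means the gene is absent.
ANNOTATION = {"coding": "protein-coding gene", "pseudo": "pseudogene", "null": None}


def annotation_for(gene, assembly_unit):
    """Return the feature type to annotate for gene on assembly_unit, or None if absent."""
    status = PLACEMENT[assembly_unit][gene]
    return ANNOTATION[status]


for unit in PLACEMENT:
    print(unit, annotation_for("GSTT2", unit))
```

Keeping the status per assembly unit, rather than per gene, is what lets one gene be annotated as a pseudogene on the primary chromosome and as protein-coding on an ALT locus.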
  • 12. GSTT* variation, chromosome 22 • Copy number variation of glutathione S-transferase theta genes is associated with digestive tract cancers and more • Accurate gene annotation is important to downstream users [Figure: GRCh38 chr22 vs. GRCh38 ALT; labels: pseudogene, chr22 = null allele, coding allele; associated conditions: ulcerative colitis, laryngeal cancer, esophageal cancer, colorectal cancer]
  • 13. GSTT2 polymorphism [Figure: GRCh38 chr22 allele with an AT splice donor and a premature stop codon; GRCh38 ALT allele with a GT splice donor and the normal stop codon]
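The two GSTT2 differences on this slide, a non-canonical splice donor and a premature stop codon, can be checked mechanically. A minimal sketch with toy sequences and hypothetical helper names; it treats only GT as a canonical donor, which is a simplification, since minor GC-AG and AT-AC intron classes also exist.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def donor_dinucleotide(genomic: str, exon_end: int) -> str:
    """Return the first two intron bases after an exon ending at 0-based exclusive exon_end."""
    return genomic[exon_end:exon_end + 2]


def has_canonical_donor(genomic: str, exon_end: int) -> bool:
    """Simplified GT-AG rule: only a GT donor counts as canonical here."""
    return donor_dinucleotide(genomic, exon_end) == "GT"


def first_premature_stop(cds: str):
    """Return the 0-based offset of the first in-frame stop codon before the
    last codon, or None if the only stop is the terminal one."""
    for i in range(0, len(cds) - 3, 3):
        if cds[i:i + 3] in STOP_CODONS:
            return i
    return None


# Toy sequences, not real GSTT2: exon "CAG" followed by the intron start.
print(has_canonical_donor("CAGGTAAGT", 3))   # True  (GT donor)
print(has_canonical_donor("CAGATAAGT", 3))   # False (AT donor)
print(first_premature_stop("ATGAAACCCTAA"))  # None
print(first_premature_stop("ATGTAACCCTAA"))  # 3
```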
  • 14. GRCh38 chr22 GSTT2 pseudogene [Figure: GRCh38 chr22]
  • 15. Data access • Genes: • <…ncbi root url…>/gene/ • ftp://ftp.ncbi.nlm.nih.gov/gene/ • NCBI YouTube ‘Download genomic sequence for a gene’ • https://www.youtube.com/watch?v=RHz2nZbzjpA • RefSeq transcripts and proteins: • Links from NCBI Gene • Nucleotide/protein query: • human[organism] + use facets to specify RefSeq and molecule type • ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/ • NCBI Genome Annotation • Links from NCBI Assembly or Genome resources • <ncbi>/assembly/ or <ncbi>/genome/
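The web query on this slide (human[organism] plus facets for RefSeq and molecule type) has a programmatic counterpart in NCBI's E-utilities, which the slide does not mention. A minimal sketch that only builds an ESearch URL, assuming the `refseq[filter]` and `biomol_mrna[prop]` search terms; sending the request (for example with `urllib.request.urlopen`) is left out, and the helper name is hypothetical.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def refseq_search_url(db="nuccore", molecule="mrna", retmax=20):
    """Build an ESearch URL for human RefSeq records.

    db:       Entrez database, e.g. "nuccore" (transcripts) or "protein"
    molecule: biomol property, e.g. "mrna"; pass None when db="protein"
    retmax:   maximum number of IDs to return
    """
    term = "human[organism] AND refseq[filter]"
    if molecule:
        term += f" AND biomol_{molecule}[prop]"
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


print(refseq_search_url())
print(refseq_search_url(db="protein", molecule=None))
```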
  • 16. Data access to annotated genome [Figure: Gene; Assembly details]
  • 17. Genome FTP formats • FASTA • genome, transcripts, proteins • GenBank file format • genome, transcripts, proteins • GFF genome annotation • Feature table • features and locations in tabular format • AGP, assembly details & statistics • RepeatMasker results • MD5 checksums • Documentation • README files • <ncbi>/genome/doc/ftpfaq/
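The MD5 checksum files can be used to confirm that downloaded files are intact. A minimal sketch, assuming each line of the checksum listing has the form `<hex digest>  ./<file name>`; the helper names are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksums(listing_text, directory):
    """Check files in directory against checksum-listing text.

    Each non-empty line is assumed to be '<hex digest><whitespace><file name>',
    with an optional leading './' on the name. Returns {file name: True/False};
    missing files count as failures.
    """
    results = {}
    for line in listing_text.splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.strip()
        if name.startswith("./"):
            name = name[2:]
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected.lower()
    return results
```

For example, `verify_checksums(Path("md5checksums.txt").read_text(), ".")` would check a downloaded directory, where the listing file name is an assumption; any entry mapped to `False` should be downloaded again.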
  • 18. Acknowledgements RefSeq Curators Annotation pipeline Paul Kitts Terence Murphy Francoise Thibaud-Nissen Eric Cox Catherine Farrell Tamara Goldfarb Tripti Gupta Vinita Joardar Vamsi Kodali Kelly McGarvey Mike Murphy Nuala O'Leary Shashi Pujar Bhanu Rajput Sanjida Rangwala Lillian Riddick Dave Webb Matt Wright Susan Hiatt www.ncbi.nlm.nih.gov/refseq/ Collaborators Elspeth Bruford (HGNC) Jen Harrow (HAVANA) Locus-Specific Databases Expert databases Individual scientists
  • 19. NCBI Posters & Booth 2405