Deanna M. Church
Staff Scientist, NCBI
@deannachurch
The intersection of genome
assembly and variation
management.
http://genomereference.org
Valerie Schneider, NCBI
Variation Resources Team at NCBI
Ming Ward
Lon Phan
Brad Holmes
Anna Glodek
Michael Kholodov
Rama Maiti
Juliana Sampson
David Shao
Eugene Shekhtman
Qiang Wang
Hua Zhang
Donna Maglott
Melissa Landrum
Jennifer Lee
George Riley
Ray Tully
Craig Wallin
Shanmuga Chitipiralla
Douglas Hoffman
Wonhee Jang
Ken Katz
Michael Ovetsky
Ricardo Villamarin
Tim Hefferon
John Lopez
John Garner
Chao Chen
Learning Objectives
Why the reference assembly matters for your analysis
How the reference assembly is changing
Tools and Resources to find data
Why should you care about
the Reference Assembly?
Genes, NCBI Homo sapiens Annotation Release 105
Transcript
CDS
dbSNP Build 138 using annotation release 104
http://www.bioplanet.com/gcat
What is the
Reference Assembly?
An assembly is a MODEL of the genome
BAC insert
BAC vector
Shotgun sequence
Assemble
GAPS
“finishers” go in to manually
fill the gaps, often by PCR
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321
RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7
Gaps
NCBI36 (hg18)
GRCh37 (hg19)
NCBI35 (hg17)
GRCh37 (hg19)
AL139246.20
AL139246.21
Build sequence contigs based on contigs
defined in TPF (Tiling Path File).
Check for orientation consistencies
Select switch points
Instantiate sequence for further analysis
Switch point
Consensus sequence
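The steps above can be sketched in a few lines. This is a minimal illustration, assuming each component in the tiling path is already in contig orientation and that a switch point is simply the coordinate at which the contig stops using one component and starts using the next. The `Component` type, the `build_contig` function, and the toy clone names and sequences are hypothetical; this is not the GRC's production code.

```python
from dataclasses import dataclass

@dataclass
class Component:
    accession: str   # hypothetical accession.version taken from a TPF
    sequence: str    # component sequence, already in contig orientation
    start: int       # 0-based index of the first base used in the contig
    end: int         # 0-based index one past the last base used (switch point)

def build_contig(tiling_path):
    """Concatenate the used slice of each component in tiling-path order."""
    parts = []
    for comp in tiling_path:
        if not (0 <= comp.start < comp.end <= len(comp.sequence)):
            raise ValueError(f"bad switch points for {comp.accession}")
        parts.append(comp.sequence[comp.start:comp.end])
    return "".join(parts)

# Toy example: the two components overlap on "GGCC". The contig uses all of
# the first component, then switches to the second one after the overlap.
path = [
    Component("CLONE_A.1", "AAAATTTTGGCC", 0, 12),
    Component("CLONE_B.1", "GGCCTTAACCGG", 4, 12),
]
contig = build_contig(path)
print(contig)
```

Moving a switch point changes which component contributes the overlapping bases, which is one reason the same region can differ between assembly versions.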
NCBI36
nsv832911 (nstd68) Submitted on NCBI35 (hg17)
NCBI35 (hg17) Tiling Path
GRCh37 (hg19) Tiling Path
Gap Inserted
Moved approximately 2 Mb
distal on chr15
NC_000015.8 (chr15)
NC_000015.9 (chr15)
Removed from assembly
Added to assembly
HG-24
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
AC074378.4
AC079749.5
AC134921.2
AC147055.2
AC140484.1
AC019173.4
AC093720.2
AC021146.7
NCBI36 NC_000004.10 (chr4) Tiling Path
Xue Y et al, 2008
TMPRSS11E TMPRSS11E2
GRCh37 NC_000004.11 (chr4) Tiling Path
AC074378.4
AC079749.5
AC134921.1
AC147055.2
AC093720.2
AC021146.7
TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC074378.4
AC140484.1
AC019173.4
AC226496.2
AC021146.7
TMPRSS11E2
nsv532126 (nstd37)
GRCh37 (hg19)
7 alternate haplotypes
at the MHC
Alternate loci released as:
FASTA
AGP
Alignment to chromosome
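AGP is a tab-delimited, nine-column format that describes how an object (such as an alternate locus scaffold) is built from components and gaps. The sketch below reads the columns needed to list components and total gap length. The `AgpRow` type, the `parse_agp` function, and the sample rows (object `alt_1`, clones `CLONE_A.1` and `CLONE_B.2`) are made up for illustration and do not come from a released GRC file.

```python
from typing import NamedTuple, Optional

class AgpRow(NamedTuple):
    obj: str                 # object being assembled (column 1)
    obj_beg: int             # start on the object (column 2)
    obj_end: int             # end on the object (column 3)
    part: int                # part number (column 4)
    comp_type: str           # component type; N or U marks a gap (column 5)
    comp_id: Optional[str]   # component accession.version, None for gaps
    gap_len: Optional[int]   # gap length, None for components

def parse_agp(text):
    """Parse AGP text into rows, skipping blank and comment lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        if len(f) != 9:
            raise ValueError(f"expected 9 columns, got {len(f)}")
        is_gap = f[4] in ("N", "U")
        rows.append(AgpRow(f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                           None if is_gap else f[5],
                           int(f[5]) if is_gap else None))
    return rows

# Hypothetical sample: two clone components separated by a 100 bp gap.
SAMPLE = "\n".join("\t".join(fields) for fields in [
    ["alt_1", "1", "1000", "1", "F", "CLONE_A.1", "1", "1000", "+"],
    ["alt_1", "1001", "1100", "2", "N", "100", "scaffold", "yes", "paired-ends"],
    ["alt_1", "1101", "1600", "3", "F", "CLONE_B.2", "201", "700", "-"],
])

rows = parse_agp(SAMPLE)
components = [r.comp_id for r in rows if r.comp_id is not None]
gap_total = sum(r.gap_len for r in rows if r.gap_len is not None)
print(components, gap_total)
```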
UGT2B17 MHC MAPT
MHC (chr6)
Chr 6 representation (PGF)
Alt_Ref_Locus_2 (COX)
Data management and the
Reference Assembly?
NC_000086.123456 CM001013.17 2
Mouse chrX: 34,800,000-34,890,000
Mouse chrX: 35,000,000-36,000,000
X
MGSCv3 MGSCv36
ABC14-1065514J1
Gaps Phase Length Date
FP565796.1 1 1 21-Oct-2009
FP565796.2 1 0 14-Oct-2010
FP565796.3 3 0 07-Nov-2010
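Sequence records such as FP565796 and AL139246 carry a version suffix that increments when the sequence changes, so the accession alone does not identify a sequence. A small sketch of keeping track of this, assuming identifiers always take the form accession, dot, integer version; `split_accession` and `latest` are hypothetical helper names.

```python
def split_accession(acc_ver):
    """Split 'FP565796.3' into ('FP565796', 3)."""
    accession, sep, version = acc_ver.rpartition(".")
    if not sep or not version.isdigit():
        raise ValueError(f"no version suffix in {acc_ver!r}")
    return accession, int(version)

def latest(records):
    """Return the highest-versioned accession.version for each accession."""
    best = {}
    for rec in records:
        acc, ver = split_accession(rec)
        # Compare versions as integers so that 10 sorts after 9.
        if ver > best.get(acc, 0):
            best[acc] = ver
    return {acc: f"{acc}.{ver}" for acc, ver in best.items()}

seen = ["FP565796.1", "AL139246.20", "FP565796.3", "AL139246.21", "FP565796.2"]
newest = latest(seen)
print(newest)
```

Storing the full accession.version with every coordinate or variant makes it possible to tell later which sequence the data were reported against.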
hg19
GRCh37
mm8
MGSCv37
NCBIM37
danRer5
Zv7
chr21:8,913,216-9,246,964
Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX
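A coordinate string such as chr21:8,913,216-9,246,964 means nothing without the assembly it refers to. One way to enforce this in code is to make the assembly a required part of every location. The sketch below is an assumption-laden example: the `Location` type and `parse_location` function are hypothetical, and the synonym table contains only name pairs that appear in this deck.

```python
import re
from typing import NamedTuple

class Location(NamedTuple):
    assembly: str
    chrom: str
    start: int
    end: int

# UCSC-style names mapped to the assembly names paired with them in this deck.
SYNONYMS = {"hg17": "NCBI35", "hg18": "NCBI36", "hg19": "GRCh37", "danRer5": "Zv7"}

_REGION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")

def parse_location(assembly, region):
    """Parse 'chrN:start-end' and attach a normalized assembly name."""
    m = _REGION.match(region.strip())
    if m is None:
        raise ValueError(f"cannot parse region {region!r}")
    start = int(m.group(2).replace(",", ""))
    end = int(m.group(3).replace(",", ""))
    if start > end:
        raise ValueError("start is greater than end")
    return Location(SYNONYMS.get(assembly, assembly), m.group(1), start, end)

# The same coordinate string yields two different locations.
zebrafish = parse_location("danRer5", "chr21:8,913,216-9,246,964")
human = parse_location("hg19", "chr21:8,913,216-9,246,964")
print(zebrafish)
print(zebrafish == human)
```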
http://www.ncbi.nlm.nih.gov/genome/assembly
GenBank            vs    RefSeq
Submitter Owned          RefSeq Owned
Redundancy               Non-Redundant
Updated rarely           Curated
INSDC                    Not INSDC
BRCA1
83 genomic records
31 mRNA records
27 protein records
3 genomic records
5 mRNA records
1 RNA record
5 protein records
http://www.ncbi.nlm.nih.gov/refseq/rsg
http://www.lrg-sequence.org/
RefSeq Gene
L R
http://www.ncbi.nlm.nih.gov/genome/tools/remap
From Assembly 1 <-> Assembly 2
Assembly <-> RefSeqGene/LRG
Primary Assembly <-> Alternate loci
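To illustrate what moving a coordinate from one assembly to another involves, here is a toy sketch that projects a position through a set of alignment blocks. It does not call the NCBI Remap service. It assumes forward-strand, gap-free blocks that are sorted and non-overlapping, and the `Block` type, the `remap` function, and all coordinates are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple

class Block(NamedTuple):
    src_start: int   # 1-based inclusive start on the source assembly
    src_end: int     # 1-based inclusive end on the source assembly
    dst_start: int   # target position aligned to src_start

def remap(pos, blocks):
    """Map a source position through the blocks.

    Returns None when the position falls outside every block, for example
    in sequence that was removed from the newer assembly.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    block = blocks[i]
    if pos > block.src_end:
        return None
    return block.dst_start + (pos - block.src_start)

# Hypothetical region: the first megabase is unchanged, 200 kb has no
# counterpart, and the rest sits 2 Mb further along in the target assembly.
blocks = [
    Block(1, 1_000_000, 1),
    Block(1_200_001, 2_000_000, 3_200_001),
]

print(remap(500_000, blocks))
print(remap(1_100_000, blocks))
print(remap(1_500_000, blocks))
```

A position that returns `None` has no equivalent in the target assembly, and a real remapping tool reports such cases explicitly rather than guessing.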
Variant Calling and the
Reference Assembly
Kidd et al., 2007
APOBEC cluster
Part of chr22 assembly
Alternate locus for chr22
White: Insertion
Black: Deletion
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35/NCBI36
Unlocalized in GRCh37
Finished in GRCh38
Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID
(Paralogous)
(Allelic)
Doggett et al., 2006
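The identity figures above translate into very few differences per sequencing read, which is why a paralog that is missing from the reference can attract reads and produce false variant calls. A rough back-of-the-envelope sketch, assuming every non-identical position counts as a single-base difference and a read length of 100 bp (the read length is an assumption, not a figure from the slides):

```python
def expected_differences(length_bp, identity):
    """Expected number of differing bases over an aligned length."""
    return length_bp * (1.0 - identity)

# Paralogous alignment from the slide: 300 kb at 99.4% identity.
paralog_total = expected_differences(300_000, 0.994)

# The same identity over one assumed 100 bp read.
paralog_per_read = expected_differences(100, 0.994)

print(round(paralog_total), round(paralog_per_read, 2))
```

At less than one expected difference per read, many reads from one copy align cleanly to the other, and the few real differences show up as apparent SNPs.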
CDC27
1KG Phase 1 Strict accessibility mask
SNP (all)
SNP (not 1KG)
Sudmant et al., 2010
Issues with the
Reference Assembly
Dennis et al., 2012
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
Fixing Rare/Incorrect Bases
Adding Novel Sequence
Karen Miga and Jim Kent arXiv:1307.0035
Preview of GRCh38 (scheduled Fall 2013)
TEX28 TKTL1
LOC101060233
(opsin related)
LOC101060234
(TEX28 related)
GRCh37 (current reference assembly)
NC_000023.10 (chrX)
NW_003871103.3
FAM23_MRC1 Region, chr10
Segmental Duplications
1KG accessibility Mask
Novel Patch 250 kb of artificial duplication
Adding Novel Sequence
GRCh37.p13
120 Fix Patches
60 Novel
Human Resolved for GRCh38
How to identify problem
regions in the
Reference Assembly
1000 Genomes Browser: http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
GeT-RM Browser: http://www.ncbi.nlm.nih.gov/variation/tools/getrm
Variation Viewer: http://www.ncbi.nlm.nih.gov/variation/view (coming Oct 2013!)
Tiling Path
Sequence Bar
Segmental Duplications, Eichler Lab
1000 Genomes strict accessibility mask
Annotated clone assembly problems
dbSNP Build 138 based on annotation run 104
Model based paralogous sequence differences, NCBI annotation run #
Paralogous/pseudo gene alignments, NCBI annotation run #
Single Unique Nucleotide (SUN) map, Sudmant 2010
ClinVar Long Variations
GRC Curation Issues
ClinVar Short Variations
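The tracks listed above can also be used programmatically: before trusting a variant call, check whether its position overlaps any annotated problem region. The sketch below assumes each track has already been loaded as a list of 1-based inclusive intervals. The `overlapping_tracks` function, the track names, and the intervals are hypothetical placeholders, not real annotation data.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]   # (chrom, start, end), 1-based inclusive

def overlapping_tracks(chrom, pos, tracks: Dict[str, List[Interval]]):
    """Return the names of all tracks with an interval covering the position."""
    hits = []
    for name, intervals in tracks.items():
        if any(c == chrom and s <= pos <= e for c, s, e in intervals):
            hits.append(name)
    return hits

# Made-up intervals standing in for real track data.
tracks = {
    "segmental_duplication": [("chr1", 1_000, 5_000)],
    "grc_curation_issue": [("chr1", 4_000, 4_500), ("chr2", 100, 200)],
}

print(overlapping_tracks("chr1", 4_200, tracks))
print(overlapping_tracks("chr1", 6_000, tracks))
```

A linear scan is enough for a handful of intervals; genome-wide tracks would call for an interval index instead.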

Church emory2013

Editor's notes

  1. Signpost for biological knowledge: ideogram + list of tracks.
  2. To address assembly issues, the GRC was formed to centralize the production of the reference assembly. This gives the community a single point of contact for reporting problems and finding information about the assembly. Additionally, we serve as an aggregator of information: as individual labs find or fix problems, we can integrate this information into the reference assembly so everyone can have access to this data.
  3. Insert dot matrix alignment- pull from assembly-assembly alignments
  4. Alignments refer to pairs of sequences. Once you know how a pair of sequences goes together, you can look at stringing the pairs along into a contig. The contig is essentially the consensus sequence that is produced from the components. To create a contig, we use the steps shown on this slide. What are switch points? As you create the consensus sequence of the contig, the switch points tell you where to stop using the sequence from one component and begin using the sequence from the next.
  5. If you are not using the entire assembly in your efforts, you may be missing genes in your exome capture reagents.
  6. Show alignment of a feature from first slide to show how far down the chromosome it has moved…
  7. Keeping track of people is way easier than keeping track of assemblies.
  8. RefSeqGene/LRG screen shot: stable coordinate system for gene level reporting. Gene centric genomic sequences.
  9. Distribution of RefSeqGenes on GRCh37
  10. Remap
  11. Look up how much novel sequence was added. Across all patches: 35 Mb of sequence added.
  12. For the intermediate build GRCh37B, we are updating a subset of the high-confidence bases, about 1000, as our proof-of-principle. This panel shows reads from NA12878 aligned to chr. 19 that identify a base with MAF=0 in the LIN37 locus. This creates a non-consensus splice site. To create accessioned sequence for correcting the reference, we are using cortex_con (Iqbal and Caccamo) to generate mini-contigs (>= 50 bp) from collections of 1kG and RP11 WGS reads, the former selected from random 1kG populations.
  13. There are several mechanisms we can use for capturing decoy. Much of the decoy represents centromeric repeat sequence. In collaboration with Karen Hayden in Jim Kent’s lab at UCSC, the GRC is planning to include modeled centromeric sequences in GRCh38.
  14. Adding NOVEL sequence for GRCh38 doesn’t just mean adding sequence that is completely unrepresented in GRCh37. While many of the NOVEL patches, like the one on the previous slide, represent indels, adding novel sequence also means adding sequence variants for regions too complex to be represented by a single path. There is substantial variation at the LRC/KIR region on chr. 19. As shown on this slide, not only has the GRC replaced the GRCh37 path, which was derived from components from different clone libraries, with a single haplotype path from the CHM1 assembly, it also now has 8 different haplotypes represented as alternate loci. The addition of another 10+ haplotypes at this locus is also under consideration.
  15. Update to GRCh37.p13. The GRC has been releasing patches to the human assembly on a quarterly cycle, and we’re now at GRCh37.p12. There are two varieties of patches: FIX patches correct existing assembly problems (the chromosome will update, and the patches are integrated in GRCh38); NOVEL patches add new sequence representations (these will become alternate loci). This ideogram shows the current distribution of patches and alternate loci, and you can see that many regions have changed since GRCh37. Note that approximately 3% of the current public human assembly GRCh37 is associated with a region that is represented by a patch or alternate locus.
  16. Browsers: basic setups
  17. Configuring tracks with the Configurator
  18. Tracks of interest
  19. GBA (Glucosidase, beta, acid)