Multiple mouse reference genomes and strain-specific gene annotations
Thomas Keane,
Wellcome Trust Sanger Institute
@drtkeane @mousegenomes
tk2@sanger.ac.uk
Sequence variation
➢ 36 inbred strains with whole-genome Illumina sequencing
➢ SNPs, indels, and structural variants
➢ Are there more inbred strains with deep whole-genome Illumina sequencing?
➢ LG/J, SM/J, and JF1/MsJ pending
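To illustrate the three variant classes listed above, the sketch below labels a REF/ALT allele pair by size. It is a sketch under assumptions, not the project's calling pipeline: the 50 bp indel-versus-SV cutoff is a common convention rather than a figure from this talk, symbolic ALT alleles such as `<DEL>` are simply treated as structural variants, and the function name is hypothetical.

```python
def classify_variant(ref: str, alt: str, sv_min_len: int = 50) -> str:
    """Label a REF/ALT allele pair as 'SNP', 'MNP', 'indel' or 'SV'.

    sv_min_len is an assumed size cutoff (50 bp is a common convention).
    """
    # Symbolic ALT alleles such as <DEL> or <INV> are treated as SVs here.
    if alt.startswith("<") and alt.endswith(">"):
        return "SV"
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    size_change = abs(len(ref) - len(alt))
    if size_change == 0:
        # Same length but more than one base: multi-nucleotide substitution.
        return "MNP"
    return "SV" if size_change >= sv_min_len else "indel"


# A 1 bp insertion and a 3 bp deletion are both indels.
print(classify_variant("A", "AT"))    # indel
print(classify_variant("ACGT", "A"))  # indel
```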
Anthony Doran, WTSI
Genome assemblies
➢ REL-1412: Illumina mate-pair-based de novo scaffolds
➢ REL-1504: Pseudo-chromosomes
○ Alignment synteny with GRCm38
○ Evaluation with PacBio WGS/cDNA showed excessive reference bias
➢ REL-1509: Pseudo-chromosomes based on breakpoint graphs
○ Dovetail Genomics scaffolds for CAST/EiJ, PWK/PhJ, and SPRET/EiJ
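REL-1509 builds pseudo-chromosomes from breakpoint graphs. As a minimal illustration of the underlying idea (not the pipeline actually used for REL-1509), the sketch below represents each genome as chromosomes of signed synteny-block IDs, collects the adjacencies between block ends that form the edges of a breakpoint graph, and reports adjacencies present in one genome but absent from another. The input encoding and function names are assumptions.

```python
from typing import FrozenSet, List, Set, Tuple

# A block end: (block id, "t" for tail/start, "h" for head/end).
Extremity = Tuple[int, str]
Adjacency = FrozenSet[Extremity]


def adjacencies(chromosomes: List[List[int]]) -> Set[Adjacency]:
    """Adjacency edges between consecutive signed synteny blocks.

    A negative id means the block is reversed on that chromosome.
    """
    edges: Set[Adjacency] = set()
    for chrom in chromosomes:
        for left, right in zip(chrom, chrom[1:]):
            # Right-hand end of the left block: head if forward, tail if reversed.
            a = (abs(left), "h" if left > 0 else "t")
            # Left-hand end of the right block: tail if forward, head if reversed.
            b = (abs(right), "t" if right > 0 else "h")
            edges.add(frozenset((a, b)))
    return edges


def breakpoints(target: List[List[int]], reference: List[List[int]]) -> Set[Adjacency]:
    """Adjacencies present in the target genome but absent from the reference."""
    return adjacencies(target) - adjacencies(reference)


# An inversion of blocks 2-3 creates two breakpoints relative to the reference.
print(len(breakpoints([[1, -3, -2, 4]], [[1, 2, 3, 4]])))  # 2
```

Reading a whole chromosome in the opposite direction yields the same adjacencies, so it produces no breakpoints; only genuine rearrangements do.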
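[Figure: assembly hierarchy. 1. Contigs from paired-end Illumina reads; 2. Scaffolds from large-fragment ends (3, 6 and 10 kb, Dovetail, BAC ends), with gaps between contigs filled by runs of N; 3. Pseudo-chromosomes (e.g. Chr1) from whole-genome alignments]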
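Scaffolding joins ordered contigs and fills the unknown sequence between them with runs of N. A minimal sketch of that final joining step, assuming the contig order, orientation and gap sizes have already been inferred from mate-pair or Dovetail links (the `build_scaffold` helper and its layout format are hypothetical):

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (IUPAC codes other than N are not complemented)."""
    return seq.translate(_COMPLEMENT)[::-1]


def build_scaffold(contigs: Dict[str, str], layout: List[Tuple[str, str, int]]) -> str:
    """Join contigs into one scaffold sequence.

    layout: ordered (contig name, strand '+' or '-', gap size in bp after the contig);
    the gap after the last contig is ignored.
    """
    parts: List[str] = []
    for i, (name, strand, gap) in enumerate(layout):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(layout) - 1:
            # Unknown sequence between contigs is represented by Ns.
            parts.append("N" * gap)
    return "".join(parts)


contigs = {"ctg1": "ACGTAC", "ctg2": "GGCA"}
print(build_scaffold(contigs, [("ctg1", "+", 4), ("ctg2", "-", 0)]))  # ACGTACNNNNTGCC
```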
PacBio alignments
➢ Use the contiguity of PacBio long-read alignments to validate the chromosome sequences
➢ Compare the number of inconsistently mapped reads
[Figure: PacBio WGS and cDNA alignments, PWK/PhJ]
Dovetail Genomics: CAST/EiJ, PWK/PhJ, SPRET/EiJ
A) High-molecular-weight (50+ kbp) input DNA
B) Chromatin is reconstituted from the input DNA
C) A fixative agent (e.g., formaldehyde) is added to produce crosslinks
D) Crosslinked chromatin is digested with a restriction endonuclease to generate sticky-ended fragments
E+F) The sticky ends are filled in with biotinylated nucleotides, and DNA ligase is added to perform blunt-end ligation of the many ends within a given chromatin aggregate
G) Chromatin is removed, and the DNA is purified and processed to remove biotin from unligated ends
Biotin-containing fragments are then enriched and a sequencing library is prepared
http://dovetailgenomics.com/
Dovetail Scaffolds
REL-1412
Strain Length (Gbp) Scaffolds N50 (Mbp) Largest (Mbp) %N
CAST/EiJ 2.69 382,843 0.644 4.75 11.4
PWK/PhJ 2.53 271,282 0.390 4.0 6.3
SPRET/EiJ 2.66 297,604 0.361 2.82 9.4
REL-1412+Dovetail
Strain Length (Gbp) Scaffolds N50 (Mbp) Largest (Mbp) %N
CAST/EiJ 2.69 367,627 22.216 90.4 11.5
PWK/PhJ 2.58 251,844 24.066 100.6 7.44
SPRET/EiJ 2.66 272,127 23.475 88.6 9.5
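N50 accounts for most of the difference between the two Dovetail tables: it is the largest length L such that sequences of length L or longer together cover at least half of the assembly. A minimal sketch of the standard calculation (the function name is illustrative):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Largest length L such that sequences of length >= L cover at least half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # unreachable: the running total always reaches half


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

Because only the longest sequences matter, joining many short scaffolds into a few very long ones raises N50 sharply even when the total length and the scaffold count barely change.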
PacBio WGS alignments
➢ Proportion of PacBio WGS reads whose hits are in mixed orientations rather than all in one orientation (lower is better)
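One way to compute this metric, as a sketch: group alignment hits by read and count the reads whose hits fall on both strands. The `(read name, strand)` input format is an assumption; in practice such pairs could be extracted from a BAM file, for example with pysam's `AlignedSegment.query_name` and `is_reverse` attributes.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def mixed_orientation_fraction(hits: Iterable[Tuple[str, str]]) -> float:
    """Fraction of reads whose alignment hits use both strands ('+' and '-').

    hits: one (read name, strand) entry per alignment hit of a read.
    """
    strands: DefaultDict[str, Set[str]] = defaultdict(set)
    for read_name, strand in hits:
        strands[read_name].add(strand)
    if not strands:
        return 0.0
    mixed = sum(1 for s in strands.values() if len(s) > 1)
    return mixed / len(strands)


hits = [("r1", "+"), ("r1", "+"), ("r2", "+"), ("r2", "-"), ("r3", "-")]
print(mixed_orientation_fraction(hits))  # 0.3333333333333333
```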
Complex regions - Nlrp1 paralogs
[Figure panels: pseudo-chromosomes (pre-Dovetail) vs. post-Dovetail]
➢ A dozen highly polymorphic complex loci
○ Major urinary proteins (MUPs), H2/MHC, IRG, Nlrp etc.
Jingtao Lilue, WTSI
Pseudo-chromosomes (REL-1509)
Strain Length (Gbp) Sequences (>2kb) N50 (Mbp) Largest (Mbp) %N
129S1_SvImJ 2.73 7,153 134.54 202.56 0.15
A_J 2.63 4,687 129.07 194.20 0.11
AKR_J 2.71 5,954 132.98 199.99 0.13
BALB_cJ 2.63 3,824 129.64 194.91 0.11
C3H_HeJ 2.70 4,069 133.07 200.88 0.14
C57BL_6NJ 2.81 3,893 139.12 208.92 0.18
CAST_EiJ 2.65 2,976 133.75 200.42 0.14
CBA_J 2.92 5,465 144.78 216.63 0.21
DBA_2J 2.61 4,104 128.21 192.93 0.11
FVB_NJ 2.59 5,013 127.06 191.00 0.11
LP_J 2.73 3,498 135.16 203.66 0.16
NOD_ShiLtJ 2.98 5,551 147.35 223.33 0.23
NZO_HlLtJ 2.70 7,022 132.96 199.80 0.14
PWK_PhJ 2.60 5,085 127.27 191.61 0.11
SPRET_EiJ 2.63 5,405 131.95 198.85 0.11
WSB_EiJ 2.69 2,238 133.18 200.11 0.16
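The other columns of the REL-1509 table are simple summaries over the assembled sequences. A sketch under stated assumptions: "2kb" is taken as 2,000 bp, length and %N are computed over all sequences, N50 is left out, and the function name and dictionary keys are hypothetical.

```python
from typing import Dict, List


def assembly_summary(sequences: List[str], min_len: int = 2000) -> Dict[str, float]:
    """Total length, count of sequences longer than min_len, largest sequence and %N."""
    total = sum(len(s) for s in sequences)
    if total == 0:
        raise ValueError("no sequence given")
    n_bases = sum(s.upper().count("N") for s in sequences)
    return {
        "length_gbp": total / 1e9,
        "sequences_over_min": sum(1 for s in sequences if len(s) > min_len),
        "largest_mbp": max(len(s) for s in sequences) / 1e6,
        "percent_n": 100.0 * n_bases / total,
    }


toy = ["A" * 3000, "N" * 1000 + "C" * 1000, "G" * 500]
print(assembly_summary(toy)["sequences_over_min"])  # 1
```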
➢ Proposal: make REL-1509 the first annotated reference genomes for the laboratory strains
Gene prediction approach
➢ TransMap - lift over as much of the GENCODE C57BL/6J genome annotation as possible
○ Local Augustus - refine the liftover to allow small adjustments based on strain-specific RNA-Seq
[Figure: evidence from GENCODE M7 (C57BL/6J) and strain-specific RNA-Seq feeds TransMap and TransMap + local Augustus]
Ian Fiddes, UCSC
Stefanie König, U. Greifswald
Mario Stanke, U. Greifswald
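TransMap projects annotation through genome alignments. The core coordinate step can be sketched as below, assuming the alignment is available as ungapped, same-strand blocks sorted by source position (a simplification: real alignments also contain inversions and duplications). The block format and function names are hypothetical, not TransMap's interface.

```python
import bisect
from typing import List, Optional, Tuple

# Ungapped alignment block: (source start, target start, length), 0-based.
Block = Tuple[int, int, int]


def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a source coordinate to the target, or None if it is unaligned."""
    starts = [b[0] for b in blocks]
    i = bisect.bisect_right(starts, pos) - 1
    if i < 0:
        return None
    src, tgt, size = blocks[i]
    if pos < src + size:
        return tgt + (pos - src)
    return None  # pos falls in a gap between blocks


def lift_interval(blocks: List[Block], start: int, end: int) -> Optional[Tuple[int, int]]:
    """Lift a half-open interval such as an exon; both ends must be aligned.

    Alignment gaps inside the interval (indels) are allowed, so the lifted
    interval can differ in length from the source interval.
    """
    s = lift_position(blocks, start)
    e = lift_position(blocks, end - 1)
    if s is None or e is None or e < s:
        return None
    return s, e + 1


blocks = [(0, 100, 50), (60, 155, 40)]
print(lift_position(blocks, 10))      # 110
print(lift_position(blocks, 55))      # None
print(lift_interval(blocks, 10, 70))  # (110, 165)
```

The local Augustus refinement would then adjust such lifted exons using strain RNA-Seq evidence; that step is not modeled here.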
How many genes have at least one fully correct transcript?
Ian Fiddes, UCSC
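The question above reduces to a per-gene "any" over transcript-level checks. A sketch, assuming each projected transcript already carries a boolean "fully correct" flag; the slide does not define the criterion, so the flag and record layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def fraction_genes_with_correct_transcript(
    records: Iterable[Tuple[str, str, bool]],
) -> float:
    """records: (gene id, transcript id, fully_correct). Returns a fraction in [0, 1]."""
    gene_ok: Dict[str, bool] = {}
    for gene_id, _transcript_id, fully_correct in records:
        # A gene counts once any one of its transcripts is fully correct.
        gene_ok[gene_id] = gene_ok.get(gene_id, False) or fully_correct
    if not gene_ok:
        return 0.0
    return sum(gene_ok.values()) / len(gene_ok)


records = [
    ("g1", "t1", False), ("g1", "t2", True),
    ("g2", "t3", False),
    ("g3", "t4", True),
]
print(fraction_genes_with_correct_transcript(records))  # 0.6666666666666666
```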
Gene prediction approach
➢ TransMap - lift over as much of the GENCODE C57BL/6J genome annotation as possible
○ Local Augustus - refine the liftover to allow small adjustments based on strain-specific RNA-Seq
➢ Comparative gene prediction: Augustus CGP
○ Generate gene predictions based primarily on RNA-Seq evidence
○ Allow prediction of new transcripts and exons absent in C57BL/6J
[Figure: evidence from GENCODE M7 (C57BL/6J) and strain-specific RNA-Seq feeds TransMap, TransMap + local Augustus and Augustus CGP, which are combined into a consensus gene set]
Ian Fiddes, UCSC
Stefanie König, U. Greifswald
Mario Stanke, U. Greifswald
Efcab13-Efcab3 hybrid
Stefanie König,
U. Greifswald
Charlie Steward,
WTSI
What about human?
Efcab13-Efcab3 hybrid: NOT VALIDATED (YET)!
Stefanie König,
U. Greifswald
Charlie Steward,
WTSI
Dnah14: dynein, axonemal, heavy chain 14
Stefanie König,
U. Greifswald
Charlie Steward,
WTSI
Gene extensions - Dnah14: NOT VALIDATED (YET)!
Stefanie König,
U. Greifswald
Complex regions - Nlrp1 paralogs
Jingtao Lilue, WTSI
[Figure: Nlrp1 paralog region in C57BL/6J vs. PWK/PhJ, including the PWK/PhJ assembly]
How can I look at the genomes?
http://hgwdev-mus-strain.sdsc.edu
Mark Diekhans, UCSC
Ian Fiddes, UCSC
Change the coordinate system to the strain of interest
http://hgwdev-mus-strain.sdsc.edu
Mark Diekhans, UCSC
Ian Fiddes, UCSC
How can I look at the genomes?
Developed and maintained by the Genome Reference Informatics Team
http://mice-geval.sanger.ac.uk
Kerstin Howe,
WTSI
Acknowledgements
➢ Wellcome Trust Sanger Institute
○ Anthony Doran, Kim Wong, Dirk-Dominik Dolle, Jingtao Lilue, Monica Abrudan
○ David Adams, Richard Durbin, Kerstin Howe, Jennifer Harrow, Charles Steward, Mark Thomas, Ruth Bennet, Jo Wood, James Torrance, Will Chow, Mike Quail, Matt Dunn, Marcela Sjoberg, James Gilbert, Ed Griffiths, Anne Ferguson-Smith
➢ UCSC
○ Benedict Paten, Joel Armstrong, Mark Diekhans, Dent Earl, Ian Fiddes
➢ EBI
○ David Thybert, Duncan Odom, Paul Flicek
➢ University of Greifswald
○ Mario Stanke, Stefanie König
➢ Salk Institute
○ Son Pham, Mikhail Kolmogorov
➢ Yale
○ Fabio Navarro, Cristina Sisu, Mark Gerstein
➢ Wellcome Trust Centre for Human Genetics
○ Jonathan Flint, Richard Mott, Leo Goodstadt
➢ Jackson Laboratory
○ Laura Reinholdt, Anne Czechanski
➢ URLs
○ http://www.sanger.ac.uk/science/data/mouse-genomes-project
○ http://hgwdev-mus-strain.sdsc.edu
○ http://mice-geval.sanger.ac.uk/index.html
2014-2017 2015-2018
Sequence Variation Infrastructure Group, WTSI
BioNano genomics optical mapping
10kb mate-pair consistency

Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Disha Kariya
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdfQucHHunhnh
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAssociation for Project Management
 

Kürzlich hochgeladen (20)

social pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajansocial pharmacy d-pharm 1st year by Pragati K. Mahajan
social pharmacy d-pharm 1st year by Pragati K. Mahajan
 
Advance Mobile Application Development class 07
Advance Mobile Application Development class 07Advance Mobile Application Development class 07
Advance Mobile Application Development class 07
 
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
BAG TECHNIQUE Bag technique-a tool making use of public health bag through wh...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17Advanced Views - Calendar View in Odoo 17
Advanced Views - Calendar View in Odoo 17
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
IGNOU MSCCFT and PGDCFT Exam Question Pattern: MCFT003 Counselling and Family...
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
General AI for Medical Educators April 2024
General AI for Medical Educators April 2024General AI for Medical Educators April 2024
General AI for Medical Educators April 2024
 
fourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writingfourth grading exam for kindergarten in writing
fourth grading exam for kindergarten in writing
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..Sports & Fitness Value Added Course FY..
Sports & Fitness Value Added Course FY..
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
1029 - Danh muc Sach Giao Khoa 10 . pdf
1029 -  Danh muc Sach Giao Khoa 10 . pdf1029 -  Danh muc Sach Giao Khoa 10 . pdf
1029 - Danh muc Sach Giao Khoa 10 . pdf
 
APM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across SectorsAPM Welcome, APM North West Network Conference, Synergies Across Sectors
APM Welcome, APM North West Network Conference, Synergies Across Sectors
 

Multiple Mouse Reference Genomes and Strain Specific Gene Annotations

  • 1. Multiple mouse reference genomes and strain specific gene annotations Thomas Keane, Wellcome Trust Sanger Institute @drtkeane @mousegenomes tk2@sanger.ac.uk
  • 2. Sequence variation ➢ 36 inbred strains with whole-genome Illumina sequencing ➢ SNPs, indels, and structural variants ➢ Are there more inbred strains with deep whole-genome Illumina sequencing? ➢ LG/J, SM/J, and JF1/MsJ pending. Anthony Doran, WTSI
  • 3. Genome assemblies ➢ REL-1412: Illumina mate-pair-based de novo scaffolds ➢ REL-1504: Pseudo-chromosomes ○ Alignment synteny with GRCm38 ○ Evaluation with PacBio WGS/cDNA showed excessive reference bias ➢ REL-1509: Pseudo-chromosomes based on breakpoint graphs ○ Dovetail Genomics scaffolds for CAST/EiJ, PWK/PhJ, and SPRET/EiJ [Figure: 1. Contigs (paired-end Illumina) → 2. Scaffolds (large-fragment ends: 3, 6, 10 kb, Dovetail, BAC ends) → 3. Pseudo-chromosomes, e.g. Chr1 (whole-genome alignments)]
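The contig → scaffold step in the hierarchy above can be illustrated with a minimal Python sketch. It assumes the order, orientation and gap size of each contig are already known (in practice these come from linking evidence such as mate pairs or Dovetail reads); the function names and the `min_gap` parameter are hypothetical. Unknown sequence between contigs is written as a run of Ns, which is one reason the assembly tables report a % N column.

```python
# Hypothetical sketch: join ordered, oriented contigs into one scaffold,
# representing each unknown gap as a run of N bases.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(contigs, layout, min_gap=1):
    """Concatenate contigs in scaffold order.

    contigs: dict mapping contig name -> sequence
    layout:  list of (name, strand, gap_after) tuples; strand is '+' or '-',
             gap_after is the estimated gap (bp) before the next contig.
    min_gap: smallest N run written between two contigs (assumed convention),
             so adjacent contigs are never silently fused.
    """
    parts = []
    for i, (name, strand, gap_after) in enumerate(layout):
        seq = contigs[name]
        parts.append(seq if strand == "+" else reverse_complement(seq))
        if i < len(layout) - 1:
            # Bases between contigs are unknown, so they are filled with Ns.
            parts.append("N" * max(gap_after, min_gap))
    return "".join(parts)


contigs = {"ctg1": "ACGTAC", "ctg2": "GGTTAA"}
scaffold = build_scaffold(contigs, [("ctg1", "+", 3), ("ctg2", "-", 0)])
print(scaffold)  # ACGTACNNNTTAACC
```

Pseudo-chromosomes then arise by ordering scaffolds in the same way, using whole-genome alignments rather than read-pair links.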
  • 4. PacBio alignments ➢ Use the contiguity of PacBio long-read alignments to validate the chromosome sequence ➢ Compare the number of inconsistently mapped reads
  • 5. PacBio WGS and cDNA alignments
  • 6. Dovetail Genomics: CAST/EiJ, PWK/PhJ, SPRET/EiJ A) High molecular weight (50+ kbp) input DNA B) Reconstitute chromatin from the input DNA C) Add a fixative agent (e.g., formaldehyde) to produce crosslinks D) Digest the crosslinked chromatin with a restriction endonuclease to generate sticky-ended fragments E+F) Fill in the sticky ends with biotinylated nucleotides, then add DNA ligase to perform blunt-end ligation of the many ends within a given chromatin aggregate G) Remove the chromatin, purify the DNA and process it to remove biotin from unligated ends; enrich for biotin-containing fragments and prepare the sequencing library http://dovetailgenomics.com/
  • 7. Dovetail Scaffolds
    REL-1412:
      Strain      Length (Gbp)  Scaffolds  N50 (Mbp)  Largest (Mbp)  % Ns
      CAST/EiJ    2.69          382,843    0.644      4.75           11.4
      PWK/PhJ     2.53          271,282    0.390      4.0            6.3
      SPRET/EiJ   2.66          297,604    0.361      2.82           9.4
    REL-1412+Dovetail:
      Strain      Length (Gbp)  Scaffolds  N50 (Mbp)  Largest (Mbp)  % Ns
      CAST/EiJ    2.69          367,627    22.216     90.4           11.5
      PWK/PhJ     2.58          251,844    24.066     100.6          7.44
      SPRET/EiJ   2.66          272,127    23.475     88.6           9.5
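The jump in N50 after adding Dovetail data (from under 1 Mbp to over 22 Mbp) is easier to read with the definition in hand. A minimal sketch, assuming N50 is computed over scaffold lengths in the usual way: sort lengths from longest to shortest and report the length at which the running total first reaches half of all assembled bases. The toy lengths are invented for illustration and do not come from the tables.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid float division.
        if 2 * running >= total:
            return length
    return 0  # empty input


# Toy scaffold lengths in bp: total 300, and 100 + 80 = 180 >= 150.
print(n50([100, 80, 50, 40, 30]))  # 80
```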
  • 9. PacBio WGS alignments ➢ Proportion of WGS reads whose hits are all in one orientation vs. in mixed orientations (a lower mixed-orientation proportion is better)
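The metric on this slide can be expressed as a small function. This sketch assumes each alignment hit has already been reduced to a `(read_id, strand)` pair (for example from BAM records via pysam's `query_name` and `is_reverse` attributes); the function name and input format are hypothetical. A read counts as mixed if its hits fall on both strands, which may point to an inversion or misjoin in the assembly, or to a genuine structural difference.

```python
from collections import defaultdict


def mixed_orientation_fraction(hits):
    """Fraction of reads whose alignment hits fall on both strands.

    hits: iterable of (read_id, strand) pairs, one per alignment hit,
          with strand '+' or '-'.
    """
    strands_per_read = defaultdict(set)
    for read_id, strand in hits:
        strands_per_read[read_id].add(strand)
    if not strands_per_read:
        return 0.0
    # A read is "mixed" when its set of strands contains both '+' and '-'.
    mixed = sum(1 for strands in strands_per_read.values() if len(strands) > 1)
    return mixed / len(strands_per_read)


hits = [
    ("r1", "+"), ("r1", "+"),  # split alignment, consistent orientation
    ("r2", "+"), ("r2", "-"),  # mixed orientation
    ("r3", "-"),
    ("r4", "-"), ("r4", "-"),
]
print(mixed_orientation_fraction(hits))  # 0.25
```

Comparing this fraction between assembly versions (lower is better) mirrors the comparison on the slide.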
  • 10. Complex regions - Nlrp1 paralogs [Figure panels: pseudo-chromosomes (pre-Dovetail) vs. post-Dovetail] ➢ A dozen highly polymorphic complex loci ○ Major urinary proteins (MUPs), H2/MHC, IRG, Nlrp, etc. Jingtao Lilue, WTSI
  • 11. Pseudo-chromosomes (REL-1509)
      Strain       Length (Gbp)  Sequences (>2kb)  N50 (Mbp)  Largest (Mbp)  %N
      129S1_SvImJ  2.73          7,153             134.54     202.56         0.15
      A_J          2.63          4,687             129.07     194.20         0.11
      AKR_J        2.71          5,954             132.98     199.99         0.13
      BALB_cJ      2.63          3,824             129.64     194.91         0.11
      C3H_HeJ      2.70          4,069             133.07     200.88         0.14
      C57BL_6NJ    2.81          3,893             139.12     208.92         0.18
      CAST_EiJ     2.65          2,976             133.75     200.42         0.14
      CBA_J        2.92          5,465             144.78     216.63         0.21
      DBA_2J       2.61          4,104             128.21     192.93         0.11
      FVB_NJ       2.59          5,013             127.06     191.00         0.11
      LP_J         2.73          3,498             135.16     203.66         0.16
      NOD_ShiLtJ   2.98          5,551             147.35     223.33         0.23
      NZO_HlLtJ    2.70          7,022             132.96     199.80         0.14
      PWK_PhJ      2.60          5,085             127.27     191.61         0.11
      SPRET_EiJ    2.63          5,405             131.95     198.85         0.11
      WSB_EiJ      2.69          2,238             133.18     200.11         0.16
  • 12. Pseudo-chromosomes (REL-1509), continued ➢ Propose to make REL-1509 the first annotated reference genomes for the laboratory strains
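The summary columns in the REL-1509 table can be recomputed from a FASTA file. A minimal sketch, assuming `%N` is the percentage of N bases over total length and "Sequences (>2kb)" counts records longer than 2,000 bp; the slides do not spell out either definition, and the helper names are hypothetical.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def assembly_stats(text, min_len=2000):
    """Total length, number of sequences longer than min_len, and percent N."""
    seqs = [seq for _, seq in parse_fasta(text)]
    total = sum(len(s) for s in seqs)
    n_bases = sum(s.upper().count("N") for s in seqs)
    return {
        "length_bp": total,
        "sequences_over_min": sum(1 for s in seqs if len(s) > min_len),
        "percent_N": 100.0 * n_bases / total if total else 0.0,
    }


# Toy assembly: one 2,994 bp record and one 6 bp record, 6 N bases in total.
fasta = ">chr1\n" + "A" * 2990 + "NNNN\n>unplaced_1\nACGTNN\n"
stats = assembly_stats(fasta)
print(stats)  # length_bp 3000, sequences_over_min 1, percent_N 0.2
```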
  • 13. Gene prediction approach [Figure labels: Gencode M7 (C57BL/6J), strain-specific RNA-Seq evidence] Ian Fiddes, UCSC; Stefanie König, U. Greifswald; Mario Stanke, U. Greifswald
  • 14. Gene prediction approach ➢ TransMap: utilise as much of the Gencode C57BL/6J genome annotation as possible ○ Local Augustus: refine the liftover to allow small adjustments based on strain-specific RNA-Seq [Figure labels: Gencode M7 (C57BL/6J), strain-specific RNA-Seq evidence, TransMap, TransMap+local Augustus] Ian Fiddes, UCSC; Stefanie König, U. Greifswald; Mario Stanke, U. Greifswald
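TransMap projects the C57BL/6J annotation onto each strain through whole-genome alignments. The sketch below shows only the core idea of such a projection, using a simplified alignment made of sorted, same-strand, ungapped blocks; it is not TransMap's implementation, and the `Block` format and `lift_interval` name are hypothetical. Bases that fall into alignment gaps are lost, which is the kind of small discrepancy the local Augustus refinement can then adjust using strain-specific RNA-Seq.

```python
from typing import List, Tuple

# An ungapped alignment block: (ref_start, query_start, length), 0-based and
# half-open, both sequences on the same strand. Blocks are assumed to be
# sorted and non-overlapping.
Block = Tuple[int, int, int]


def lift_interval(start: int, end: int, blocks: List[Block]) -> List[Tuple[int, int]]:
    """Project a half-open reference interval onto the query genome.

    Returns the query intervals covered by the aligned part of [start, end).
    Bases inside alignment gaps are dropped, so an exon may come back
    fragmented or partly missing.
    """
    lifted = []
    for ref_start, query_start, length in blocks:
        ref_end = ref_start + length
        lo, hi = max(start, ref_start), min(end, ref_end)
        if lo < hi:  # the interval overlaps this block
            offset = query_start - ref_start
            lifted.append((lo + offset, hi + offset))
    return lifted


# Reference 0-100 aligns to query 0-100; reference 100-105 is deleted in the
# query; reference 105-200 aligns to query 100-195.
blocks = [(0, 0, 100), (105, 100, 95)]
print(lift_interval(90, 120, blocks))  # [(90, 100), (100, 115)]
```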
  • 15. How many genes have at least one fully correct transcript? Ian Fiddes, UCSC
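The question on this slide counts a gene as recovered if any one of its transcripts is fully correct. The sketch assumes, purely for illustration, that "fully correct" means the predicted intron chain exactly matches the reference transcript; the criterion actually used is not stated on the slide, and the function name is hypothetical.

```python
from collections import defaultdict


def genes_with_correct_transcript(transcripts):
    """Count genes that have at least one fully correct transcript.

    transcripts: iterable of (gene_id, predicted_introns, reference_introns),
    where each intron list holds (start, end) tuples. A transcript is treated
    as fully correct if its intron chain equals the reference (assumed rule).
    Returns (genes_with_a_correct_transcript, total_genes).
    """
    status = defaultdict(bool)
    for gene_id, predicted, reference in transcripts:
        # One exact match is enough to mark the whole gene as correct.
        status[gene_id] |= list(predicted) == list(reference)
    return sum(status.values()), len(status)


data = [
    ("GeneA", [(100, 200), (300, 400)], [(100, 200), (300, 400)]),  # exact match
    ("GeneA", [(100, 210)], [(100, 200)]),                           # shifted splice site
    ("GeneB", [(50, 80)], [(50, 90)]),                               # no exact match
]
print(genes_with_correct_transcript(data))  # (1, 2)
```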
  • 16. Gene prediction approach ➢ TransMap: lift over as much of the Gencode C57BL/6J genome annotation as possible ○ Local Augustus: refine the liftover to allow small adjustments based on strain-specific RNA-Seq ➢ Comparative gene prediction: Augustus CGP ○ Generate gene predictions based primarily on RNA-Seq evidence ○ Allows prediction of new transcripts and exons absent from C57BL/6J [Figure labels: Gencode M7 (C57BL/6J), strain-specific RNA-Seq evidence, TransMap, TransMap+local Augustus, Augustus CGP] Ian Fiddes, UCSC; Stefanie König, U. Greifswald; Mario Stanke, U. Greifswald
  • 17. Gene prediction approach ➢ TransMap: utilise as much of the Gencode C57BL/6J genome annotation as possible ○ Local Augustus: refine the liftover to allow small adjustments based on strain-specific RNA-Seq ➢ Comparative gene prediction: Augustus CGP ○ Generate gene predictions based primarily on RNA-Seq evidence ○ Allows prediction of new transcripts and exons absent from C57BL/6J [Figure labels: Gencode M7 (C57BL/6J), strain-specific RNA-Seq evidence, TransMap, TransMap+local Augustus, Augustus CGP, Consensus gene set] Ian Fiddes, UCSC; Stefanie König, U. Greifswald; Mario Stanke, U. Greifswald
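The slides do not say how the consensus gene set is assembled. As one hypothetical rule, the sketch keeps every TransMap-derived transcript and adds an Augustus CGP transcript only when it contains an exon that overlaps no projected exon, i.e. a candidate new exon absent from the C57BL/6J annotation. All names, and the single-chromosome, single-strand interval format, are assumptions.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # half-open (start, end) on one chromosome and strand


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def build_consensus(lifted: Dict[str, List[Exon]],
                    cgp: Dict[str, List[Exon]]) -> Dict[str, List[Exon]]:
    """Keep all lifted transcripts; add CGP transcripts carrying a novel exon.

    A CGP exon counts as novel here if it overlaps no exon of any lifted
    transcript (a deliberately simple, assumed criterion).
    """
    consensus = dict(lifted)
    lifted_exons = [exon for exons in lifted.values() for exon in exons]
    for tx_id, exons in cgp.items():
        has_novel = any(
            not any(overlaps(exon, known) for known in lifted_exons)
            for exon in exons
        )
        if has_novel:
            consensus[tx_id] = exons
    return consensus


lifted = {"tm_tx1": [(100, 200), (300, 400)]}
cgp = {
    "cgp_tx1": [(110, 200), (300, 400)],  # overlaps known exons: not added
    "cgp_tx2": [(100, 200), (500, 600)],  # exon 500-600 is new: added
}
print(sorted(build_consensus(lifted, cgp)))  # ['cgp_tx2', 'tm_tx1']
```

A production pipeline would also weigh evidence support and resolve conflicting models; this rule only illustrates how CGP can contribute exons that a liftover alone cannot.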
  • 18. Efcab13-Efcab3 hybrid Stefanie König, U. Greifswald Charlie Steward, WTSI
  • 21. Dnah14: dynein, axonemal, heavy chain 14 Stefanie König, U. Greifswald Charlie Steward, WTSI
  • 22. Gene extensions - Dnah14: NOT VALIDATED (YET)! Charlie Steward, WTSI; Stefanie König, U. Greifswald
  • 23. Complex regions - Nlrp1 paralogs [Figure panels labelled C57BL/6J, PWK/PhJ and PWK/PhJ assembly] Jingtao Lilue, WTSI
  • 24. How can I look at the genomes? http://hgwdev-mus-strain.sdsc.edu Mark Diekhans, UCSC Ian Fiddes, UCSC
  • 26. Change co-ordinate system to strain of interest http://hgwdev-mus-strain.sdsc.edu Mark Diekhans, UCSC Ian Fiddes, UCSC
  • 27. How can I look at the genomes? Developed and maintained by the Genome Reference Informatics Team http://mice-geval.sanger.ac.uk Kerstin Howe, WTSI
  • 28. Acknowledgements ➢ Wellcome Trust Sanger Institute ○ Anthony Doran, Kim Wong, Dirk-Dominik Dolle, Jingtao Lilue, Monica Abrudan ○ David Adams, Richard Durbin, Kerstin Howe, Jennifer Harrow, Charles Steward, Mark Thomas, Ruth Bennet, Jo Wood, James Torrance, Will Chow, Mike Quail, Matt Dunn, Marcela Sjoberg, James Gilbert, Ed Griffiths, Anne Ferguson-Smith ➢ UCSC ○ Benedict Paten, Joel Armstrong, Mark Diekhans, Dent Earl, Ian Fiddes ➢ EBI ○ David Thybert, Duncan Odom, Paul Flicek ➢ University of Greifswald ○ Mario Stanke, Stefanie König ➢ Salk Institute ○ Son Pham, Mikhail Kolmogorov ➢ Yale ○ Fabio Navarro, Cristina Sisu, Mark Gerstein ➢ Wellcome Trust Centre for Human Genetics ○ Jonathan Flint, Richard Mott, Leo Goodstadt ➢ Jackson Laboratory ○ Laura Reinholdt, Anne Czechanski ➢ URLs ○ http://www.sanger.ac.uk/science/data/mouse-genomes-project ○ http://hgwdev-mus-strain.sdsc.edu ○ http://mice-geval.sanger.ac.uk/index.html 2014-2017 2015-2018 Sequence Variation Infrastructure Group, WTSI