This document summarizes a presentation on using next generation sequencing to identify the causal variant associated with a complex phenotype like dystocia in cattle. It discusses selecting animals to sequence, the sequencing and analysis process, challenges in annotation and validation, and recent successes in identifying causal mutations for other traits in cattle. The presentation outlines using sequencing to investigate a quantitative trait locus for dystocia on chromosome 18 in cattle that affects traits like birth weight and gestation length. It describes analyzing sequence data to identify variants associated with predicted birth weight and discusses ongoing challenges in sequencing, analysis, and validating causal variants.
Use of NGS to identify the causal variant associated with a complex phenotype
1. J. B. Cole
Animal Improvement Programs Laboratory
Agricultural Research Service, USDA
Beltsville, MD 20705-2350, USA
john.cole@ars.usda.gov
2. Animal Sciences Group, Wageningen UR Livestock Research, The Netherlands, 29 May 2013 (2) Cole
Overview
Why are we sequencing?
How did we select the animals to sequence?
What are the steps involved in the process?
What do you do with the reads once you have them?
Where are we now?
3.
Introduction
Several studies (Kuhn et al., 2003; Cole et al., 2007; Seidenspinner et al., 2009) have reported QTL on BTA 18 associated with dystocia
Bioinformatic analysis using SNP data has not identified the causal variant
Next-generation sequencing (NGS) has recently been used to find causal variants for novel recessive disorders
4.
Chromosome 18 is different
Markers on chromosome 18 have large effects on several traits:
Dystocia and stillbirth: sire and daughter calving ease and sire stillbirth
Conformation: rump width, stature, strength, and body depth
Efficiency: longevity and net merit
Large calves contribute to reduced lifetimes and decreased profitability
5.
Marker effects for dystocia complex
ARS-BFGL-NGS-109285
Cole et al., 2009 (J. Dairy Sci. 92:2931–2946)
6.
Correlations in dystocia complex
7.
The QTL also affects gestation length
Maltecca et al. 2011. Animal Genetics, 42:6, 585-591.
8.
Overview of the dystocia complex
The key marker is ARS-BFGL-NGS-109285 (rs109478645) at 57,585,121 bp on BTA18
Intronic to SIGLEC12 (sialic acid binding Ig-like lectin 12)
Recent results indicate effects on gestation length (Maltecca et al., 2011) and calf birth weight (Cole et al., unpublished data)
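As a quick sanity check on the marker coordinates, the sketch below converts the position from base pairs to megabases and tests whether it falls inside a region written as chromosome:start-end. The helper names are hypothetical, and the window is the one that appears in the Ensembl link for this locus. It is an illustration only, not part of the original analysis.

```python
def parse_region(region):
    """Split a 'chrom:start-end' string into (chrom, start, end)."""
    chrom, span = region.split(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return chrom, start, end


def in_region(chrom, pos_bp, region):
    """Return True if a position lies inside the region (bounds inclusive)."""
    r_chrom, start, end = parse_region(region)
    return chrom == r_chrom and start <= pos_bp <= end


# Marker position from the slide (BTA18, in base pairs).
MARKER_CHROM = "18"
MARKER_POS_BP = 57_585_121

# Window taken from the Ensembl URL for this locus.
WINDOW = "18:57583000-57587000"

pos_mb = MARKER_POS_BP / 1_000_000
print(f"{pos_mb:.6f} Mb")
print(in_region(MARKER_CHROM, MARKER_POS_BP, WINDOW))
```

Working in base pairs and converting to megabases only for display avoids the unit mix-ups that creep in when positions are copied between browsers and slides.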
9.
This is a gene-rich region
http://useast.ensembl.org/Bos_taurus/Location/View?r=18%3A57583000-57587000
http://www.ncbi.nlm.nih.gov/gene?cmd=Retrieve&dopt=Graphics&list_uids=618463
10.
Copy number variants are present
ARS-BFGL-NGS-109285 is flanked by CNV
There’s a loss and a gain to the left (8 SNP region)
There’s a gain to the right (10 SNP region)
This can result in assembly problems
Hou et al. 2011. Genomic characteristics of cattle copy number variations. BMC Genomics. 12:127. http://www.biomedcentral.com/1471-2164/12/127
11.
Where did this problem come from?
http://aipl.arsusda.gov/CF-queries/Bull_Chromosomal_EBV/bull_chromosomal_ebv.cfm?
40,803 daughters
12.
What if we look at a different trait?
Cole et al. (2007) proposed the following mechanism:
SIGLEC12 may sequester circulating leptin
This increases gestation length
Calf birth weight (BW) is higher because of increased gestation length
Higher BW is associated with dystocia
13.
We don’t have birth weight data
Birth weights are not routinely recorded in the US
Collaborated with Hermann Swalve’s group to develop a selection index prediction of BW PTA
Performed GWAS and gene set enrichment analysis to search for interesting associations
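In its simplest form, a GWAS regresses the phenotype (here, the predicted BW PTA) on allele dosage one SNP at a time. The slides do not say which model was actually fitted, so the following is only a single-marker least-squares sketch. The function name and the toy numbers are invented for illustration.

```python
def snp_effect(dosages, phenotypes):
    """Least-squares slope and intercept of phenotype on allele dosage (0/1/2)."""
    n = len(dosages)
    if n != len(phenotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy data, invented for illustration only (not from the study).
toy_dosage = [0, 0, 1, 1, 2, 2]
toy_bw_pta = [-1.0, -0.6, 0.1, 0.3, 0.9, 1.3]

effect, intercept = snp_effect(toy_dosage, toy_bw_pta)
print(f"allele substitution effect: {effect:.3f}")
```

A real analysis would repeat this across all markers, account for relationships among animals, and correct for multiple testing. None of that is shown here.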
14.
GWAS for birth weight PTA
Cole et al. (2013), unpublished data
15.
Are we measuring anything new?
Identified a SNP intronic to LHX4, which is associated with cow body weight and length (Ren et al., 2010, Mol. Biol. Rep., 37:417-422).
4 SNP in the QTL region on BTA 18 had large effects
Several other SNP with large effects were intronic or adjacent to genes with unknown functions
16.
KEGG pathways for birth weight
What does regulation of the actin cytoskeleton have to do with birth weight in cattle?
That is, do these results make sense?
Maybe… these pathways may be involved in establishment & maintenance of pregnancy, as well as coordination of growth and development.
Cole et al. (2013), unpublished data
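Whether a pathway "makes sense" is a biological question, but whether it has more hits than expected by chance can be checked numerically. The slides do not state which enrichment test was used. The sketch below assumes a one-sided hypergeometric test, a common choice for this kind of analysis, and the gene counts are invented.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_pathway, n_selected, n_overlap):
    """P(X >= n_overlap) when n_selected genes are drawn without replacement
    from n_genes, of which n_in_pathway belong to the pathway."""
    max_overlap = min(n_in_pathway, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_genes - n_in_pathway, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Invented counts: 20,000 annotated genes, 200 in the pathway,
# 100 genes near significant SNP, 5 of which are in the pathway.
p = enrichment_p_value(20_000, 200, 100, 5)
print(f"enrichment p-value: {p:.4g}")
```

With these made-up counts only about one overlapping gene is expected by chance, so five would be a notable excess. A small p-value still says nothing about whether the pathway is causally involved.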
17.
Sequencing is becoming very affordable
18.
Sequencing successes at AIPL/BFGL
Simple loss-of-function mutations
APAF1 – Spontaneous abortions in Holstein cattle (Adams et al., 2012)
CWC15 – Early embryonic death in Jersey cattle (Sonstegard et al., 2013)
Weaver syndrome – Neurological degeneration and death in Brown Swiss cattle (McClure et al., 2013)
19.
Original pedigree-based design
Bull A (1968)
AA, SCE: 8
Bull B (1962)
AA, SCE: 7
MGS
Bull H (1989)
Aa, SCE: 14
Bull I (1994)
Aa, SCE: 18
Bull E (1982)
Aa, SCE: 8
Bull F (1987)
Aa, SCE: 15
Bull C (1975)
AA, SCE: 8
= 10δ
Bull D (1968)
??, SCE: 7
MGS
Bull E (1974)
Aa, SCE: 10
MGS
20.
Modified pedigree & haplotype design
Bull A (1968)
AA, SCE: 8
Bull B (1962)
AA, SCE: 7
MGS
Bull H (1989)
Aa, SCE: 14
Bull I (1994)
Aa, SCE: 18
Bull E (1982)
Aa, SCE: 8
Bull F (1987)
Aa, SCE: 15
Bull C (1975)
AA, SCE: 8
= 10δ
Bull E (1974)
Aa, SCE: 10
MGS
These bulls carry the haplotype with the largest negative effect on SCE:
Bull J (2002)
Aa, SCE: 6
Bull K (2002)
Aa, SCE: 15
Bull J (2002)
aa, SCE: 15
Couldn’t obtain DNA:
Bull D (1968)
??, SCE: 7
21.
Molecular prep
Sample Collection
DNA Extraction
DNA Quality
Library Construction
Library Quality Control
22.
Sample preparation time is substantial
DNA Extraction: ~12 hours (30 mins)
DNA QC: ~1-2 hours (1-2 hours)
Library Construction: 48 hours (12 hours)
Library QC: ~2-4 hours (1 hour)
Total: 3-4 days (15.5 hours)
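The totals on this slide can be checked by summing the per-step figures. The slide does not label the numbers in parentheses. The sketch below assumes they are hands-on time, which is at least consistent with their sum.

```python
# Each entry: (step, elapsed hours (low, high), parenthetical hours (low, high)).
# Values are copied from the slide; treating the parenthetical figures as
# hands-on time is an assumption.
STEPS = [
    ("DNA extraction", (12, 12), (0.5, 0.5)),
    ("DNA QC", (1, 2), (1, 2)),
    ("Library construction", (48, 48), (12, 12)),
    ("Library QC", (2, 4), (1, 1)),
]


def total_hours(column):
    """Sum the low and high ends of one time column across all steps."""
    low = sum(step[column][0] for step in STEPS)
    high = sum(step[column][1] for step in STEPS)
    return low, high


elapsed = total_hours(1)
hands_on = total_hours(2)
print(f"elapsed: {elapsed[0]}-{elapsed[1]} hours")
print(f"hands-on (assumed): {hands_on[0]}-{hands_on[1]} hours")
```

The elapsed steps sum to 63-66 hours, a little under three days if run back to back, so the stated 3-4 days presumably allows for working hours. The upper end of the parenthetical column reproduces the 15.5 hours on the slide.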
23.
DNA quality
24.
Library quality
25.
Sequencing stage
• Illumina cBot:
• Preps DNA for sequencing
• Takes 4-5 hours
• Must be done 48 hours before
• Illumina HiSeq 2000:
• Does the sequencing
• Takes ~10-14 days for 100 x 100
• Minimal hands-on time
26.
Anatomy of a flow cell
8 lanes per flow cell
3 columns per lane
− 96 tiles per column
Each tile imaged 8 times
1 from upper surface, 1 from lower
Approximately 300 Gb of sequence per flow cell
http://www.qbi.uq.edu.au/images/genomics/genomics1.jpg
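The flow cell figures translate directly into an expected sequencing depth per animal. The sketch below divides the stated yield by the number of lanes and then by an approximate bovine genome size of 2.7 Gb. The genome size and the function names are assumptions that do not come from the slides.

```python
FLOW_CELL_YIELD_GB = 300   # from the slide: ~300 Gb per flow cell
LANES_PER_FLOW_CELL = 8    # from the slide
GENOME_SIZE_GB = 2.7       # assumption: approximate bovine genome size


def per_lane_yield_gb(flow_cell_gb, lanes):
    """Average sequence yield per lane, in gigabases."""
    return flow_cell_gb / lanes


def raw_coverage(yield_gb, genome_gb):
    """Raw fold-coverage before filtering, duplicate removal, or mapping losses."""
    return yield_gb / genome_gb


lane_gb = per_lane_yield_gb(FLOW_CELL_YIELD_GB, LANES_PER_FLOW_CELL)
depth = raw_coverage(lane_gb, GENOME_SIZE_GB)
print(f"{lane_gb:.1f} Gb per lane, about {depth:.0f}x raw coverage per lane")
```

This is an upper bound per lane. Usable depth after quality filtering and alignment will be lower.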
27.
Sequencing by synthesis
https://www.broadinstitute.org/files/shared/illuminavids/sequencingSlides.pdf
28.
How many scientists does it take…
29.
Flowcell 1: Cluster densities
Cluster densities from current HiSeq run finished 30 April 2013 (unpublished data):
30.
Flowcell 2: Cluster densities
Cluster densities from current HiSeq run started 22 May 2013 (unpublished data):
31.
The Aftermath
Total Time (sample to sequence):
3 weeks
That’s assuming nothing went wrong!
More realistic: months
Resulting Data
Large text files
~300 gigabytes compressed
Analysis
Often underestimated
Can take months as well
32.
Variant detection
• Alignment against a reference
genome
• Analysis is very disk I/O-intensive.
Raw Sequencer Output → Alignment to the Genome → Variant Detection
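To make the last stage concrete, here is a deliberately tiny pileup-style SNP caller. It assumes the reads have already been aligned, ignores indels and base qualities, and uses made-up thresholds and sequences. Production work relies on dedicated aligners and variant callers, none of which are named on the slide, so this should be read as a toy illustration of the idea only.

```python
from collections import Counter, defaultdict


def call_variants(reference, aligned_reads, min_depth=3, min_alt_fraction=0.3):
    """Return {position: (ref_base, alt_base, alt_count, depth)} for simple SNPs.

    aligned_reads is a list of (start, sequence) pairs with 0-based start
    positions; indels and base qualities are ignored in this toy version.
    """
    pileup = defaultdict(Counter)
    for start, seq in aligned_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    variants = {}
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        ref_base = reference[pos]
        alt_counts = {b: c for b, c in counts.items() if b != ref_base}
        if depth < min_depth or not alt_counts:
            continue
        # Report only the most frequent non-reference base at this position.
        alt_base, alt_count = max(alt_counts.items(), key=lambda kv: kv[1])
        if alt_count / depth >= min_alt_fraction:
            variants[pos] = (ref_base, alt_base, alt_count, depth)
    return variants


# Invented reference and reads; three of four reads carry a T at position 4.
reference = "ACGTACGTAC"
reads = [
    (0, "ACGTAC"),
    (2, "GTTCGT"),
    (3, "TTCGTA"),
    (4, "TCGTAC"),
]
print(call_variants(reference, reads))
```

Even this toy version touches every base of every read, which hints at why the real analysis is so I/O-intensive at whole-genome scale.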
33.
Computational Logistics
Desktop computers
Viable for single lanes
Long computation time
Servers are better
>100 GB RAM and >16 processor cores
Cloud
Amazon Web Services
iAnimal/iPlant
34.
Storage considerations
What to save?
Raw data?
Processed results?
How much workspace?
Suggestions:
Workspace 10x compressed files
Save alignments
Backup REGULARLY!!!
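The "10x compressed files" suggestion is easy to turn into a number. The sketch below applies it to the ~300 gigabytes of compressed data mentioned earlier in the talk, treated here as the output of a single run, which is an assumption. The function name is hypothetical.

```python
def workspace_gb(compressed_gb, multiplier=10, runs=1):
    """Scratch space implied by the '10x compressed files' rule of thumb."""
    return compressed_gb * multiplier * runs


COMPRESSED_PER_RUN_GB = 300  # "~300 gigabytes compressed" from the talk

needed = workspace_gb(COMPRESSED_PER_RUN_GB)
print(f"{needed} GB (about {needed / 1000:.0f} TB) of workspace per run")
```

Backups of raw data and saved alignments come on top of this working space.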
35.
Why should you use a pipeline?
• Automates analysis
• Maximizes resource utilization
• Because post-docs aren’t cheap
36.
Many options for analysis pipelines
Galaxy server
NextGene
Custom pipeline
Scripting languages
Open-source tools
37.
Challenges
Annotation
This is a mess in the cow
The reference assembly may not be representative of all taurine cows
Validation
Doing functional genomics with large mammals is expensive – who pays?
When have we proven something?
38.
Conclusions
Sequencing is powerful, but presents many challenges
Computational requirements are substantial
We’re learning how much we don’t know about functional genomics in the cow
Validation remains a problem
39.
Acknowledgments
AIPL: Derek Bickhart, Dan Null, Paul VanRaden
BFGL: Reuben Anderson, Steve Schroeder, Tad Sonstegard, Curt Van Tassell
40.
Questions?
http://gigaom.com/2012/05/31/t-mobile-pits-its-math-against-verizons-the-loser-common-sense/shutterstock_76826245/