Mammalian DNA regulatory regions predicted by
utilizing human genomics, transcriptomics and
epigenetics data
Quan H. Nguyen, Ross L. Tellam, Marina Naval-Sanchez, Laercio R. Porto-Neto, William Barendse, Antonio Reverter,
Benjamin Hayes, James Kijas, and Brian P. Dalrymple
Commonwealth Scientific and Industrial Research Organisation (CSIRO), Livestock Genomics, Brisbane, Australia
(Carlson et al., 2016, Nat Biotech)
Why are we searching for DNA regulatory regions?
• There are genome assemblies for many species
• We know little about which parts of the genome are
functional:
– We know mostly about protein-coding genes (~ 2% of the genome)
– Coding genes are mostly similar (in sequence and number) between
mammalian species
– The control of gene expression distinguishes species, individuals, and
tissues
– Regulatory DNA sequences are binding sites of transcriptional regulation
proteins (e.g. transcription factors)
• To utilise the genome information, we need to explore
beyond protein-coding sequences (i.e. regulatory
sequences)
[Figure labels: Functional Annotation of Animal Genomes (FAANG); Proteins]
How to identify regulatory regions experimentally?
• Regulatory regions are identified by a combination of:
1. Epigenetics data: histone modifications (e.g. ChIP-seq for H3K27me3, H4K20me1…), DNA methylation (WGBS)
2. Genomics data: open-chromatin assays (DNase), chromatin interactions (Hi-C)
3. Transcriptomics data: RNA-seq, CAGE
*ROADMAP consortium, Nature, 2015
[Figure labels: Promoter, Inactive, Enhancer]
Human Atlases and Encyclopedia of Regulatory Databases
• Abundant human data are available across many 1) cell types, 2) tissues, and 3) assay types
• Far fewer data exist for other species
• We developed a method, HPRS (Human Projection of Regulatory Sequences), to map data in
humans to the genomes of other species
*ROADMAP consortium, Nature, 2015
HPRS predicts three broad categories of regulatory regions
• Promoters: more conserved, relatively easy to identify,
potentially many novel promoters of non-coding genes and
alternative transcription start sites
• Enhancers: less conserved, more tissue/cell type specific
• Other regulatory sequences: defined by
transcription factor binding sites
1. Map
2. Filter
3. Use
Map: Selecting datasets from humans
Dataset          | Number of regions | Region types | Tissues/cell lines | Data types
ENCODE           | 108,000           | TF binding   | 0/12               | ChIP-seq
ROADMAP          | 5,917,129         | Enhancers    | 48/40              | ChIP-seq, DNase I
FANTOM Enhancers | 43,011            | Enhancers    | 135/673            | CAGE
ENSEMBL          | 2,427,934         | Enhancers    | 0/18               | ChIP-seq, DNase I
FANTOM Promoters | 201,802           | Promoters    | 152/823            | CAGE
• Datasets for mapping promoters, enhancers and
transcription factors are selected so that they
represent:
- Different tissues and cell lines
- Different biochemical assay types
Map: Maximizing enhancer coverage prediction
*Villar et al., 2015 Cell;160(3):554-66
• Mapping is based on inter-species conservation at two levels: 1)
primary sequence and 2) genome organization (relative locations
between regions)
• We optimized mapping parameters and mapping strategies (reciprocal
map & multimap) to recover most reference enhancers and promoters
• For example, a lower similarity threshold gave higher coverage of the
cattle liver reference enhancer dataset* but did not improve specificity
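The coverage measure discussed above can be illustrated with a minimal sketch. It assumes regions are (chromosome, start, end) tuples with half-open coordinates, and it counts a reference enhancer as recovered when at least one mapped region overlaps it. The function names and toy coordinates are hypothetical and are not taken from the HPRS pipeline.

```python
from collections import defaultdict


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def coverage(reference, predicted):
    """Fraction of reference regions overlapped by at least one predicted region."""
    if not reference:
        return 0.0
    # Group predicted intervals by chromosome so each reference region
    # is only compared with regions on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in predicted:
        by_chrom[chrom].append((start, end))
    recovered = sum(
        any(overlaps(start, end, p_start, p_end) for p_start, p_end in by_chrom[chrom])
        for chrom, start, end in reference
    )
    return recovered / len(reference)


# Toy data (made up): four reference enhancers and two mapping runs.
reference = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150), ("chr2", 900, 1000)]
strict_mapping = [("chr1", 150, 250)]
# Assumption for illustration: a lower similarity threshold maps additional regions.
relaxed_mapping = strict_mapping + [("chr1", 590, 700), ("chr2", 100, 120)]

print(coverage(reference, strict_mapping), coverage(reference, relaxed_mapping))
```

Coverage alone says nothing about specificity: the relaxed run may also add regions that overlap no reference enhancer, which is why the filtering steps follow the mapping.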
Filter: Seven steps for filtering mappable regulatory regions
• Each filter step recovers a dataset with more promoters/enhancers per Mb than the initial baseline (whole genome without HPRS):
– Filter 1: H3K27ac is a histone modification mark for enhancers
– Filter 2: CAGE measures bidirectional promoters as a signature for enhancers
– Filter 3: Enhancer activity scored by SVM (support vector machine)*
– Filter 4: RNA-seq measures active transcription
– Filter 5: Number of regulatory features mapped to the region
– Filter 6: Sequence conservation (across 100 vertebrates)
– Filter 7: Number of predicted transcription factor binding sites
*Lee et al., Nat Gen, 2015
• Started with 729,246 non-overlapping regions in the cattle genome (the mapping created a universal dataset)
• The number of Villar et al.* cattle liver reference promoters (P) and enhancers (E) per 1 Mb of region length was used to refine the filtering parameters
• Filtered dataset: ~7-fold enrichment in cattle liver reference enhancers and promoters* and ~4-fold reduction in regions
• The filtered dataset contains 70% and 79% of the enhancers and promoters, respectively, in the cattle liver reference set*
*Villar et al., 2015 Cell;160(3):554-66
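The per-Mb density used to tune the filters can be sketched as follows. This is an illustrative calculation only: it assumes enrichment is the ratio of reference features per Mb in a filtered set to the same density in a baseline set. The function names and the counts in the example are made up and are not the values from the slides.

```python
def per_mb(n_features, total_bp):
    """Number of reference features per megabase of sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return n_features / (total_bp / 1_000_000)


def fold_enrichment(n_subset, bp_subset, n_baseline, bp_baseline):
    """Density in the subset divided by density in the baseline."""
    return per_mb(n_subset, bp_subset) / per_mb(n_baseline, bp_baseline)


# Hypothetical counts: 300 reference enhancers fall in 50 Mb of filtered
# regions, versus 1,500 in a 1,000 Mb baseline.
filtered_density = per_mb(300, 50_000_000)
baseline_density = per_mb(1_500, 1_000_000_000)
print(filtered_density, baseline_density,
      fold_enrichment(300, 50_000_000, 1_500, 1_000_000_000))
```

Normalising by sequence length is what lets a smaller filtered set be compared fairly with the whole genome.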
HPRS prediction for 10 species
• We mapped 42 ROADMAP tissues
to 10 species
• Data from more biologically related
tissues produced higher coverage
(highest in liver tissue)
• Data from more closely related species produced higher coverage
(highest in monkeys: macaque and marmoset)
• Combining multiple tissues
increased the enhancer coverage
markedly, to 65-87%
Enrichment of associated SNPs in regulatory regions
*Bolormaa et al., 2014, PLOS Genetics; 10 (3) e1004198
• There are tens of millions of SNPs (single nucleotide polymorphisms) in a genome:
- Commercial SNP arrays are small (~50,000 SNPs, i.e. 0.5% of the total SNPs)
- SNPs affecting protein functions: a minority, ~5%
- SNPs affecting gene expression (regulatory SNPs): ~95%
• We tested significant SNPs for 32 traits (feed intake, growth, body composition and reproduction) in 10,191 beef cattle*
• Substantial fold enrichment of low p-value SNPs in the regulatory set vs. all other sets, including the set of SNPs within 5 kb upstream of protein-coding genes
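One way to express such an enrichment is sketched below. It assumes fold enrichment means the fraction of SNPs below a p-value threshold inside a SNP set, divided by the same fraction in a background set. The slides do not state the exact formula, so this definition, the function names, the threshold, and the p-values are all assumptions for illustration.

```python
def fraction_significant(p_values, threshold):
    """Fraction of SNPs whose association p-value is below the threshold."""
    if not p_values:
        return 0.0
    return sum(p < threshold for p in p_values) / len(p_values)


def snp_fold_enrichment(set_p_values, background_p_values, threshold=1e-4):
    """Ratio of significant fractions: SNP set versus background."""
    background = fraction_significant(background_p_values, threshold)
    if background == 0:
        return float("nan")  # undefined when the background has no significant SNPs
    return fraction_significant(set_p_values, threshold) / background


# Made-up p-values: SNPs inside predicted regulatory regions vs. a background set.
regulatory_p = [1e-6, 1e-5, 0.2, 0.5]
background_p = [1e-6, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7]

print(snp_fold_enrichment(regulatory_p, background_p))
```

The same function can be pointed at any comparison set, for example SNPs within 5 kb upstream of protein-coding genes, to compare selection strategies on equal terms.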
HPRS predicted results guide the selection of causative SNPs
[Figure label: 13 GWAS SNPs]
*Karim et al., 2011, Nat. Genetics
• 13 SNPs in the PLAG1 (pleomorphic
adenoma gene 1) region are
significantly associated with the cattle
stature (lower height) phenotype*
• We found 2 SNPs within promoters
and 1 SNP within an intergenic
enhancer
• The 2 SNPs at the promoter region
were validated by Karim et al.* to
change promoter activity
Using the HPRS dataset to understand mechanism - polled
• Two possible causal mutations of the polled phenotype*:
– A 212-base insertion/10-base deletion mutation
– A ~80 kb duplication, ~300 kb away
• We found that the deletion mutation is located within a predicted enhancer and a HAND1 transcription factor binding site
• Deletion of the HAND1 binding site may lead to downregulation of OLIG1, OLIG2 and lincRNA2 (via distal enhancer interaction)
[Figure labels: Hand1, Celtic deletion, Enhancer, Enhancer targets: OLIG1, lincRNA2]
*Allais-Bonnet et al., 2013, PLOS ONE 8 e63512
Summary
1. Human data are useful for predicting regulatory sequences in other
species (via the HPRS mapping and filtering pipelines)
2. HPRS is a fast and economical approach, applicable when most data in a
target species are not available
3. SNPs significantly associated with phenotypes are enriched in the
predicted regulatory sequences (more enriched than traditional SNP
selection based on known coding regions)
4. HPRS results can contribute to genomics technology development, for
instance: to design a new generation causative SNP chip for large-scale
genotyping, or to predict regulatory targets as candidates for genome
editing
Acknowledgements
• CSIRO:
- Brian P. Dalrymple
- Juca Porto-Neto
- Ross L. Tellam
- James Kijas
- Bill Barendse
- Marina Naval-Sanchez
- Antonio Reverter
• QAAFI: Ben Hayes
• Funding: CSIRO OCE fellowship
St Lucia Campus, Brisbane, Australia
A machine learning tool to predict SNP effects in an enhancer
Red arrows show SNPs
• gkm-SVM (gapped k-mer support vector machine) scores regulatory activity by comparing enhancer with non-enhancer regions
• deltaSVM scores were calculated for every base across the cattle ALDOB enhancer (projected from human)
• deltaSVM scores dropped markedly at positions overlapping transcription factor binding sites (indicating loss of binding if a mutation occurs)
*Lee et al., Nat Gen, 2015
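The idea behind a deltaSVM score can be sketched in simplified form. The sketch assumes each k-mer carries a weight from a trained model, and that the score of a substitution is the summed weight of all k-mers spanning the variant in the alternate sequence minus the same sum for the reference. The k-mer weights, the short k, and the sequence are toy values. Details of the published method, such as its k-mer length and reverse-complement handling, are deliberately left out.

```python
def delta_svm(seq, pos, alt, weights, k):
    """Summed k-mer weight change caused by substituting seq[pos] with alt."""
    mutated = seq[:pos] + alt + seq[pos + 1:]
    # Only k-mers whose window covers `pos` can change.
    first = max(0, pos - k + 1)
    last = min(pos, len(seq) - k)
    ref_total = alt_total = 0.0
    for start in range(first, last + 1):
        ref_total += weights.get(seq[start:start + k], 0.0)  # unseen k-mers score 0
        alt_total += weights.get(mutated[start:start + k], 0.0)
    return alt_total - ref_total


def scan(seq, weights, k, alphabet="ACGT"):
    """For every base, the most negative score over all possible substitutions."""
    return [
        min(delta_svm(seq, pos, alt, weights, k) for alt in alphabet if alt != seq[pos])
        for pos in range(len(seq))
    ]


# Toy model (hypothetical weights): two 3-mers that resemble a binding motif.
toy_weights = {"CGT": 2.0, "GTG": 1.0}
toy_seq = "ACGTGA"

print(delta_svm(toy_seq, 3, "A", toy_weights, k=3))
print(scan(toy_seq, toy_weights, k=3))
```

Positions inside the highly weighted k-mers give the most negative scores in this toy case, which mirrors the drop at transcription factor binding sites described above.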

Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
VIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C PVIRUSES structure and classification ppt by Dr.Prince C P
VIRUSES structure and classification ppt by Dr.Prince C P
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral AnalysisRaman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
Raman spectroscopy.pptx M Pharm, M Sc, Advanced Spectral Analysis
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 

Quan Nguyen at #ICG12: Mammalian genomic regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data

  • 1. Mammalian DNA regulatory regions predicted by utilizing human genomics, transcriptomics and epigenetics data
      Quan H. Nguyen, Ross L. Tellam, Marina Naval-Sanchez, Laercio R. Porto-Neto, William Barendse, Antonio Reverter, Benjamin Hayes, James Kijas, and Brian P. Dalrymple
      Commonwealth Scientific and Industrial Research Organisation (CSIRO), Livestock Genomics, Brisbane, Australia
      (Carlson et al., 2016, Nat Biotech)
  • 2. Why are we searching for DNA regulatory regions?
      • There are genome assemblies for many species
      • We know little about which parts of the genome are functional:
          - We know mostly about protein-coding genes (~2% of the genome)
          - Coding genes are mostly similar (in sequence and number) between mammalian species
          - The control of gene expression distinguishes species, individuals, and tissues
          - Regulatory DNA sequences are binding sites for transcriptional regulation proteins (e.g. transcription factors)
      • To utilise the genome information, we need to explore beyond protein-coding sequences (i.e. regulatory sequences)
      • Functional Annotation of Animal Genomes (FAANG)
  • 3. How to identify regulatory regions experimentally?
      • Regulatory regions are identified by a combination of:
          1. Epigenetics data: histone modifications (e.g. ChIP-seq for H3K27me3, H4K20me1…), DNA methylation (WGBS)
          2. Genomics data: open-chromatin assays (DNase), chromatin interactions (Hi-C)
          3. Transcriptomics data: RNA-seq, CAGE
      [Figure labels: Promoter, Inactive, Enhancer]
      *ROADMAP consortium, Nature, 2015
  • 4. Human Atlases and Encyclopedia of Regulatory Databases
      • Lots of human data are available for many: 1) cell types, 2) tissues, and 3) assay types
      • Far fewer data exist for other species
      • We developed a method, HPRS (Human Projection of Regulatory Sequences), to map data from humans to the genomes of other species
      *ROADMAP consortium, Nature, 2015
  • 5. HPRS predicts three broad categories of regulatory regions
      • Promoters: more conserved, relatively easy to identify, potentially many novel promoters of non-coding genes and alternative transcription start sites
      • Enhancers: less conserved, more tissue/cell type specific
      • Other regulatory sequences: defined by transcription factor binding sites
      Pipeline steps: 1. Map, 2. Filter, 3. Use
  • 6. Map: Selecting datasets from humans
      • Datasets for mapping promoters, enhancers and transcription factors are selected so that they represent:
          - Different tissues and cell lines
          - Different biochemical assay types

      Dataset            Number of regions   Region types   Tissues/cell lines   Data types
      ENCODE             108,000             TF binding     0/12                 ChIP-seq
      ROADMAP            5,917,129           Enhancers      48/40                ChIP-seq, DNase I
      FANTOM Enhancers   43,011              Enhancers      135/673              CAGE
      ENSEMBL            2,427,934           Enhancers      0/18                 ChIP-seq, DNase I
      FANTOM Promoters   201,802             Promoters      152/823              CAGE
  • 7. Map: Maximizing enhancer coverage prediction
      • Mapping is based on inter-species conservation at two levels: 1) primary sequence and 2) genome organization (relative locations between regions)
      • We optimized mapping parameters and mapping strategies (reciprocal map & multimap) to recover most reference enhancers and promoters
      • For example, we found that a lower similarity threshold gave higher coverage of the cattle liver reference enhancer dataset*, but did not improve specificity
      *Villar et al., 2015 Cell;160(3):554-66
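The coverage measure mentioned on the Map slide can be illustrated with a minimal sketch. It assumes coverage means the fraction of reference regions overlapped by at least one mapped region; the slides do not state the exact overlap rule, and the function names and toy intervals below are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def coverage(reference, predicted):
    """Fraction of reference regions overlapped by at least one predicted region."""
    if not reference:
        raise ValueError("reference set is empty")
    hits = sum(1 for ref in reference if any(overlaps(ref, p) for p in predicted))
    return hits / len(reference)


# Toy intervals (hypothetical, not real cattle coordinates).
reference = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80), ("chr2", 300, 400)]
predicted = [("chr1", 150, 260), ("chr1", 590, 700), ("chr2", 80, 120), ("chr2", 310, 320)]

# Three of the four reference regions are overlapped (adjacent intervals do not count).
print(coverage(reference, predicted))
```

The nested loop is quadratic and only suitable for small examples; genome-scale interval sets would normally be handled with a sorted or indexed interval structure.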
  • 8. Filter: Seven steps for filtering mappable regulatory regions
      • Each filter step recovers a dataset with more promoters/enhancers per Mb than the initial baseline (whole genome without HPRS):
          - Filter 1: H3K27ac is the histone modification mark for enhancers
          - Filter 2: CAGE measures bidirectional promoters as a signature of enhancers
          - Filter 3: Enhancer activity scored by SVM (support vector machine)*
          - Filter 4: RNA-seq measures active transcription
          - Filter 5: Number of regulatory features mapped to the region
          - Filter 6: Sequence conservation (across 100 vertebrates)
          - Filter 7: Number of predicted transcription factor binding sites
      *Lee et al., Nat Gen, 2015
  • 9. Filter: Seven steps for filtering mappable regulatory regions
      • We started with 729,246 non-overlapping regions in the cattle genome (the mapping created a universal dataset)
      • The number of Villar et al.* cattle liver reference promoters (P) and enhancers (E) per 1 Mb of sequence was used to refine the filtering parameters
      • Filtered dataset: ~7-fold enrichment in cattle liver reference enhancers and promoters* and ~4-fold reduction in the number of regions
      • The filtered dataset contains 70% of the enhancers and 79% of the promoters in the cattle liver reference set*
      *Villar et al., 2015 Cell;160(3):554-66
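The per-Mb density used to tune the filters, and the fold enrichment derived from it, can be written out as a short sketch. It assumes fold enrichment is the density of reference elements in the filtered set divided by the density across the whole genome; the function names and numbers are hypothetical and are not the values reported on the slide.

```python
def per_mb(n_elements, total_bp):
    """Density of reference elements per megabase of sequence."""
    if total_bp <= 0:
        raise ValueError("total_bp must be positive")
    return n_elements / (total_bp / 1_000_000)


def fold_enrichment(hits_in_set, set_bp, hits_in_genome, genome_bp):
    """Ratio of the density in a region set to the genome-wide baseline density."""
    baseline = per_mb(hits_in_genome, genome_bp)
    if baseline == 0:
        raise ValueError("baseline density is zero")
    return per_mb(hits_in_set, set_bp) / baseline


# Toy numbers (hypothetical): 30 reference enhancers in a 5 Mb filtered set,
# against 200 reference enhancers across a 100 Mb genome.
filtered_density = per_mb(30, 5_000_000)
baseline_density = per_mb(200, 100_000_000)
enrichment = fold_enrichment(30, 5_000_000, 200, 100_000_000)
print(filtered_density, baseline_density, enrichment)
```

Comparing each filter step against the same genome-wide baseline keeps the steps comparable with one another.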
  • 10. HPRS prediction for 10 species
      • We mapped 42 ROADMAP tissues to 10 species
      • Data from more biologically related tissues produced higher coverage (highest in liver tissue)
      • Data from more evolutionarily related species produced higher coverage (highest in monkeys: macaque and marmoset)
      • Combining multiple tissues increased the enhancer coverage markedly, to 65-87%
  • 11. Enrichment of associated SNPs in regulatory regions
      • There are tens of millions of single nucleotide polymorphisms (SNPs) in a genome:
          - Commercial SNP arrays are small (~50,000 SNPs, i.e. 0.5% of the total SNPs)
          - SNPs affecting protein functions: minority, ~5%
          - SNPs affecting gene expression (regulatory SNPs): ~95%
      • We tested significant SNPs for 32 traits (feed intake, growth, body composition and reproduction) in 10,191 beef cattle*
      • Substantial fold enrichment of low p-value SNPs in the regulatory set vs. all other sets, including the set of SNPs 5 kb upstream of protein-coding genes
      *Bolormaa et al., 2014, PLOS Genetics; 10 (3) e1004198
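One way to express the enrichment of low p-value SNPs is as a ratio of proportions. The sketch below assumes this definition and an arbitrary p-value cutoff; the slides do not give the exact statistic, and the function name and p-values are hypothetical.

```python
def snp_fold_enrichment(snps, threshold):
    """Fold enrichment of low p-value SNPs in the regulatory set vs. the rest.

    snps: iterable of (p_value, in_regulatory) pairs, in_regulatory being a bool.
    """
    regulatory = [p for p, in_reg in snps if in_reg]
    other = [p for p, in_reg in snps if not in_reg]
    if not regulatory or not other:
        raise ValueError("both SNP groups must be non-empty")
    frac_regulatory = sum(p < threshold for p in regulatory) / len(regulatory)
    frac_other = sum(p < threshold for p in other) / len(other)
    if frac_other == 0:
        raise ValueError("no low p-value SNPs in the comparison set")
    return frac_regulatory / frac_other


# Toy association results (hypothetical p-values).
toy_snps = (
    [(p, True) for p in (1e-6, 0.2, 1e-4, 0.5)]
    + [(p, False) for p in (0.3, 1e-5, 0.8, 0.6, 0.9, 0.4, 0.7, 0.1)]
)

# 2 of 4 regulatory SNPs vs. 1 of 8 other SNPs fall below the cutoff.
print(snp_fold_enrichment(toy_snps, threshold=1e-3))
```

A ratio alone says nothing about significance; a real analysis would pair it with a statistical test and account for linkage between neighbouring SNPs.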
  • 12. HPRS predicted results guide the selection of causative SNPs
      • 13 SNPs in the PLAG1 (pleomorphic adenoma gene-1) region are significantly associated with the cattle stature (lower height) phenotype*
      • We found 2 SNPs within promoters and 1 SNP within an intergenic enhancer
      • The 2 SNPs in the promoter region were validated by Karim et al.* to change promoter activity
      [Figure label: 13 GWAS SNP]
      *Karim et al., 2011, Nat. Genetics
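Placing GWAS SNPs into predicted promoters or enhancers is a position-in-interval lookup. The sketch below is a minimal illustration under that assumption; the function name, chromosome label, SNP identifiers and coordinates are all hypothetical and are not the PLAG1 data.

```python
def annotate_snps(snps, regions):
    """Map each SNP to the categories of the predicted regions that contain it.

    snps:    list of (snp_id, chrom, pos)
    regions: list of (chrom, start, end, category), half-open coordinates
    """
    annotation = {}
    for snp_id, chrom, pos in snps:
        categories = sorted(
            {cat for c, start, end, cat in regions if c == chrom and start <= pos < end}
        )
        annotation[snp_id] = categories or ["unannotated"]
    return annotation


# Toy regions and SNPs (hypothetical coordinates).
toy_regions = [("chrN", 100, 200, "promoter"), ("chrN", 500, 650, "enhancer")]
toy_gwas_snps = [("snp1", "chrN", 150), ("snp2", "chrN", 600), ("snp3", "chrN", 900)]

annotated = annotate_snps(toy_gwas_snps, toy_regions)
print(annotated)
```

SNPs that fall in a predicted regulatory region are candidates for follow-up; the annotation alone does not show that a SNP is causative.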
  • 13. Using the HPRS dataset to understand mechanism - polled
      • Two possible causal mutations of the polled phenotype*:
          - A 212 base insertion/10 base deletion mutation
          - A ~80 kb duplication, ~300 kb away
      • We found that the deletion mutation is located within a predicted enhancer and a HAND1 transcription factor binding site
      • The HAND1 deletion may lead to downregulation of OLIG1, OLIG2 and lincRNA2 (via distal enhancer interaction)
      [Figure labels: Hand1, Celtic deletion, Enhancer, Enhancer targets, OLIG1, OLIG1, lincRNA2]
      *Allais-Bonnet et al., 2013, PLOS ONE 8 e63512
  • 14. Summary
      1. The data in humans are useful for predicting regulatory sequences in other species (using the HPRS mapping and filtering pipelines)
      2. HPRS is a fast and economical approach, applicable when most data in a target species are not available
      3. SNPs significantly associated with phenotypes are enriched in the predicted regulatory sequences (more enriched than with traditional SNP selection based on known coding regions)
      4. HPRS results can contribute to genomics technology development, for instance to design a new-generation causative SNP chip for large-scale genotyping, or to predict regulatory targets as candidates for genome editing
  • 15. Acknowledgements
      • CSIRO: Brian P. Dalrymple, Juca Porto-Neto, Ross L. Tellam, James Kijas, Bill Barendse, Marina Naval-Sanchez, Antonio Reverter
      • QAAFI: Ben Hayes
      • Funding: CSIRO OCE fellowship
      St Lucia Campus, Brisbane, Australia
  • 16. A machine learning tool to predict SNP effects in an enhancer
      • gkmSVM (gapped k-mer Support Vector Machine) scores regulatory activity by comparing enhancer with non-enhancer regions
      • deltaSVM scores were calculated for every base across the cattle ALDOB enhancer (projected from human)
      • deltaSVM scores dropped markedly at locations overlapping transcription factor binding sites (indicating loss of binding if a mutation occurs)
      [Figure note: red arrows show SNPs]
      *Lee et al., Nat Gen, 2015
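The idea behind deltaSVM can be sketched in a few lines: sum, over every k-mer overlapping the variant, the difference in k-mer weight between the alternative and reference allele. This is a simplified illustration, not the gkmSVM software itself; the function name, the k-mer length of 3 and the weight table are hypothetical, whereas real weights come from a trained gapped k-mer SVM with longer k-mers.

```python
def delta_svm(ref_seq, pos, alt_base, weights, k):
    """Sum of (alt k-mer weight - ref k-mer weight) over k-mers overlapping pos.

    ref_seq:  reference sequence (string)
    pos:      0-based position of the variant
    alt_base: alternative allele at pos
    weights:  dict mapping k-mer -> SVM weight (missing k-mers count as 0.0)
    """
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    first = max(0, pos - k + 1)        # leftmost k-mer start that still covers pos
    last = min(pos, len(ref_seq) - k)  # rightmost k-mer start that still covers pos
    score = 0.0
    for i in range(first, last + 1):
        score += weights.get(alt_seq[i:i + k], 0.0) - weights.get(ref_seq[i:i + k], 0.0)
    return score


# Toy weights for 3-mers (hypothetical values).
toy_weights = {"ACG": 1.0, "CGT": 2.0, "CTT": -0.5}

# G>T at position 2 of ACGTAC: removes ACG and CGT, creates CTT.
score = delta_svm("ACGTAC", pos=2, alt_base="T", weights=toy_weights, k=3)
print(score)
```

A strongly negative score means the substitution removes sequence features that the model associates with enhancer activity, which matches the drop in scores over transcription factor binding sites described on the slide.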