Computational approaches to the regulatory genomics of neurogenesis

Dr. Ian Simpson

Centre for Integrative Physiology
University of Edinburgh

Edinburgh Neuroscience Day, March 2010

Introduction: animal model of neurogenesis

Anatomy of the Drosophila PNS - Sense organs

Development of the Drosophila PNS

Main: gene regulatory networks

GRN for endomesoderm specification in the sea urchin

from Peter and Davidson (2009)

Main: scale and complexity

How to study gene regulatory networks?

High throughput gene expression experiments
analysing c. 15,000 genes on c. 100 chips (scale)
profile, temporal, spatial, cell-type (complex)

Predicting transcription factor binding sites (TFBSs)
genomic search space (scale)
100s-1000s of PWMs (TFBS profiles) (scale)
multiple TFBSs arranged combinatorially (complex)
multiple evidence types to integrate: phylogenetic, protein interaction, genome localisation (complex)
identifying cis-regulatory modules (complex)
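
To make the scale point concrete, here is a minimal sketch of scoring every window of a sequence against a single PWM with log-odds scores. The toy count matrix, the uniform background, the pseudocount and the function names are illustrative assumptions, not values or code from the talk.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score each window of len(pwm) bases; return (offset, score) pairs.

    Only the bases A, C, G and T are handled in this sketch.
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits

# Hypothetical 4-column count matrix favouring the toy motif "CAGT".
toy_counts = [
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 10, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 0, "T": 10},
]

hits = scan("TTCAGTAA", log_odds_pwm(toy_counts))
best_offset, best_score = max(hits, key=lambda hit: hit[1])
```

A genome-wide screen repeats this inner loop for every PWM over the whole search space, which is where the scale problem comes from.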


Main: example 1: Clustering with re-sampling statistics

Gene expression profiles of cells expressing atonal


An example annotated cluster

Cluster membership

Cluster  Size
C1       13
C2       36
C3       23
C4       16
C5       65
C6       6

Cluster 3: Sensory Organ Development, GO:0007423 (p=6e-6)

Gene name
argos     ato
CG6330    CG31464
CG13653   nrm
unc       sca
rho       ImpL3
CG11671   CG7755
CG16815   CG15704
CG32150   knrl
CG32037   Toll-6
phyl      nvy
cato


Consensus clustering, a method to assess the quality of clustering

The basic approach
iterate thousands of clustering experiments with sub-samples of the data
calculate the average connectivity of any two members - consensus matrix
derive the robustness of the clusters and their members from the consensus matrix

The problem
huge parameter space (cluster number, distance metric, sample proportion...)
huge number of different algorithms to choose from
large dataset, multiple conditions to test

The solution
Break each iteration (individual clustering experiment) into a single process
Batch the processes out to nodes on Eddie/ECDF (batch array)
Collate back into consensus matrices and calculate robustness measures

R-package for consensus clustering - clusterCons
available from CRAN and sourceforge (http://bit.ly/clusterCons)
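
The basic approach above can be sketched in a few lines. This is a minimal serial illustration, not the clusterCons implementation: the naive single-linkage clustering, the toy points and all function names are assumptions made for the example, and on a cluster each iteration would run as its own batch task.

```python
import random
from itertools import combinations

def single_linkage(points, k):
    """Naive agglomerative clustering (single linkage) down to k clusters.

    points maps item name -> tuple of floats; returns a list of name sets.
    """
    clusters = [{name} for name in points]

    def dist(a, b):
        # Smallest squared Euclidean distance between members of a and b.
        return min(
            sum((x - y) ** 2 for x, y in zip(points[p], points[q]))
            for p in a for q in b
        )

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]  # safe because i < j
    return clusters

def consensus_matrix(points, k, iterations=200, fraction=0.8, seed=0):
    """For each pair, the share of co-sampled runs in which it co-clustered."""
    rng = random.Random(seed)
    names = sorted(points)
    together = {pair: 0 for pair in combinations(names, 2)}
    sampled = {pair: 0 for pair in combinations(names, 2)}
    size = max(k, int(round(fraction * len(names))))
    for _ in range(iterations):
        subset = rng.sample(names, size)
        sub_points = {name: points[name] for name in subset}
        for cluster in single_linkage(sub_points, k):
            for pair in combinations(sorted(cluster), 2):
                together[pair] += 1
        for pair in combinations(sorted(subset), 2):
            sampled[pair] += 1
    return {
        pair: (together[pair] / sampled[pair] if sampled[pair] else 0.0)
        for pair in together
    }

# Toy data (hypothetical): two well-separated groups of three items each.
toy_points = {
    "g1": (0.0, 0.1), "g2": (0.2, 0.0), "g3": (0.1, 0.3),
    "g4": (10.0, 10.1), "g5": (10.2, 9.9), "g6": (9.8, 10.0),
}
matrix = consensus_matrix(toy_points, k=2)
```

Because the toy groups are far apart, pairs within a group always co-cluster and pairs across groups never do; real expression data gives intermediate values, which is what the robustness measures summarise.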



Heatmap of the consensus matrix


Gene prioritisation by consensus clustering

Re-sampling using hclust, it=1000, rf=80%

Cluster robustness

cluster  rob
1        0.4731433
2        0.7704514
3        0.7295124
4        0.7196309
5        0.7033960
6        0.6786388

Membership robustness (cluster3)

affy_id        mem     affy_id        mem
1639896_at     0.68    1641578_at     0.56
1640363_a_at   0.54    1623314_at     0.53
1636998_at     0.49    1637035_at     0.36
1631443_at     0.35    1639062_at     0.31
1623977_at     0.31    1627520_at     0.3
1637824_at     0.28    1632882_at     0.27
1624262_at     0.26    1640868_at     0.26
1631872_at     0.26    1637057_at     0.24
1625275_at     0.24    1624790_at     0.22
1635227_at     0.08    1623462_at     0.07
1635462_at     0.03    1628430_at     0.03
1626059_at     0.02

There are 8 out of 23 genes with <25% conservation in the cluster (membership robustness below 0.25)
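
The two measures can be derived from a consensus matrix roughly as follows. The talk does not spell out the formulas, so this sketch assumes that cluster robustness is the mean pairwise consensus within a cluster and that membership robustness is the mean consensus between one member and all other members. The toy values and function names are hypothetical.

```python
from itertools import combinations

def pair_value(consensus, a, b):
    """Look up a symmetric consensus value stored under either key order."""
    return consensus.get((a, b), consensus.get((b, a), 0.0))

def cluster_robustness(consensus, members):
    """Mean consensus over all pairs of members (needs two or more members)."""
    pairs = list(combinations(members, 2))
    if not pairs:
        raise ValueError("a cluster needs at least two members")
    return sum(pair_value(consensus, a, b) for a, b in pairs) / len(pairs)

def membership_robustness(consensus, members):
    """Mean consensus of each member with every other member of the cluster."""
    if len(members) < 2:
        raise ValueError("a cluster needs at least two members")
    result = {}
    for member in members:
        others = [other for other in members if other != member]
        total = sum(pair_value(consensus, member, other) for other in others)
        result[member] = total / len(others)
    return result

def low_confidence(membership, threshold=0.25):
    """Members whose membership robustness falls below the threshold."""
    return sorted(m for m, value in membership.items() if value < threshold)

# Toy consensus values (hypothetical): "d" rarely co-clusters with the rest.
toy_consensus = {
    ("a", "b"): 0.9, ("a", "c"): 0.8, ("b", "c"): 0.7,
    ("a", "d"): 0.1, ("b", "d"): 0.2, ("c", "d"): 0.3,
}
toy_members = ["a", "b", "c", "d"]
rob = cluster_robustness(toy_consensus, toy_members)
mem = membership_robustness(toy_consensus, toy_members)
weak = low_confidence(mem)
```

Ranking members by this value and applying a cut-off is one way to prioritise genes within a cluster, as in the 25% threshold quoted above.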

Main: example 2: TFBS and CRM detection on the genomic scale

An example of intersecting a state list with a developmental module

Figure labels: normal, high, low, off


cis-regulatory module detection by HMM

after Wu and Xie, JCB 2008
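
As a rough illustration of the idea, the sketch below decodes a generic two-state HMM (background versus module) with the Viterbi algorithm over a string of per-window observations. It is not the model of Wu and Xie: the state names, the observation alphabet ("S" for a window containing a predicted site, "-" for none) and all probabilities are assumptions chosen for the example.

```python
import math

STATES = ("background", "module")

def viterbi(observations, start, trans, emit):
    """Most probable state path for a discrete HMM, computed in log space."""
    log = math.log
    scores = [{
        s: log(start[s]) + log(emit[s][observations[0]]) for s in STATES
    }]
    back = []
    for obs in observations[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s at this position.
            prev = max(STATES, key=lambda p: scores[-1][p] + log(trans[p][s]))
            row[s] = scores[-1][prev] + log(trans[prev][s]) + log(emit[s][obs])
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Hypothetical parameters: modules are rare, persistent and rich in sites.
start = {"background": 0.9, "module": 0.1}
trans = {
    "background": {"background": 0.9, "module": 0.1},
    "module": {"background": 0.2, "module": 0.8},
}
emit = {
    "background": {"-": 0.9, "S": 0.1},
    "module": {"-": 0.3, "S": 0.7},
}

path = viterbi("----SSSSSS----", start, trans, emit)
```

With these toy parameters the run of six site-containing windows is labelled as a module and the flanking windows as background.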

TFBS binding probability calculation with a Bayesian integration framework

Multiple prior data sources are combined in a probabilistic model to predict the probability of TF binding:
PWMs, ChIP-chip, ChIP-seq, DamID, conservation, nucleosome positioning, regulatory potential...

after Lähdesmäki et al., PLoS ONE, 2008
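
One simple way to picture evidence integration is a naive-Bayes update in log-odds space. This is a deliberately simplified sketch, not the framework of Lähdesmäki et al.: it assumes the evidence sources are conditionally independent, and the prior, the likelihood ratios and the function name are hypothetical.

```python
import math

def posterior_binding_probability(prior, likelihood_ratios):
    """Combine a prior with independent likelihood ratios using Bayes' rule.

    prior: prior probability that the site is bound (strictly between 0 and 1).
    likelihood_ratios: evidence name -> P(evidence | bound) / P(evidence | unbound).
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical evidence for one candidate site.
evidence = {"pwm_match": 20.0, "chip_signal": 10.0, "conservation": 3.0}
posterior = posterior_binding_probability(0.01, evidence)
```

Each extra data source shifts the log-odds additively, so a weak prior can be overturned only when several sources agree.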

Summary

Benefits of ECDF use for biological data analysis
Easy to use (honestly)
Can execute jobs in familiar languages: C, C++, Perl/BioPerl, R, Matlab...
Most common bioinformatic problems are similar analyses performed many times -> batch arrays
Often minimal re-coding needed
Frees up workstations and local nodes, allowing wider exploration of parameter space
Allows genome-scale screening with multiple data sources

Current limitations of ECDF use for biological data analysis
Few computational biology algorithms are written for parallel processing
Loading large datasets can be problematic (memory limits)
Not generally accessible to the 'general user' (although biological applications using GRID technologies are appearing)
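
The "similar analyses performed many times" pattern maps onto a batch array by letting each task pick its own share of the work from its task index. The sketch below assumes a Grid Engine-style scheduler that exposes a 1-based index in the `SGE_TASK_ID` environment variable; whether ECDF is configured this way is an assumption, and the function names are hypothetical.

```python
import os

def current_task_id(default=1):
    """Read the 1-based array task index, falling back when it is not set."""
    raw = os.environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else default

def iterations_for_task(task_id, n_tasks, n_iterations):
    """Return the iteration indices handled by one 1-based array task."""
    if not 1 <= task_id <= n_tasks:
        raise ValueError("task_id must be between 1 and n_tasks")
    # Interleave the iterations so every task gets a near-equal share.
    return range(task_id - 1, n_iterations, n_tasks)

# Example: 1000 re-sampling iterations spread over 10 array tasks.
my_iterations = iterations_for_task(current_task_id(), 10, 1000)
```

Each task would then run its iterations independently and write its results to a file named after the task index, ready to be collated afterwards.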


Acknowledgements

University of Edinburgh
Centre for Integrative Physiology
Andrew Jarman
Douglas Armstrong
Ian Simpson
Petra zur Lage
Lynn Powell
Sebastian Cachero
Lina Ma
Fay Newton
Giuseppe Gallone
Daniel Moore
Sadie Kemp
