Computational approaches to the regulatory genomics of neurogenesis


                         Dr. Ian Simpson

                     Centre for Integrative Physiology
                         University of Edinburgh


                 Edinburgh Neuroscience Day, March 2010




                                                                      1 / 20
Introduction   animal model of neurogenesis


Anatomy of the Drosophila PNS - Sense organs




                                                                                  2 / 20
Introduction   animal model of neurogenesis


Development of the Drosophila PNS




                                                                                  3 / 20
main   gene regulatory networks


GRN for endomesoderm specification in the Sea Urchin




from Peter and Davidson (2009)
                                                                          4 / 20
main   scale and complexity


How to study gene regulatory networks?




     High throughput gene expression experiments
          analysing c.15,000 genes on c.100 chips (scale)
          profile, temporal, spatial, cell-type (complex)


     Predicting transcription factor binding sites (TFBSs)
          genomic search space (scale)
          100s-1000s of PWMs (TFBS profiles) (scale)
          multiple TFBSs arranged combinatorially (complex)
          multiple evidence types to integrate: phylogenetic, protein interaction, genome
          localisation (complex)
          identifying cis-regulatory modules (complex)
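
To make the PWM bullet concrete, here is a minimal sketch of scanning a sequence with a position weight matrix. The count matrix, the uniform background, the pseudocount and the score threshold are all hypothetical values chosen for illustration; they are not taken from the talk.

```python
import math

# Hypothetical count matrix for a 4 bp motif (one dict per motif position).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 1, "C": 0, "G": 0, "T": 9},
]
# Assumed uniform background base frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def counts_to_pwm(counts, background, pseudocount=1.0):
    """Convert base counts into log2 odds scores against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2(((column[base] + pseudocount) / total) / background[base])
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Slide the PWM along the sequence and report windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = counts_to_pwm(COUNTS, BACKGROUND)
hits = scan("TTACGTGGACGTAA", PWM, threshold=4.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

Scanning a whole genome with hundreds to thousands of such matrices repeats this loop an enormous number of times, which is where the scale problem on this slide comes from.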




                                                                                            5 / 20
main   example 1 : Clustering with re-sampling statistics


Gene expression profiles of cells expressing atonal




                                                                                                       7 / 20
main   example 1 : Clustering with re-sampling statistics


An example annotated cluster


     cluster membership
                                    Cluster      Size
                                      C1          13
                                      C2          36
                                      C3          23
                                      C4          16
                                      C5          65
                                      C6           6
     cluster 3
                               Sensory Organ Development
                                  GO:0007423 (p=6e-6)
                                       Gene name
                                 argos           ato
                               CG6330         CG31464
                               CG13653          nrm
                                  unc            sca
                                  rho          ImpL3
                               CG11671        CG7755
                               CG16815        CG15704
                               CG32150          knrl
                               CG32037         Toll-6
                                 phyl           nvy
                                  cato



                                                                                               8 / 20
main   example 1 : Clustering with re-sampling statistics


Consensus clustering, a method to assess the quality of clustering




      The basic approach
           iterate thousands of clustering experiments with sub-samples of the data
           calculate the average connectivity of any two members - consensus matrix
           derive the robustness of the clusters and their members from the consensus matrix
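
The basic approach above can be sketched as follows. This is an illustrative sketch only, not the clusterCons implementation: a tiny k-means stands in for whatever clustering algorithm is being assessed, and the function names, the toy profiles and the parameter values are assumptions.

```python
import random


def kmeans(points, k, rng, n_iter=20):
    """Very small k-means; a stand-in for any clustering algorithm."""
    centres = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centres[c])))
            for pt in points
        ]
        # Move each centre to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                centres[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def consensus_matrix(points, k, iterations=200, proportion=0.8, seed=0):
    """Fraction of sub-samples in which two items were clustered together,
    counted only over the sub-samples that contained both items."""
    rng = random.Random(seed)
    n = len(points)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(round(proportion * n)))
    for _ in range(iterations):
        idx = rng.sample(range(n), size)
        labels = kmeans([points[i] for i in idx], k, rng)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                if labels[a] == labels[b]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy expression profiles: two well-separated groups of five genes each.
values = [0.0, 0.1, 0.2, 0.3, 0.4, 5.0, 5.1, 5.2, 5.3, 5.4]
profiles = [(v, 2 * v, 3 * v) for v in values]
consensus = consensus_matrix(profiles, k=2, iterations=200, proportion=0.8, seed=1)
```

Entries close to 1 mark pairs that are almost always clustered together; entries close to 0 mark pairs that almost never are.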

      The problem
           huge parameter space (cluster number, distance metric, sample proportion...)
           huge number of different algorithms to choose from
           large dataset, multiple conditions to test

      The solution
           Break each iteration (individual clustering experiment) into a single process
           Batch the processes out to nodes on Eddie/ECDF (batch array)
           Collate back into consensus matrices and calculate robustness measures
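
One way to picture the batch-array solution: each array task reads its own task index, maps it to one parameter combination, and writes partial counts that are summed afterwards. The sketch below assumes a Grid Engine style scheduler that exposes a 1-based index in the SGE_TASK_ID environment variable; the parameter grid and function names are hypothetical and not taken from the talk.

```python
import itertools
import os

# Hypothetical parameter grid to explore.
CLUSTER_NUMBERS = [4, 6, 8]
DISTANCE_METRICS = ["euclidean", "manhattan"]
SAMPLE_PROPORTIONS = [0.7, 0.8]
GRID = list(itertools.product(CLUSTER_NUMBERS, DISTANCE_METRICS, SAMPLE_PROPORTIONS))


def current_task_id(default=1):
    """Read the 1-based array index; fall back when not run as an array job."""
    raw = os.environ.get("SGE_TASK_ID", "")
    return int(raw) if raw.isdigit() else default


def task_parameters(task_id):
    """Map an array index to one parameter combination and a replicate number."""
    index = task_id - 1
    k, metric, proportion = GRID[index % len(GRID)]
    return {"k": k, "metric": metric, "proportion": proportion,
            "replicate": index // len(GRID), "seed": task_id}


def collate(partials):
    """Sum per-task (together, sampled) count matrices into one consensus matrix."""
    n = len(partials[0][0])
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for part_together, part_sampled in partials:
        for i in range(n):
            for j in range(n):
                together[i][j] += part_together[i][j]
                sampled[i][j] += part_sampled[i][j]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


# Inside a job script one would call: params = task_parameters(current_task_id())
example = collate([
    ([[2, 1], [1, 2]], [[2, 2], [2, 2]]),
    ([[1, 0], [0, 1]], [[1, 2], [2, 1]]),
])
```

Because every task depends only on its index, the same script can be submitted once as an array and the results merged in a final step.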


      R-package for consensus clustering - clusterCons
      available from CRAN and sourceforge (http://bit.ly/clusterCons)



                                                                                                                    9 / 20
main   example 1 : Clustering with re-sampling statistics


Heatmap of the consensus matrix




                                                                                              13 / 20
main   example 1 : Clustering with re-sampling statistics


Gene prioritisation by consensus clustering



Re-sampling using hclust, it=1000, rf=80%


       cluster robustness

                      cluster       rob
                         1       0.4731433
                         2       0.7704514
                         3       0.7295124
                         4       0.7196309
                         5       0.7033960
                         6       0.6786388

       membership robustness (cluster3)

                        affy_id       mem        affy_id      mem
                      1639896_at      0.68     1641578_at     0.56
                      1640363_a_at    0.54     1623314_at     0.53
                      1636998_at      0.49     1637035_at     0.36
                      1631443_at      0.35     1639062_at     0.31
                      1623977_at      0.31     1627520_at     0.3
                      1637824_at      0.28     1632882_at     0.27
                      1624262_at      0.26     1640868_at     0.26
                      1631872_at      0.26     1637057_at     0.24
                      1625275_at      0.24     1624790_at     0.22
                      1635227_at      0.08     1623462_at     0.07
                      1635462_at      0.03     1628430_at     0.03
                      1626059_at      0.02

8 of the 23 genes in cluster 3 have a membership robustness below 0.25
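
As a hedged sketch of how such robustness values can be derived and used for prioritisation: the definitions below (membership robustness as the mean consensus between an item and the other members of its cluster, cluster robustness as the mean consensus over all member pairs) are one common formulation and are assumed here rather than quoted from clusterCons. The 4 x 4 consensus matrix is a toy example; the membership values are those listed for cluster 3 above.

```python
def membership_robustness(consensus, members, item):
    """Mean consensus between one item and the other members of its cluster."""
    others = [j for j in members if j != item]
    return sum(consensus[item][j] for j in others) / len(others)


def cluster_robustness(consensus, members):
    """Mean consensus over all distinct pairs of cluster members."""
    pairs = [(i, j) for a, i in enumerate(members) for j in members[a + 1:]]
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)


def weak_members(membership, threshold=0.25):
    """Names of members whose robustness falls below the threshold, weakest first."""
    return sorted((name for name, value in membership.items() if value < threshold),
                  key=membership.get)


# Toy consensus matrix: items 0-2 form a cluster, item 3 sits outside it.
TOY = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
toy_cluster_rob = cluster_robustness(TOY, [0, 1, 2])
toy_member_rob = membership_robustness(TOY, [0, 1, 2], 0)

# Membership robustness values for cluster 3 as listed on this slide.
CLUSTER3 = {
    "1639896_at": 0.68, "1641578_at": 0.56, "1640363_a_at": 0.54, "1623314_at": 0.53,
    "1636998_at": 0.49, "1637035_at": 0.36, "1631443_at": 0.35, "1639062_at": 0.31,
    "1623977_at": 0.31, "1627520_at": 0.30, "1637824_at": 0.28, "1632882_at": 0.27,
    "1624262_at": 0.26, "1640868_at": 0.26, "1631872_at": 0.26, "1637057_at": 0.24,
    "1625275_at": 0.24, "1624790_at": 0.22, "1635227_at": 0.08, "1623462_at": 0.07,
    "1635462_at": 0.03, "1628430_at": 0.03, "1626059_at": 0.02,
}
weak = weak_members(CLUSTER3, threshold=0.25)
print(len(weak), "of", len(CLUSTER3), "members fall below 0.25")
```

Ranking members this way separates the robust core of a cluster from genes that only join it in a minority of re-sampling runs.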




                                                                                                                              14 / 20
main   example 2 : TFBS and CRM detection on the genomic scale


An example of intersecting a state list with a developmental module




[Figure: four panels labelled normal, high, low and off]




                                                                                                            15 / 20
main   example 2 : TFBS and CRM detection on the genomic scale


cis-regulatory module detection by HMM




after Wu and Xie, JCB 2008
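
For readers unfamiliar with the idea, the sketch below shows a generic two-state hidden Markov model decoded with the Viterbi algorithm: each genomic window is observed as 1 (contains a predicted TFBS) or 0, and the hidden state is either background or module. This is not the model of Wu and Xie; the states, probabilities and observation encoding are hypothetical values chosen for illustration.

```python
import math

STATES = ("background", "module")
# Hypothetical start, transition and emission probabilities.
START = {"background": 0.9, "module": 0.1}
TRANS = {
    "background": {"background": 0.9, "module": 0.1},
    "module": {"background": 0.2, "module": 0.8},
}
EMIT = {
    "background": {0: 0.9, 1: 0.1},
    "module": {0: 0.3, 1: 0.7},
}


def viterbi(observations):
    """Return the most probable state path for a list of 0/1 observations."""
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][observations[0]])
               for s in STATES}]
    back = []
    for obs in observations[1:]:
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this position.
            prev = max(STATES, key=lambda p: scores[-1][p] + math.log(TRANS[p][s]))
            row[s] = scores[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][obs])
            ptr[s] = prev
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


windows = [0] * 5 + [1] * 6 + [0] * 5
path = viterbi(windows)
print(path)
```

A run of windows decoded as module is a candidate cis-regulatory module; the transition probabilities control how readily the model opens and closes such a run.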
                                                                                                          16 / 20
main   example 2 : TFBS and CRM detection on the genomic scale


TFBS binding probability calculation with a Bayesian integration framework



         Multiple prior data sources are combined in a probabilistic model to predict the
         probability of TF binding
                  PWMs, ChIP-chip, ChIP-seq, DamID, conservation, nucleosome positioning, regulatory potential...
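
A minimal sketch of the general idea, assuming the evidence sources are conditionally independent so that their likelihood ratios simply multiply (a naive Bayes simplification, not the framework of the cited paper). The prior and the likelihood ratios are hypothetical numbers.

```python
import math


def posterior_binding_probability(prior, likelihood_ratios):
    """Combine a prior binding probability with independent likelihood ratios.

    Each ratio is P(evidence | bound) / P(evidence | not bound); values above 1
    support binding and values below 1 argue against it.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(ratio) for ratio in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one candidate site.
evidence = {
    "pwm_match": 20.0,
    "chip_signal": 5.0,
    "conservation": 3.0,
    "nucleosome_occupied": 0.5,
}
posterior = posterior_binding_probability(0.001, evidence.values())
print(round(posterior, 4))
```

Working in log odds keeps the calculation numerically stable when many weak evidence sources are combined.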




after Lähdesmäki et al., PLoS ONE, 2008




                                                                                                                           17 / 20
summary


Summary




    Benefits of ECDF use for biological data analysis
          Easy to use (honestly)
          Can execute jobs in familiar languages: C, C++, Perl/BioPerl, R, Matlab...
          Most common bioinformatic problems are similar analyses performed many times -> batch arrays
          Often minimal re-coding needed
          Free up workstations and local nodes, allow wider exploration of parameter space
          Allow genome scale screening with multiple data sources
    Current limitations of ECDF use for biological data analysis
          Few computational biology algorithms are written for parallel processing
          Loading large datasets can be problematic (memory limits)
          Not generally accessible to the 'general user' (although biological applications using GRID technologies are
          appearing)




                                                                                                                   18 / 20
Acknowledgements




                   University of Edinburgh
                   Centre for Integrative Physiology
                   Andrew Jarman
                   Douglas Armstrong
                   Ian Simpson
                   Petra zur Lage
                   Lynn Powell
                   Sebastian Cachero
                   Lina Ma
                   Fay Newton
                   Giuseppe Gallone
                   Daniel Moore
                   Sadie Kemp




                                                       20 / 20

POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13Steve Thomason
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpinRaunakKeshri1
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactPECB
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdfSoniaTolstoy
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxRoyAbrique
 

Kürzlich hochgeladen (20)

Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Student login on Anyboli platform.helpin
Student login on Anyboli platform.helpinStudent login on Anyboli platform.helpin
Student login on Anyboli platform.helpin
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdfBASLIQ CURRENT LOOKBOOK  LOOKBOOK(1) (1).pdf
BASLIQ CURRENT LOOKBOOK LOOKBOOK(1) (1).pdf
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"Mattingly "AI & Prompt Design: The Basics of Prompt Design"
Mattingly "AI & Prompt Design: The Basics of Prompt Design"
 
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptxContemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
Contemporary philippine arts from the regions_PPT_Module_12 [Autosaved] (1).pptx
 

Computational approaches to the regulatory genomics of neurogenesis

  • 1. Computational approaches to the regulatory genomics of neurogenesis. Dr. Ian Simpson, Centre for Integrative Physiology, University of Edinburgh. Edinburgh Neuroscience Day, March 2010.
  • 2. Introduction: animal model of neurogenesis. Anatomy of the Drosophila PNS: sense organs.
  • 3. Introduction: animal model of neurogenesis. Development of the Drosophila PNS.
  • 4. main: gene regulatory networks. GRN for endomesoderm specification in the sea urchin (from Peter and Davidson, 2009).
  • 5. main: scale and complexity. How to study gene regulatory networks?
       High-throughput gene expression experiments
            analysing c. 15,000 genes on c. 100 chips (scale)
            profile, temporal, spatial, cell-type (complex)
       Predicting transcription factor binding sites (TFBSs)
            genomic search space (scale)
            100s-1000s of PWMs (TFBS profiles) (scale)
            multiple TFBSs arranged combinatorially (complex)
            multiple evidence types to integrate: phylogenetic, protein interaction, genome localisation (complex)
            identifying cis-regulatory modules (complex)
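The PWM-based search listed on this slide can be illustrated with a minimal sketch. The slides give no scoring scheme, so the log-odds scoring against a uniform background, the toy count matrix, the pseudocount and the threshold below are all assumptions made for illustration, not the method used in the talk.

```python
import math

# Hypothetical count matrix for a 4-bp motif (one dict per motif position).
# The counts are invented for illustration; they do not come from the talk.
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 0, "T": 1},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]
BACKGROUND = 0.25   # assumed uniform base frequency
PSEUDOCOUNT = 0.5   # assumed; keeps log2 defined for zero counts


def log_odds_pwm(counts):
    """Convert per-position base counts into log2 odds against the background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * PSEUDOCOUNT
        pwm.append({
            base: math.log2((column[base] + PSEUDOCOUNT) / total / BACKGROUND)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Score every window of the sequence and keep those at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


PWM = log_odds_pwm(COUNTS)
# Toy sequence (upper-case A/C/G/T only) and an assumed score threshold.
HITS = scan("TTACGTGGACGTAA", PWM, threshold=5.0)
```

Scanning a genome with hundreds to thousands of such matrices repeats this inner loop for every matrix and every position, which is the scale problem the slide refers to.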
  • 7. main, example 1: Clustering with re-sampling statistics. Gene expression profiles of cells expressing atonal.
  • 8. main, example 1: Clustering with re-sampling statistics. An example annotated cluster.
       Cluster membership
            Cluster   Size
            C1        13
            C2        36
            C3        23
            C4        16
            C5        65
            C6        6
       Cluster 3: Sensory Organ Development, GO:0007423 (p=6e-6)
            Gene names: argos, ato, CG6330, CG31464, CG13653, nrm, unc, sca, rho, ImpL3, CG11671, CG7755, CG16815, CG15704, CG32150, knrl, CG32037, Toll-6, phyl, nvy, cato
  • 9. main, example 1: Clustering with re-sampling statistics. Consensus clustering, a method to assess the quality of clustering.
       The basic approach
            iterate thousands of clustering experiments with sub-samples of the data
            calculate the average connectivity of any two members (the consensus matrix)
            derive the robustness of the clusters and their members from the consensus matrix
       The problem
            huge parameter space (cluster number, distance metric, sample proportion...)
            huge number of different algorithms to choose from
            large dataset, multiple conditions to test
       The solution
            break each iteration (individual clustering experiment) into a single process
            batch the processes out to nodes on Eddie/ECDF (batch array)
            collate back into consensus matrices and calculate robustness measures
       R package for consensus clustering: clusterCons, available from CRAN and sourceforge (http://bit.ly/clusterCons)
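The basic approach on this slide (re-sample, cluster, average the pairwise co-clustering, then derive robustness) can be sketched as follows. This is a minimal illustration under stated assumptions, not the clusterCons implementation: the single-linkage clusterer, the toy 2-D data, the 200 iterations and the robustness definitions (mean consensus over pairs within a cluster) are all choices made here for brevity.

```python
import itertools
import math
import random


def single_linkage(points, k):
    """Merge the two closest clusters until k remain; returns lists of indices."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            d = min(math.dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters.pop(b))  # b > a, so index a stays valid
    return clusters


def consensus_matrix(data, k, iterations=200, fraction=0.8, seed=0):
    """Fraction of sub-samples in which each pair of items was co-clustered."""
    rng = random.Random(seed)
    n = len(data)
    sampled = [[0] * n for _ in range(n)]    # times a pair was drawn together
    together = [[0] * n for _ in range(n)]   # times a pair shared a cluster
    size = max(k, round(fraction * n))
    for _ in range(iterations):
        idx = rng.sample(range(n), size)
        for i, j in itertools.combinations(idx, 2):
            sampled[i][j] += 1
            sampled[j][i] += 1
        for cluster in single_linkage([data[i] for i in idx], k):
            members = [idx[m] for m in cluster]
            for i, j in itertools.combinations(members, 2):
                together[i][j] += 1
                together[j][i] += 1
    # Pairs never drawn together (including the diagonal) are reported as 0.0.
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]


def cluster_robustness(consensus, members):
    """Mean consensus over all pairs of members of one cluster."""
    pairs = list(itertools.combinations(members, 2))
    return sum(consensus[i][j] for i, j in pairs) / len(pairs)


def membership_robustness(consensus, item, members):
    """Mean consensus between one item and the other members of its cluster."""
    others = [j for j in members if j != item]
    return sum(consensus[item][j] for j in others) / len(others)


# Toy data: two well-separated groups of five points each (invented values).
DATA = [(0.0, 0.0), (0.5, 0.2), (0.1, 0.6), (0.4, 0.5), (0.3, 0.1),
        (10.0, 10.0), (10.4, 9.8), (9.7, 10.3), (10.2, 10.5), (9.9, 9.6)]
CONSENSUS = consensus_matrix(DATA, k=2)
ROBUSTNESS_A = cluster_robustness(CONSENSUS, [0, 1, 2, 3, 4])
```

Each pass through the loop in `consensus_matrix` is independent of the others, which is what makes it possible to run every iteration as a separate process, as the slide describes.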
  • 13. main, example 1: Clustering with re-sampling statistics. Heatmap of the consensus matrix.
  • 14. main, example 1: Clustering with re-sampling statistics. Gene prioritisation by consensus clustering. Re-sampling using hclust, it=1000, rf=80%.
       Cluster robustness
            cluster   rob
            1         0.4731433
            2         0.7704514
            3         0.7295124
            4         0.7196309
            5         0.7033960
            6         0.6786388
       Membership robustness (cluster 3)
            affy_id         mem
            1639896_at      0.68
            1641578_at      0.56
            1640363_a_at    0.54
            1623314_at      0.53
            1636998_at      0.49
            1637035_at      0.36
            1631443_at      0.35
            1639062_at      0.31
            1623977_at      0.31
            1627520_at      0.3
            1637824_at      0.28
            1632882_at      0.27
            1624262_at      0.26
            1640868_at      0.26
            1631872_at      0.26
            1637057_at      0.24
            1625275_at      0.24
            1624790_at      0.22
            1635227_at      0.08
            1623462_at      0.07
            1635462_at      0.03
            1628430_at      0.03
            1626059_at      0.02
       There are 8 out of 23 genes with <25% conservation in the cluster.
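The prioritisation step on this slide amounts to ranking cluster members by membership robustness and flagging those below a cut-off. A minimal sketch, using the cluster 3 values transcribed from the slide; the variable names and the use of 0.25 as a strict threshold are assumptions made here.

```python
# Membership robustness for cluster 3, transcribed from the slide.
MEMBERSHIP = {
    "1639896_at": 0.68, "1641578_at": 0.56, "1640363_a_at": 0.54,
    "1623314_at": 0.53, "1636998_at": 0.49, "1637035_at": 0.36,
    "1631443_at": 0.35, "1639062_at": 0.31, "1623977_at": 0.31,
    "1627520_at": 0.30, "1637824_at": 0.28, "1632882_at": 0.27,
    "1624262_at": 0.26, "1640868_at": 0.26, "1631872_at": 0.26,
    "1637057_at": 0.24, "1625275_at": 0.24, "1624790_at": 0.22,
    "1635227_at": 0.08, "1623462_at": 0.07, "1635462_at": 0.03,
    "1628430_at": 0.03, "1626059_at": 0.02,
}
THRESHOLD = 0.25  # assumed to correspond to the "<25%" cut-off on the slide

# Probes ordered from most to least robust cluster membership.
RANKED = sorted(MEMBERSHIP, key=MEMBERSHIP.get, reverse=True)

# Probes whose membership falls below the cut-off.
LOW = [probe for probe in RANKED if MEMBERSHIP[probe] < THRESHOLD]

print(f"{len(LOW)} of {len(MEMBERSHIP)} probes fall below {THRESHOLD}")
```

With the values above, the filter recovers the slide's count of 8 weakly attached probes out of 23.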
  • 15. main, example 2: TFBS and CRM detection on the genomic scale. An example of intersecting a state list with developmental module (figure legend: normal, high, low, off).
  • 16. main, example 2: TFBS and CRM detection on the genomic scale. cis-regulatory module detection by HMM (after Wu and Xie, JCB 2008).
  • 17. main, example 2: TFBS and CRM detection on the genomic scale. TFBS binding probability calculation with a Bayesian integration framework. Multiple prior data sources are combined in a probabilistic model to predict the probability of TF binding: PWMs, ChIP-chip, ChIP-seq, DamID, conservation, nucleosome positioning, regulatory potential... (after Lahdesmaki et al., PLoS ONE, 2008).
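The idea of combining several evidence sources into one binding probability can be illustrated with a naive Bayes update. This is a simplified sketch that assumes the sources are conditionally independent; it is not the model of Lahdesmaki et al., and the prior and likelihood ratios below are invented placeholders.

```python
import math


def posterior_binding_probability(prior, likelihood_ratios):
    """Update a prior binding probability with per-source likelihood ratios.

    Each ratio is P(evidence | bound) / P(evidence | unbound). Sources are
    treated as conditionally independent, so their log ratios simply add.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios.values():
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical evidence for one candidate site (values are illustrative only).
EVIDENCE = {
    "pwm_match": 8.0,            # strong motif match favours binding
    "chip_signal": 5.0,          # enrichment in a ChIP experiment
    "conservation": 2.0,         # phylogenetic conservation
    "nucleosome_occupancy": 0.5, # occupied DNA argues against binding
}
POSTERIOR = posterior_binding_probability(prior=0.01, likelihood_ratios=EVIDENCE)
```

Working in log odds keeps the update numerically stable when many sources are combined and makes it easy to add or drop a data type without changing the rest of the calculation.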
  • 18. summary: Summary.
       Benefits of ECDF use for biological data analysis
            Easy to use (honestly)
            Can execute jobs in familiar languages: C, C++, Perl/BioPerl, R, Matlab...
            Most common bioinformatic problems are similar analyses performed many times -> batch arrays
            Often minimal re-coding needed
            Frees up workstations and local nodes, allows wider exploration of parameter space
            Allows genome-scale screening with multiple data sources
       Current limitations of ECDF use for biological data analysis
            Few computational biology algorithms are written for parallel processing
            Loading large datasets can be problematic (memory limits)
            Not generally accessible to the 'general user' (although biological applications using GRID technologies are appearing)
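The "similar analyses performed many times -> batch arrays" pattern can be sketched as a script that maps an array task ID to one parameter combination. This assumes a Grid Engine-style scheduler that exports `SGE_TASK_ID` to each array task; the parameter grid and function names are hypothetical and not taken from the talk.

```python
import itertools
import os

# Hypothetical parameter grid; the values are placeholders for illustration.
GRID = {
    "clusters": [4, 6, 8],
    "distance": ["euclidean", "manhattan"],
    "sample_fraction": [0.7, 0.8, 0.9],
}


def parameters_for_task(task_id, grid=GRID):
    """Return the parameter combination for a 1-based array task ID."""
    names = sorted(grid)
    combos = list(itertools.product(*(grid[name] for name in names)))
    if not 1 <= task_id <= len(combos):
        raise ValueError(f"task_id must be in 1..{len(combos)}")
    return dict(zip(names, combos[task_id - 1]))


def current_task_id(environ=os.environ):
    """Read the array task ID; fall back to 1 when it is unset or non-numeric.

    The fallback is an assumption made so the script also runs outside the
    scheduler, for example during local testing.
    """
    value = environ.get("SGE_TASK_ID", "")
    return int(value) if value.isdigit() else 1


# Each array task picks up its own slice of the parameter space.
PARAMS = parameters_for_task(current_task_id())
```

With this grid there are 18 combinations, so the job would be submitted as an array of 18 tasks, each running the same script with a different task ID.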
  • 20. Acknowledgements. University of Edinburgh; Centre for Integrative Physiology. Andrew Jarman, Douglas Armstrong, Ian Simpson, Petra zur Lage, Lynn Powell, Sebastian Cachero, Lina Ma, Fay Newton, Guiseppe Gallone, Daniel Moore, Sadie Kemp.