Ondex – Data integration and visualisation
Catherine Canevet, Rothamsted Research
London Biogeeks – May Tech Meet
Rothamsted Research North Wyke
Almost certainly the oldest in the world (started in 1843)
350 Scientific staff
Open weekend May 22nd-23rd, 11am-5pm: www.rothamsted.ac.uk/openweekend/
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Genomics, transcriptomics, proteomics, metabolomics, …
The biological systems span multiple levels of biological organisation
Non-trivial to integrate the data: 2 main challenges
Syntactic integration challenge
- Over 1000 databases freely available to the public
- Over 60 million sequences in GenBank
- Over 870 complete genomes and many ongoing projects
- Over 17 million citations in PubMed
- PubMed grows by 600,000 publications each year
Integration of Life Science data sources is essential for Systems Biology research
http://www.ncbi.nlm.nih.gov/Database
Semantic integration challenge
- Same concept, different names: synonyms
- Same name, different concepts: homographs (e.g. "Ear")
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Concepts and relations (1/2)
Network of concepts and relations, e.g.
- Protein-protein interaction network (PPI): Protein interact Protein
- Cellular location of proteins: Protein located in Cell
Ontology of concept classes, relation types and additional properties
- ConceptClass: Protein, CelComp
- RelationType: interact, located in
- Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum …
Concepts and relations (2/2)
Transformation to binary graph
- Concepts: Reaction, Metabolite
- Relations: Metabolite consumed by Reaction; Metabolite produced by Reaction
- Properties: compound name, protein sequence, protein structure, cellular component, Km value, pH optimum …
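The transformation to a binary graph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Concept` and `Relation` and the helper `reaction_to_binary` are hypothetical and are not Ondex's actual API. A reaction with several substrates and products becomes one Reaction concept linked to each Metabolite by a two-ended relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """A node in the graph (hypothetical model, not the Ondex classes)."""
    id: str
    concept_class: str  # e.g. "Protein", "Metabolite", "Reaction"


@dataclass(frozen=True)
class Relation:
    """A binary edge between exactly two concepts."""
    source: Concept
    target: Concept
    relation_type: str  # e.g. "consumed by", "produced by"


def reaction_to_binary(name, substrates, products):
    """Turn one n-ary reaction into a Reaction concept plus binary relations."""
    reaction = Concept(name, "Reaction")
    relations = []
    for substrate in substrates:
        relations.append(
            Relation(Concept(substrate, "Metabolite"), reaction, "consumed by"))
    for product in products:
        relations.append(
            Relation(Concept(product, "Metabolite"), reaction, "produced by"))
    return reaction, relations


# Made-up reaction: A + B -> C
reaction, relations = reaction_to_binary("R1", ["A", "B"], ["C"])
for rel in relations:
    print(rel.source.id, rel.relation_type, rel.target.id)
```

Every relation has exactly one source and one target, which is what lets graph filters and layouts treat reactions like any other part of the network.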
Data integration in Ondex
- Data input: biological databases, ontologies & free text, experimental data
- Import into a graph of concepts and relations
- Data integration by data alignment: sequence analysis, text mining
Importing data into Ondex
- What databases to import
- What format these are in
- Ondex parsers already written
  - Generic: OBO, PSI-MI, SBML, Tab-delimited, FASTA
  - Database-specific: AraCyc, AtRegNet, BioCyc, BioGRID, Brenda, Drastic, EcoCyc, GO, GOA, Gramene, Grassius, KEGG, Medline, MetaCyc, Oglycbase, OMIM, PDB, Pfam, SGD, TAIR, TIGR, Transfac, Transpath, UniProt, WGS, WordNet
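As a rough illustration of what a generic tab-delimited import does, the sketch below reads rows into a set of concepts and a list of relations. The three-column layout (source, relation type, target) and the function name `parse_tab_delimited` are assumptions made for this example; the real Ondex parser and its input format are not described in the slides.

```python
import csv
import io


def parse_tab_delimited(handle):
    """Read (source, relation_type, target) rows into concepts and relations.

    The column layout is an assumption for this sketch, not the Ondex format.
    """
    concepts = set()
    relations = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines and rows that are too short.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        source, relation_type, target = row[:3]
        concepts.update((source, target))
        relations.append((source, relation_type, target))
    return concepts, relations


# Made-up input standing in for a tab-delimited file.
example = io.StringIO(
    "# source\trelation\ttarget\n"
    "P1\tinteract\tP2\n"
    "P1\tlocated in\tnucleus\n"
)
concepts, relations = parse_tab_delimited(example)
print(sorted(concepts))
print(relations)
```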
Example of resulting graph (diagram labels)
- Concept classes: Gene, Protein, Transcription factor, Enzyme, Protein complex, Reaction, EC, Pathway, Target sequence
- Relation types: has similar sequence; binds to; repressed by, regulated by, activated by; encoded by; is_a; member is part of; catalyses; catalysing class
Ondex Data Integration Scheme (diagram labels)
- Heterogeneous data sources: UniProt, AraCyc, KEGG, Transfac, Microarray (e.g. treatments from DRASTIC, pathways from KEGG)
- Data input & transformation: one parser per data source
- Ondex graph warehouse: Generalized Object Data Model, Database Layer, Lucene
- Data integration (graph alignment), integration methods: Accession, Name based, Transitive, Blast, ProteinFamily, Pfam2GO
- Data exchange: OXL/RDF, Web Service
- Visualisation clients/tools: Ondex Visualization Tool Kit, Web Client, Taverna
Semantic Integration by Graph Alignment
Create relations between equivalent entries from different data sources, identified by mapping methods:
- Concept accessions (UniProt ID)
- Concept name (gene name), synonyms
- Sequence methods
- Graph neighbourhood
- Text mining
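The accession-based mapping method is the simplest of these to illustrate. The sketch below is a minimal version under stated assumptions: concepts are plain tuples, the function name `map_by_accession` is hypothetical, and the identifiers and accessions are made up. Two concepts from different data sources are treated as equivalent when they share at least one accession.

```python
from collections import defaultdict
from itertools import combinations


def map_by_accession(concepts):
    """Pair up concepts from different sources that share an accession.

    concepts: iterable of (concept_id, data_source, accessions).
    Returns a set of (id_a, id_b) pairs, each pair sorted alphabetically.
    """
    by_accession = defaultdict(list)
    for concept_id, source, accessions in concepts:
        for accession in accessions:
            by_accession[accession].append((concept_id, source))

    equivalents = set()
    for members in by_accession.values():
        for (id_a, src_a), (id_b, src_b) in combinations(members, 2):
            # Only align entries across data sources, not within one.
            if src_a != src_b:
                equivalents.add(tuple(sorted((id_a, id_b))))
    return equivalents


# Made-up concepts and accessions, for illustration only.
concepts = [
    ("kegg:1", "KEGG", {"P12345"}),
    ("aracyc:7", "AraCyc", {"P12345", "Q99999"}),
    ("kegg:2", "KEGG", {"Q99999"}),
]
equivalents = map_by_accession(concepts)
print(sorted(equivalents))
```

Name-based, sequence-based and text-mining methods would produce the same kind of equivalence relation from weaker evidence, so in practice they would likely need a confidence score attached to each pair.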
Outline
 Data integration in Ondex
 Data visualisation in Ondex and application cases
Complexity of interactions
PPI, co-expression, co-citation, …
Candidate gene prioritisation and pathway discovery Use Ondex tools (filters, annotators, layouts …)
Filters
- Integrating different datasets → large resulting graph
- Need to narrow down: select meaningful areas of the graph
- Example in Ondex: protein-protein interaction network
Filters in Ondex
Protein-protein interactions measured using quantitative techniques
 Threshold filter
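A threshold filter on a weighted interaction network can be sketched as follows. This is an illustration only: the edge representation, the function name `threshold_filter`, and the scores are assumptions, not the Ondex filter itself. Interactions whose quantitative score falls below the cut-off are dropped, along with any concepts left without an interaction.

```python
def threshold_filter(edges, threshold):
    """Keep edges scoring at least `threshold`, plus the nodes they connect.

    edges: iterable of (node_a, node_b, score).
    """
    kept = [(a, b, score) for a, b, score in edges if score >= threshold]
    nodes = {node for a, b, _ in kept for node in (a, b)}
    return nodes, kept


# Made-up interaction scores, for illustration only.
edges = [("P1", "P2", 0.9), ("P2", "P3", 0.4), ("P3", "P4", 0.75), ("P4", "P5", 0.1)]
nodes, kept = threshold_filter(edges, 0.7)
print(sorted(nodes))
print(kept)
```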
http://www.phi-base.org/
Loss of pathogenicity
Reduced virulence
Only genes validated by gene disruption experiments
Integrated phenotype and comparative genome information
Annotators (1/3)
Colour
Shape
Size
Annotators (2/3)
- Virtual Knock-out Annotator: shows how important a single concept is to all possible paths contained in a network; Ondex resizes the concepts based on this score
- Scale Concept by Value
- Pie charts: up/down regulation is indicated in red/green

  • 47. Application case 2: Mapping microarray expression data to integrated pathways (Jan Taubert). AraCyc (parser, OXL) and Arabidopsis C/N uptake data (tab file) → accession-based mapping using TAIR IDs in Ondex → interactive exploration and an enriched spreadsheet (tab file), e.g. with AraCyc pathways
  • 53. Network diameter Add annotation to the graph
  • 54. Application case 3: Arabidopsis PPI network (Artem Lysenko). IntAct, TAIR, BioGRID → mapping the 3 databases based on TAIR accessions
  • 55. Adding 3 sources of evidence: co-expression, sequence similarity, co-occurrence in scientific literature → facilitates the identification of functionally related groups of proteins
  • 56. Added attributes to nodes/edges. Network stats: betweenness centrality (BWC) → how influential (bridge); degree centrality (DC) → hub likeness. Markov Clustering: identifies strongly connected groups of proteins in the network
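The two network statistics can be computed with a short, dependency-free sketch. It assumes an unweighted, undirected network stored as an adjacency dictionary, and it uses Brandes' algorithm for betweenness; the slides do not say how Ondex computes these values, so the function names and the toy network are illustrative only. Markov Clustering is not covered here.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(adj)
    if n < 2:
        return {v: 0.0 for v in adj}
    return {v: len(neighbours) / (n - 1) for v, neighbours in adj.items()}


def betweenness_centrality(adj):
    """Unnormalised betweenness via Brandes' algorithm (unweighted, undirected)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes towards s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: score / 2 for v, score in bc.items()}


# Toy chain of four proteins: A - B - C - D
adj = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
dc = degree_centrality(adj)
bc = betweenness_centrality(adj)
print(dc)
print(bc)
```

In the toy chain, B and C each sit on two shortest paths between other proteins, so they act as bridges, while A and D score zero.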
  • 58. Degree centrality represented by node size
  • 59. Betweenness centrality represented by node colour (Artem Lysenko)
  • 60. Filters, annotators and layouts: combination of these three types of tools in Ondex → a more complex application case …
  • 61. Application case 4: Bioenergy Project (Keywan Hassani-Pak). Use bioinformatics to support phenotype-genotype research in bioenergy crops. Given a phenotypic variant, is it possible to pin down the relevant genes? Develop tools to support systematic analysis of QTL regions to pin down relevant genes. Identify genes implicated in biomass production in willow. Prioritise genes for experimental validation. Figure: Biofuel Conversion Process, http://www.jgi.doe.gov/education/bioenergy/bioenergy_1.html
  • 62. QTL and Genomic Data. The willow genome is not sequenced yet. A QTL may encompass many potential candidates, perhaps hundreds. Poplar is the first tree with a fully sequenced genome: 19 chromosomes, 45778 predicted genes, 4x larger than the Arabidopsis genome. Not much is known about the function of the genes
  • 63. Linking genes to data sources (diagram labels): Willow; reference models, e.g. Poplar, Arabidopsis; QTL map; markers; orthologous; physical map; genes; gene function; pathways; plant hormones; expression patterns. Result: list of candidate genes linked to biological processes
  • 64. Relevant Data Sources. Release 15.10. Poplar Gene Prediction v2.0 (Jan 2010). All plants: 739,396 proteins; reviewed: 28,404 proteins (3.84%). PoplarCyc 1.0: 285 pathways, 3434 enzymes, 1363 compounds (Oct 2009). Pfam 24.0: 11,912 protein families (Oct 2009). Poplar transcription factors: DPTF, 2,576 putative TFs (March 2007); PlnTFDB, 2,901 putative TFs (July 2009). 29,365 GO terms (Jan 2010). Poplar/willow QTL: work in progress, preliminary dataset available. Only loading referenced publications: ~15,000 articles
  • 65. Unique Knowledge Base for Poplar. Proteins annotated with functional information and publications, based on comparative genomics and protein family analysis. Genes and QTLs enriched with positional information. Data integration was done in Ondex
  • 66. Ondex Genomics Layout. The Genomic Layout displays chromosomes, genes and QTLs. Chromosomal regions and QTLs can be selected
  • 67. Ondex Genomics Filter. Genes of interest → enriched protein annotation network
  • 68. Phenotypic Information in Literature. Poplar protein 217086. HMMer: 650581 – HLH, E-value 3.4E-7, score 30.0. BLAST: 217086 – LAX, E-value 8.3E-17, score 80.88. BLAST: 217086 – BHLH63, E-value 8.3E-9, score 54.3. PMID:13130077 “LAX and SPA: major regulators of shoot branching in rice.” We identified two remote homologs, in rice (LAX) and in Arabidopsis (BHLH63), as well as one protein domain (HLH). There is evidence that the LAX homolog is a major regulator of shoot branching → hypothesis generation
  • 71. Text mining, experimental data → hypothesis → new experiments

Editor's notes

  1. Light pink – Increased virulence; Light blue – Reduced virulence; Light green – Loss of pathogenicity; Yellow – Unaffected pathogenicity; Star – animal; Circle – plant
  2. The Virtual KO score is based on 3 other scores: "extension" gives the number of paths that would be extended if a concept was added; "deletion" gives the number of paths that would be deleted if this concept was deleted; "nochange" gives the number of paths that would not be shortened/extended if this concept was deleted
  3. IntAct: 4625 protein interactions (data derived from literature curation or direct user submissions). TAIR (The Arabidopsis Information Resource): 1143 interactions; genome sequence, gene structure, gene product information, metabolism, gene expression, DNA and seed stocks, genome maps, genetic and physical markers, publications. BioGRID (General Repository for Interaction Datasets): collections of protein and genetic interactions from major model organism species; 1223 interactions for Arabidopsis derived from high-throughput studies and conventional focused studies
  4. ATTED II (Arabidopsis thaliana trans-factor and cis-element prediction database): provides co-regulated gene relationships in Arabidopsis to estimate gene functions; gives the Pearson correlation coefficients of co-expressed genes in Arabidopsis calculated from available microarray data. NCBI PSI-BLAST: identifies similarities between our reference set of proteins, matching against the Arabidopsis subset of UniProt. Co-occurrence of protein names: 25,900 Medline abstracts related to Arabidopsis thaliana; integrated Lucene-based mapping method
  5. Solid biomass (in the form of plants and trees) can be converted into liquid fuels (such as ethanol, methanol, and biodiesel). The challenge lies in efficient conversion, creating more energy than the input required to produce it, and in increasing biomass yield. Develop means to support systematic analysis of QTL regions and prioritise genes for experimental analyses; identify genes controlling biomass production in willow
  6. QTL are genomic regions that assign variations observed in a phenotype to a region on the genetic map. Biomass traits: branching, height, leaf number etc. Going from willow to poplar to Arabidopsis and other species
  7. Reduced hypothesis space from 100 potential candidates to 3 hot candidates. Next steps: cloning and transformation for experimental validation.