From Data Integration to Data Mining in Semantic Web: systems chemical biology as a case study
Bin Chen, School of Informatics and Computing, Indiana University at Bloomington
Lecture for S636, Nov 17, 2011
Outline
Systems Chemical Biology (figure): Compound, Drug, Protein, Gene, PPI, Metabolic Pathway, Gene Regulatory, Disease, Side effect, Toxicity and Phenotype, related by interacting and mapping links; Chemogenomics
Oprea TI, et al. Systems chemical biology. Nature, 2007.
MATADOR
Semantic Web
Semantic Web Stack: http://en.wikipedia.org/wiki/Semantic_Web
Architecture of Semantic Systems Chemical Biology
Data: Experimental Data; Text mining
RDF, SPARQL: Chem2Bio2RDF
Ontology: Chem2Bio2OWL
Algorithm and tools: Path finding; Association search; Association ranking and prediction
Applications: Polypharmacology; drug side effect
Outline
RDF (Resource Description Framework)
Resource (subject), Property (predicate), Value (object)
Example: the drug Lipitor has name "Lipitor" and company "Pfizer"
URI: http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076

    <RDF>
      <Description about="http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB01076">
        <name>Lipitor</name>
        <company>Pfizer</company>
      </Description>
    </RDF>
Use RDF to Integrate Data
Database 1: http://chem2bio2rdf.org/drugbank/DB01076 has name "lipitor" and company "Pfizer"
Database 2: http://chem2bio2rdf.org/drugbank/DB01076 has Molecular_Weight 558.6398 and formula C33H35FN2O5
Same URI, merged!
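The merge on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions: triples are modeled as plain tuples, the two "databases" are in-memory sets copied from the slide, and the helper names `merge` and `describe` are hypothetical, not part of Chem2Bio2RDF.

```python
# Toy triples copied from the slide; nothing is fetched from a real endpoint.
URI = "http://chem2bio2rdf.org/drugbank/DB01076"

database_1 = {
    (URI, "name", "lipitor"),
    (URI, "company", "Pfizer"),
}
database_2 = {
    (URI, "Molecular_Weight", "558.6398"),
    (URI, "formula", "C33H35FN2O5"),
}


def merge(*graphs):
    """Union of triple sets: statements about the same URI end up together."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


def describe(graph, subject):
    """Collect every property/value pair stated about one subject."""
    return {p: o for s, p, o in graph if s == subject}


merged = merge(database_1, database_2)
properties = describe(merged, URI)
print(sorted(properties))
```

Because both sources use the same URI as the subject, no join key or schema alignment is needed; the set union is the integration step.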
Use RDF to Link Data
http://chem2bio2rdf.org/drugbank/DB01076 sameAs http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076
http://chem2bio2rdf.org/drugbank/DB01076 cid http://chem2bio2rdf.org/pubchem/resource/pubchem_compound/60823
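One way to read the linking slide is that sameAs declares two URIs to be aliases of one resource, while cid is an ordinary property pointing at a different resource. The sketch below follows that reading; the triple layout is an assumption taken from the slide, and `equivalents` is a hypothetical helper, not an API of any triple store.

```python
SAME_AS = "sameAs"

# Links as read from the slide (assumed direction: subject, property, object).
triples = [
    ("http://chem2bio2rdf.org/drugbank/DB01076", SAME_AS,
     "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076"),
    ("http://chem2bio2rdf.org/drugbank/DB01076", "cid",
     "http://chem2bio2rdf.org/pubchem/resource/pubchem_compound/60823"),
]


def equivalents(triples, start):
    """Return all URIs reachable from `start` via sameAs, in either direction."""
    seen = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if p != SAME_AS:
                continue  # only sameAs links merge identities
            for a, b in ((s, o), (o, s)):
                if a == node and b not in seen:
                    seen.add(b)
                    frontier.append(b)
    return seen


aliases = equivalents(
    triples, "http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugs/DB01076"
)
print(sorted(aliases))
```

Starting from the external DrugBank URI, the walk finds the Chem2Bio2RDF URI but deliberately does not absorb the PubChem compound, which is linked by cid rather than sameAs.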
uniprot Bio2RDF Others LODD Chem2Bio2RDF Virtuoso Triple store SPARQL ENDPOINTS Dereferenceable URI Browsing PlotViz: Visualization Cytoscape Plugin Linked Path Generation and Ranking Third party tools
Workflow for RDF conversion (figure): External Sources, Download, Local copy (XML, CSV, DB, TXT, …), Scripts, Relational DB, D2R Mapping, D2R server, Dumping, Virtuoso Triple Store, Ontology, Publishing
Chen B, et al. Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data. BMC Bioinformatics. 2010
Table, D2R mapping, RDF, Exhibit link

    # Table c2b2r_DrugBankDrug
    # Each row of the table becomes one drugbank:DrugBankDrug resource.
    map:c2b2r_DrugBankDrug a d2rq:ClassMap;
        d2rq:dataStorage map:database;
        d2rq:uriPattern "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@";
        d2rq:class drugbank:DrugBankDrug;
        d2rq:classDefinitionLabel "c2b2r_DrugBankDrug";
        .
    # The Generic_Name column is exposed as rdfs:label.
    map:c2b2r_DrugBankDrug__label a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
        d2rq:property rdfs:label;
        d2rq:pattern "@@c2b2r_DrugBankDrug.Generic_Name@@";
        .
    # The DBID column is exposed as drugbank:DBID.
    map:c2b2r_DrugBankDrug_DBID a d2rq:PropertyBridge;
        d2rq:belongsToClassMap map:c2b2r_DrugBankDrug;
        d2rq:property drugbank:DBID;
        d2rq:propertyDefinitionLabel "c2b2r_DrugBankDrug DBID";
        d2rq:column "c2b2r_DrugBankDrug.DBID";
        .
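To make the mapping concrete, here is a rough Python sketch of how a pattern such as "drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@" could be expanded for one table row. It is not the D2R server's code: the `expand` helper and the sample row are hypothetical, and the treatment of urlify (spaces become underscores, then URL-encoding) is an assumption about the documented behavior.

```python
import re
from urllib.parse import quote

# Matches @@table.column@@ with an optional |function suffix such as |urlify.
PLACEHOLDER = re.compile(r"@@([^@|]+)(?:\|(\w+))?@@")


def expand(pattern, row):
    """Fill every @@column@@ placeholder in `pattern` from a row dictionary."""
    def substitute(match):
        column, function = match.group(1), match.group(2)
        value = str(row[column])
        if function == "urlify":
            # Assumed behavior: spaces to underscores, then URL-encode.
            value = quote(value.replace(" ", "_"), safe="")
        return value
    return PLACEHOLDER.sub(substitute, pattern)


# Illustrative row; the column names follow the mapping file above.
row = {
    "c2b2r_DrugBankDrug.DBID": "DB01076",
    "c2b2r_DrugBankDrug.Generic_Name": "Atorvastatin",
}

uri_suffix = expand("drugbank_drug/@@c2b2r_DrugBankDrug.DBID|urlify@@", row)
label = expand("@@c2b2r_DrugBankDrug.Generic_Name@@", row)
print(uri_suffix, label)
```

The resulting suffix is appended to the server's base URI, which is how the row with DBID DB01076 ends up at a URI ending in drugbank_drug/DB01076.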
Chem2Bio2RDF Datasets
http://chem2bio2rdf.org
Each node represents a database, colored by its RDF vendor; each directed edge shows the linkage from one dataset to another, colored by linkage type. E.g., the type compound includes CID, CAS, ChEBI, DBID and so on. Node size and edge width depend on the number of triples and the number of linkages, respectively.
Legend: Chem2Bio2RDF data; other data vendors; compound; protein/gene; chemogenomics; literature; others
http://linkeddata.org
SPARQL
Implement cheminformatics and bioinformatics tools into SPARQL
ARQ Function Extension for SPARQL: Chemistry Development Kit, BioJava, Web Services
f:tanimoto is used for compound similarity search

    PREFIX drugbank: <http://www4.wiwiss.fu-berlin.de/drugbank/resource/drugbank/>
    PREFIX f: <java:org.bio2chem2rdf.arq.>
    # Keep compounds whose MACCS-fingerprint Tanimoto similarity
    # to the query SMILES exceeds 0.9.
    SELECT ?x ?s
    WHERE {
      ?x drugbank:smilesStringCanonical ?s
      FILTER ( f:tanimoto('NS(=O)(=O)C1=CC(=C(Cl)C(Cl)=C1)S(N)(=O)=O', ?s, 'MACCS') > 0.9 )
    }
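The similarity filter in the query above comes down to a Tanimoto coefficient between two fingerprints. The sketch below shows only that arithmetic, assuming fingerprints are given as sets of "on" bit positions; the bit sets and compound names are made up, and real MACCS keys would come from a toolkit such as the Chemistry Development Kit rather than from this code.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient: shared on-bits divided by on-bits in either set."""
    a, b = set(bits_a), set(bits_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


# Hypothetical fingerprints (positions of set bits), not real MACCS keys.
query = {1, 4, 7, 9, 12}
candidates = {
    "cmpd_A": {1, 4, 7, 9, 12},
    "cmpd_B": {1, 4, 7, 9, 12, 15},
    "cmpd_C": {2, 4, 20},
}

# Same threshold as the FILTER clause: keep scores strictly above 0.9.
hits = {
    name: score
    for name, bits in candidates.items()
    if (score := tanimoto(query, bits)) > 0.9
}
print(hits)
```

With this toy data only the identical fingerprint passes the 0.9 cutoff, which illustrates how strict that threshold is: a single extra bit out of six already drops the score to 5/6.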
Answer scientific questions
More in http://chem2bio2rdf.wikispaces.com/multiple+sources
Outline
Ontology workflow
Step 1: Hunting for scientific questions and targeting goals
Step 2: Propose framework and basic classes
Step 3: Define classes, relations and data properties
http://chem2bio2owl.wikispaces.com/Version+1.0
Step 4: Align with external ontologies
Chem2Bio2OWL
Step 5: Populate Chem2Bio2OWL
Chem2Bio2RDF, Protégé API, Virtuoso, Pellet reasoning, Chem2Bio2OWL
Step 6: Evaluation: consistency checking
Step 6: Evaluation: case study
Annotated Chem2Bio2OWL; Mashed Chem2Bio2RDF

    PREFIX c2b2r: <http://chem2bio2rdf.org/chem2bio2rdf.owl#>
    PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    # List the names of all drug targets of Troglitazone.
    select distinct ?target
    from <http://chem2bio2rdf.org/owl#>
    where {
      ?chemical rdfs:label ?drugName ;
                c2b2r:hasInteraction ?interaction .
      ?interaction c2b2r:hasTarget [ bp:name ?target ] ;
                   c2b2r:drugTarget true .
      FILTER (str(?drugName) = "Troglitazone")
    }
Outline
 
Two objects are similar if they are related to similar objects
Examples: coauthorship; same target
Two objects are related if they share the same objects or their related objects are related
Figure: Compound 1 and Compound 2 linked through Protein 1 and Protein 2; Person 1 and Person 2 linked through Computer Science, paper1 and paper2 (advisor, major, publish, cite, conference)
Sample patterns (figure): paths linking Cmpd1 and Protein 1 through Cmpd 2, Protein 2, GO:0001 or hypertension, using Chemogenomics, Neighbor, hasGO, PPI, Side effect and Substructure edges
Association depends on its neighborhood (figure): Compound 1 to 3, Target 1 to 4, GO:00001, Side Effect 1 and Gene Family 1, connected by chemogenomics, neighbor, hasGO, hasSideEffect, hasGeneFamily and proteinProteinInteraction edges
 
Statistical Model
Convert the question to a path surfing problem
Figure: Gene i and Gene j with PPI, hasGO, hasPathway and chemogenomics edges
P(i → j) = 1/3
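The slide gives only the value P(i → j) = 1/3. One plausible reading, sketched below as an assumption rather than as the model actually used, is a random surfer that leaves a node along one of its edges chosen uniformly, so a node with three outgoing edges gives each neighbor probability 1/3, and a path's probability is the product of its steps. The graph and node names are invented for illustration.

```python
from fractions import Fraction

# Hypothetical adjacency lists; gene_i has three outgoing edges.
graph = {
    "gene_i": ["gene_j", "gene_k", "gene_l"],
    "gene_j": ["gene_i", "go_term"],
    "gene_k": ["gene_i"],
    "gene_l": ["gene_i"],
    "go_term": ["gene_j"],
}


def transition(graph, source, target):
    """Probability of stepping source -> target when edges are chosen uniformly."""
    neighbors = graph[source]
    if target not in neighbors:
        return Fraction(0)
    # count() lets parallel edges to the same neighbor add up.
    return Fraction(neighbors.count(target), len(neighbors))


def path_probability(graph, path):
    """Product of the step probabilities along a path of nodes."""
    probability = Fraction(1)
    for source, target in zip(path, path[1:]):
        probability *= transition(graph, source, target)
    return probability


step = transition(graph, "gene_i", "gene_j")
walk = path_probability(graph, ["gene_i", "gene_j", "go_term"])
print(step, walk)
```

Exact fractions are used instead of floats so the 1/3 on the slide is reproduced without rounding.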
Protein 2 Cmpd1 (s) Protein 1 (t) e1 e2
Pattern Samples: Pattern Distribution
Statistical Model
3. Node association estimation
Raw scores of random pairs fit a normal distribution!
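If raw scores of random pairs are roughly normal, a candidate pair can be judged against that background. The sketch below assumes this is done with a z-score and an upper-tail p-value; the slide does not spell out the procedure, and the scores here are made-up numbers, not data from the talk.

```python
from statistics import NormalDist

# Hypothetical raw association scores of randomly chosen pairs.
random_scores = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 1.0, 0.9, 1.1]

# Fit the background (null) distribution from the random pairs.
null = NormalDist.from_samples(random_scores)


def significance(raw_score, null):
    """Return (z-score, upper-tail p-value) of a raw score under the null."""
    z = (raw_score - null.mean) / null.stdev
    p = 1.0 - null.cdf(raw_score)
    return z, p


z, p = significance(2.0, null)
print(round(z, 2), p)
```

A pair scoring far above the random background gets a large z and a small p, which gives raw path scores a common scale for ranking.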
Direct: drug-target pairs with IC50 < 30 µM
Indirect: drug-target pairs with no interaction
Random: random pairs
 
SLAP interface
Acknowledgement
Thanks!


Towards semantic systems chemical biology


Editor's notes

  1. Learn a basic and used several software, What, how,
  2. What is scb, why we need seman, the whole architecture,
  3. Antibacterial drug, 4 parts, data, we wanna ask can this drug have side effect?
  4. Can we use semantic web, answered it by google or siri?
  5. Remove logic, to application
  6. Demo, links
  7. Show indice search
  8. To link them, 25 database; why need download (nobody else is doing) why relation database (already have, and doubt semantic web, data quality), ontology or mapping file is key
  9. Show mapping file, demo one database, SQL no difference. Show exhibit: http://cheminfov.informatics.indiana.edu:8080/exhibit/drugbank.html Show triples: select * where { <http://chem2bio2rdf.org/drugbank/resource/drugbank_drug/DB00041> ?p ?o }
  10. Here is the figure of all Chem2Bio2RDF datasets. Each node represents a database, colored by its RDF vendor; red nodes are RDF data sources provided in Chem2Bio2RDF. Each directed edge shows the linkage from one dataset to another, colored by linkage type. E.g., the type compound includes CID, CAS, ChEBI, DBID and so on. Node size and edge width depend on the number of triples and the number of linkages, respectively. Up to now, we already have over 110 million triples.
  11. Show indice search
  12. Now we can use our portal to answer a variety of questions in systems chemical biology, from basic ones like "give me all info about this compound" to advanced ones like "link KEGG and PubChem to identify potential multiple-pathway inhibitors for MAPK".
  13. homogenous data, why OWL?
  14. Bottom up, basic ontology…. Bottom down… recommend BFO, why bottom up
  15. Concepts…concise as possible
  16. Different from only pizza, more like beverage, cheese… each concept like pizza. Interaction describes all kinds of relations between objects
  17. Subclass---interaction,, utility class---chemical structure, chemical physical property
  18. Show NCBO bioportal
  19. Two author similar
  20. demo