Semantic (Web) Technologies for Translational Research in Life Sciences. Ohio State University, June 16, 2011. Amit P. Sheth, Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis), amit.sheth@wright.edu. Thanks to the Kno.e.sis team (Satya, Priti, Rama, and Ajith); collaborators at CTEGD, UGA (Dr. Tarleton, Brent Weatherly), NLM (Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford, CITAR/WSU
Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
Web of people - social networks, user-created casual content Web of resources - data, service, data, mashups Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Evolution of Web & Semantic Computing Tech assimilated in life Web of Sensors, Devices/IoT - 40 billion sensors, 5 billion mobile connections 2007 Situations, Events Web 3.0 Semantic Technology Used Objects Web 2.0 Patterns Keywords 1997 Web 1.0
Outline: Semantic Web (very brief intro). Scenarios to demonstrate the applications and benefit of semantic web technologies: Health Care, Biomedical Research, Translational
Biomedical Informatics... Biomedical Informatics PubMed ClinicalTrials.gov ...needs a connection Hypothesis Validation Experiment design Predictions Personalized medicine Semantic Web research aims at providing this connection! Etiology Pathogenesis Clinical findings Diagnosis Prognosis Treatment Genome Transcriptome Proteome Metabolome Physiome ...ome More advanced capabilities for search, integration, analysis, linking to new insights and discoveries! GenBank UniProt Medical Informatics Bioinformatics
Decision Making, Insights, Innovations Human Performance Data and Facts Knowledge and Understanding Health & Performance Cognitive Science, Psychology Neuroscience Anatomy, Physiology Cellular biology Molecular Biology ACATATGGGTACTATTTACTATTCATGGGTACTATTTATGGCATATGGCGTACTATTCTAATCCTATATCCGTCTAATCTATTTACTATTATCTATTACTATACCTTTTGGGGAAAAAAATTCTATACCGTCTAATCCTATAAATCAAGCCG Biochemistry
Semantic Web standards @ W3C. Semantic Web is built in a layered manner. Not everybody needs all the layers … Queries: SPARQL, Rules: RIF. Rich ontologies: OWL. Simple data models & taxonomies: RDF Schema. Uniform metamodel: RDF + URI. Encoding structure: XML. Encoding characters: Unicode
Linked Data: Semantic Web “diluted”. Achieve for data what Web did to documents. Relationship with the original Semantic Web vision: no AI, no agents, no autonomy. Interoperability is still very important: interoperability of formats, interoperability of semantics. Enables interchange of large data sets (thus very useful in, say, collaborative research). Semantic Web vision is largely predicated on the availability of data. Linked Data is a movement that gets us there. Thanks: Ora Lassila
Opportunity: exploiting clinical and biomedical data. Text: Health Information Services (Elsevier iConsult); Scientific Literature (PubMed, 300 Documents Published Online each day); User-contributed Content (Informal): GeneRIFs, WikiGene. Data: NCBI Public Datasets (Genome, Protein DBs, new sequences daily); Laboratory Data (Lab tests, RT-PCR, Mass spec); Clinical Data (Personal health history). Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
Major Community Efforts W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/ Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability National Center for Biomedical Ontologies: http://bioportal.bioontology.org/
Major SW Projects OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/ LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/ NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
Semantic Web Enablers and Techniques. Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base. Semantic Annotation (metadata Extraction): Manual, Semi-automatic (automatic with human verification), Automatic. Semantic Computation: semantics enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
Drug Ontology Hierarchy (showing is-a relationships) owl:thing prescription_drug_brand_name brandname_undeclared brandname_composite prescription_drug monograph_ix_class cpnum_group prescription_drug_property indication_property formulary_property non_drug_reactant interaction_property property formulary brandname_individual interaction_with_prescription_drug interaction indication generic_individual prescription_drug_generic generic_composite interaction_with_monograph_ix_class interaction_with_non_drug_reactant
N-Glycosylation metabolic pathway. N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4. GNT-I attaches GlcNAc at position 2. GNT-V attaches GlcNAc at position 6. N-acetyl-glucosaminyl_transferase_V: UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-R2; UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021
Maturing capabilities and ongoing research. Ontology Creation. Semantic Annotation & Text mining: Entity recognition, Relationship extraction. Semantic Integration & Provenance: Integrating all types of data used in biomedical research: text, experimental data, curated/structured/public and multimedia. Semantic search, browsing, analysis. Clinical and Scientific Workflows with semantic web services. Semantic Exploration of scientific literature, Undiscovered public knowledge
Project 1: ASEMR Why: Improve Quality of Care and Decision Making without loss of Efficiency in active Cardiology practice. What: Use of semantic Web technologies for clinical decision support Where: Athens Heart Center & its partners and labs Status: In use continuously since 01/2006
Operational since January 2006 Details: http://knoesis.org/library/resource.php?id=00004
Active Semantic EMR Annotate ICD9s Annotate Doctors Lexical Annotation Insurance Formulary Level 3 Drug Interaction Drug Allergy Demo at: http://knoesis.org/library/demos/
Project 2: Glycomics Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease. What: Semantic Annotation of Experiment Data Where: Complex Carbohydrate Research Center, UGA Status: Research prototype in use Workflow with Semantic Annotation of Experimental Data already in use
N-Glycosylation Process (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 Separation technique I n Glycopeptides Fraction PNGase n Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction Data reduction ms peaklist ms/ms peaklist binning Peptide identification Glycopeptide identification and quantification Peptide list N-dimensional array Data correlation Signal integration
Agent  Agent  Agent  Agent  Biological Sample  Analysis by MS/MS Raw Data to Standard Format Data Pre- process DB Search (Mascot/Sequest) Results Post-process (ProValt) O I O I O I O I O Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Biological Information Scientific workflow for proteome analysis Semantic Annotation Applications
Semantic Annotation of Experimental Data: Mass Spectrometry (MS) Data, ms/ms peaklist data
parent ion m/z    parent ion abundance    parent ion charge
830.9570          194.9604                2
fragment ion m/z    fragment ion abundance
580.2985            0.3592
688.3214            0.2526
779.4759            38.4939
784.3607            21.7736
1543.7476           1.3822
1544.7595           2.9977
1562.8113           37.4790
1660.7776           476.5043
Semantic Annotation of Experimental Data: Semantically Annotated MS Data (Ontological Concepts)
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
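A minimal sketch of how a downstream tool might read the annotated peak list, using only the Python standard library. The element and attribute names follow the slide's XML with the missing space between element name and attribute restored and straight quotes substituted; both are assumptions about the original markup. The function name `read_peak_list` is hypothetical and not part of the project.

```python
import xml.etree.ElementTree as ET

# Values copied from the slide; markup spacing and quoting are assumed.
ANNOTATED = """
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
"""


def read_peak_list(xml_text):
    """Return instrument, parent ion, and fragment peaks from annotated XML."""
    root = ET.fromstring(xml_text.strip())
    parent = root.find("parent_ion")
    fragments = [
        (float(f.get("m-z")), float(f.get("abundance")))
        for f in root.findall("fragment_ion")
    ]
    return {
        "instrument": root.find("parameter").get("instrument"),
        "parent_mz": float(parent.get("m-z")),
        "charge": int(parent.get("z")),
        "fragments": fragments,
    }


peaks = read_peak_list(ANNOTATED)
# The most abundant fragment peak; possible without guessing column order
# because every number carries a named attribute.
base_peak = max(peaks["fragments"], key=lambda p: p[1])
print(peaks["parent_mz"], peaks["charge"], base_peak)
```

Because each number is labelled, the reader needs no knowledge of the column order used in the raw peak list.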
Project 3: Why: To associate genotype and phenotype information for knowledge discovery What: integrated data sources to run complex queries Enriching data with ontologies for integration, querying, and automation Ontologies beyond vocabularies: the power of relationships Where: NCRR (NIH) Status: Completed
Use data to test hypothesis Gene name GO Interactions gene Sequence PubMed OMIM Link between glycosyltransferase activity and congenital muscular dystrophy? Glycosyltransferase Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
In a Web pages world… (GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D has_molecular_function Acetylglucosaminyl-transferase activity Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
With the semantically enhanced data: glycosyltransferase GO:0016757 isa GO:0008194 GO:0016758 acetylglucosaminyl-transferase GO:0008375; EG:9215 LARGE has_molecular_function acetylglucosaminyl-transferase GO:0008375; EG:9215 LARGE has_associated_phenotype Muscular dystrophy, congenital, type 1D MIM:608840
SELECT DISTINCT ?t ?g ?d {
    ?t is_a GO:0016757 .
    ?g has_molecular_function ?t .
    ?g has_associated_phenotype ?b2 .
    ?b2 has_textual_description ?d .
    FILTER regex(?d, "muscular dystrophy", "i") .
    FILTER regex(?d, "congenital", "i")
}
From medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
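The query on this slide can be approximated over an in-memory set of triples, which may help show why the is_a relationships matter. This is a sketch only: the identifiers come from the slide, but the exact is_a edges between the GO terms are an assumption read off the slide's diagram, and the helper names (`objects`, `descendants`, `genes_linking`) are hypothetical. It follows is_a transitively, which a plain SPARQL pattern match would only do with inference or a property path.

```python
import re

# Triples taken from the slide; the precise is_a edges are assumed.
TRIPLES = {
    ("GO:0008194", "is_a", "GO:0016757"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008375", "is_a", "GO:0016758"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
}


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def descendants(term):
    """All terms that reach `term` through one or more is_a edges."""
    found, frontier = set(), {term}
    while frontier:
        children = {s for s, p, o in TRIPLES if p == "is_a" and o in frontier}
        frontier = children - found
        found |= children
    return found


def genes_linking(go_term, *keywords):
    """(term, gene, description) rows whose description matches all keywords."""
    rows = set()
    for t in descendants(go_term):
        genes = {s for s, p, o in TRIPLES
                 if p == "has_molecular_function" and o == t}
        for g in genes:
            for phenotype in objects(g, "has_associated_phenotype"):
                for d in objects(phenotype, "has_textual_description"):
                    if all(re.search(k, d, re.IGNORECASE) for k in keywords):
                        rows.add((t, g, d))
    return rows


matches = genes_linking("GO:0016757", "muscular dystrophy", "congenital")
print(matches)
```

The gene is annotated with GO:0008375, not GO:0016757, so the link to glycosyltransferase activity appears only once the hierarchy is traversed.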
Project 4: Nicotine Dependence Why: For understanding the genetic basis of nicotine dependence. What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources). Where: NLM (NIH) Status: Completed research
Motivation: NIDA study on nicotine dependency. List of candidate genes in humans. Analysis objectives include:
Identification of active genes (maximum number of pathways)
Identification of genes based on anatomical locations
Requires integration of genome and biological pathway information
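One way to read the "maximum number of pathways" objective is as a ranking of genes by how many distinct pathways they take part in, once gene and pathway data are integrated. The sketch below illustrates that reading only. The gene and pathway names are placeholders, not data from the study, and `hub_genes` is a hypothetical helper.

```python
from collections import defaultdict

# Placeholder gene-to-pathway associations; NOT data from the NIDA study.
ASSOCIATIONS = [
    ("gene_A", "pathway_1"), ("gene_A", "pathway_2"), ("gene_A", "pathway_3"),
    ("gene_B", "pathway_2"),
    ("gene_C", "pathway_1"), ("gene_C", "pathway_3"),
    ("gene_C", "pathway_3"),  # duplicate rows can occur after merging sources
]


def hub_genes(associations, top=1):
    """Rank genes by number of distinct pathways, ties broken by name."""
    pathways = defaultdict(set)
    for gene, pathway in associations:
        pathways[gene].add(pathway)
    ranked = sorted(pathways.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(members)) for gene, members in ranked[:top]]


top_two = hub_genes(ASSOCIATIONS, top=2)
print(top_two)
```

Using a set per gene keeps duplicate associations from different sources from inflating the count.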
Genome and pathway information integration (diagram). Pathway sources: KEGG, Reactome, HumanCyc. Genome sources: Entrez Gene, GeneOntology, HomoloGene. Linked items shown: pathway, protein, pmid, HomoloGene ID. Models: Entrez Knowledge Model (EKoM), BioPAX ontology
Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
Project 5: T. cruzi SPSE Why: For Integrative Parasite Research to help expedite knowledge discovery What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA Who: Kno.e.sis, UGA, NCBO (Stanford) Status: Research prototype, in regular lab use
Project Outline. Data Sources. Gene Knockout, Strain Creation, Microarray, Proteome. Ontological Infrastructure. Parasite Experiment. Query processing. Results
Provenance in Parasite Research. Gene Knockout and Strain Creation*: Gene Name, Sequence Extraction, 3' & 5' Region, Drug Resistant Plasmid, Plasmid Construction, Knockout Construct Plasmid, T. cruzi sample, Transfection, Transfected Sample, Drug Selection, Selected Sample, Cell Cloning, Cloned Sample. Related Queries from Biologists: List all groups in the lab that used a Target Region Plasmid? Which researcher created a new strain of the parasite (with ID = 66)? An experiment was not successful: has this experiment been conducted earlier? What were the results? *T. cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
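The biologists' questions become simple lookups once every process run records who performed it, what went in, and what came out. The sketch below illustrates that idea and is not the project's provenance model. The `ProcessRun` record, the agent names, and every identifier other than strain 66 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRun:
    """One recorded execution of a lab process (hypothetical schema)."""
    process: str
    agent: str
    inputs: tuple
    outputs: tuple
    succeeded: bool


# Invented provenance records for illustration only.
RUNS = [
    ProcessRun("plasmid_construction", "researcher_1",
               ("gene:X", "drug_resistant_plasmid"),
               ("knockout_construct:7",), True),
    ProcessRun("transfection", "researcher_2",
               ("knockout_construct:7", "sample:12"),
               ("transfected_sample:30",), False),
    ProcessRun("transfection", "researcher_1",
               ("knockout_construct:7", "sample:12"),
               ("transfected_sample:31",), True),
    ProcessRun("cell_cloning", "researcher_1",
               ("transfected_sample:31",), ("strain:66",), True),
]


def creator_of(entity):
    """Agent of the run that produced `entity`, or None if unrecorded."""
    return next((r.agent for r in RUNS if entity in r.outputs), None)


def earlier_attempts(process, inputs):
    """All recorded runs of `process` on the same set of inputs."""
    wanted = set(inputs)
    return [r for r in RUNS if r.process == process and set(r.inputs) == wanted]


who = creator_of("strain:66")
attempts = earlier_attempts("transfection", ("sample:12", "knockout_construct:7"))
print(who, [(r.agent, r.succeeded) for r in attempts])
```

Here the strain 66 question and the "conducted earlier?" question are each answered by one lookup over the recorded runs.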
Research Accomplishments: SPSE
Developed a semantic provenance framework and influenced the W3C community
SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
Give me the gene knockout summaries, both for plasmid construction and strain creation, for all gene knockout targets that are 2-fold upregulated in amastigotes at the transcript level and that have orthologs in Leishmania but not in Trypanosoma brucei.
Focused KB Work Flow (Use case: HPCO). HPC keywords; Doozer: Base Hierarchy from Wikipedia; Focused Pattern based extraction; SenseLab Neuroscience Ontologies; Initial KB Creation; Meta Knowledgebase; PubMed Abstracts; Knoesis: Parsing based NLP Triples; Enrich Knowledge Base; NLM: Rule based BKR Triples; Final Knowledge Base
Triple Extraction Approaches. Open Extraction: No fixed number of predetermined entities and predicates. At Knoesis: NLP (parsing and dependency trees). Supervised Extraction: Predetermined set of entities and predicates. At Knoesis: Pattern based extraction to connect entities in the base hierarchy using statistical techniques. At NLM: NLP and rule based approaches
Mapping Triples to Base Hierarchy. Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB. Preliminary synonyms based on anchor labels and page redirects in Wikipedia, e.g. Prolactostatin redirects to Dopamine. Predicates (verbs) and entities are subjected to stemming using WordNet
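The mapping rule can be sketched as follows: a triple is kept only if both subject and object contain at least one hierarchy concept after synonym resolution. This is an illustration under stated assumptions, not the Scooner or Doozer implementation. The two-concept hierarchy is a toy stand-in, the example sentence is invented, and the `stem` function is a crude suffix stripper standing in for the WordNet-based stemming named on the slide.

```python
# Toy stand-ins: a tiny concept hierarchy and the redirect from the slide.
HIERARCHY = {"dopamine", "prolactin"}
SYNONYMS = {"prolactostatin": "dopamine"}


def stem(word):
    """Crude suffix stripper; a placeholder for WordNet-based stemming."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def concepts_in(phrase):
    """Hierarchy concepts mentioned in a phrase, after synonym resolution."""
    found = set()
    for token in phrase.lower().split():
        token = SYNONYMS.get(token, token)
        if token in HIERARCHY:
            found.add(token)
        elif stem(token) in HIERARCHY:
            found.add(stem(token))
    return found


def map_triple(subject, predicate, obj):
    """Map an extracted triple to hierarchy concepts, or None if unmappable."""
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if not subject_concepts or not object_concepts:
        return None  # both ends must contain a known concept
    return (sorted(subject_concepts), stem(predicate.lower()),
            sorted(object_concepts))


mapped = map_triple("Prolactostatin", "inhibits", "prolactin secretion")
rejected = map_triple("Aspirin", "reduces", "fever")
print(mapped, rejected)
```

The second triple is dropped because neither end mentions a concept from the hierarchy, which is the filter the slide describes.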
Scooner:  Full Architecture
Scooner Features Knowledge-based browsing: Relations window, inverse relations, creating trails Persistent projects: Work bench, browsing history, comments, filtering Collaboration: comments, dashboard, exporting (sub)projects, importing projects
Scooner Screenshot

Semantic (Web) Technologies for Translational Research in Life Sciences

  • 1. Semantic (Web) Technologies for Translational Research in Life Sciences Ohio State University, June 16, 2011 Amit P. Sheth Ohio Center ofExcellence in Knowledge-enabled Computing (Kno.e.sis) amit.sheth@wright.edu Thanks to Kno.e.sis team (Satya, Priti, Rama, and Ajith); Collaborators at CTEGD UGA(Dr. Tarleton, Brent Weatherly), NLM(Olivier Bodenreider), CCRC, UGA (Will York), NCBO/Stanford, CITAR/WSU
  • 2. Kno.e.sis: Ohio Center of Excellence in Knowledge-enabled Computing
  • 3. Web ofpeople - social networks, user-createdcasualcontent Web of resources - data, service, data, mashups Web of databases - dynamically generated pages - web query interfaces Web of pages - text, manually created links - extensive navigation Evolutionof Web & Semantic Computing Tech assimilated in life Web ofSensors, Devices/IoT - 40 billionsensors, 5 billionmobile connections 2007 Situations, Events Web 3.0 Semantic TechnologyUsed Objects Web 2.0 Patterns Keywords 1997 Web 1.0
  • 4. Outline Semantic Web – very brief intro Scenarios to demonstrate the applications and benefit of semantic web technologies HealthCare BiomedicalResearch Translational
  • 5. Biomedical Informatics... Biomedical Informatics Pubmed Clinical Trials.gov ...needs a connection Hypothesis Validation Experiment design Predictions Personalized medicine Semantic Web research aims at providing this connection! Etiology Pathogenesis Clinical findings Diagnosis Prognosis Treatment Genome Transcriptome Proteome Metabolome Physiome ...ome More advanced capabilities for search, integration, analysis, linking to new insights and discoveries! Genbank Uniprot Medical Informatics Bioinformatics
  • 6. Decision Making, Insights, InnovationsHuman Performance Data and Facts Knowledge and Understanding Health & Performance Cognitive Science, Psychology Neuroscience Anatomy, Physiology Cellular biology Molecular Biology ACATATGGGTACTATTTACTATTCATGGGTACTATTTATGGCATATGGCGTACTATTCTAATCCTATATCCGTCTAATCTATTTACTATTATCTATTACTATACCTTTTGGGGAAAAAAATTCTATACCGTCTAATCCTATAAATCAAGCCG Biochemistry
  • 7. Semantic Web standards @ W3C Semantic Web is built in a layered manner Not everybody needs all the layers … Queries: SPARQL, Rules: RIF Semantic Web Rich ontologies: OWL Simple data models & taxonomies: RDF Schema Uniformmetamodel: RDF+ URI Encoding structure: XML Encoding characters : Unicode
  • 8. Linked Data: Semantic Web “diluted” Achieve for data what Web did to documents Relationship with the original Semantic Web vision: no AI, no agents, no autonomy Interoperability is still very important interoperability of formats interoperability of semantics Enables interchange of large data sets (thus very useful in, say, collaborative research) Semantic Web vision is largely predicated on the availability of data Linked Data is a movement that gets us there Thanks – OraLassila
  • 9. Opportunity: exploiting clinical and biomedical data text Health Information Services Elsevier iConsult Scientific Literature PubMed 300 Documents Published Online each day User-contributed Content (Informal) GeneRifs WikiGene NCBI Public Datasets Genome, Protein DBs new sequences daily Laboratory Data Lab tests, RTPCR, Mass spec Clinical Data Personal health history Search, browsing, complex query, integration, workflow, analysis, hypothesis validation, decision support.
  • 10. Major Community Efforts W3C Semantic Web Health Care & Life Sciences Interest Group: http://www.w3.org/2001/sw/hcls/ Clinical Observations Interoperability: EMR + Clinical Trials: http://esw.w3.org/HCLS/ClinicalObservationsInteroperability National Center for Biomedical Ontologies: http://bioportal.bioontology.org/
  • 11. Major SW Projects OpenPHACTS: A knowledge management project of the Innovative Medicines Initiative (IMI), a unique partnership between the European Community and the European Federation of Pharmaceutical Industries and Associations (EFPIA). http://www.openphacts.org/ LarKC: develop the Large Knowledge Collider, a platform for massive distributed incomplete reasoning that will remove the scalability barriers of currently existing reasoning systems for the Semantic Web. http://www.larkc.eu/ NCBO: contribute to collaborative science and translational research. http://bioportal.bioontology.org/
  • 12. Semantic Web Enablers and Techniques Ontology: Agreement with Common Vocabulary & Domain Knowledge; Schema + Knowledge base Semantic Annotation (metadata extraction): Manual, Semi-automatic (automatic with human verification), Automatic Semantic Computation: semantics-enabled search, integration, complex queries, analysis (paths, subgraph), pattern finding, mining, inferencing, reasoning, hypothesis validation, discovery, visualization
  • 13. Drug Ontology Hierarchy (showing is-a relationships) owl:thing prescription_drug_brand_name brandname_undeclared brandname_composite prescription_drug monograph_ix_class cpnum_group prescription_drug_property indication_property formulary_property non_drug_reactant interaction_property property formulary brandname_individual interaction_with_prescription_drug interaction indication generic_individual prescription_drug_generic generic_composite interaction_with_monograph_ix_class interaction_with_non_drug_reactant
  • 14. N-glycan_beta_GlcNAc_9 N-glycan_alpha_man_4 GNT-V attaches GlcNAc at position 6 N-acetyl-glucosaminyl_transferase_V UDP-N-acetyl-D-glucosamine + alpha-D-Mannosyl-1,3-(R1)-beta-D-mannosyl-R2 <=> UDP + N-Acetyl-$beta-D-glucosaminyl-1,2-alpha-D-mannosyl-1,3-(R1)-beta-D-mannosyl-$R2 UDP-N-acetyl-D-glucosamine + G00020 <=> UDP + G00021 N-Glycosylation metabolic pathway GNT-I attaches GlcNAc at position 2
  • 15. Maturing capabilities and ongoing research Ontology Creation Semantic Annotation & Text mining: Entity recognition, Relationship extraction Semantic Integration & Provenance: Integrating all types of data used in biomedical research: text, experimental data, curated/structured/public and multimedia Semantic search, browsing, analysis Clinical and Scientific Workflows with semantic web services Semantic Exploration of scientific literature, Undiscovered public knowledge
  • 16. Project 1: ASEMR Why: Improve Quality of Care and Decision Making without loss of Efficiency in active Cardiology practice. What: Use of semantic Web technologies for clinical decision support Where: Athens Heart Center & its partners and labs Status: In use continuously since 01/2006
  • 17. Operational since January 2006 Details: http://knoesis.org/library/resource.php?id=00004
  • 18. Active Semantic EMR Annotate ICD9s Annotate Doctors Lexical Annotation Insurance Formulary Level 3 Drug Interaction Drug Allergy Demo at: http://knoesis.org/library/demos/
  • 19. Project 2: Glycomics Why: To help in the treatment of certain kinds of cancer and Parkinson's Disease. What: Semantic Annotation of Experiment Data Where: Complex Carbohydrate Research Center, UGA Status: Research prototype in use Workflow with Semantic Annotation of Experimental Data already in use
  • 20. N-Glycosylation Process (NGP) Cell Culture extract Glycoprotein Fraction proteolysis Glycopeptides Fraction 1 Separation technique I n Glycopeptides Fraction PNGase n Peptide Fraction Separation technique II n*m Peptide Fraction Mass spectrometry ms data ms/ms data Data reduction Data reduction ms peaklist ms/ms peaklist binning Peptide identification Glycopeptide identification and quantification Peptide list N-dimensional array Data correlation Signal integration
  • 21. Agent Agent Agent Agent Biological Sample Analysis by MS/MS Raw Data to Standard Format Data Pre-process DB Search (Mascot/Sequest) Results Post-process (ProValt) O I O I O I O I O Storage Standard Format Data Raw Data Filtered Data Search Results Final Output Biological Information Scientific workflow for proteome analysis Semantic Annotation Applications
  • 22. Semantic Annotation of Experimental Data Mass Spectrometry (MS) Data, ms/ms peaklist data: parent ion m/z 830.9570, parent ion abundance 194.9604, parent ion charge 2; fragment ions as (m/z, abundance): (580.2985, 0.3592), (688.3214, 0.2526), (779.4759, 38.4939), (784.3607, 21.7736), (1543.7476, 1.3822), (1544.7595, 2.9977), (1562.8113, 37.4790), (1660.7776, 476.5043)
  • 23. Semantic Annotation of Experimental Data <ms-ms_peak_list> <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/> <parent_ion m-z="830.9570" abundance="194.9604" z="2"/> <fragment_ion m-z="580.2985" abundance="0.3592"/> <fragment_ion m-z="688.3214" abundance="0.2526"/> <fragment_ion m-z="779.4759" abundance="38.4939"/> <fragment_ion m-z="784.3607" abundance="21.7736"/> <fragment_ion m-z="1543.7476" abundance="1.3822"/> <fragment_ion m-z="1544.7595" abundance="2.9977"/> <fragment_ion m-z="1562.8113" abundance="37.4790"/> <fragment_ion m-z="1660.7776" abundance="476.5043"/> </ms-ms_peak_list> Ontological Concepts Semantically Annotated MS Data
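The point of the annotated form on slide 23 is that software can read the peak list by attribute name rather than by column position. A minimal sketch in Python, assuming the slide's markup is normalized to well-formed XML (straight quotes, a space before each `m-z` attribute); the function name `read_peak_list` is illustrative and not part of the project's tooling:

```python
import xml.etree.ElementTree as ET

# Peak list from slide 23, normalized to well-formed XML (an assumption about
# how the original file was encoded).
ANNOTATED = """
<ms-ms_peak_list>
  <parameter instrument="micromass_QTOF_2_quadropole_time_of_flight_mass_spectrometer" mode="ms-ms"/>
  <parent_ion m-z="830.9570" abundance="194.9604" z="2"/>
  <fragment_ion m-z="580.2985" abundance="0.3592"/>
  <fragment_ion m-z="688.3214" abundance="0.2526"/>
  <fragment_ion m-z="779.4759" abundance="38.4939"/>
  <fragment_ion m-z="784.3607" abundance="21.7736"/>
  <fragment_ion m-z="1543.7476" abundance="1.3822"/>
  <fragment_ion m-z="1544.7595" abundance="2.9977"/>
  <fragment_ion m-z="1562.8113" abundance="37.4790"/>
  <fragment_ion m-z="1660.7776" abundance="476.5043"/>
</ms-ms_peak_list>
"""


def read_peak_list(xml_text):
    """Return instrument, parent ion and fragment ions from an annotated peak list."""
    root = ET.fromstring(xml_text.strip())
    parent = root.find("parent_ion")
    fragments = [
        (float(f.get("m-z")), float(f.get("abundance")))
        for f in root.findall("fragment_ion")
    ]
    return {
        "instrument": root.find("parameter").get("instrument"),
        "parent_mz": float(parent.get("m-z")),
        "parent_charge": int(parent.get("z")),
        "fragments": fragments,
    }


peaks = read_peak_list(ANNOTATED)
# Most abundant fragment, looked up by meaning rather than by column index.
base_peak = max(peaks["fragments"], key=lambda frag: frag[1])
```

In the unannotated listing on slide 22 the same lookup depends on knowing which column holds which quantity; here the attribute names carry that information.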
  • 24. Project 3: Why: To associate genotype and phenotype information for knowledge discovery What: Integrated data sources to run complex queries Enriching data with ontologies for integration, querying, and automation Ontologies beyond vocabularies: the power of relationships Where: NCRR (NIH) Status: Completed
  • 25. Use data to test hypothesis Gene name GO Interactions gene Sequence PubMed OMIM Link between glycosyltransferase activity and congenital muscular dystrophy? Glycosyltransferase Congenital muscular dystrophy Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
  • 26. In a Web pages world… (GeneID: 9215) has_associated_disease Congenital muscular dystrophy, type 1D has_molecular_function Acetylglucosaminyl-transferase activity Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
  • 27. With the semantically enhanced data glycosyltransferase GO:0016757 isa GO:0008194 GO:0016758 acetylglucosaminyl-transferase GO:0008375 has_molecular_function acetylglucosaminyl-transferase GO:0008375 EG:9215 LARGE Muscular dystrophy, congenital, type 1D MIM:608840 has_associated_phenotype SELECT DISTINCT ?t ?g ?d { ?t is_a GO:0016757 . ?g has_molecular_function ?t . ?g has_associated_phenotype ?b2 . ?b2 has_textual_description ?d . FILTER regex(?d, "muscular dystrophy", "i") . FILTER regex(?d, "congenital", "i") } From the Medinfo paper. Adapted from: Olivier Bodenreider, presentation at HCLS Workshop, WWW07
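The query on slide 27 joins three kinds of facts: the GO is-a hierarchy, gene-to-function annotations, and gene-to-phenotype links. The sketch below mimics that join in plain Python over a handful of triples read off the slide. It is an illustration only, not the RDF store or SPARQL engine used in the project; the exact is_a edges between the GO terms are my reading of the slide's diagram, and the sketch follows is_a transitively, which the slide's query would only get from pre-computed inferences.

```python
import re

# Toy triples transcribed from slide 27. The is_a edges are an assumption
# based on the slide's diagram.
TRIPLES = [
    ("GO:0008194", "is_a", "GO:0016757"),
    ("GO:0016758", "is_a", "GO:0016757"),
    ("GO:0008375", "is_a", "GO:0008194"),
    ("GO:0008375", "is_a", "GO:0016758"),
    ("EG:9215", "has_molecular_function", "GO:0008375"),
    ("EG:9215", "has_associated_phenotype", "MIM:608840"),
    ("MIM:608840", "has_textual_description",
     "Muscular dystrophy, congenital, type 1D"),
]


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def subjects(predicate, obj):
    return [s for s, p, o in TRIPLES if p == predicate and o == obj]


def descendants(term):
    """All terms that reach `term` through one or more is_a edges."""
    seen, stack = set(), [term]
    while stack:
        current = stack.pop()
        for child in subjects("is_a", current):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def genes_linking(go_root, *patterns):
    """(function term, gene, description) rows matching every pattern."""
    rows = set()
    for term in {go_root} | descendants(go_root):
        for gene in subjects("has_molecular_function", term):
            for phenotype in objects(gene, "has_associated_phenotype"):
                for text in objects(phenotype, "has_textual_description"):
                    if all(re.search(p, text, re.IGNORECASE) for p in patterns):
                        rows.add((term, gene, text))
    return sorted(rows)


rows = genes_linking("GO:0016757", "muscular dystrophy", "congenital")
```

With these triples the only row links gene EG:9215 (LARGE) to the phenotype through its acetylglucosaminyltransferase annotation, which is the connection the slide highlights.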
  • 28. Project 4: Nicotine Dependence Why: For understanding the genetic basis of nicotine dependence. What: Integrate gene and pathway information and show how three complex biological queries can be answered by the integrated knowledge base. How: Semantic Web technologies (especially RDF, OWL, and SPARQL) support information integration and make it easy to create semantic mashups (semantically integrated resources). Where: NLM (NIH) Status: Completed research
  • 29.
  • 30. Identification of active genes – maximum number of pathways
  • 31. Identification of genes based on anatomical locations. Requires integration of genome and biological pathway information
  • 32.
  • 34.
  • 36. pmid
  • 39.
  • 40.
  • 41. Entrez Knowledge Model (EKoM) BioPAX ontology
  • 42. Results: Gene Pathway network and Hub Genes involved with Nicotine Dependence
  • 43. Project 5: T. cruzi SPSE Why: For Integrative Parasite Research to help expedite knowledge discovery What: Semantics and Services Enabled Problem Solving Environment (PSE) for Trypanosoma cruzi Where: Center for Tropical and Emerging Global Diseases (CTEGD), UGA Who: Kno.e.sis, UGA, NCBO (Stanford) Status: Research prototype – in regular lab use
  • 44.
  • 45.
  • 46. Provenance in Parasite Research Gene Name Sequence Extraction Gene Knockout and Strain Creation* Related Queries from Biologists List all groups in the lab that used a Target Region Plasmid? Which researcher created a new strain of the parasite (with ID = 66)? An experiment was not successful – has this experiment been conducted earlier? What were the results? 3' & 5' Region Drug Resistant Plasmid Gene Name Plasmid Construction Knockout Construct Plasmid T. cruzi sample ? Transfection Transfected Sample Drug Selection Cloned Sample Selected Sample Cell Cloning Cloned Sample *T. cruzi Semantic Problem Solving Environment Project, Courtesy of D.B. Weatherly and Flora Logan, Tarleton Lab, University of Georgia
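The biologists' questions on slide 46 become simple lookups once each lab step is recorded with its agent, inputs, and output. The sketch below is a hypothetical illustration in Python: the `Step` record, the researcher labels, and the sample identifiers are all invented for the example and do not reflect the project's actual provenance ontology or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    process: str    # name of the lab process
    agent: str      # who carried it out
    inputs: tuple   # entities consumed
    output: str     # entity produced


# Invented log following the knockout workflow drawn on the slide.
LOG = [
    Step("plasmid_construction", "researcher_A",
         ("gene_X_flanking_regions", "drug_resistance_plasmid"),
         "knockout_plasmid_1"),
    Step("transfection", "researcher_B",
         ("knockout_plasmid_1", "tcruzi_sample_1"), "transfected_sample_1"),
    Step("drug_selection", "researcher_B",
         ("transfected_sample_1",), "selected_sample_1"),
    Step("cell_cloning", "researcher_C",
         ("selected_sample_1",), "strain_66"),
]


def creator_of(entity, log=LOG):
    """Who ran the step that produced this entity (None if unknown)."""
    for step in log:
        if step.output == entity:
            return step.agent
    return None


def agents_using(entity, log=LOG):
    """Everyone who ran a step that consumed this entity."""
    return {step.agent for step in log if entity in step.inputs}


def lineage(entity, log=LOG):
    """All upstream entities the given entity was derived from."""
    produced_by = {step.output: step for step in log}
    upstream, stack = set(), [entity]
    while stack:
        step = produced_by.get(stack.pop())
        if step is None:
            continue
        for source in step.inputs:
            if source not in upstream:
                upstream.add(source)
                stack.append(source)
    return upstream
```

Here `creator_of("strain_66")` answers the "which researcher created strain 66" question, and `lineage("strain_66")` recovers the plasmid and sample the strain was derived from, which is the kind of information needed to tell which gene was knocked out.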
  • 47.
  • 48. Developed a semantic provenance framework and influenced the W3C community
  • 49. SPSE supports complex biological queries that help find gene knockout, drug and/or vaccination targets. For example:
  • 50. Show me proteins that are downregulated in the epimastigote stage and exist in a single metabolic pathway.
  • 51.
  • 52. Focused KB Work Flow (Use case: HPCO) HPC keywords Doozer: Base Hierarchy from Wikipedia Focused Pattern based extraction SenseLab Neuroscience Ontologies Initial KB Creation Meta Knowledgebase PubMed Abstracts Knoesis: Parsing based NLP Triples Enrich Knowledge Base NLM: Rule based BKR Triples Final Knowledge Base
  • 53. Triple Extraction Approaches Open Extraction No fixed number of predetermined entities and predicates At Knoesis – NLP (parsing and dependency trees) Supervised Extraction Predetermined set of entities and predicates At Knoesis – Pattern based extraction to connect entities in the base hierarchy using statistical techniques At NLM – NLP and rule based approaches
  • 54. Mapping Triples to Base Hierarchy Entities in both subject and object must contain at least one concept from the hierarchy to be mapped to the KB Preliminary synonyms based on anchor labels and page redirects in Wikipedia Prolactostatin redirects to Dopamine Predicates (verbs) and entities are subjected to stemming using Wordnet
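The mapping rule on slide 54 can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the hierarchy concepts and the extracted triples are toy values, only single-word concepts are matched, and the WordNet stemming step is left out. Only the Prolactostatin to Dopamine redirect comes from the slide.

```python
# Toy stand-ins for the base hierarchy and the Wikipedia redirect table.
HIERARCHY = {"dopamine", "prolactin", "hypothalamus"}
REDIRECTS = {"prolactostatin": "dopamine"}  # redirect named on the slide


def concepts_in(phrase):
    """Hierarchy concepts mentioned in a phrase, after resolving redirects."""
    found = set()
    for token in phrase.lower().replace("-", " ").split():
        token = REDIRECTS.get(token, token)
        if token in HIERARCHY:
            found.add(token)
    return found


def map_triple(subject, predicate, obj):
    """Map an extracted triple onto the hierarchy, or return None.

    The triple is kept only if both subject and object contain at least
    one hierarchy concept, as the slide requires.
    """
    subject_concepts = concepts_in(subject)
    object_concepts = concepts_in(obj)
    if not subject_concepts or not object_concepts:
        return None
    return (sorted(subject_concepts), predicate.lower(), sorted(object_concepts))


kept = map_triple("Prolactostatin", "inhibits", "prolactin secretion")
dropped = map_triple("Prolactostatin", "inhibits", "cell growth")
```

The first triple is kept with its subject resolved to `dopamine`; the second is dropped because its object mentions no concept from the hierarchy.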
  • 55. Scooner: Full Architecture
  • 56. Scooner Features Knowledge-based browsing: Relations window, inverse relations, creating trails Persistent projects: Work bench, browsing history, comments, filtering Collaboration: comments, dashboard, exporting (sub)projects, importing projects
  • 58. New Knowledge/hypothesis Example Three triples from different abstracts VIP Peptide – increases – Catecholamine Biosynthesis Catecholamines – induce – β-adrenergic receptor activity β-adrenergic receptors – are involved – fear conditioning New implicit knowledge VIP Peptide – affects – fear conditioning Caveat: Each triple above was observed in a different organism (cows, mice, humans), but still interesting hypothesis. Scooner’s contextual browsing makes this clear to the user.
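The hypothesis on slide 58 comes from chaining triples whose object and next subject refer to the same entity. A minimal sketch of that chaining in Python follows. The three triples are the ones on the slide; the `CANONICAL` table that equates, for example, "catecholamine biosynthesis" with "catecholamines" is a hand-written assumption standing in for whatever entity normalization Scooner actually applies.

```python
from collections import deque

# The three triples from the slide, each taken from a different abstract.
TRIPLES = [
    ("VIP peptide", "increases", "catecholamine biosynthesis"),
    ("catecholamines", "induce", "β-adrenergic receptor activity"),
    ("β-adrenergic receptors", "are involved in", "fear conditioning"),
]

# Hand-written normalization (assumption): surface forms mapped to one entity.
CANONICAL = {
    "catecholamine biosynthesis": "catecholamine",
    "catecholamines": "catecholamine",
    "β-adrenergic receptor activity": "β-adrenergic receptor",
    "β-adrenergic receptors": "β-adrenergic receptor",
}


def canon(entity):
    return CANONICAL.get(entity, entity)


def find_chain(start, goal, triples=TRIPLES):
    """Breadth-first search for a chain of triples from start to goal."""
    start, goal = canon(start), canon(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for s, p, o in triples:
            target = canon(o)
            if canon(s) == node and target not in visited:
                visited.add(target)
                queue.append((target, path + [(s, p, o)]))
    return None


chain = find_chain("VIP peptide", "fear conditioning")
```

The returned chain is the evidence behind "VIP Peptide – affects – fear conditioning". As the slide's caveat notes, a chain like this is only a candidate hypothesis, since each link may come from a different organism.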
  • 59. Project 7: Drug Abuse Why: To study social trends in pharmaceutical opioid abuse What: Describe drug users' knowledge, attitudes, and behaviors related to illicit use of OxyContin® Describe temporal patterns of non-medical use of OxyContin® tablets as discussed on Web-based forums Where: CITAR (Center for Interventions, Treatment and Addictions Research) at Wright State Univ. Status: In progress (recently funded by NIDA)
  • 60.
  • 61. Project 8: NMR Why: Streamline the NMR data processing tasks. Processing NMR experimental data is complex and time consuming. What: Providing biologists with tools to effectively process and manage Nuclear Magnetic Resonance (NMR) experimental data. How: Use Domain Specific Languages (DSL) to create scientist-friendly abstractions for complex statistical workflows. Use semantics based techniques to store and manage data. Where: Air Force Research Lab Status: In progress
  • 62.
  • 63. A complex NMR spectrum, marked with chemical compound identifiers by human observers.
  • 64.
  • 65. Use a DSL to provide abstractions for the operators (named SCALE)
  • 66.
  • 67. Future Interoperability Challenge:360 degree health Insurance, Financial Aspects Clinical Care Follow up, Lifestyle Genetic Tests… Profiles Clinical Trials Social Media
  • 68. For each component in 360-degree health care, we have data, processes, knowledge and experience. Interoperability solutions need to encompass all these! Possibly the largest growth in data will be in sensors (e.g., Body Area Networks, Biosensors) and social content. Extensive use of mobile phones. Credit: ece.virginia.edu
  • 69. Summary Semantic Web is an “interoperability technology” Semantic Web provides the needed interoperability, and can accommodate all necessary “points of view” Linked Data as a way of sharing data is highly promising Many examples of viable usage of Semantic Web technologies Words of warning about deployment Significant research challenges remain as Health presents the most complex domain
  • 70. Representative References A. Sheth, S. Agrawal, J. Lathem, N. Oldham, H. Wingate, P. Yadav, and K. Gallagher, "Active Semantic Electronic Medical Record", Intl Semantic Web Conference, 2006. Satya Sahoo, Olivier Bodenreider, Kelly Zeng, and Amit Sheth, "An Experiment in Integrating Large Biomedical Knowledge Resources with RDF: Application to Associating Genotype and Phenotype Information", WWW2007 HCLS Workshop, May 2007. Satya S. Sahoo, Kelly Zeng, Olivier Bodenreider, and Amit Sheth, "From 'Glycosyltransferase' to 'Congenital Muscular Dystrophy': Integrating Knowledge from NCBI Entrez Gene and the Gene Ontology", Amsterdam: IOS, August 2007, PMID: 17911917, pp. 1260-4. Satya S. Sahoo, Olivier Bodenreider, Joni L. Rutter, Karen J. Skinner, and Amit P. Sheth, "An ontology-driven semantic mash-up of gene and biological pathway information: Application to the domain of nicotine dependence", Journal of Biomedical Informatics, 2008. Cartic Ramakrishnan, Krzysztof J. Kochut, and Amit Sheth, "A Framework for Schema-Driven Relationship Discovery from Unstructured Text", Intl Semantic Web Conference, 2006, pp. 583-596. Satya S. Sahoo, Christopher Thomas, Amit Sheth, William S. York, and Samir Tartir, "Knowledge Modeling and Its Application in Life Sciences: A Tale of Two Ontologies", 15th International World Wide Web Conference (WWW2006), Edinburgh, Scotland, May 23-26, 2006. Satya S. Sahoo, Olivier Bodenreider, Pascal Hitzler, Amit Sheth, and Krishnaprasad Thirunarayan, "Provenance Context Entity (PaCE): Scalable provenance tracking for scientific RDF data", SSDBM, Heidelberg, Germany, 2010. Papers: http://knoesis.org/library Demos at: http://knoesis.wright.edu/library/demos/

Editor's notes

  1. Cognitive model, cognitive behavioral model
  2. In parasite research, new strains of a parasite are created by knocking out specific genes. So, given a cloned sample, we may need to know which gene(s) were knocked out. Both these scenarios are real-world examples of the importance of provenance. There are many research issues in provenance management. This presentation addresses (1) provenance modeling, specifically provenance interoperability, consistent modeling, and reduction of terminological heterogeneity, and (2) provenance query.
  3. References: http://www.armman.org/projecthero http://www.armman.org/mmitra