SlideShare ist ein Scribd-Unternehmen logo
1 von 19
Neurowiki
          Vulcan Inc.
           in conjunction with

Allen Institute for Brain Science
Today we are Discussing…
•  What is the use case and who requested it?
•  How does this apply to a Semantic Media Wiki?
•  How do you import and normalize thousands of
   triples worth of gene RDF triples?
•  Creating instance pages without knowing exactly
   what will be displayed on them.
•  Embedding SPARQL within SMW Templates.
•  Expanding existing instance pages with Semantic
   Results Formatters, a PHP view layer, and creating
   dynamic charts.
What is the Allen Institute?
•  Launched in 2003 with seed funding from founder and
   philanthropist Paul G. Allen.
•  Serving the scientific community is at the center of our mission
   to accelerate progress toward understanding the brain and
   neurological systems.

•  The Allen Institute's multidisciplinary staff includes
   neuroscientists, molecular biologists, informaticists, and
   engineers.

    “The Allen Institute for Brain Science is an
 independent 501(c)(3) nonprofit medical research
     organization dedicated to accelerating the
  understanding of how the human brain works.”
Human Brain
         Map
•    Open, public online access
•    A detailed, interactive three-
     dimensional anatomic atlas of
     the "normal" human brain
•    Data from multiple human brains
•    Genomic analysis of every brain
     structure, providing a quantitative
     inventory of which genes are
     turned on where
•    High-resolution atlases of key
     brain structures, pinpointing
     where selected genes are
     expressed down to the cellular
     level
•    Navigation and analysis tools for
     accessing and mining the data
What is Neurowiki?

•  A joint project between Vulcan Inc. and the Allen
   Institute to build a Semantic Wiki mapping genetic
   instances.
•  A finished prototype testing the import pipelines and
   display components for combining 5 major RDF
   datasets from 4 different sources.
•  Current planning includes mapping complete
   datasets, curating a better ontology, creating multiple
   ontology management for a user class, and
   importing scientific papers.
Biological Linked
    Data Map
•    Open, public online access
•    Data from multiple RDF data
     stores
•    Complete import pipeline
     using LDIF framework
•    Outlines of each imported
     instance embedding inline wiki
     properties and providing views
     of imported properties from
     original RDF datasets
•    Charting tools that ‘pivot’
     SPARQL queries providing
     several views of each query
•    Navigation and composition
     tools for accessing and mining
     the data
Where did we get the data?
•    KEGG : Kyoto Encyclopedia of Genes and Genomes
     •  “KEGG GENES is a collection of gene catalogs for all complete genomes
        generated from publicly available resources, mostly NCBI RefSeq.”

•    Diseasome
     •  “The Diseasome website is a disease/disorder relationships explorer and a
        sample of an innovative map-oriented scientific work. Built by a team of
        researchers and engineers, it uses the Human Disease Network dataset.”

•    DrugBank
     •  “The DrugBank database is a unique bioinformatics and cheminformatics
        resource that combines detailed drug data with comprehensive drug target
        information.”

•    SIDER
     •  “SIDER contains information on marketed medicines and their recorded adverse
        drug reactions. The information is extracted from public documents and package
        inserts.”
Wiki Ontology Map
•  Genes
     •    DrugBank : 4,553
     •    Diseasome : 3,919
     •    KEGG : 9,841

•  Diseases
     •    Diseasome : 4,213
     •    KEGG : 459

•  Drugs
     •    DrugBank : 4,772
     •    KEGG : 2,482
     •    SIDER : 924

•  Effects
     •    SIDER : 1,737

•  Pathways
     •    KEGG : 28,442
                              We chose to intentionally simplify the ontology
                              due to disagreements between researchers about
 61,342 Instances Available          entity relationships and subclasses.
         for Import
Importing and                                  Networked                   Local

Mapping the Linked
                                                Storage                   Storage




Data                                                        Download

•    R2R
     •     32,900 instances were converted                   R2R         Maps Entities to Wiki
           to the wiki ontology.                            Mapping
                                                                                    Ontology
                                                            Engine
     •     583,746 properties mapped
     •     Pathways were ignored for wiki
           ontology import, but are                         Import to
                                                              Wiki
           available within the triple store
           KEGG Pathway graph.

•  SILK                                                      SILK         Normalizes Entities
                                                            Mapping
                                                                         across data sources
     •     20,849 instances available in                    Engine

           wiki ontology after SILK
           normalization
                                                           Normalize
     •     Instance merging effected drugs,                 Entities
           genes, and diseases across
           datasets.

•  OntoBroker Triplestore                                  OntoBroker
                                                                              Available with
                                                           Triplestore      SPARQL Queries
Create List of Page


Creating Instance
                                       Names


                                             RDF Data          1.0

Wiki Pages
                                              Download
The triplestore now contained tens
of thousands of recognized
category instances. Creating the              Sanitize
                                               Script
                                                               2.0
pages would require a bot.
1.  Fetch the RDF dumps from an                                       Category
                                             Create CSV
    active D2R server                                                Page Names


2.  Use regex to fetch the rdf:label
    property that was mapped by             Text of Wiki
                                           markup for page           Read Open
                                              instance
    R2R as an instance name
                                                              3.0
3.  Open category specific text file   Create MediaWiki Page
    of wiki markup (page of                  MediaWiki
    template includes)                       Gateway rb
                                             Framework

4.  Contact Neurowiki and
    request a new page from the                 REST
                                                                      Neurowiki
                                                                      Instance
                                              interface
    list of names with the category                          4.0
                                                                        Page


    content
Four Initial Templates
for Each Instance by
Category
1.  Custom infobox within outline
    template
     •    Visible inline properties

2.  Outline template providing
    instance information
3.  Widget template displaying
    dynamic charts or third party
    services
     •    Donut charts and disease Twitter
          feed

4.  Broad table SPARQL queries
    showing instance relationships
5.  Hidden inline properties for
    other extensions
Adding a True View                   Format SPARQL Result         Execute Template
                                                                  Population


Layer to Semantic                          MediaWiki
                                           Template
                                                                     Strategy Pattern
                                                                       for Template
                                                                         Variables
                                                                                         2.0

Results Formatter
1.  Standard Semantic Results               Embed                         Create
                                           Formatter                     Variables
    Formatter.
2.  Classes made extending the
    stock formatter and                   OntoBroker
                                          Triplestore
                                                                         Smarty
                                                                        Processor
                                                                                        3.0
    providing a library of
    strategy classes for rendering
    different graph types.                Send Result
                                                                          Embed
                                                                         Variables
3.  Assembled template
    variables are given to the
    Smarty PHP templating                  Semantic
                                            Results         1.0        Completed
    engine with a path to the              Formatter                    Template
                                                                      Added to Wiki
                                                                                        4.0

    template file.                                                        page


4.  Template is populated and            Format Results
    inserted / injected into the
    complete MediaWiki page.
Embedding SPARQL
  Semantic Results Formatters
•  Every piece of content on every instance page is
   generated by Semantic Result Formatters interpreting
   SPARQL results.
•  Most inline properties are embedded in templates
   returned by SPARQL formatters.
•  All 3 dynamic graph types are interpreting results of
   SPARQL queries and injecting a JavaScript template into
   the head of the page.
•  The outline template takes selected predicates and objects
   from a SPARQL query, defined in the query embedding,
   and generates an HTML template for the page.
SELECT DISTINCT ?p ?o ?p1 ?o1 {
    GRAPH ?G {
       ?gene rdfs:label "{{{1}}}" ;
       diseasome:bio2rdfSymbol ?sym ;
       ?p ?o ;
       .
    }
OPTIONAL {
    GRAPH ?G1 {
       ?gene2 drugbank:bio2rdfSymbol ?sym ;
       ?p1 ?o1 ;
       .
}
SELECT DISTINCT ?group1 ?item1 ?group2 ?item2 {
     GRAPH ?G {
           ?target drugbank:geneName "{{{1}}}" ;
            drugbank:geneName ?geneName ;
            .
            ?drug drugbank:target ?target ;
            drugbank:genericName ?item2 ;
            drugbank:affectedOrganism ?group2 ;
            .
      }
     GRAPH ?G1 {
           ?siderDrug sider:drugName ?item2 ;
           rdfs:label ?group1 ;
           sider:sideEffect ?effect;
           .
           ?effect rdfs:label ?item1 .
     }
}
Final Application Stack

   Frameworks       Extensions          View Libraries

•  MediaWiki    •  Data Import        •  Jquery

•  LDIF         •  Semantic Results   •  JqueryUI
                   Format
•  Semantic                           •  Smarty
   MediaWiki    •  SMWHalo
                                      •  Highcharts
•  SMW+         •  Enhanced
                   Retreival

                •  WikiTags
Bugs, Bottlenecks, and
        Unexpected Features
•  In original data rdf:label was not normalized
  •  Breast_cancer_302400

•  Ignored labels that would break MediaWiki URL
   schemes
  •  4’-acetate(30px-[4l]’”)-di_tetraoxil’’(40-Cl2)

•  Certain drug compounds are related to almost everything
  •  Calcium
  •  Vitamin A

•  Page create bot had to re-edit pages upon completion to
   register inline properties with SMW & SMW+
   extensions.
Neurowiki in Action!

•  Which drugs are used in Chemotherapy?

•  What are the dangers of Propofol?

•  How are base entities like Calcium represented?

•  How are new inline properties added to entities?
  •  Can these be searched?
  •  Can these be queried using ASK?

•  Do existing extensions work with the framework?
Demo Links
•    http://neurowiki.alleninstitute.org/index.php/Main_Page

•    http://neurowiki.alleninstitute.org/index.php/AR

•    http://neurowiki.alleninstitute.org/index.php/ABL1

•    http://neurowiki.alleninstitute.org/index.php/AASS

•    http://neurowiki.alleninstitute.org/index.php/AAC2

•    http://neurowiki.alleninstitute.org/index.php/Thioridazine

•    http://neurowiki.alleninstitute.org/index.php/Risperidone

•    http://neurowiki.alleninstitute.org/index.php/Propofol

•    http://neurowiki.alleninstitute.org/index.php/Calcium

Weitere ähnliche Inhalte

Andere mochten auch (8)

Evaluation
EvaluationEvaluation
Evaluation
 
cas
cascas
cas
 
2014 Holiday Idea Gallery
2014 Holiday Idea Gallery2014 Holiday Idea Gallery
2014 Holiday Idea Gallery
 
finalresumeonline
finalresumeonlinefinalresumeonline
finalresumeonline
 
india 2016
india 2016india 2016
india 2016
 
¿De que estamos formados los seres vivos?
¿De que estamos formados los seres vivos?¿De que estamos formados los seres vivos?
¿De que estamos formados los seres vivos?
 
Civilización china e india por Nataly Gallegos
Civilización china e india por Nataly GallegosCivilización china e india por Nataly Gallegos
Civilización china e india por Nataly Gallegos
 
Biotecnologia en la agricultura
Biotecnologia en la agriculturaBiotecnologia en la agricultura
Biotecnologia en la agricultura
 

Ähnlich wie Allen Institute Neurowiki Presentation

Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
Fabrizio Orlandi
 
Brohee_wiki_BOSC2009
Brohee_wiki_BOSC2009Brohee_wiki_BOSC2009
Brohee_wiki_BOSC2009
bosc
 

Ähnlich wie Allen Institute Neurowiki Presentation (20)

SMWCon 2012 Linked Data Visualizations
SMWCon 2012 Linked Data VisualizationsSMWCon 2012 Linked Data Visualizations
SMWCon 2012 Linked Data Visualizations
 
Applied semantic technology and linked data
Applied semantic technology and linked dataApplied semantic technology and linked data
Applied semantic technology and linked data
 
Semantic Web use cases in outcomes research
Semantic Web use cases in outcomes researchSemantic Web use cases in outcomes research
Semantic Web use cases in outcomes research
 
Semantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - posterSemantic Search on Heterogeneous Wiki Systems - poster
Semantic Search on Heterogeneous Wiki Systems - poster
 
Enabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontologyEnabling cross-wikis integration by extending the SIOC ontology
Enabling cross-wikis integration by extending the SIOC ontology
 
Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...Dynamic and repeatable transformation of existing Thesauri and Authority list...
Dynamic and repeatable transformation of existing Thesauri and Authority list...
 
Semtech2006
Semtech2006Semtech2006
Semtech2006
 
Role of Semantic Web in Health Informatics
Role of Semantic Web in Health InformaticsRole of Semantic Web in Health Informatics
Role of Semantic Web in Health Informatics
 
Extracting Structured Records From Wikipedia
Extracting Structured Records From WikipediaExtracting Structured Records From Wikipedia
Extracting Structured Records From Wikipedia
 
The WorldCat Search API
The WorldCat Search APIThe WorldCat Search API
The WorldCat Search API
 
Experience with MarkLogic at Elsevier
Experience with MarkLogic at ElsevierExperience with MarkLogic at Elsevier
Experience with MarkLogic at Elsevier
 
Swoogle
SwoogleSwoogle
Swoogle
 
Semantic web assignment 3
Semantic web assignment 3Semantic web assignment 3
Semantic web assignment 3
 
From ontology to wiki
From ontology to wikiFrom ontology to wiki
From ontology to wiki
 
BlogForever poster
BlogForever posterBlogForever poster
BlogForever poster
 
Bratsas Web Science Semantic Wiki
Bratsas Web Science Semantic WikiBratsas Web Science Semantic Wiki
Bratsas Web Science Semantic Wiki
 
Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...Building genomic data cyberinfrastructure with the online database software T...
Building genomic data cyberinfrastructure with the online database software T...
 
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
Research Object Composer: A Tool for Publishing Complex Data Objects in the C...
 
SMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev UpdateSMWCon Spring 2012 SMW+ Team Dev Update
SMWCon Spring 2012 SMW+ Team Dev Update
 
Brohee_wiki_BOSC2009
Brohee_wiki_BOSC2009Brohee_wiki_BOSC2009
Brohee_wiki_BOSC2009
 

Allen Institute Neurowiki Presentation

  • 1. Neurowiki Vulcan Inc. in conjunction with Allen Institute for Brain Science
  • 2. Today we are Discussing… •  What is the use case and who requested it? •  How does this apply to a Semantic Media Wiki? •  How do you import and normalize thousands of triples worth of gene RDF triples? •  Creating instance pages without knowing exactly what will be displayed on them. •  Embedding SPARQL within SMW Templates. •  Expanding existing instance pages with Semantic Results Formatters, a PHP view layer, and creating dynamic charts.
  • 3. What is the Allen Institute? •  Launched in 2003 with seed funding from founder and philanthropist Paul G. Allen. •  Serving the scientific community is at the center of our mission to accelerate progress toward understanding the brain and neurological systems. •  The Allen Institute's multidisciplinary staff includes neuroscientists, molecular biologists, informaticists, and engineers. “The Allen Institute for Brain Science is an independent 501(c)(3) nonprofit medical research organization dedicated to accelerating the understanding of how the human brain works.”
  • 4. Human Brain Map •  Open, public online access •  A detailed, interactive three- dimensional anatomic atlas of the "normal" human brain •  Data from multiple human brains •  Genomic analysis of every brain structure, providing a quantitative inventory of which genes are turned on where •  High-resolution atlases of key brain structures, pinpointing where selected genes are expressed down to the cellular level •  Navigation and analysis tools for accessing and mining the data
  • 5. What is Neurowiki? •  A joint project between Vulcan Inc. and the Allen Institute to build a Semantic Wiki mapping genetic instances. •  A finished prototype testing the import pipelines and display components for combining 5 major RDF datasets from 4 different sources. •  Current planning includes mapping complete datasets, curating a better ontology, creating multiple ontology management for a user class, and importing scientific papers.
  • 6. Biological Linked Data Map •  Open, public online access •  Data from multiple RDF data stores •  Complete import pipeline using LDIF framework •  Outlines of each imported instance embedding inline wiki properties and providing views of imported properties from original RDF datasets •  Charting tools that ‘pivot’ SPARQL queries providing several views of each query •  Navigation and composition tools for accessing and mining the data
  • 7. Where did we get the data? •  KEGG : Kyoto Encyclopedia of Genes and Genomes •  “KEGG GENES is a collection of gene catalogs for all complete genomes generated from publicly available resources, mostly NCBI RefSeq.” •  Diseasome •  “The Diseasome website is a disease/disorder relationships explorer and a sample of an innovative map-oriented scientific work. Built by a team of researchers and engineers, it uses the Human Disease Network dataset.” •  DrugBank •  “The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data with comprehensive drug target information.” •  SIDER •  “SIDER contains information on marketed medicines and their recorded adverse drug reactions. The information is extracted from public documents and package inserts.”
  • 8. Wiki Ontology Map •  Genes •  DrugBank : 4,553 •  Diseasome : 3,919 •  KEGG : 9,841 •  Diseases •  Diseasome : 4,213 •  KEGG : 459 •  Drugs •  DrugBank : 4,772 •  KEGG : 2,482 •  SIDER : 924 •  Effects •  SIDER : 1,737 •  Pathways •  KEGG : 28,442 We chose to intentionally simplify the ontology due to disagreements between researchers about 61,342 Instances Available entity relationships and subclasses. for Import
  • 9. Importing and Networked Local Mapping the Linked Storage Storage Data Download •  R2R •  32,900 instances were converted R2R Maps Entities to Wiki to the wiki ontology. Mapping Ontology Engine •  583,746 properties mapped •  Pathways were ignored for wiki ontology import, but are Import to Wiki available within the triple store KEGG Pathway graph. •  SILK SILK Normalizes Entities Mapping across data sources •  20,849 instances available in Engine wiki ontology after SILK normalization Normalize •  Instance merging effected drugs, Entities genes, and diseases across datasets. •  OntoBroker Triplestore OntoBroker Available with Triplestore SPARQL Queries
  • 10. Create List of Page Creating Instance Names RDF Data 1.0 Wiki Pages Download The triplestore now contained tens of thousands of recognized category instances. Creating the Sanitize Script 2.0 pages would require a bot. 1.  Fetch the RDF dumps from an Category Create CSV active D2R server Page Names 2.  Use regex to fetch the rdf:label property that was mapped by Text of Wiki markup for page Read Open instance R2R as an instance name 3.0 3.  Open category specific text file Create MediaWiki Page of wiki markup (page of MediaWiki template includes) Gateway rb Framework 4.  Contact Neurowiki and request a new page from the REST Neurowiki Instance interface list of names with the category 4.0 Page content
  • 11. Four Initial Templates for Each Instance by Category 1.  Custom infobox within outline template •  Visible inline properties 2.  Outline template providing instance information 3.  Widget template displaying dynamic charts or third party services •  Donut charts and disease Twitter feed 4.  Broad table SPARQL queries showing instance relationships 5.  Hidden inline properties for other extensions
  • 12. Adding a True View Format SPARQL Result Execute Template Population Layer to Semantic MediaWiki Template Strategy Pattern for Template Variables 2.0 Results Formatter 1.  Standard Semantic Results Embed Create Formatter Variables Formatter. 2.  Classes made extending the stock formatter and OntoBroker Triplestore Smarty Processor 3.0 providing a library of strategy classes for rendering different graph types. Send Result Embed Variables 3.  Assembled template variables are given to the Smarty PHP templating Semantic Results 1.0 Completed engine with a path to the Formatter Template Added to Wiki 4.0 template file. page 4.  Template is populated and Format Results inserted / injected into the complete MediaWiki page.
  • 13. Embedding SPARQL Semantic Results Formatters •  Every piece of content on every instance page is generated by Semantic Result Formatters interpreting SPARQL results. •  Most inline properties are embedded in templates returned by SPARQL formatters. •  All 3 dynamic graph types are interpreting results of SPARQL queries and injecting a JavaScript template into the head of the page. •  The outline template takes selected predicates and objects from a SPARQL query, defined in the query embedding, and generates an HTML template for the page.
  • 14. SELECT DISTINCT ?p ?o ?p1 ?o1 { GRAPH ?G { ?gene rdfs:label "{{{1}}}" ; diseasome:bio2rdfSymbol ?sym ; ?p ?o ; . } OPTIONAL { GRAPH ?G1 { ?gene2 drugbank:bio2rdfSymbol ?sym ; ?p1 ?o1 ; . }
  • 15. SELECT DISTINCT ?group1 ?item1 ?group2 ?item2 { GRAPH ?G { ?target drugbank:geneName "{{{1}}}" ; drugbank:geneName ?geneName ; . ?drug drugbank:target ?target ; drugbank:genericName ?item2 ; drugbank:affectedOrganism ?group2 ; . } GRAPH ?G1 { ?siderDrug sider:drugName ?item2 ; rdfs:label ?group1 ; sider:sideEffect ?effect; . ?effect rdfs:label ?item1 . } }
  • 16. Final Application Stack Frameworks Extensions View Libraries •  MediaWiki •  Data Import •  Jquery •  LDIF •  Semantic Results •  JqueryUI Format •  Semantic •  Smarty MediaWiki •  SMWHalo •  Highcharts •  SMW+ •  Enhanced Retreival •  WikiTags
  • 17. Bugs, Bottlenecks, and Unexpected Features •  In original data rdf:label was not normalized •  Breast_cancer_302400 •  Ignored labels that would break MediaWiki URL schemes •  4’-acetate(30px-[4l]’”)-di_tetraoxil’’(40-Cl2) •  Certain drug compounds are related to almost everything •  Calcium •  Vitamin A •  Page create bot had to re-edit pages upon completion to register inline properties with SMW & SMW+ extensions.
  • 18. Neurowiki in Action! •  Which drugs are used in Chemotherapy? •  What are the dangers of Propofol? •  How are base entities like Calcium represented? •  How are new inline properties added to entities? •  Can these be searched? •  Can these be queried using ASK? •  Do existing extensions work with the framework?
  • 19. Demo Links •  http://neurowiki.alleninstitute.org/index.php/Main_Page •  http://neurowiki.alleninstitute.org/index.php/AR •  http://neurowiki.alleninstitute.org/index.php/ABL1 •  http://neurowiki.alleninstitute.org/index.php/AASS •  http://neurowiki.alleninstitute.org/index.php/AAC2 •  http://neurowiki.alleninstitute.org/index.php/Thioridazine •  http://neurowiki.alleninstitute.org/index.php/Risperidone •  http://neurowiki.alleninstitute.org/index.php/Propofol •  http://neurowiki.alleninstitute.org/index.php/Calcium