Presentation Materials
http://l.bitcasa.com/ayav_jSQ	

Cross Search Service for Life
Science and Semantic web	
National Institute of Biomedical Innovation
Maori Ito	

Sagace	
Search for Biomedical Data &
Resources in Japan
Features
•  Focus on biomedical databases
•  Semi-automated ranking
•  Refining search results with facets
•  More informative search results with metadata

http://integbio.jp/en/
Mechanisms of Search
Engine	
1.  Crawling
2.  Indexing
3.  Query Processing
4.  Scoring
Crawling	
Databases	

Crawling Program	

Indexing	
•  Split the data into conveniently sized pieces and store them
on our own server
Indexing Data	

Internal Server
Query Processing and
Scoring
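
The four stages above can be sketched in a few lines of Python. This is only a minimal illustration, not Sagace's implementation: the page URLs and texts are hypothetical stand-ins for crawled data, and the TF-IDF sum is one common scoring choice picked here as an assumption.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical "crawled" pages (URL -> text); a real crawler would fetch these.
PAGES = {
    "http://example.org/db/a": "kinase inhibitor drug target",
    "http://example.org/db/b": "drug interaction and safety",
    "http://example.org/db/c": "protein structure of a kinase",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Indexing: map each term to {url: term frequency}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][url] = tf
    return index

def search(index, n_docs, query):
    """Query processing and scoring: rank URLs by a simple TF-IDF sum."""
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue  # term not indexed anywhere
        idf = math.log(n_docs / len(postings)) + 1.0
        for url, tf in postings.items():
            scores[url] += tf * idf
    # Highest score first; ties are broken by URL for a stable order.
    return sorted(scores, key=lambda u: (-scores[u], u))

INDEX = build_index(PAGES)
RESULTS = search(INDEX, len(PAGES), "drug kinase")
```

Here the page that matches both query terms ranks first; pages matching only one term follow.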
Search System	
•  NIBIO, NBDC / DBCLS, AgriTogo, MEDALS, and JCGGDB
collaborate by using a P2P architecture

Log Analysis and Reflecting It in Search Results

•  The top 8 databases are almost always
the same.
–  Patents
–  KEGG MEDICUS
–  Medicine and pharmaceutical proceedings
–  Drug emergency call
–  Ingredients information of health food
–  Merck Manual
–  Medical Information Network Distribution Service
–  The Encyclopedia of Psychoactive Drugs
Comparison of Databases
•  Popular databases are medical or
pharmaceutical "literal rich" (text-rich) databases.
•  The top databases take most of the
clicks!
•  More than half of the databases have never
been clicked!

Unpopular databases
•  Sagace started service in March
2012.
•  Some databases have never been clicked
since then.
•  We eliminated these databases.
•  Databases
–  272 DB -> 122 DB

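
The pruning step can be sketched as a filter over the click log. This is a minimal sketch under assumptions: the database names and the log format (one database name per click) are hypothetical, not Sagace's actual log schema.

```python
from collections import Counter

def prune_unclicked(databases, click_log):
    """Return the databases clicked at least once, preserving their order."""
    clicks = Counter(click_log)
    return [db for db in databases if clicks[db] > 0]

# Hypothetical data: four registered databases and a small click log.
DATABASES = ["db_a", "db_b", "db_c", "db_d"]
CLICK_LOG = ["db_a", "db_c", "db_a", "db_a"]

KEPT = prune_unclicked(DATABASES, CLICK_LOG)
print(KEPT)  # ['db_a', 'db_c']
```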
Results
•  Accuracy for users has presumably improved.
•  Reducing the number of databases also sped up
the search.

Specific databases in life
science
•  Some life science databases lack
"literal information" (text).
•  A cross-search engine is suited to showing
literal information.
•  The Semantic Web will help these databases.

Semantic Web?	

What is the Semantic Web?
The Semantic Web is a web of meaningful,
machine-understandable data.

Web of Documents

http://pdbj.org/mine/summary/2yi1

Search Engine Results
Query: "2yi1 pdbj" searched on Google

A search engine can reflect only text data.

Web of Documents to Web of Data
[Figure: the page at http://pdbj.org/mine/summary/2yi1 shown as many separate "Data" items]

How should the
computer recognize
these data?	
A. (Focusing on search services)
Mark-up with Metadata
by Database Developers

What is metadata?
•  Data about data
Labeled fields: Entry ID, See Also, Keywords, Species, Reference, Experimental method, Image

Entry ID: 2YI1
Species: HOMO SAPIENS
Reference: PubMed ID 22343627
See Also: 2YHY, 2YHW
Experimental method: X-RAY DIFFRACTION
Image: http://pdbj.org/pdb_images/2yi1.jpg

Reflecting Metadata in Search Results
•  Metadata helps users and databases
find each other

How to mark up?
(microdata)
•  Add metadata to HTML tags

Declare the vocabulary with itemtype:
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
Property (predicate): entryID
Content (object): 2YI1

Subject: http://pdbj.org/mine/summary/2yi1
Predicate: http://schema.org/BiologicalDatabaseEntry/entryID
Object: 2YI1

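
How a crawler might read this markup can be sketched with Python's standard `html.parser`. This is a simplified illustration, not a full microdata processor: it assumes a single flat item with no nesting, and it builds the predicate as `itemtype/itemprop` only to mirror the form used on this slide. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect (itemtype, itemprop, text) for one flat, non-nested item."""

    def __init__(self):
        super().__init__()
        self.itemtype = None      # vocabulary type declared next to itemscope
        self.current_prop = None  # itemprop of the tag being read, if any
        self.properties = []      # extracted (itemtype, itemprop, text) tuples

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.itemtype = attrs.get("itemtype")
        if "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        if self.current_prop and data.strip():
            self.properties.append(
                (self.itemtype, self.current_prop, data.strip()))

    def handle_endtag(self, tag):
        self.current_prop = None

def to_triples(page_url, html):
    """Turn each property into a (subject, predicate, object) triple."""
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return [(page_url, f"{itemtype}/{prop}", value)
            for itemtype, prop, value in parser.properties]

HTML = """
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
"""
TRIPLES = to_triples("http://pdbj.org/mine/summary/2yi1", HTML)
```

`TRIPLES` then holds one triple whose subject is the page URL, whose predicate ends in `entryID`, and whose object is `2YI1`.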
How to reflect?
•  The crawler program can find metadata easily!
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>

•  Add it to the indexed data
@BiologicalDatabaseEntry_entryID=2YI1

•  Reflect it in search results

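
The indexed field shown above can be derived from the item type and property name. The helper below is a hypothetical sketch: only the `@Type_property=value` format comes from the slide, and the rule of taking the last path segment of the `itemtype` URL is an assumption.

```python
def index_field(itemtype, itemprop, value):
    """Build an index entry such as '@BiologicalDatabaseEntry_entryID=2YI1'."""
    # Assumption: the type name is the last path segment of the itemtype URL.
    type_name = itemtype.rstrip("/").rsplit("/", 1)[-1]
    return f"@{type_name}_{itemprop}={value}"

FIELD = index_field(
    "http://schema.org/BiologicalDatabaseEntry", "entryID", "2YI1")
print(FIELD)  # @BiologicalDatabaseEntry_entryID=2YI1
```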
Machine Understandable Data
•  Declaration of vocabulary is important.
E.g. entryID: biological? book? products? recipe?

Machine Understandable Data
•  Declaration of vocabulary is important.
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
E.g. entryID=2YI1 is a BiologicalDatabaseEntry!!

What is schema.org?
•  "Schema.org is a set of extensible
schemas that enables webmasters to
embed structured data on their web
pages for use by search engines and
other applications."
–  (http://schema.org/)

It's not only Sagace.
•  "Search engines including Bing, Google,
Yahoo! and Yandex rely on this markup
to improve the display of search results,
making it easier for people to find the
right web pages." (http://schema.org/)

•  Google supports these content types:
–  Reviews
–  People
–  Products
–  Businesses and organizations
–  Recipes
–  Events
–  Music
Current Situation
•  We defined original properties for Biological Database and
Biological Database Entry and proposed them to schema.org
–  entryID, isEntryOf, taxon, seeAlso, reference
–  Schema.org proposal
–  http://www.w3.org/wiki/WebSchemas/BioDatabases

•  Sagace can reflect them in search results.
•  Our search collaboration partners will also reflect
them in search results.
–  NBDC
–  MEDALS (molprof)

•  How to mark up, with search result examples in Sagace:
•  http://sagace.nibio.go.jp/press/metadata/markup/

Sagace reflects these
properties
•  image
•  isEntryOf (Database name)
•  entryID
•  taxon (Species)
•  disease
•  seeAlso (Reference database entry)
•  dateModified (last modified)
•  reference (Reference article)
To have biological data reflected in major search
engines, the vocabulary must be added to schema.org.

Biological Database and Biological Database Entry -> schema.org proposal -> schema.org -> reflected in search results
•  To get our proposal added to
schema.org, we "Need more people who
think it is a good idea." (from the organizers at
schema.org)
•  We need more databases!

9 DBs have adopted
microdata!
•  DoBISCUIT (Database Of BIoSynthesis clusters
CUrated and InTegrated)
•  JCRB Cell Bank
•  Functional Glycomics with KO mice database
•  Glyco-Disease Genes Database
•  Carbohydrate Interaction Database (Carint)
•  JCGGDB Report
•  MEDALS
•  Integbio Database Catalog
•  Life Science Database Archive
Search Results Example 1	

Search Results Example 2 	

Issues (Cons) for Microdata
•  Microdata strongly recommends using the
schema.org vocabulary.
•  Microdata is a W3C Working Group document, not a
Recommendation.
•  If we integrate RDF data, we have to
reconsider which vocabularies are
suitable.
RDFa Lite
•  RDFa Lite is a minimal subset of RDFa,
the Resource Description Framework in
attributes (http://www.w3.org/TR/rdfa-lite/)
–  Influenced by Microdata
–  W3C Recommendation, 07 June 2012

•  Can specify more than one
vocabulary (not only schema.org)
•  Easy to mark up
How to mark up? (RDFa Lite)
•  Add metadata to HTML tags

Declare the vocabulary with vocab:
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
  <span property="entryID">2YI1</span>
</div>
Property (predicate): entryID
Content (object): 2YI1

Subject: http://pdbj.org/mine/summary/2yi1
Predicate: http://schema.org/BiologicalDatabaseEntry/entryID
Object: 2YI1

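
One detail worth illustrating: in RDFa, a bare term such as `BiologicalDatabaseEntry` is resolved by appending it directly to the `vocab` IRI, so the `vocab` value should end with `/` (or `#`). The sketch below shows only that concatenation rule; `resolve_term` is a hypothetical helper, not part of any RDFa library.

```python
def resolve_term(vocab, term):
    """Resolve a bare RDFa term by appending it to the vocab IRI as-is."""
    return vocab + term

WITH_SLASH = resolve_term("http://schema.org/", "BiologicalDatabaseEntry")
WITHOUT_SLASH = resolve_term("http://schema.org", "BiologicalDatabaseEntry")

print(WITH_SLASH)     # http://schema.org/BiologicalDatabaseEntry
print(WITHOUT_SLASH)  # http://schema.orgBiologicalDatabaseEntry
```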
If you use PDBo as an
extension vocabulary
Declare the vocabulary with prefix:
<div prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
  <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
</div>
Property (predicate): PDBo:exptl.method
Content (object): X-RAY DIFFRACTION

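
The prefixed property name expands to a full IRI by replacing the prefix with its declared IRI. The following is a minimal sketch of that expansion, assuming a well-formed `prefix` attribute of whitespace-separated `name: IRI` pairs; the function names are hypothetical, and a real RDFa processor handles more cases.

```python
def parse_prefixes(prefix_attr):
    """Parse an RDFa prefix attribute made of 'name: IRI' pairs."""
    tokens = prefix_attr.split()
    if len(tokens) % 2 != 0:
        raise ValueError("prefix attribute must contain 'name: IRI' pairs")
    prefixes = {}
    for name, iri in zip(tokens[0::2], tokens[1::2]):
        if not name.endswith(":"):
            raise ValueError(f"prefix name must end with ':': {name!r}")
        prefixes[name[:-1]] = iri
    return prefixes

def expand_curie(curie, prefixes):
    """Expand 'prefix:reference' into a full IRI using the declared prefixes."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference

PREFIXES = parse_prefixes("PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#")
PREDICATE = expand_curie("PDBo:exptl.method", PREFIXES)
print(PREDICATE)  # http://rdf.wwpdb.org/schema/pdbx-v40.owl#exptl.method
```

Note that the colon must directly follow the prefix name; a declaration written as `PDBo : http://...` is rejected by this sketch.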
If metadata is added to
databases...
•  Search engines can pick up many
important data.
•  Database developers can promote their
services more effectively.
•  Users can easily find what they are
looking for.

Current Situation
•  KNApSAcK has adopted RDFa Lite.
•  We'd like to reflect more information by
using RDFa Lite.
•  If you add metadata to your databases,
please contact NBDC or me
(maori@nibio.go.jp)
•  Please collaborate with us!
•  Please tell me what kind of information is
suitable to show and refine.
Acknowledgement
•  National Institute of Biomedical Innovation
–  Mizuguchi Kenji
–  Morita Mizuki
–  Igarashi Yoshinobu
–  Sakate Ryuichi
–  Nagao Chioko
–  Chen Yi-an
–  Akiko Fukagawa
–  Tohru Masui
–  Johan Nystrom-Persson
•  National Bioscience Database Center (NBDC)
•  National Institute of Agrobiological Sciences database (NIAS)
•  Molecular Profiling Research Center for Drug Discovery (molprof)
•  Japan Consortium for Glycobiology and Glycotechnology DataBase (JCGGDB)

•  This project is supported by a collaboration "Database integration in
NIBIO and cooperation with outside organizations" with the NBDC.

Web of Data
(Concept)

http://pdbj.org/mine/summary/xxxx
–  http://schema.org/BiologicalDatabaseEntry/entryID: xxxx
–  http://schema.org/BiologicalDatabaseEntry/isEntryOf: PDBj
–  http://schema.org/BiologicalDatabaseEntry/reference: PubMed:xxxxxxx

http://databaseA.org/publication
–  http://schema.org/BiologicalDatabaseEntry/reference: PubMed:xxxxxxx
–  http://schema.org/BiologicalDatabaseEntry/isEntryOf: Database A

Presentation forpd bj_1

  • 1. Presentation Materials http://l.bitcasa.com/ayav_jSQ Cross Search Service for Life Science and Semantic web National Institute of Biomedical Innovation Maori Ito 1
  • 2. Sagace Search for Biomedical Data & Resources in Japan
  • 3. Features •  •  •  •  Focus on biomedical database Semi-automated Ranking Refining search results with facets More informative search results with metadata
  • 5. Mechanisms of Search Engine 1.  Crawling 2.  Indexing 3.  Query Processing 4.  Scoring
  • 7. Indexing •  Split data convenient size and store own server Indexing Data Internal Server
  • 9. Search System NIBIO NBDC  /  DBCLS AgriTogo   MEDALS Collaborate by using P2P architecture JCGGDB   9
  • 10. Log Analysis and Reflect Search Results •  The members of top 8 databases are almost the same. –  Patents –  KEGG MEDICUS –  Medicine and pharmaceutical proceedings –  Drug emergency call –  Ingredients information of health food –  Merck Manual –  Medical Information Network Distribution Service –  The Encyclopedia of Psychoactive Drugs 10
  • 11. Comparison of Databases •  Popular databases are Medical or Pharmaceutical “literal rich” databases. •  Top databases run away with the winnings! •  More than half of databases have never clicked! 11
  • 12. Unpopular databases •  Sagace has started the service in March 2012. •  Some databases have never clicked since then. •  Eliminate these databases. •  Databases –  272 DB -> 122 DB 12
  • 13. Results •  Accuracy for users must have improved. •  Reducing databases also caused speed up. 13
  • 14. Specific databases in life science •  Some databases in life science is lacked “literal information” . •  Cross search engine is suitable to show literal information. •  Semantic web will help these databases. 14
  • 16. What is the Semantic Web? The Semantic Web is a web of meaningful and machine-understandable data.
  • 18. Search Engine Results • Query “2yi1 pdbj” searched on Google • A search engine can reflect only text data.
  • 19. Web of Documents to Web of Data (figure: data items on the page http://pdbj.org/mine/summary/2yi1)
  • 20. How should the computer recognize these data?
  • 21. A. (Focusing on the search service) Markup with metadata by the database developer
  • 22. What is metadata? • Data about data • Fields: Entry ID, See Also, Keywords, Species, Reference, Experimental method, Image • Example – Entry ID: 2YI1 – Species: HOMO SAPIENS – Reference: PubMed ID 22343627 – See Also: 2YHY, 2YHW – Experimental method: X-RAY DIFFRACTION – Image: http://pdbj.org/pdb_images/2yi1.jpg
  • 23. Reflecting Metadata in Search Results • Metadata helps users and databases find each other. (figure label: Image)
  • 24. How to mark up? (Microdata) • Add metadata with HTML tag attributes • Declare the vocabulary: <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry"> <span itemprop="entryID">2YI1</span> </div> • Resulting statement – Subject: http://pdbj.org/mine/summary/2yi1 – Property (Predicate): http://schema.org/BiologicalDatabaseEntry/entryID – Content (Object): 2YI1
  • 25. How to reflect? • A crawler program can find the metadata easily! <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry"> <span itemprop="entryID">2YI1</span> </div> • Add it to the indexed data: @BiologicalDatabaseEntry_entryID=2YI1 • Reflect it in search results
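The crawler step on slide 25 can be sketched with Python's standard `html.parser`. The `@Type_property=value` key format is copied from the slide; the class and function names are invented for this example. The parsing is deliberately simplified (one non-nested `itemscope`, text-only values) and is not the full Microdata algorithm or Sagace's actual crawler.

```python
from html.parser import HTMLParser

class MicrodataIndexer(HTMLParser):
    """Collect itemprop values as '@Type_prop=value' index keys.

    Simplification: handles only text-valued properties inside a single,
    non-nested itemscope, with no child tags inside the property element.
    """

    def __init__(self):
        super().__init__()
        self.item_type = None     # short type name taken from itemtype
        self.current_prop = None  # itemprop whose text is being read
        self.buffer = []
        self.keys = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemtype"):
            # Use the last path segment of the type URL as the short name.
            self.item_type = attrs["itemtype"].rstrip("/").rsplit("/", 1)[-1]
        if "itemprop" in attrs and self.item_type:
            self.current_prop = attrs["itemprop"]
            self.buffer = []

    def handle_data(self, data):
        if self.current_prop is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.current_prop is not None:
            value = "".join(self.buffer).strip()
            self.keys.append(f"@{self.item_type}_{self.current_prop}={value}")
            self.current_prop = None

def extract_index_keys(html):
    """Hypothetical helper: parse one page and return its index keys."""
    indexer = MicrodataIndexer()
    indexer.feed(html)
    return indexer.keys

PAGE = """
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
  <span itemprop="entryID">2YI1</span>
</div>
"""
print(extract_index_keys(PAGE))
```

Because the type name is part of each key, a search front end could facet on `BiologicalDatabaseEntry_entryID` without confusing it with an `entryID` from some other vocabulary.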
  • 26. Machine-Understandable Data • Declaring the vocabulary is important. • E.g. entryID on its own is ambiguous: biological? book? products? recipe?
  • 27. Machine-Understandable Data • Declaring the vocabulary is important. <div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry"> <span itemprop="entryID">2YI1</span> </div> • E.g. entryID=2YI1 belongs to a BiologicalDatabaseEntry!!
  • 28. What is schema.org? • "Schema.org is a set of extensible schemas that enables webmasters to embed structured data on their web pages for use by search engines and other applications." – (http://schema.org/)
  • 29. It’s not only in Sagace. • "Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages." (http://schema.org/)
  • 30. • Google supports these content types: – Reviews – People – Products – Businesses and organizations – Recipes – Events – Music
  • 31. Current Situation • We defined original properties for Biological Database and Biological Database Entry for schema.org – entryID, isEntryOf, taxon, seeAlso, reference – Schema.org proposal – http://www.w3.org/wiki/WebSchemas/BioDatabases • Sagace can reflect them in search results. • The search collaboration organizations will also reflect them in search results. – NBDC – MEDALS (molprof) • How to mark up, with examples of search results in Sagace: http://sagace.nibio.go.jp/press/metadata/markup/
  • 32. Sagace reflects these properties • image • isEntryOf (Database name) • entryID • taxon (Species) • disease • seeAlso (Reference database entry) • dateModified (last modified) • reference (Reference article)
  • 33. To have biological data reflected in major search engines, the vocabulary needs to be added to schema.org. (diagram: Biological Database and Biological Database Entry schema.org proposal -> schema.org -> reflected in search results)
  • 34. • To get our proposal added to schema.org, we “Need more people who think it is a good idea.” (by organizers @ schema.org) • We need more databases!
  • 35. 9 DBs have applied microdata! • DoBISCUIT (Database Of BIoSynthesis clusters CUrated and InTegrated) • JCRB Cell Bank • Functional Glycomics with KO mice database • Glyco-Disease Genes Database • Carbohydrate Interaction Database (Carint) • JCGGDB Report • MEDALS • Integbio Database Catalog • Life Science Database Archive
  • 38. Issues (Cons) for Microdata • Microdata strongly recommends using the schema.org vocabulary. • Microdata is a W3C working group document, not a Recommendation. • If we integrate RDF data, we have to reconsider which vocabularies are suitable.
  • 39. RDFa Lite • RDFa Lite is a minimal subset of RDFa, the Resource Description Framework in attributes (http://www.w3.org/TR/rdfa-lite/) – Influenced by Microdata – W3C Recommendation, 07 June 2012 • Ability to specify more than one vocabulary (not only schema.org) • Easy to mark up
  • 40. How to mark up? (RDFa Lite) • Add metadata with HTML tag attributes • Declare the vocabulary: <div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry"> <span property="entryID">2YI1</span> </div> • Resulting statement – Subject: http://pdbj.org/mine/summary/2yi1 – Property (Predicate): http://schema.org/entryID – Content (Object): 2YI1
  • 41. If you use PDBo as an extension vocabulary • Declare the vocabulary: <div prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#"> <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span> </div> • Property (Predicate): PDBo:exptl.method • Content (Object): X-RAY DIFFRACTION
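The way `vocab` and `prefix` turn short property names into full IRIs can be sketched as follows. This is not a conforming RDFa processor. It assumes the `vocab` IRI ends with a slash (terms are resolved by plain concatenation), treats declarations as document-wide, and uses the page URL as the subject of every statement, as the slides do. The class name and `rdf:type` label are choices made for this example.

```python
from html.parser import HTMLParser

class RDFaLiteSketch(HTMLParser):
    """Resolve RDFa Lite property names to full IRIs (simplified).

    Simplifications: vocab/prefix are treated as document-wide instead of
    element-scoped, values are plain element text, and the page URL is
    used as the subject of every statement.
    """

    def __init__(self, subject):
        super().__init__()
        self.subject = subject
        self.vocab = ""
        self.prefixes = {}
        self.current = None  # predicate IRI whose text is being read
        self.buffer = []
        self.triples = []

    def resolve(self, term):
        if "://" in term:  # already a full IRI
            return term
        prefix, sep, local = term.partition(":")
        if sep and prefix in self.prefixes:
            return self.prefixes[prefix] + local
        return self.vocab + term  # plain term: concatenate with vocab

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        if attrs.get("prefix"):
            # Attribute syntax: prefix="name: IRI name2: IRI2 ..."
            parts = attrs["prefix"].split()
            for name, iri in zip(parts[0::2], parts[1::2]):
                self.prefixes[name.rstrip(":")] = iri
        if attrs.get("typeof"):
            self.triples.append(
                (self.subject, "rdf:type", self.resolve(attrs["typeof"])))
        if attrs.get("property"):
            self.current = self.resolve(attrs["property"])
            self.buffer = []

    def handle_data(self, data):
        if self.current is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if self.current is not None:
            value = "".join(self.buffer).strip()
            self.triples.append((self.subject, self.current, value))
            self.current = None

PAGE = """
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry"
     prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
  <span property="entryID">2YI1</span>
  <span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
</div>
"""
parser = RDFaLiteSketch("http://pdbj.org/mine/summary/2yi1")
parser.feed(PAGE)
for triple in parser.triples:
    print(triple)
```

The unprefixed `entryID` resolves against `vocab`, while `PDBo:exptl.method` resolves against the declared prefix. This is what lets one page mix schema.org terms with a domain vocabulary.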
  • 42. If metadata is added to databases... • Search engines can pick up much important data. • Database developers can promote their services more effectively. • Users can easily find what they are looking for.
  • 43. Current Situation • KNApSAcK has applied RDFa Lite. • We’d like to reflect more information by using RDFa Lite. • If you add metadata to your databases, please contact NBDC or me (maori@nibio.go.jp). • Please collaborate with us! • Please tell me what kind of information is suitable to show and to refine by.
  • 44. Acknowledgement • National Institute of Biomedical Innovation – Mizuguchi Kenji – Morita Mizuki – Igarashi Yoshinobu – Sakate Ryuichi – Nagao Chioko – Chen Yi-an – Akiko Fukagawa – Tohru Masui – Johan Nystrom-Persson • National Bioscience Database Center (NBDC) • National Institute of Agrobiological Sciences database (NIAS) • Molecular Profiling Research Center for Drug Discovery (molprof) • Japan Consortium for Glycobiology and Glycotechnology DataBase (JCGGDB) • This project is supported by a collaboration "Database integration in NIBIO and cooperation with outside organizations" with the NBDC.