3. Features
• Focus on biomedical database
• Semi-automated Ranking
• Refining search results with facets
• More informative search results with metadata
10. Log Analysis and Reflect Search Results
• The members of the top 8 databases are almost the same.
– Patents
– KEGG MEDICUS
– Medicine and pharmaceutical proceedings
– Drug emergency call
– Ingredients information of health food
– Merck Manual
– Medical Information Network Distribution Service
– The Encyclopedia of Psychoactive Drugs
11. Comparison of Databases
• Popular databases are medical or pharmaceutical “literal rich” databases.
• The top databases run away with the winnings!
• More than half of the databases have never been clicked!
12. Unpopular databases
• Sagace started its service in March 2012.
• Some databases have never been clicked since then.
• Eliminate these databases.
• Databases
– 272 DB -> 122 DB
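The pruning step above can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the shape of the click log (a flat list of clicked database names), and the two "Example DB" entries are hypothetical, not Sagace's actual log format or code.

```python
from collections import Counter

def prune_unclicked(databases, click_log):
    """Split database names into (kept, dropped) by whether they were ever clicked."""
    clicks = Counter(click_log)  # missing names count as 0 clicks
    kept = [db for db in databases if clicks[db] > 0]
    dropped = [db for db in databases if clicks[db] == 0]
    return kept, dropped

# Hypothetical data: three names from the slides plus two invented placeholders.
databases = ["Patents", "KEGG MEDICUS", "Merck Manual", "Example DB A", "Example DB B"]
click_log = ["Patents", "KEGG MEDICUS", "Patents", "Merck Manual"]

kept, dropped = prune_unclicked(databases, click_log)
```

Databases that never appear in the log end up in `dropped`, which corresponds to the "eliminate these databases" step.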
13. Results
• Accuracy for users has presumably improved.
• Reducing the number of databases also sped up searches.
14. Specific databases in life science
• Some life science databases lack “literal information”.
• A cross search engine is suited to showing literal information.
• The Semantic Web will help these databases.
21. A. (Focus on search service) Mark-up with Metadata by Database Developer
22. What is metadata?
• Data about Data
• Typical fields: Entry ID, See Also, Keywords, Species, Reference, Experimental method, Image
• Example
– Entry ID: 2YI1
– Species: HOMO SAPIENS
– Reference: PubMed ID 22343627
– See Also: 2YHY, 2YHW
– Experimental method: X-RAY DIFFRACTION
– Image: http://pdbj.org/pdb_images/2yi1.jpg
24. How to markup? (microdata)
• Add metadata with html tag
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
<span itemprop="entryID">2YI1</span>
</div>
– Declare Vocabulary: the itemtype attribute
– Property (Predicate): http://schema.org/BiologicalDatabaseEntry/entryID
– Content (Object): 2YI1
– Page being described: http://pdbj.org/mine/summary/2yi1
25. How to reflect?
• Crawler program can find metadata easily!
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
<span itemprop="entryID">2YI1</span>
</div>
• Add indexed data
@BiologicalDatabaseEntry_entryID=2YI1
• Reflect search results
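As a rough illustration of the crawler step, the sketch below reads the microdata attributes with Python's standard `html.parser` and emits index keys in the `@Type_property=value` form shown on the slide. The class name `MicrodataIndexer` is hypothetical, and the sketch assumes one flat item per page with plain-text property values; it is not Sagace's actual crawler.

```python
from html.parser import HTMLParser

class MicrodataIndexer(HTMLParser):
    """Collect index keys of the form @<Type>_<property>=<value> from microdata."""

    def __init__(self):
        super().__init__()
        self.item_type = None     # short name taken from the itemtype URL
        self.current_prop = None  # itemprop whose text we are waiting for
        self.keys = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and attrs.get("itemtype"):
            # Use the last path segment of the vocabulary URL as the type name.
            self.item_type = attrs["itemtype"].rstrip("/").rsplit("/", 1)[-1]
        if "itemprop" in attrs:
            self.current_prop = attrs["itemprop"]

    def handle_data(self, data):
        text = data.strip()
        if self.item_type and self.current_prop and text:
            self.keys.append(f"@{self.item_type}_{self.current_prop}={text}")
            self.current_prop = None

    def handle_endtag(self, tag):
        # Simplification: a property's value must be direct text of its element.
        self.current_prop = None

page = (
    '<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">\n'
    '<span itemprop="entryID">2YI1</span>\n'
    '</div>'
)
indexer = MicrodataIndexer()
indexer.feed(page)
```

After `feed`, `indexer.keys` holds the single key `@BiologicalDatabaseEntry_entryID=2YI1`. Because the type name is part of the key, the same property name under a different vocabulary type would be indexed separately.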
26. Machine Understandable Data
• Declaration of vocabulary is important.
• E.g. entryID: biological? book? products? recipe?
27. Machine Understandable Data
• Declaration of vocabulary is important.
<div itemscope="" itemtype="http://schema.org/BiologicalDatabaseEntry">
<span itemprop="entryID">2YI1</span>
</div>
• E.g. entryID=2YI1: BiologicalDatabaseEntry!!
28. What is schema.org?
• “Schema.org is a set of extensible schemas that enables webmasters to embed structured data on their web pages for use by search engines and other applications.”
– (http://schema.org/)
29. It’s not only in Sagace.
• “Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages.” (http://schema.org/)
30. • Google supports these content types:
– Reviews
– People
– Products
– Businesses and organizations
– Recipes
– Events
– Music
31. Current Situation
• Defined original properties for Biological Database and Biological Database Entry for schema.org
– entryID, isEntryOf, taxon, seeAlso, reference
– Schema.org proposal
– http://www.w3.org/wiki/WebSchemas/BioDatabases
• Sagace can reflect them in search results.
• Search collaboration organizations will also reflect them in search results.
– NBDC
– MEDALS (molprof)
• How to mark up, with search result examples in Sagace
– http://sagace.nibio.go.jp/press/metadata/markup/
33. To reflect biological data in major search engines, it must be added to schema.org.
• Biological Database and Biological Database Entry -> Proposal -> schema.org -> Reflect Search Results
34. • To get our proposal added to schema.org, “Need more people who think it is a good idea.” (by organizers @ schema.org)
• We need more databases!
35. 9 DBs have applied microdata!
• DoBISCUIT (Database Of BIoSynthesis clusters CUrated and InTegrated)
• JCRB Cell Bank
• Functional Glycomics with KO mice database
• Glyco-Disease Genes Database
• Carbohydrate Interaction Database (Carint)
• JCGGDB Report
• MEDALS
• Integbio Database Catalog
• Life Science Database Archive
38. Issues (Cons) for Microdata
• Microdata strongly recommends using the schema.org vocabulary.
• Microdata is a W3C working group document, not a Recommendation.
• If we integrate RDF data, we have to reconsider which vocabularies are suitable.
39. RDFa Lite
• RDFa Lite is a minimal subset of RDFa, the Resource Description Framework in attributes (http://www.w3.org/TR/rdfa-lite/)
– Influenced by Microdata
– W3C Recommendation, 07 June 2012
• Ability to specify more than one vocabulary (not only schema.org)
• Easy to mark up
40. How to markup? (RDFa Lite)
• Add metadata with html tag
<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">
<span property="entryID">2YI1</span>
</div>
– Declare Vocabulary: the vocab and typeof attributes
– Property (Predicate): http://schema.org/BiologicalDatabaseEntry/entryID
– Content (Object): 2YI1
– Page being described: http://pdbj.org/mine/summary/2yi1
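To show how little a consumer needs in order to read RDFa Lite, here is a minimal sketch using Python's standard `html.parser`. The class name `RDFaLiteReader` is hypothetical, and the sketch assumes a single flat item whose property values are plain text; a real RDFa processor handles nesting, prefixes, and IRI resolution, which are left out here.

```python
from html.parser import HTMLParser

class RDFaLiteReader(HTMLParser):
    """Read vocab, typeof and property attributes from a flat RDFa Lite snippet."""

    def __init__(self):
        super().__init__()
        self.vocab = None         # value of the vocab attribute
        self.type_name = None     # value of the typeof attribute
        self.current_prop = None  # property whose text we are waiting for
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("vocab"):
            self.vocab = attrs["vocab"]
        if attrs.get("typeof"):
            self.type_name = attrs["typeof"]
        if attrs.get("property"):
            self.current_prop = attrs["property"]

    def handle_data(self, data):
        text = data.strip()
        if self.current_prop and text:
            self.properties[self.current_prop] = text
            self.current_prop = None

    def handle_endtag(self, tag):
        # Simplification: a property's value must be direct text of its element.
        self.current_prop = None

snippet = (
    '<div vocab="http://schema.org/" typeof="BiologicalDatabaseEntry">\n'
    '<span property="entryID">2YI1</span>\n'
    '</div>'
)
reader = RDFaLiteReader()
reader.feed(snippet)
```

After `feed`, the reader holds the declared vocabulary, the type name, and a dictionary mapping `entryID` to `2YI1`.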
41. If you use PDBo as extension vocabulary
<div prefix="PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#">
<span property="PDBo:exptl.method">X-RAY DIFFRACTION</span>
</div>
– Declare Vocabulary: the prefix attribute
– Property (Predicate): PDBo:exptl.method
– Content (Object): X-RAY DIFFRACTION
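A prefixed property such as `PDBo:exptl.method` is shorthand for a full IRI: the prefix is replaced by the IRI declared for it. The sketch below illustrates that expansion; the function names are hypothetical, and it assumes the `prefix` attribute is written in the `name: IRI` form that RDFa expects, with the colon directly after the prefix name.

```python
def parse_prefixes(prefix_attr):
    """Parse an RDFa prefix attribute ("name: IRI name2: IRI2 ...") into a dict."""
    tokens = prefix_attr.split()
    # Tokens alternate between "name:" and the IRI that name maps to.
    return {name.rstrip(":"): iri for name, iri in zip(tokens[0::2], tokens[1::2])}

def expand_curie(curie, prefixes):
    """Expand "prefix:reference" to a full IRI; return it unchanged if the prefix is unknown."""
    name, sep, reference = curie.partition(":")
    if sep and name in prefixes:
        return prefixes[name] + reference
    return curie

prefixes = parse_prefixes("PDBo: http://rdf.wwpdb.org/schema/pdbx-v40.owl#")
iri = expand_curie("PDBo:exptl.method", prefixes)
```

Here `iri` becomes `http://rdf.wwpdb.org/schema/pdbx-v40.owl#exptl.method`, while a term with no declared prefix is returned as written.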
42. If metadata is added to databases...
• Search engines can pick up much important data.
• Database developers can promote their services more effectively.
• Users can easily find what they are looking for.
43. Current Situation
• KNApSAcK has applied RDFa Lite.
• We’d like to reflect more information by using RDFa Lite.
• If you add metadata to your databases, please contact NBDC or me (maori@nibio.go.jp)
• Please collaborate with us!
• Please tell me what kind of information is suitable to show and refine.
44. Acknowledgement
• National Institute of Biomedical Innovation
– Mizuguchi Kenji
– Morita Mizuki
– Igarashi Yoshinobu
– Sakate Ryuichi
– Nagao Chioko
– Chen Yi-an
– Akiko Fukagawa
– Tohru Masui
– Johan Nystrom-Persson
• National Bioscience Database Center (NBDC)
• National Institute of Agrobiological Sciences database (NIAS)
• Molecular Profiling Research Center for Drug Discovery (molprof)
• Japan Consortium for Glycobiology and Glycotechnology DataBase (JCGGDB)
• This project is supported by a collaboration "Database integration in NIBIO and cooperation with outside organizations" with the NBDC.