Associate Professor of Chemistry at University of North Florida
Sep 12, 2014
ACS 248th Paper 146 VIVO/ScientistsDB Integration into Eureka
Development of plugins for access to researchers identified in VIVO on the ScientistsDB website. Also developed a plugin to access Elasticsearch from within Eureka.
1. Faculty Profiling and Searching in the Eureka Research Workbench using VIVO and ScientistsDB
Matthew Morse, Israel Hurst, and Stuart J. Chalk
Department of Chemistry
University of North Florida
schalk@unf.edu
2014 Fall ACS Meeting
2. Outline
Motivation
What is Eureka?
What is VIVO?
VIVO API
What is ScientistsDB?
MediaWiki API
Search Approaches
Elasticsearch
Usage
Future Plans
Conclusion
3. Motivation
Eureka Research Workbench is an Electronic
Laboratory Notebook (ELN) …
…plus representation of resources
…and needs to be social
Find colleagues that you can collaborate with
There are many places to get this information
4. Electronic Notebooks
Scientists need to move to digital notebooks…
…and record not just the data but the flow and context
How science is done is important for searching, aggregation, and meta-analysis
We need more than an electronic version of a notebook
We need a science version of “Second Life” (SciLife?)
5. Eureka Research Workbench (ERW)
Started in 2006 after getting involved in the
Analytical Information Markup Language (AnIML) project
Store all research notes/data in a digital format
Capture the workflow of scientists
Writing in a lab notebook is equivalent to
“multi-type” blogging in the digital world
How to capture information? Many data types! (ExptML)
How to store files “online”? (Fedora-Commons)
How to access files in the browser? (CakePHP)
How to represent laboratory resources? (ExptML)
How to link data together? RDF (in Fedora-Commons)
6. Experiment Markup Language (ExptML)
A specification (written in XML) that describes different
types of information recorded during the scientific process
(http://exptml.sourceforge.net)
Sample
Solution
Space
Specimen
Substance
Task
Template
Timeline
User
Vendor
Annotation
Api
Calculation
Chemical
Citation
Customer
Data
Dataset
Definition
Element
Equipment
Event
Experiment
Group
Message
Project
Protocol
Quote
Report
Result
7. What is VIVO?
An interdisciplinary network: Enabling collaboration and discovery
among scientists across all disciplines.
Open source software out of Cornell University
Now part of DuraSpace (DSpace, Fedora-Commons, and VIVO)
Often integrated with other academic services
Semantic representation -> VIVO Ontology
(https://wiki.duraspace.org/display/VIVO/VIVO-ISF+Ontology)
http://vivoweb.org/
8. VIVO API
Interface to search for different types of ‘individuals’
Faculty members
Subjects
Departments
…
Available in multiple download formats
N-Triples, RDF, N3, Turtle, JSON-LD
https://wiki.duraspace.org/display/VIVO/The+ListRDF+API
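A minimal Python sketch of how a client might call the ListRDF API and read back the individuals it lists. The host `vivo.example.edu` and both helper names are hypothetical, and the sketch assumes the response is requested as N-Triples with one individual per subject; it builds the URL and parses text only, without making a network request.

```python
from urllib.parse import urlencode

# Class URI for faculty members in the VIVO core ontology.
FACULTY_CLASS = "http://vivoweb.org/ontology/core#FacultyMember"


def build_listrdf_url(base_url, vclass=FACULTY_CLASS):
    """Return a ListRDF URL; urlencode escapes the '#' in the class URI."""
    return f"{base_url.rstrip('/')}/listrdf?{urlencode({'vclass': vclass})}"


def subjects_from_ntriples(text):
    """Collect the distinct subject URIs from N-Triples text, in order."""
    seen = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        subject = line.split(None, 1)[0]
        if subject.startswith("<") and subject.endswith(">"):
            uri = subject[1:-1]
            if uri not in seen:
                seen.append(uri)
    return seen
```

Escaping the class URI matters here: an unescaped `#` would be treated as a URL fragment and never reach the server.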
9. What is ScientistsDB?
MediaWiki site containing nearly 50,000 scientists
Wikipedia entries
…plus manual additions
Tony Williams, RSC
Sean Atkins, CDD Vault
http://www.scientistsdb.com/
10. MediaWiki API
MediaWiki is the software that runs Wikipedia
Available for download (http://www.mediawiki.org)
Access to all data in a MediaWiki MySQL database
Components
Authentication
Search
CRUD
http://www.mediawiki.org/wiki/API:Main_page
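As an illustration of the Search component, the sketch below builds a full-text search request for the MediaWiki API (`action=query` with `list=search`) and pulls page titles out of a JSON response. The function names are hypothetical, and the response shape is assumed to follow the standard `query.search` layout; no request is actually sent.

```python
from urllib.parse import urlencode


def search_url(api_url, text, limit=10):
    """Build a MediaWiki full-text search URL (action=query, list=search)."""
    params = {
        "action": "query",
        "format": "json",
        "list": "search",
        "srsearch": text,
        "srlimit": limit,
    }
    return f"{api_url}?{urlencode(params)}"


def titles_from_search(response):
    """Return page titles from a decoded list=search JSON response."""
    return [hit["title"] for hit in response.get("query", {}).get("search", [])]
```

Passing the result of `search_url(...)` to any HTTP client and decoding the JSON body would give the dictionary that `titles_from_search` expects.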
11. Search Approaches
VIVO
listRDF API for faculty
(http://<instance>/listrdf?vclass=http://vivoweb.org/ontology/core#FacultyMember)
Faculty member information (as JSON)
(http://<instance>/individual/a52486491431389?format=json)
ScientistsDB
Retrieve infobox
(http://www.scientistsdb.com/api.php?action=query&format=json&list=categorymembers&cmtitle=Category:Scientist)
Extract records with ‘fields’ field
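The ScientistsDB steps above can be sketched in Python as follows: list the titles in a `categorymembers` response, then read the `fields` parameter out of a page's infobox wikitext. This is a deliberately naive parser under stated assumptions (the page uses an `{{Infobox ...}}` template with `| key = value` parameters); all function names and sample data are hypothetical, and it is not the plugin's actual code.

```python
import re


def category_titles(response):
    """Return page titles from a decoded list=categorymembers response."""
    return [m["title"] for m in response.get("query", {}).get("categorymembers", [])]


def _template_body(wikitext, start):
    """Return the text between the '{{' at start and its matching '}}'."""
    depth, i = 0, start
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth, i = depth + 1, i + 2
        elif pair == "}}":
            depth, i = depth - 1, i + 2
            if depth == 0:
                return wikitext[start + 2:i - 2]
        else:
            i += 1
    return None  # unbalanced braces


def _split_top_level(body):
    """Split on '|' characters that are not inside [[...]] or {{...}}."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth, i = depth + 1, i + 2
            current.append(pair)
        elif pair in ("}}", "]]"):
            depth, i = depth - 1, i + 2
            current.append(pair)
        elif body[i] == "|" and depth == 0:
            parts.append("".join(current))
            current, i = [], i + 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def infobox_params(wikitext):
    """Parse the first {{Infobox ...}} template into a dict of parameters."""
    match = re.search(r"\{\{\s*Infobox", wikitext, re.IGNORECASE)
    body = _template_body(wikitext, match.start()) if match else None
    if body is None:
        return {}
    params = {}
    for part in _split_top_level(body)[1:]:  # first part is the template name
        if "=" in part:
            key, value = part.split("=", 1)
            params[key.strip()] = value.strip()
    return params


def clean_fields(value):
    """Strip wiki links and split a 'fields' value into plain subject names."""
    value = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", value)
    return [item.strip() for item in re.split(r",|<br\s*/?>", value) if item.strip()]
```

Real infoboxes contain references, nested templates, and HTML that this sketch does not handle, which is one reason infobox data needs cleanup before it can be indexed.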
12. Elasticsearch
Data is stored on a cluster of computers running
Elasticsearch NoSQL software
All data is ingested as JSON
Uses Apache Lucene to index data
http://www.elasticsearch.org/overview/elasticsearch
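To make the JSON ingestion concrete, here is a small Python sketch that formats records for the Elasticsearch Bulk API (newline-delimited JSON, an action line followed by a source line) and builds a simple `match` query body. The index name `scientists`, the record layout, and the function names are assumptions for illustration; older Elasticsearch releases also expect a `_type` in the action line. The sketch only builds payloads and does not contact a cluster.

```python
import json


def bulk_index_body(index, records, id_key="id"):
    """Format records as a Bulk API payload: one action line per source line."""
    lines = []
    for record in records:
        action = {"index": {"_index": index, "_id": str(record[id_key])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(record))
    # The Bulk API requires the payload to end with a newline.
    return "\n".join(lines) + "\n"


def match_query(field, text):
    """Build the request body for a full-text match query on one field."""
    return {"query": {"match": {field: text}}}
```

The payload would typically be sent to the `_bulk` endpoint with an `application/x-ndjson` content type, and the query body to the index's `_search` endpoint.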
13. Implementation
Development of CakePHP plugins for
VIVO (multiple locations)
ScientistsDB
Elasticsearch
CakePHP can access each of these anywhere in its
Model-View-Controller (MVC) code
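The plugin idea can be illustrated with a short Python sketch: each data source is wrapped behind the same search interface so that any part of the application can query them together. This is not CakePHP code and not the actual plugin design; the `SourceRegistry` class and the stub sources in the usage note are hypothetical.

```python
class SourceRegistry:
    """Holds named search functions that share one calling convention."""

    def __init__(self):
        self._sources = {}

    def register(self, name, search_fn):
        """Register a callable that takes a query string and returns dicts."""
        self._sources[name] = search_fn

    def search(self, text):
        """Query every registered source and tag each hit with its origin."""
        results = []
        for name, search_fn in self._sources.items():
            for hit in search_fn(text):
                results.append({"source": name, **hit})
        return results
```

A caller would register one function per source, for example `registry.register("vivo", vivo_search)`, and then call `registry.search("chemistry")` to get a merged list in which each hit records where it came from.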
14. Future Plans
Ingest more installations of VIVO
Work with technical staff at VIVO to make multi-site
search available to all VIVO users
Improve code to clean up infobox data
Work with Tony and Sean to evaluate if there are
better ways to retrieve subject fields
15. Conclusion
ScientistsDB plugin works
VIVO plugin very close…
Eureka needs to be collaborative software and
therefore being able to find other researchers in your
field is an important part of the system
Development of many more plugins to access online
data sources within Eureka