In 2012, the University of Idaho Library began implementing VIVO, an open-source Semantic Web application, both as a discovery layer for its fledgling institutional repository and as a database to describe, visualize, and report university research activity. The presenters will detail some of the challenges they encountered developing this resource, while discussing the tools and techniques they used for obtaining, editing, and uploading institutional data into the RDF-based VIVO system.
VIVO at the University of Idaho
1. VIVO at the
University of
Idaho
SHINY HAPPY PEOPLE HOLDING
NODES: USING VIVO (A
SEMANTIC WEB APPLICATION)
TO REVEAL UNIVERSITY OF IDAHO
RESEARCH AND RESEARCHERS
2. What is VIVO?
An Open-Source …
Semantic Web application …
RDF (Resource Description Framework) Triples, which are controlled
subject-predicate-object expressions that produce consistent
relationships
and Data Harvesting procedures
Data structured so that it can be shared and reused
using Linked Data practices and standards…
Freely available with a community of librarians and web developers
Collecting, ingesting and publishing (public/private) data in batches
to create a searchable, browseable, and reusable network of
information on research and researchers.
3. Early History of VIVO
1997-2005: VIVO Network idea developed at Cornell
for life and social sciences.
Intended to provide a view of sciences and research
“across disciplinary and administrative boundaries.”
2005: Released for Life Sciences
2007: Expanded to all of Cornell University (thru
Library)
2009: $12.2 million NIH grant provided to develop a
national version with several other partners
2010 – Present: More and more institutions adopting
and developing VIVO instances
from “VIVO: Enabling National Networking of Scientists”
4. VIVO at the University of Idaho
Spring 2012 – Fall 2012
Approached by Idaho INBRE (a Biomedical Researcher
network in Idaho) with question about possibly installing
VIVO instance
Installed VIVO, began setting up and learning the
system, while gathering feedback from INBRE and other
stakeholders
Garnered approval from INBRE faculty to publish their
information in the system
Harvested INBRE related information from public
resources: PubMed and NIH and NSF grants database
5. VIVO at the University of Idaho
Spring 2013
Began to pursue expanded VIVO
Received approval from institutional IT evaluation group
to go forward
Re-branded instance
Presented VIVO to library faculty and administration as
possible project going forward
Presented instance and proposal for new position to VP
of Research
6. VIVO at the University of Idaho
Summer 2013
VP approved expanded use of VIVO for Research
Groups on campus and funding for position
Annie Gaines begins as Scholarly Communication
Librarian
Ingest, Ingest, Ingest,
Added three additional research groups, as well as the Law
School, and associated faculty
Added thousands of grants, publications, and people into
the system.
7. VIVO at the University of Idaho
Fall 2013
Presented VIVO publicly on campus for first time
VIVO goes live (accessible from off campus)
Additional organizational descriptions added
(Department, College, Grant Structures, etc.)
Gained approval and access to use campus database
system, Banner
8. VIVO at the University of Idaho
VIVO Today
Beginning to explore VIVO as front-end for historical
documents
Adding all University Faculty
Creating applications and access points for data
Cleaning, always cleaning …
Using this presentation as a prompt for further
development of application, as well as further defining:
the system’s presentation
our data’s preservation
and our mission and goals in using the system
9. Hosting
Provided by the Northwest Knowledge Network
www.northwestknowledge.net
NKN focuses on providing technical support to
researchers
Division of UI’s Office of Research
Strong relationship with the UI Library (they are in the
building)
Data is replicated to a data center at Idaho National
Laboratory
Presents future opportunities for integrating VIVO’s
information with other research-related tools/systems
11. Building VIVO – Two Approaches
Approach #1 – the high-resource approach (ideal)
Requires
Available programmers and developers
Discrete IT department
Formal IT project management
Advantages
Advanced customization and configuration
High-level of integration into existing systems/services
Reasonably short time from inception to production
Disadvantages
Red-tape
Represents a large commitment by the unit
12. Building VIVO – Two Approaches
Approach #2 – the low-resource approach (practical)
Requires
Experimental mindset
Minimum recommended staff identified in the VIVO implementation guide
View VIVO as a series of small projects, rather than one large integration into
university activities
Advantages
Simple
Manageable
Disadvantages
Time (takes much longer)
Integration with existing services
Creation of custom data ingest tools
13. Implementation Goals
Start with low-hanging fruit: the data that is easiest to collect
When considering custom tools and processes, our priorities:
1 – re-use from community or locally
2 – buy if possible
3 – build as needed
Build institutional interest in the existing data before soliciting more
resources to further our development
Investigate third-party solutions (Symplectic Elements) as
alternatives to custom-building internal methods of collecting data
14. Data Ingestion - General
Typical workflow:
1. Receive data in source format
2. Convert to RDF (usually RDF/XML or Turtle)
3. Associate with VIVO ontology (as needed)
4. Reconcile against existing database
5. Load into the application
6. Re-index if needed
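
Steps 1, 2, and 4 of the workflow above can be sketched in Python as follows. This is a minimal illustration, not the presenters' actual tooling: the base URI, the CSV column names (id, name), and the use of a simple ID lookup as a stand-in for reconciliation are all assumptions.

```python
import csv
import io

# Hypothetical namespace for local individuals; a real instance uses its own.
BASE = "http://vivo.example.edu/individual/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"


def escape_literal(text):
    """Escape the characters N-Triples requires inside a quoted literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def rows_to_ntriples(csv_text, existing_ids=frozenset()):
    """Convert CSV rows (assumed columns: id, name) to N-Triples lines.

    Rows whose id is already in existing_ids are skipped, a crude
    stand-in for reconciling against the existing database.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["id"] in existing_ids:
            continue
        subject = f"<{BASE}{row['id']}>"
        label = escape_literal(row["name"])
        lines.append(f"{subject} <{RDF_TYPE}> <{FOAF_PERSON}> .")
        lines.append(f'{subject} <{RDFS_LABEL}> "{label}" .')
    return lines


if __name__ == "__main__":
    source = "id,name\nn1,Ada Smith\nn2,Bo Lee\n"
    for line in rows_to_ntriples(source, existing_ids={"n1"}):
        print(line)
```

N-Triples is used here only because it is simple to emit line by line; the slides mention RDF/XML or Turtle as the usual targets.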
15. Data Ingestion - Sources
Public Sources
NSF, NIH, USDA Awards
PubMed
Commercial Sources
Web of Science
Must remove “intellectual effort”
CVs, Publication Lists
Must have some means of soliciting them
Local Databases (central university, research groups)
Several institutional sources
Must work through the gatekeepers of each
Need data security review to ensure that institutional concerns are met before
public exposure
16. Data Ingestion - Tools
VIVO Harvester
Extract, Transform, and Load (ETL) tool that takes data from
a source and loads it into VIVO automatically
OpenRefine
Very flexible for different datatypes
Extension enables export in RDF format
Data cleaning tool
Reconciliation service allows us to match and deduplicate entries before export
Custom Conversion Tools (in Python)
Used for CRIS reports output, as well as other consistent,
but unusual formats
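
The matching and deduplication idea can be illustrated with a small key-collision sketch in Python. It is loosely modeled on fingerprint-style clustering and is not OpenRefine's own implementation; the function names and sample names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(name):
    """Reduce a name to a normalized key: ASCII, lowercase, no punctuation,
    with unique tokens sorted alphabetically."""
    ascii_name = (unicodedata.normalize("NFKD", name)
                  .encode("ascii", "ignore").decode("ascii"))
    cleaned = ascii_name.lower().translate(
        str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(names):
    """Group names that share a fingerprint; return only groups of 2 or more."""
    groups = defaultdict(list)
    for name in names:
        groups[fingerprint(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    # Invented sample names, not data from the presentation.
    print(cluster(["Smith, Ada", "Ada Smith", "Bo Lee"]))
```

A key like this catches differences in word order, case, and punctuation, but not initials or misspellings, so candidate matches would still need human review before export.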
17. Ontology Extensions
Custom University of Idaho model prefixed with
“uidaho:”
Goals with our extensions
Establish the local need before creating
Re-use as much as possible
Always associate classes within the VIVO hierarchy so
that data is not fully reliant on uidaho for context
Examples
Members of Idaho EPSCoR, Idaho INBRE, REACCH-PNA
Non-UI/Courtesy Faculty
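
The goal of always placing local classes within the VIVO hierarchy can be checked mechanically. The Python sketch below is an illustration under stated assumptions: the class names are hypothetical (the slides do not list the actual uidaho: terms), prefixed names are treated as plain strings, and only direct superclasses are examined.

```python
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical class declarations, for illustration only.
TRIPLES = [
    ("uidaho:CourtesyFaculty", "rdf:type", "owl:Class"),
    ("uidaho:CourtesyFaculty", SUBCLASS_OF, "vivo:FacultyMember"),
    ("uidaho:InbreMember", "rdf:type", "owl:Class"),
    ("uidaho:InbreMember", SUBCLASS_OF, "uidaho:ResearchGroupMember"),
    ("uidaho:ResearchGroupMember", "rdf:type", "owl:Class"),
]


def unanchored_classes(triples, local_prefix="uidaho:"):
    """Return local classes with no direct superclass outside the local prefix."""
    local = {s for s, p, o in triples
             if p == "rdf:type" and o == "owl:Class"
             and s.startswith(local_prefix)}
    anchored = {s for s, p, o in triples
                if p == SUBCLASS_OF and s.startswith(local_prefix)
                and not o.startswith(local_prefix)}
    return sorted(local - anchored)


if __name__ == "__main__":
    print(unanchored_classes(TRIPLES))
```

Classes reported by this check would lose their context if the uidaho: model were unavailable, which is the situation the extension goals aim to avoid.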
18. Data Re-use - Fuseki
Apache Jena - Fuseki project
jena.apache.org/documentation/serving_data/
Enables external access to VIVO data
Without Fuseki, data re-use is limited to those authenticated
with the system
Created examples of data re-use to assist in marketing efforts
Goal: to establish the added value of putting data in VIVO
Example: Labs who need to report the results of their research
by creating publication lists, or displaying spatial, temporal, or
conceptual aspects of UI research to stakeholders or students
could use this feature
19. Data Re-use - Fuseki
Example 1:
A very simple way to
look at awards data.
This presents the number
of awards by agency. It
is using a javascript
library called sgvizler to
turn JSON data from
Fuseki into a Google
Charts visualization.
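
The same awards-by-agency count can be sketched outside the browser in Python. This is an assumption-laden illustration rather than the presenters' code: the endpoint URL and the ex:fundedBy predicate are hypothetical, the sample results are hand-written, and the counting is done client-side instead of in the query.

```python
import json
from collections import Counter
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint URL; substitute the dataset's actual query service.
ENDPOINT = "http://fuseki.example.edu/vivo/query"

# Hypothetical predicate; the real one depends on the ontology in use.
QUERY = """
PREFIX ex: <http://example.org/terms#>
SELECT ?agency WHERE { ?award ex:fundedBy ?agency }
"""


def build_request(endpoint, query):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def count_by_variable(results_text, variable):
    """Count how often each value of one variable occurs in SPARQL JSON results."""
    bindings = json.loads(results_text)["results"]["bindings"]
    return Counter(b[variable]["value"] for b in bindings if variable in b)


# Hand-written sample in the SPARQL 1.1 JSON results layout, not real data.
SAMPLE = json.dumps({
    "head": {"vars": ["agency"]},
    "results": {"bindings": [
        {"agency": {"type": "literal", "value": "NSF"}},
        {"agency": {"type": "literal", "value": "NIH"}},
        {"agency": {"type": "literal", "value": "NSF"}},
    ]},
})

if __name__ == "__main__":
    print(build_request(ENDPOINT, QUERY).full_url)
    print(count_by_variable(SAMPLE, "agency").most_common())
```

To query a live server, the request object could be passed to urllib.request.urlopen and the response body handed to count_by_variable.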
20. Data Re-use - Fuseki
Example 2:
Another simple view
using sgvizler. This
shows a comparison of
two variables – awards
and publications – for
personnel in a specific
research group. It
would need work as a
formal graph, but it
points to the way that
the data can be reused.
21. Data Re-use - Fuseki
Example 3:
Another simple example
of data re-use using a
javascript/ajax technique
to display a list of journal
titles and faculty within a
specific research group.
Links to the faculty
members’ VIVO profiles
are associated with their
names.
23. Background
When Annie was brought on for Scholarly
Communications, one of her tasks was to develop
an IR for the UI.
Some potential platforms to use for UI IR:
CONTENTdm – too flat
Bepress – too expensive
VIVO?
24. ‘Institutional repositories’
“A set of services that a university offers to the
members of its community for the management
and dissemination of digital materials created by
the institution and its community members.”
Clifford Lynch, ARL Bimonthly Report 226, Feb. 2003.
“Digital collections that capture and preserve the
intellectual output of university communities.”
Raym Crow, The Case for Institutional Repositories, SPARC,
2002
25. ‘Institutional repositories’
Are:
Collection of scholarly work
Both cumulative and perpetual
Institutionally defined and managed
Open
Provide:
Long term preservation
Wide dissemination
Showcase for scholars and the institution
27. VIVO as IR?
Not your typical IR interface
Interconnectedness in a large network
Includes diverse materials, not just article pre-prints
Includes citations for all works, not just the ones hosted
in the IR
Dynamic browsing and searching
Linked data format allows for reuse of data for a variety
of purposes
The following page shows a thesis document in
VIVO
28. [Screenshot: a thesis document in VIVO]
29. Theory vs. Practice
Although VIVO can act as a front end, the
documents must be hosted elsewhere
We deposit our docs in CONTENTdm and link to the
PDF in VIVO
This makes things easier, but also more complicated
See example of the same thesis document in
CONTENTdm on the next page
30. [Screenshot: the same thesis document in CONTENTdm]
31. Theory vs. Practice
We wanted to close this presentation by asking
some questions to the group. If you have any
advice for us on this project we would love to hear
from you!
Are more access points better or more confusing?
Should we include historical documents in the VIVO IR?
Which page should be the main collection?
Should we provide links to all collections? Or link from
one into the other?
What are best practices with unusually constructed IRs?