NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service



  1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service. May 22, 2013. Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University. http://www.niso.org/news/events/2013/dcmi/vivo
  2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service. John Fereira, Cornell University
  3. Overview • What is VIVO? • History of VIVO • High-level overview • Ingesting data into VIVO • Exposing data in VIVO
  4. What is VIVO? • VIVO is not an acronym • A semantic web application that enables the discovery of research and scholarship across disciplines in an institution. • VIVO enables collaboration and understanding across an institution and among institutions – and not just for scientists. • A powerful search/browse functionality for locating people and information within or across institutions.
  5. What is VIVO? • An ontology editor. VIVO includes a “vivo” ontology which can be modified and extended • An instance editor. Instances of classes such as Person, Organization, Event, etc. can be created, modified, and deleted • Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.
  6. What is VIVO? • VIVO is a content disseminator • Views of People, Organizations, etc. can be highly customized • VIVO provides visualizations such as topic maps and co-authorship networks • Open data means other applications can use it
  7. A brief history of VIVO • 2003 – VIVO created for local use at Cornell University for life sciences collaboration • 2007 – Reimplemented using RDF, OWL, Jena and SPARQL • 2007 – Implemented at Cornell and University of Florida as “production” systems
  8. A brief history of VIVO • 2009 – Seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists • 2010 – Version 1.0 released as open source • 2013 – Now at version 1.5.1 • 2013 – Transitioning from funded project to a sustainable community open source project
  9. A high-level overview • Core ideas • Searching/browsing • Self editing
  10. Core ideas • Research and researchers should be discoverable independently of administrative hierarchies • Relationships are as interesting as the facts • It’s the network, not just the nodes • Static data models are too confining • Granular data management allows multiple views and re-purposing • Discovery is improved by linking pages to surrounding context
  11. VIVO and Linked Open Data • VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud (Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl)
  12. Linked Data principles (Tim Berners-Lee): ▫ Use URIs as names for things ▫ Use HTTP URIs so that people can look up those names ▫ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) ▫ Include links to other URIs so that people can discover more things http://linkeddata.org
  13. VIVO in the LOD cloud
  14. Searching and Browsing • Triple store indexed into a Solr instance • Searches run against Solr • Instance data comes from the triple store • An example…
  15. Food security
  16. Self Editing • Users can edit their own profile • System can delegate editing to “proxy” editors • Some data can be locked • An example
  17. Editable and non-editable fields
  18. Most text fields support “rich text”
  19. External Concepts for “terms”
  20. Data Ingest (harvesting)
  21. Data, data, data • VIVO harvests much of its data automatically from verified sources • This reduces the need for manual input of data • It provides an integrated and flexible source of publicly visible data at an institutional level • Individuals may also edit and customize their profiles to suit their professional needs • Sources: external data sources, internal data sources
  22. Ingesting data with the VIVO Harvester • A pipeline of tools • Tools are written in Java, using Jena APIs • Can fetch data from a variety of data formats • Data can be sanitized and disambiguated • Data is ingested directly into the triple store; this does not require the VIVO web app to be running
  23. Harvesting Pipeline • Fetcher/Parser • Translate: maps RDF to “vivo” RDF • Transfer to local triple store (Jena TDB) • Disambiguate using Scoring/Matching • ChangeNamespace (mint unique URIs) • Diff with previous model to create subtractions • Transfer to VIVO triple store
  24. Fetching and Parsing • Fetches data from a URL, database, or local file • Many different types of fetchers ▫ CSV fetcher ▫ JDBC fetcher ▫ SimpleXMLFetcher ▫ JSONFetcher • Output is an intermediate RDF format, one file per record • “Fake” namespace used
  25. <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
               xml:base="http://vivo.example.com/harvest/aims_users/person">
        <rdf:Description rdf:ID="node_-_0">
          <rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
          <node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
          <node-person:Website>http://www.valeriapesce.name</node-person:Website>
          <node-person:Nid>108074</node-person:Nid>
          <node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
          <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
          <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
          <node-person:LastName>Pesce</node-person:LastName>
          <node-person:Country>Italy</node-person:Country>
          <node-person:Email>valeria.pesce@fao.org</node-person:Email>
          <node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
          <node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
          <node-person:Username>valeria.pesce</node-person:Username>
          <node-person:FirstName>Valeria</node-person:FirstName>
          <node-person:Role>Information Management Specialist</node-person:Role>
          <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
        </rdf:Description>
      </rdf:RDF>
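As a rough illustration of the fetch step, the sketch below turns CSV rows into one intermediate record each, with every column becoming a property in a "fake" harvest namespace. It is written in Python for brevity and is not the Harvester's Java code; the function name `fetch_csv`, the triple layout, and the sample data are assumptions made for this example.

```python
import csv
import io

# "Fake" harvest namespace, modeled on the example record above.
HARVEST_NS = "http://vivo.example.com/harvest/aims_users/"


def fetch_csv(text):
    """Return one intermediate record per CSV row.

    Each record is a list of (subject, predicate, value) triples whose
    subject and predicates live in the harvest namespace.
    """
    records = []
    for index, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = f"{HARVEST_NS}person#node_-_{index}"
        triples = [
            (subject, f"{HARVEST_NS}fields/person/{column}", value)
            for column, value in row.items()
            if value  # skip empty cells
        ]
        records.append(triples)
    return records


# Hypothetical input: a header row plus one data row.
sample = "FirstName,LastName,Email\nValeria,Pesce,valeria.pesce@fao.org\n"
records = fetch_csv(sample)
```

Nothing here is mapped to VIVO classes yet; that is left to the translate step.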
  26. Translate • Map “fake” namespace to VIVO classes and properties • Uses an XSLT transform • Unique ID for each record • node-person:Organization becomes foaf:Organization • Relationships created
  27. Translated RDF
      <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <rdfs:label>Pesce, Valeria</rdfs:label>
        <core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
        <foaf:firstName>Valeria</foaf:firstName>
        <foaf:lastName>Pesce</foaf:lastName>
        <core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
        <core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
        <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
        <core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
        <core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
      </rdf:Description>
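The slides say the translate step uses an XSLT transform. The sketch below swaps in plain Python (`xml.etree.ElementTree`) to show the same kind of mapping: fields in the "fake" namespace become FOAF and VIVO core properties, and the record gets a subject URI built from its unique ID. The `core` namespace URI, the trimmed input record, and the `translate` function are assumptions for illustration, not the author's stylesheet.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/0.1/"
CORE = "http://vivoweb.org/ontology/core#"  # assumed binding of the core: prefix
FIELDS = "http://vivo.example.com/harvest/aims_users/fields/person/"
PERSON_BASE = "http://vivo.example.com/harvest/aims_users/person/uid-"

# A trimmed-down version of the intermediate record shown earlier.
INTERMEDIATE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:node-person="{FIELDS}">
  <rdf:Description rdf:ID="node_-_0">
    <node-person:Nid>108074</node-person:Nid>
    <node-person:FirstName>Valeria</node-person:FirstName>
    <node-person:LastName>Pesce</node-person:LastName>
    <node-person:Email>valeria.pesce@fao.org</node-person:Email>
  </rdf:Description>
</rdf:RDF>"""


def translate(xml_text):
    """Map intermediate person records to (subject, predicate, object) triples."""
    triples = []
    root = ET.fromstring(xml_text)
    for record in root.findall(f"{{{RDF}}}Description"):

        def field(name):
            # Look up one field of the current record in the fake namespace.
            return record.findtext(f"{{{FIELDS}}}{name}")

        subject = PERSON_BASE + field("Nid")  # unique ID for each record
        first, last = field("FirstName"), field("LastName")
        triples += [
            (subject, RDF + "type", FOAF + "Person"),
            (subject, RDFS + "label", f"{last}, {first}"),
            (subject, FOAF + "firstName", first),
            (subject, FOAF + "lastName", last),
            (subject, CORE + "primaryEmail", field("Email")),
        ]
    return triples


triples = translate(INTERMEDIATE)
```

Relationship creation (organizations and positions) is left out to keep the sketch short.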
  28. Transfer • Load RDF into TDB triple store • Duplicate URIs are not loaded • Further operations are made in the triple store
  29. Scoring/Match • Disambiguates People, Organizations, etc. based upon property values • Supports Equality, NameCompare, NormalizedLevenshteinDifference, Soundex algorithms • Each property is weighted ▫ firstName: 0.5 ▫ lastName: 0.5 ▫ Email: 1.0 • MatchThreshHold: 1.0
  30. Matching • Determines what should be done with a record which matches another record, based upon its “score” ▫ Replace old record ▫ Merge records ▫ Ignore record
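To make the weights and threshold above concrete, here is a minimal sketch that assumes the score is a simple sum of the weights of properties that compare equal, and that a record matches when the score reaches the threshold. Only an equality comparison is shown; the function names and the additive rule are assumptions, not the Harvester's actual implementation.

```python
# Weights and threshold as listed on the Scoring/Match slide.
WEIGHTS = {"firstName": 0.5, "lastName": 0.5, "email": 1.0}
MATCH_THRESHOLD = 1.0


def score(incoming, existing, weights=WEIGHTS):
    """Sum the weights of all properties whose values are present and equal."""
    total = 0.0
    for prop, weight in weights.items():
        value = incoming.get(prop)
        if value is not None and value == existing.get(prop):
            total += weight
    return total


def is_match(incoming, existing, threshold=MATCH_THRESHOLD):
    """A record matches when its score reaches the threshold."""
    return score(incoming, existing) >= threshold


existing = {"firstName": "Valeria", "lastName": "Pesce", "email": "valeria.pesce@fao.org"}
same_email = {"firstName": "V.", "lastName": "Pesce-Rossi", "email": "valeria.pesce@fao.org"}
same_name = {"firstName": "Valeria", "lastName": "Pesce", "email": "vp@example.org"}
first_name_only = {"firstName": "Valeria", "lastName": "Bianchi", "email": "vb@example.org"}
```

Under these assumptions an identical email is enough to match on its own, while a name match needs both first and last name to agree. What then happens to a matched record (replace, merge, or ignore) is a separate policy decision.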
  31. ChangeNamespace • Match old namespace pattern in configuration file: http://vivo.example.com/harvest/aims_users/person/ • Specify namespace in VIVO: http://agrivivodev.mannlib.cornell.edu/vivo/individual/ • Mint a new URI in the VIVO namespace: http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
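The renaming step can be sketched as follows: every URI that starts with the old harvest namespace receives a freshly minted URI in the VIVO namespace, and repeated occurrences of the same old URI map to the same new one. The sequential counter is a stand-in chosen for this example; how the real tool picks its `n…` numbers and checks them for uniqueness is not described in the slides.

```python
import itertools

OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"


def change_namespace(triples, old_ns=OLD_NS, new_ns=NEW_NS, start=1):
    """Rewrite subjects and objects in old_ns to minted URIs in new_ns.

    Returns the rewritten triples and the old-to-new URI mapping.
    Note: any string starting with old_ns is treated as a URI.
    """
    counter = itertools.count(start)  # hypothetical minting scheme
    minted = {}

    def rename(term):
        if isinstance(term, str) and term.startswith(old_ns):
            if term not in minted:
                minted[term] = f"{new_ns}n{next(counter)}"
            return minted[term]
        return term

    renamed = [(rename(s), p, rename(o)) for s, p, o in triples]
    return renamed, minted


harvested = [
    (OLD_NS + "uid-108074", "http://xmlns.com/foaf/0.1/firstName", "Valeria"),
    (OLD_NS + "uid-108074", "http://xmlns.com/foaf/0.1/lastName", "Pesce"),
]
renamed, minted = change_namespace(harvested)
```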
  32. Diff of previous harvest • Compare TDB model with previous harvest • Generate vivo-additions.rdf • Generate vivo-subtractions.rdf
  33. Final Transfer • Load vivo-subtractions.rdf file into SDB • Load vivo-additions.rdf file into SDB
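The diff and final transfer amount to set arithmetic over triples. The sketch below assumes each model is a collection of hashable `(subject, predicate, object)` tuples; the function names and sample triples are illustrative only.

```python
def diff_models(previous, current):
    """Return (additions, subtractions) between two harvests."""
    previous, current = set(previous), set(current)
    additions = current - previous      # triples new in this harvest
    subtractions = previous - current   # triples that disappeared
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Remove the subtractions first, then add the additions."""
    return (set(store) - subtractions) | additions


previous = {("n1", "label", "Pesce, V."), ("n1", "email", "valeria.pesce@fao.org")}
current = {("n1", "label", "Pesce, Valeria"), ("n1", "email", "valeria.pesce@fao.org")}

additions, subtractions = diff_models(previous, current)
updated_store = apply_diff(previous, additions, subtractions)
```

Only the changed triples are touched, so unchanged statements (and anything users edited elsewhere in the store) are left alone.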
  34. Data Ingest alternatives • Karma: an information integration tool which provides a GUI for modeling data into an ontology • Google Refine: good for one-time ingests and has a VIVO RDF plugin • VIVO admin tools can load RDF
  35. Exposing Data in VIVO • VIVO web pages • View data as RDF • Query a SPARQL endpoint and transform results • Drupal front end
  36. Default VIVO theme
  37. Cornell VIVO
  38. Griffith University
  39. Melbourne Find an Expert
  40. Visualization • Completed Work ▫ Co-Author visualization ▫ Sparklines ▫ VIVO world activity map
  41. VIVO 1.0 source code was publicly released on April 14, 2010 • 87 downloads by June 11, 2010 • 917 downloads on July 16, 2010 • The more institutions adopt VIVO, the more high-quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology. (06/2010)
  42. View RDF from profile page
  43. Requesting RDF using an Accept Header • curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
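The same content negotiation can be done from a script. This sketch builds the request with Python's standard library and leaves the network call commented out, since whether the profile URI from the slide still resolves is not something the example can guarantee; the helper name `rdf_request` is an assumption.

```python
import urllib.request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build a GET request that asks the server for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type}, method="GET")


request = rdf_request("http://vivo.ufl.edu/display/n25562")

# To actually send it (requires network access and a live server):
# with urllib.request.urlopen(request) as response:
#     rdf_xml = response.read().decode("utf-8")
```

The profile URI is the same one a browser uses; only the `Accept` header changes, which is the Linked Data pattern of serving people and machines from one HTTP URI.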
  44. Retrieving data with SPARQL • Fuseki SPARQL endpoint installed (not included) • Callable with a SPARQL client • Semantic Services ▫ Manages custom SPARQL queries ▫ Exposes URL for external sites ▫ Can ask for output as HTML, XML, JSON
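A SPARQL client call is, at bottom, an HTTP request carrying the query in a `query` parameter, as defined by the SPARQL protocol. The sketch below prepares such a request; the endpoint URL is hypothetical (a local Fuseki dataset named `vivo`), the query is a generic example, and the network call is left commented out.

```python
import urllib.parse
import urllib.request

# Hypothetical endpoint: a local Fuseki server with a dataset named "vivo".
ENDPOINT = "http://localhost:3030/vivo/query"

# Example query: list up to ten people with their labels.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""


def sparql_request(endpoint, query, media_type="application/sparql-results+json"):
    """Build a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": media_type})


request = sparql_request(ENDPOINT, QUERY)

# To run the query against a live endpoint:
# import json
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)["results"]["bindings"]
```

A service that manages custom queries would store strings like `QUERY` server-side and expose each one under its own URL, so external sites never need to write SPARQL themselves.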
  45. Semantic Services application
  46. Hector Abruna in VIVO
  47. Hector Abruna on Chemistry Site
  48. Viewing VIVO data with Drupal • Import data with Feeds module and Linked Data Importer • Examples
  49. CALS Impact Statements
  50. AgriVIVO home page
  51. AgriVIVO map page
  52. AgriVIVO
  53. VivoSearch: search across multiple VIVO sites
  54. VIVO Searchlight bookmarklet
  55. VIVO Searchlight
  56. Some Links • Vivoweb ▫ http://vivoweb.org • Vivoweb on SourceForge ▫ http://www.sourceforge.net/projects/vivo • VivoSearch ▫ http://vivosearch.org • VIVO wiki on DuraSpace ▫ https://wiki.duraspace.org/display/VIVO • Mailing Lists ▫ http://sourceforge.net/p/vivo/sfx-list/
  57. Thank you
  58. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service • May 22, 2013 • Questions? All questions will be posted with presenter answers on the NISO website following the webinar: http://www.niso.org/news/events/2013/dcmi/vivo
  59. Thank you for joining us today. Please take a moment to fill out the brief online survey. We look forward to hearing from you!

Editor's notes

  • Authoritative data, diverse formats, filter out private information. Talk about verified data. Talking points: Much of the data in VIVO profiles is ingested from authoritative sources, so it is accurate and current, reducing the need for manual input. Private or sensitive information is never imported into VIVO; only public information is stored and displayed. Data is housed and maintained at the local institutions, where it can be updated on a regular basis. There are three ways to get data: internal sources, external sources, and individuals. Internal is authoritative! The rich information in VIVO profiles can be repurposed and shared with other institutional web pages and consumers, reducing cost and increasing efficiency across the institution.
  • Co-author visualization: an at-a-glance view of an individual's collaboration space. Who do they collaborate with most often? Do they always work with the same people, or do they work with multiple separate communities? Links increase in size and color intensity with more frequent collaboration. Co-authors are clustered into communities. Users can explore the social network by traveling to co-authors' pages.
  • Since VIVO stores profile information drawn from a variety of sources in a single, flexible format, it can be easily “re-skinned” or “re-purposed” to present specialized views into the institution.