 NISO/DCMI May 22 Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
  1. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
     May 22, 2013
     Speaker: John Fereira, Senior Programmer/Analyst and Technology Strategist at Cornell University
     http://www.niso.org/news/events/2013/dcmi/vivo
  2. Semantic mashups across large, heterogeneous institutions: experiences from the VIVO service
     John Fereira, Cornell University
  3. Overview
     • What is VIVO?
     • History of VIVO
     • High-level overview
     • Ingesting data into VIVO
     • Exposing data in VIVO
  4. What is VIVO?
     • VIVO is not an acronym
     • A semantic web application that enables the discovery of research and scholarship across disciplines in an institution.
     • VIVO enables collaboration and understanding across an institution and among institutions – and not just for scientists.
     • A powerful search/browse functionality for locating people and information within or across institutions.
  5. What is VIVO?
     • An ontology editor. VIVO includes a “vivo” ontology which can be modified and extended
     • An instance editor. Instances of classes such as Person, Organization, Event, etc. can be created, modified, and deleted
     • Content can also be brought into VIVO in automated ways from local systems of record, such as HR, grants, course, and faculty activity databases, or from database providers such as publication aggregators and funding agencies.
  6. What is VIVO?
     • VIVO is a content disseminator
     • Views of People, Organizations, etc. can be highly customized
     • VIVO provides visualizations such as topic maps and co-authorship networks
     • Open data means other applications can use it
  7. A brief history of VIVO
     • 2003 – VIVO created for local use at Cornell University for life sciences collaboration
     • 2007 – Reimplemented using RDF, OWL, Jena and SPARQL
     • 2007 – Implemented at Cornell and the University of Florida as “production” systems
  8. A brief history of VIVO
     • 2009 – Seven institutions received $12.2 million in funding from the National Center for Research Resources of the NIH to enable a national network of scientists
     • 2010 – Version 1.0 released as open source
     • 2013 – Now at version 1.5.1
     • 2013 – Transitioning from a funded project to a sustainable community open source project
  9. A high-level overview
     • Core ideas
     • Searching/browsing
     • Self editing
  10. Core ideas
      • Research and researchers should be discoverable independently of administrative hierarchies
      • Relationships are as interesting as the facts
      • It’s the network, not just the nodes
      • Static data models are too confining
      • Granular data management allows multiple views and re-purposing
      • Discovery is improved by linking pages to surrounding context
  11. VIVO and Linked Open Data
      • VIVO enables authoritative data about researchers to become part of the Linked Open Data (LOD) cloud
      Tim Berners-Lee, http://www.w3.org/2009/Talks/0204-ted-tbl
  12. Linked Data principles
      Tim Berners-Lee:
        ▫ Use URIs as names for things
        ▫ Use HTTP URIs so that people can look up those names
        ▫ When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
        ▫ Include links to other URIs so that people can discover more things
      http://linkeddata.org
  13. VIVO in the LOD cloud
  14. Searching and Browsing
      • Triple store indexed into a Solr instance
      • Searches are against Solr
      • Instance data comes from the triple store
      • An example…
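The split described on this slide (search against an index, instance data from the triple store) can be illustrated with a small in-memory sketch. This is not how VIVO or Solr is implemented: the triples, the `ex:` identifiers, and the function names are all hypothetical, and a plain substring match stands in for Solr's full-text search.

```python
from collections import defaultdict

# Hypothetical triples (subject, predicate, object); not taken from a real VIVO instance.
TRIPLES = [
    ("ex:n1", "rdfs:label", "Food security in East Africa"),
    ("ex:n1", "rdf:type", "ex:Project"),
    ("ex:n2", "rdfs:label", "Pesce, Valeria"),
    ("ex:n2", "ex:researchArea", "food security"),
]


def build_documents(triples):
    """Flatten every object value of a subject into one lower-cased search document."""
    docs = defaultdict(list)
    for subject, _predicate, obj in triples:
        docs[subject].append(obj)
    return {subject: " ".join(values).lower() for subject, values in docs.items()}


def search(docs, query):
    """Search the index only: return subjects whose document contains every query term."""
    terms = query.lower().split()
    return sorted(s for s, text in docs.items() if all(t in text for t in terms))


def describe(triples, subject):
    """Fetch the instance data for one hit from the triple store, not from the index."""
    return [(p, o) for s, p, o in triples if s == subject]


index = build_documents(TRIPLES)
hits = search(index, "food security")
details = {hit: describe(TRIPLES, hit) for hit in hits}
```

The point of the design is that the index only has to answer "which individuals match", while the page for each hit is still rendered from the authoritative triples.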
  15. Food security
  16. Self Editing
      • Users can edit their own profile
      • System can delegate editing to “proxy” editors
      • Some data can be locked
      • An example
  17. Editable and non-editable fields
  18. Most text fields support “rich text”
  19. External Concepts for “terms”
  20. Data Ingest (harvesting)
  21. VIVO harvests much of its data automatically from verified sources
      • Reduces the need for manual input of data
      • Provides an integrated and flexible source of publicly visible data at an institutional level
      Data, data, data
      Individuals may also edit and customize their profiles to suit their professional needs
      [Diagram labels: External data sources; Internal data sources]
  22. Ingesting data with the VIVO Harvester
      • A pipeline of tools
      • Tools are written in Java, using Jena APIs
      • Can fetch data from a variety of data formats
      • Data can be sanitized and disambiguated
      • Data is ingested directly into the triple store; the VIVO web app does not need to be running
  23. Harvesting Pipeline
      • Fetcher/Parser
      • Translate: maps RDF to “vivo” RDF
      • Transfer to local triple store (Jena TDB)
      • Disambiguate using Scoring/Matching
      • ChangeNamespace (mint unique URIs)
      • Diff with previous model to create subtractions
      • Transfer to VIVO triple store
  24. Fetching and Parsing
      • Fetches data from a URL, database, or local file
      • Many different types of fetchers
        ▫ CSV fetcher
        ▫ JDBC fetcher
        ▫ SimpleXMLFetcher
        ▫ JSONFetcher
      • Output is an intermediate RDF format, one file per record
      • “Fake” namespace used
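As a rough illustration of the fetch step, the sketch below turns CSV rows into one intermediate RDF/XML document per record under a placeholder namespace. The Harvester's fetchers are Java tools; this Python version only mimics the shape of their output. The namespace URIs are copied from the example URIs in the slides, while `csv_to_records` and `SAMPLE_CSV` are hypothetical names, and column headers are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Placeholder ("fake") namespaces modelled on the example URIs in the slides.
FIELD_NS = "http://vivo.example.com/harvest/aims_users/fields/person/"
TYPE_URI = "http://vivo.example.com/harvest/aims_users/types#person"

ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("node-person", FIELD_NS)

# Hypothetical input; the values echo the record shown in the slides.
SAMPLE_CSV = "Nid,FirstName,LastName\n108074,Valeria,Pesce\n"


def csv_to_records(csv_text):
    """Return one RDF/XML string per CSV row (assumes headers are valid XML names)."""
    records = []
    for i, row in enumerate(csv.DictReader(io.StringIO(csv_text))):
        root = ET.Element(f"{{{RDF_NS}}}RDF")
        desc = ET.SubElement(
            root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}ID": f"node_-_{i}"}
        )
        ET.SubElement(desc, f"{{{RDF_NS}}}type", {f"{{{RDF_NS}}}resource": TYPE_URI})
        # Every column becomes a property in the placeholder namespace.
        for column, value in row.items():
            ET.SubElement(desc, f"{{{FIELD_NS}}}{column}").text = value
        records.append(ET.tostring(root, encoding="unicode"))
    return records


records = csv_to_records(SAMPLE_CSV)
```

Keeping the fetcher ignorant of the target ontology is what lets the later Translate step be the only place where source fields are mapped to VIVO classes and properties.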
  25. <?xml version="1.0"?>
      <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
               xmlns:node-person="http://vivo.example.com/harvest/aims_users/fields/person/"
               xml:base="http://vivo.example.com/harvest/aims_users/person">
        <rdf:Description rdf:ID="node_-_0">
          <rdf:type rdf:resource="http://vivo.example.com/harvest/aims_users/types#person"/>
          <node-person:Picture>http://aims.fao.org/sites/default/files/profiles/profile_image_108074.jpg</node-person:Picture>
          <node-person:Website>http://www.valeriapesce.name</node-person:Website>
          <node-person:Nid>108074</node-person:Nid>
          <node-person:Profile>In the last six years at the Global Forum on Agricultural research (GFAR) I have worked extensively on metadata standards and protocols for managing and exchanging information between systems, in strict collaboration with the OEKCS group in FAO.</node-person:Profile>
          <node-person:Organization>Food and Agriculture Organization of the United Nations (FAO)</node-person:Organization>
          <node-person:Expertise>Information management tools, information systems, information architectures</node-person:Expertise>
          <node-person:LastName>Pesce</node-person:LastName>
          <node-person:Country>Italy</node-person:Country>
          <node-person:Email>valeria.pesce@fao.org</node-person:Email>
          <node-person:geolocation>http://aims.fao.org/aos/geopolitical.owl#Italy</node-person:geolocation>
          <node-person:Profile_URL>http://aims.fao.org/node/108074</node-person:Profile_URL>
          <node-person:Username>valeria.pesce</node-person:Username>
          <node-person:FirstName>Valeria</node-person:FirstName>
          <node-person:Role>Information Management Specialist</node-person:Role>
          <node-person:Interests>agINFRA, AgriDrupal, AgriFeeds, AgriVIVO, authority control, automatic indexing, CIARD Content Management Task Force, CIARD RING, cloud services, CMS - Content Management Systems, data exchange, Drupal, IAALD - International Association of Agricultural Information Specialists, information management, institutional repository software, interoperability, Linked Open Data - LOD, RDF - Resource Description Framework, Semantic Web</node-person:Interests>
        </rdf:Description>
      </rdf:RDF>
  26. Translate
      • Map “fake” namespace to VIVO classes and properties
      • Uses XSLT transform
      • Unique ID for each record
      • node-person:Organization becomes foaf:Organization
      • Relationships created
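The Harvester performs this step with an XSLT transform. The sketch below swaps in a plain Python dictionary mapping to show the same idea: placeholder fields become FOAF properties, each record gets a unique URI built from its ID, and the organization string becomes a foaf:Organization individual. The `translate` function and `FIELD_MAP` are hypothetical, records are assumed to be dictionaries keyed by field name, and the position relationship that links the person to the organization is left out for brevity.

```python
from urllib.parse import quote

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Harvest namespace as used in the slide examples.
HARVEST_BASE = "http://vivo.example.com/harvest/aims_users/"

# Hypothetical field-to-property mapping; a real harvest keeps this in the XSLT.
FIELD_MAP = {
    "FirstName": FOAF + "firstName",
    "LastName": FOAF + "lastName",
}


def translate(record):
    """Map one placeholder-namespace record to (subject, predicate, object) triples."""
    person = f"{HARVEST_BASE}person/uid-{record['Nid']}"  # unique ID per record
    triples = [
        (person, RDF_TYPE, FOAF + "Person"),
        (person, RDFS_LABEL, f"{record['LastName']}, {record['FirstName']}"),
    ]
    for field, prop in FIELD_MAP.items():
        if field in record:
            triples.append((person, prop, record[field]))
    org_name = record.get("Organization")
    if org_name:
        # The organization string becomes its own individual with a percent-encoded URI.
        org = f"{HARVEST_BASE}org/{quote(org_name, safe='()')}"
        triples.append((org, RDF_TYPE, FOAF + "Organization"))
        triples.append((org, RDFS_LABEL, org_name))
    return triples


example = translate({
    "Nid": "108074",
    "FirstName": "Valeria",
    "LastName": "Pesce",
    "Organization": "Food and Agriculture Organization of the United Nations (FAO)",
})
```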
  27. Translated RDF
      <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/person/uid-108074">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
        <rdfs:label>Pesce, Valeria</rdfs:label>
        <core:currentMemberOf rdf:resource="http://vivo.example.com/harvest/aims_users/org/aims"/>
        <foaf:firstName>Valeria</foaf:firstName>
        <foaf:lastName>Pesce</foaf:lastName>
        <core:primaryEmail>valeria.pesce@fao.org</core:primaryEmail>
        <core:positionInOrganization rdf:resource="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
      </rdf:Description>
      <rdf:Description rdf:about="http://vivo.example.com/harvest/aims_users/org/Food%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)">
        <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Organization"/>
        <rdfs:label>Food and Agriculture Organization of the United Nations (FAO)</rdfs:label>
        <core:organizationForPosition rdf:resource="http://vivo.example.com/harvest/aims_users/position/positionFor108074inFood%20and%20Agriculture%20Organization%20of%20the%20United%20Nations%20(FAO)"/>
        <core:hasGeographicLocation rdf:resource="http://aims.fao.org/aos/geopolitical.owl#Italy"/>
      </rdf:Description>
  28. Transfer
      • Load RDF into TDB triple store
      • Duplicate URIs are not loaded
      • Further operations are made in the triple store
  29. Scoring/Match
      • Disambiguates People, Organizations, etc. based upon property values
      • Supports Equality, NameCompare, NormalizedLevenshteinDifference, and Soundex algorithms
      • Each property is weighted
        ▫ firstName: 0.5
        ▫ lastName: 0.5
        ▫ Email: 1.0
      • MatchThreshHold: 1.0
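A minimal sketch of weighted scoring, using the weights and threshold from this slide. Two things here are assumptions rather than documented Harvester behavior: that the total score is the weighted sum of per-property similarities, and that names are compared with a normalized Levenshtein similarity while email uses strict equality. All function names and the sample records are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def normalized_similarity(a, b):
    """1.0 for identical strings, approaching 0.0 as the edit distance grows."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def equality(a, b):
    """Strict comparison: 1.0 on an exact match, otherwise 0.0."""
    return 1.0 if a == b else 0.0


# Weights from the slide; the choice of comparison per property is an assumption.
WEIGHTS = {
    "firstName": (normalized_similarity, 0.5),
    "lastName": (normalized_similarity, 0.5),
    "email": (equality, 1.0),
}
MATCH_THRESHOLD = 1.0


def score(a, b):
    """Assumed combination rule: sum of weight * similarity over shared properties."""
    total = 0.0
    for prop, (compare, weight) in WEIGHTS.items():
        if prop in a and prop in b:
            total += weight * compare(a[prop], b[prop])
    return total


def is_match(a, b):
    return score(a, b) >= MATCH_THRESHOLD


# Hypothetical records for illustration.
existing = {"firstName": "Valeria", "lastName": "Pesce", "email": "vp@example.org"}
same_email = {"firstName": "V.", "lastName": "Pesce", "email": "vp@example.org"}
near_name = {"firstName": "Valerie", "lastName": "Pesce", "email": "other@example.org"}
```

With these weights an identical email address is enough to reach the threshold on its own, whereas names must match exactly on both parts to do so without an email match.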
  30. Matching
      • Determines what should be done with a record which matches another record, based upon its “score”
        ▫ Replace old record
        ▫ Merge records
        ▫ Ignore record
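The three outcomes listed here can be sketched as a small resolver over dictionary records. The semantics chosen below are assumptions made for illustration (in particular, "merge" is taken to mean that existing values win and only missing fields are filled in), and `resolve` is a hypothetical helper, not a Harvester API.

```python
def resolve(existing, incoming, action):
    """Return the record to keep after a match, for a configured action."""
    if action == "replace":
        # The incoming record supersedes the old one entirely.
        return dict(incoming)
    if action == "merge":
        # Assumed merge rule: keep existing values, add fields that were missing.
        merged = dict(existing)
        for key, value in incoming.items():
            merged.setdefault(key, value)
        return merged
    if action == "ignore":
        # The incoming record is dropped and the old one is left untouched.
        return dict(existing)
    raise ValueError(f"unknown action: {action!r}")


old = {"lastName": "Pesce", "email": "vp@example.org"}
new = {"lastName": "Pesce", "email": "new@example.org", "role": "Specialist"}
```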
  31. ChangeNamespace
      • Match old namespace pattern in configuration file
        http://vivo.example.com/harvest/aims_users/person/
      • Specify namespace in VIVO
        http://agrivivodev.mannlib.cornell.edu/vivo/individual/
      • Mint a new URI in the VIVO namespace
        http://agrivivodev.mannlib.cornell.edu/vivo/individual/n123456
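A sketch of the URI-minting step, assuming triples are plain tuples of strings. Every URI that starts with the harvest namespace is replaced by a freshly minted `n` plus number URI in the target namespace, and the same old URI always maps to the same new one. The function name, the `used` set of already-taken URIs, and the six-digit range are hypothetical; a real run would have to check candidates against the URIs already present in VIVO.

```python
import random


def change_namespace(triples, old_ns, new_ns, used=None, rng=None):
    """Rewrite URIs under old_ns to newly minted URIs under new_ns.

    Simplification: any string term starting with old_ns is treated as a URI.
    """
    rng = rng or random.Random()
    used = set(used or ())
    minted = {}

    def mint(uri):
        if uri not in minted:
            # Draw candidates until one is not already taken.
            while True:
                candidate = f"{new_ns}n{rng.randrange(1, 10**6)}"
                if candidate not in used:
                    break
            used.add(candidate)
            minted[uri] = candidate
        return minted[uri]

    def rewrite(term):
        return mint(term) if isinstance(term, str) and term.startswith(old_ns) else term

    return [(rewrite(s), p, rewrite(o)) for s, p, o in triples], minted


OLD_NS = "http://vivo.example.com/harvest/aims_users/person/"
NEW_NS = "http://agrivivodev.mannlib.cornell.edu/vivo/individual/"
harvested = [
    (OLD_NS + "uid-108074", "rdfs:label", "Pesce, Valeria"),
    (OLD_NS + "uid-108074", "foaf:lastName", "Pesce"),
    (OLD_NS + "uid-1", "foaf:knows", OLD_NS + "uid-108074"),
]
rewritten, mapping = change_namespace(harvested, OLD_NS, NEW_NS, rng=random.Random(0))
```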
  32. Diff of previous harvest
      • Compare TDB model with previous harvest
      • Generate vivo-additions.rdf
      • Generate vivo-subtractions.rdf
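Because an RDF model is a set of triples, the diff reduces to two set differences. The sketch below assumes each harvest is available as a collection of tuples; `diff_models` and `apply_diff` are hypothetical names, and the sample triples are invented for illustration.

```python
def diff_models(previous, current):
    """Return (additions, subtractions) between two harvests."""
    previous, current = set(previous), set(current)
    additions = current - previous      # triples that are new in this harvest
    subtractions = previous - current   # triples that disappeared since the last one
    return additions, subtractions


def apply_diff(store, additions, subtractions):
    """Bring a store that holds the previous harvest up to date."""
    return (set(store) - subtractions) | additions


last_harvest = {
    ("ex:n1", "foaf:lastName", "Pesce"),
    ("ex:n1", "ex:role", "Consultant"),
}
this_harvest = {
    ("ex:n1", "foaf:lastName", "Pesce"),
    ("ex:n1", "ex:role", "Information Management Specialist"),
}
additions, subtractions = diff_models(last_harvest, this_harvest)
```

Sending only the two difference sets means unchanged triples, and any manual edits made in VIVO outside the harvested data, are left alone.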
  33. Final Transfer
      • Load vivo-subtractions.rdf file into SDB
      • Load vivo-additions.rdf file into SDB
  34. Data Ingest alternatives
      • Karma: an information integration tool which provides a GUI for modeling data into an ontology
      • Google Refine: good for one-time ingests and has a VIVO RDF plugin
      • VIVO admin tools can load RDF
  35. Exposing Data in VIVO
      • VIVO web pages
      • View data as RDF
      • Query a SPARQL endpoint and transform results
      • Drupal front end
  36. Default VIVO theme
  37. Cornell VIVO
  38. Griffith University
  39. Melbourne Find an Expert
  40. Visualization
      • Completed Work
        ▫ Co-Author visualization
        ▫ Sparklines
        ▫ VIVO world activity map
  41. VIVO 1.0 source code was publicly released on April 14, 2010. 87 downloads by June 11, 2010. 917 downloads on July 16, 2010.
      The more institutions adopt VIVO, the more high quality data will be available to understand, navigate, manage, utilize, and communicate progress in science and technology.
      06/2010
  42. View RDF from profile page
  43. Requesting RDF using an Accept Header
      • curl -H "Accept: application/rdf+xml" -X GET http://vivo.ufl.edu/display/n25562
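The same content negotiation can be done from a script. This sketch uses only the Python standard library and mirrors the curl call above; `build_rdf_request` and `fetch_rdf` are hypothetical helper names, and the profile URL from the slide may no longer resolve.

```python
from urllib.request import Request, urlopen


def build_rdf_request(profile_url):
    """Build a GET request that asks the server for RDF/XML instead of HTML."""
    return Request(profile_url, headers={"Accept": "application/rdf+xml"}, method="GET")


def fetch_rdf(profile_url, timeout=10):
    """Send the request and return the response body as text (needs network access)."""
    with urlopen(build_rdf_request(profile_url), timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_rdf_request("http://vivo.ufl.edu/display/n25562")
```

Calling `fetch_rdf` with a live profile URL would return the RDF/XML description of that individual; it is not called here so the example runs offline.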
  44. Retrieving data with SPARQL
      • Fuseki SPARQL endpoint installed (not included)
      • Callable with a SPARQL client
      • Semantic Services
        ▫ Manages custom SPARQL queries
        ▫ Exposes URL for external sites
        ▫ Can ask for output as HTML, XML, JSON
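A minimal SPARQL client can be sketched with the standard library alone: the SPARQL protocol accepts the query as a `query` URL parameter, and the Accept header selects JSON results. The endpoint URL, the query, and the helper names below are assumptions for illustration; a real deployment would use the address of its own Fuseki service.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; substitute the URL of the installed Fuseki service.
ENDPOINT = "http://localhost:3030/vivo/query"

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?person ?label WHERE {
  ?person a foaf:Person ;
          rdfs:label ?label .
} LIMIT 10
"""


def build_query_request(endpoint, query):
    """SPARQL protocol GET: the query travels as a URL-encoded 'query' parameter."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def parse_bindings(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dictionaries."""
    data = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(endpoint, query, timeout=10):
    """Execute the query against a live endpoint (needs network access)."""
    with urlopen(build_query_request(endpoint, query), timeout=timeout) as response:
        return parse_bindings(response.read().decode("utf-8"))


request = build_query_request(ENDPOINT, QUERY)
```

The flattened rows from `parse_bindings` are the kind of structure an external site could then render as HTML, which is the role the Semantic Services layer plays in the talk.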
  45. Semantic Services application
  46. Hector Abruna in VIVO
  47. Hector Abruna on Chemistry Site
  48. Viewing VIVO data with Drupal
      • Import data with Feeds module and Linked Data Importer
      • Examples
  49. CALS Impact Statements
  50. AgriVivo Home Page
  51. AgriVivo map page
  52. AgriVivo
  53. VivoSearch: search across multiple VIVO sites
  54. Vivo Searchlight bookmarklet
  55. Vivo Searchlight
  56. Some Links
      • Vivoweb
        ▫ http://vivoweb.org
      • Vivoweb on Sourceforge
        ▫ http://www.sourceforge.net/projects/vivo
      • VivoSearch
        ▫ http://vivosearch.org
      • Vivo Wiki on Duraspace
        ▫ https://wiki.duraspace.org/display/VIVO
      • Mailing Lists
        ▫ http://sourceforge.net/p/vivo/sfx-list/
  57. Thank you
  58. NISO/DCMI Webinar: Semantic Mashups Across Large, Heterogeneous Institutions: Experiences from the VIVO Service
      NISO/DCMI Webinar • May 22, 2013
      Questions?
      All questions will be posted with presenter answers on the NISO website following the webinar:
      http://www.niso.org/news/events/2013/dcmi/vivo
  59. Thank you for joining us today.
      Please take a moment to fill out the brief online survey.
      We look forward to hearing from you!
      THANK YOU
