ResourceSync: Web-Based Resource Synchronization
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp


                            ResourceSync is funded by
                          The Sloan Foundation & JISC


         ResourceSync – Herbert Van de Sompel
TICER Summer School, August 22 2012, Tilburg, The Netherlands
ResourceSync Core Team – NISO & OAI


Cornell University & OAI:
   Bernhard Haslhofer, Carl Lagoze, Simeon Warner

Old Dominion University & OAI:
   Michael L. Nelson

Los Alamos National Laboratory & OAI:
   Martin Klein, Robert Sanderson, Herbert Van de Sompel

NISO:
   Todd Carpenter, Nettie Lagace, Peter Murray


ResourceSync Technical Group


•  Manuel Bernhardt, Delving B.V.
•  Kevin Ford, Library of Congress
•  Richard Jones, JISC
•  Graham Klyne, JISC
•  Stuart Lewis, JISC
•  David Rosenthal, LOCKSS
•  Christian Sadilek, Red Hat
•  Shlomo Sanders, Ex Libris, Inc.
•  Sjoerd Siebinga, Delving B.V.
•  Ed Summers, Library of Congress
•  Jeff Young, OCLC Online Computer Library Center


ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




Synchronize What?

•  Web resources – things with a URI that can be dereferenced and
   are cache-able (no dependency on underlying OS, technologies
   etc.)

•  Small websites/repositories (a few resources) to large
   repositories/datasets/linked data collections (many millions of
   resources)

•  That change slowly (weeks/months) or quickly (seconds), and
   where latency needs may vary

•  Focus on needs of research communication and cultural heritage
   organizations, but aim for generality


Why?

… because lots of projects and services are doing synchronization
but have to resort to ad-hoc, case by case, approaches!

•  Project team involved with projects that need this

•  Experience with OAI-PMH: widely used in repos but
    o  XML metadata only
    o  Attempts at synchronizing actual content via OAI-PMH
       (complex object formats, dc:identifier) not successful
    o  Web technology has moved on since 1999




•  Devise a shared solution for data, metadata, linked data?


Use Cases – The Basics




Use Cases - More




Out Of Scope (For Now)

•  Bidirectional synchronization

•  Destination-defined selective synchronization (query)

•  Bulk URI migration




Use Case: arXiv Mirroring

•  1M article versions, ~800/day created or
   updated at 8 PM US Eastern Time

•  Metadata and full-text for each article

•  Accuracy important

•  Want low barrier for others to use

•  Look for more general solution than current
   homebrew mirroring (running with minor
   modifications since 1994!) and occasional rsync
   (filesystem layout specific, auth issues)

Use Case: DBpedia Live Duplication

•  Average of 2 updates per second
•  Want low latency => need a push technology




ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




ResourceSync Problem


•  Consideration:
    •  Source (server) A has resources that change over time: they
       get created, modified, deleted
    •  Destination (servers) X, Y, and Z leverage (some) resources
       of Source A.
•  Problem:
    •  Destinations want to keep in step with the resource changes
       at Source A: resource synchronization.
•  Goal:
    •  Design an approach for resource synchronization aligned
       with the Web Architecture that has a fair chance of adoption
       by different communities.
        •  The approach must scale better than recurrent HTTP
           HEAD/GET on resources.
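
As a rough illustration of why recurrent HEAD/GET scales poorly, the sketch below compares request counts for one synchronization pass. The figures come from the arXiv use case in this deck (about 1M article versions, about 800 created or updated per day). The function names, and the assumption that a list of changes costs a single request, are illustrative only.

```python
def requests_by_polling(total_resources: int) -> int:
    """One HEAD per resource per pass, whether or not the resource changed."""
    return total_resources


def requests_by_change_list(changed_resources: int, list_requests: int = 1) -> int:
    """Fetch a list of changes (assumed: one request), then GET only what changed."""
    return list_requests + changed_resources


total = 1_000_000   # article versions, from the arXiv use case
changed = 800       # created or updated per day, from the arXiv use case

polling = requests_by_polling(total)
change_list = requests_by_change_list(changed)

# Polling still needs a GET for each changed resource on top of this count,
# so the ratio below understates the gap.
print(polling, change_list, round(polling / change_list))
```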


Destination: 3 Basic Synchronization Needs

1.  Baseline synchronization – A destination must be able to
    perform an initial load or catch-up with a source
       -  avoid out-of-band setup

2.  Incremental synchronization – A destination must have some
    way to keep up-to-date with changes at a source
       -  subject to some latency; minimally covers create/update/delete
       -  allow catching up after the destination has been offline

3.  Audit – A destination should be able to determine whether it is
    synchronized with a source
       -  subject to some latency
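
The audit need can be pictured as a set comparison between what the source lists and what the destination holds. The sketch below is a minimal illustration, not part of the ResourceSync draft: it assumes both sides can be summarized as a map from URI to last-modified timestamp, and that the timestamps are ISO 8601 UTC strings in one fixed format so that string comparison matches time order. The names `AuditReport` and `audit` are hypothetical.

```python
from typing import Dict, NamedTuple, Set


class AuditReport(NamedTuple):
    missing: Set[str]  # listed by the source, absent at the destination
    stale: Set[str]    # present on both sides, destination copy is older
    extra: Set[str]    # held by the destination, no longer listed by the source


def audit(source: Dict[str, str], destination: Dict[str, str]) -> AuditReport:
    """Compare {uri: last-modified} maps (assumed ISO 8601 UTC strings)."""
    missing = set(source) - set(destination)
    extra = set(destination) - set(source)
    stale = {
        uri
        for uri in set(source) & set(destination)
        if destination[uri] < source[uri]
    }
    return AuditReport(missing, stale, extra)


def in_sync(report: AuditReport) -> bool:
    """The destination is synchronized when all three sets are empty."""
    return not (report.missing or report.stale or report.extra)
```

An empty report corresponds to "synchronized, subject to some latency": changes made at the source after its inventory was produced are not visible to this check.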



Source Capability 1: Describing Content

In order to advertise the resources that a source wants destinations
to know about, it may describe them:

    o    Publish an inventory of resource URIs and possibly
         associated metadata
         -  Destination GETs the Content Description
         -  Destination GETs listed resources by their URI
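
A destination reading such an inventory could look roughly like the sketch below. It assumes the inventory is a standard Sitemap document (the format this deck later adopts) and reads only the plain Sitemap `loc` and `lastmod` elements; no ResourceSync extension elements are used. The function name `parse_inventory` and the example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Namespace of the Sitemap protocol (sitemaps.org)
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

EXAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2012-08-08T08:15:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
  </url>
</urlset>"""


def parse_inventory(sitemap_xml: str) -> Dict[str, Optional[str]]:
    """Return {resource URI: lastmod or None} for every <url> entry."""
    root = ET.fromstring(sitemap_xml)
    inventory: Dict[str, Optional[str]] = {}
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc:
            inventory[loc.strip()] = lastmod.strip() if lastmod else None
    return inventory


# The destination would then GET each listed URI (not shown here).
for uri, lastmod in parse_inventory(EXAMPLE).items():
    print(uri, lastmod)
```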




Source Capability 2: Communicating Change Events

In order to achieve lower latency, a source may communicate about
changes to its resources:

   o     2.1. Change Set: Publish a list of recent change events
         (create, update, delete resource)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.

   o     2.2. Push Change Set: Push a list of recent change events
         (create, update, delete resource) towards (a) destination(s)
        -  Destination acts upon change events, e.g. GETs created/
            updated resources, removes deleted resources.
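
Whether a Change Set is pulled or pushed, the destination's reaction is the same. The sketch below shows that reaction under simplifying assumptions: change events are represented as plain `(change type, URI)` tuples rather than in any wire format, the mirror is an in-memory dictionary, and `fetch` stands in for an HTTP GET. All names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

ChangeEvent = Tuple[str, str]  # (change type, resource URI)


def apply_change_set(
    events: Iterable[ChangeEvent],
    mirror: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> None:
    """Apply change events in order to a local mirror {uri: content}."""
    for change, uri in events:
        if change in ("create", "update"):
            # GET the created or updated resource from the source
            mirror[uri] = fetch(uri)
        elif change == "delete":
            # Remove the local copy; ignore URIs we never held
            mirror.pop(uri, None)
        else:
            raise ValueError(f"unknown change type: {change!r}")
```

Applying events in the order listed matters when the same URI appears more than once, for example an update followed by a delete.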




Source Capability 3: Providing Access to Versions

In order to allow a destination to catch up with missed changes, a
source may support:

   o    3.1. Historical Change Sets: Provide access to change events that
        occurred prior to the ones listed in the current Change Set

   o    3.2. Historical Content: Provide access to prior resource versions




Source Capability 4: Transferring Content

By default, content is transferred in response to a GET issued by a
destination against a URI of a source’s resource. But a source may
support additional mechanisms:

   o     4.1. Dump: Publish a package of resource representations
         and necessary metadata
        -  Destination GETs the Dump
        -  Destination unpacks the Dump

   o    4.2. Alternate Content Transfer: Support alternative
        mechanisms to optimize getting content, e.g. obtaining content
        from a mirror site, or transferring only the changes rather
        than the entire changed resource.



Source: Advertise Capabilities

A source needs to advertise the capabilities it supports to allow a
destination to discover them

•     Some capabilities may be provided by a third party, not the
      source itself
     o   e.g. Historical Change Sets, Historical Content
     o   But the source should still make those third party capabilities
         discoverable - trust




ResourceSync


ResourceSync: What & Why?

Problem Perspective & Conceptual Approach

Technical Details

Q&A




So Many Choices

[Figure: cloud of existing push and pull technologies: DSNotify,
OAI-PMH, rsync, Crawl, OAI-ORE, RDFsync, WebDAV Col. Syn., XMPP, Atom,
SWORD, AtomPub, Sitemap, RSS, SPARQLpush, PubSubHubbub, SDShare]

A Framework Based on Sitemaps

•  Modular framework allowing selective deployment

•  Sitemap is the core component throughout the
   framework

   o    Introduce extension elements and attributes:
          -  In ResourceSync namespace (rs:) to
             accommodate synchronization needs
          -  In XHTML namespace (xhtml:) mainly to
             accommodate discovery needs
   o    Reuse Sitemap format for Change Sets (both
        current and historical) and for manifest in Dump

Source Capabilities – Destination Needs




Sitemap with Added Datetime




Change Types: Extend lastmod, Use expires
Sitemap with lastmod and expires
Sitemap Discovery via robots.txt
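
The Sitemap protocol lets a site announce its Sitemaps with `Sitemap:` lines in robots.txt. A destination could pick those up as in the sketch below; the function name `sitemap_urls` and the example URLs are hypothetical, and the parsing is deliberately minimal (it skips comment lines and ignores all other directives).

```python
from typing import List


def sitemap_urls(robots_txt: str) -> List[str]:
    """Collect the URLs announced by 'Sitemap:' lines in a robots.txt body."""
    urls: List[str] = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split at the first colon only: the URL itself contains colons
        field, _, value = line.partition(":")
        if field.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


EXAMPLE = """\
User-agent: *
Disallow: /private/
# Sitemaps for this site
Sitemap: http://example.com/sitemap.xml
sitemap: http://example.com/changes.xml
"""

print(sitemap_urls(EXAMPLE))
```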
Source Capabilities – Destination Needs




Change Set: An rs Typed Sitemap




More rs Extension Elements




Change Set with rs and xhtml Extensions




Change Set Discovery via Sitemap




Pushing Change Sets via XMPP PubSub

XMPP Publish-Subscribe: Client to Subscription Service,
  Subscription Service to Client(s) communication

•  One of the XMPP (Extensible Messaging and Presence Protocol)
   extensions: http://xmpp.org/extensions/xep-0060.html
•  Apple Notifications based on XMPP PubSub
•  Available tools, see
   http://xmpp.org/about-xmpp/technology-overview/pubsub/#impl-client
    o  XMPP Servers with PubSub support:
        -  ejabberd, OpenFire, Tigase, SleekXMPP
    o  XMPP libraries with PubSub support:
        -  Strophe (C, JavaScript), XMPP4R (Ruby), SleekXMPP
           (Python), PubSub Client (Python)



Change Set via XMPP




Push Change Set Discovery via Sitemap




Source Capabilities – Destination Needs




Discovering a Historical Change Set via a Current Change Set




Source Capabilities – Destination Needs




Discovering Historical Content – Link to Version Resource




Memento Intermezzo




 http://www.mementoweb.org/

Original Resources and Mementos




Bridge from Present to Past




Bridge from Past to Present




Memento Framework




Discovering Historical Content – Link to Memento TimeGate
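
In the Memento framework, a client asks a TimeGate for the version of a resource closest to a desired moment by sending an `Accept-Datetime` request header carrying an HTTP-style date. The sketch below only builds that header and sends nothing over the network; the function name `timegate_headers` is hypothetical, and how the TimeGate URI is discovered is left out.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict


def timegate_headers(when: datetime) -> Dict[str, str]:
    """Build the request header for asking a TimeGate about a past moment.

    `when` must be timezone-aware; it is converted to GMT for the header.
    """
    if when.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    gmt = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(gmt, usegmt=True)}


# Ask for the resource state as of noon UTC on the day of this talk
print(timegate_headers(datetime(2012, 8, 22, 12, 0, 0, tzinfo=timezone.utc)))
```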




Source Capabilities – Destination Needs




Dump

•  Two formats currently under discussion:

   o    Format based on ZIP:
         -  Package content
         -  Add manifest (manifest.xml) expressed in
            Sitemap format
         -  ZIP it up

   o    WARC files as used by the web archiving
        community
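
The ZIP-based option can be sketched as follows: package the content, add a `manifest.xml` in Sitemap format, and zip it up. This is an illustration under loud assumptions, not the draft's format. The deck names `rs:path` for mapping a URI to a file path, but the form used here (an attribute on `<url>`) and the `RS` namespace URI are placeholders invented for the example. The helper names are hypothetical too.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Dict, Tuple

SM = "http://www.sitemaps.org/schemas/sitemap/0.9"  # Sitemap namespace
RS = "http://example.org/placeholder/rs"  # PLACEHOLDER, not the real rs: namespace

ET.register_namespace("rs", RS)


def build_dump(resources: Dict[str, Tuple[str, bytes]]) -> bytes:
    """Package {uri: (path inside ZIP, content)} plus a manifest.xml."""
    urlset = ET.Element(f"{{{SM}}}urlset")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for uri, (path, content) in resources.items():
            url = ET.SubElement(urlset, f"{{{SM}}}url")
            ET.SubElement(url, f"{{{SM}}}loc").text = uri
            # Assumed form: rs:path as an attribute mapping the URI to a file
            url.set(f"{{{RS}}}path", path)
            zf.writestr(path, content)
        zf.writestr("manifest.xml", ET.tostring(urlset, encoding="utf-8"))
    return buffer.getvalue()


def unpack_dump(dump: bytes) -> Dict[str, bytes]:
    """Read manifest.xml and return {uri: content} for every listed resource."""
    with zipfile.ZipFile(io.BytesIO(dump)) as zf:
        manifest = ET.fromstring(zf.read("manifest.xml"))
        return {
            url.findtext(f"{{{SM}}}loc"): zf.read(url.get(f"{{{RS}}}path"))
            for url in manifest.findall(f"{{{SM}}}url")
        }
```

Keeping the URI-to-path mapping in the manifest means the destination does not need to know the source's filesystem layout, which is one of the drawbacks the arXiv use case attributes to rsync.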



Mapping URI to File Path with rs:path
Manifest (manifest.xml) Expressed in Sitemap Format




Dump Discovery via Sitemap




Source Capabilities – Destination Needs




Alternate Location




Alternate Protocol, e.g. Obtain Changes Only




Timeline
•  August 2012
    o  First draft spec shared for feedback with ResourceSync team
•  September 2012
    o  In-person meeting of ResourceSync Team
    o  Revise spec, conduct experiments
    o  Solicit broad feedback
    o  Paper in D-Lib Magazine
•  December 2012 – Finalize specification (?)




Pointers
•  First draft spec:
   http://www.openarchives.org/rs/0.1/resourcesync

•  Simulator code on github
   http://github.org/resync/simulator

•  NISO workspace
   http://www.niso.org/workrooms/resourcesync/

•  List for public comment coming soon





Weitere ähnliche Inhalte

Was ist angesagt?

ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveHerbert Van de Sompel
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesUri Laserson
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Humphrey Southall
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsAaron Brooks
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2tcloudcomputing-tw
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014eswcsummerschool
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked DataEUCLID project
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3tcloudcomputing-tw
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introductionXuan-Chao Huang
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoopguest27e6764
 
Lessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data ArchiveLessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data ArchiveChen Wu
 
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Laurent Lefort
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceeakasit_dpu
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with ExamplesJoe McTee
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Mat Kelly
 
End-to-End : Open Access Process Review and Improvements
End-to-End: Open Access Process Review and ImprovementsEnd-to-End: Open Access Process Review and Improvements
End-to-End : Open Access Process Review and ImprovementsRepository Fringe
 

Was ist angesagt? (20)

ResourceSync
ResourceSyncResourceSync
ResourceSync
 
ResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem PerspectiveResourceSync: Conceptual and Technical Problem Perspective
ResourceSync: Conceptual and Technical Problem Perspective
 
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
NISO Forum, Denver, September 24, 2012: ResourceSync: Web-Based Resource Sync...
 
Hadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciencesHadoop ecosystem for health/life sciences
Hadoop ecosystem for health/life sciences
 
Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...Maintaining scholarly standards in the digital age: Publishing historical gaz...
Maintaining scholarly standards in the digital age: Publishing historical gaz...
 
Druid Scaling Realtime Analytics
Druid Scaling Realtime AnalyticsDruid Scaling Realtime Analytics
Druid Scaling Realtime Analytics
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q2
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
Scaling up Linked Data
Scaling up Linked DataScaling up Linked Data
Scaling up Linked Data
 
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
Tcloud Computing Hadoop Family and Ecosystem Service 2013.Q3
 
20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction20131205 hadoop-hdfs-map reduce-introduction
20131205 hadoop-hdfs-map reduce-introduction
 
Large Scale Data With Hadoop
Large Scale Data With HadoopLarge Scale Data With Hadoop
Large Scale Data With Hadoop
 
Hadoop Family and Ecosystem
Hadoop Family and EcosystemHadoop Family and Ecosystem
Hadoop Family and Ecosystem
 
Lessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data ArchiveLessons learnt from the Murchison Widefield Array Data Archive
Lessons learnt from the Murchison Widefield Array Data Archive
 
List of Engineering Colleges in Uttarakhand
List of Engineering Colleges in UttarakhandList of Engineering Colleges in Uttarakhand
List of Engineering Colleges in Uttarakhand
 
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
Using the Data Cube vocabulary for Publishing Environmental Linked Data on la...
 
Introduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduceIntroduction to Hadoop and MapReduce
Introduction to Hadoop and MapReduce
 
Hadoop tools with Examples
Hadoop tools with ExamplesHadoop tools with Examples
Hadoop tools with Examples
 
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
Visualizing Digital Collections of Web Archives from Columbia Web Archiving C...
 
End-to-End : Open Access Process Review and Improvements
End-to-End: Open Access Process Review and ImprovementsEnd-to-End: Open Access Process Review and Improvements
End-to-End : Open Access Process Review and Improvements
 

Ähnlich wie ResourceSync: Web-Based Resource Synchronization

A new approach to aggregation
A new approach to aggregation A new approach to aggregation
A new approach to aggregation Enno Meijers
 
A distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaA distributed network of digital heritage information - Unesco/NDL India
A distributed network of digital heritage information - Unesco/NDL IndiaEnno Meijers
 
ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013ResourceSync Tutorial from Open Repositories 2013
ResourceSync Tutorial from Open Repositories 2013Simeon Warner
 
Session 1.4 a distributed network of heritage information
Session 1.4   a distributed network of heritage informationSession 1.4   a distributed network of heritage information
Session 1.4 a distributed network of heritage informationsemanticsconference
 
A distributed network of digital heritage information - Semantics Amsterdam
ResourceSync: Web-Based Resource Synchronization

  • 1. ResourceSync: Web-Based Resource Synchronization Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync is funded by The Sloan Foundation & JISC ResourceSync – Herbert Van de Sompel TICER Summer School, August 22 2012, Tilburg, The Netherlands
  • 2. ResourceSync Core Team – NISO & OAI
      Cornell University & OAI: Bernhard Haslhofer, Carl Lagoze, Simeon Warner
      Old Dominion University & OAI: Michael L. Nelson
      Los Alamos National Laboratory & OAI: Martin Klein, Robert Sanderson, Herbert Van de Sompel
      NISO: Todd Carpenter, Nettie Lagace, Peter Murray
  • 3. ResourceSync Technical Group
      •  Manuel Bernhardt, Delving B.V.
      •  Kevin Ford, Library of Congress
      •  Richard Jones, JISC
      •  Graham Klyne, JISC
      •  Stuart Lewis, JISC
      •  David Rosenthal, LOCKSS
      •  Christian Sadilek, Red Hat
      •  Shlomo Sanders, Ex Libris, Inc.
      •  Sjoerd Siebinga, Delving B.V.
      •  Ed Summers, Library of Congress
      •  Jeff Young, OCLC Online Computer Library Center
  • 4. ResourceSync
      •  ResourceSync: What & Why?
      •  Problem Perspective & Conceptual Approach
      •  Technical Details
      •  Q&A
  • 5. ResourceSync
      •  ResourceSync: What & Why?
      •  Problem Perspective & Conceptual Approach
      •  Technical Details
      •  Q&A
  • 6. Synchronize What?
      •  Web resources – things with a URI that can be dereferenced and are cache-able (no dependency on underlying OS, technologies etc.)
      •  Small websites/repositories (a few resources) to large repositories/datasets/linked data collections (many millions of resources)
      •  That change slowly (weeks/months) or quickly (seconds), and where latency needs may vary
      •  Focus on needs of research communication and cultural heritage organizations, but aim for generality
  • 7. Why? … because lots of projects and services are doing synchronization but have to resort to ad-hoc, case-by-case approaches!
      •  Project team involved with projects that need this
      •  Experience with OAI-PMH: widely used in repos but
          o  XML metadata only
          o  Attempts at synchronizing actual content via OAI-PMH (complex object formats, dc:identifier) not successful
          o  Web technology has moved on since 1999
      •  Devise a shared solution for data, metadata, linked data?
  • 8. Use Cases – The Basics
  • 9. Use Cases – More
  • 10. Out Of Scope (For Now)
      •  Bidirectional synchronization
      •  Destination-defined selective synchronization (query)
      •  Bulk URI migration
  • 11. Use Case: arXiv Mirroring
      •  1M article versions, ~800/day created or updated at 8 PM US Eastern Time
      •  Metadata and full-text for each article
      •  Accuracy important
      •  Want low barrier for others to use
      •  Look for more general solution than current homebrew mirroring (running with minor modifications since 1994!) and occasional rsync (filesystem layout specific, auth issues)
  • 12. Use Case: DBpedia Live Duplication
      •  Average of 2 updates per second
      •  Want low latency => need a push technology
  • 13. ResourceSync
      •  ResourceSync: What & Why?
      •  Problem Perspective & Conceptual Approach
      •  Technical Details
      •  Q&A
  • 14. ResourceSync Problem
      •  Consideration:
          o  Source (server) A has resources that change over time: they get created, modified, deleted
          o  Destination (servers) X, Y, and Z leverage (some) resources of Source A
      •  Problem:
          o  Destinations want to keep in step with the resource changes at Source A: resource synchronization
      •  Goal:
          o  Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities
          o  The approach must scale better than recurrent HTTP HEAD/GET on resources
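To make the scaling goal concrete, here is a rough back-of-the-envelope sketch. It reuses the arXiv figures quoted in the mirroring use case (about 1M article versions, about 800 created or updated per day); the one-pass-per-day polling rate and the function names are assumptions made only for illustration.

```python
# Hypothetical cost comparison: per-resource polling vs. reading a change list.
# Collection size and daily change count are the arXiv figures from the talk;
# the polling frequency is an illustrative assumption.


def polling_requests(resource_count: int, polls_per_day: int) -> int:
    """Requests per day when every resource is checked with HEAD on each pass."""
    return resource_count * polls_per_day


def change_list_requests(changes_per_day: int, polls_per_day: int) -> int:
    """Requests per day when one change list is fetched per pass,
    followed by one GET per changed resource."""
    return polls_per_day + changes_per_day


RESOURCES = 1_000_000  # arXiv article versions
CHANGES = 800          # created or updated per day
POLLS = 1              # assumed: one synchronization pass per day

naive = polling_requests(RESOURCES, POLLS)
listed = change_list_requests(CHANGES, POLLS)
print(naive, listed)
```

Under these assumptions the per-resource approach issues roughly three orders of magnitude more requests, and the gap widens as the polling frequency goes up.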
  • 15. Destination: 3 Basic Synchronization Needs
      1.  Baseline synchronization – A destination must be able to perform an initial load or catch-up with a source
          -  avoid out-of-band setup
      2.  Incremental synchronization – A destination must have some way to keep up-to-date with changes at a source
          -  subject to some latency; minimal: create/update/delete
          -  allow to catch-up after destination has been offline
      3.  Audit – A destination should be able to determine whether it is synchronized with a source
          -  subject to some latency
  • 16. Source Capability 1: Describing Content
      In order to advertise the resources that a source wants destinations to know about, it may describe them:
      o  Publish an inventory of resource URIs and possibly associated metadata
          -  Destination GETs the Content Description
          -  Destination GETs listed resources by their URI
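As a minimal sketch of how a destination might use such an inventory for baseline synchronization or audit: the inventory is modeled here as a plain mapping from URI to last-modification time, which is an assumption for illustration and not a format defined in the talk. The function name and example URIs are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_baseline_sync(
    inventory: Dict[str, str], local: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Compare a source inventory with the destination's local state.

    Both arguments map a resource URI to its last-modification timestamp.
    Returns (URIs to GET, URIs to remove locally).
    """
    # Fetch anything that is missing locally or whose timestamp differs.
    to_fetch = sorted(u for u, lm in inventory.items() if local.get(u) != lm)
    # Remove anything held locally that the source no longer lists.
    to_delete = sorted(u for u in local if u not in inventory)
    return to_fetch, to_delete


inventory = {
    "http://example.org/a": "2012-08-01T00:00:00Z",
    "http://example.org/b": "2012-08-20T00:00:00Z",
}
local = {
    "http://example.org/a": "2012-08-01T00:00:00Z",
    "http://example.org/b": "2012-08-10T00:00:00Z",
    "http://example.org/c": "2012-07-01T00:00:00Z",
}
to_fetch, to_delete = plan_baseline_sync(inventory, local)
print(to_fetch, to_delete)
```

An empty result for both lists would correspond to the audit need: the destination is in step with the source as of the inventory's timestamp.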
  • 17.
  • 18.
  • 19. Source Capability 2: Communicating Change Events
      In order to achieve lower latency, a source may communicate about changes to its resources:
      o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource)
          -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
  • 20.
  • 21.
  • 22.
  • 23.
  • 24. Source Capability 2: Communicating Change Events
      In order to achieve lower latency, a source may communicate about changes to its resources:
      o  2.1. Change Set: Publish a list of recent change events (create, update, delete resource)
          -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
      o  2.2. Push Change Set: Push a list of recent change events (create, update, delete resource) towards (a) destination(s)
          -  Destination acts upon change events, e.g. GETs created/updated resources, removes deleted resources.
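The destination-side handling of change events can be sketched as follows. The `ChangeEvent` structure, the in-memory `store`, and the injected `fetch` callable are all hypothetical stand-ins; a real destination would issue HTTP GETs and persist the results.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class ChangeEvent:
    uri: str
    kind: str       # "create", "update" or "delete"
    timestamp: str  # ISO 8601 UTC, same format throughout so strings sort by time


def apply_change_set(
    events: Iterable[ChangeEvent],
    store: Dict[str, bytes],
    fetch: Callable[[str], bytes],
) -> None:
    """Apply change events to a local store in chronological order."""
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.kind in ("create", "update"):
            store[event.uri] = fetch(event.uri)  # GET the current representation
        elif event.kind == "delete":
            store.pop(event.uri, None)           # tolerate already-missing copies
        else:
            raise ValueError(f"unknown change type: {event.kind}")


# Usage with a fake fetcher standing in for HTTP GET.
store = {"http://example.org/old": b"stale"}
events = [
    ChangeEvent("http://example.org/old", "delete", "2012-08-22T10:00:02Z"),
    ChangeEvent("http://example.org/new", "create", "2012-08-22T10:00:01Z"),
]
apply_change_set(events, store, fetch=lambda uri: b"body of " + uri.encode())
print(sorted(store))
```

The same function serves both the pulled Change Set (2.1) and the pushed one (2.2); only the way the events reach the destination differs.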
  • 25. Source Capability 3: Providing Access to Versions
      In order to allow a destination to catch up with missed changes, a source may support:
      o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set
  • 26.
  • 27.
  • 28.
  • 29. Source Capability 3: Providing Access to Versions
      In order to allow a destination to catch up with missed changes, a source may support:
      o  3.1. Historical Change Sets: Provide access to change events that occurred prior to the ones listed in the current Change Set
      o  3.2. Historical Content: Provide access to prior resource versions
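One way a destination that has been offline might decide which historical Change Sets to replay is sketched below. The `ChangeSetInfo` record and its time-range fields are assumptions for illustration; the talk does not specify how Change Sets describe the period they cover.

```python
from typing import List, NamedTuple


class ChangeSetInfo(NamedTuple):
    uri: str          # where the Change Set can be fetched
    first_event: str  # ISO 8601 UTC timestamp of its oldest event
    last_event: str   # ISO 8601 UTC timestamp of its newest event


def change_sets_to_replay(
    available: List[ChangeSetInfo], last_synced: str
) -> List[ChangeSetInfo]:
    """Select Change Sets containing events newer than the last sync, oldest first."""
    needed = [cs for cs in available if cs.last_event > last_synced]
    return sorted(needed, key=lambda cs: cs.first_event)


available = [
    ChangeSetInfo("http://example.org/changes/3", "2012-08-21T00:00:00Z", "2012-08-21T23:59:59Z"),
    ChangeSetInfo("http://example.org/changes/1", "2012-08-19T00:00:00Z", "2012-08-19T23:59:59Z"),
    ChangeSetInfo("http://example.org/changes/2", "2012-08-20T00:00:00Z", "2012-08-20T23:59:59Z"),
]
replay = change_sets_to_replay(available, last_synced="2012-08-20T12:00:00Z")
print([cs.uri for cs in replay])
```

Replaying in oldest-first order matters when the same resource is updated and later deleted within the missed period.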
  • 30. Source Capability 4: Transferring Content
      By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:
      o  4.1. Dump: Publish a package of resource representations and necessary metadata
          -  Destination GETs the Dump
          -  Destination unpacks the Dump
  • 31.
  • 32. Source Capability 4: Transferring Content
      By default, content is transferred in response to a GET issued by a destination against a URI of a source’s resource. But a source may support additional mechanisms:
      o  4.1. Dump: Publish a package of resource representations and necessary metadata
          -  Destination GETs the Dump
          -  Destination unpacks the Dump
      o  4.2. Alternate Content Transfer: Support alternative mechanisms to optimize getting content, e.g. content via a mirror site, only changes not the entire changed resource.
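The Dump idea (a package of representations plus the metadata needed to map them back to URIs) can be sketched with a ZIP archive. The ZIP container, the `manifest.json` name, and the JSON manifest layout are assumptions chosen for brevity; they are not the packaging described in the talk.

```python
import io
import json
import zipfile
from typing import Dict


def build_dump(resources: Dict[str, bytes]) -> bytes:
    """Source side: package representations plus a manifest mapping members to URIs."""
    buffer = io.BytesIO()
    manifest = {}
    with zipfile.ZipFile(buffer, "w") as archive:
        for index, (uri, body) in enumerate(sorted(resources.items())):
            member = f"resources/{index}"
            archive.writestr(member, body)
            manifest[member] = uri
        # Hypothetical manifest file; name and format are illustrative only.
        archive.writestr("manifest.json", json.dumps(manifest))
    return buffer.getvalue()


def unpack_dump(dump: bytes) -> Dict[str, bytes]:
    """Destination side: read the manifest, then restore each URI's representation."""
    with zipfile.ZipFile(io.BytesIO(dump)) as archive:
        manifest = json.loads(archive.read("manifest.json"))
        return {uri: archive.read(member) for member, uri in manifest.items()}


resources = {
    "http://example.org/a": b"alpha",
    "http://example.org/b": b"beta",
}
restored = unpack_dump(build_dump(resources))
print(sorted(restored))
```

A single GET for the package replaces one GET per resource, which is the point of the capability for large baseline loads.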
  • 33. Source: Advertise Capabilities
      A source needs to advertise the capabilities it supports to allow a destination to discover them
      •  Some capabilities may be provided by a third party, not the source itself
          o  e.g. Historical Change Sets, Historical Content
          o  But the source should still make those third party capabilities discoverable - trust
  • 34. ResourceSync
      •  ResourceSync: What & Why?
      •  Problem Perspective & Conceptual Approach
      •  Technical Details
      •  Q&A
  • 35. So Many Choices
      Push, DSNotify, OAI-PMH, Pull, rsync, Crawl, OAI-ORE, RDFsync, WebDAV Col. Syn., XMPP, Atom, SWORD, AtomPub, Sitemap, RSS, SPARQLpush, PubSubHubbub, SDShare
  • 36.
  • 37.
  • 38. A Framework Based on Sitemaps
      •  Modular framework allowing selective deployment
      •  Sitemap is the core component throughout the framework
          o  Introduce extension elements and attributes:
              -  In ResourceSync namespace (rs:) to accommodate synchronization needs
              -  In XHTML namespace (xhtml:) mainly to accommodate discovery needs
          o  Reuse Sitemap format for Change Sets (both current and historical) and for manifest in Dump
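A minimal sketch of what extending a Sitemap with a namespaced attribute could look like, using Python's standard `xml.etree.ElementTree`. The Sitemap namespace URI is the one from sitemaps.org; the `rs` namespace URI below is a placeholder and the `rs:type` attribute on `lastmod` is a hypothetical extension, since the transcript does not give the actual names.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
RS_NS = "http://example.org/resourcesync-placeholder/"  # placeholder, NOT the real namespace

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("rs", RS_NS)

Entry = Tuple[str, str, str]  # (URI, lastmod timestamp, change type)


def build_sitemap(entries: List[Entry]) -> str:
    """Serialize entries as a Sitemap whose lastmod carries a hypothetical rs:type."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri, lastmod, change in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
        lm = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
        lm.text = lastmod
        lm.set(f"{{{RS_NS}}}type", change)  # hypothetical extension attribute
    return ET.tostring(urlset, encoding="unicode")


def read_sitemap(xml_text: str) -> List[Entry]:
    """Parse the Sitemap back into (URI, lastmod, change type) tuples."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall(f"{{{SITEMAP_NS}}}url"):
        loc = url.findtext(f"{{{SITEMAP_NS}}}loc")
        lm = url.find(f"{{{SITEMAP_NS}}}lastmod")
        entries.append((loc, lm.text, lm.get(f"{{{RS_NS}}}type")))
    return entries


entries = [
    ("http://example.org/a", "2012-08-22T10:00:00Z", "updated"),
    ("http://example.org/b", "2012-08-22T10:05:00Z", "created"),
]
xml_text = build_sitemap(entries)
print(xml_text)
```

Because the extension lives in its own namespace, a consumer that only understands plain Sitemaps can still read `loc` and `lastmod` and ignore the rest.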
  • 39. Source Capabilities – Destination Needs
  • 40. Source Capabilities – Destination Needs
  • 41. Sitemap with Added Datetime
  • 42. Change Types: Extend lastmod, Use expires
  • 43. Sitemap with lastmod and expires
  • 44. Sitemap Discovery via robots.txt
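Sitemap discovery via robots.txt relies on the `Sitemap:` line defined by the Sitemap protocol. A minimal sketch of how a destination might extract those lines is shown below; the function name and the sample robots.txt content are made up for illustration.

```python
def sitemap_urls(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments; this simple sketch would also cut a URL
        # that contains '#', which Sitemap locations normally do not.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        # The field name is matched case-insensitively.
        if sep and key.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content for illustration.
ROBOTS = """\
User-agent: *
Disallow: /private/
Sitemap: http://example.com/sitemap.xml
"""

found = sitemap_urls(ROBOTS)
```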
  • 45. Source Capabilities – Destination Needs
  • 46. Change Set: An rs Typed Sitemap
  • 47. More rs Extension Elements
  • 48. Change Set with rs and xhtml Extensions
  • 49. Change Set Discovery via Sitemap
  • 50. Pushing Change Sets via XMPP PubSub
    XMPP Publish-Subscribe: Client to Subscription Service, Subscription Service to Client(s) communication
    •  One of the XMPP (Extensible Messaging and Presence Protocol) extensions: http://xmpp.org/extensions/xep-0060.html
    •  Apple Notifications based on XMPP PubSub
    •  Available tools, see http://xmpp.org/about-xmpp/technology-overview/pubsub/#impl-client
      o  XMPP Servers with PubSub support:
         -  ejabberd, OpenFire, Tigase, SleekXMPP
      o  XMPP libraries with PubSub support:
         -  Strophe (C, JavaScript), XMPP4R (Ruby), SleekXMPP (Python), PubSub Client (Python)
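To make the publish step concrete, the sketch below builds the kind of `iq` stanza that XEP-0060 defines for publishing an item to a PubSub node, using only the standard library. How ResourceSync actually wraps a Change Set in the item payload is not stated on the slide; embedding a Sitemap `urlset` element, the service address, and the node name are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by XEP-0060 for publish-subscribe requests.
PUBSUB_NS = "http://jabber.org/protocol/pubsub"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_publish_iq(service_jid, node, payload, stanza_id="publish1"):
    """Build an XEP-0060 publish request carrying one item with `payload`."""
    iq = ET.Element("iq", {"type": "set", "to": service_jid, "id": stanza_id})
    pubsub = ET.SubElement(iq, f"{{{PUBSUB_NS}}}pubsub")
    publish = ET.SubElement(pubsub, f"{{{PUBSUB_NS}}}publish", {"node": node})
    item = ET.SubElement(publish, f"{{{PUBSUB_NS}}}item")
    item.append(payload)
    return iq


# Assumption: the change notification is expressed as a small Sitemap.
change_set = ET.Element(f"{{{SITEMAP_NS}}}urlset")
url = ET.SubElement(change_set, f"{{{SITEMAP_NS}}}url")
ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = "http://example.com/res1"

# Hypothetical service address and node name.
iq = build_publish_iq("pubsub.example.com", "changes", change_set)
stanza_bytes = ET.tostring(iq)
```

In practice one of the libraries listed on the slide would construct and send this stanza over an authenticated XMPP session; the sketch only shows the shape of the message.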
  • 51. Pushing Change Sets via XMPP PubSub
  • 52. Change Set via XMPP
  • 53. Push Change Set Discovery via Sitemap
  • 54. Source Capabilities – Destination Needs
  • 55. Discovering a Historical Change Set via a Current Change Set
  • 56. Source Capabilities – Destination Needs
  • 57. Discovering Historical Content – Link to Version Resource
  • 58. Memento Intermezzo http://www.mementoweb.org/
  • 59. Original Resources and Mementos
  • 60. Bridge from Present to Past
  • 61. Bridge from Past to Present
  • 62. Memento Framework
  • 63. Discovering Historical Content – Link to Memento TimeGate
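In the Memento framework, a client asks a TimeGate for a version of a resource near a given time by sending an `Accept-Datetime` request header with an HTTP-date value. The sketch below only prepares such a request and does not send it; the TimeGate URI is hypothetical, and how a destination learns that URI from a Change Set is not shown here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request


def accept_datetime_value(when):
    """Format a timezone-aware datetime as an HTTP-date in GMT."""
    return format_datetime(when.astimezone(timezone.utc), usegmt=True)


def timegate_request(timegate_uri, when):
    """Prepare (but do not send) a datetime-negotiation request."""
    return Request(timegate_uri, headers={"Accept-Datetime": accept_datetime_value(when)})


# Hypothetical TimeGate URI for illustration.
wanted = datetime(2012, 8, 8, 8, 15, 0, tzinfo=timezone.utc)
req = timegate_request("http://example.com/timegate/http://example.com/res1", wanted)
```

Passing `req` to `urllib.request.urlopen` would perform the negotiation against a real TimeGate, which is expected to redirect to a suitable Memento.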
  • 64. Source Capabilities – Destination Needs
  • 65. Dump
    •  Two formats currently under discussion:
      o  Format based on ZIP:
         -  Package content
         -  Add manifest (manifest.xml) expressed in Sitemap format
         -  ZIP it up
      o  WARC files as used by the web archiving community
  • 66. Mapping URI to File Path with rs:path
  • 67. Manifest (manifest.xml) Expressed in Sitemap Format
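The ZIP-based Dump idea (package content, add a manifest.xml in Sitemap format, zip it up, then unpack at the destination) can be sketched as a round trip. The `RS_NS` URI is a placeholder, and placing `rs:path` as an attribute on each `url` element is an assumption for illustration; the slides name `rs:path` but the transcript does not show where it appears.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Assumption: placeholder URI standing in for the ResourceSync (rs:) namespace.
RS_NS = "http://example.org/rs-placeholder/"


def build_dump(resources):
    """Source side: resources maps URI -> (path inside ZIP, content bytes)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for uri, (path, _content) in resources.items():
        # Hypothetical placement: rs:path as an attribute of <url>.
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url", {f"{{{RS_NS}}}path": path})
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = uri
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", ET.tostring(urlset, encoding="utf-8"))
        for _uri, (path, content) in resources.items():
            zf.writestr(path, content)
    return buf.getvalue()


def unpack_dump(data):
    """Destination side: return a URI -> content mapping using the manifest."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
        result = {}
        for url in root.findall(f"{{{SITEMAP_NS}}}url"):
            uri = url.findtext(f"{{{SITEMAP_NS}}}loc")
            result[uri] = zf.read(url.get(f"{{{RS_NS}}}path"))
        return result


# Hypothetical resources for illustration.
RESOURCES = {
    "http://example.com/res1": ("resources/res1.html", b"<html>one</html>"),
    "http://example.com/res2": ("resources/res2.txt", b"two"),
}
dump_bytes = build_dump(RESOURCES)
restored = unpack_dump(dump_bytes)
```

The manifest is what lets the destination map each packaged file back to the URI it represents, independent of how the files are laid out inside the archive.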
  • 68. Dump Discovery via Sitemap
  • 69. Source Capabilities – Destination Needs
  • 70. Alternate Location
  • 71. Alternate Protocol, e.g. Obtain Changes Only
  • 72. Timeline
    •  August 2012
      o  First draft spec shared for feedback with ResourceSync team
    •  September 2012
      o  In-person meeting of ResourceSync Team
      o  Revise spec, conduct experiments
      o  Solicit broad feedback
      o  Paper in D-Lib Magazine
    •  December 2012 – Finalize specification (?)
  • 73. Pointers
    •  First draft spec: http://www.openarchives.org/rs/0.1/resourcesync
    •  Simulator code on github: http://github.org/resync/simulator
    •  NISO workspace: http://www.niso.org/workrooms/resourcesync/
    •  List for public comment coming soon
  • 74. ResourceSync: Web-Based Resource Synchronization. Herbert Van de Sompel, Los Alamos National Laboratory, @hvdsomp. ResourceSync is funded by The Sloan Foundation & JISC.