SlideShare ist ein Scribd-Unternehmen logo
1 von 16
The Planets Interoperability Framework

Scalable Services for Digital Preservation

   DPE, Planets and CASPAR Third Annual Conference:
  Costs, Benefits and Motivations for Digital Preservation
                     30. October 2008
              Ross King, Christian Sadilek, Rainer Schmidt
               Austrian Research Centers GmbH – ARC
Outline
 Planets Interoperability Framework
 Grids and Clouds
 Initial Experimental Results
The Planets Interoperability Framework
Motivation
  There are a number of functions that all (or nearly all) software
   applications commonly need. These include functions such as
    • Data persistence
    • User management
    • Authentication and Authorization
    • Monitoring, Logging, and Notification
  The Interoperability Framework (IF) software components
   provide these commonly required functions.
The Planets Interoperability Framework
 Defines an Service-Oriented Architecture for Digital Preservation
     Set of Services, Interfaces, a common Data Model

 Implements Common Services
     Authentication and Authorization, Monitoring, Logging, Notification, …
     Service Registration and Lookup

 Provides APIs for Applications that use Planets
     Testbed Experiments, Executing Preservation Plans

 Provides Workflow Enactment Service and Engine
     Components-based, XML serialization
The Problem of Scalability
 Planets is a preservation architecture based on Web Services
     Supports interoperability and a distributed environment
     Sufficient for a controlled experiments (Testbed)
 Not sufficient for handling a production environment
     Massively, uncontrolled user requests
     Mass migration of hundreds of TBytes of data
 Content Holders are faced with loosing vast amounts of data
     Sufficient computational resources in-house?
 There is a clear demand for incorporating Grid or Cloud Technology
Integrating Virtual Clusters and Clouds
 Basic Idea: Extending Planets SOA with Grid Services
 The Planets IF Job Submission Services
     Allow Job Submission to a PC cluster (e.g. Hadoop, Condor)
     Grid approach/standards (SOAP, HPC-BP, JSDL)
 Cluster nodes are instantiated from specific system images
     Most Preservation Tools are 3rd party applications
     Software need to be preinstalled on cluster nodes
 Cluster and JSS be instantiated in-house (e.g. a PC lab) or
   on top of (leased) cloud resources (AWS EC2).
     Computation be moved to data or vice-versa
Integration – Planets Tiered Architecture
Preservation Planning                   reference
                                                      Tools +       Planets
        and               Service                     Format
Workflow Generation       Registry                                  Services
                                                      Registry
                                                                                Experimental
                                        lookup                                  Environment
                                                                                 (qualitative)
                                                                   3rd Party
                                                                     Tool
   Workbench for                         Execution                 Services
Testbed Experiments                       Engine

                        Workflow Def.
                                          maintain                 Grid/Cloud
                                                                   Execution
                                                                    Services
     Web Portal                 Metadata             Data Model
      Browser                    Store
                                                                          Production
                                                                          Environment
                                                                  Web/Grid
     Clients              Planets Service Context                                  Resources
                                                                  Services
Experimental Setup
 Amazon Elastic Compute Cloud (EC2)
     1 – 150 cluster nodes
     Custom image based on RedHat Fedora 8 i386

 Amazon Simple Storage Service (S3)
     max. 1TB I/O,
     ~32,5MBit/s download / ~13,8MBit/s upload (cloud internally)

 Apache Hadoop (v.0.18)
     MapReduce Implementation

 Pre-installed command line tools (e.g, ps2pdf )
Preservation Planning                   reference
                                                      Tools +       Planets
        and               Service                     Format
Workflow Generation       Registry                                  Services
                                                      Registry
                                        lookup


                                                                   3rd Party
                                                                     Tool
   Workbench for                         Execution                 Services
Testbed Experiments                       Engine

                        Workflow Def.
                                          maintain                 Grid/Cloud
                                                                   Execution
                                                                    Services
     Web Portal                 Metadata             Data Model
      Browser                    Store



                                                                  Web/Grid
     Clients              Planets Service Context                               Resources
                                                                  Services
Experimental Setup
Virtual Cluster (Apache Hadoop)                  Cloud Infrastructure (EC2)




                                           Job
                 JSDL

                                  JSS                                Virtual Node
  Job Description File                                                   (Xen)

                                                               Storage
                                                            Infrastructure
                                                                 (S3)


    Raw Data
                   Data Transfer Service
Experimental Results 1 – Scaling Job Size
                         x(1k) = 3,5   x(1k) = 4,4             x(1k) = 3,6
             10,00                                                             number of nodes = 5
              9,00
              8,00                                                                    EC2 0,07 MB
              7,00                                                                    EC2 7,5 MB
time [min]




              6,00
                                                                                      EC2 250 MB
              5,00
                                                                                      SLE 0,07 MB
              4,00
              3,00                                                                    SLE 7,5 MB
              2,00                                                                    SLE 250 MB
              1,00
              0,00
                     1                   10              100            1000
                                                 tasks                         x(1k) = t_seq / t_parallel
                                                                               and tasks = 1000
Experimental Results 2 – Scaling #Nodes
             40,00
                         n=1, t=36, s = 0.72, e=72%
                     X

             35,00

             30,00
                          n=1 (local), s1=1, t=26
                     X




             25,00
time [min]




                                                                                                   EC2 1000 x 0,07 MB
             20,00
                           n=5, t=8, s=3.25, e=65%                                                 SLE 1000 x 0,07 MB
             15,00
                               n=10, t=4.5, s=5.8, e=58%
             10,00
                     X




                                               n=50, t=1.68, s=15.5, e=31%
              5,00        X                                         n=100, t=1.03, s=25.2, e=25%
              0,00
                                           X                   X
                     0                    50                  100                   150
                                                    nodes
Conclusions
   Preservation systems will need to employ Grid/Cloud resources
      Therefore there is a need to bridge communities in the areas of digital libraries
        and e-science.
   Cloud and virtual infrastructures provide a powerful solution for obtaining
    on-demand access to computational resources.
   Planets IF Job Submission Service provides a first step
      Submission to virtual cluster of preservation nodes using Grid protocols.
      Performance scales roughly with the number of nodes, accounting for expected
        overheads
      Many open issues remain! Security, reliability, standardization, legal aspects...
Thank you for your attention!
 Planets Project
   http://www.planets-project.eu


 Contacts
   Ross King           ross.king@arcs.ac.at
   Rainer Schmidt      rainer.schmidt@arcs.ac.at
Sample JSDL code
<?xml version=quot;1.0quot; encoding=quot;UTF-8quot;?>
<jsdl:JobDefinition xmlns=quot;http://www.example.org/quot;
    xmlns:jsdl=quot;http://schemas.ggf.org/jsdl/2005/11/jsdlquot;
    xmlns:jsdl-posix=quot;http://schemas.ggf.org/jsdl/2005/11/jsdl-posixquot;
    xmlns:xsi=quot;http://www.w3.org/2001/XMLSchema-instancequot;
    xsi:schemaLocation=quot;http://schemas.ggf.org/jsdl/2005/11/jsdl jsdl.xsd quot;>
  <jsdl:JobDescription>
    <jsdl:JobIdentification>
      <jsdl:JobName>start vi</jsdl:JobName>
    </jsdl:JobIdentification>
    <jsdl:Application>
      <jsdl:ApplicationName>ls</jsdl:ApplicationName>
      <jsdl-posix:POSIXApplication>
        <jsdl-posix:Executable>/bin/ls</jsdl-posix:Executable>
        <jsdl-posix:Argument>-la file.txt</jsdl-posix:Argument>
        <jsdl-posix:Environment name=quot;LD_LIBRARY_PATHquot;>/usr/local/lib</jsdl-posix:Environment>
        <jsdl-posix:Input>/dev/null</jsdl-posix:Input>
        <jsdl-posix:Output>stdout.${JOB_ID}</jsdl-posix:Output>
        <jsdl-posix:Error>stderr.${JOB_ID}</jsdl-posix:Error>
      </jsdl-posix:POSIXApplication>
    </jsdl:Application>
  </jsdl:JobDescription>
</jsdl:JobDefinition>
Map-Reduce for Migrating Digital Objects
   Map-Reduce implements a framework and prog. model for processing
    large documents (Sorting, Searching, Indexing) on multiple nodes.
      Automated decomposition (split)
      Mapping to intermediary pairs (map), optionally (combine)
      Merge output (reduce)
   Provides implementation for data parallel problems, i/o intensive,
   Example: Conversion digital object (e.g website, folder, archive)
      Decompose into atomic pieces (e.g. file, image, movie)
      On each node, convert piece to target format
      Merge pieces and create new data object

Weitere ähnliche Inhalte

Was ist angesagt?

Cascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional ProgrammingCascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional ProgrammingPaco Nathan
 
Ms Sql Server Black Book
Ms Sql Server Black BookMs Sql Server Black Book
Ms Sql Server Black BookLiquidHub
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developersukdpe
 
Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012Weiwei Chen
 
Geo alberta2010 ppt_template
Geo alberta2010 ppt_templateGeo alberta2010 ppt_template
Geo alberta2010 ppt_templatePaul Bekker
 
CloudStack Collaboration Conference 12; Refactoring cloud stack
CloudStack Collaboration Conference 12; Refactoring cloud stackCloudStack Collaboration Conference 12; Refactoring cloud stack
CloudStack Collaboration Conference 12; Refactoring cloud stackbuildacloud
 
MED301 Is My CDN Performing? - AWS re: Invent 2012
MED301 Is My CDN Performing? - AWS re: Invent 2012MED301 Is My CDN Performing? - AWS re: Invent 2012
MED301 Is My CDN Performing? - AWS re: Invent 2012Amazon Web Services
 
Innovations in Grid Computing with Oracle Coherence
Innovations in Grid Computing with Oracle CoherenceInnovations in Grid Computing with Oracle Coherence
Innovations in Grid Computing with Oracle CoherenceBob Rhubart
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANASAP Technology
 

Was ist angesagt? (11)

Cascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional ProgrammingCascading: Enterprise Data Workflows based on Functional Programming
Cascading: Enterprise Data Workflows based on Functional Programming
 
Wpf Tech Overview2009
Wpf Tech Overview2009Wpf Tech Overview2009
Wpf Tech Overview2009
 
Ms Sql Server Black Book
Ms Sql Server Black BookMs Sql Server Black Book
Ms Sql Server Black Book
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
 
Partitioning CCGrid 2012
Partitioning CCGrid 2012Partitioning CCGrid 2012
Partitioning CCGrid 2012
 
Geo alberta2010 ppt_template
Geo alberta2010 ppt_templateGeo alberta2010 ppt_template
Geo alberta2010 ppt_template
 
CloudStack Collaboration Conference 12; Refactoring cloud stack
CloudStack Collaboration Conference 12; Refactoring cloud stackCloudStack Collaboration Conference 12; Refactoring cloud stack
CloudStack Collaboration Conference 12; Refactoring cloud stack
 
MED301 Is My CDN Performing? - AWS re: Invent 2012
MED301 Is My CDN Performing? - AWS re: Invent 2012MED301 Is My CDN Performing? - AWS re: Invent 2012
MED301 Is My CDN Performing? - AWS re: Invent 2012
 
Innovations in Grid Computing with Oracle Coherence
Innovations in Grid Computing with Oracle CoherenceInnovations in Grid Computing with Oracle Coherence
Innovations in Grid Computing with Oracle Coherence
 
Liquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANALiquidity Risk Management powered by SAP HANA
Liquidity Risk Management powered by SAP HANA
 
vBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and BeyondvBACD July 2012 - Apache Hadoop, Now and Beyond
vBACD July 2012 - Apache Hadoop, Now and Beyond
 

Andere mochten auch (9)

The Caspar Artistic Testbed Jerome Barthelemy
The Caspar Artistic Testbed Jerome BarthelemyThe Caspar Artistic Testbed Jerome Barthelemy
The Caspar Artistic Testbed Jerome Barthelemy
 
Demo Caspar Web Desktop Luigi Briguglio
Demo Caspar Web Desktop Luigi BriguglioDemo Caspar Web Desktop Luigi Briguglio
Demo Caspar Web Desktop Luigi Briguglio
 
Akp
AkpAkp
Akp
 
Trm Pp4 Summary And Outlook
Trm Pp4 Summary And OutlookTrm Pp4 Summary And Outlook
Trm Pp4 Summary And Outlook
 
Esa Scientifics Testbed Sergio Albani
Esa Scientifics  Testbed Sergio AlbaniEsa Scientifics  Testbed Sergio Albani
Esa Scientifics Testbed Sergio Albani
 
Trm Trusted Repositories
Trm Trusted RepositoriesTrm Trusted Repositories
Trm Trusted Repositories
 
Di Iorio
Di IorioDi Iorio
Di Iorio
 
Trusted Repositories
Trusted RepositoriesTrusted Repositories
Trusted Repositories
 
Drambora Vilnius Lecture Ross
Drambora Vilnius Lecture RossDrambora Vilnius Lecture Ross
Drambora Vilnius Lecture Ross
 

Ähnlich wie Scalable Services For Digital Preservation Ross King

6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services6.Live Framework 和Mesh Services
6.Live Framework 和Mesh ServicesGaryYoung
 
(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive
(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive
(ATS3-GS03) Accelrys Enterprise Platform Deeper DiveBIOVIA
 
SQLUG event: An evening in the cloud: the old, the new and the big
 SQLUG event: An evening in the cloud: the old, the new and the big  SQLUG event: An evening in the cloud: the old, the new and the big
SQLUG event: An evening in the cloud: the old, the new and the big Mike Martin
 
IT Modernization and Cloud Computing
IT Modernization and Cloud ComputingIT Modernization and Cloud Computing
IT Modernization and Cloud ComputingBarry Gervin
 
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise ArchitecturesBIOVIA
 
Play with cloud foundry
Play with cloud foundryPlay with cloud foundry
Play with cloud foundryPeng Wan
 
Google App Engine At A Glance
Google App Engine At A GlanceGoogle App Engine At A Glance
Google App Engine At A GlanceStefan Christoph
 
Windows Azure Uzerinden Alinabilen Hizmetler
Windows Azure Uzerinden Alinabilen HizmetlerWindows Azure Uzerinden Alinabilen Hizmetler
Windows Azure Uzerinden Alinabilen HizmetlerMustafa
 
Windows Azure Üzerinden Alınabilecek Hizmetler
Windows Azure Üzerinden Alınabilecek HizmetlerWindows Azure Üzerinden Alınabilecek Hizmetler
Windows Azure Üzerinden Alınabilecek HizmetlerMSHOWTO Bilisim Toplulugu
 
Choosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform StrategyChoosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform Strategydrmarcustillett
 
Integration SharePoint 2010 with CRM 2010 by Mai Omar Desouki
Integration SharePoint 2010 with CRM 2010 by Mai Omar DesoukiIntegration SharePoint 2010 with CRM 2010 by Mai Omar Desouki
Integration SharePoint 2010 with CRM 2010 by Mai Omar DesoukiMai Omar Desouki
 
04.egovFrame Runtime Environment Workshop
04.egovFrame Runtime Environment Workshop04.egovFrame Runtime Environment Workshop
04.egovFrame Runtime Environment WorkshopChuong Nguyen
 
02 Ms Online Identity Session 1
02 Ms Online Identity   Session 102 Ms Online Identity   Session 1
02 Ms Online Identity Session 1Sivadon Chaisiri
 
SVG in Data Acquisition and Control Systems
SVG in Data Acquisition and Control SystemsSVG in Data Acquisition and Control Systems
SVG in Data Acquisition and Control SystemsTao Jiang
 
The Art & Sience of Optimization
The Art & Sience of OptimizationThe Art & Sience of Optimization
The Art & Sience of OptimizationHertzel Karbasi
 
Windows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformWindows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformEduardo Castro
 
2009 Q2 WSO2 Technical Update
2009 Q2 WSO2 Technical Update2009 Q2 WSO2 Technical Update
2009 Q2 WSO2 Technical UpdateWSO2
 
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)Spiffy
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftLee Stott
 
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0BIOVIA
 

Ähnlich wie Scalable Services For Digital Preservation Ross King (20)

6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services6.Live Framework 和Mesh Services
6.Live Framework 和Mesh Services
 
(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive
(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive
(ATS3-GS03) Accelrys Enterprise Platform Deeper Dive
 
SQLUG event: An evening in the cloud: the old, the new and the big
 SQLUG event: An evening in the cloud: the old, the new and the big  SQLUG event: An evening in the cloud: the old, the new and the big
SQLUG event: An evening in the cloud: the old, the new and the big
 
IT Modernization and Cloud Computing
IT Modernization and Cloud ComputingIT Modernization and Cloud Computing
IT Modernization and Cloud Computing
 
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures
(ATS3-GS02) Accelrys Enterprise Platform in Enterprise Architectures
 
Play with cloud foundry
Play with cloud foundryPlay with cloud foundry
Play with cloud foundry
 
Google App Engine At A Glance
Google App Engine At A GlanceGoogle App Engine At A Glance
Google App Engine At A Glance
 
Windows Azure Uzerinden Alinabilen Hizmetler
Windows Azure Uzerinden Alinabilen HizmetlerWindows Azure Uzerinden Alinabilen Hizmetler
Windows Azure Uzerinden Alinabilen Hizmetler
 
Windows Azure Üzerinden Alınabilecek Hizmetler
Windows Azure Üzerinden Alınabilecek HizmetlerWindows Azure Üzerinden Alınabilecek Hizmetler
Windows Azure Üzerinden Alınabilecek Hizmetler
 
Choosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform StrategyChoosing Your Windows Azure Platform Strategy
Choosing Your Windows Azure Platform Strategy
 
Integration SharePoint 2010 with CRM 2010 by Mai Omar Desouki
Integration SharePoint 2010 with CRM 2010 by Mai Omar DesoukiIntegration SharePoint 2010 with CRM 2010 by Mai Omar Desouki
Integration SharePoint 2010 with CRM 2010 by Mai Omar Desouki
 
04.egovFrame Runtime Environment Workshop
04.egovFrame Runtime Environment Workshop04.egovFrame Runtime Environment Workshop
04.egovFrame Runtime Environment Workshop
 
02 Ms Online Identity Session 1
02 Ms Online Identity   Session 102 Ms Online Identity   Session 1
02 Ms Online Identity Session 1
 
SVG in Data Acquisition and Control Systems
SVG in Data Acquisition and Control SystemsSVG in Data Acquisition and Control Systems
SVG in Data Acquisition and Control Systems
 
The Art & Sience of Optimization
The Art & Sience of OptimizationThe Art & Sience of Optimization
The Art & Sience of Optimization
 
Windows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing PlatformWindows Sql Azure Cloud Computing Platform
Windows Sql Azure Cloud Computing Platform
 
2009 Q2 WSO2 Technical Update
2009 Q2 WSO2 Technical Update2009 Q2 WSO2 Technical Update
2009 Q2 WSO2 Technical Update
 
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
Windows Azure Platform Technical Deep Dive - Chris Auld (Intergen)
 
MEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop MicrosoftMEW22 22nd Machine Evaluation Workshop Microsoft
MEW22 22nd Machine Evaluation Workshop Microsoft
 
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0
(ATS4-GS02) A Lap Around Accelrys Enterprise Platform and Pipeline Pilot 9.0
 

Mehr von DigitalPreservationEurope

Digital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigitalPreservationEurope
 
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...Building A Sustainable Model for Digital Preservation Services, Clive Billenn...
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...DigitalPreservationEurope
 
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred ThallerSignificant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred ThallerDigitalPreservationEurope
 
Preservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian UpshallPreservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian UpshallDigitalPreservationEurope
 
Preservation And Reuse In High Energy Physics Salvatore Mele
Preservation And Reuse In High Energy Physics Salvatore MelePreservation And Reuse In High Energy Physics Salvatore Mele
Preservation And Reuse In High Energy Physics Salvatore MeleDigitalPreservationEurope
 

Mehr von DigitalPreservationEurope (20)

Infrastructure Training Session
Infrastructure Training SessionInfrastructure Training Session
Infrastructure Training Session
 
Drm Training Session
Drm Training SessionDrm Training Session
Drm Training Session
 
2009 Barcelona Wepreserve Nestor
2009 Barcelona Wepreserve Nestor2009 Barcelona Wepreserve Nestor
2009 Barcelona Wepreserve Nestor
 
Preservation Metadata
Preservation MetadataPreservation Metadata
Preservation Metadata
 
An Introduction to Digital Preservation
An Introduction to Digital PreservationAn Introduction to Digital Preservation
An Introduction to Digital Preservation
 
Introduction to Planets
Introduction to PlanetsIntroduction to Planets
Introduction to Planets
 
Digital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and RequirementsDigital Preservation Process: Preparation and Requirements
Digital Preservation Process: Preparation and Requirements
 
The Planets Preservation Planning workflow
The Planets Preservation Planning workflowThe Planets Preservation Planning workflow
The Planets Preservation Planning workflow
 
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...Building A Sustainable Model for Digital Preservation Services, Clive Billenn...
Building A Sustainable Model for Digital Preservation Services, Clive Billenn...
 
Preservation Metadata, Michael Day, DCC
Preservation Metadata, Michael Day, DCCPreservation Metadata, Michael Day, DCC
Preservation Metadata, Michael Day, DCC
 
PLATTER - Jan Hutar
PLATTER - Jan HutarPLATTER - Jan Hutar
PLATTER - Jan Hutar
 
Sustainability Clive Billenness
Sustainability Clive  BillennessSustainability Clive  Billenness
Sustainability Clive Billenness
 
Significant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred ThallerSignificant Characteristics In Planets Manfred Thaller
Significant Characteristics In Planets Manfred Thaller
 
Shaman Project Hemmje
Shaman Project  HemmjeShaman Project  Hemmje
Shaman Project Hemmje
 
Risks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus RossRisks Benefits And Motivations Seamus Ross
Risks Benefits And Motivations Seamus Ross
 
Representation Information Steve Rankin
Representation Information Steve RankinRepresentation Information Steve Rankin
Representation Information Steve Rankin
 
Preservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian UpshallPreservation Challenge Radioactive Waste Ian Upshall
Preservation Challenge Radioactive Waste Ian Upshall
 
Preservation And Reuse In High Energy Physics Salvatore Mele
Preservation And Reuse In High Energy Physics Salvatore MelePreservation And Reuse In High Energy Physics Salvatore Mele
Preservation And Reuse In High Energy Physics Salvatore Mele
 
Platter Colin Rosenthal
Platter Colin RosenthalPlatter Colin Rosenthal
Platter Colin Rosenthal
 
Planets Testbed Brian Aitken
Planets Testbed Brian AitkenPlanets Testbed Brian Aitken
Planets Testbed Brian Aitken
 

Kürzlich hochgeladen

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 

Kürzlich hochgeladen (20)

Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 

Scalable Services For Digital Preservation Ross King

  • 1. The Planets Interoperability Framework Scalable Services for Digital Preservation DPE, Planets and CASPAR Third Annual Conference: Costs, Benefits and Motivations for Digital Preservation 30. October 2008 Ross King, Christian Sadilek, Rainer Schmidt Austrian Research Centers GmbH – ARC
  • 2. Outline  Planets Interoperability Framework  Grids and Clouds  Initial Experimental Results
  • 3. The Planets Interoperability Framework Motivation  There are a number of functions that all (or nearly all) software applications commonly need. These include functions such as • Data persistence • User management • Authentication and Authorization • Monitoring, Logging, and Notification  The Interoperability Framework (IF) software components provide these commonly required functions.
  • 4. The Planets Interoperability Framework  Defines an Service-Oriented Architecture for Digital Preservation  Set of Services, Interfaces, a common Data Model  Implements Common Services  Authentication and Authorization, Monitoring, Logging, Notification, …  Service Registration and Lookup  Provides APIs for Applications that use Planets  Testbed Experiments, Executing Preservation Plans  Provides Workflow Enactment Service and Engine  Components-based, XML serialization
  • 5. The Problem of Scalability  Planets is a preservation architecture based on Web Services  Supports interoperability and a distributed environment  Sufficient for a controlled experiments (Testbed)  Not sufficient for handling a production environment  Massively, uncontrolled user requests  Mass migration of hundreds of TBytes of data  Content Holders are faced with loosing vast amounts of data  Sufficient computational resources in-house?  There is a clear demand for incorporating Grid or Cloud Technology
  • 6. Integrating Virtual Clusters and Clouds  Basic Idea: Extending Planets SOA with Grid Services  The Planets IF Job Submission Services  Allow Job Submission to a PC cluster (e.g. Hadoop, Condor)  Grid approach/standards (SOAP, HPC-BP, JSDL)  Cluster nodes are instantiated from specific system images  Most Preservation Tools are 3rd party applications  Software need to be preinstalled on cluster nodes  Cluster and JSS be instantiated in-house (e.g. a PC lab) or on top of (leased) cloud resources (AWS EC2).  Computation be moved to data or vice-versa
  • 7. Integration – Planets Tiered Architecture Preservation Planning reference Tools + Planets and Service Format Workflow Generation Registry Services Registry Experimental lookup Environment (qualitative) 3rd Party Tool Workbench for Execution Services Testbed Experiments Engine Workflow Def. maintain Grid/Cloud Execution Services Web Portal Metadata Data Model Browser Store Production Environment Web/Grid Clients Planets Service Context Resources Services
  • 8. Experimental Setup  Amazon Elastic Compute Cloud (EC2)  1 – 150 cluster nodes  Custom image based on RedHat Fedora 8 i386  Amazon Simple Storage Service (S3)  max. 1TB I/O,  ~32,5MBit/s download / ~13,8MBit/s upload (cloud internally)  Apache Hadoop (v.0.18)  MapReduce Implementation  Pre-installed command line tools (e.g, ps2pdf )
  • 9. Preservation Planning reference Tools + Planets and Service Format Workflow Generation Registry Services Registry lookup 3rd Party Tool Workbench for Execution Services Testbed Experiments Engine Workflow Def. maintain Grid/Cloud Execution Services Web Portal Metadata Data Model Browser Store Web/Grid Clients Planets Service Context Resources Services
  • 10. Experimental Setup Virtual Cluster (Apache Hadoop) Cloud Infrastructure (EC2) Job JSDL JSS Virtual Node Job Description File (Xen) Storage Infrastructure (S3) Raw Data Data Transfer Service
  • 11. Experimental Results 1 – Scaling Job Size x(1k) = 3,5 x(1k) = 4,4 x(1k) = 3,6 10,00 number of nodes = 5 9,00 8,00 EC2 0,07 MB 7,00 EC2 7,5 MB time [min] 6,00 EC2 250 MB 5,00 SLE 0,07 MB 4,00 3,00 SLE 7,5 MB 2,00 SLE 250 MB 1,00 0,00 1 10 100 1000 tasks x(1k) = t_seq / t_parallel and tasks = 1000
  • 12. Experimental Results 2 – Scaling #Nodes 40,00 n=1, t=36, s = 0.72, e=72% X 35,00 30,00 n=1 (local), s1=1, t=26 X 25,00 time [min] EC2 1000 x 0,07 MB 20,00 n=5, t=8, s=3.25, e=65% SLE 1000 x 0,07 MB 15,00 n=10, t=4.5, s=5.8, e=58% 10,00 X n=50, t=1.68, s=15.5, e=31% 5,00 X n=100, t=1.03, s=25.2, e=25% 0,00 X X 0 50 100 150 nodes
  • 13. Conclusions  Preservation systems will need to employ Grid/Cloud resources  Therefore there is a need to bridge communities in the areas of digital libraries and e-science.  Cloud and virtual infrastructures provide a powerful solution for obtaining on-demand access to computational resources.  Planets IF Job Submission Service provides a first step  Submission to virtual cluster of preservation nodes using Grid protocols.  Performance scales roughly with the number of nodes, accounting for expected overheads  Many open issues remain! Security, reliability, standardization, legal aspects...
  • 14. Thank you for your attention!  Planets Project http://www.planets-project.eu  Contacts Ross King ross.king@arcs.ac.at Rainer Schmidt rainer.schmidt@arcs.ac.at
  • 15. Sample JSDL code <?xml version=quot;1.0quot; encoding=quot;UTF-8quot;?> <jsdl:JobDefinition xmlns=quot;http://www.example.org/quot; xmlns:jsdl=quot;http://schemas.ggf.org/jsdl/2005/11/jsdlquot; xmlns:jsdl-posix=quot;http://schemas.ggf.org/jsdl/2005/11/jsdl-posixquot; xmlns:xsi=quot;http://www.w3.org/2001/XMLSchema-instancequot; xsi:schemaLocation=quot;http://schemas.ggf.org/jsdl/2005/11/jsdl jsdl.xsd quot;> <jsdl:JobDescription> <jsdl:JobIdentification> <jsdl:JobName>start vi</jsdl:JobName> </jsdl:JobIdentification> <jsdl:Application> <jsdl:ApplicationName>ls</jsdl:ApplicationName> <jsdl-posix:POSIXApplication> <jsdl-posix:Executable>/bin/ls</jsdl-posix:Executable> <jsdl-posix:Argument>-la file.txt</jsdl-posix:Argument> <jsdl-posix:Environment name=quot;LD_LIBRARY_PATHquot;>/usr/local/lib</jsdl-posix:Environment> <jsdl-posix:Input>/dev/null</jsdl-posix:Input> <jsdl-posix:Output>stdout.${JOB_ID}</jsdl-posix:Output> <jsdl-posix:Error>stderr.${JOB_ID}</jsdl-posix:Error> </jsdl-posix:POSIXApplication> </jsdl:Application> </jsdl:JobDescription> </jsdl:JobDefinition>
  • 16. Map-Reduce for Migrating Digital Objects  Map-Reduce implements a framework and prog. model for processing large documents (Sorting, Searching, Indexing) on multiple nodes.  Automated decomposition (split)  Mapping to intermediary pairs (map), optionally (combine)  Merge output (reduce)  Provides implementation for data parallel problems, i/o intensive,  Example: Conversion digital object (e.g website, folder, archive)  Decompose into atomic pieces (e.g. file, image, movie)  On each node, convert piece to target format  Merge pieces and create new data object