SCAPE
Building Digital Preservation Infrastructure
Dr. Ross King
AIT Austrian Institute of Technology GmbH

eSciDoc Days
Berlin, October 27, 2011

Digital Preservation
• For the first time, the rate of
  increase of information creation is
  beginning to exceed the rate of
  increase in storage capacity.

• This massive volume of digital
  material raises a number of issues:
         •        What is worth preserving?
         •        How to preserve so much?
         •        How to access preserved data?
         •        How to create incentives to
                  preserve?

 http://arstechnica.com/business/consumerization-of-it/2011/09/information-explosion-how-rapidly-expanding-storage-spurs-innovation.ars




                                                                                                   07.11.2011

Digital Preservation
• Standards, best practices, and technologies used to
  ensure access to digital information over time

• How long?

  “Digital documents last forever – or five years,
   whichever comes first.”
       http://www.clir.org/pubs/reports/rothenberg/introduction.html


• Generally, we mean decades or centuries

SCAPE: what is it about?

• Planning and managing computing-intensive (digital)
  preservation processes, such as the large-scale
  ingestion or migration of multi-terabyte data sets

  SCAPE is a follow-up to the highly successful FP6 Integrated Project (IP) Planets.

SCAPE Project Data
• Project instrument: FP7 Integrated Project
• 6th Call
   • Objective ICT-2009.4.1:
     Digital Libraries and Digital Preservation
   • Target outcome (a) Scalable systems and services for
     preserving digital content
• Duration: 42 months
   • February 2011 – July 2014
• Budget: 11.3 million euro
   • EC funding: 8.6 million euro

SCAPE Consortium
No.               Partner name                                Short name   Country
1 (coordinator)   AIT Austrian Institute of Technology GmbH   AIT          AT
2                 British Library                             BL           UK
3                 Internet Memory Foundation                  IMF          NL
4                 Ex Libris Ltd                               EXL          IL
5                 Fachinformationszentrum Karlsruhe           FIZ          DE
6                 Koninklijke Bibliotheek                     KB           NL
7                 KEEP Solutions                              KEEPS        PT
8                 Microsoft Research                          MSR          UK
9                 Österreichische Nationalbibliothek          ONB          AT
10                Open Planets Foundation                     OPF          UK
11                Statsbiblioteket Aarhus                     SB           DK
12                Science and Technology Facilities Council   STFC         UK
13                Technische Universität Berlin               TUB          DE
14                Technische Universität Wien                 TUW          AT
15                University of Manchester                    UNIMAN       UK
16                Pierre & Marie Curie Université Paris 6     UPMC         FR

SCAPE Project Overview
SCAPE will enhance the state of the art in digital preservation in three ways:
• Infrastructure and tools for scalable preservation actions
• A framework for automated, quality-assured preservation workflows
• Integration of these components with policy-based automated
  preservation planning and watch

SCAPE results will be validated in three large-scale testbeds:
• Digital Repositories
• Web Content
• Research Data Sets

The SCAPE Consortium brings together a broad spectrum of expertise from:
• Memory institutions
• Data centres
• Research labs
• Universities
• Industrial firms

Project structure:
• Takeup: Stakeholders, Communities, Dissemination, Training Activities, Sustainability
• Testbeds: Corpora, Integration, Benchmarking, Validation
• Platform: Automation, Workflows, Parallelization, Virtualization
• Planning and Watch: Institutional Policies, Technical Watch, Automated Planning
• Preservation Components: Quality Assurance, Scalable Components, Automation-ready Tools
• Cross-project Activities: Project Management, Technical Coordination, Research Roadmap

Selected SCAPE Testbed Scenarios
• Characterise large video files
   •   The master MPEG-2 files are so large that JHOVE is difficult to apply, and
       JHOVE provides insufficient detail. A detailed characterisation of the MPEG-2
       streams is needed to identify the technical dependencies for extracting
       from or rendering each stream. This would allow preservation risks related
       to current access services to be monitored, and action to be taken as
       necessary to ensure continued access and preservation.

• Carry out large-scale migrations
   •   Migrating from one format to another introduces the possibility of damaging the
       content or failing to capture significant properties of the original in the resulting
       destination format.
   •   Specific requirements include:
         • Tools that operate reliably at scale (80 TB, 2 million pages)
         • Automated QA, ideally with no manual intervention on a file-by-file basis
         • QA performed by a process independent of the migration process
         • QA that provides strong evidence that significant properties are
           captured in the destination format

• Quality assurance in web harvesting
   •   For large-scale crawls, automating the quality control process is a
       necessary requirement. Currently, this process relies on random sampling
       and very basic quantitative checks.

(Image from digitalbevaring.dk)
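The first scenario above calls for characterising MPEG-2 streams in more detail than JHOVE provides. As a minimal sketch (not SCAPE's own tooling), the hypothetical function below scans a byte buffer for MPEG-2 video sequence headers (start code 0x000001B3, as defined in ISO/IEC 13818-2) and decodes frame size, frame rate and nominal bit rate. It assumes an elementary stream, or a container in which no sequence header is split across packets, and it ignores the sequence_extension, which carries extra high-order bits for very large frame sizes and bit rates.

```python
SEQUENCE_HEADER_CODE = b"\x00\x00\x01\xb3"

# frame_rate_code values 1-8 as defined for MPEG-2 video
FRAME_RATES = {1: 24000 / 1001, 2: 24.0, 3: 25.0, 4: 30000 / 1001,
               5: 30.0, 6: 50.0, 7: 60000 / 1001, 8: 60.0}


def parse_sequence_headers(data: bytes) -> list[dict]:
    """Return one dict per MPEG-2 sequence header found in `data`.

    Sketch only: sequence_extension bits are ignored, and headers split
    across container packets are not reassembled."""
    headers = []
    pos = data.find(SEQUENCE_HEADER_CODE)
    while pos != -1 and pos + 12 <= len(data):
        b = data[pos + 4:pos + 12]                   # 8 bytes after the start code
        width = (b[0] << 4) | (b[1] >> 4)            # horizontal_size_value, 12 bits
        height = ((b[1] & 0x0F) << 8) | b[2]         # vertical_size_value, 12 bits
        aspect_code = b[3] >> 4                      # aspect_ratio_information, 4 bits
        frame_rate_code = b[3] & 0x0F                # frame_rate_code, 4 bits
        # bit_rate_value: 18 bits, in units of 400 bit/s
        bit_rate_value = (b[4] << 10) | (b[5] << 2) | (b[6] >> 6)
        headers.append({
            "offset": pos,
            "width": width,
            "height": height,
            "aspect_ratio_code": aspect_code,
            "frame_rate": FRAME_RATES.get(frame_rate_code),
            "bit_rate_bps": bit_rate_value * 400,
        })
        pos = data.find(SEQUENCE_HEADER_CODE, pos + 4)
    return headers
```

For multi-gigabyte masters, the same parsing would be applied to chunks read from disk rather than to the whole file in memory; a change in these parameters between headers is one example of a technical dependency worth recording.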

Selected SCAPE Challenges
• Bridging the gap between test workflows and
  scalable workflows
• Applying MapReduce to binary data
• Locality of data
    • Bring the data to the computation, or
      bring the computation to the data?
• Repository Integration
    • Repository Consistency
    • Scalable Ingest
• Preservation Planning
    • How to scale?
    • How to automate?
• Research data sets
    • How to preserve contextual information?

(Image from digitalbevaring.dk)
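One reason applying MapReduce to binary data is a challenge: a large file is split at arbitrary byte offsets (for example, storage blocks), so a byte pattern can straddle two splits. A common remedy, sketched here in plain Python rather than with Hadoop's API, is to let each map task read a few bytes past the end of its split but count only matches that start inside it; the reduce step then sums the per-split counts. The function names and split sizes are illustrative assumptions.

```python
from functools import reduce


def make_splits(size: int, split_size: int) -> list[tuple[int, int]]:
    """Fixed-size (start, end) byte ranges, similar to storage blocks."""
    return [(s, min(s + split_size, size)) for s in range(0, size, split_size)]


def map_count(data: bytes, split: tuple[int, int], signature: bytes) -> int:
    """Map step: count occurrences of `signature` that START inside the split.

    The split is read with len(signature) - 1 extra bytes, so a match that
    straddles the boundary is still seen; because only matches starting
    before `end` are counted, the next split never counts it again."""
    start, end = split
    window = data[start:end + len(signature) - 1]
    count = 0
    pos = window.find(signature)
    while pos != -1 and start + pos < end:
        count += 1
        pos = window.find(signature, pos + 1)
    return count


def count_signature(data: bytes, signature: bytes, split_size: int) -> int:
    splits = make_splits(len(data), split_size)
    partial_counts = [map_count(data, s, signature) for s in splits]  # map phase
    return reduce(lambda a, b: a + b, partial_counts, 0)               # reduce phase
```

The result is independent of the split size, which is the property a real record reader for binary formats has to guarantee.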

SCAPE Solutions

• SCAPE Platform
  • Hadoop, Stratosphere
  • Virtualized cluster
  • Repository integration
     • HBase, HDFS - Fedora
  • Three levels of parallelization
     • Distribution of files
     • Splitting binary files
     • Parallelization of algorithms
  • Mapping Taverna to Hadoop

(Image from digitalbevaring.dk)
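The first of the three parallelization levels, distribution of files, treats each file as an independent unit of work. A minimal single-machine sketch, assuming a hypothetical per-file task that records byte size and a SHA-256 checksum as a stand-in for a real characterisation or migration tool, could look like this; on a cluster, a framework such as Hadoop would take the place of the local thread pool.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def characterise(path: Path) -> tuple[str, int, str]:
    """Hypothetical per-file task: byte size and SHA-256 checksum.

    Reads in 1 MiB chunks so large files never need to fit in memory."""
    digest = hashlib.sha256()
    size = 0
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
            size += len(chunk)
    return str(path), size, digest.hexdigest()


def characterise_all(paths: list[Path], workers: int = 4) -> dict[str, tuple[int, str]]:
    """Run the per-file task over all files concurrently (file-level parallelism)."""
    # Files are independent, so tasks need no coordination;
    # pool.map yields results in the same order as `paths`.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return {name: (size, sha) for name, size, sha in pool.map(characterise, paths)}
```

Threads suffice here because hashing and file reads spend most of their time outside the interpreter; a CPU-heavy task in pure Python would call for processes or cluster nodes instead.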

SCAPE Solutions

• Automated Planning and Watch
  • Building on the Planets PLATO tool
  • Automated watch based on
     • Results Evaluation Framework (REF) database
     • Monitoring trends in web harvests
  • Automated planning based on semantically
    formalized policies
• Automated Quality Assurance
  • QA in web harvesting through automated comparison of
    rendered pages, combining structural and image analysis
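Automated QA in web harvesting compares rendered pages using combined structural and image analysis. The sketch below shows one simple way to combine the two signals; it is not the SCAPE method. It assumes that renderings are equally sized 8-bit grayscale pixel grids, uses the set of outgoing links as the structural feature, and blends a mean-absolute-difference image score with a Jaccard index. The 0.5 weight and 0.9 threshold are hypothetical values that would need calibrating against manually checked pages.

```python
def image_similarity(a: list[list[int]], b: list[list[int]]) -> float:
    """1.0 for identical 8-bit grayscale renderings, 0.0 for maximally different ones."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return 0.0  # differing page dimensions are treated as a mismatch here
    n = sum(len(row) for row in a)
    if n == 0:
        return 1.0
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return 1.0 - total / (255 * n)


def structural_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index of structural features (here: sets of outgoing links)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def compare_renderings(img_live, img_archived, links_live, links_archived,
                       weight: float = 0.5, threshold: float = 0.9) -> tuple[float, bool]:
    """Blend both scores; True means the archived page passes automated QA."""
    score = (weight * image_similarity(img_live, img_archived)
             + (1 - weight) * structural_similarity(links_live, links_archived))
    return score, score >= threshold
```

Pages that fail the threshold would be queued for manual review, replacing random sampling with a targeted check.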

SCAPE Achievements
• Public Website
    • http://www.scape-project.eu/
• Development Infrastructure
    • Hosted by the Open Planets Foundation and GitHub
    • Development Wiki
        • http://wiki.opf-labs.org/display/SP/Home
• Deliverables
    • First deliverables available for download
• Publications
    • 13 in the first nine months, including 6 at iPRES next week
    • Report: comparative analysis of identification tools
• Platform
    • 10-node, 20 TB experimental cluster hosted by AIT

SCAPE Contact Information

• http://www.scape-project.eu/

• office@list.scape-project.eu

• Dr. Ross King
  AIT Austrian Institute of Technology GmbH
  Donau-City-Strasse 1
  A-1220 Wien

Thank you for your attention!
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE Project
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation WatchSCAPE Project
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPESCAPE Project
 
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...SCAPE Project
 

Mehr von SCAPE Project (18)

C sz z6
C sz z6C sz z6
C sz z6
 
SCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with NaniteSCAPE Information Day at BL - Characterising content in web archives with Nanite
SCAPE Information Day at BL - Characterising content in web archives with Nanite
 
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs AvailableSCAPE Information Day at BL - Some of the SCAPE Outputs Available
SCAPE Information Day at BL - Some of the SCAPE Outputs Available
 
SCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation ToolSCAPE Information day at BL - Flint, a Format and File Validation Tool
SCAPE Information day at BL - Flint, a Format and File Validation Tool
 
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositoriesSCAPE Webinar: Tools for uncovering preservation risks in large repositories
SCAPE Webinar: Tools for uncovering preservation risks in large repositories
 
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
SCAPE – Scalable Preservation Environments, SCAPE Information Day, 25 June 20...
 
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
Policy driven validation of JPEG 2000 files based on Jpylyzer, SCAPE Informat...
 
Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...Hadoop and its applications at the State and University Library, SCAPE Inform...
Hadoop and its applications at the State and University Library, SCAPE Inform...
 
Scape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation EnvironmentsScape project presentation - Scalable Preservation Environments
Scape project presentation - Scalable Preservation Environments
 
LIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven SchlarbLIBER Satellite Event, SCAPE by Sven Schlarb
LIBER Satellite Event, SCAPE by Sven Schlarb
 
Content profiling and C3PO
Content profiling and C3POContent profiling and C3PO
Content profiling and C3PO
 
Control policy formulation
Control policy formulationControl policy formulation
Control policy formulation
 
Preservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, AarhusPreservation Policy in SCAPE - Training, Aarhus
Preservation Policy in SCAPE - Training, Aarhus
 
An image based approach for content analysis in document collections
An image based approach for content analysis in document collectionsAn image based approach for content analysis in document collections
An image based approach for content analysis in document collections
 
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
SCAPE - Skalierbare Langzeitarchivierung (SCAPE - scalable longterm digital p...
 
Automatic Preservation Watch
Automatic Preservation WatchAutomatic Preservation Watch
Automatic Preservation Watch
 
Policy levels in SCAPE
Policy levels in SCAPEPolicy levels in SCAPE
Policy levels in SCAPE
 
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
Large scale preservation workflows with Taverna – SCAPE Training event, Guima...
 

Kürzlich hochgeladen

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 

Kürzlich hochgeladen (20)

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 

SCAPE - Building Digital Preservation Infrastructure

  • 1. SCAPE: Building Digital Preservation Infrastructure. Dr. Ross King, AIT Austrian Institute of Technology GmbH. eSciDoc Days, Berlin, October 27, 2011
  • 2. Digital Preservation • For the first time, the growth rate of information creation is beginning to exceed the growth rate of storage capacity. • This massive volume of digital material raises a number of issues: • What is worth preserving? • How can so much be preserved? • How can preserved data be accessed? • How can incentives to preserve be created? Source: http://arstechnica.com/business/consumerization-of-it/2011/09/information-explosion-how-rapidly-expanding-storage-spurs-innovation.ars 07.11.2011
  • 3. Digital Preservation • The standards, best practices, and technologies used to ensure access to digital information over time • How long? “Digital documents last forever – or five years, whichever comes first.” (http://www.clir.org/pubs/reports/rothenberg/introduction.html) • Generally we mean decades or centuries
  • 4. SCAPE – what is it about? • Planning and managing computing-intensive (digital) preservation processes, such as the large-scale ingestion or migration of multi-terabyte data sets. SCAPE is a follow-up to the highly successful FP6 Integrated Project Planets.
  • 5. SCAPE Project Data • Project instrument: FP7 Integrated Project • Call: 6th call • Objective ICT-2009.4.1: Digital Libraries and Digital Preservation • Target outcome (a): Scalable systems and services for preserving digital content • Duration: 42 months (February 2011 – July 2014) • Budget: 11.3 million euro • Funding: 8.6 million euro
  • 6. SCAPE Consortium (No.: partner name (short name), country)
    • 1 (coordinator): AIT Austrian Institute of Technology GmbH (AIT), AT
    • 2: British Library (BL), UK
    • 3: Internet Memory Foundation (IMF), NL
    • 4: Ex Libris Ltd (EXL), IL
    • 5: Fachinformationszentrum Karlsruhe (FIZ), DE
    • 6: Koninklijke Bibliotheek (KB), NL
    • 7: KEEP Solutions (KEEPS), PT
    • 8: Microsoft Research (MSR), UK
    • 9: Österreichische Nationalbibliothek (ONB), AT
    • 10: Open Planets Foundation (OPF), UK
    • 11: Statsbiblioteket Aarhus (SB), DK
    • 12: Science and Technology Facilities Council (STFC), UK
    • 13: Technische Universität Berlin (TUB), DE
    • 14: Technische Universität Wien (TUW), AT
    • 15: University of Manchester (UNIMAN), UK
    • 16: Pierre & Marie Curie Université Paris 6 (UPMC), FR
  • 7. SCAPE Project Overview • SCAPE will enhance the state of the art in digital preservation in three ways: • Infrastructure and tools for scalable preservation actions • A framework for automated, quality-assured preservation workflows • Integration of these components with policy-based automated preservation planning and watch • SCAPE results will be validated in three large-scale testbeds: • Digital Repositories • Web Content • Research Data Sets • The SCAPE Consortium brings together a broad spectrum of expertise from: • Memory institutions • Data centres • Research labs • Universities • Industrial firms [Project structure diagram with labels: Takeup, Stakeholders, Communities, Dissemination, Training Activities, Sustainability, Testbeds, Corpora, Integration, Benchmarking, Validation, Cross-project Activities, Project Management, Technical Coordination, Research Roadmap, Platform, Automation, Workflows, Planning and Watch, Parallelization, Preservation Components, Virtualization, Quality Assurance, Institutional Policies, Scalable Components, Technical Watch, Automated Planning, Automation-ready Tools]
  • 8. Selected SCAPE Testbed Scenarios • Characterise large video files • The master MPEG-2 files are so large that JHOVE is difficult to apply, and it provides insufficient detail. A detailed characterisation of the MPEG-2 streams is needed to identify the technical dependencies for extracting from or rendering them. This would allow preservation risks related to current access services to be monitored, and action to be taken as necessary to ensure continued access and preservation. • Carry out large-scale migrations • Migrating from one format to another risks damaging the content or failing to capture significant properties of the original in the destination format. • Specific requirements include: • Tools that operate reliably at scale (80 TB, 2 million pages) • Automated QA, ideally with no manual intervention on a file-by-file basis • QA performed by a process independent of the migration process • QA that provides strong evidence that significant properties have been captured in the destination format • Quality assurance in web harvesting • For large-scale crawls, the quality control process must be automated. Currently it relies on random sampling and very basic quantitative checks. (Image: digitalbevaring.dk)
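The migration QA requirements above (automated, run independently of the migration tool, producing evidence that significant properties survive) can be sketched as a comparison of property values extracted separately from the original and the migrated file. This is a minimal illustration under stated assumptions, not SCAPE code: the function and class names, the property names (`width`, `height`, `dpi`, ...) and the per-property tolerances are hypothetical, and the characterisation step that produces the two dictionaries is assumed to exist elsewhere.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional, Sequence


@dataclass(frozen=True)
class Mismatch:
    """One significant property that was not preserved (hypothetical report type)."""
    prop: str
    original: Any
    migrated: Any


def compare_properties(
    original: Mapping[str, Any],
    migrated: Mapping[str, Any],
    significant: Sequence[str],
    tolerances: Optional[Mapping[str, float]] = None,
) -> list[Mismatch]:
    """Return the significant properties whose values differ after migration.

    Both mappings are assumed to come from a characterisation step that runs
    separately from the migration tool, so the check is independent of it.
    """
    tolerances = tolerances or {}
    mismatches = []
    for prop in significant:
        if prop not in original or prop not in migrated:
            # A property that cannot be measured on both sides counts as a failure;
            # the missing side is reported as None.
            mismatches.append(Mismatch(prop, original.get(prop), migrated.get(prop)))
            continue
        a, b = original[prop], migrated[prop]
        tol = tolerances.get(prop)
        # Numeric properties with a tolerance may drift slightly; others must match exactly.
        ok = abs(a - b) <= tol if tol is not None else a == b
        if not ok:
            mismatches.append(Mismatch(prop, a, b))
    return mismatches
```

A non-empty result would flag the file for review without anyone inspecting it by hand; an empty result is evidence, not proof, that the listed properties were preserved.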
  • 9. Selected SCAPE Challenges • Bridging the gap between test workflows and scalable workflows • Applying MapReduce to binary data • Locality of data • Bring the data to the computation, or bring the computation to the data? • Repository integration • Repository consistency • Scalable ingest • Preservation planning • How to scale? • How to automate? • Research data sets • How to preserve contextual information? (Image: digitalbevaring.dk)
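A back-of-the-envelope estimate shows why data locality matters at this scale. It assumes the 80 TB collection from the migration scenario, a hypothetical sustained 1 Gbit/s network link, decimal units and no protocol overhead:

```latex
t = \frac{80 \times 10^{12}\ \text{B} \times 8\ \text{bit/B}}{10^{9}\ \text{bit/s}}
  = 6.4 \times 10^{5}\ \text{s} \approx 7.4\ \text{days}
```

Roughly a week of transfer for a single pass argues for moving the computation to the nodes that already hold the data, which is how Hadoop tries to schedule map tasks over HDFS blocks.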
  • 10. SCAPE Solutions • SCAPE Platform • Hadoop, Stratosphere • Virtualized cluster • Repository integration • HBase, HDFS – Fedora • Three levels of parallelization • Distribution of files • Splitting binary files • Parallelization of algorithms • Mapping Taverna to Hadoop (Image: digitalbevaring.dk)
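The second level of parallelization, splitting binary files, only pays off when the per-file computation can be decomposed over byte ranges. A minimal sketch in MapReduce style, assuming an in-memory `bytes` object and using a byte-value histogram (a simple input to format identification) as a stand-in task; the helper names and the task are hypothetical and are not part of the SCAPE platform or the Hadoop API. A thread pool stands in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def byte_ranges(size: int, chunk_size: int) -> list[tuple[int, int]]:
    """Split [0, size) into consecutive (start, end) ranges of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, size)) for start in range(0, size, chunk_size)]


def map_chunk(data: bytes, start: int, end: int) -> Counter:
    """Map step: histogram of byte values within one chunk."""
    return Counter(data[start:end])


def reduce_counts(a: Counter, b: Counter) -> Counter:
    """Reduce step: histograms are additive, so partial results merge in any order."""
    return a + b


def byte_histogram(data: bytes, chunk_size: int = 4, workers: int = 4) -> Counter:
    """Process each chunk in parallel, then combine the partial histograms."""
    ranges = byte_ranges(len(data), chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda r: map_chunk(data, *r), ranges)
        return reduce(reduce_counts, partials, Counter())
```

For example, `byte_histogram(b"\x00\x01\x00\xff\x00")` gives `Counter({0: 3, 1: 1, 255: 1})`, the same as counting the whole file at once. The histogram splits cleanly because partial counts combine in any order. Formats whose records cross chunk boundaries, such as compressed streams, would need boundary-aware splitting, which is part of what makes applying MapReduce to binary data a challenge.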
  • 11. SCAPE Solutions • Automated Planning and Watch • Building on the Planets PLATO tool • Automated watch based on: • Results Evaluation Framework (REF) database • Monitoring trends in web harvests • Automated planning based on semantically formalized policies • Automated Quality Assurance • QA in web harvesting through automated comparison of rendered pages – combined structural and image analysis
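The image-analysis half of that QA step can be approximated by comparing two renderings of the same page pixel by pixel. The sketch below uses peak signal-to-noise ratio (PSNR), a deliberately simple metric swapped in for illustration; the 30 dB threshold, the grayscale list-of-rows image representation and the function names are assumptions, and the structural comparison mentioned above is not shown.

```python
import math
from typing import Sequence

# Assumed representation: a rendering is a list of rows of 8-bit grayscale values (0-255).
Image = Sequence[Sequence[int]]


def psnr(a: Image, b: Image, max_value: int = 255) -> float:
    """Peak signal-to-noise ratio between two equally sized renderings, in dB."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("renderings must have identical dimensions")
    n = sum(len(row) for row in a)
    if n == 0:
        raise ValueError("renderings must not be empty")
    # Mean squared error over all pixels.
    mse = sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)) / n
    if mse == 0:
        return math.inf  # identical renderings
    return 10 * math.log10(max_value**2 / mse)


def renderings_match(a: Image, b: Image, threshold_db: float = 30.0) -> bool:
    """Hypothetical pass/fail rule: accept the archived rendering if PSNR >= threshold."""
    return psnr(a, b) >= threshold_db
```

In a crawl QA loop, the live page and its archived copy would be rendered to images of the same size and only pages that fail `renderings_match` would be queued for inspection, replacing random sampling with a check on every page.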
  • 12. SCAPE Achievements • Public Website • http://www.scape-project.eu/ • Development Infrastructure • Hosted by the Open Planets Foundation and GitHub • Development Wiki • http://wiki.opf-labs.org/display/SP/Home • Deliverables • First deliverables available for download • Publications • 13 in the first nine months, including 6 at iPRES next week • Report: comparative analysis of identification tools • Platform • 10-node, 20 TB experimental cluster hosted by AIT
  • 13. SCAPE Contact Information • http://www.scape-project.eu/ • office@list.scape-project.eu • Dr. Ross King, AIT Austrian Institute of Technology GmbH, Donau-City-Strasse 1, A-1220 Wien
  • 14. Thank you for your attention!