Project Chrysalis – Transforming the Digital Business of the National Archives of Australia. Zoe D’Arcy


12th International Conference on Digital Preservation (iPRES 2015)

The National Archives of Australia faces challenges in managing digital records at scale, including multiple formats, proprietary formats, metadata extraction, storage, and access. The project "Chrysalis" aims to transform the digital business of the Archives by designing systems for complexity and scale through automation, machine learning, and standardization. The project will also establish an "Archives Point of Presence" within agencies to facilitate record transfers and access in an iterative process involving industry and whole-of-government engagement.


naa.gov.au
Project Chrysalis – Transforming the Digital Business of the National Archives of Australia
Zoë D’Arcy
Director Business Systems and Online Services
National Archives of Australia

Challenges around ‘Digital’
Prepare: Multiple Formats; Structured and Unstructured; Static and Dynamic; For humans and for computers; Size of each item and volume
Ingest: Proprietary Formats; Transfer; Reliability of Storage; Metadata Extraction and Generation; Security Threats (Viruses); Authentication and Access Control
Preserve: Normalising and Standardising Formats; Licensing for Commercial Formats; Storage of Structure Data; Items that link to external services; Capturing behaviour as well as structure
Manage: Storage of Resources that can be arbitrarily copied and moved electronically; Dynamic content could ‘change itself’; Dependency management; Copyright and Intellectual Property
Access: Classification; Copying; Distribution; Size; Decisions around access; Authentication and Access Control; Validating Generated Data and Metadata

From Archives to the Consumer

Designing for Complexity and Scale
Metadata Pyramid
Automation: 10-12 fields
Descriptive: 100's of fields
Content (everything): ~1,000's of fields
Formats shown: <xml />, Json {}, Key=Value, DDL, Binary, UML
Standardised_Key=Value
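The slide pairs several source formats (XML, JSON, Key=Value) with a single "Standardised_Key=Value" target. The following is a minimal Python sketch of what such a normalisation step might look like. It is not the Archives' implementation: the function names, the key-cleaning rule, and the sample record are all assumptions made for illustration.

```python
import json
import re
import xml.etree.ElementTree as ET


def standardise_key(key: str) -> str:
    """Hypothetical rule: lower-case the name, turn other characters into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", key.strip().lower()).strip("_")


def from_key_value(text: str) -> dict:
    """Parse 'Key=Value' lines; lines without '=' are ignored."""
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {standardise_key(k): v.strip() for k, v in pairs}


def from_json(text: str) -> dict:
    """Read the top-level members of a JSON object as fields."""
    return {standardise_key(k): str(v) for k, v in json.loads(text).items()}


def from_xml(text: str) -> dict:
    """Treat each child element of the root as one field."""
    root = ET.fromstring(text)
    return {standardise_key(c.tag): (c.text or "").strip() for c in root}


def to_standardised_lines(fields: dict) -> list:
    """Emit sorted 'standardised_key=value' lines."""
    return [f"{k}={v}" for k, v in sorted(fields.items())]


# Made-up sample record expressed in three source formats.
xml_src = "<record><Title>Sample file</Title><Date-Created>2015-11-02</Date-Created></record>"
json_src = '{"Title": "Sample file", "Date Created": "2015-11-02"}'
kv_src = "Title=Sample file\nDate.Created = 2015-11-02"

for fields in (from_xml(xml_src), from_json(json_src), from_key_value(kv_src)):
    print(to_standardised_lines(fields))
```

All three inputs reduce to the same pair of standardised lines, which is the property a downstream automated workflow would rely on. Nested structures, DDL, UML, and binary formats are out of scope for this sketch.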

Designing for Complexity and Scale
Automation of Processes
Preservation Workflow

Broad Selection
Fine Grained Filtering
Manual
Designing for Complexity and Scale
Use of Machine Learning
RNA
Records Holding Systems: TRIM, Intranet, Folders, Systems …
Selection of Records for transfer to the National Archives of Australia

Option 1: Archives Point of Presence

Option 2: Industry and WoG Engagement

Getting There
Iteration 1: Proof of concept/prototype
Iteration 2: Transfers and ingest capability
Iteration 3: Access capability
Iteration 4: Move to business as usual

Content policy and strategy are modeled in terms of six high-level imperatives: predilect, collect, protect, introspect, project, and connect. A consistent, comprehensive, and conceptually parsimonious domain model is important for planning, performing, and evaluating programmatic activities in a rigorous and systematic rather than ad hoc and idiosyncratic manner. The UC3 Sept model can be used to make precise yet concise statements regarding curation intentions, activities, and results.

A Foundational Framework for Digital Curation: The Sept Domain Model. Stephen...

12th International Conference on Digital Preservation (iPRES 2015)

More from 12th International Conference on Digital Preservation (iPRES 2015) (13)