Prepared for
                         PLN 2012
                       UNC, Chapel Hill
                        October 2012




          Auditing PLN’s:
Preliminary Results and Next Steps


                       Micah Altman,
             Director of Research, MIT Libraries
    Nonresident Senior Fellow, The Brookings Institution

                     Jonathan Crabtree,
   Assistant Director of Computing and Archival Research
   HW Odum Institute for Research in Social Science, UNC
Collaborators*


• Nancy McGovern
• Tom Lipkis & the LOCKSS Team

Research Support
  Thanks to the Library of Congress, the National Science
    Foundation, IMLS, the Sloan Foundation, the Harvard
    University Library, the Institute for Quantitative Social
    Science, and the Massachusetts Institute of Technology.




                                             * And co-conspirators

                           Auditing PLN's
Related Work
Reprints available from: micahaltman.com

• M. Altman, J. Crabtree, 2011. “Using the SafeArchive System: TRAC-Based
  Auditing of LOCKSS”, Proceedings of Archiving 2011, Society for Imaging
  Science and Technology.
• M. Altman, B. Beecher, J. Crabtree, 2009. “A Prototype Platform for
  Policy-Based Archival Replication”, Against the Grain 21(2), pp. 44-47.




Preview
• Why audit?
• Theory & Practice
  – Round 0: Setting up the Data-PASS PLN
  – Round 1: Self-Audit
  – Round 2: Compliance (almost)
  – Round 3: Auditing Other Networks
• What’s next?


Why audit?




Short Answer: Why the heck not?
     “Don’t believe in anything you hear,
        and only half of what you see”

                                        - Lou Reed

             “Trust, but verify.”

                                    - Ronald Reagan

Slightly Longer Answer:
                Things Go Wrong

• Physical & Hardware
• Software
• Insider & External Attacks
• Organizational Failure
• Media
• Curatorial Error

Full Answer:
It’s our responsibility




OAIS Model Responsibilities
• Accept appropriate information from Information
  Producers.
• Obtain sufficient control of the information to ensure long
  term preservation.
• Determine which groups should become the Designated
  Community (DC) able to understand the information.
• Ensure that the preserved information is independently
  understandable to the DC.
• Ensure that the information can be preserved against all
  reasonable contingencies.
• Ensure that the information can be disseminated as
  authenticated copies of the original or as traceable back to
  the original.
• Make the preserved data available to the DC.
OAIS Basic Implied Trust Model
• Organization is axiomatically trusted to identify
  designated communities
• Organization is engineered with the goal of:
   – Collecting appropriate authentic documents
   – Reliably delivering authentic documents, in understandable
     form, at a future time
• Success depends upon:
   – Reliability of storage systems:
     e.g., LOCKSS network, Amazon Glacier
   – Reliability of organizations:
     MetaArchive, DataPASS, Digital Preservation Network
   – Document contents and properties:
     Formats, Metadata, Semantics, Provenance, Authenticity


Reflections on OAIS Trust Model
• Specific bundle of trusted properties
• Neither instrumentally nor ultimately complete




Trust Engineering Approaches
•   Incentive based approaches:
      – Rewards, penalties, incentive-compatible mechanisms
•   Modeling and analysis:
      – Statistical quality control & reliability estimation, threat-modeling and vulnerability assessment
•   Portfolio Theory:
      – Diversification (financial, legal, technical… )
      – Hedges
•   Over-engineering approaches:
      – Safety margin, redundancy
•   Informational approaches:
      – Transparency (release of information needed to directly evaluate compliance); cryptographic signature,
          fingerprint, common knowledge, non-repudiation
•   Social engineering
      – Recognized practices; shared norms
      – Social evidence
      – Reduce provocations
      – Remove excuses
•   Regulatory approaches
      – Disclosure; Review; Certification; Audits; Regulations & penalties
•   Security engineering
      – Increase effort: harden target (reduce vulnerability); increase technical/procedural controls
      – Increase risk: surveillance, detection, likelihood of response
      – Design patterns: minimal privileges, separation of privileges
      – Reduce reward: deny benefits, disrupt markets, identify property, remove/conceal targets
Audit [aw-dit]:

   An independent evaluation of
   records and activities to
   assess a system of controls

Fixity mitigates risk only if used
          for auditing.
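To make the point concrete: a stored fixity value reduces risk only when it is recomputed and compared against the content. The following is a minimal Python sketch, assuming a hypothetical manifest that maps relative file paths to previously recorded SHA-256 digests; the function names are illustrative and are not part of LOCKSS or SafeArchive.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute a SHA-256 fixity value, reading the file in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Recompute fixity for every entry in a (hypothetical) manifest of
    relative path -> expected SHA-256 hex digest, and report a status
    of "ok", "corrupt", or "missing" for each entry."""
    results = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            results[rel_path] = "missing"
        elif sha256_of(target) != expected:
            results[rel_path] = "corrupt"
        else:
            results[rel_path] = "ok"
    return results
```

A "corrupt" or "missing" status is itself only useful if something acts on it, which is where repair comes in.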
Functions of Storage Auditing

• Detect
  corruption/deletion of content

• Verify
  compliance with storage/replication policies

• Prompt
  repair actions
Bit-Level Audit Design Choices
• Audit regularity and coverage:
  on-demand (manually); on object access; on
  event; randomized sample;
  scheduled/comprehensive
• Fixity check & comparison algorithms
• Auditing scope:
  integrity of object; integrity of collection;
  integrity of network; policy compliance;
  public/transparent auditing
• Trust model
• Threat model
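For the "randomized sample" option, the trade-off between audit cost and coverage can be estimated directly. This is a sketch under stated assumptions: objects are sampled uniformly at random without replacement, and a damaged object is always detected when it is sampled, so the chance of missing all damage follows the hypergeometric distribution. The function names are hypothetical.

```python
from math import comb
from typing import Optional


def detection_probability(total: int, damaged: int, sample: int) -> float:
    """Chance that a uniform random sample of `sample` objects, drawn
    without replacement from `total` objects, includes at least one of the
    `damaged` objects (hypergeometric model)."""
    if not 0 <= damaged <= total or not 0 <= sample <= total:
        raise ValueError("need 0 <= damaged <= total and 0 <= sample <= total")
    # comb() returns 0 when the sample is larger than the undamaged pool,
    # which correctly gives a miss probability of 0.
    p_miss = comb(total - damaged, sample) / comb(total, sample)
    return 1.0 - p_miss


def sample_size_for(total: int, damaged: int, target: float) -> Optional[int]:
    """Smallest sample size whose detection probability reaches `target`."""
    for n in range(total + 1):
        if detection_probability(total, damaged, n) >= target:
            return n
    return None  # unreachable target, e.g. when nothing is damaged
```

For instance, with 10 objects of which 1 is damaged, sampling 5 per round detects the damage with probability 0.5. Scheduled/comprehensive auditing is the limiting case where the sample is the whole collection.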
Repair

  Auditing mitigates risk only if
         used for repair.
Key Design Elements

• Repair granularity
• Repair trust model
• Repair latency:
   – Detection to start of repair
   – Repair duration
• Repair algorithm
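One simple repair algorithm makes these design elements concrete. This is a deliberately simplified, hypothetical illustration, not the LOCKSS polling and repair protocol. It treats the digests that replicas report for one object as votes. Repair granularity here is a single object. The trust model is that no single replica is trusted, only a strict majority. Repair latency depends on how often this check is scheduled.

```python
from collections import Counter
from typing import Optional


def plan_repairs(replica_digests: dict[str, str]) -> tuple[Optional[str], list[str]]:
    """Majority-vote repair planning for a single object.

    replica_digests maps replica name -> digest reported for the object.
    Returns (consensus digest, replicas that should be repaired from a
    replica holding the consensus copy). If no digest is held by a strict
    majority, returns (None, []) so a human can investigate instead.
    """
    if not replica_digests:
        return None, []
    digest, votes = Counter(replica_digests.values()).most_common(1)[0]
    if votes <= len(replica_digests) // 2:
        return None, []  # no strict majority: do not auto-repair
    to_repair = sorted(r for r, d in replica_digests.items() if d != digest)
    return digest, to_repair
```

Requiring a strict majority guarantees the winning digest is unique. It also means a network with only two disagreeing replicas can detect a problem but cannot decide how to repair it.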
LOCKSS Auditing & Repair
     Decentralized, peer-to-peer, tamper-resistant replication & repair
Regularity                 Scheduled
Algorithms                 Bespoke, peer-reviewed, tamper-resistant
Scope                      - Collection integrity
                           - Collection repair
Trust model                - Publisher is canonical source of content
                           - Changed content treated as new
                           - Replication peers are untrusted
Main threat models         - Media failure
                           - Physical failure
                           - Curatorial error
                           - External attack
                           - Insider threats
                           - Organizational failure
Key auditing limitations   - Correlated software failure
                           - Lack of policy auditing and public/transparent auditing
Auditing & Repair
TRAC-aligned policy auditing as an overlay network
Regularity                 Scheduled; manual
Fixity algorithms          Relies on underlying replication system
Scope                      - Collection integrity
                           - Network integrity
                           - Network repair
                           - High-level (e.g., TRAC) policy auditing
Trust model                - External auditor, with permissions to collect
                             metadata/log information from the replication network
                           - Replication network is untrusted
Main threat models         - Software failure
                           - Policy implementation failure
                             (curatorial error; insider threat)
                           - Organizational failure
                           - Media/physical failure, through the underlying
                             replication system
Key auditing limitations   Relies on the underlying replication system (now LOCKSS)
                           for fixity checking and repair
Theory vs. Practice
Round 0: Setting up the Data-PASS PLN
             “Looks ok to me”

                                  - PHB Motto




Theory
Expose content (through OAI + DDI + HTTP)
→ Install LOCKSS (on 7 servers)
→ Harvest content (through OAI plugin)
→ Set up PLN configurations (through OAI plugin)
→ LOCKSS magic
→ Done
Practice (Year 1)
•   OAI plugin extensions required:
     – Non-DC metadata
     – Large metadata
     – Alternate authentication method
     – Save metadata record
     – Support for OAI sets
     – Non-fatal error handling
•   OAI provider required:
     – Authentication extensions
     – Performance handling for delivery
     – Performance handling for errors
     – Metadata validation
•   PLN configuration required:
     – Stabilization around LOCKSS versions
     – Coordination around plugin repository
     – Coordination around AU definition
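Several of the Year 1 extensions map onto parts of the OAI-PMH protocol. Non-DC metadata (such as DDI) is selected with the `metadataPrefix` argument of `ListRecords`, OAI sets with the `set` argument, and large result sets are paged using `resumptionToken`, which is an exclusive argument. The following Python sketch builds such request URLs. The base URL and the `"ddi"` prefix are assumptions, since each provider chooses its own prefixes. The sketch does not reproduce the LOCKSS OAI plugin.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: Optional[str] = None,
                     set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # resumptionToken is exclusive: only the verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadataPrefix is required for an initial request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec is not None:
            params["set"] = set_spec  # restrict the harvest to one OAI set
    return f"{base_url}?{urlencode(params)}"
```

For example, `list_records_url("http://example.org/oai", "ddi", "study-1")` yields `http://example.org/oai?verb=ListRecords&metadataPrefix=ddi&set=study-1`. A harvester repeats the request with each returned token until the provider stops returning one.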
Theory vs. Practice
 Round 1: Self-Audit
“A mere matter of implementation”

                              - PHB Motto




Theory
Gather information from each replica
→ Integrate information → map network state
→ Compare current network to policy
→ State == Policy?
     YES → Success
     NO  → Add replica, and repeat
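The gather / integrate / compare steps can be sketched in a few lines of Python. The policy fields (a minimum number of agreeing replicas and of distinct regions) are hypothetical and only loosely modeled on the kind of TRAC-aligned replication policy the slides describe. They are not SafeArchive's actual schema. Consistent with the overlay trust model, the reports come from untrusted boxes, and the auditor only integrates and compares them.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical replication policy applied to every AU."""
    min_replicas: int   # minimum number of boxes holding an agreeing copy
    min_regions: int    # minimum number of distinct geographic regions


@dataclass
class ReplicaReport:
    """What the auditor gathered from one box (the box itself is untrusted)."""
    box: str
    region: str
    agreeing_aus: set[str] = field(default_factory=set)


def map_network_state(reports: list[ReplicaReport]) -> dict[str, set[tuple[str, str]]]:
    """Integrate per-box reports into AU -> {(box, region), ...}."""
    state: dict[str, set[tuple[str, str]]] = {}
    for report in reports:
        for au in report.agreeing_aus:
            state.setdefault(au, set()).add((report.box, report.region))
    return state


def audit(reports: list[ReplicaReport], policy: Policy,
          aus: list[str]) -> dict[str, list[str]]:
    """Compare the mapped network state to the policy.

    Returns AU -> list of problems; an empty result means success.
    """
    state = map_network_state(reports)
    violations: dict[str, list[str]] = {}
    for au in aus:
        holders = state.get(au, set())
        regions = {region for _, region in holders}
        problems = []
        if len(holders) < policy.min_replicas:
            problems.append("too few replicas")
        if len(regions) < policy.min_regions:
            problems.append("too few regions")
        if problems:
            violations[au] = problems
    return violations
```

In theory, a non-empty result is the cue to add a replica and audit again. In the Year 2 experience, however, adding replicas did not resolve most failures.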
Implementation




             www.safearchive.org
Practice (Year 2)
•   Gathering information required:
     –   Permissions
     –   Reverse-engineering UIs (with help)
     –   Network magic
•   Integrating information required:
     – Heuristics for lagged information
     – Heuristics for incomplete information
     – Heuristics for aggregated information
•   Comparing map to policy required:
     Mere matter of implementation
•   Adding replica:
     Uh-oh, most policies failed
     Adding replicas wasn’t going to resolve most issues
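One possible heuristic for lagged and incomplete information is sketched in Python below. It is a hypothetical illustration, not the heuristic SafeArchive uses. The idea is to keep only the newest poll observation per (box, AU) pair, mark observations older than a threshold as stale, and report expected pairs with no observation at all as missing rather than silently treating them as agreement. The observation record and the age threshold are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class PollObservation:
    box: str
    au: str
    agreed: bool
    observed_at: datetime


def latest_status(observations: Iterable[PollObservation], now: datetime,
                  max_age: timedelta,
                  expected: Iterable[tuple[str, str]] = ()) -> dict[tuple[str, str], str]:
    """Reduce raw poll observations to one status per (box, AU).

    - Only the newest observation per (box, AU) is kept (lagged data).
    - An observation older than max_age is reported as "stale".
    - Any expected (box, AU) pair with no observation is "missing"
      (incomplete data), rather than silently counted as agreement.
    """
    latest: dict[tuple[str, str], PollObservation] = {}
    for obs in observations:
        key = (obs.box, obs.au)
        if key not in latest or obs.observed_at > latest[key].observed_at:
            latest[key] = obs
    status = {}
    for key, obs in latest.items():
        if now - obs.observed_at > max_age:
            status[key] = "stale"
        else:
            status[key] = "agree" if obs.agreed else "disagree"
    for key in expected:
        status.setdefault(key, "missing")
    return status
```

Keeping "stale" and "missing" as explicit outcomes lets the policy comparison report "unknown" instead of a false failure or a false success.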
Theory vs. Practice
Round 2: Compliance (almost)
      “How do you spell ‘backup’?

       R-E-C-O-V-E-R-Y”

                                    -



Practice (and adjustment) makes
              perfekt?
• Timings (e.g., crawls, polls)
   – Understand
   – Tune
   – Parameterize heuristics, reporting
   – Track trends over time
• Collections
   – Change partitioning to AUs at source
   – Extend mapping to AUs in plugin
   – Extend reporting/policy framework to group AUs
• Diagnostics
   – When things go wrong: information to inform adjustment


Theory vs. Practice
Round 3: Auditing Other PLNs
 “In theory, theory and practice are the same –
             in practice, they differ.”

                                                  -




Theory
Gather information from each replica
→ Integrate information → map network state
→ Compare current network to policy
→ State == Policy?
     YES → Success
     NO  → AU sizes, polling intervals adjusted?
               NO  → Adjust, and repeat
               YES → Add replica, and repeat
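The Round 3 loop adds one decision before "add replica". This sketch assumes the diagram's branches are read as follows: if the state does not match the policy and AU sizes and polling intervals have not yet been tuned, tune them first, and add a replica only if tuning has already been tried. The function and its return strings are hypothetical.

```python
def next_step(state_matches_policy: bool, tuning_already_tried: bool) -> str:
    """One pass of the Round 3 decision (branch order is an assumption).

    tuning_already_tried: whether AU sizes and polling intervals have
    already been adjusted for the failing AUs.
    """
    if state_matches_policy:
        return "success"
    if not tuning_already_tried:
        return "adjust AU sizes and polling intervals, then re-audit"
    return "add a replica, then re-audit"
```

The ordering reflects the Year 2 lesson: adding replicas is the more expensive remedy and often did not address the underlying problem.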

Practice (Year 3)
•   100% of what?
•   Diagnostic inference
100% of what?
•   No: Of LOCKSS boxes?
•   No: Of AUs?
•   Almost: Of policy overall
•   Yes: Of policy for specific collection
•   Maybe: Of files?
•   Maybe: Of bits in a file?
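The same network can score very differently depending on the unit being counted. The following Python sketch computes a compliance share at three levels. It assumes a hypothetical definition of compliance: a box complies if it agrees on every AU it holds, an AU complies if at least `min_agreeing` boxes agree on it, and a collection complies if all of its AUs do.

```python
def compliance_rates(au_replicas: dict[str, dict[str, bool]],
                     collections: dict[str, list[str]],
                     min_agreeing: int) -> dict[str, float]:
    """Report the share of boxes, AUs, and collections that are compliant.

    au_replicas: AU -> {box: True if the box holds an agreeing copy}
    collections: collection -> list of AUs belonging to it
    """
    au_ok = {au: sum(boxes.values()) >= min_agreeing
             for au, boxes in au_replicas.items()}
    all_boxes = {box for boxes in au_replicas.values() for box in boxes}
    # A box that does not hold an AU is not penalized for that AU.
    box_ok = {box: all(boxes.get(box, True) for boxes in au_replicas.values())
              for box in all_boxes}
    coll_ok = {name: all(au_ok.get(au, False) for au in aus)
               for name, aus in collections.items()}

    def share(flags: dict) -> float:
        return sum(flags.values()) / len(flags) if flags else 1.0

    return {"boxes": share(box_ok), "aus": share(au_ok),
            "collections": share(coll_ok)}
```

With one disagreeing box, a collection whose AUs all still have enough agreeing copies can be 100% compliant even though only two thirds of the boxes are.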
What you see:            Boxes X, Y, Z all agree on AU A

What you can conclude:   Boxes X, Y, Z have the same content
Assumption:              Failures on file harvest are independent;
                         number of harvested files is large
                         → Content is good
What you see:            Boxes X, Y, Z don’t agree

What can you conclude?

Hypothesis 1: Disagreement is real, but doesn’t really matter.

Non-substantive AU differences (arising from dynamic elements in AUs that have no bearing on the substantive content)

      1.1 Individual URLs/files that are dynamic and non-substantive (e.g., logo images, plugins, Twitter feeds) cause
          content changes (this is common in the GLN).
      1.2 Dynamic content embedded in substantive content (e.g., a customized per-client header page embedded in the PDF
          of a journal article)

Hypothesis 2: Disagreement is real, but doesn’t matter in the longer run (even if the disagreement persists over the long run!)

      2.1 Temporary AU differences: versions of objects temporarily out of sync
            (e.g., if harvest frequency << source update frequency, but harvest times across boxes vary significantly)
      2.2 Objects temporarily missing
            (e.g., recently added objects are picked up by some replicas but not by others)

Hypothesis 3: Disagreement is real, and matters
      Substantive AU differences

      3.1 Content corruption (e.g., from corruption in storage, or during transmission/harvesting)
      3.2 Objects persistently missing from some replicas
            (e.g., because of a permissions issue at the provider, technical failures during harvest, or plugin problems)
      3.3 Versions of objects persistently missing/out of sync on some replicas
            (e.g., harvest frequency > source update frequency, leading to different AUs harvesting different versions of the content)
      Note that later “agreement” signifies that a particular version was verified, not that all versions have been replicated
      and verified.

Hypothesis 4: AUs really do agree, but we think they don’t

      4.1 Appearance of disagreement caused by incomplete diagnostic information: poll data are missing as a result of a
          system reboot, daemon updates, or other causes.
      4.2 Poll data are lagging (from different periods): polls fail, but contain information about agreement that is ignored.
Design Challenge
• Create more sophisticated algorithms
             and
• Instrument PLN data collection

     Such that

Observed behavior allows us to distinguish
between hypotheses 1-4.
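To show what distinguishing the hypotheses could look like, the following is a deliberately naive, hypothetical rule-based triage in Python. It is not the algorithm the LOCKSS team or SafeArchive uses. The evidence fields are assumptions about what instrumented data collection might provide. The rules only partially separate the cases. For example, Hypothesis 1.2 (dynamic content embedded inside a PDF) cannot be detected from URL patterns.

```python
from dataclasses import dataclass
from enum import Enum


class Hypothesis(Enum):
    NON_SUBSTANTIVE = 1       # H1: real, but doesn't matter
    TEMPORARY = 2             # H2: real, resolves in the longer run
    SUBSTANTIVE = 3           # H3: real, and matters
    DIAGNOSTIC_ARTIFACT = 4   # H4: AUs agree, but the data say otherwise


@dataclass
class Disagreement:
    """Evidence about one AU disagreement (all fields hypothetical)."""
    poll_data_complete: bool           # every box has recent, non-lagging poll data
    differing_urls: list[str]          # URLs whose content differs between boxes
    longest_url_divergence_days: int   # longest time any single URL stayed out of sync


def classify(d: Disagreement, non_substantive_patterns: list[str],
             persistence_days: int = 30) -> Hypothesis:
    """Rule-based triage of a disagreement into hypotheses 1-4."""
    if not d.poll_data_complete:
        return Hypothesis.DIAGNOSTIC_ARTIFACT
    if d.differing_urls and all(
            any(p in url for p in non_substantive_patterns)
            for url in d.differing_urls):
        return Hypothesis.NON_SUBSTANTIVE
    # Per-URL persistence, not AU-level persistence: an AU can disagree
    # indefinitely while each individual object eventually converges (H2).
    if d.longest_url_divergence_days < persistence_days:
        return Hypothesis.TEMPORARY
    return Hypothesis.SUBSTANTIVE
```

The order of the rules matters. Incomplete data is checked first, because every other conclusion depends on the poll data being trustworthy.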

Approaches to Design Challenge


         [Tom Lipkis’s Talk]




What’s Next?
         “It’s tough to make predictions,
           especially about the future”
      -Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston
Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. Demille, Albert
 Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan
       Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey
     Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, and others


Short Term
• Complete round 3 data collection
• Refinements of current auditing algorithms
  – More tunable parameters (yeah?!)
  – Better documentation
  – Simple health metrics
• Reports and dissemination



Longer Term
•   Health metrics, diagnostics, decision support
•   Additional audit standards
•   Support additional replication networks
•   Audit other policy sets




Bibliography (Selected)
• B. Schneier, 2012. Liars and Outliers, John Wiley & Sons.

• H.M. Gladney, J.L. Bennett, 2003. “What do we mean by authentic”,
  D-Lib Magazine 9(7/8).

• K. Thompson, 1984. “Reflections on Trusting Trust”, Communications
  of the ACM 27(8), pp. 761-763.

• D.S.H. Rosenthal, T.S. Robertson, T. Lipkis, V. Reich, S. Morabito,
  2005. “Requirements for Digital Preservation Systems: A Bottom-Up
  Approach”, D-Lib Magazine 11(11).

• CCSDS, 2002. Reference Model for an Open Archival Information System
  (OAIS), CCSDS 650.0-B-1, Blue Book.




Questions?



E-mail:
  Micah_altman@alumni.brown.edu
Web: micahaltman.com
Twitter: @drmaltman

E-mail:   Jonathan_Crabtree@unc.edu

Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
Labor And Reward In Science: Commentary on Cassidy Sugimoto’s Program on Info...
 
Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:Utilizing VR and AR in the Library Space:
Utilizing VR and AR in the Library Space:
 
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-NotsCreative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
Creative Data Literacy: Bridging the Gap Between Data-Haves and Have-Nots
 
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
SOLARSPELL: THE SOLAR POWERED EDUCATIONAL LEARNING LIBRARY - EXPERIENTIAL LEA...
 
Ndsa 2016 opening plenary
Ndsa 2016 opening plenaryNdsa 2016 opening plenary
Ndsa 2016 opening plenary
 
Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...Making Decisions in a World Awash in Data: We’re going to need a different bo...
Making Decisions in a World Awash in Data: We’re going to need a different bo...
 
Software Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental ScanSoftware Repositories for Research-- An Environmental Scan
Software Repositories for Research-- An Environmental Scan
 
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
The Open Access Network: Rebecca Kennison’s Talk for the MIT Prorgam on Infor...
 
Gary Price, MIT Program on Information Science
Gary Price, MIT Program on Information ScienceGary Price, MIT Program on Information Science
Gary Price, MIT Program on Information Science
 

Kürzlich hochgeladen

GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSJoshuaGantuangco2
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Celine George
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptshraddhaparab530
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxAshokKarra1
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 

Kürzlich hochgeladen (20)

GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTSGRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
GRADE 4 - SUMMATIVE TEST QUARTER 4 ALL SUBJECTS
 
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17Difference Between Search & Browse Methods in Odoo 17
Difference Between Search & Browse Methods in Odoo 17
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
Integumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.pptIntegumentary System SMP B. Pharm Sem I.ppt
Integumentary System SMP B. Pharm Sem I.ppt
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 

Auditing PLN’s: Preliminary Results

  • 1. Auditing PLN’s: Preliminary Results and Next Steps
      Prepared for PLN 2012, UNC, Chapel Hill, October 2012
      Micah Altman, Director of Research, MIT Libraries; Non-Resident Senior Fellow, The Brookings Institution
      Jonathan Crabtree, Assistant Director of Computing and Archival Research, HW Odum Institute for Research in Social Science, UNC
  • 2. Collaborators*
      • Nancy McGovern
      • Tom Lipkis & the LOCKSS Team
      * And co-conspirators
      Research Support: Thanks to the Library of Congress, the National Science Foundation, IMLS, the Sloan Foundation, the Harvard University Library, the Institute for Quantitative Social Science, and the Massachusetts Institute of Technology.
  • 3. Related Work (reprints available from micahaltman.com)
      • M. Altman, J. Crabtree, 2011. “Using the SafeArchive System: TRAC-Based Auditing of LOCKSS”, Proceedings of Archiving 2011, Society for Imaging Science and Technology.
      • M. Altman, B. Beecher, J. Crabtree, 2009. “A Prototype Platform for Policy-Based Archival Replication”, Against the Grain, 21(2), 44-47.
  • 4. Preview
      • Why audit?
      • Theory & Practice
          – Round 0: Setting up the Data-PASS PLN
          – Round 1: Self-Audit
          – Round 2: Compliance (almost)
          – Round 3: Auditing Other Networks
      • What’s next?
  • 5. Why audit?
  • 6. Short Answer: Why the heck not?
      “Don’t believe in anything you hear, and only half of what you see” - Lou Reed
      “Trust, but verify.” - Ronald Reagan
  • 7. Slightly Long Answer: Things Go Wrong
      • Physical & Hardware
      • Software
      • Insider & External Attacks
      • Organizational Failure
      • Media
      • Curatorial Error
  • 8. Full Answer: It’s our responsibility
  • 9. OAIS Model Responsibilities
      • Accept appropriate information from Information Producers.
      • Obtain sufficient control of the information to ensure long-term preservation.
      • Determine which groups should become the Designated Community (DC) able to understand the information.
      • Ensure that the preserved information is independently understandable to the DC.
      • Ensure that the information can be preserved against all reasonable contingencies.
      • Ensure that the information can be disseminated as authenticated copies of the original, or as traceable back to the original.
      • Make the preserved data available to the DC.
  • 10. OAIS Basic Implied Trust Model
      • Organization is axiomatically trusted to identify designated communities.
      • Organization is engineered with the goal of:
          – Collecting appropriate, authentic documents
          – Reliably delivering authentic documents, in understandable form, at a future time
      • Success depends upon:
          – Reliability of storage systems: e.g., LOCKSS network, Amazon Glacier
          – Reliability of organizations: e.g., MetaArchive, Data-PASS, Digital Preservation Network
          – Document contents and properties: formats, metadata, semantics, provenance, authenticity
  • 11. Reflections on OAIS Trust Model
      • Specific bundle of trusted properties
      • Not complete, either instrumentally or ultimately
  • 12. Trust Engineering Approaches
      • Incentive-based approaches:
          – Rewards, penalties, incentive-compatible mechanisms
      • Modeling and analysis:
          – Statistical quality control & reliability estimation; threat modeling and vulnerability assessment
      • Portfolio theory:
          – Diversification (financial, legal, technical…); hedges
      • Over-engineering approaches:
          – Safety margin, redundancy
      • Informational approaches:
          – Transparency (release of information needed to directly evaluate compliance); cryptographic signature, fingerprint, common knowledge, non-repudiation
      • Social engineering:
          – Recognized practices; shared norms
          – Social evidence
          – Reduce provocations
          – Remove excuses
      • Regulatory approaches:
          – Disclosure; review; certification; audits; regulations & penalties
      • Security engineering:
          – Increase effort: harden target (reduce vulnerability); increase technical/procedural controls
          – Increase risk: surveillance, detection, likelihood of response
          – Design patterns: minimal privileges, separation of privileges
          – Reduce reward: deny benefits, disrupt markets, identify property, remove/conceal targets
  • 13. Audit [aw-dit]: An independent evaluation of records and activities to assess a system of controls.
      Fixity mitigates risk only if used for auditing.
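Fixity checking is easy to illustrate. The sketch below is a minimal Python example, not SafeArchive or LOCKSS code: it assumes a hypothetical manifest that maps relative paths to previously recorded SHA-256 digests, recomputes each digest, and reports files that are missing or have changed. In line with the slide, the stored digests only reduce risk if something actually runs this comparison and acts on the result.

```python
import hashlib
from pathlib import Path


def sha256_fixity(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_fixity(manifest: dict[str, str], root: Path) -> dict[str, str]:
    """Compare files under root against a manifest of expected digests.

    manifest maps a relative path to its previously recorded SHA-256 digest
    (a hypothetical format). Returns {relative_path: "missing" | "mismatch"}.
    """
    problems: dict[str, str] = {}
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            problems[rel_path] = "missing"   # deleted, or replaced by a non-file
        elif sha256_fixity(target) != expected:
            problems[rel_path] = "mismatch"  # content changed since it was recorded
    return problems
```

An empty result means every listed file is present and unchanged; anything else is input for the repair step.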
  • 14. Functions of Storage Auditing
      • Detect corruption/deletion of content
      • Verify compliance with storage/replication policies
      • Prompt repair actions
  • 15. Bit-Level Audit Design Choices
      • Audit regularity and coverage: on demand (manually); on object access; on event; randomized sample; scheduled/comprehensive
      • Fixity check & comparison algorithms
      • Auditing scope: integrity of object; integrity of collection; integrity of network; policy compliance; public/transparent auditing
      • Trust model
      • Threat model
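For the "randomized sample" option, a useful back-of-envelope question is how likely a sample is to catch damage at all. Assuming k of N objects are corrupted and the auditor checks n objects drawn uniformly without replacement, the detection probability is 1 - C(N-k, n)/C(N, n) (the hypergeometric distribution). The function name, parameters, and example numbers below are illustrative only.

```python
from math import comb


def detection_probability(total: int, corrupted: int, sample_size: int) -> float:
    """P(a uniform random sample, drawn without replacement, contains at least
    one corrupted object), from the hypergeometric distribution."""
    if not (0 <= corrupted <= total and 0 <= sample_size <= total):
        raise ValueError("need 0 <= corrupted <= total and 0 <= sample_size <= total")
    # comb(n, k) is 0 when k > n, so a sample too large to avoid every
    # corrupted object correctly yields probability 1.
    miss_all = comb(total - corrupted, sample_size) / comb(total, sample_size)
    return 1.0 - miss_all


# Hypothetical example: 10 damaged objects among 10,000, 500 sampled.
print(f"{detection_probability(10_000, 10, 500):.3f}")
```

With those hypothetical numbers the probability is only roughly 40%, so a single small sample can easily miss sparse damage.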
  • 16. Repair
      Auditing mitigates risk only if used for repair.
      Key design elements:
      • Repair granularity
      • Repair trust model
      • Repair latency:
          – Detection to start of repair
          – Repair duration
      • Repair algorithm
  • 17. LOCKSS Auditing & Repair
      Decentralized, peer-to-peer, tamper-resistant replication & repair
      • Regularity: scheduled
      • Algorithms: bespoke, peer-reviewed, tamper-resistant
      • Scope: collection integrity; collection repair
      • Trust model:
          – Publisher is the canonical source of content
          – Changed content is treated as new
          – Replication peers are untrusted
      • Main threat models: media failure; physical failure; curatorial error; external attack; insider threats; organizational failure
      • Key auditing limitations: correlated software failure; lack of policy auditing and of public/transparent auditing
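In broad terms, LOCKSS detects damage through polls among peers and repairs copies that lose, using a bespoke, tamper-resistant protocol that this sketch does not attempt to reproduce. As a much simpler illustration of the underlying idea, the hypothetical `plan_repairs` function below takes each peer's digest for one AU. Only if one version holds both a quorum and a strict majority does it mark the dissenting peers for repair; otherwise it declines to choose a repair source, in keeping with a trust model where no single peer is trusted.

```python
from collections import Counter


def plan_repairs(replica_digests: dict[str, str], quorum: int) -> dict[str, str]:
    """Decide which peers to repair for one AU by simple majority vote.

    replica_digests maps a peer name to the digest of its copy of the AU.
    Returns {peer: digest_to_fetch} for peers that disagree with the winning
    version, or {} when no version has both `quorum` votes and a strict
    majority (in that case there is no trustworthy repair source).
    """
    if not replica_digests:
        return {}
    winner, votes = Counter(replica_digests.values()).most_common(1)[0]
    if votes < quorum or 2 * votes <= len(replica_digests):
        return {}
    return {peer: winner for peer, digest in replica_digests.items() if digest != winner}
```

For example, `plan_repairs({"A": "x", "B": "x", "C": "y"}, quorum=2)` schedules only peer C for repair, while a 2-2 split schedules nothing.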
  • 18. Auditing & Repair
      TRAC-aligned policy auditing as an overlay network
      • Regularity: scheduled; manual
      • Fixity algorithms: relies on underlying replication system
      • Scope: collection integrity; network integrity; network repair; high-level (e.g., TRAC) policy auditing
      • Trust model:
          – External auditor, with permissions to collect metadata/log information from the replication network
          – Replication network is untrusted
      • Main threat models: software failure; policy implementation failure (curatorial error, insider threat); organizational failure; media/physical failure through underlying replication system
      • Key auditing limitations: relies on the underlying replication system (now LOCKSS) for fixity check and repair
  • 19. Theory vs. Practice. Round 0: Setting up the Data-PASS PLN
      “Looks ok to me” - PHB Motto
  • 20. Theory
      [Flowchart: Expose content (through OAI+DDI+HTTP) and install LOCKSS (on 7 servers) → Harvest content (through OAI plugin) → Set up PLN configurations (through OAI plugin) → LOCKSS magic → Done]
  • 21. Practice (Year 1)
      [Diagram: the Theory flowchart, repeated]
      • OAI plugin extensions required:
          – Non-DC metadata
          – Large metadata
          – Alternate authentication method
          – Save metadata record
          – Support for OAI-SETS
          – Non-fatal error handling
      • OAI provider required:
          – Authentication extensions
          – Performance handling for delivery
          – Performance handling for errors
          – Metadata validation
      • PLN configuration required:
          – Stabilization around LOCKSS versions
          – Coordination around plugin repository
          – Coordination around AU definition
  • 22. Theory vs. Practice. Round 1: Self-Audit
      “A mere matter of implementation” - PHB Motto
  • 23. Theory
      [Flowchart: Gather information from each replica → Integrate information → Map network state → Compare current network state to policy → State == Policy? YES: Success. NO: Add replica, then gather information again]
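The Round 1 loop can be sketched in a few lines. Everything here is an assumption for illustration: a `ReplicaReport` per box and AU, and a hypothetical policy requiring a minimum number of agreeing copies spread over a minimum number of regions. It is not the SafeArchive schema or TRAC wording; it only shows the gather, integrate, map, and compare steps, with the non-compliant AUs being the ones for which the flowchart would add a replica.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaReport:
    """One (hypothetical) observation gathered from a single replica."""
    box: str      # replica (e.g., LOCKSS box) identifier
    region: str   # where the box is hosted
    au: str       # archival unit identifier
    agrees: bool  # did this box's most recent poll on the AU agree?


@dataclass(frozen=True)
class Policy:
    """A hypothetical replication policy, not TRAC text."""
    min_copies: int   # minimum agreeing copies per AU
    min_regions: int  # minimum distinct regions among those copies


def map_network_state(reports: list[ReplicaReport]) -> dict[str, set[tuple[str, str]]]:
    """Integrate reports into {au: {(box, region), ...}} of agreeing copies."""
    state: dict[str, set[tuple[str, str]]] = {}
    for r in reports:
        copies = state.setdefault(r.au, set())  # keep the AU even with no agreeing copy
        if r.agrees:
            copies.add((r.box, r.region))
    return state


def noncompliant_aus(state: dict[str, set[tuple[str, str]]], policy: Policy) -> list[str]:
    """Compare the network map to the policy; these AUs need more replicas."""
    failing = []
    for au, copies in sorted(state.items()):
        regions = {region for _, region in copies}
        if len(copies) < policy.min_copies or len(regions) < policy.min_regions:
            failing.append(au)
    return failing
```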
  • 24. Implementation www.safearchive.org
  • 25. Practice (Year 2)
      [Diagram: the Theory flowchart, repeated]
      • Gathering information required:
          – Permissions
          – Reverse-engineering UIs (with help)
          – Network magic
      • Integrating information required:
          – Heuristics for lagged information
          – Heuristics for incomplete information
          – Heuristics for aggregated information
      • Comparing map to policy required:
          – Mere matter of implementation
      • Adding replica: Uh-oh, most policies failed
          – Adding replicas wasn’t going to resolve most issues
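One possible heuristic for lagged information, sketched below under assumed inputs (tuples of box, AU, timestamp, and poll agreement), keeps only the newest report for each box and AU and treats reports older than a chosen `max_age` as unknown rather than as disagreement. The threshold and the tuple layout are hypothetical; the point is that stale data should not be scored as a policy failure.

```python
from datetime import datetime, timedelta

# Hypothetical input row: (box, au, observed_at, poll_agreed)
Report = tuple[str, str, datetime, bool]


def latest_fresh_reports(
    reports: list[Report], now: datetime, max_age: timedelta
) -> tuple[dict[tuple[str, str], bool], list[tuple[str, str]]]:
    """Keep the newest report per (box, au); split into fresh results and stale keys.

    Stale keys are returned separately so they can be reported as "unknown"
    instead of being counted as disagreement.
    """
    newest: dict[tuple[str, str], tuple[datetime, bool]] = {}
    for box, au, observed_at, agreed in reports:
        key = (box, au)
        if key not in newest or observed_at > newest[key][0]:
            newest[key] = (observed_at, agreed)

    fresh: dict[tuple[str, str], bool] = {}
    stale: list[tuple[str, str]] = []
    for key, (observed_at, agreed) in newest.items():
        if now - observed_at <= max_age:
            fresh[key] = agreed
        else:
            stale.append(key)
    return fresh, sorted(stale)
```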
  • 26. Theory vs. Practice. Round 2: Compliance (almost)
      “How do you spell ‘backup’? R-E-C-O-V-E-R-Y”
  • 27. Practice (and adjustment) makes perfekt?
      • Timings (e.g., crawls, polls):
          – Understand
          – Tune
          – Parameterize heuristics, reporting
          – Track trends over time
      • Collections:
          – Change partitioning to AUs at source
          – Extend mapping to AUs in plugin
          – Extend reporting/policy framework to group AUs
      • Diagnostics:
          – When things go wrong, provide information to inform adjustment
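Changing how collections are partitioned into AUs is one of the adjustments listed above. In LOCKSS, AU boundaries are defined through the plugin and the publisher's structure; purely as a hypothetical illustration of size-driven repartitioning, the sketch below packs files into AUs no larger than `max_au_bytes` using first-fit decreasing, a standard bin-packing heuristic that is fast but not guaranteed optimal.

```python
def partition_into_aus(file_sizes: dict[str, int], max_au_bytes: int) -> list[list[str]]:
    """Group files into AUs of at most max_au_bytes using first-fit decreasing.

    Files are placed largest first (ties broken by name, for determinism)
    into the first AU that still has room; a new AU is opened otherwise.
    """
    aus: list[list[str]] = []
    used: list[int] = []  # bytes already assigned to each AU
    for name, size in sorted(file_sizes.items(), key=lambda item: (-item[1], item[0])):
        if size > max_au_bytes:
            raise ValueError(f"{name} ({size} bytes) is larger than max_au_bytes")
        for i, total in enumerate(used):
            if total + size <= max_au_bytes:
                aus[i].append(name)
                used[i] += size
                break
        else:  # no existing AU had room
            aus.append([name])
            used.append(size)
    return aus
```

For example, files of sizes 6, 5, 4, 3, and 2 with a cap of 10 end up in two AUs: `["a", "c"]` and `["b", "d", "e"]`.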
  • 28. Theory vs. Practice. Round 3: Auditing Other PLNs
      “In theory, theory and practice are the same – in practice, they differ.”
  • 29. Theory
      [Flowchart: Gather information from each replica → Integrate information → Map network state → Compare current network state to policy → State == Policy? YES: Success. NO: AU sizes and polling intervals adjusted? NO: Adjust AU sizes, polling intervals. YES: Add replica. Then gather information again]
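The extra branch in the Round 3 flowchart, tuning AU sizes and polling intervals before adding hardware, reduces to a small decision rule. The sketch below encodes one pass through that branch; the `Action` names are illustrative, and how the `already_adjusted` flag is tracked between passes is left to the caller.

```python
from enum import Enum


class Action(Enum):
    SUCCESS = "success"
    ADJUST = "adjust AU sizes and polling intervals"
    ADD_REPLICA = "add replica"


def next_action(compliant: bool, already_adjusted: bool) -> Action:
    """One decision step of the Round 3 loop: tune first, then add hardware."""
    if compliant:
        return Action.SUCCESS
    if not already_adjusted:
        return Action.ADJUST
    return Action.ADD_REPLICA
```

Trying adjustment first matches the Round 1 finding that adding replicas alone was not going to resolve most policy failures.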
  • 30. Practice (Year 3)
      [Diagram: the Theory flowchart, repeated]
      • 100% of what?
      • Diagnostic inference
  • 31. 100% of what?
      • No: Of LOCKSS boxes?
      • No: Of AUs?
      • Almost: Of policy overall
      • Yes: Of policy for specific collection
      • Maybe: Of files?
      • Maybe: Of bits in a file?
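The "100% of what?" question is partly about the denominator. As a hypothetical illustration, the sketch below takes per-AU compliance results and an AU-to-collection mapping and reports both an overall AU-level rate and per-collection rates. This shows how a network can be fully compliant for one collection while falling short overall.

```python
from collections import defaultdict


def compliance_rates(
    au_compliant: dict[str, bool], au_collection: dict[str, str]
) -> tuple[float, dict[str, float]]:
    """Return (overall AU-level compliance rate, {collection: compliance rate})."""
    if not au_compliant:
        raise ValueError("no AUs to score")
    total: defaultdict[str, int] = defaultdict(int)
    passed: defaultdict[str, int] = defaultdict(int)
    for au, ok in au_compliant.items():
        collection = au_collection[au]
        total[collection] += 1
        passed[collection] += int(ok)
    overall = sum(passed.values()) / len(au_compliant)
    per_collection = {c: passed[c] / total[c] for c in total}
    return overall, per_collection
```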
  • 32. What you see: Boxes X, Y, Z all agree on AU A
      What you can conclude: Boxes X, Y, Z have the same content; content is good
      Assumption: failures on file harvest are independent; number of harvested files is large
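The assumptions on slide 32 can be made concrete with a simple model, which is an illustration rather than the authors' analysis. Assume each box fails to harvest each file correctly with independent probability `p_fail`, and that any failure would show up as disagreement. Full agreement then has probability `(1 - p_fail) ** (boxes * files)`, so with many files it is strong evidence that `p_fail` is small, while with few files it says little. Correlated failures, such as every box receiving the same error page from a publisher, break this model.

```python
def prob_full_agreement(p_fail: float, boxes: int, files: int) -> float:
    """P(every box harvests every file correctly).

    Model (an assumption): each box fails on each file independently with
    probability p_fail, and any failure would make that box disagree.
    """
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be a probability")
    return (1.0 - p_fail) ** (boxes * files)


def max_plausible_failure_rate(boxes: int, files: int, alpha: float = 0.05) -> float:
    """Largest p_fail for which full agreement still has probability >= alpha.

    Observing full agreement therefore suggests p_fail is below this value
    (a one-sided bound at confidence 1 - alpha), under the same assumptions.
    """
    return 1.0 - alpha ** (1.0 / (boxes * files))
```

For 3 boxes and 10,000 files, `max_plausible_failure_rate(3, 10_000)` is about 1e-4 (the familiar "rule of three"). With only 10 files, full agreement still has about a 97% chance even at `p_fail = 0.001`, which is why the slide also assumes a large number of harvested files.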
  • 33. What you see: Boxes X, Y, Z don’t agree
      What can you conclude?
  • 34. Hypothesis 1: Disagreement is real, but doesn’t really matter. Non-substantive AU differences (arising from dynamic elements in AUs that have no bearing on the substantive content):
          1.1 Individual URLs/files that are dynamic and non-substantive (e.g., logo images, plugins, Twitter feeds) cause content changes (this is common in the GLN).
          1.2 Dynamic content embedded in substantive content (e.g., a customized per-client header page embedded in the PDF of a journal article).
      Hypothesis 2: Disagreement is real, but doesn’t really matter in the longer run (even if disagreement persists over the long run!):
          2.1 Temporary AU differences: versions of objects are temporarily out of sync (e.g., if harvest frequency << source update frequency, but harvest times across boxes vary significantly).
          2.2 Objects temporarily missing (e.g., recently added objects are picked up by some replicas, not by others).
      Hypothesis 3: Disagreement is real and matters. Substantive AU differences:
          3.1 Content corruption (e.g., from corruption in storage, or during transmission/harvesting).
          3.2 Objects persistently missing from some replicas (e.g., because of a permissions issue at the provider, technical failures during harvest, or plugin problems).
          3.3 Versions of objects persistently missing/out of sync on some replicas (e.g., harvest frequency > source update frequency, leading to different AUs harvesting different versions of the content). Note that later “agreement” signifies that a particular version was verified, not that all versions have been replicated and verified.
      Hypothesis 4: AUs really do agree, but we think they don’t:
          4.1 Appearance of disagreement caused by incomplete diagnostic information: poll data are missing as a result of a system reboot, daemon updates, or other causes.
          4.2 Poll data are lagging, i.e., from different periods. Polls fail, but contain information about agreement that is ignored.
  • 36. Design Challenge
      • Create more sophisticated algorithms, and
      • Instrument PLN data collection,
      such that observed behavior allows us to distinguish between Hypotheses 1-4.
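Distinguishing the hypotheses is posed here as an open design challenge, and the talk defers approaches to Tom Lipkis's presentation. Purely to make the challenge concrete, the sketch below applies a naive, hypothetical rule set to one observed disagreement. Incomplete poll data points to Hypothesis 4. URLs matching an assumed pattern of dynamic, non-substantive resources point to Hypothesis 1. Recent disagreements point to Hypothesis 2. Anything persistent is escalated as possible Hypothesis 3. Real diagnostics would need the richer instrumentation the slide calls for.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical patterns for dynamic, non-substantive resources (logos, feeds, scripts).
NON_SUBSTANTIVE = re.compile(r"(logo|twitter|feed|\.css$|\.js$)", re.IGNORECASE)


@dataclass(frozen=True)
class Disagreement:
    url: str                  # URL on which boxes disagree
    poll_data_complete: bool  # False if poll records are missing (reboot, daemon update)
    first_seen: datetime      # when the disagreement was first observed


def likely_hypothesis(d: Disagreement, now: datetime, grace: timedelta) -> int:
    """Return the hypothesis number (1-4) a naive rule set favors for one disagreement."""
    if not d.poll_data_complete:
        return 4  # the disagreement may be an artifact of missing diagnostics
    if NON_SUBSTANTIVE.search(d.url):
        return 1  # real, but on dynamic, non-substantive content
    if now - d.first_seen <= grace:
        return 2  # real, but recent: may resolve as replicas catch up
    return 3      # persistent on substantive content: investigate and repair
```

Rule order matters here. Checking for missing poll data first avoids blaming the content for a gap in the diagnostics.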
  • 37. Approaches to Design Challenge [Tom Lipkis’s Talk]
  • 38. What’s Next?
      “It’s tough to make predictions, especially about the future.” - Attributed to Woody Allen, Yogi Berra, Niels Bohr, Vint Cerf, Winston Churchill, Confucius, Disreali [sic], Freeman Dyson, Cecil B. DeMille, Albert Einstein, Enrico Fermi, Edgar R. Fiedler, Bob Fourer, Sam Goldwyn, Allan Lamport, Groucho Marx, Dan Quayle, George Bernard Shaw, Casey Stengel, Will Rogers, M. Taub, Mark Twain, Kerr L. White, and others
  • 39. Short Term
      • Complete round 3 data collection
      • Refinements of current auditing algorithms:
          – More tunable parameters (yeah?!)
          – Better documentation
          – Simple health metrics
      • Reports and dissemination
  • 40. Longer Term
      • Health metrics, diagnostics, decision support
      • Additional audit standards
      • Support additional replication networks
      • Audit other policy sets
  • 41. Bibliography (Selected)
      • B. Schneier, 2012. Liars and Outliers, John Wiley & Sons.
      • H.M. Gladney, J.L. Bennett, 2003. “What do we mean by authentic”, D-Lib Magazine 9(7/8).
      • K. Thompson, 1984. “Reflections on Trusting Trust”, Communications of the ACM 27(8), 761-763.
      • D.S.H. Rosenthal, T.S. Robertson, T. Lipkis, V. Reich, S. Morabito, 2005. “Requirements for Digital Preservation Systems: A Bottom-Up Approach”, D-Lib Magazine 11(11).
      • CCSDS, 2002. Reference Model for an Open Archival Information System (OAIS). CCSDS 650.0-B-1, Blue Book.
  • 42. Questions?
      Micah Altman: E-mail: Micah_altman@alumni.brown.edu; Web: micahaltman.com; Twitter: @drmaltman
      Jonathan Crabtree: E-mail: Jonathan_Crabtree@unc.edu

Notes

  1. This work by Micah Altman (http://micahaltman.com), with the exception of images explicitly accompanied by a separate “source” reference, is licensed under the Creative Commons Attribution-Share Alike 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-sa/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.