SlideShare ist ein Scribd-Unternehmen logo
1 von 1
Downloaden Sie, um offline zu lesen
Pairtrees for object storage
                               John Kunze and Stephen Abrams, California Digital Library (CDL)

                                                                                                                                     Summary
The deadly embrace                                              Objects in a pairtree                                                Pairtree is the thinnest smear we can add to our very well-
• Digital repositories tend to require a surrender of storage   A pairtree is especially useful if, for each contained object,          understood filesystems and their universal tools (the
  transparency that creates unhealthy system dependency            all of the object’s parts, and nothing but its parts, are            universal “API”) to create a very well-understood,
• Internally objects are often broken up so that they can be       enclosed in the object’s directory                                   platform-independent object storage substrate
  difficult to piece together in case of trouble                Import such a pairtree and, knowing nothing about the                Pairtree is not a complete repository system, but it is
                                                                   objects’ structure and semantics, you can reliably                   complete for object storage and makes it easier to build
Fig. 1. Object storage should not                                                                                                       systems and to share objects between institutions
need a fearful entanglement with                                • Enumerate all objects and their identifiers
software. Since objects have to                                 • Produce any object by requested id
be parked in a filesystem before
repository software upgrade, what
                                                                • Maintain and back it up with ordinary OS tools                     Why pairs of characters?
                                                                • Rebuild the collection in case of database corruption              Taking two chars at a time balances path depth and
if we left them in there and built                                 simply by walking the pairtree                                       fanout (number of possible entries in any directory)
our repositories around them?
                                                                To walk a pairtree requires knowing path termination rules           • Example: ab2def3 ⇒ ab/2d/ef/3/
                                          Jim B L
                                                                • A pairpath terminates when you reach a file or reach a             • Each pair, letters+digits, has 36x36 possibilities
                                                                   directory name with 1 char or more than 2 chars                   Compared to taking one char at a time
                                                                  ab/                                                                • Only 36 possibilities, but path depth grows rapidly
A pairtree maps ids to paths,                                     --- cd/                                                           • Example: ab2def3 ⇒ a/b/2/d/e/f/3/
                                                                                                                                     At another extreme, taking seven characters at a time
 two characters at a time                                                   |--- foo/
                                                                            |      | README.txt                                      • Short paths, but 78 billion (367) possible items
A pairtree is a filesystem hierarchy that uses an identifier                |      | thumbnail.gif                                   • Example: ab2def3 ⇒ ab2def3/
   string to derive an object directory (or folder) location                |      |--- master_images/
• The derivation takes successive pairs of characters and                   |      |       |      ...
   creates a succession of directories, called a pairpath                   |      |
                                                                                                                                     Pairtree credits and details
                                                                            |      --- gh/                                          Pairtree specification:
              ab2def3 ⇒ ab/2d/ef/3/
                                                                            --- e/                                                       www.ietf.org/internet-drafts/draft-kunze-pairtree-01.txt
• A pairpath ends at directory containing an object’s files;                                                                              www.cdlib.org/inside/diglib/pairtree/pairtreespec.html
                                                                                   --- bar/
   most systems do variation of this (is variation needed?)                                                                          Authors from CDL and University of Michigan (UM):
                                                                                            | metadata
• Reverse the mapping to find all ids/objects in a pairtree;                                                                            Martin Haye, Erik Hetzner, John Kunze, Mark Reyes,
                                                                                           | 54321.wav
   pairpath termination rules permit variable length ids                                                                                and Cory Snavely; many thanks to Stephen Abrams,
                                                                                           | index.html                                 Sebastien Korner, Brian Tingle, et al
Pre-converting problematic characters                              Fig. 2. Example pairtree containing two objects:                  Pairtree origins include
Some identifier characters are inconvenient or illegal in          abcd and abcde. The first object is enclosed in                    • Prototype: UCSF tobacco control
  filenames and must be hex-encoded (e.g., *→^2a)                  directory foo/, the second in bar/. While foo/                    documents and CDL digitized books
      id:    what-the-*@?#!                                        does not subsume e/ at the same level, by                         • Early production: digitized books
         → what-the-^2a@^3f#!                                      enclosure, it does subsume the gh/ underneath it.                 for UM and Hathi Trust
         ⇒ wh/at/-t/he/-^/2a/@^/3f/#!                                                                                                                                                            cyocum



But to keep paths short, 3 common chars are converted to 3
  rare chars (at cost of complexity): /→= :→+ .→,               Sample software implementation                                       For further information
      id:    ark:/13030/xt12t3                                     http://search.cpan.org/~jak/Pairtree-0.2/lib/File/Pairtree.pm     Please contact jak@ucop.edu or stephen.abrams@ucop.edu
            → ark+=13030=xt12t3                                 A Perl module that implements two mappings: id2ppath() takes an      For information on CDL’s Preservation Program, see
            ⇒ ar/k+/=1/30/30/=x/t1/2t/3/                           id into a pairpath and ppath2id() performs the inverse mapping.     http://www.cdlib.org/programs/digital_preservation.html

Weitere ähnliche Inhalte

Ähnlich wie Pairtrees for object storage: an SEO-optimized title

Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-researchsaintdevil163
 
stream processing engine
stream processing enginestream processing engine
stream processing enginetiana528
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes Minio
 
Spark Gotchas and Lessons Learned (2/20/20)
Spark Gotchas and Lessons Learned (2/20/20)Spark Gotchas and Lessons Learned (2/20/20)
Spark Gotchas and Lessons Learned (2/20/20)Jen Waller
 
Genomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive BiologyGenomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive BiologyUri Laserson
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Anne Nicolas
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013olberger
 
Analysis Software Benchmark
Analysis Software BenchmarkAnalysis Software Benchmark
Analysis Software BenchmarkAkira Shibata
 
Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Elizabeth Smith
 
Extbase object to xml mapping
Extbase object to xml mappingExtbase object to xml mapping
Extbase object to xml mappingThomas Maroschik
 
Session 24 - Distribute Data and Metadata Management with gLite
Session 24 - Distribute Data and Metadata Management with gLiteSession 24 - Distribute Data and Metadata Management with gLite
Session 24 - Distribute Data and Metadata Management with gLiteISSGC Summer School
 
Spark Gotchas and Lessons Learned
Spark Gotchas and Lessons LearnedSpark Gotchas and Lessons Learned
Spark Gotchas and Lessons LearnedJen Waller
 
Query processing and optimization
Query processing and optimizationQuery processing and optimization
Query processing and optimizationArif A.
 
Pig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramPig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramViswanath Gangavaram
 
The BagIt file package format
The BagIt file package formatThe BagIt file package format
The BagIt file package formatJohn Kunze
 
[Nvidia] Extracting Depot Paths Into New Instances of Their Own
[Nvidia] Extracting Depot Paths Into New Instances of Their Own[Nvidia] Extracting Depot Paths Into New Instances of Their Own
[Nvidia] Extracting Depot Paths Into New Instances of Their OwnPerforce
 

Ähnlich wie Pairtrees for object storage: an SEO-optimized title (20)

Borthakur hadoop univ-research
Borthakur hadoop univ-researchBorthakur hadoop univ-research
Borthakur hadoop univ-research
 
stream processing engine
stream processing enginestream processing engine
stream processing engine
 
Hadoop
HadoopHadoop
Hadoop
 
Building modern data lakes
Building modern data lakes Building modern data lakes
Building modern data lakes
 
Spark Gotchas and Lessons Learned (2/20/20)
Spark Gotchas and Lessons Learned (2/20/20)Spark Gotchas and Lessons Learned (2/20/20)
Spark Gotchas and Lessons Learned (2/20/20)
 
Genomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive BiologyGenomics Is Not Special: Towards Data Intensive Biology
Genomics Is Not Special: Towards Data Intensive Biology
 
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
Distro Recipes 2013 : Contribution of RDF metadata for traceability among pro...
 
Presentation distro recipes-2013
Presentation distro recipes-2013Presentation distro recipes-2013
Presentation distro recipes-2013
 
Analysis Software Benchmark
Analysis Software BenchmarkAnalysis Software Benchmark
Analysis Software Benchmark
 
Parquet overview
Parquet overviewParquet overview
Parquet overview
 
Streams, sockets and filters oh my!
Streams, sockets and filters oh my!Streams, sockets and filters oh my!
Streams, sockets and filters oh my!
 
Git studynotes
Git studynotesGit studynotes
Git studynotes
 
Bin carver
Bin carverBin carver
Bin carver
 
Extbase object to xml mapping
Extbase object to xml mappingExtbase object to xml mapping
Extbase object to xml mapping
 
Session 24 - Distribute Data and Metadata Management with gLite
Session 24 - Distribute Data and Metadata Management with gLiteSession 24 - Distribute Data and Metadata Management with gLite
Session 24 - Distribute Data and Metadata Management with gLite
 
Spark Gotchas and Lessons Learned
Spark Gotchas and Lessons LearnedSpark Gotchas and Lessons Learned
Spark Gotchas and Lessons Learned
 
Query processing and optimization
Query processing and optimizationQuery processing and optimization
Query processing and optimization
 
Pig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaramPig power tools_by_viswanath_gangavaram
Pig power tools_by_viswanath_gangavaram
 
The BagIt file package format
The BagIt file package formatThe BagIt file package format
The BagIt file package format
 
[Nvidia] Extracting Depot Paths Into New Instances of Their Own
[Nvidia] Extracting Depot Paths Into New Instances of Their Own[Nvidia] Extracting Depot Paths Into New Instances of Their Own
[Nvidia] Extracting Depot Paths Into New Instances of Their Own
 

Mehr von John Kunze

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ MetadictionaryJohn Kunze
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderJohn Kunze
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...John Kunze
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDLJohn Kunze
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy buildingJohn Kunze
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for PersistenceJohn Kunze
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesJohn Kunze
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsJohn Kunze
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardJohn Kunze
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyJohn Kunze
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014John Kunze
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupJohn Kunze
 
Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research DatasetsJohn Kunze
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management EcosystemJohn Kunze
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchJohn Kunze
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long TailJohn Kunze
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsJohn Kunze
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayJohn Kunze
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsJohn Kunze
 

Mehr von John Kunze (20)

The YAMZ Metadictionary
The YAMZ MetadictionaryThe YAMZ Metadictionary
The YAMZ Metadictionary
 
YAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary BuilderYAMZ Metadata Vocabulary Builder
YAMZ Metadata Vocabulary Builder
 
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
The ARK Alliance: 20 years, 850 institutions, 8.2 billion persistent identifi...
 
EZID and N2T at CDL
EZID and N2T at CDLEZID and N2T at CDL
EZID and N2T at CDL
 
YAMZ.net: better, faster, cheaper taxonomy building
YAMZ.net:  better, faster, cheaper taxonomy buildingYAMZ.net:  better, faster, cheaper taxonomy building
YAMZ.net: better, faster, cheaper taxonomy building
 
A Vocabulary for Persistence
A Vocabulary for PersistenceA Vocabulary for Persistence
A Vocabulary for Persistence
 
Identifiers obey Resolvers not Schemes
Identifiers obey Resolvers not SchemesIdentifiers obey Resolvers not Schemes
Identifiers obey Resolvers not Schemes
 
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKsNames, Things, and Open Identifier Infrastructure: N2T and ARKs
Names, Things, and Open Identifier Infrastructure: N2T and ARKs
 
ARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forwardARK identifiers: lessons learnt at BnF: paths forward
ARK identifiers: lessons learnt at BnF: paths forward
 
YAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabularyYAMZ: a cross-domain crowd-sourced metadata vocabulary
YAMZ: a cross-domain crowd-sourced metadata vocabulary
 
DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014DataONE Preservation and Metadata Working Group Report 2014
DataONE Preservation and Metadata Working Group Report 2014
 
Selected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout groupSelected Bash shell tricks from Camp CDL breakout group
Selected Bash shell tricks from Camp CDL breakout group
 
Annotating Research Datasets
Annotating Research DatasetsAnnotating Research Datasets
Annotating Research Datasets
 
The Data Management Ecosystem
The Data Management EcosystemThe Data Management Ecosystem
The Data Management Ecosystem
 
Library Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich ResearchLibrary Tools Supporting Data-Rich Research
Library Tools Supporting Data-Rich Research
 
Big Data's Long Tail
Big Data's Long TailBig Data's Long Tail
Big Data's Long Tail
 
Pamwg 2012ahm
Pamwg 2012ahmPamwg 2012ahm
Pamwg 2012ahm
 
Scalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History CollectionsScalable Identifiers for Natural History Collections
Scalable Identifiers for Natural History Collections
 
Future-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do TodayFuture-Proofing the Web: What We Can Do Today
Future-Proofing the Web: What We Can Do Today
 
Supporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many FrontsSupporting Data-Rich Research on Many Fronts
Supporting Data-Rich Research on Many Fronts
 

Kürzlich hochgeladen

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 

Kürzlich hochgeladen (20)

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

Pairtrees for object storage: an SEO-optimized title

  • 1. Pairtrees for object storage John Kunze and Stephen Abrams, California Digital Library (CDL) Summary The deadly embrace Objects in a pairtree Pairtree is the thinnest smear we can add to our very well- • Digital repositories tend to require a surrender of storage A pairtree is especially useful if, for each contained object, understood filesystems and their universal tools (the transparency that creates unhealthy system dependency all of the object’s parts, and nothing but its parts, are universal “API”) to create a very well-understood, • Internally objects are often broken up so that they can be enclosed in the object’s directory platform-independent object storage substrate difficult to piece together in case of trouble Import such a pairtree and, knowing nothing about the Pairtree is not a complete repository system, but it is objects’ structure and semantics, you can reliably complete for object storage and makes it easier to build Fig. 1. Object storage should not systems and to share objects between institutions need a fearful entanglement with • Enumerate all objects and their identifiers software. Since objects have to • Produce any object by requested id be parked in a filesystem before repository software upgrade, what • Maintain and back it up with ordinary OS tools Why pairs of characters? • Rebuild the collection in case of database corruption Taking two chars at a time balances path depth and if we left them in there and built simply by walking the pairtree fanout (number of possible entries in any directory) our repositories around them? To walk a pairtree requires knowing path termination rules • Example: ab2def3 ⇒ ab/2d/ef/3/ Jim B L • A pairpath terminates when you reach a file or reach a • Each pair, letters+digits, has 36x36 possibilities directory name with 1 char or more than 2 chars Compared to taking one char at a time ab/ • Only 36 possibilities, but path depth grows rapidly A pairtree maps ids to paths, --- cd/ • Example: ab2def3 ⇒ a/b/2/d/e/f/3/ At another extreme, taking seven characters at a time two characters at a time |--- foo/ | | README.txt • Short paths, but 78 billion (367) possible items A pairtree is a filesystem hierarchy that uses an identifier | | thumbnail.gif • Example: ab2def3 ⇒ ab2def3/ string to derive an object directory (or folder) location | |--- master_images/ • The derivation takes successive pairs of characters and | | | ... creates a succession of directories, called a pairpath | | Pairtree credits and details | --- gh/ Pairtree specification: ab2def3 ⇒ ab/2d/ef/3/ --- e/ www.ietf.org/internet-drafts/draft-kunze-pairtree-01.txt • A pairpath ends at directory containing an object’s files; www.cdlib.org/inside/diglib/pairtree/pairtreespec.html --- bar/ most systems do variation of this (is variation needed?) Authors from CDL and University of Michigan (UM): | metadata • Reverse the mapping to find all ids/objects in a pairtree; Martin Haye, Erik Hetzner, John Kunze, Mark Reyes, | 54321.wav pairpath termination rules permit variable length ids and Cory Snavely; many thanks to Stephen Abrams, | index.html Sebastien Korner, Brian Tingle, et al Pre-converting problematic characters Fig. 2. Example pairtree containing two objects: Pairtree origins include Some identifier characters are inconvenient or illegal in abcd and abcde. The first object is enclosed in • Prototype: UCSF tobacco control filenames and must be hex-encoded (e.g., *→^2a) directory foo/, the second in bar/. While foo/ documents and CDL digitized books id: what-the-*@?#! does not subsume e/ at the same level, by • Early production: digitized books → what-the-^2a@^3f#! enclosure, it does subsume the gh/ underneath it. for UM and Hathi Trust ⇒ wh/at/-t/he/-^/2a/@^/3f/#! cyocum But to keep paths short, 3 common chars are converted to 3 rare chars (at cost of complexity): /→= :→+ .→, Sample software implementation For further information id: ark:/13030/xt12t3 http://search.cpan.org/~jak/Pairtree-0.2/lib/File/Pairtree.pm Please contact jak@ucop.edu or stephen.abrams@ucop.edu → ark+=13030=xt12t3 A Perl module that implements two mappings: id2ppath() takes an For information on CDL’s Preservation Program, see ⇒ ar/k+/=1/30/30/=x/t1/2t/3/ id into a pairpath and ppath2id() performs the inverse mapping. http://www.cdlib.org/programs/digital_preservation.html