Data Migrations
Some Considerations when Preparing to
Migrate to AtoM
http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
Overview
• Assess what data will be part of the migration
• Do any in-system clean up prior to export
• Review AtoM data formats and available fields
• Establish a crosswalk between your data and AtoM’s fields
• Export your data
• Perform any additional clean up as needed
• Transform your data to an AtoM compatible import format
• Import
• Review your work
• Revise and reimport if needed
• Make small clean up edits in AtoM directly
Before Starting
Expect this to take time
• Our average length for a client data migration project is
around 4-6 months. Even for a simple project, there will
be a lot of time needed for data clean-up, quality
assurance review, and reimports.
Expect to do your import more than once
• It’s unlikely that everything will go perfectly on the
first attempt. You’ll discover some records don’t quite
match the same pattern as the rest, or one field didn’t
import, etc. Don’t be discouraged, and do budget your
time with this assumption in mind.
Develop a data management plan for the migration period
• How will you ensure you are not stranding data during
your migration? Will you freeze data entry entirely for
the duration? Manage new data in a spreadsheet? Run a
small migration for new data at the end? Make sure
everyone knows the plan.
Clarify roles, deadlines, and communication channels
• Ensure everyone involved knows what is expected of them
throughout the project, and when. Clearly identify those
responsible for key roles, and where to go for support.
Data assessment
Questions to ask in the data assessment phase:
• How many descriptions do you have? How many top-level
records?
• Have the records been described based on any content
standards? (e.g. ISAD(G), RAD, DACS, MAD, MODS, etc.)
• Are there custom fields with data in your system? How many?
Do they readily map to known standards or not?
• What export formats does your system support?
• Is all record data captured in the exports?
• Are some descriptions “draft” or non-public? Is this
information captured in the export?
• How many digital objects do you have to migrate? What types
(images, text, video, etc.) and formats (e.g. JPG, MP4, etc.)
are represented?
• Are authority records maintained separately from
descriptions? What about other entities? Accession records?
• Is the relationship between these entities and descriptions
captured in the export formats available?
• Do these other record types have their own export formats?
(e.g. EAC-CPF XML, SKOS XML, CSV, etc.)
• How are hierarchical relationships captured in the export?
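Several of the assessment questions (how many descriptions, how many top-level records, how many drafts) can often be answered straight from an export file. The sketch below is one possible way to do that with Python's standard library. The column names `id`, `parent_id`, and `status` are hypothetical stand-ins for whatever your legacy system actually exports.

```python
import csv
import io
from collections import Counter

def summarize_export(csv_text, parent_col="parent_id", status_col="status"):
    """Count descriptions, top-level records, and statuses in a CSV export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # A record with no parent value is counted as top-level.
    top_level = [r for r in rows if not (r.get(parent_col) or "").strip()]
    statuses = Counter((r.get(status_col) or "").strip() or "(missing)" for r in rows)
    return {
        "descriptions": len(rows),
        "top_level": len(top_level),
        "statuses": dict(statuses),
    }

# Hypothetical sample export; real column names will differ per system.
sample = """id,parent_id,title,status
1,,Smith family fonds,published
2,1,Correspondence,published
3,1,Photographs,draft
4,,Town council fonds,published
"""

summary = summarize_export(sample)
print(summary)
```

If the status column is missing from the export entirely, every row is counted under "(missing)", which is itself a useful answer to "is this information captured in the export?".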
AtoM Data Formats
Archival descriptions
• CSV, EAD 2002 XML, MODS XML
Authority records
• EAC-CPF XML, CSV
Accessions
• CSV
Terms (Subjects, Places, Genres, etc.)
• SKOS - many serializations supported
Repository records
• CSV
Current as of version 2.4
AtoM CSV Templates
https://wiki.accesstomemory.org/Resources/CSV_templates
CSV import is generally the best way to get data into AtoM.
Because the CSV import templates are a format specific to
AtoM, there is no data loss and all fields are represented.
If you are able to get your data out of your legacy system
and transform it into an AtoM-compatible CSV format, we
recommend using this method for your migration project.
In the example CSV files from v2.2 on, we have
included the relevant content standard name and number
in the sample data field. This means you can import
the CSV template to produce a sort of “crosswalk” or
key, showing you how fields in AtoM map to the column
headers.
AtoM EAD 2002 XML
Similarly, if you ensure all data entry fields in AtoM are
filled in with related content standard names and numbers,
you can now export as EAD XML to generate an EAD - ISAD(G)
crosswalk from AtoM.
EAD 2002 XML is a flexible standard with many possible valid
but different implementations. For this reason, your locally
generated EAD, while valid, may still not import perfectly
into AtoM. This is why we prefer working with CSV imports
whenever possible.
We recommend running a test import of a representative sample
from your source system into AtoM, and using the crosswalk
method discussed above to evaluate if you will need to make
changes to how your EAD XML is encoded for a successful import
into AtoM.
Crosswalking
Crosswalking is the process of mapping your source
data fields to equivalent AtoM ones.
To do so, you must understand how AtoM handles some
data (such as authority records, terms, etc.) first.
There will be cases where there are no 1:1
equivalencies either - you will have to make decisions
about how to combine or split apart your existing data
to make it work with what is available.
https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
Crosswalking
AtoM is standards-based.
This means you can focus on crosswalking to the
content standard you know best. Use the guidance
provided in the relevant standard to help inform your
mapping.
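A crosswalk can be written down as a simple lookup table plus a few rules for the fields that have no 1:1 equivalent. The sketch below is illustrative only: the source column names (`RefCode`, `Quantity`, `Format`, and so on) are invented, and the AtoM column names on the right are assumed from the ISAD(G) CSV template, so check them against the template for your AtoM version.

```python
import csv
import io

# Hypothetical crosswalk: source column -> AtoM column (1:1 mappings only).
CROSSWALK = {
    "RefCode": "identifier",
    "Title": "title",
    "Description": "scopeAndContent",
}

def crosswalk_row(source_row):
    """Map one source record onto AtoM column names."""
    atom_row = {
        atom: (source_row.get(src) or "").strip()
        for src, atom in CROSSWALK.items()
    }
    # No 1:1 equivalent: combine two source fields into one AtoM field.
    parts = [(source_row.get(name) or "").strip() for name in ("Quantity", "Format")]
    atom_row["extentAndMedium"] = " ".join(p for p in parts if p)
    return atom_row

def transform(source_csv_text):
    """Read source CSV text and return AtoM-style CSV text."""
    reader = csv.DictReader(io.StringIO(source_csv_text))
    out = io.StringIO()
    fieldnames = list(CROSSWALK.values()) + ["extentAndMedium"]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        writer.writerow(crosswalk_row(row))
    return out.getvalue()

source = (
    "RefCode,Title,Description,Quantity,Format\n"
    "F001,Smith family fonds,Family papers,12,boxes\n"
)
result = transform(source)
print(result)
```

Keeping the mapping in one dictionary means the crosswalk can be reviewed by people who know the content standard without reading the rest of the script.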
AtoM Entity Types
A(n incomplete) list of the main entity types around which
AtoM was built.
https://www.accesstomemory.org/docs/latest/user-manual/overview/entity-types/
Accessions
Accession records have their own CSV import format. As
there is currently no international accessions standard,
you will need to review the available fields in AtoM
closely and determine where to map your data.
Descriptions can be linked to Accessions via the
accessionNumber column in the description CSV templates.
We recommend importing your Accessions first, then your
descriptions with the corresponding accession number, to
establish links.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-accessions-via-csv
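Before importing, it can help to confirm that every accession number used in the description CSV actually exists in the accessions CSV. The following is a minimal sketch of such a check. It assumes both files use a column named `accessionNumber`; the slide confirms this for the description template, while the accessions file column name is an assumption to verify against the accession template.

```python
import csv
import io

def unlinked_accession_numbers(accessions_csv, descriptions_csv):
    """Return accession numbers used in descriptions but absent from accessions."""
    known = {
        (row.get("accessionNumber") or "").strip()
        for row in csv.DictReader(io.StringIO(accessions_csv))
    }
    missing = []
    for row in csv.DictReader(io.StringIO(descriptions_csv)):
        number = (row.get("accessionNumber") or "").strip()
        # Blank values are fine: not every description has an accession.
        if number and number not in known and number not in missing:
            missing.append(number)
    return missing

# Hypothetical sample data for illustration.
accessions = "accessionNumber,title\n2016-001,Smith donation\n2016-002,Council transfer\n"
descriptions = (
    "legacyId,title,accessionNumber\n"
    "1,Smith family fonds,2016-001\n"
    "2,Jones fonds,2016-009\n"
    "3,Untitled,\n"
)

missing = unlinked_accession_numbers(accessions, descriptions)
print(missing)
```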
Actors
In AtoM, creators and name access points are maintained
separately as authority records, so they can be re-used and
linked to multiple descriptions.
This means any creator name or name access point you import
with your descriptions will create an authority record, or
link to an existing match!
Make sure that names are consistent in your data, and the
biographical/administrative history is about the actor only -
not specific to the description.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/authority-records/#authority-bioghist-access
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#on-authority-records-archival-descriptions-and-csv-imports
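One way to check name consistency before import is to group names that differ only by capitalization or spacing, then review each group by hand. This is a rough sketch, not a description of how AtoM itself matches names on import; the comparison key used here is an assumption chosen for clean-up purposes only.

```python
from collections import defaultdict

def name_key(name):
    """Collapse case and extra whitespace, for comparison only."""
    return " ".join(name.split()).casefold()

def find_name_variants(names):
    """Group names that differ only by case or spacing."""
    groups = defaultdict(set)
    for name in names:
        if name.strip():
            groups[name_key(name)].add(name)
    # Keep only the groups with more than one distinct spelling.
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}

# Hypothetical creator names pulled from a description export.
creators = ["Smith, John", "smith, john", "Smith,  John ", "Jones, Mary"]
variants = find_name_variants(creators)
print(variants)
```

Each group returned is a candidate for normalizing to a single authorized form of name in the source data.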
Actors
The actor data you can add to a description CSV is minimal -
if you do maintain authority records, then you may want to
import them separately via AtoM’s authority record CSV
templates.
There are 3 actor CSV templates - the main actors template, 1
to supplement relationship data (between actors and/or
resources) and 1 to supplement alternative forms of name.
We recommend importing authority records before descriptions,
so you can link them on description import.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-authority-records-via-csv
Event Dates
Description edit templates have 3 date fields:
• Display date
• Start date
• End date
The Display date is what the end user will see - it is
free text. The start and end dates must follow ISO
8601 (YYYY-MM-DD, etc.) formatting. These fields are
used to support AtoM’s date range search.
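Start and end dates can be checked for ISO 8601 formatting before import. The sketch below accepts YYYY-MM-DD and, as an assumption about what the "etc." covers, the shortened forms YYYY-MM and YYYY. Confirm against the AtoM documentation which forms your version accepts.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD (the shortened forms are an assumption).
ISO_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")

def is_iso_date(value):
    """Return True if value is a well-formed, real calendar date."""
    match = ISO_DATE.fullmatch(value)
    if not match:
        return False
    # Missing month or day defaults to 1 so the date can be validated.
    year, month, day = (int(part) if part else 1 for part in match.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True

for candidate in ["1923-05-14", "1923-05", "1923", "14/05/1923", "1923-02-30"]:
    print(candidate, is_iso_date(candidate))
```

Free-text values such as "circa 1923" belong in the display date field and will correctly fail this check.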
Event Dates
During CSV import, Creators and Dates are paired (as
Events - see Entity types diagram).
Use the | pipe character to add multiple creators/dates.
You can use a literal NULL value in your CSV file to
keep the spacing correct for dates without actors or
vice versa.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events
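The pairing of creators and dates can be sketched as follows: each event contributes one value to every column, values are joined with the pipe character, and a missing value is written as a literal NULL so the positions stay aligned. The column names below are assumed from the ISAD(G) description template and may differ in older templates.

```python
def build_event_columns(events):
    """Join (actor, display date, start, end) tuples into piped columns.

    Missing values become the literal string NULL so that the nth actor
    always lines up with the nth date.
    """
    if not events:
        return {}

    def cell(values):
        return "|".join(value if value else "NULL" for value in values)

    actors, dates, starts, ends = zip(*events)
    return {
        "eventActors": cell(actors),
        "eventDates": cell(dates),
        "eventStartDates": cell(starts),
        "eventEndDates": cell(ends),
    }

# Hypothetical events: the second one is a date with no actor.
events = [
    ("Smith, John", "1923-1945", "1923", "1945"),
    (None, "[ca. 1950]", "1950", "1950"),
]
columns = build_event_columns(events)
for name, value in columns.items():
    print(name, "=", value)
```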
Access Points
In AtoM, access points on a description (e.g. subjects,
places, genre terms) are maintained separately as terms in a
taxonomy so they can be controlled and reused.
This means that access point data in your description imports
will either create new terms or link to existing ones. Make
sure your data is consistent so you don’t have
near-duplicates later! (e.g. “cars” vs “car” vs “automobiles”)
The exception is name access points - these are authority
records!
See:
• https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/terms/#term-name-vs-subject
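Near-duplicate terms can be surfaced for review with a simple string-similarity pass. The sketch below uses `difflib` from Python's standard library. The 0.8 cutoff is an arbitrary starting point, and similarity scoring only catches spelling variants ("car" vs "cars"), not synonyms ("cars" vs "automobiles"), which still need a human decision.

```python
import difflib

def near_duplicate_terms(terms, cutoff=0.8):
    """Return pairs of distinct terms whose spelling is very similar."""
    unique = sorted({term.strip() for term in terms if term.strip()})
    pairs = []
    for index, term in enumerate(unique):
        # Compare only against later terms so each pair is reported once.
        matches = difflib.get_close_matches(
            term, unique[index + 1:], n=5, cutoff=cutoff
        )
        pairs.extend((term, match) for match in matches)
    return pairs

# Hypothetical subject access points gathered from a description export.
subjects = ["cars", "car", "automobiles", "boats"]
pairs = near_duplicate_terms(subjects)
print(pairs)
```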
Hierarchies
Hierarchies are managed in the description CSV
templates via the legacyId and parentId columns.
Parent records must appear in a row above their child
records. Each child should have the legacyId value of
its parent record in the parentId column.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#hierarchical-relationships
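If an export does not already list parents before children, the rows can be reordered before import. The following is a sketch of one approach, assuming every row has a `legacyId` and that a blank `parentId` (or one that points outside the file) marks a top-level record; that second assumption is for illustration and may not suit every migration.

```python
def order_parents_first(rows):
    """Return rows reordered so each parent precedes its children."""
    ids = {row["legacyId"] for row in rows}
    children = {}
    roots = []
    for row in rows:
        parent = (row.get("parentId") or "").strip()
        if parent and parent in ids:
            children.setdefault(parent, []).append(row)
        else:
            roots.append(row)

    ordered = []
    stack = list(reversed(roots))
    while stack:
        row = stack.pop()
        ordered.append(row)
        # Push children so they come out in their original relative order.
        stack.extend(reversed(children.get(row["legacyId"], [])))

    if len(ordered) != len(rows):
        # Rows never reached from a root point at each other in a loop.
        raise ValueError("Circular parentId references found")
    return ordered

# Hypothetical rows exported in the wrong order.
rows = [
    {"legacyId": "3", "parentId": "2", "title": "File A"},
    {"legacyId": "2", "parentId": "1", "title": "Series 1"},
    {"legacyId": "1", "parentId": "", "title": "Fonds"},
    {"legacyId": "4", "parentId": "1", "title": "Series 2"},
]
ordered = order_parents_first(rows)
print([row["legacyId"] for row in ordered])
```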
Digital Objects
Digital objects can be imported with descriptions using the
digitalObjectURI or digitalObjectPath columns.
URIs point to external, web-accessible resources, and
must end in a file extension!
Paths point to a local directory added to your
server prior to import.
See:
• https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#digital-object-related-import-columns
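The requirement that a digital object URI end in a file extension is easy to check ahead of time. The sketch below is one interpretation of that rule: it treats a URI as acceptable only if it is an http(s) address whose path ends in an extension with no query string or fragment after it. The exact rule AtoM applies may be looser or stricter, so treat this as a pre-import screen rather than a guarantee.

```python
import posixpath
from urllib.parse import urlparse

def uri_has_file_extension(uri):
    """Return True if an http(s) URI ends in a file extension."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Anything after the path means the URI does not end in the extension.
    if parsed.query or parsed.fragment:
        return False
    _, extension = posixpath.splitext(parsed.path)
    return len(extension) > 1

# Hypothetical URIs for illustration.
for uri in [
    "https://example.org/images/photo.jpg",
    "https://example.org/view?id=42",
    "https://example.org/images/photo.jpg?size=large",
]:
    print(uri, uri_has_file_extension(uri))
```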
Data Transformation
You have 3 main options when it comes to
transforming your data into an AtoM-compatible format:
• Manual data transformation
• Tools such as OpenRefine
• Transformation script
http://www.publicdomainpictures.net/view-image.php?image=131381&picture=monarch-butterfly
OpenRefine is “a free, open source
power tool for working with messy
data and improving it.”
See:
• http://openrefine.org/
• https://github.com/OpenRefine/OpenRefine
There are many great free resources to help
you get started.
Use OpenRefine to:
• Add AtoM column headers
• Normalize names and terms
• Standardize identifiers or accession
numbers
• Split source data into separate columns
• Combine data into a single column
• Delete unnecessary rows
• Global search/replace
• etc
You can also use OpenRefine to clean up XML data!
A transformation script is generally a script prepared by a
developer that takes an input (your source data) and runs a
series of operations to transform the data into the desired
output (an AtoM-compatible file).
These can be prepared in many programming languages (e.g.
PHP, Python, etc.).
Import Ordering
If you are working with several different types of data,
you may need to perform multiple imports, possibly in
different formats. If so, we recommend proceeding in this
order to link entities together as your imports proceed:
1. Terms
2. Repositories
3. Actors
4. Accessions
5. Descriptions
We also recommend running a smaller sample test first!
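When several import files are involved, the recommended order can be captured in a small helper so nobody has to remember it. This is only a sketch: the entity type keys and the file names are hypothetical, and the helper plans the order without running any import.

```python
# Recommended order from the list above.
IMPORT_ORDER = ["terms", "repositories", "actors", "accessions", "descriptions"]

def plan_imports(files_by_type):
    """Return (entity type, filename) pairs in the recommended order."""
    unknown = set(files_by_type) - set(IMPORT_ORDER)
    if unknown:
        raise ValueError(f"Unrecognized entity types: {sorted(unknown)}")
    return [
        (kind, files_by_type[kind])
        for kind in IMPORT_ORDER
        if kind in files_by_type
    ]

# Hypothetical file names for a migration with three entity types.
files = {
    "descriptions": "descriptions.csv",
    "actors": "authority_records.csv",
    "terms": "subjects.skos.xml",
}
plan = plan_imports(files)
for step, (kind, filename) in enumerate(plan, start=1):
    print(step, kind, filename)
```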
Questions?
info@artefactual.com

Weitere ähnliche Inhalte

Was ist angesagt?

Document Object Model
Document Object ModelDocument Object Model
Document Object Model
chomas kandar
 

Was ist angesagt? (20)

Creating custom themes in AtoM
Creating custom themes in AtoMCreating custom themes in AtoM
Creating custom themes in AtoM
 
AtoM's Command Line Tasks - An Introduction
AtoM's Command Line Tasks - An IntroductionAtoM's Command Line Tasks - An Introduction
AtoM's Command Line Tasks - An Introduction
 
An Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual SystemsAn Introduction to AtoM, Archivematica, and Artefactual Systems
An Introduction to AtoM, Archivematica, and Artefactual Systems
 
AtoM feature development
AtoM feature developmentAtoM feature development
AtoM feature development
 
AtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of CustodyAtoM, Authenticity, and the Chain of Custody
AtoM, Authenticity, and the Chain of Custody
 
BAPI - Criação de Ordem de Manutenção
BAPI - Criação de Ordem de ManutençãoBAPI - Criação de Ordem de Manutenção
BAPI - Criação de Ordem de Manutenção
 
Document Object Model
Document Object ModelDocument Object Model
Document Object Model
 
CSS Basics
CSS BasicsCSS Basics
CSS Basics
 
HTML5: features with examples
HTML5: features with examplesHTML5: features with examples
HTML5: features with examples
 
Php mysql
Php mysqlPhp mysql
Php mysql
 
Css
CssCss
Css
 
Introduction to Sightly and Sling Models
Introduction to Sightly and Sling ModelsIntroduction to Sightly and Sling Models
Introduction to Sightly and Sling Models
 
Intro to HTML and CSS basics
Intro to HTML and CSS basicsIntro to HTML and CSS basics
Intro to HTML and CSS basics
 
Oracle Forms : Query Triggers
Oracle Forms : Query TriggersOracle Forms : Query Triggers
Oracle Forms : Query Triggers
 
Html5 Basic Structure
Html5 Basic StructureHtml5 Basic Structure
Html5 Basic Structure
 
AEM Best Practices for Component Development
AEM Best Practices for Component DevelopmentAEM Best Practices for Component Development
AEM Best Practices for Component Development
 
CSS ppt
CSS pptCSS ppt
CSS ppt
 
Cascading Style Sheets - Part 01
Cascading Style Sheets - Part 01Cascading Style Sheets - Part 01
Cascading Style Sheets - Part 01
 
Html ppt
Html pptHtml ppt
Html ppt
 
Html forms
Html formsHtml forms
Html forms
 

Ähnlich wie AtoM Data Migrations

Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh Patel
 

Ähnlich wie AtoM Data Migrations (20)

Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
Saurabh_Patel_An Alternative way to Import Multiple Excel files with Multiple...
 
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
AWS Public Data Sets: How to Stage Petabytes of Data for Analysis in AWS (WPS...
 
CTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML FormsCTDA MODS and Islandora XML Forms
CTDA MODS and Islandora XML Forms
 
Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
 
ALA Interoperability
ALA InteroperabilityALA Interoperability
ALA Interoperability
 
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best PracticesAWS Summit Singapore - Managing a Database Migration Project | Best Practices
AWS Summit Singapore - Managing a Database Migration Project | Best Practices
 
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
Building Your First Serverless Data Lake (ANT356-R1) - AWS re:Invent 2018
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
 
Loading Data into Amazon Redshift
Loading Data into Amazon RedshiftLoading Data into Amazon Redshift
Loading Data into Amazon Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF LoftLoading Data into Redshift: Data Analytics Week at the SF Loft
Loading Data into Redshift: Data Analytics Week at the SF Loft
 
Best Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon RedshiftBest Practices for Migrating your Data Warehouse to Amazon Redshift
Best Practices for Migrating your Data Warehouse to Amazon Redshift
 
CADA
CADA CADA
CADA
 
SAS - Training
SAS - Training SAS - Training
SAS - Training
 
Loading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SFLoading Data into Redshift: Data Analytics Week SF
Loading Data into Redshift: Data Analytics Week SF
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Loading Data into Redshift
Loading Data into RedshiftLoading Data into Redshift
Loading Data into Redshift
 
Cassandra data modelling best practices
Cassandra data modelling best practicesCassandra data modelling best practices
Cassandra data modelling best practices
 
Content migration for sitecore
Content migration for sitecoreContent migration for sitecore
Content migration for sitecore
 
Scalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and HowScalable, Fast Analytics with Graph - Why and How
Scalable, Fast Analytics with Graph - Why and How
 

Mehr von Artefactual Systems - AtoM

Mehr von Artefactual Systems - AtoM (16)

AtoM Community Update: 2019-05
AtoM Community Update: 2019-05AtoM Community Update: 2019-05
AtoM Community Update: 2019-05
 
Creating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with VagrantCreating your own AtoM demo data set for re-use with Vagrant
Creating your own AtoM demo data set for re-use with Vagrant
 
Searching in AtoM
Searching in AtoMSearching in AtoM
Searching in AtoM
 
Looking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and futureLooking Ahead: AtoM's governance, development, and future
Looking Ahead: AtoM's governance, development, and future
 
Contributing to the AtoM documentation
Contributing to the AtoM documentationContributing to the AtoM documentation
Contributing to the AtoM documentation
 
Installing AtoM with Ansible
Installing AtoM with AnsibleInstalling AtoM with Ansible
Installing AtoM with Ansible
 
Installing and Upgrading AtoM
Installing and Upgrading AtoMInstalling and Upgrading AtoM
Installing and Upgrading AtoM
 
Command-Line 101
Command-Line 101Command-Line 101
Command-Line 101
 
National Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshopNational Archives of Norway - AtoM and Archivematica intro workshop
National Archives of Norway - AtoM and Archivematica intro workshop
 
Artefactual and Open Source Development
Artefactual and Open Source DevelopmentArtefactual and Open Source Development
Artefactual and Open Source Development
 
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
Technologie Proche: Imagining the Archival Systems of Tomorrow With the Tools...
 
AtoM Community Update 2016
AtoM Community Update 2016AtoM Community Update 2016
AtoM Community Update 2016
 
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
Project Documentation with Sphinx (or, How I Learned to Stop Worrying and Lov...
 
Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015Digital Curation using Archivematica and AtoM: DLF Forum 2015
Digital Curation using Archivematica and AtoM: DLF Forum 2015
 
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
Introducing Binder: A Web-based, Open Source Digital Preservation Management ...
 
Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)Introducing the Digital Repository for Museum Collections (DRMC)
Introducing the Digital Repository for Museum Collections (DRMC)
 

Kürzlich hochgeladen

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

AtoM Data Migrations

  • 1. Data Migrations Some Considerations when Preparing to Migrate to AtoM http://boingboing.net/2016/11/08/heres-the-unexpected-origin.html
  • 2. Overview• Assess what data will be part of the migration • Do any in-system clean up prior to export • Review AtoM data formats and available fields • Establish a crosswalk between your data and AtoM’s fields • Export your data • Perform any additional clean up as needed • Transform your data to an AtoM compatible import format • Import • Review your work • Revise and reimport if needed • Make small clean up edits in AtoM directly
  • 3. Expect this to take time • Our average length for a client data migration project is around 4-6 months. Even for a simple project, there will be a lot of time needed for data clean-up, quality assurance review, and reimports. Expect to do your import more than once • It’s unlikely that everything will go perfectly on the first attempt. You’ll discover some records don’t quite match the same pattern as the rest, or one field didn’t import, etc. Don’t be discouraged, and do budget your time with this assumption in mind. Before Starting
  • 4. Develop a data management plan while you migrate • How will you ensure you are not stranding data during the time of your migration? Will you freeze data entry entirely for the length? Manage your data in a spreadsheet? Run a small migration for new data at the end? Make sure everyone knows the plan. Clarify roles, deadlines, and communication channels • Ensure everyone involved knows what is expected of them throughout the project, and when. Clearly identify those responsible for key roles, and where to go for support. Before Starting
  • 5. Data assessment • How many descriptions do you have? How many top-level records? • Have the records been described based on any content standards? (e.g. ISAD(G), RAD, DACS, MAD, MODS, etc.?) • Are there custom fields with data in your system? How many? Do they readily map to known standards or not? • What export formats does your system support? • Is all record data captured in the exports? • Are some descriptions “draft” or non-public? Is this information captured in the export? Questions to ask in the data assessment phase:
  • 6. Data assessment • How many digital objects do you have to migrate? What types (images, text, video, etc) and formats (e.g. JPG, mp4, etc) are represented? • Are authority records maintained separately from descriptions? What about other entities? Accession records? • Is the relationship between these entities and descriptions captured in the export formats available? • Do these other record types have their own export formats? (e.g. EAC-CPF XML, SKOS XML, CSV, etc) • How are hierarchical relationships captured in the export? Questions to ask in the data assessment phase:
  • 7. AtoM Data FormatsArchival descriptions • CSV, EAD 2002 XML, MODS XML Authority records • EAC-CPF XML, CSV Accessions • CSV Terms (Subjects, Places, Genres, etc.) • SKOS - many serializations supported Repository records • CSV Current as of version 2.4
  • 9. AtoM CSV Templates https://wiki.accesstomemory.org/Resources/CSV_templates CSV import will be the best way to get data into AtoM - because the CSV import template is a format specific to AtoM, there is no data loss and all fields are represented. If you are able to get your data out of your legacy system and transform it into an AtoM-compatible CSV format, we recommend using this method for your migration project.
  • 10. AtoM CSV Templates https://wiki.accesstomemory.org/Resources/CSV_templates In the example CSV files from v2.2 on, we have included the relevant content standard name and number in the sample data field. This means you can import the CSV template to produce a sort of “crosswalk” or key, showing you how fields in AtoM map to the column headers.
  • 12. AtoM EAD 2002 XML Similarly, if you ensure all data entry fields in AtoM are filled in with related content standard names and numbers, you can now export as EAD XML to generate an EAD - ISAD(G) crosswalk from AtoM.
  • 13. AtoM EAD 2002 XML Similarly, if you ensure all data entry fields in AtoM are filled in with related content standard names and numbers, you can now export as EAD XML to generate an EAD - ISAD(G) crosswalk from AtoM.
  • 15. AtoM EAD 2002 XML EAD 2002 XML is a flexible standard with many possible valid but different implementations. For this reason, your locally generated EAD, while valid, may still not import perfectly into AtoM. This is why we prefer working with CSV imports whenever possible. We recommend running a test import of a representative sample from your source system into AtoM, and using the crosswalk method discussed above to evaluate if you will need to make changes to how your EAD XML is encoded for a successful import into AtoM.
  • 16. Crosswalking Crosswalking is the process of mapping your source data fields to equivalent AtoM ones. To do so, you must understand how AtoM handles some data (such as authority records, terms, etc.) first. There will be cases where there are no 1:1 equivalencies either - you will have to make decisions about how to combine or split apart your existing data to make it work with what is available. https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
  • 17. Crosswalking AtoM is standards-based. This means you can focus on crosswalking to the content standard you know best. Use the guidance provided in the relevant standard to help inform your mapping. https://commons.wikimedia.org/wiki/File:Brand_New_Crosswalk_(6223110132).jpg
  • 18. AtoM Entity Types A(n incomplete) list of the main entity types around which AtoM was built. https://www.accesstomemory.org/docs/latest/user-manual/overview/entity-types/
  • 19. Accessions Accession records have their own CSV import format. As there is currently no international accessions standard, you will need to review the available fields in AtoM closely and determine where to map your data. Descriptions can be linked to Accessions via the accessionNumber column in the description CSV templates. We recommend importing your Accessions first, then your descriptions with the corresponding accession number, to establish links. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-accessions-via-csv
  • 20. Actors In AtoM, creators and name access points are maintained separately as authority records, so they can be re-used and linked to multiple descriptions. This means any creator name or name access point you import with your descriptions will create an authority record, or link to an existing match! Make sure that names are consistent in your data, and that the biographical/administrative history is about the actor only - not specific to the description. See: • https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/authority-records/#authority-bioghist-access • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#on-authority-records-archival-descriptions-and-csv-imports
  • 21. Actors The Actor data you can add to a description CSV is minimal - if you do maintain authority records, then you may want to import them separately via AtoM’s authority record CSV templates. There are 3 actor CSV templates - the main actors template, 1 to supplement relationship data (between actors and/or resources) and 1 to supplement alternative forms of name. We recommend importing authority records before descriptions, so you can link them on description import. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#import-authority-records-via-csv
  • 22. Event Dates Description edit templates have 3 date fields. The Display date is what the end user will see - it is free text. The start and end dates must follow ISO 8601 (YYYY-MM-DD, etc) formatting. These fields are used to support AtoM’s date range search. • Display date • Start date • End date
  • 23. Event Dates During CSV import, Creators and Dates are paired (as Events - see Entity types diagram). Use the | pipe character to add multiple creators/dates. You can use a literal NULL value in your CSV file to keep the spacing correct for dates without actors or vice versa. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#creator-related-import-columns-actors-and-events
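The pairing of creators and dates can be sketched in a few lines of Python. This is only an illustration: the column names used here (eventActors, eventDates, eventStartDates, eventEndDates) are assumed from AtoM's description CSV templates and should be checked against the template for your AtoM version, and build_event_columns is a hypothetical helper, not part of AtoM.

```python
NULL = "NULL"  # literal placeholder that keeps paired values aligned


def build_event_columns(events):
    """Join a list of events into pipe-separated CSV cell values.

    Each event is a dict with optional keys: actor, display, start, end.
    Missing values become the literal NULL so that the Nth actor still
    lines up with the Nth date.
    """
    def join(key):
        return "|".join(event.get(key) or NULL for event in events)

    return {
        "eventActors": join("actor"),        # assumed column name
        "eventDates": join("display"),       # assumed column name (free text)
        "eventStartDates": join("start"),    # assumed column name (ISO 8601)
        "eventEndDates": join("end"),        # assumed column name (ISO 8601)
    }


events = [
    {"actor": "Jane Doe", "display": "1950-1960",
     "start": "1950-01-01", "end": "1960-12-31"},
    # A date with no known creator: the actor slot is filled with NULL.
    {"display": "[ca. 1972]", "start": "1972-01-01", "end": "1972-12-31"},
]
row = build_event_columns(events)
```

Here the second event has no creator, so the actor cell becomes "Jane Doe|NULL" while each date cell still carries two values.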
  • 24. Access Points In AtoM, access points on a description (e.g. subjects, places, genre terms) are maintained separately as terms in a taxonomy so they can be controlled and reused. This means that access point data in your description imports will either create new terms or link to existing ones. Make sure your data is consistent so you don’t have near-duplicates later! (e.g. “cars” vs “car” vs “automobiles”) The exception is name access points - these are authority records! See: • https://www.accesstomemory.org/docs/latest/user-manual/add-edit-content/terms/#term-name-vs-subject
  • 25. Hierarchies Hierarchies are managed in the description CSV templates via the legacyId and parentId columns. Parent records must import in a row above child records. The children should have the legacyId value of the parent record in the parentId column. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#hierarchical-relationships
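One way to spot likely near-duplicates before import is to group terms by a normalized key. The sketch below is a rough heuristic, not an AtoM feature: normalize and find_near_duplicates are hypothetical helpers, and the plural-stripping rule is deliberately crude. It catches case, spacing, and simple plural variants ("cars" vs "car"), but true synonyms such as "automobiles" still need human review.

```python
import re
from collections import defaultdict


def normalize(term):
    """Reduce a term to a crude comparison key."""
    key = re.sub(r"\s+", " ", term).strip().casefold()
    key = key.rstrip(".,;")
    # Very naive plural handling: drop one trailing "s" (but not "ss").
    if key.endswith("s") and not key.endswith("ss"):
        key = key[:-1]
    return key


def find_near_duplicates(terms):
    """Return {key: [variants]} for keys with more than one spelling."""
    groups = defaultdict(set)
    for term in terms:
        groups[normalize(term)].add(term)
    return {key: sorted(variants)
            for key, variants in groups.items() if len(variants) > 1}


terms = ["cars", "Car", "car ", "automobiles", "Glass"]
dupes = find_near_duplicates(terms)
```

Each group returned is a candidate for review; deciding which form becomes the preferred term remains an editorial decision.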
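Because parents must appear above their children, it can help to reorder rows before writing the CSV. The following is a minimal sketch, assuming each row is a dict with legacyId and parentId keys as described above; order_parents_first is a hypothetical helper. A parentId that does not match any row in the file is treated here as a top-level record, which is an assumption you may want to replace with an error.

```python
def order_parents_first(rows):
    """Return rows reordered so every parent precedes its children."""
    by_id = {row["legacyId"]: row for row in rows}
    ordered, seen = [], set()

    def visit(row, trail=()):
        legacy_id = row["legacyId"]
        if legacy_id in seen:
            return
        if legacy_id in trail:
            raise ValueError(f"Circular parentId chain at {legacy_id}")
        parent = by_id.get(row.get("parentId") or "")
        if parent is not None:
            # Emit the parent (and its ancestors) before this row.
            visit(parent, trail + (legacy_id,))
        seen.add(legacy_id)
        ordered.append(row)

    for row in rows:
        visit(row)
    return ordered


rows = [
    {"legacyId": "3", "parentId": "2", "title": "File"},
    {"legacyId": "2", "parentId": "1", "title": "Series"},
    {"legacyId": "1", "parentId": "", "title": "Fonds"},
]
sorted_rows = order_parents_first(rows)
```

In this example the rows come out in the order Fonds, Series, File, regardless of their order in the source export.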
  • 26. Digital Objects Digital objects can be imported with descriptions using the digitalObjectURI or digitalObjectPath columns. URIs point to external, web-accessible resources and must end in a file extension! Paths point to a local directory added to your server prior to import. See: • https://www.accesstomemory.org/docs/latest/user-manual/import-export/csv-import/#digital-object-related-import-columns
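A quick pre-import check can flag digitalObjectURI values that do not end in a file extension. This is a sketch only: ends_in_file_extension is a hypothetical helper, the example URLs are made up, and the rule that a query string or fragment disqualifies a URI is a strict reading of "must end in file extension" rather than documented AtoM behaviour.

```python
import posixpath
from urllib.parse import urlparse


def ends_in_file_extension(uri):
    """Return True if the URI ends with something like '.jpg' or '.pdf'."""
    parsed = urlparse(uri)
    if parsed.query or parsed.fragment:
        # Anything after the path means the URI does not end in an extension.
        return False
    extension = posixpath.splitext(parsed.path)[1]
    return len(extension) > 1  # more than just a trailing dot


uris = [
    "https://example.org/images/photo.jpg",   # hypothetical URL
    "https://example.org/viewer?id=42",       # hypothetical URL
]
problems = [uri for uri in uris if not ends_in_file_extension(uri)]
```

Rows whose URI lands in the problems list are worth fixing or pointing at the underlying file before the import is run.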
  • 27. Data Transformation You have 3 main options when it comes to transforming your data into an AtoM-compatible format: • Manual data transformation • Tools such as OpenRefine • Transformation script http://www.publicdomainpictures.net/view-image.php?image=131381&picture=monarch-butterfly
  • 28. Data Transformation OpenRefine is “a free, open source power tool for working with messy data and improving it.” There are many great free resources to help you get started. See: • http://openrefine.org/ • https://github.com/OpenRefine/OpenRefine
  • 29. Data Transformation Use OpenRefine to: • Add AtoM column headers • Normalize names and terms • Standardize identifiers or accession numbers • Split source data into separate columns • Combine data into a single column • Delete unnecessary rows • Global search/replace • etc. You can also use OpenRefine to clean up XML data!
  • 30. Data Transformation A transformation script is generally a script prepared by a developer that takes an input (your source data) and runs a series of operations to transform the data into the desired output (an AtoM-compatible file). These can be prepared in many programming languages (e.g. PHP, Python, etc.).
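As an illustration of what such a script might look like, here is a minimal Python sketch that applies a crosswalk to a source CSV. The source column names (RefNo, Title, Scope) are invented for the example, and the target names (identifier, title, scopeAndContent) are assumed to match your AtoM description template; CROSSWALK and transform are hypothetical names. A real script would usually also handle dates, hierarchies, and access points.

```python
import csv
import io

# Hypothetical crosswalk: source column -> assumed AtoM column.
CROSSWALK = {
    "RefNo": "identifier",
    "Title": "title",
    "Scope": "scopeAndContent",
}


def transform(source_file, target_file):
    """Copy rows from source to target, renaming columns per CROSSWALK."""
    reader = csv.DictReader(source_file)
    writer = csv.DictWriter(
        target_file,
        fieldnames=list(CROSSWALK.values()),
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        # Trim stray whitespace while mapping each source field.
        writer.writerow({
            atom_name: (row.get(source_name) or "").strip()
            for source_name, atom_name in CROSSWALK.items()
        })


# In-memory files stand in for real paths so the sketch runs as-is.
source = io.StringIO(
    "RefNo,Title,Scope\nF1, Smith family fonds ,Letters and diaries\n"
)
target = io.StringIO()
transform(source, target)
output = target.getvalue()
```

Keeping the crosswalk in a single dictionary means the mapping decisions stay visible in one place and can be revised between test imports without touching the rest of the script.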
  • 31. Import Ordering If you are working with several different types of data, you may need to perform multiple imports, possibly in different formats. If so, we recommend proceeding in this order to link entities together as your imports proceed. We also recommend running a smaller sample test first! 1. Terms 2. Repositories 3. Actors 4. Accessions 5. Descriptions