In this presentation we analyze the complexities institutions face during the migration process. Drawing on real-world examples, we explore the challenges encountered when transitioning from older DSpace versions and from other platforms such as EPrints and Invenio. The session also offers a sneak peek into DSpace 8.
1. Adopting DSpace 7 and 8: Challenges and Solutions from Real Migration Experiences
2. AGENDA
4Science: who we are
It is not just an update, it is always a migration
A couple of hints about your data model
There are more data that need to be migrated than you expect
Plan, Do, Check, Finalize
Common pitfalls & solution strategies
Takeaways
4. Who we are
OUR AIM: to enable implementation of the transnationally important policies of Open Research, Research Impact and Digital Preservation.
Certified Platinum Provider and leading contributor to DSpace 7
We provide solutions for research information & data management and for cultural heritage:
• DSpace (CRIS/GLAM)
• OJS
• Dataverse
Our services:
• Installation
• Configuration
• Hosting and maintenance
• System integration, customization and consultancy
Our solutions support compliance with key international standards:
✓ OpenAIRE
✓ ORCID
✓ CERIF
✓ IIIF
5. What we believe in
Security Certification is not a matter of compromise: our solutions are secure by design; openness without security would be counterproductive; security without openness would be unproductive.
ISO/IEC 27001:2013, 27017:2015, 27018:2019, and ISO 9001:2015
Our solutions support the key defining transnational policies, Open Research and open digital cultural heritage, and are based on:
• Open-source software
• Open standards
• Interoperability
• Preservation
• Collaboration
• Innovation
7. The context in which we operate since 2016
We are driven by serving the open knowledge ecosystem.
Proprietary products often come with expensive licenses and pricing fluctuations, can become obsolete and can result in vendor lock-in.
Our open solutions (open standards, open protocols, open source) are designed to support open science.
Open knowledge helps to solve, by collaboration, the world’s very pressing problems, and creates new opportunities, especially when cross-disciplinary.
8. 4Science role in the Open Science and DSpace community
Certified Platinum Provider and leading contributor of DSpace
Our goal is to anticipate the future, making it more accessible
2023 DSpace worldwide community leaders for hours donated for DSpace development
Experts in the field and enablers that can help with any situation
At 4Science we are driven by serving the open knowledge ecosystem.
Open knowledge: empowering open access, supporting open science, advancing open scholarly communication.
FAIR data: our solutions enable your data to be Findable, Accessible, Interoperable and Reusable
Interoperable solutions: ORCID and DataCite Certified Service Provider, CERIF and IIIF enabler
Compliance & Quality: COAR-NGR, OpenAIRE, Certified Platinum Provider of DSpace, ISO 9001:2015
Security: battle-tested solutions, secure by design; Trusted Providers of the Cloud Security Alliance
9. «Migration» or «update»? Not so different?
In this session we will look at some insights from best practices that we have learned moving from DSpace 5, DSpace 6, EPrints, Digital Commons, OPUS or even custom solutions, but the first thing we would like to share is…
Even when you are about to upgrade from an old version of DSpace to the new one, keep in mind that it has been completely reengineered compared to previous ones: any update to a major release should therefore be understood (and planned with the appropriate timing) as if it were a migration to an entirely new platform, in addition to the integrations with systems already in your ecosystem.
Consider it as if it were a migration to a completely different system, although the main paradigms and approaches are preserved.
10. Entities are the foundation of the new data model
An effective data model should also be flexible
Entities are a pivotal part of defining a whole data model and contribute to its design: they give you the flexibility to reflect your data in a more granular way
Your data model should be as close as possible to international standards to enhance interoperability
The current design of DSpace 7 provides the foundation for flexibility, ensuring that it can be tailored to your requirements
Relations complete the definition of your data model: authors, publications, organizations and more can be interconnected
11. Entities should reflect your data model, enabling relations and exploring connections
ENTITIES ARE A WAY OF REPRESENTING DATA AND THEIR RELATIONS IN A STRUCTURED MANNER
ENTITIES ARE CONSTITUTED BY RECORDS THAT CAN BE DESCRIBED, IDENTIFIED, AND RELATED TO OTHER RECORDS IN A REPOSITORY
ENTITIES ARE USED TO REPRESENT REAL-WORLD OBJECTS SUCH AS PEOPLE, ORGANIZATIONS, PUBLICATIONS AS WELL AS ABSTRACT CONCEPTS SUCH AS SUSTAINABILITY GOALS, RESEARCH LINES, THEMATIC COLLECTIONS
ENTITIES ARE USED TO PROVIDE CONTEXT, CONNECTIONS, AND RELATIONSHIPS BETWEEN OBJECTS IN THE REPOSITORY, SUPPORTING DISCOVERY AND COMPREHENSION OF THE CONTEXT
12. But with a correct balance: when you’re about to migrate…
• You could have processes that you would like to drop
• Customizations that affect your maintenance costs
• Metadata representing information that is no longer useful
• And processes… you’d like to add, or change
• New features that can substitute your old customizations
• Opportunity to add new information to your repository
14. How to enable entities during the upgrade: pt 2
This step/job may be slow!
15. How to enable entities during the “migration” from other platforms
Follow the DSpace documentation, YES but... How to import all the metadata, relationships and files?
• The SAF import could be an option (single records), BUT... you cannot set a relationship with not-yet-created entities: it is preferable to create all entities individually first, making sure to store a local.legacyid value for each
• Use the CSV Bulk edit (manually or automatically updated) to create the relationship(s)
Warning: CSV Bulk edit cannot manage ordering between entities and simple strings (e.g. the ordering of authors when only a few of them have a profile)
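The two-pass approach described above (create the entities first, then link them) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `build_relation_csv` helper, the sample record layout and the `relation.isAuthorOfPublication` column name are hypothetical choices for this sketch, and the exact columns accepted by your DSpace version should be checked against the batch metadata editing documentation.

```python
import csv
import io


def build_relation_csv(publications, legacy_to_uuid,
                       relation_column="relation.isAuthorOfPublication"):
    """Build CSV text that links publications to already-created Person entities.

    publications: list of dicts with 'legacy_id' and 'author_legacy_ids'
                  (a hypothetical layout for this sketch).
    legacy_to_uuid: maps each local.legacyid value to the UUID of the item
                    created in the new repository.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", relation_column])
    for pub in publications:
        # Authors without a created entity are skipped here: they would have
        # to stay as plain metadata strings.
        author_uuids = [legacy_to_uuid[a]
                        for a in pub["author_legacy_ids"]
                        if a in legacy_to_uuid]
        # Several values in one cell are joined with "||".
        writer.writerow([legacy_to_uuid[pub["legacy_id"]],
                         "||".join(author_uuids)])
    return out.getvalue()
```

The mapping from legacy IDs to new UUIDs is the piece that the stored local.legacyid value makes possible; how you export that mapping from the new repository is left open here.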
16. All of that is easier in DSpace-CRIS thanks to the possibility to use…
• Denormalized tables where you can prepare your data for import (like the CSV but on the database) → easier!
• Enhanced Bulk import from Excel instead of CSV (yes, it is a non-standard format but easier to work with, available for non-technical people → new lines can be created)
• Promise for future reference that will be resolved once the target item is created (e.g. you can say will be referenced:ORCID:XXXXX to create a relation with the item AUTHOR using person.identifier.orcid = XXXXX)
• You can manage files directly by providing a remote URL (no SAF process needed)
• Ordering between entities and strings is supported (the column with the specific relationship.type can be ordered by value/promise)
17. Not enough said, but…
Do not customize your DSpace database tables/structure, nor backport any feature that changes it
Why?
Because it could cause your automated database upgrade process to fail
Create new tables (instead of modifying existing ones)
ALREADY DID?
Consider replacing your additional contents (tables) → new entities enabled by DSpace 7
18. Yes, your institution has a lot of data
…and not all of it is visible in plain sight (as metadata of your items)
More data will emerge than you imagined
19. So…please keep this in mind
OAI Identifiers should be preserved. This is currently not supported without code change (we plan to generalize the solution and open a PR → DSpace 8)
OAI URLs should be preserved as well: redirection is (almost) good but you should check it at least with your known harvesters → Easy to do in Apache or nginx (light web server)
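When checking redirections with your known harvesters, it can help to compute the URL each old OAI request is expected to land on and compare it with what the web server actually returns. The sketch below only does the computation; the `/oai` and `/server/oai` prefixes and the `redirect_target` name are assumptions for illustration, so substitute the paths your own old and new installations expose.

```python
from urllib.parse import urlsplit, urlunsplit


def redirect_target(old_url, old_prefix="/oai", new_prefix="/server/oai"):
    """Return the URL an old OAI request should be redirected to, or None.

    The prefixes are assumptions for this sketch, not fixed DSpace values.
    The query string (verb, metadataPrefix, ...) is preserved unchanged.
    """
    parts = urlsplit(old_url)
    path = parts.path
    # Only rewrite the OAI endpoint itself, not paths that merely share
    # the same leading characters.
    if path != old_prefix and not path.startswith(old_prefix + "/"):
        return None
    new_path = new_prefix + path[len(old_prefix):]
    return urlunsplit((parts.scheme, parts.netloc, new_path,
                       parts.query, parts.fragment))
```

A list of old harvester URLs can be run through this function and the results compared with the `Location` header your Apache or nginx rule produces.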
Statistics can be migrated
Upgrade procedures, if followed, will result in a full migration of the data... not -really- deleted items / bitstreams are lost
When you migrate from another platform you can bulk import your statistics data directly into Solr via CSV. The data need to be prepared, so a local.legacyid metadata value will be crucial to translate your legacy IDs into the new ones
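The preparation step mentioned above, translating legacy IDs into the new ones before the CSV is loaded, could look roughly like this. It is a sketch under assumptions: the `translate_statistics` helper, the `id` column name and the sample columns are hypothetical, and the actual field names must match your Solr statistics schema.

```python
import csv
import io


def translate_statistics(rows_csv, legacy_to_uuid, id_column="id"):
    """Rewrite the ID column of a statistics CSV from legacy IDs to new UUIDs.

    Returns (translated_csv_text, skipped_legacy_ids). Rows whose legacy ID
    has no known counterpart are left out and reported instead of guessed.
    """
    reader = csv.DictReader(io.StringIO(rows_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    skipped = []
    for row in reader:
        new_id = legacy_to_uuid.get(row[id_column])
        if new_id is None:
            skipped.append(row[id_column])
            continue
        row[id_column] = new_id
        writer.writerow(row)
    return out.getvalue(), skipped
```

Reporting the skipped IDs rather than dropping them silently makes it easier to see which statistics would be lost and why.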
20. Step 1: PLAN - ask yourself all relevant questions
Make sure to sync your activities and preparatory/interdependent tasks...
Prepare a new, separate environment for DSpace 7
Do you use the Handle Server?
Do you mint DOIs?
Integration: what application extracts data from DSpace? What application pushes data into DSpace? Using which technology: SWORD, REST API? How much time will 3rd parties need to switch from the old integration to the new one?
Plan to put your repository in READ ONLY mode for enough time to perform the final migration
Prepare your UATs, which should take into account your customizations, configurations and top-priority functionalities
You need to run the migration at least two times, and usually you cannot afford to have your current repository locked down for a long period
This means that the two runs will use slightly different data!
Even if the repository is in READ ONLY mode, some data are still being produced... Statistics will grow!
21. Step 2: DO
• Verify the timing of execution/import/indexing during this phase: you’ll benefit from these measurements for the final migration
• Remember to keep track of all of your steps (you’ll have to repeat them exactly for the final migration)
• Do your first test migration
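One lightweight way to keep both the list of steps and their timing is to drive the test migration from a small script that records each step as it runs. The following is a minimal sketch, not a migration tool: `run_steps` and the step names are hypothetical, and each action stands in for whatever command you actually execute.

```python
import time


def run_steps(steps, clock=time.monotonic):
    """Run (name, callable) steps in order and record how long each one took.

    The returned log doubles as the checklist to repeat, in the same order,
    for the final migration.
    """
    log = []
    for name, action in steps:
        start = clock()
        action()
        log.append({"step": name, "seconds": clock() - start})
    return log
```

Saving the returned log after the first run gives you both the exact sequence to replay and a realistic estimate of the read-only window you will need.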
22. Step 3: CHECK
Perform UATs to validate and flag possible issues (and the related fixes you applied)
If you notice something wrong that was not covered by UATs, you should not ignore it: UATs should be amended to cover that path
Verify that the timing of the first migration allows you to meet the deadlines you were expecting
Verify which tasks could be optimized/reviewed
Check data integrity: run the checksum checker (fixed by 4Science in 7.6)
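Alongside the built-in checksum checker, an independent cross-check of exported files against checksums recorded in the old system can be sketched as below. This does not reproduce the DSpace checker: `md5_of` and `find_mismatches` are hypothetical helpers, and the use of MD5 and of a path-to-checksum mapping are assumptions about how your old system exported its data.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """Return the paths that are missing or whose checksum differs.

    expected: dict mapping file path -> checksum recorded in the old system
              (MD5 is an assumption of this sketch).
    """
    problems = []
    for path, checksum in expected.items():
        if not Path(path).is_file() or md5_of(path) != checksum.lower():
            problems.append(path)
    return problems
```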
Temporarily disable indexing during intermediate milestones/steps to save some time… (…but be careful of the interdependencies in further steps and keep in mind that you’ll have to run a full indexing when needed)
About the automatic initial reindexing: it is not recommended to skip it, unless you will manually reindex at a later time, or verify that a reindexing is not necessary. Forgetting to reindex your site after an upgrade may result in unexpected errors or instabilities
23. Step 4: FINALIZE
Put your current production environment in read-only mode before performing the final deployment
Alert the partners responsible for integrated systems that the system is frozen
Extract your data from your frozen repository
Re-run the steps that you successfully ran during the first test migration: even small differences may lead to unexpected issues
Run the UAT books: if everything goes smoothly, make the final switch into production
24. DOs
• Give notice to your partners that they can resume ordinary activities on their 3rd party systems
• Move your handle server to your new environment
• Enable all of your crontab jobs
• Update ALL of your URLs to match the ones in production
25. More pitfalls and solutions we adopted with experience
…from DSpace 5, DSpace 6, EPrints, Digital Commons, OPUS, Invenio…
26. UATs, the world where the obvious is certainly not – guidelines
A plan should be prepared and followed methodically to test and verify consistency between the old system and the new one. A few examples:
1. How many items were visible in the old system? How many in the new one?
2. How many items were present in the users' workspace? How many in the new system?
3. Same for workflows: how many in the various steps, how many assigned to the various users?
4. Are any items restricted or embargoed? Are restrictions migrated correctly and working?
5. Are all protocols used by 3rd party systems enabled (SWORD? Legacy REST…)?
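The count-based questions above lend themselves to a simple automated comparison once the numbers have been collected from both systems. The sketch below assumes you have already gathered the counts yourself (by SQL, API or UI); `compare_counts` and the check names in the usage note are hypothetical.

```python
def compare_counts(old_counts, new_counts):
    """Return {check: (old, new)} for every check whose values differ.

    A check that exists on only one side is reported with None on the other,
    so a forgotten measurement is as visible as a mismatch.
    """
    differences = {}
    for check in sorted(set(old_counts) | set(new_counts)):
        old_value = old_counts.get(check)
        new_value = new_counts.get(check)
        if old_value != new_value:
            differences[check] = (old_value, new_value)
    return differences
```

For example, passing `{"visible_items": 1200, "workspace_items": 35}` for the old system and `{"visible_items": 1200, "workspace_items": 34}` for the new one would flag only `workspace_items`, which is then a concrete item to investigate in the UAT report.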
27. Time spent in UATs is very well-spent
Through these cross-checks we had the opportunity to discover inconsistencies between the database and UI of older versions of DSpace:
• even fixing the problem in the new version did not always coincide with the user's desires (e.g., items previously not visible by mistake becoming visible in the new version and vice versa).
28. Fun facts and unapparent trivia
Thumbnails in the new DSpace 7 are now larger than in the old versions. We learned that importing the old ones would compromise the layout.
This resulted in the discovery of the century: all thumbnails had to be… regenerated.
4Science contributed the fix for the regeneration of the thumbnails ☺
The moral: consider every possible interaction!
29. Fun facts and unapparent trivia
Most viewed item? OH YES PLEASE.
…but the item in the new version turned out to be different from the item in the old version. Why?
Because slightly different rules had simply been applied, which led to a different result.
One can never be too cautious: watch out for inconsistencies and rule changes, even minimal ones.
30. What about DSpace 8?
• DSpace 8 is expected to go live in the spring/summer of 2024
• It will not be a major change like DSpace 7 was
Should I upgrade to DSpace 7 or wait for DSpace 8 to be released?
• We suggest cautiously migrating/upgrading to the most stable version at the moment of the release, assessing what is better for your institution
• The upgrade from DSpace 7 to DSpace 8 will not require as big an effort as the upgrade from DSpace 5 / 6 to 7
• Institutions upgrading from DSpace 7 to DSpace 8 will enjoy features already implemented in DSpace-CRIS 7, e.g. Notify protocol (contributed by 4Science + Harvard), Correction service to enhance data quality (4Science), Duplicate detection (ported by TLC from our implementation in DSpace-CRIS)
31. Be sure to check every minimal step and take careful note of it.
Time spent in analysis and double-checks is really well spent
We, at 4Science, would love to put our expertise at your service on behalf of the entire community.
Contact us at: info@4Science.com
Visit our website: www.4science.com
Follow us on social media!
4Science International 4ScienceDSpace
4ScienceIT
4Science
Join the 4Science newsletter to keep up to date with news about our contributions to DSpace and much more!