Technical Data Migration in Six Steps
   by
   Wael O. Elrifai, July 2011




     London - New York - Dubai - Hong Kong - Mumbai   2011
Abstract

Data migration is a field that is often dismissed as a simple step between retiring an existing
platform and bringing a new one to life. The reality is that we are constantly migrating data between
systems, implementing new applications, setting up co-existence scenarios and sharing information
in ways that could be thought of as data migration.

The actual task of data migration is a complex one, requiring experience and a detailed
understanding of the steps to be taken, both in terms of business process and technical
implementation. This includes a clear delineation of responsibility between IT and the business,
since far too often the job of manipulating data is relegated to IT, well outside of their core
competency. Enabling the migration requires a team experienced in abstracting the technical details
such that the business team can implement their rules for migration without ever writing a single
line of code; the coding itself remains a task for IT.

This document focuses on technical architecture rather than process. We describe an architecture
that we’ve found to be ideal in dozens of large data migration scenarios at organizations ranging
from global Fortune 500 companies to local small businesses. This architecture is portable between
different software toolsets; we’ve implemented it in hand-coded systems as well as open source and
best-of-breed ETL/DQ solutions.

This document is aimed at anyone who’s about to embark on a data migration task, or who is trying
to rescue an existing project that is “off-plan”. To gain the most from this document, you need only
a rudimentary understanding of the terminology of data handling: terms such as ETL (Extract,
Transform, and Load) and Reference Data (e.g. ISO currency codes, lists of possible post codes).




Introduction

The Peak solution for data migration involves a 6-stage process designed to isolate and validate
elemental data changes at the lowest level, hierarchy changes, referential integrity checks, duplicate
checks and metadata capture in distinct steps, so that each can be easily tracked and reversed and a
complete lineage exists for audit purposes. The diagram describes the process at a high level; a more
detailed description can be found below. Please bear in mind that a “stage” refers to the staging
area and the preceding technical ETL/DQ process between the previous stage and that stage. This
document is by no means a complete description of the systems and processes required, but it is a
good indicator of the technical elements and requirements of a good data migration system.

This architecture is not tied to a particular ETL tool, but Peak does recommend the use of tools to aid
re-usability and visibility of process. Some tools are better suited to the “pivot” tasks required at
certain points, though all major commercial and open source ETL tools can provide the necessary
functionality. No data is deleted at any point within the stages; any data that does not pass a
particular stage is excluded via the error logging mechanism, with a “warning” or “error” flag attached
to the record by Stage and Surrogate ID (SID).

Stage 1: Data is read directly from source systems in the native code page and loaded into database
staging areas in a format and ER model that exactly matches the source systems, whether they are
ERPs, bespoke systems, flat files, or semi-structured data. Tracking metadata is added to each record
in the form of a surrogate ID (SID), along with a timestamp, the source system, and other attributes
as appropriate for the system.

Stage 2: Data is moved from the Stage 1 area to the Stage 2 area. Only the subset of columns
considered necessary in the data map to the target systems is moved, including any fields needed to
perform calculations but not necessarily destined to be loaded to the target, as well as all metadata.
SIDs are maintained to provide easy “back tracking”. Any filtering rules that remove data known to be
irrelevant to the migration are applied. Duplicates are removed as deemed appropriate (e.g. duplicate
customer master records might be removed while transactions are persisted). Data is still maintained
in the entity-relationship model of the source systems, less the columns that have been deemed
extraneous. All actions are logged in the audit and error data model, with error data being excluded
from the target.
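
The Stage 2 behaviour can be sketched as a filter over the Stage 1 output. The column names, the
“out of scope” rule, and the duplicate key below are hypothetical examples; what matters is that
excluded records go to the shared audit/error log rather than being deleted.

    # Stage 2 sketch: keep only mapped columns plus metadata, filter out records
    # known to be out of scope, and exclude duplicates via the error log.
    # Column names and rules are illustrative assumptions.
    MAPPED_COLUMNS = {"CUSTOMER_ID", "NAME", "COUNTRY", "SID", "LOAD_TS", "SOURCE_SYSTEM"}

    error_log = []   # shared audit/error structure: (stage, SID, severity, message)

    def stage2_filter(stage1_rows):
        seen_keys = set()
        stage2_rows = []
        for row in stage1_rows:
            if row.get("STATUS") == "TEMPLATE":                      # example filter rule
                error_log.append((2, row["SID"], "warning", "out of scope"))
                continue
            key = (row.get("CUSTOMER_ID"), row.get("SOURCE_SYSTEM"))
            if key in seen_keys:                                     # duplicate master record
                error_log.append((2, row["SID"], "warning", "duplicate excluded"))
                continue
            seen_keys.add(key)
            stage2_rows.append({k: v for k, v in row.items() if k in MAPPED_COLUMNS})
        return stage2_rows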

Stage 3: Data is mapped from the old entity-relationship model to the new one, metadata is
transferred, SIDs maintained, and all actions logged while error records are excluded from the
target. It is important to note that no data translations or calculations have taken place yet. The data
is in the target ER model but not necessarily in the target data formats. All data types are defined as
VARCHAR (MAX LENGTH) to avoid rejecting any source data.
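
A sketch of the Stage 3 remapping, assuming the hypothetical source and target column names
below. Every value is carried as a string, mirroring the VARCHAR (MAX LENGTH) rule, so no record
is rejected at this point.

    # Stage 3 sketch: rename/re-shape into the target ER model while keeping every
    # value as a string, so nothing is rejected yet. The mapping is illustrative.
    COLUMN_MAP = {          # source column -> target column
        "CUSTOMER_ID": "PARTNER_ID",
        "NAME": "PARTNER_NAME",
        "COUNTRY": "COUNTRY_CODE",
    }

    def stage3_remap(stage2_rows):
        stage3_rows = []
        for row in stage2_rows:
            mapped = {tgt: str(row.get(src, "")) for src, tgt in COLUMN_MAP.items()}
            # metadata and the SID travel with the record for lineage
            mapped.update({"SID": row["SID"], "SOURCE_SYSTEM": row["SOURCE_SYSTEM"]})
            stage3_rows.append(mapped)
        return stage3_rows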

Stage 4: Data is translated to meet target system rules, implement functional requirements, and is
checked against target system reference data. All metadata and SIDs are maintained and all actions
are logged, both audit and error information. A second duplicate check is performed here to account
for “new” duplicates created by data translation; of course, the concept of a duplicate is one requiring
both a business and a technical definition. The target data model, including data types, has now been
put in place, and a column-by-column check is enforced to ensure that no data violates the datatype
constraints in the target system. Error records are excluded from the target and logged.
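
A sketch of the Stage 4 column-by-column check, with an illustrative target schema and reference
data set (the real datatype constraints and reference data come from the target system). Records
with a non-empty error list would be excluded and logged, not deleted.

    # Stage 4 sketch: enforce target datatypes and reference data column by column.
    # The target schema and the country reference list are illustrative assumptions.
    TARGET_SCHEMA = {
        "PARTNER_ID":   {"type": int, "max_len": 10},
        "PARTNER_NAME": {"type": str, "max_len": 40},
        "COUNTRY_CODE": {"type": str, "max_len": 2},
    }
    VALID_COUNTRIES = {"GB", "US", "AE", "HK", "IN"}     # target reference data (example)

    def stage4_validate(row):
        """Return a list of errors; an empty list means the record passes."""
        errors = []
        for col, rule in TARGET_SCHEMA.items():
            value = row.get(col, "")
            if len(str(value)) > rule["max_len"]:
                errors.append(f"{col}: exceeds max length {rule['max_len']}")
            if rule["type"] is int and not str(value).isdigit():
                errors.append(f"{col}: not a valid integer")
        if row.get("COUNTRY_CODE") not in VALID_COUNTRIES:
            errors.append("COUNTRY_CODE: not in target reference data")
        return errors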

“Dictionary” translations, those translations where a given input from a given source system
translates to a particular output (e.g. “Source System A” uses a 3-character country code while
“Target System X” uses a 2-character one), will have the code abstracted such that the translation is
stored in a value mapping table. These tables can be easily maintained by the business users via a
spreadsheet that gets loaded into the value mapping table. This ensures that the business retains
responsibility for the data translation while the development team is responsible for correctly coding
the abstract rule.
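
A sketch of how such a dictionary translation might be kept in data rather than code. The file name
and column headings of the business-maintained mapping file are assumptions for illustration;
unmapped values are routed to the error log by stage and SID.

    import csv

    # Value mapping sketch: dictionary translations live in data, not in code.
    # A business-maintained spreadsheet (exported to CSV here) is loaded into a lookup table.
    def load_value_map(path):
        """Load a two-column mapping file: SOURCE_VALUE, TARGET_VALUE."""
        with open(path, newline="", encoding="utf-8") as f:
            return {r["SOURCE_VALUE"]: r["TARGET_VALUE"] for r in csv.DictReader(f)}

    country_map = load_value_map("country_code_map.csv")   # e.g. "GBR" -> "GB"

    def translate_country(code, sid, error_log):
        if code in country_map:
            return country_map[code]
        error_log.append((4, sid, "error", f"no mapping for country code {code}"))
        return None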

Stage 5: Referential integrity checks are performed to ensure that the data structures are
completely ready to be written to the target system. Audit and error data is logged and metadata is
maintained. At this point the data should be error free: any “child” records that did not pass error
checks have been excluded, and any parent records that did not pass error checks have been
excluded along with their related child records, so that orphan records cannot be written to the
target system. Alteration of the parent-child exclusion rules is possible but heavily dependent on the
nature of the target systems (e.g. SAP will not allow creation of a parent-child hierarchy if the parent
and child legal entities don’t already exist in the system).
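
A sketch of the Stage 5 orphan exclusion, assuming simple parent and child record shapes. Children
whose parent has been excluded (or never existed) are themselves excluded via the error log, so no
orphan can reach the target.

    # Stage 5 sketch: exclude children whose parent has been excluded so that no
    # orphan records can reach the target. Record shapes are illustrative.
    def stage5_referential_check(parents, children, error_log):
        valid_parent_ids = {p["PARTNER_ID"] for p in parents}
        loadable_children = []
        for child in children:
            if child["PARENT_ID"] not in valid_parent_ids:
                error_log.append((5, child["SID"], "error",
                                  "orphan: parent excluded or missing"))
                continue                        # excluded via the log, not deleted
            loadable_children.append(child)
        return parents, loadable_children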

Stage 6: This stage involves invoking any APIs necessary and passing data in the structure required by
those methods. It also includes writing data directly in cases requiring a direct write rather than API,
IDoc, or similar calls.

Reconciliation: The stage 6 reconciliation is performed by reading the data structures from the
target system and comparing backwards to Stage 5 data, performing logical DELETEs against exact
matches.
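
A sketch of the reconciliation step, assuming the illustrative target field names from the earlier
sketches. Loaded records are matched back to Stage 5 on an exact-value key and flagged with a
logical DELETE; anything left unmatched remains visible for follow-up.

    # Reconciliation sketch: read the loaded records back from the target and mark
    # matching Stage 5 records as reconciled (a logical DELETE flag). Field names
    # are assumptions carried over from the earlier sketches.
    def reconcile(stage5_rows, target_rows):
        target_keys = {(t["PARTNER_ID"], t["PARTNER_NAME"], t["COUNTRY_CODE"])
                       for t in target_rows}
        for row in stage5_rows:
            key = (row["PARTNER_ID"], row["PARTNER_NAME"], row["COUNTRY_CODE"])
            row["RECONCILED"] = key in target_keys       # logical DELETE, not physical
        return [r for r in stage5_rows if not r["RECONCILED"]]   # residual exceptions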

Fault, fix, and follow-through: Using our internally developed audit and error data model along with
the technical steps detailed here provides a single view of all excluded data, including why and where
it was excluded, and also exposes a simple interface in which a business user can change the data
(with further audit information logged, of course). This system can be built to log exclusions based on
minor differences (e.g. variations of ‘-‘ vs. ‘/’ in dates) or to flag only major failures (e.g. variations
in transaction values). Ultimately the system allows for “fixing” data in situ and “re-migrating” the
data to the new system.
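
A sketch of the “single view” idea, assuming the simple (stage, SID, severity, message) error-log
structure used in the earlier sketches: one call returns the full exclusion history for a record.

    # Fault-and-fix sketch: everything excluded for one source record, built from
    # the shared error log keyed by SID (structure assumed in the sketches above).
    def exclusion_history(sid, error_log):
        """Return (stage, severity, message) entries for one record, ordered by stage."""
        return sorted((stage, severity, message)
                      for stage, rec_sid, severity, message in error_log
                      if rec_sid == sid)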

This system is meant to support test data migrations such that all rules can be verified and enforced
with the goal of ZERO errors in the stages by the time of the final cutover.




Added value

The occasion of a data migration often involves work that can be used to support other initiatives,
bringing new value to the enterprise.

       Master Data Management
       As part of the data and application migration, it is often necessary to capture and validate
       various “master data” including:

              • Customer data

              • Product data

              • Reference data

              • Value mappings for translation: “dictionary” translations as described in Stage 4.

       Having developed the data model and captured the requisite information, the possibility of
       implementing a full-featured Master Data Hub can be considered. Master data could be
       created, maintained, or simply exposed via this central repository, with all systems
       synchronized in real-time or batch modes to this Single Version of Truth.

       Co-existence
       At the time of writing the translations and transformations of data and hierarchies in the
       migration work stream, “reverse functions” are written for those transformations that are
       easily reversed (e.g. VAT-added calculations). Other, not easily reversible transformations
       (e.g. concatenations) are “reversed” by writing the rule as though it were to support a
       migration from the target system(s) to the source system(s), with the added benefit of being
       able to programmatically examine the audit trails. Because all functions are written in an
       object-oriented manner, they are easily re-used to support a co-existence scenario of legacy
       and new systems, replicating data between those systems on a real-time or batch basis as
       appropriate. These transformations can be configured to be called via an Enterprise Service
       Bus (ESB) or other methods, as required.

       Archiving
       During any data migration, business requirements should be captured around the availability
       of data: what data should be available under which SLAs, in order to ensure that the
       operational system continues to operate at maximum efficiency. A typical plan might envision
       segregating former customers to a secondary system outside of the main physical database
       (for performance reasons) 12 months after they terminate a client relationship. The same
       could be true of older transactions and other data. The ideal technical system would have
       the ability to treat data in an “object oriented” form, archiving off such “objects” as a
       customer, automatically capturing all information related to that customer and moving it to
       the secondary system. The data can still be interrogated but is no longer a cause of
       performance degradation as data volumes expand. This tactic ensures that the sizing of
       databases, and therefore software and hardware requirements, can be easily predicted; it
       varies only insofar as operational behaviors change or the client base expands, at which time
       additional hardware or software may become necessary. This function can also support a full
       Information Lifecycle Management system that includes creation of data for system testing,
       with features such as object movement (with referential integrity maintained) and data
       masking for security. Additional work required to support this would include creation of a
       security model that defines how datasets can be created, who can view unmasked data (at
       the branch/column/customer/etc. level), and how the datasets can be extracted or moved.

       There are existing software products that can best leverage the work that will have already
       been performed to support the Information Lifecycle Management system.




Getting Started

Our experience shows that if a well-defined business process is implemented, along with our
technical steps as described above, you can significantly reduce the cost of your data migration and
eliminate many sources of risk, especially when that migration is repetitive in nature (multiple
countries to be migrated independently over a number of months or years to a new system). A typical
cost curve levels out at 30% cost per migration after the second migration. Furthermore, you can
leverage the artifacts created to gain new value as described in “Added Value” above.

You can contact me directly via email at WAEL@peakconsulting.eu or via our website at
www.peakconsulting.eu to learn more about how Peak Consulting can help you successfully
complete your data migration or other “big data” initiative.




Diagram 1: Data Migration Architecture


