MDM AS A METHODOLOGY
Written by Janet Wetter, an Enterprise Information Architect
Submitted by:
POB 247
Owenton, KY 40359
Tel: 502-514-3969
www.wetterlands.com
The Reason
Small tweaks and incremental investments in data accessibility and intelligence can
have big returns in revenues, growth and innovation, according to findings from a study
of Fortune 1000 companies released today. Touting millions of dollars in added revenue
possibilities and avenues of business, the study – entitled “Measuring the Business
Impacts of Effective Data” – was conducted by Sybase, an SAP company, and the
University of Texas, in conjunction with the Indian School of Business. It reviewed large,
global corporations and surveyed their positions and plans for data management
systems.
When data accessibility and intelligence are increased by 10 percent, revenue
generated from new customers grows by 0.7 percent, or, as applied to the median
organization in the study sample, $14.7 million. As for revenue from new products
and services – a common measure of business innovation – a 10-percent increase in
data accessibility and intelligence registered a 0.81-percent increase in
revenue, which, for the median organization in the study – $17 billion in annual revenues
with $2.1 billion coming from new products and services – meant an additional $17 million
for the year.
Anitesh Barua, lead researcher with the University of Texas, says the findings on
additional dedication to data should spark a dollars-and-cents interest among top-level
managers and executives as well as IT officers and CIOs.
“We’re not suggesting it’s low-hanging fruit. It’s almost a mindset change,” Barua says.
Companies finding gains from investment in data – or not taking advantage of the
possibilities – cross all business sectors. In particular, the petroleum market has the
most to gain from enhancing its data quality, with a 14-fold increase in revenues from
even a slight jump in company-wide, usable data, according to the study.
Referencing the fiscal advantages Charles Schwab gained from changes to employee
data use and understanding following the 2001 economic downturn, Barua
said that corporations stand to make even greater gains with improvements and
investment in data at all levels.
“We’re saying at the end of the day, it’s still a lot of data and we’re drowning in
transaction-level data. But, if you can make that meaningful – not just accurate and
timely – you can reap a lot of benefits,” he says.
Still, the importance of usable, clear data is largely relegated to IT departments, the
researchers concluded. For example, only a “lowly” 3.7 percent of large companies
reported that investing in data accessibility and intelligence was part of their
development of revenue streams and method of acquiring new customers, Barua says,
alluding to resistance to, or ignorance of, data quality and advancing technology at many
successful businesses.
These findings were the second release in a three-part installment quantifying the
relationship between effective data and a company’s performance. The first installment,
released in July, reviewed the connections between incremental investments in data and
the key financial indicators of an enterprise’s health and profitability. Later this fall,
researchers will release the final portion of findings on the operational impacts of
effective data with a focus on improving the accuracy of planning and forecasting.
The above article was written by Justin Kern who is an associate editor at Information
Management Magazine.
The DATA MANAGEMENT (Technical) Methodology is one of four pyramids in the
MDM DPGA Framework Product. The other three pyramids are the INFORMATION
GOVERNANCE (Governance), BUSINESS PROCESS (Business) & SYSTEM
ARCHITECTURE (Operational) pyramids which relate to each other across the six
layers of each pyramid to provide a complete view of the information environment. The
INFORMATION GOVERNANCE Methodology reviews and approves the artifacts from
the DATA MANAGEMENT Methodology.
The History
MDM is a lot of things to a lot of people and vendors and currently has no single
definition. MDM stands for Master Data Management and has its roots in creating global
reference tables of master data used within a company. Database Administrators are
credited with developing the idea because each application they backed up had
repeated reference tables that spanned the company, hence global reference tables.
And, since the DBA backup window kept shrinking, consolidating these identical
application reference tables into a single instance for backup became a
priority. Creating a database of global reference tables for common usage became
known as Master Data Management – MDM.
The foundations for Data Governance also sprang from the development of MDM due to
the combining of the many application reference tables of the same origin. Identical
reference tables with common data values were often tweaked by the application so
that the set of data values differed but the common master reference table needed a
single set of data values everyone agreed on. The owners of all the applications these
values were consolidated from had to decide on that common set of data values together.
Hence the beginning of Data Stewardship: deriving a common set of data values for a
given master reference table.
Since this idea worked so well with common master reference tables and, since a
process was in place to resolve conflicts for determining common data values, people
began applying it to other conflicting data values like common customer information,
then common product information and so on. The data stewards soon needed a
repository to contain the flow of ever-increasing metadata from these consolidations, so
the Metadata Repository (Data Dictionary) was utilized. And, soon after, the MDM
Methodology for standardizing the process of capturing, resolving and recording the
metadata developed by the data stewards was created and the process of Data
Harmonization was born.
The above-described methodology was in its infancy when Sarbanes-Oxley was
implemented and soon grew popular as a way to find the “TRUTH” in the multitudes of
company data for reporting. The Business Intelligence practices tried to use this
methodology with mixed results because the underlying consolidated data foundation
was in various stages of development across companies. The BI group needed the data
foundation to find the truth.
As MDM became popular, the software vendors were not far behind, including various
versions of “MDM” in their products. So the concept of MDM grew from its humble
origins to the buzzword of today, with many flavors, ideas and concepts to fit the vendor
tool it is implemented within. Some have a metadata repository with associated tools,
some use just the physical definition of profiling data and integration and a few
understand that a Data Management Life Cycle is involved containing all of the aspects
of MDM. A good definition is:
A Metadata Repository, a Data Governance Organization and a Data
Harmonization Process that spans both the application and enterprise levels of a
company at the conceptual, logical and physical layers of structured, semi-
structured and unstructured data became known as the MDM Framework –
MDMF. A bottom-up design with top-down validation.
The concept of a completely integrated agile framework for integrating data across an
enterprise is popular because it utilizes the company’s existing tool environment,
avoiding expensive software acquisitions. Below is a diagram identifying the components
of the MDM Framework and the integrated structure between the components.
The Mission
At the core of the MDMF is the Enterprise Data Model, which represents the six layers of
data within a company. A Data Management Strategy links the Enterprise Data
Model to both the Project Layer and the Global Enterprise Layer using standardized
Templates that drive the Data Management Life Cycles for both layers. The Templates
are the project artifacts (deliverables) utilized as collection devices for interfacing with
the business community, requirements conformance for IT personnel, project gate
reviews and for loading the evaluated and approved data into a Metadata Repository.
Using a pre-defined yet flexible MDM Framework standardizes data integration for a
client with a custom environment by quickly incorporating the existing System
Development Life Cycle (SDLC), tool set and Metadata Repository in whatever combination
is present in the client’s current environment, thereby avoiding new tool expenditures. The
MDMF is an intranet-based framework providing company-wide access for all to view
while providing a single place of reference for information management, including Data
Governance Decisions, Business Metadata and Project Data Gate Reviews.
UST has a well-defined Methodology to quickly implement the MDMF in a client
environment. It begins with a Road Mapping function that documents the client’s
Vision and creates a prioritized Data Integration List covering the scope of the
engagement. The Data Integration List is then used as the input for the Project Data
Management Life Cycle to execute the Template Artifacts associated with the
engagement. The Template Artifacts are the input for the Data Harmonization process
where the Master Reference Tables and Base Data Elements and Entities are derived.
Existing Master Reference Tables and Base Data Elements are mapped to the current
Data Governance baseline of data using Source to Target Mapping techniques. Newly
derived Master Reference Tables and Base Data Elements and Entities are then fed
into the Data Governance Process, along with any metadata changes made to
existing published metadata in the Enterprise Metadata Repository.
As the Template Artifacts are approved through Gate Reviews, the information is loaded into
the Metadata Repository and reports are published to the MDMF. Project Management is
enhanced by the MDMF published reports, which detail the metadata captured and
processed to date by any project. Additional reports from the Data Governance Group
are also published for easy access to the most current information on approved Master
Reference Tables and their data values and up-to-date information on the Base Data
Elements and Entities for use by the projects in Source to Target Mapping and in the
Data Harmonization Process.
The Core
The core of the MDMF is the Enterprise Data Model, detailed below with the six levels of
company data identified. The Enterprise Data Model is a well-known industry-standard
model of five layers, enhanced here by adding one additional layer to accommodate the
requirements of MDM artifacts in the MDM Framework. Below is a diagram depicting
the six layers of the Enterprise Data Model:
The Enterprise Data Model represents all of the company information formatted into the
Conceptual, Logical and Physical data that a company acquires. The three types of data
are structured, semi-structured and unstructured information that is used to operate the
company business. The top three layers are the conceptual artifacts. The fourth and fifth
layers are the logical artifacts and the last, or sixth, layer is the physical layer where the
three types of data exist.
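The layering above can be sketched as a simple enumeration. This is an illustration only, not part of the framework itself; the names and groupings are taken from the layer descriptions that follow.

```python
from enum import Enum

class Layer(Enum):
    """The six layers of the Enterprise Data Model, grouped by artifact type."""
    ESAM = (1, "conceptual")   # Enterprise Subject Area Model
    ECM  = (2, "conceptual")   # Enterprise Conceptual Data Model
    EDOM = (3, "conceptual")   # Enterprise Data Object Model (the MDM layer)
    LDM  = (4, "logical")      # Logical Data Model
    LTM  = (5, "logical")      # Logical Transformation Model
    PDM  = (6, "physical")     # Physical Data Model (DDL, semi- and unstructured data)

# The top three layers hold the conceptual artifacts.
conceptual = [layer.name for layer in Layer if layer.value[1] == "conceptual"]
```

Iterating the enum in declaration order reproduces the top-to-bottom reading of the pyramid.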
The first layer is the Enterprise Subject Area Model (ESAM), which is a high-level
business view of the company data. It contains the major and sub-components of the
business domains and reflects the company structure or organization of data. Typical
business domains of data would be People, Organizations, Product, Facility, ERP
Functions (Ordering, Shipping and Billing of Products), Work Effort, HR and Accounting.
Industry specific Business Domains are added and customized for the individual
business type.
The second layer is the Enterprise Conceptual Data Model (ECM), which represents a
high-level view of the data in the company. It is similar to the Enterprise Subject Area
Model but it contains the relationships between the Data Domains, showing how the
business functions. This model is an entity-relationship view only and does not contain
data elements.
The Enterprise Data Object Model (EDOM) is the third layer of the Enterprise Data Model
and is an additional layer added to the traditional model for the MDM functions of a
company. The EDOM is the MDM area, which includes the Base Data Elements and
Base Entities, Global Reference Tables, the Enterprise Metadata Repository Model
and the Data Governance Organization structures and processes.
The fourth layer is the Logical Data Model (LDM), which is a specific business functional
view of a subset of the company data within the Enterprise Conceptual Model. It
contains a low-level normalized Entity Relationship diagram attributed with the Base
Data Elements or their alias equivalents. An Alias Data Element is a data element that
has been mapped to a Base Data Element through the Data Harmonization Process.
The Logical Transformation Model (LTM), the fifth layer, is created from the completed
Logical Data Model after deciding which DBMS is to be used in the physical
environment. This is the physicalization of the logical design, determining whether it will
remain in a normalized form or be transcribed into a dimensional model or other
structure. The intention of the Logical Transformation Model is to maximize performance
in the physical environment based on the best design for that environment and the
usage of the data contained within that design.
The final, sixth layer is the Physical Data Model (PDM), which is the Data Definition
Language (DDL) of the selected Database Management System (DBMS) containing
the code to implement the Logical Transformation design in the company environment.
This layer also contains the semi-structured and unstructured data within the company,
which can be mapped to a data structure in the LDM.
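To make the last step concrete, here is a minimal sketch of rendering DDL from a physicalized table design. The table name, column names and types are hypothetical, and a real generator would handle indexes, constraints and DBMS-specific storage clauses.

```python
def to_ddl(table: str, columns: dict, primary_key: str) -> str:
    """Render a minimal CREATE TABLE statement from a physicalized design.

    `columns` maps column name -> SQL type. This is a simplification of
    what a DBMS-specific tool would produce from the Logical
    Transformation Model.
    """
    cols = [f"  {name} {sqltype}" for name, sqltype in columns.items()]
    cols.append(f"  PRIMARY KEY ({primary_key})")
    return f"CREATE TABLE {table} (\n" + ",\n".join(cols) + "\n);"

# Hypothetical CUSTOMER table derived from the Logical Transformation Model.
ddl = to_ddl(
    "CUSTOMER",
    {"CUSTOMER_ID": "INTEGER",
     "FIRST_NAME": "VARCHAR(50)",
     "LAST_NAME": "VARCHAR(50)"},
    "CUSTOMER_ID",
)
```

The emitted string is the PDM artifact for this one table; running it against the chosen DBMS implements the design in the company environment.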
The Templates
The Templates are Excel spreadsheets designed to map the reviewed data into the
Metadata Repository. If designed correctly they can be uploaded directly into the
repository. They are used both as collection devices within the IT community and as an
interface with the Business users. Vendor data tools are not widespread throughout a
company, but everyone usually has access to Excel, so the Templates provide a common
foundation to share information. The Templates also provide a standardized format for a
System Development Life Cycle (SDLC) review process across all applications being
incorporated into the Enterprise Data Model.
There are several Templates possible but the four major templates common to all
projects are:
Conceptual Data Element Template is a standardized format to collect data and the
metadata associated with it at the individual data element level for inclusion in the
Enterprise Metadata Repository. The conceptual data element is the foundation and the
integration point for data design at the higher levels. The template is customized for
each client based on the standardized metadata requirements across all projects and
the Enterprise Metadata Repository being used by the client.
Logical Data Element Template collects the outcome of the Entity Relationship
Diagram process in a standardized format for inclusion in the Enterprise Metadata
Repository after the logical design process is completed.
Physical Data Element Template describes the table and column attributes of the
physical database design after the Logical Transformation Model is completed.
Database Structure Template includes the physical attributes at a table level to
complete the database design after the Logical Transformation Model. A design
review is required using the Physical and Database Structure Templates as input.
Additional Templates, such as Domain, Domain Values and Content media, are used
based on the Metadata Repository being used by the client.
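Because the Templates are meant to upload directly into the repository after review, a validation pass is a natural gate. The sketch below checks template rows for blank required metadata fields before loading; the column names and the required-field rule are assumptions for illustration, not the actual template layout.

```python
import csv
import io

# Hypothetical required metadata fields for a Conceptual Data Element Template.
REQUIRED = ["element_name", "definition", "data_type", "steward"]

def validate_rows(template_csv: str):
    """Split template rows into loadable records and error messages.

    A row is rejected if any required metadata field is blank, mirroring
    a gate-review check before the repository upload.
    """
    good, errors = [], []
    for i, row in enumerate(csv.DictReader(io.StringIO(template_csv)), start=2):
        missing = [f for f in REQUIRED if not (row.get(f) or "").strip()]
        if missing:
            errors.append(f"row {i}: missing {', '.join(missing)}")
        else:
            good.append(row)
    return good, errors

# A template exported to CSV; the second data row lacks a definition.
sample = """element_name,definition,data_type,steward
Person First Name,Given name of a person,Text,HR
Person Identifier,,Number,HR
"""
good, errors = validate_rows(sample)
```

Only the `good` rows would be loaded; the `errors` list goes back to the template author, keeping the repository clean.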
The Strategy
The MDM Strategy ties together the Enterprise Data Model and the two levels of
Methodology (Project & Enterprise Data Management Life Cycles, discussed in the next
section) through the Templates. Below is a diagram of that relationship.
Information is usually collected through the project level where a new application or COTS
product is being incorporated into the Enterprise layer. The Conceptual Data Element
Template can also be used to collect new data elements to be considered by the Data
Governance process for inclusion as Base Data Elements. Most projects have an
existing database which needs to be reverse engineered into a logical format and added
to a Conceptual Data Element Template to be processed through the Project Data
Management Life Cycle Methodology. Proceeding through the steps of the methodology will
define a new Logical Data Model that is mapped to the existing Enterprise Data Model
using the Data Harmonization process. Changes or enhancements to the Logical Data
Model can be made at this point in a project.
The Conceptual Data Element Template is the bridge between the Project Data
Management Life Cycle and the Enterprise Data Management Life Cycle. Collected
project data elements that were not mapped as an alias into the existing Base Data
Elements in the Metadata Repository are then sent to the Data Governance process for
consideration as new Base Data Elements. Updates to existing Base Data Element
metadata can also be forwarded to the Data Governance process this way.
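The bridging step described above amounts to a partition: collected project elements that resolve to an existing Base Data Element are recorded as aliases, and the rest are queued for Data Governance. The matching rule below (case-insensitive name lookup) is a placeholder for the real Data Harmonization comparison.

```python
def partition_elements(project_elements, base_elements):
    """Map project elements to Base Data Elements; queue the rest for governance."""
    base_index = {b.lower(): b for b in base_elements}
    aliases, to_governance = {}, []
    for elem in project_elements:
        base = base_index.get(elem.lower())
        if base is not None:
            aliases[elem] = base          # recorded as an alias mapping
        else:
            to_governance.append(elem)    # candidate new Base Data Element
    return aliases, to_governance

# Hypothetical project elements against a small set of approved Base Data Elements.
aliases, pending = partition_elements(
    ["person first name", "Loyalty Tier"],
    {"Person First Name", "Person Last Name"},
)
```

The alias mappings flow into the Metadata Repository; the pending list is what the Data Stewards see.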
The Data Harmonization Process
Data Harmonization is a recurring process within the Data Management Life Cycle at
the project level which is managed by the Data Architect. Data Harmonization is also
the bridge between the project layer and the enterprise layer of the Data Management
Life Cycle, which is managed by the Data Governance organization. The collection,
cleansing and comparing of Data Elements between both layers is recorded in an
Enterprise Metadata Repository, maintaining the original data source to target mappings
and resulting transformation rules so impact analysis can be performed across the
enterprise.
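Because the repository keeps the original source-to-target mappings, impact analysis reduces to a lookup: given a source element, find every target it feeds. A minimal sketch, with hypothetical element names and transformation rules:

```python
# Each mapping records (source element, target element, transformation rule),
# as captured during Data Harmonization and kept in the Metadata Repository.
mappings = [
    ("CRM.cust_fname", "Person First Name", "trim; title-case"),
    ("ERP.first_nm", "Person First Name", "trim"),
    ("CRM.cust_fname", "Customer First Name", "copy"),
]

def impact_of(source: str):
    """Return every target element affected by a change to `source`."""
    return [target for src, target, _rule in mappings if src == source]

affected = impact_of("CRM.cust_fname")
```

A change to the hypothetical `CRM.cust_fname` column immediately surfaces both downstream elements, which is the enterprise-wide impact analysis the paragraph describes.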
Example of Base Data Elements where all business rules are removed:
Entity Person:
Person First Name Text
Person Last Name Text
Person Identifier
Two examples of the logical data elements derived from the Base Data Elements when
a business rule is associated with it:
Entity Customer: Business Rule – Person is Customer
Customer First Name Text
Customer Last Name Text
Customer Identifier
OR Entity Employee: Business Rule – Person is Employee
Employee First Name Text
Employee Last Name Text
Employee Identifier
The Methodology
The Project Data Management Life Cycle (PDMLC) drives the Enterprise Data
Management Life Cycle (EDMLC) through the Data Harmonization process, and the
Conceptual Data Element Template is the link between the two methodologies. The
PDMLC is a standard data management process for a project creating a new database
or enhancing an existing one. The difference is the addition of the Data Harmonization
process, which utilizes the Templates to communicate information to the EDMLC in a
standardized fashion using the Data Governance process.
The PDMLC is open source and the System Development Life Cycle (SDLC) can be of
the client’s choosing. Likewise, the Enterprise Metadata Repository is chosen by the
client, and the existing tool set at the client site is incorporated into the PDMLC
tasks to produce the project artifacts. The flexibility of the MDMF helps the client contain
cost by avoiding new software expenditures while providing the advantages of
integrating the company data.
Below is the diagram for the PDMLC:
The number of PDM Phases is adjusted by the client’s choice of SDLC.
The EDMLC is driven by the PDMLC’s Data Harmonization process, which maps the
project data elements collected on the Conceptual Data Element Template to the
existing Base Data Elements approved by the Data Stewards in the Metadata
Repository. Project data elements that are not mappable are then forwarded to the Data
Stewards to be processed through the Data Governance process as candidate new Base
Data Elements. Below is the PDMLC diagram depicting the enterprise tasks for MDM data
integration:
These tasks align the Enterprise Data Object Model, the Enterprise Conceptual Data
Model and the Enterprise Metadata Repository using the predefined methodology. The
official version of the results from the Data Governance process can be posted for
employee knowledge throughout the company via the company intranet MDMF interface,
providing an up-to-date version of the currently approved metadata. Other projects can
immediately take advantage of these results and build on them.
The Case Studies
Two case studies are documented below. Both involve large data integration efforts,
but this methodology can be used for an effort of any size.
The first involves a project for US Customs (Department of Homeland Security) to
integrate 24 diverse government agencies across the trade domain, creating a web-based
application (ACE) that allows a single user interface for the import and export of
all trade goods in the USA.
A data dictionary was created for each of the 24 agencies using Data Harmonization for
all of the agency applications that involved trade data. As each data dictionary was
completed for an agency, it was then integrated into the Metadata Repository using Data
Harmonization and a Data Governance process. After all data was compiled for the 24
agencies, the total of initial data elements was over twenty-two thousand. The Data
Harmonization process reduced that number to fewer than twelve hundred Base Data
Elements, which were added to the IBM InfoSphere Data Architect Metadata Repository
to provide the basis for the SAP Trade Data Module.
The second case study involves the US Air Force Research Labs (HQ WPAFB in
Dayton, OH). Over time many common applications became customized at the 10
directorates under the Research Labs umbrella. HQ needed a common version and a
single data model for all of the directorates to input their data for reporting. A common
data dictionary was created using Data Harmonization and a Data Governance process,
which provided the data elements for the unified data model and the original source to
target mappings for each directorate. The initial effort was so successful that the MDMF
became the common procedure for the Research Labs for all home-grown and vendor
applications, providing a common data framework across all Research Lab units and
external interfaces.
The Advantage
The MDMF provides a low cost solution to integrating company data in a standardized
repeatable fashion. The MDMF Methodology allows you to quickly integrate existing
projects into a unified platform providing an Enterprise view of the data. The Enterprise
layer can be built one application at a time while integrating the common data in any
business domain. The pace of data integration is easily controlled and managed with
the MDMF reusable Project Data Management Life Cycle eliminating a company-wide
initial effort.
The input for the Data Governance process is built into the MDMF, so only the actual
Data Governance structure that is right for the organization of the company needs to
be implemented. The MDMF use of predefined Templates allows an automatic load of
data into the Metadata Repository after design reviews and SDLC phase approval are
completed. Metadata Repository reporting is unified and available to all of the
employees of the company through the intranet interface, while the actual reports or files
can be stored in any media.
The MDMF is cost effective because it does not require the purchase of new software
products but incorporates the existing company tool set into the complete methodology.
The primary costs are the initial consulting fees for setup of the MDMF; each
project can then be funded separately as funds allow, providing large savings over time.
Information from COTS or vendor products can be assimilated into the Metadata
Repository using the MDMF while the original data remains in the tool.
The unique ability to be customized for the current environment without the restrictions
of a predefined vendor process or a box approach provides easy acceptance by the
company employees. The consolidation of data into an easy-to-access information
reporting structure available to all employees cuts the cost of locating and utilizing
information and provides a foundation for Business Intelligence Reporting.
This low cost open source solution provides a substantial cost savings over time while
providing complete data integration at a pace determined by the client.
Enjoy!