1. CDMP - Certified Data Management Professional
DMBOK V.2
Trainer: Hery Purnama, SE., MM.
MCP, PMP, ITILF, CISA, CISM, CISSP, CDMP, COBIT, CTFL, TOGAF 9
2. Mr. Hery Purnama is an IT practitioner, lecturer, and IT consultant in Bandung with more than 20 years of experience in various IT projects, specializing in System Development, Big Data, Data Science, Internet of Things, ISO, Project Management, IT Service Management, IS Governance, InfoSec Governance, Data Governance, Enterprise Architecture, Quality Assurance, and IT Audit.
He still works actively as a consultant and trainer, with clients from government, BUMN (state-owned enterprises), mining, industrial, banking, and telecommunications organizations.
Some of the international certifications he holds are:
MCP, PMP, ITILF, COBIT, CGEIT, CDMP, CISA, CISM, CISSP, CTFL, TOGAF 9
5. 100 Questions Covering the 14 Topics of DMBOK2
1. Data Management Process – 2%
2. Data Ethics – 2%
3. Data Governance – 11%
4. Data Architecture – 6%
5. Data Modelling and Design – 11%
6. Data Storage and Operations – 6%
7. Data Security – 6%
8. Data Integration and Interoperability – 6%
9. Document and Content Management – 6%
10. Master and Reference Data Management – 10%
11. Data Warehousing and Business Intelligence – 10%
12. Metadata Management – 11%
13. Data Quality – 11%
14. Big Data – 2%
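The weights above sum to 100%, so on a 100-question exam each percentage point corresponds to roughly one question. A small sketch for study planning: it checks the total and ranks topics by approximate question count. The counts are derived from the percentages only and are not official per-topic numbers.

```python
# Topic weights (percent) as listed on the slide.
WEIGHTS = {
    "Data Management Process": 2,
    "Data Ethics": 2,
    "Data Governance": 11,
    "Data Architecture": 6,
    "Data Modelling and Design": 11,
    "Data Storage and Operations": 6,
    "Data Security": 6,
    "Data Integration and Interoperability": 6,
    "Document and Content Management": 6,
    "Master and Reference Data Management": 10,
    "Data Warehousing and Business Intelligence": 10,
    "Metadata Management": 11,
    "Data Quality": 11,
    "Big Data": 2,
}

TOTAL_QUESTIONS = 100


def questions_per_topic(weights, total):
    """Convert percentage weights into an approximate question count per topic."""
    return {topic: round(total * pct / 100) for topic, pct in weights.items()}


assert sum(WEIGHTS.values()) == 100  # the weights cover the whole exam
counts = questions_per_topic(WEIGHTS, TOTAL_QUESTIONS)

# Study priority: heaviest topics first (ties keep the slide's order).
for topic, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{n:3d}  {topic}")
```

Metadata Management, Data Quality, Data Governance, and Data Modelling and Design together account for 44% of the questions.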
9. • LET’S PRACTICE
• https://wato.xyz/cdmppractice1
Passcode: cdmp
10. Introduction
• Data Management is the development, execution, and supervision of
plans, policies, programs, and practices that deliver, control, protect,
and enhance the value of data and information assets throughout
their lifecycles.
• A Data Management Professional is any person who works in any facet
of data management.
• Data is the ‘currency’, the ‘life blood’, and even the ‘new oil’ of the
information economy.
• Business Driver : Data Asset Value
• Data Management Goals
11. Essential Concept
• VARIOUS DATA DEFINITIONS:
• Common usage: definitions of data emphasize its role in representing
facts about the world.
• IT usage: data is information that has been stored in digital form.
• Data as facts: data is a means of representation and needs context
(Metadata) to be meaningful.
12. Essential Concept
• DATA VS INFORMATION :
• DATA PYRAMID DIKW :
1. DATA (RAW) ->
2. INFORMATION (WHO, WHAT, WHEN, WHERE) ->
3. KNOWLEDGE (HOW) ->
4. WISDOM (WHY)
[Figure: DIKW pyramid]
Example of data vs. information:
(“Here is a sales report for the last quarter [information]. It is based on
data from our data warehouse [data]. Next quarter these results [data]
will be used to generate our quarter-over-quarter performance measures
[information]”)
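The sales-report example can be made concrete. In the sketch below, raw transaction rows from a hypothetical warehouse are the data; grouping them by quarter adds context (when, how much) and turns them into information, and the quarter-over-quarter change is the derived performance measure. The records and function names are invented for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw sales records (data): individual facts with no summary context.
sales = [
    (date(2023, 1, 15), 120.0),
    (date(2023, 2, 3), 80.0),
    (date(2023, 4, 20), 150.0),
    (date(2023, 5, 9), 110.0),
]


def quarter(d: date) -> str:
    """Label a date with its calendar quarter, e.g. '2023-Q1'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def quarterly_totals(records):
    """Aggregate raw sales into totals per quarter (information)."""
    totals = defaultdict(float)
    for day, amount in records:
        totals[quarter(day)] += amount
    return dict(totals)


def quarter_over_quarter(totals):
    """Percentage change between consecutive quarters (a performance measure)."""
    keys = sorted(totals)
    return {
        cur: round((totals[cur] - totals[prev]) / totals[prev] * 100, 1)
        for prev, cur in zip(keys, keys[1:])
    }


report = quarterly_totals(sales)
print(report)
print(quarter_over_quarter(report))
```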
13. Essential Concept
• Data as an Organizational Asset (an economic resource): just as the
value of goodwill now shows up as an item on the Profit and Loss
Statement (P&L), monetization of data is becoming increasingly common;
data is used to make more effective decisions and to operate more
efficiently
• Data Management Principles
• Data Management Challenges (data differs from other assets,
valuation, quality, planning for better data, Metadata and data
management, cross-functionality, ...)
15. Data Management Focuses on the Data Lifecycle
IMPLICATIONS:
• Creation and usage are the most critical points in the data lifecycle
• Data Quality must be managed throughout the data lifecycle
• Metadata Quality must be managed throughout the data lifecycle
• Data Security must be managed throughout the data lifecycle
• Data Management efforts should focus on the most critical data
16. Data and Risk
• Low Quality Data (inaccurate, incomplete, out of date)
• Misunderstood and misused data
“Information Gaps: the difference between what we know and what
we need to know to make an effective decision.
Information gaps represent enterprise liabilities with potentially
profound impacts on operational effectiveness and profitability.”
• The increased role of information as an organizational asset across all
sectors has led to an increased focus by regulators and legislators on
the potential uses and abuses of information
17. Data Management Strategy
The components of a data management strategy :
• A compelling vision for data management
• A summary business case for data management, with selected examples
• Guiding principles, values, and management perspectives
• The mission and long-term directional goals of data management
• Proposed measures of data management success
• Short-term (12-24 months) Data Management program objectives that are SMART
(specific, measurable, actionable, realistic, time-bound)
• Descriptions of data management roles and organizations, along with a summary of their
responsibilities and decision rights
• Descriptions of Data Management program components and initiatives
• A prioritized program of work with scope boundaries
• A draft implementation roadmap with projects and action items
33. • LET’S PRACTICE
• https://wato.xyz/cdmppractice2
Passcode: cdmp
34. Introduction
• Data handling ethics are concerned with how to procure, store,
manage, use, and dispose of data in ways that are aligned with ethical
principles.
• Handling data in an ethical manner is necessary to the long-term
success of any organization that wants to get value from its data.
35. Core Concept
• Impact on people: Because data represents characteristics of
individuals and is used to make decisions that affect people’s lives,
there is an imperative to manage its quality and reliability.
• Potential for misuse: Misusing data can negatively affect people and
organizations, so there is an ethical imperative to prevent the misuse
of data.
• Economic value of data: Data has economic value. Ethics of data
ownership should determine how that value can be accessed and by
whom.
36. Context Diagram
• There is an ethical imperative not only to protect data, but also to
manage its quality
37. Business Driver
• Ethical data handling can increase the trustworthiness of an
organization and the organization’s data and process outcomes.
• This can create better relationships between the organization and its
stakeholders.
“The emerging roles of Chief Data Officer, Chief Risk Officer, Chief Privacy Officer,
and Chief Analytics Officer are focused on controlling risk by establishing
acceptable practices for data handling.”
38. Ethical Principles for Data
• Respect for Persons: respect people’s dignity and autonomy as
human individuals
• Beneficence: do no harm; maximize possible benefits and
minimize possible harms
• Justice: fair and equitable treatment of people
40. Risks of Unethical Data Handling Practices
• Timing
• Misleading Visualizations
• Unclear Definitions or Invalid Comparisons
• Bias (data collection for a pre-defined result, biased use of data collected,
hunch and search, biased sampling methodology, context and culture)
• Transforming and Integrating Data (limited knowledge of data’s origin
and lineage, data of poor quality, unreliable Metadata, no
documentation)
• Obfuscation / Redaction of Data (data aggregation, data marking, data
masking)
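Two of the obfuscation techniques above, masking and aggregation, can be sketched in a few lines. This is a minimal illustration under assumed choices: the customer fields are hypothetical, and keeping the last four characters visible is a common convention, not a DMBOK rule. Data marking (labelling sensitivity) is not shown.

```python
from statistics import mean

# Hypothetical customer records containing a sensitive identifier.
customers = [
    {"name": "Ana", "region": "West", "card": "4111111111111111", "spend": 300},
    {"name": "Budi", "region": "West", "card": "5500000000000004", "spend": 500},
    {"name": "Citra", "region": "East", "card": "340000000000009", "spend": 200},
]


def mask(value: str, visible: int = 4, char: str = "*") -> str:
    """Data masking: hide all but the last `visible` characters."""
    return char * (len(value) - visible) + value[-visible:]


def aggregate_by(records, key, field):
    """Data aggregation: release group averages instead of individual rows."""
    groups = {}
    for r in records:
        groups.setdefault(r[key], []).append(r[field])
    return {k: mean(v) for k, v in groups.items()}


# Build masked copies; the original records are left unchanged.
masked = [{**c, "card": mask(c["card"])} for c in customers]
print(masked[0]["card"])
print(aggregate_by(customers, "region", "spend"))
```

Aggregation only protects individuals when groups are large enough: the single-member "East" group here still reveals one customer's exact spend, which is exactly the kind of unethical-handling risk the slide warns about.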
41. Establishing an Ethical Data Culture
STEPS:
• Review Current State Data Handling Practices
• Identify Principles, Practices, and Risk Factors
• Create an Ethical Data Handling Strategy and Roadmap
• Adopt a Socially Responsible Ethical Risk Model
43. Data Ethics and Governance
• Oversight for the appropriate handling of data falls under both data
governance and legal counsel.
• Keep up to date on legal changes
• CDMP holders subscribe to a formal code of ethics
44. • LET’S PRACTICE
• https://wato.xyz/cdmppractice2
Passcode: cdmp
46. • LET’S PRACTICE
• https://wato.xyz/cdmppractice3
Passcode: cdmp
48. GOVERNANCE VS MANAGEMENT
DATA GOVERNANCE = ensure that data is managed properly
DATA MANAGEMENT = ensure that an organization gets value out of its data
“THE SCOPE & FOCUS OF DATA GOVERNANCE PROGRAMS DEPEND ON
ORGANIZATIONAL NEEDS”
50. Context Diagram
DATA GOVERNANCE PROGRAMS :
• Strategy: Defining, communicating, and driving execution of Data Strategy and Data
Governance Strategy
• Policy: Setting and enforcing policies related to data and Metadata management, access,
usage, security, and quality
• Standards and quality: Setting and enforcing Data Quality and Data Architecture standards
• Oversight: Providing hands-on observation, audit, and correction in key areas of quality,
policy, and data management (often referred to as stewardship)
• Compliance: Ensuring the organization can meet data-related regulatory compliance
requirements
• Issue management: Identifying, defining, escalating, and resolving issues related to data
security, data access, data quality, regulatory compliance, data ownership, policy, standards,
terminology, or data governance procedures
• Data management projects: Sponsoring efforts to improve data management practices
• Data asset valuation: Setting standards and processes to consistently define the business
value of data assets
52. Goals
1. Enable an organization to manage its data as an asset.
2. Define, approve, communicate, and implement principles, policies,
procedures, metrics, tools, and responsibilities for data management.
3. Monitor and guide policy compliance, data usage, and management
activities.
54. DG Business Driver
• Common driver: regulatory compliance, especially for heavily
regulated industries
• Focus on reducing risks or improving processes:
Reducing risk: general risk management, data security, privacy
Improving processes: regulatory compliance, data quality improvement,
Metadata management, efficiency in development projects (SDLC), vendor
management
58. Data Governance vs IT Governance
Data governance is separate from IT governance.
• IT governance makes decisions about IT investments, the IT
application portfolio, and the IT project portfolio – in other words,
hardware, software, and overall technical architecture.
• IT governance aligns the IT strategies and investments with enterprise
goals and strategies.
• The COBIT (Control Objectives for Information and Related
Technology) framework provides standards for IT governance.
60. DG Goals and Principles
The goal of Data Governance is to enable an organization to manage data as an
asset. A DG program must be sustainable, embedded, and measured.
62. DG Essential Concept
Data governance represents an inherent separation of duty between
oversight and execution
68. Data Stewardship
• Data Stewardship is the most common label to describe
accountability and responsibility for data and processes that ensure
effective control and use of data assets
• Core activities : Creating and managing core Metadata,
Documenting rules and standards, Managing data quality issues,
Executing operational data governance activities
76. • LET’S PRACTICE
• https://wato.xyz/cdmppractice4
Passcode: cdmp
78. What is Architecture ?
• Architecture refers to an organized arrangement of component
elements intended to optimize the function, performance, feasibility,
cost, and aesthetics of an overall structure or system
79. Data Architecture Perspective
Data Architecture will be considered from the following perspectives:
•Data Architecture outcomes
•Data Architecture activities
•Data Architecture behavior
Together, these three form the essential components of Data
Architecture.
81. Introduction
• The most detailed Data Architecture design document is a formal
enterprise data model, containing data names, comprehensive data
and Metadata definitions, conceptual and logical entities and
relationships, and business rules.
• Physical data models are included, but as a product of data modeling
and design, rather than Data Architecture.
85. Goals
1. Identify data storage and processing requirements.
2. Design structures and plans to meet the current and long-term data
requirements of the enterprise.
3. Strategically prepare organizations to quickly evolve their products,
services, and data to take advantage of business opportunities inherent
in emerging technologies.
89. Zachman Framework Columns
• What (the inventory column): Entities used to build the architecture
• How (the process column): Activities performed
• Where (the distribution column): Business location and technology
location
• Who (the responsibility column): Roles and organizations
• When (the timing column): Intervals, events, cycles, and schedules
• Why (the motivation column): Goals, strategies, and means
93. Enterprise Data Architecture
Enterprise Data Architecture defines standard terms and designs
for the elements that are important to the organization.
• Enterprise Data Model (EDM): The EDM is a holistic, enterprise-level,
implementation-independent conceptual or logical data model
providing a common consistent view of data across the enterprise.
• Data Flow Design: Defines the requirements and master blueprint for
storage and processing across databases, applications, platforms, and
networks (the components).
96. Project Development Method
• Waterfall methods: Understand the requirements and construct
systems in sequential phases as part of an overall enterprise design.
• Incremental methods: Learn and construct in gradual steps (i.e.,
mini-waterfalls). This method creates prototypes based on vague
overall requirements. The initiation phase is crucial.
• Agile, iterative methods: Learn, construct, and test in discrete
delivery packages (called ‘sprints’) that are small enough that if work
needs to be discarded, not much is lost.
107. Goals and Principles
• The goal of data modeling is to confirm and document understanding of
different perspectives, which leads to applications that more closely align
with current and future business requirements, and creates a foundation
to successfully complete broad-scoped initiatives such as Master Data
Management and data governance programs.
• Confirming and documenting understanding of different perspectives
facilitates :
- Formalization
- Scope Definition
- Knowledge retention/documentation
109. Types of Data that are Modeled
• Category information: Data used to classify and assign types to
things.
• Resource information: Basic profiles of resources needed to conduct
operational processes, such as Product, Customer, Supplier, Facility,
Organization, and Account.
• Business event information: Data created while operational
processes are in progress.
• Detail transaction information: Detailed transaction information is
often produced through point-of-sale systems (either in stores or
online).
• Data at Rest
116. Relational Data Modeling Schemes
• A foreign key is used in physical and sometimes logical
relational data modeling schemes to represent a
relationship
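At the physical level, the foreign key is what turns the relationship into an enforced rule. A minimal sketch, using Python's built-in sqlite3 module only because it needs no server; the customer and sales_order tables are hypothetical. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    -- foreign key: each order must point at an existing customer
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id)
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Acme')")
conn.execute("INSERT INTO sales_order VALUES (100, 1)")  # valid: customer 1 exists

try:
    conn.execute("INSERT INTO sales_order VALUES (101, 99)")  # no customer 99
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True  # the orphan order is refused by the database

print("orphan order rejected:", fk_rejected)
```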
126. • LET’S PRACTICE
• https://wato.xyz/cdmppractice678
Passcode: cdmp
128. Introduction
• Data Storage and Operations includes the design, implementation,
and support of stored data, to maximize its value throughout its
lifecycle, from creation/acquisition to disposal.
• Data Storage and Operations includes two sub-activities:
1. Database Support
2. Database Technology Support
• Key role: Database Administrator (DBA)
130. Context Diagram
The goals of data storage and operations include:
• Managing the availability of data throughout the data lifecycle
• Ensuring the integrity of data assets
• Managing the performance of data transactions
132. SLA
Service Level Agreement (SLA) principle in practice:
• The SLA can reflect DBA-recommended and developer-accepted methods of
ensuring data integrity and data security. The SLA should reflect the
transfer of responsibility from the DBAs to the development team if the
development team will be coding their own database update procedures or
data access layer.
• This prevents an ‘all or nothing’ approach to standards.
134. Procedural and Development DBAs
Procedural DBAs:
• Lead the review and administration of procedural database objects.
• Specialize in development and support of procedural logic controlled
and executed by the DBMS, such as stored procedures, triggers, and
user-defined functions.
Development DBAs:
• Focus on data design activities, including creating and managing
special-use databases.
137. Database Processing Types
CAP (Brewer’s Theorem): Consistency, Availability, and Partition tolerance;
a distributed system cannot guarantee all three at the same time.
How closely a distributed system matches these properties places it between:
• ACID (Atomicity, Consistency, Isolation, Durability)
• BASE (Basically Available, Soft State, Eventual Consistency)
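The "A" in ACID means a transaction is all-or-nothing. The sketch below shows that guarantee with Python's built-in sqlite3 module as a stand-in for any transactional database; the account table and the no-overdraft rule are hypothetical. The credit is applied before the debit on purpose, so the failing transfer has already changed one row when it is rolled back. BASE-style systems deliberately relax this kind of guarantee in favor of availability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit of work."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed; the earlier credit to dst was undone too


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


ok = transfer(conn, "A", "B", 30)    # succeeds
bad = transfer(conn, "A", "B", 500)  # would overdraw A, so nothing is applied
print(ok, bad, balances(conn))
```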
145. Introduction
• Data Security includes the planning, development, and execution of
security policies and procedures to provide proper authentication,
authorization, access, and auditing of data and information assets.
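Authorization and auditing can be pictured as one check that both decides and records. A minimal role-based sketch, assuming users have already been authenticated; the roles, users, and "sales" asset are hypothetical, and a real deployment would rely on the DBMS or an identity platform rather than in-memory dictionaries.

```python
from datetime import datetime, timezone

# Hypothetical entitlements: role -> set of (action, data asset) permissions.
ROLE_PERMISSIONS = {
    "analyst": {("read", "sales")},
    "dba": {("read", "sales"), ("write", "sales")},
}
# Assumed to be populated after authentication has succeeded.
USER_ROLES = {"alice": "analyst", "bob": "dba"}

audit_log = []


def authorize(user: str, action: str, asset: str) -> bool:
    """Authorization check that also writes an audit entry for every attempt."""
    role = USER_ROLES.get(user)
    allowed = role is not None and (action, asset) in ROLE_PERMISSIONS.get(role, set())
    audit_log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "asset": asset,
        "allowed": allowed,
    })
    return allowed


print(authorize("alice", "read", "sales"))
print(authorize("alice", "write", "sales"))
print(len(audit_log))
```

Denied attempts are logged as well as granted ones, since failed access attempts are often the most useful audit evidence.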
157. Assess Current Security Risks
• The sensitivity of the data stored or in transit
• The requirements to protect that data, and
• The current security protections in place
158. Other Concerns
• Regulatory Requirements
• Data Security Standards
• Data Security Roles
• Tools & Techniques
• Guidelines
159. Chapter 8: Data Integration and Interoperability
161. Introduction
Data Integration and Interoperability (DII) describes processes related to the movement
and consolidation of data within and between data stores, applications and organizations.
DII enables data management functions on which most organizations depend:
• Data migration and conversion
• Data consolidation into hubs or marts
• Integration of vendor packages into an organization’s application portfolio
• Data sharing between applications and across organizations
• Distributing data across data stores and data centers
• Archiving data
• Managing data interfaces
• Obtaining and ingesting external data
• Integrating structured and unstructured data
• Providing operational intelligence and management decision support
DII in turn depends on other data management areas:
• Data Governance
• Data Architecture
• Data Security
• Metadata
• Data Storage and Operations
• Data Modeling and Design
165. Essential Concepts
• Extract, Transform, and Load (ETL)
• Extract, Load, and Transform (ELT)
• LATENCY, REPLICATION …
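The difference between ETL and ELT is where the transformation happens. A minimal sketch, assuming a hypothetical source feed and SQLite as the target: ETL cleanses rows in the integration code before loading, while ELT loads the raw rows first and transforms them inside the target with SQL. Both end with the same cleansed table, but ELT also keeps the raw copy in the target.

```python
import sqlite3

# Hypothetical source rows: everything arrives as text, country codes are inconsistent.
SOURCE = [("1", "id", "10.50"), ("2", "ID", "4.50"), ("3", "sg", "7.00")]


def extract():
    return list(SOURCE)


def transform(rows):
    """Cleanse in the integration layer: cast types, standardize codes."""
    return [(int(i), c.upper(), float(a)) for i, c, a in rows]


def etl(conn):
    # ETL: transform first, then load only the cleansed rows into the target.
    conn.execute("CREATE TABLE sales_etl (id INTEGER, country TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", transform(extract()))


def elt(conn):
    # ELT: load the raw rows as-is, then transform inside the target with SQL.
    conn.execute("CREATE TABLE sales_raw (id TEXT, country TEXT, amount TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", extract())
    conn.execute("""
        CREATE TABLE sales_elt AS
        SELECT CAST(id AS INTEGER)  AS id,
               UPPER(country)       AS country,
               CAST(amount AS REAL) AS amount
        FROM sales_raw
    """)


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
q = "SELECT id, country, amount FROM {} ORDER BY id"
print(conn.execute(q.format("sales_etl")).fetchall())
print(conn.execute(q.format("sales_elt")).fetchall())
```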
169. Interaction Model
• Point-to-point (systems pass data directly to each other)
• Hub-and-spoke (a central hub consolidates shared data)
• Publish-subscribe (publishing systems push data out, subscribing
systems pull data in; data is distributed to subscribers)
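In a publish-subscribe model the publisher does not need to know who consumes its data, unlike point-to-point. A minimal in-process sketch; the `Broker` class and topic names are hypothetical and not the API of any messaging product. For simplicity, delivery here is pushed through callbacks, whereas real brokers often let subscribers pull messages from a queue.

```python
from collections import defaultdict
from typing import Callable


class Broker:
    """Minimal in-process publish-subscribe broker (illustrative only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of handlers

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> int:
        """Deliver the message to every subscriber of the topic; return the count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


# Two hypothetical consuming systems subscribe to customer changes.
billing_inbox, crm_inbox = [], []
broker = Broker()
broker.subscribe("customer.updated", billing_inbox.append)
broker.subscribe("customer.updated", crm_inbox.append)

delivered = broker.publish("customer.updated", {"customer_id": 42, "city": "Bandung"})
print(delivered, billing_inbox, crm_inbox)
```

Adding a third consumer only requires another `subscribe` call; the publishing system is unchanged, which is the main advantage over adding another point-to-point interface.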
173. Introduction
• Document and Content Management entails controlling the capture,
storage, access, and use of data and information stored outside
relational databases
• In some organizations, unstructured data has a direct relationship to
structured data.
• Management decisions about such content should be applied
consistently.
175. Business Driver
• Regulatory compliance
• The ability to respond to litigation and e-discovery requests, and
business continuity requirements
• Good records management can also help organizations become more
efficient
• Well-organized, searchable websites, which result from effective
management of ontologies
• E-discovery is the process of finding electronic records that might
serve as evidence in a legal action.
178. ARMA International Principles - 2009
Generally Accepted Recordkeeping Principles® (GARP)
• Principle of Accountability
• Principle of Integrity
• Principle of Protection
• Principle of Compliance
• Principle of Availability
• Principle of Retention
• Principle of Disposition
• Principle of Transparency
180. Essential Concepts
• Content: a document is to content what a bucket is to water: a container. Content refers
to the data and information inside the file, document, or website.
• Controlled Vocabularies: a controlled vocabulary is a defined list of explicitly allowed terms
used to index, categorize, tag, sort, and retrieve content through browsing and searching.
• Documents and Records: Documents are electronic or paper objects that contain
instructions for tasks, requirements for how and when to perform a task or function,
and logs of task execution and decisions. Documents can communicate and share
information and knowledge. Examples of documents include procedures, protocols,
methods, and specifications.
• Data Map: an inventory of all ESI (electronically stored information) data sources,
applications, and IT environments that includes the owners of the applications,
custodians, relevant geographical locations, and data types
• E-Discovery, etc.
186. Activities - Plan for Record Management
• Records management starts with a clear definition of what
constitutes a record.
• Managing electronic records requires decisions about where to store
current, active records and how to archive older records
188. Activities - Manage the Lifecycle
• Capture Records and Content : Capturing content is the first step to
managing it. Electronic content is often already in a format to be stored in
electronic repositories.
• Manage Versioning and Control : Formal, Revision, Custody
• Backup and Recovery : The document / record management system needs
to be included in the organization’s overall corporate backup and recovery
activities, including business continuity and disaster recovery planning.
• Manage Retention and Disposal : Effective document / records
management requires clear policies and procedures, especially regarding
retention and disposal of records.
• Audit Documents / Records : Document / records management requires
periodic auditing to ensure that the right information is getting to the right
people at the right time for decision-making or performing operational
activities
189. Manage Versioning and Control
ANSI Standard 859 has three levels of control of data:
• Formal control requires formal change initiation, thorough
evaluation for impact, decision by a change authority, and full status
accounting of implementation and validation to stakeholders
• Revision control is less formal, notifying stakeholders and
incrementing versions when a change is required
• Custody control is the least formal, merely requiring safe storage
and a means of retrieval
191. • LET’S TRY OUT (90 questions in 80 minutes)
• https://wato.xyz/cdmptryout
Passcode: cdmp
193. Introduction
• In any organization, certain data is required across business areas,
processes, and systems.
• The overall organization and its customers benefit if this data is
shared and all business units can access the same customer lists,
geographic location codes, business unit lists, delivery options, part
lists, accounting cost center codes, governmental tax codes, and
other data used to run the business.
196. Business Driver
Master Data Management
• Meeting organizational data requirements
• Managing data quality
• Managing the costs of data integration
• Reducing risk
The drivers for managing Reference Data are similar. Centrally managed
Reference Data enables organizations to:
• Meet data requirements for multiple initiatives and reduce the risks and
costs of data integration through use of consistent Reference Data
• Manage the quality of Reference Data
198. Differences Between Master and Reference Data
• Different types of data play different roles within an organization. They
also have different management requirements.
• A six-layer taxonomy of data includes Metadata, Reference Data,
enterprise structure data, transaction structure data, transaction
activity data, and transaction audit data (Chisholm, 2008; Talburt and
Zhou, 2015).
• Within this taxonomy, Master Data can be seen as an aggregation of
Reference Data, enterprise structure data, and transaction structure data.
200. Master Data - Trusted Source, Golden Record
• A Trusted Source is recognized as the ‘best version of the truth’ based
on a combination of automated rules and manual stewardship of data
content.
• A trusted source may also be referred to as a Single View, 360° View.
• Any MDM system should be managed so that it is a trusted source.
Within a trusted source, records that represent the most accurate
data about entity instances can be referred to as Golden Records.
• ‘Golden Record’ does not mean that it is always a 100% complete and
100% accurate representation of all the entities within the
organization (especially in organizations that have multiple systems of
record, or SORs, supplying data to the Master Data environment).
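A golden record is typically assembled attribute by attribute from the competing SOR records using survivorship rules. The sketch below assumes one simple, hypothetical rule ("the most recently updated non-empty value wins") and invented records; real MDM hubs combine several rules with manual stewardship. It also keeps lineage, and an attribute no source supplies would stay `None`, echoing the point that a golden record is not guaranteed to be complete.

```python
from datetime import date

# Hypothetical records for the same customer from three systems of record (SORs).
records = [
    {"source": "CRM", "updated": date(2023, 3, 1),
     "name": "Jane S.", "email": "", "phone": "555-0100"},
    {"source": "Billing", "updated": date(2023, 6, 1),
     "name": "Jane Smith", "email": "jane@example.com", "phone": ""},
    {"source": "Web", "updated": date(2022, 1, 1),
     "name": "jane", "email": "old@example.com", "phone": "555-0199"},
]


def golden_record(recs, attributes):
    """Assumed survivorship rule: per attribute, the most recent non-empty value wins."""
    golden, lineage = {}, {}
    for attr in attributes:
        candidates = [r for r in recs if r.get(attr)]
        if not candidates:
            golden[attr] = None  # no SOR supplies this attribute
            continue
        best = max(candidates, key=lambda r: r["updated"])
        golden[attr] = best[attr]
        lineage[attr] = best["source"]  # which SOR supplied the surviving value
    return golden, lineage


golden, lineage = golden_record(records, ["name", "email", "phone"])
print(golden)
print(lineage)
```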
202. Data Sharing Architecture
Three basic approaches to implementing a Master Data hub
environment :
• A Registry
• A Transaction Hub
• A Consolidated approach
204. Party Master Data
• Party Master Data includes data about individuals, organizations, and
the roles they play in business relationships.
• In the commercial environment, parties include customers,
employees, vendors, partners, and competitors.
• In the public sector, parties are usually citizens
• Customer Relationship Management (CRM) systems manage Master
Data about customers. The goal of CRM is to provide complete and
accurate information about each and every customer.
206. Master Data Management Key Processing Steps
• Key processing steps for MDM include data model management;
data acquisition; data validation, standardization, and enrichment;
entity resolution; and stewardship and sharing.
208. • Product Master Data can focus on an organization’s internal
products and services or on industry-wide (including competitor)
products and services.
• Different types of product Master Data solutions support different
business functions.
210. Entity Resolution and Identifier Management
Entity resolution is the process of determining
whether two references to real world objects refer
to the same object or to different objects (Talburt,
2011).
Entity resolution is a decision-making process
211. Entity Resolution and Identifier Management
Matching, or candidate identification, is the process of identifying how
different records may relate to a single entity. The risks with this
process are:
• False positives: Two references that do not represent the same entity
are linked with a single identifier. This results in one identifier that
refers to more than one real-world entity instance.
• False negatives: Two references represent the same entity but they
are not linked with a single identifier. This results in multiple
identifiers that refer to the same real-world entity when each
instance is expected to have one-and-only-one identifier.
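Both risks can be seen by scoring a matching rule against pairs whose true status is known. The sketch assumes a hypothetical deterministic rule (standardized name plus birth date) and invented labelled pairs: two different people sharing a common name and birth date produce a false positive, and a spelling variant of the same person produces a false negative.

```python
import unicodedata


def normalize(name: str) -> str:
    """Standardize before matching: strip accents, punctuation, case, and extra spaces."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    cleaned = "".join(ch for ch in ascii_name.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())


def is_match(a: dict, b: dict) -> bool:
    """Hypothetical deterministic rule: same normalized name and same birth date."""
    return normalize(a["name"]) == normalize(b["name"]) and a["dob"] == b["dob"]


# Labelled pairs: (record_a, record_b, truly_same_entity)
pairs = [
    ({"name": "José  Ramos", "dob": "1980-05-01"},
     {"name": "jose ramos", "dob": "1980-05-01"}, True),
    ({"name": "Maria Silva", "dob": "1975-02-10"},   # two different people
     {"name": "Maria Silva", "dob": "1975-02-10"}, False),
    ({"name": "Jon Smith", "dob": "1990-07-07"},     # same person, spelling variant
     {"name": "John Smith", "dob": "1990-07-07"}, True),
    ({"name": "Ana Lee", "dob": "1985-01-01"},
     {"name": "Ben Lee", "dob": "1985-01-01"}, False),
]


def evaluate(pairs):
    counts = {"true_pos": 0, "false_pos": 0, "true_neg": 0, "false_neg": 0}
    for a, b, same in pairs:
        predicted = is_match(a, b)
        if predicted and same:
            counts["true_pos"] += 1
        elif predicted and not same:
            counts["false_pos"] += 1  # one identifier would cover two real entities
        elif not predicted and same:
            counts["false_neg"] += 1  # one entity would end up with two identifiers
        else:
            counts["true_neg"] += 1
    return counts


print(evaluate(pairs))
```

Tightening the rule (e.g. adding a national ID) reduces false positives, while loosening it (e.g. fuzzy name comparison) reduces false negatives; the trade-off is why entity resolution is described as a decision-making process.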