Data Quality Strategies
From Data Duckling to Successful Swan
Peter Aiken, Ph.D.
• DAMA International President 2009-2013
• DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
• DAMA International Community Award 2005
Peter Aiken, Ph.D.
• 33+ years in data management
• Repeated international recognition
• Founder, Data Blueprint (datablueprint.com)
• Associate Professor of IS (vcu.edu)
• DAMA International (dama.org)
• 10 books and dozens of articles
• Experienced w/ 500+ data management practices
• Multi-year immersions:
– US DoD (DISA/Army/Marines/DLA)
– Nokia
– Deutsche Bank
– Wells Fargo
– Walmart
– …
PETER AIKEN WITH JUANITA BILLINGS
FOREWORD BY JOHN BOTTEGA
MONETIZING
DATA MANAGEMENT
Unlocking the Value in Your Organization’s
Most Important Asset.
The Case for the
Chief Data Officer
Recasting the C-Suite to Leverage
Your Most Valuable Asset
Peter Aiken and
Michael Gorman
Copyright 2017 by Data Blueprint Slide #
1. Data Quality in Context of Data Management
2. DQE Definition
3. DQE Cycle & Contextual Complications
4. DQ Causes and Dimensions
5. Quality and the Data Life Cycle
6. DQE Tool Sets
7. Takeaways and Q&A
Data Quality Strategies
Our barn had to pass a foundation inspection
• Before further construction could proceed
• No IT equivalent
You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Practices; however, this will:
• Take longer
• Cost more
• Deliver less
• Present greater risk
(with thanks to Tom DeMarco)
Data Management Practices Hierarchy
Advanced Data Practices
• MDM
• Mining
• Big Data
• Analytics
• Warehousing
• SOA
Foundational Data Practices
Data Platform/Architecture
Data Governance Data Quality
Data Operations
Data Management Strategy
Technologies
Capabilities
DMM℠ Structure of 5 Integrated DM Practice Areas
[Figure: practice areas Data Management Strategy, Data Governance, Data Quality, Data Operations, and Platform Architecture, plus Supporting Processes. Callouts: Manage data coherently; Manage data assets professionally; Maintain fit-for-purpose data, efficiently and effectively; Data life cycle management; Data architecture implementation; Organizational support]
The DAMA Guide to the Data Management Body of Knowledge
Data Management Functions
(from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
• Good enough to criticize
– All models are wrong
– Some models are useful
• Missing two important concepts
– Optionality
– Dependency
Overview: Data Quality Engineering
Data Quality and Data Governance in Context
[Figure: Organizational Strategy, Data Strategy, Data Governance, and Data Quality, connected by these labels:
– Data asset support for organizational strategy
– What the data assets do to support strategy (business goals)
– How well the data strategy is working (metadata)
– Governance of quality aspects of data assets
– Evolutionary feedback about the current focus]
A Model Specifying Relationships Among Important Terms
[Built on definition by Dan Appleton 1983]
[Figure: Fact, Meaning, Data, Request, Information, Use, Intelligence]
1. Each FACT combines with one or more MEANINGS.
2. Each specific FACT and MEANING combination is referred to as a DATUM.
3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
5. INTELLIGENCE is INFORMATION associated with its USES.
Wisdom & knowledge are often used synonymously
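The five statements in this model can be sketched as a small data structure. This is only an illustration: the class name `Datum`, the function `information`, the idea that a request is simply the meaning being asked about, and all sample values are assumptions, not part of the original model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """One FACT paired with one MEANING (statement 2)."""
    fact: str
    meaning: str


def information(data, request):
    """Return the DATA that answer a REQUEST (statement 3).

    Assumption for illustration: a request names the meaning it wants.
    """
    return [d for d in data if d.meaning == request]


# Hypothetical sample: one fact combined with two meanings (statement 4).
data = [
    Datum(fact="2017-03-14", meaning="order date"),
    Datum(fact="2017-03-14", meaning="customer birthday"),
    Datum(fact="42", meaning="order quantity"),
]

order_dates = information(data, "order date")

# Facts that appear under more than one meaning are candidates for reuse.
reused_facts = {
    d.fact for d in data
    if sum(1 for other in data if other.fact == d.fact) > 1
}
```

Keeping fact and meaning as separate fields is what makes the reuse in statement 4 visible: the same stored value can answer two different requests.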
Definitions
• Quality Data
– Fit for purpose: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler)
– Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
• Data Quality Management
– Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality
– Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
✓ Critical supporting process from change management
✓ Continuous process for defining acceptable levels of data quality to meet
business needs and for ensuring that data quality meets these levels
• Data Quality Engineering
– Recognition that data quality solutions cannot be managed but must be engineered
– Engineering is the application of scientific, economic, social, and practical knowledge
in order to design, build, and maintain solutions to data quality challenges
– Engineering concepts are generally not known and understood within IT or business!
Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
Improving Data Quality during System Migration
• Challenge
– Millions of NSN/SKUs maintained in a catalog
– Key and other data stored in clear text/comment fields
– Original suggestion was manual approach to text extraction
– Left the data structuring problem unsolved
• Solution
– Proprietary, improvable text extraction process
– Converted non-tabular data into tabular data
– Saved a minimum of $5 million
– Literally person centuries of work
Determining Diminishing Returns
Week #   Unmatched Items   Ignorable Items   Items Matched
         (% Total)         (% Total)         (% Total)
1        31.47%            1.34%             N/A
2        21.22%            6.97%             N/A
3        20.66%            7.49%             N/A
4        32.48%            11.99%            55.53%
…        …                 …                 …
14       9.02%             22.62%            68.36%
15       9.06%             22.62%            68.33%
16       9.53%             22.62%            67.85%
17       9.5%              22.62%            67.88%
18       7.46%             22.62%            69.92%
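One way to read the weekly figures for diminishing returns is to look at the week-over-week change in the matched percentage. The sketch below uses only the rows reported for weeks 14 to 18; the function names and the tolerance value are assumptions made for illustration.

```python
# (week, unmatched %, ignorable %, matched %) as reported for weeks 14-18.
weeks = [
    (14, 9.02, 22.62, 68.36),
    (15, 9.06, 22.62, 68.33),
    (16, 9.53, 22.62, 67.85),
    (17, 9.50, 22.62, 67.88),
    (18, 7.46, 22.62, 69.92),
]


def weekly_gains(rows):
    """Change in matched percentage from one reported week to the next."""
    return [
        (curr[0], round(curr[3] - prev[3], 2))
        for prev, curr in zip(rows, rows[1:])
    ]


def rows_sum_to_100(rows, tolerance=0.05):
    """Sanity check: the three categories should account for every item."""
    return all(
        abs(unmatched + ignorable + matched - 100.0) <= tolerance
        for _, unmatched, ignorable, matched in rows
    )


gains = weekly_gains(weeks)
consistent = rows_sum_to_100(weeks)
```

Gains that hover around zero for several weeks are the signal that further automated matching effort is paying back less and less.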
Quantitative Benefits
Time needed to review all NSNs once over the life of the project:
  NSNs: 2,000,000
  Average time to review & cleanse (in minutes): 5
  Total time (in minutes): 10,000,000
Time available per resource over a one-year period:
  Work weeks in a year: 48
  Work days in a week: 5
  Work hours in a day: 7.5
  Work minutes in a day: 450
  Total work minutes/year: 108,000
Person-years required to cleanse each NSN once prior to migration:
  Minutes needed: 10,000,000
  Minutes available per person/year: 108,000
  Total person-years: 92.6
Resource cost to cleanse NSNs prior to migration:
  Avg salary for SME per year (not including overhead): $60,000.00
  Projected years required to cleanse / total DLA person-years saved: 93
  Total cost to cleanse / total DLA savings to cleanse NSNs: $5.5 million
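The arithmetic behind these figures can be reproduced in a few lines. This is a sketch: the function name `cleansing_estimate` is hypothetical, and rounding person-years up to a whole number before costing is an assumption inferred from the reported values of 93 and 7.

```python
import math


def cleansing_estimate(nsn_count, minutes_per_nsn=5, weeks_per_year=48,
                       days_per_week=5, hours_per_day=7.5,
                       salary_per_year=60_000):
    """Return (person_years, cost) for reviewing every NSN once."""
    total_minutes = nsn_count * minutes_per_nsn
    minutes_per_person_year = (
        weeks_per_year * days_per_week * hours_per_day * 60
    )
    person_years = total_minutes / minutes_per_person_year
    # Assumption: whole person-years are costed, rounded up.
    cost = math.ceil(person_years) * salary_per_year
    return person_years, cost


full_catalog = cleansing_estimate(2_000_000)
reduced_set = cleansing_estimate(150_000)
```

With the stated inputs, the full catalog works out to about 92.6 person-years and $5,580,000, which the slide reports as $5.5 million; the 150,000-NSN case works out to about 7 person-years and $420,000.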
Time needed to review all NSNs once over the life of the project:
NSNs 150,000
Average time to review & cleanse (in minutes) 5
Total Time (in minutes) 750,000
Time available per resource over a one year period of time:
Work weeks in a year 48
Work days in a week 5
Work hours in a day 7.5
Work minutes in a day 450
Total Work minutes/year 108,000
Person years required to cleanse each NSN once prior to migration:
Minutes needed 750,000
Minutes available person/year 108,000
Total Person-Years 7
Resource Cost to cleanse NSN's prior to migration:
Avg Salary for SME year (not including overhead) $60,000.00
Projected Years Required to Cleanse/Total DLA Person Year Saved 7
Total Cost to Cleanse/Total DLA Savings to Cleanse NSN's: $420,000
Data Quality Misconceptions
• You can fix the data
• Data quality is an IT problem
• The problem is in the data sources or data entry
• The data warehouse will provide a single version of the truth
• The new system will provide a single version of the truth
• Standardization will eliminate the problem of different "truths"
represented in the reports or analysis
Source: Business Intelligence solutions, Athena Systems
The Blind Men and the Elephant
• It was six men of Indostan,
To learning much inclined,
Who went to see the Elephant
(Though all of them were blind),
That each by observation
Might satisfy his mind.
• The First approached the Elephant,
And happening to fall
Against his broad and sturdy side,
At once began to bawl:
"God bless me! but the Elephant
Is very like a wall!"
• The Second, feeling of the tusk
Cried, "Ho! what have we here,
So very round and smooth and sharp?
To me 'tis mighty clear
This wonder of an Elephant
Is very like a spear!"
• The Third approached the animal,
And happening to take
The squirming trunk within his hands,
Thus boldly up he spake:
"I see," quoth he, "the Elephant
Is very like a snake!"
• The Fourth reached out an eager hand,
And felt about the knee:
"What most this wondrous beast is like
Is mighty plain," quoth he;
"'Tis clear enough the Elephant
Is very like a tree!"
• The Fifth, who chanced to touch the ear,
Said: "E'en the blindest man
Can tell what this resembles most;
Deny the fact who can,
This marvel of an Elephant
Is very like a fan!"
• The Sixth no sooner had begun
About the beast to grope,
Than, seizing on the swinging tail
That fell within his scope.
"I see," quoth he, "the Elephant
Is very like a rope!"
• And so these men of Indostan
Disputed loud and long,
Each in his own opinion
Exceeding stiff and strong,
Though each was partly in the right,
And all were in the wrong!
(Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
No universal conception of data quality exists; instead, many differing perspectives compete
• Problem:
– Most organizations approach data quality problems in the same way that the blind men approached the elephant: people tend to see only the data that is in front of them
– Little cooperation across boundaries, just as the blind men were unable to combine their impressions of the elephant to recognize the entire entity
– Leads to confusion, disputes and narrow views
• Solution:
– Data quality engineering can help achieve a more complete picture and facilitate
cross boundary communications
Quality Data is ... Fit For Purpose
Famous Words?
• Question:
– Why haven't organizations taken a more proactive approach to data quality?
• Answer:
– Fixing data quality problems is not easy
– It is dangerous -- they'll come after you
– Your efforts are likely to be misunderstood
– You could make things worse
– Now you get to fix it
• A single data quality issue can grow into a significant, unexpected investment
Four ways to make your data sparkle!
1. Prioritize the task
– Cleaning data is costly and time consuming
– Identify mission critical/non-mission critical data
2. Involve the data owners
– Seek input of business units on what constitutes "dirty" data
3. Keep future data clean
– Incorporate processes and technologies that check every zip code and area code
4. Align your staff with business
– Align IT staff with business units
(Source: CIO, July 1, 2004)
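As an illustration of point 3, a format check on zip codes and area codes at the point of entry might look like the sketch below. It assumes US ZIP codes and North American area codes; the field names `zip` and `area_code` and the function `check_record` are hypothetical. It checks format only and does not confirm that a code is actually assigned.

```python
import re

ZIP_RE = re.compile(r"\d{5}(-\d{4})?")   # 12345 or 12345-6789
AREA_CODE_RE = re.compile(r"[2-9]\d{2}")  # three digits, first digit 2-9


def check_record(record):
    """Return the names of fields whose format is not acceptable."""
    problems = []
    if not ZIP_RE.fullmatch(record.get("zip") or ""):
        problems.append("zip")
    if not AREA_CODE_RE.fullmatch(record.get("area_code") or ""):
        problems.append("area_code")
    return problems


clean = check_record({"zip": "23220", "area_code": "804"})
dirty = check_record({"zip": "2322", "area_code": "084"})
```

Running such a check when the record is created is cheaper than finding the same defect later in a cleansing project.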
Structured Data Quality Engineering
1. Allow the form of the problem to guide the form of the solution
2. Provide a means of decomposing the problem
3. Feature a variety of tools simplifying system understanding
4. Offer a set of strategies for evolving a design solution
5. Provide criteria for evaluating the quality of the various solutions
6. Facilitate development of a framework for developing organizational knowledge
The DQE Cycle
• Deming cycle
• "Plan-do-study-act" or "plan-do-check-act"
1. Identifying data issues that are
critical to the achievement of
business objectives
2. Defining business requirements for
data quality
3. Identifying key data quality
dimensions
4. Defining business rules critical to
ensuring high quality data
The DQE Cycle: (1) Plan
• Plan for the assessment of the
current state and identification
of key metrics for measuring
quality
• The data quality engineering
team assesses the scope of
known issues
– Determining cost and impact
– Evaluating alternatives for
addressing them
The DQE Cycle: (2) Deploy
• Deploy processes for measuring
and improving the quality of
data:
• Data profiling
– Institute inspections and monitors to
identify data issues when they occur
– Fix flawed processes that are the root
cause of data errors or correct errors
downstream
– When it is not possible to correct
errors at their source, correct them at
their earliest point in the data flow
The DQE Cycle: (3) Monitor
• Monitor the quality of data as
measured against the defined
business rules
• If data quality meets defined
thresholds for acceptability,
the processes are in control
and the level of data quality
meets the business
requirements
• If data quality falls below
acceptability thresholds,
notify data stewards so they
can take action during the
next stage
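The monitoring step can be sketched as a comparison of measured conformance against a threshold per business rule. Everything named here (`conformance_rate`, `monitor`, the `notify` callback, the sample rule and threshold) is a hypothetical illustration, not a prescribed implementation.

```python
def conformance_rate(records, rule):
    """Fraction of records that satisfy a business rule."""
    if not records:
        return 1.0
    return sum(1 for r in records if rule(r)) / len(records)


def monitor(records, rules, thresholds, notify):
    """Check each rule; notify data stewards when a threshold is missed."""
    alerts = []
    for name, rule in rules.items():
        rate = conformance_rate(records, rule)
        if rate < thresholds[name]:
            notify(name, rate)
            alerts.append(name)
    return alerts


# Hypothetical sample data, rule, and threshold.
records = [{"zip": "23220"}, {"zip": ""}, {"zip": "23284"}, {"zip": None}]
rules = {"zip_present": lambda r: bool(r.get("zip"))}
thresholds = {"zip_present": 0.95}

notified = []
alerts = monitor(records, rules, thresholds,
                 lambda name, rate: notified.append((name, rate)))
```

An empty `alerts` list corresponds to processes that are in control; any entry is the hand-off to the Act stage.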
The DQE Cycle: (4) Act
• Act to resolve any identified
issues to improve data
quality and better meet
business expectations
• New cycles begin as new
data sets come under
investigation or as new data
quality requirements are
identified for existing data
sets
DQE Context & Engineering Concepts
• Can rules be implemented stating that no data can be corrected
unless the source of the error has been discovered and
addressed?
• All data must be 100% perfect?
• Pareto
– 80/20 rule
– Not all data is of equal importance
• Scientific, economic, social, and practical knowledge
Two Distinct Activities Support Quality Data
• Data quality best practices depend on both
– Practice-oriented activities
– Structure-oriented activities
Practice-oriented
activities focus on the
capture and
manipulation of data
Structure-oriented
activities focus on the
data implementation
Quality
Data
Practice-Oriented Activities
• Stem from a failure to apply rigor when capturing/manipulating data, such as:
– Edit masking
– Range checking of input data
– CRC-checking of transmitted data
• Affect the Data Value Quality and Data Representation Quality
• Examples of improper practice-oriented activities:
– Allowing imprecise or incorrect data to be collected when requirements specify
otherwise
– Presenting data out of sequence
• Typically diagnosed in bottom-up manner: find and fix the resulting
problem
• Addressed by imposing more rigorous data-handling/governance
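Two of the practices named above, range checking of input data and CRC-checking of transmitted data, can be sketched with the Python standard library. The function names, the 4-byte big-endian trailer layout, and the sample payload are assumptions chosen for illustration; edit masking is not shown.

```python
import zlib


def in_range(value, low, high):
    """Range check: accept only values between low and high inclusive."""
    return low <= value <= high


def attach_crc(payload: bytes) -> bytes:
    """Append a CRC-32 checksum (4 bytes, big-endian) to a payload."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def verify_crc(message: bytes) -> bool:
    """Recompute the CRC-32 of the payload and compare with the trailer."""
    payload, received = message[:-4], message[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


# Hypothetical payload; flipping one bit simulates a transmission error.
message = attach_crc(b"quantity=42")
corrupted = bytes([message[0] ^ 0x01]) + message[1:]

sent_ok = verify_crc(message)
corrupted_ok = verify_crc(corrupted)
```

Both checks catch a defect at the moment of capture or transfer, which is the bottom-up, find-and-fix style this slide describes.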


Knee Surgery
[Figure: practice-oriented activities affect Quality of Data Values and Quality of Data Representation]
Structure-Oriented Activities
• Occur because data and metadata have been arranged imperfectly. For example:
– When the data is in the system but we just can't access it;
– When a correct data value is provided as the wrong response to a query; or
– When data is not provided because it is unavailable or inaccessible
• Developer focus within system boundaries instead of within
organization boundaries
• Affect the Data Model Quality and Data Architecture Quality
• Examples of improper structure-oriented activities:
– Providing a correct response but incomplete data to a query because the user
did not comprehend the system data structure
– Costly maintenance of inconsistent data used by redundant systems
• Typically diagnosed in top-down manner: root cause fixes
• Addressed through fundamental data structure governance


[Figure: structure-oriented activities affect Quality of Data Models and Quality of Data Architecture]
New York Turns to Data to Solve Big Tree Problem
• NYC
– 2,500,000 trees
• 11 months from 2009 to 2010
– 4 people were killed or seriously injured by falling tree limbs in Central Park alone
• Belief
– Arborists believe that pruning and otherwise maintaining trees can keep them
healthier and make them more likely to withstand a storm, decreasing the
likelihood of property damage, injuries and deaths
• Until recently
– No research or data to back it up
http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
NYC's Big Tree Problem
• Question
– Does pruning trees in one year reduce the number of hazardous tree conditions in the following year?
• Lots of data but granularity challenges
– Pruning data recorded block by block
– Cleanup data recorded at the address level
– Trees have no unique identifiers
• After downloading, cleaning, merging, analyzing and intensive
modeling
– Pruning trees for certain types of hazards caused a 22 percent reduction in the
number of times the department had to send a crew for emergency cleanups
• The best data analysis
– Generates further questions
• NYC cannot prune each block every year
– Building block risk profiles: number of trees, types of trees, whether the block is in
a flood zone or storm zone
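The granularity challenge described above (pruning recorded per block, cleanups recorded per address) is typically handled by rolling the finer-grained data up to the coarser level before comparing. The sketch below shows only that roll-up step; all block IDs, addresses, and counts are invented, and it is not the city's actual analysis or model.

```python
from collections import Counter

# Hypothetical lookup from address to block, plus invented sample data.
address_to_block = {
    "10 Elm St": "B1", "12 Elm St": "B1",
    "3 Oak Ave": "B2", "7 Oak Ave": "B2", "9 Oak Ave": "B2",
    "1 Pine Rd": "B3",
}
all_blocks = {"B1", "B2", "B3", "B4"}
pruned_blocks = {"B1", "B3"}  # pruning is recorded block by block
cleanup_addresses = [         # cleanups are recorded per address
    "10 Elm St", "3 Oak Ave", "7 Oak Ave", "9 Oak Ave", "1 Pine Rd",
]


def cleanups_per_block(addresses, mapping):
    """Roll address-level cleanups up to block level."""
    return Counter(mapping[a] for a in addresses)


def mean_cleanups(blocks, counts):
    """Average cleanups per block; blocks with no cleanups count as 0."""
    return sum(counts.get(b, 0) for b in blocks) / len(blocks)


counts = cleanups_per_block(cleanup_addresses, address_to_block)
pruned_mean = mean_cleanups(pruned_blocks, counts)
unpruned_mean = mean_cleanups(all_blocks - pruned_blocks, counts)
```

Because trees have no unique identifiers, the block is the finest level at which the two data sets can be joined, so the comparison has to be made per block rather than per tree.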
http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
Quality Dimensions
4 Dimensions of Data Quality
An organization’s overall data quality is a function of four
distinct components, each with its own attributes:
• Data Value: the quality of data as stored & maintained in the system
• Data Representation: the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful
• Data Model: the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers
• Data Architecture: the coordination of data management activities in cross-functional system development and operations
Practice-oriented
Structure-oriented
Effective Data Quality Engineering
• Data quality engineering has been focused on operational problem
correction
– Directing attention to practice-oriented data imperfections
• Data quality engineering is more effective when also focused on
structure-oriented causes
– Ensuring the quality of shared data across system boundaries
Full Set of Data Quality Attributes
• Data Representation Quality: as presented to the user
• Data Value Quality: as maintained in the system
• Data Model Quality: as understood by developers
• Data Architecture Quality: as an organizational asset
(closer to the architect) (closer to the user)
Difficult to obtain leverage at the bottom of the falls
Frozen Falls
Traditional Quality Life Cycle
[Figure labels: Data acquisition activities; Data usage activities; Data storage]
Data Life Cycle Model Products
[Figure: life cycle phases Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Utilization, Data Manipulation, Data Assessment, and Data Refinement; labeled product: restored data]
Data Life Cycle Model: Quality Focus
[Figure: life cycle phases Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Utilization, Data Manipulation, Data Assessment, and Data Refinement. Labels: data architecture & models; populated data models and storage locations; data values; value defects; structure defects; architecture refinements; model refinements; data architecture & model quality; representation quality; restored data]
Extended data life cycle model with metadata sources and uses
[Figure labels: populated data models and storage locations; data values; data model quality; value quality; architecture quality; data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings; Metadata & Data Storage; starting point for new system development; starting point for existing systems]
Metadata Creation
• Define Data Architecture
• Define Data Model Structures
Metadata Structuring
• Implement Data Model Views
• Populate Data Model Views
Metadata Refinement
• Correct Structural Defects
• Update Implementation
Data Creation
• Create Data
• Verify Data Values
Data Utilization
• Inspect Data
• Present Data
Data Manipulation
• Manipulate Data
• Update Data
Data Assessment
• Assess Data Values
• Assess Metadata
Data Refinement
• Correct Data Value Defects
• Re-store Data Values
Profile, Analyze and Assess DQ
• Data assessment using 2 different approaches:
– Bottom-up
– Top-down
• Bottom-up assessment:
– Inspection and evaluation of the data sets
– Highlight potential issues based on the results of automated processes
• Top-down assessment:
– Engage business users to document their business processes and the corresponding critical data dependencies
– Understand how their processes consume data and which data elements are critical to the success of the business applications
Define DQ Measures
• Measures development occurs as part of the strategy/design/plan
step
• Process for defining data quality measures:
1. Select one of the identified critical business impacts
2. Evaluate the dependent data elements, and the create and update processes associated with that business impact
3. List any associated data requirements
4. Specify the associated dimension of data quality and one or more business rules
to use to determine conformance of the data to expectations
5. Describe the process for measuring conformance
6. Specify an acceptability threshold
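The six steps above can be captured in a single record per measure. The sketch below is one possible shape; the class `DataQualityMeasure`, its field names, and the shipping example are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class DataQualityMeasure:
    business_impact: str           # step 1: the critical business impact
    data_elements: tuple           # steps 2-3: dependent data elements
    dimension: str                 # step 4: data quality dimension
    rule: Callable[[dict], bool]   # step 4: business rule for conformance
    threshold: float               # step 6: acceptability threshold

    def conformance(self, records):
        """Step 5: fraction of records that satisfy the business rule."""
        if not records:
            return 1.0
        return sum(1 for r in records if self.rule(r)) / len(records)

    def is_acceptable(self, records):
        return self.conformance(records) >= self.threshold


# Hypothetical measure: completeness of the ship-to zip code.
zip_completeness = DataQualityMeasure(
    business_impact="Returned shipments",
    data_elements=("ship_to_zip",),
    dimension="Completeness",
    rule=lambda r: bool(r.get("ship_to_zip")),
    threshold=0.98,
)

sample = [
    {"ship_to_zip": "23220"},
    {"ship_to_zip": ""},
    {"ship_to_zip": "23284"},
    {"ship_to_zip": "10001"},
]
score = zip_completeness.conformance(sample)
acceptable = zip_completeness.is_acceptable(sample)
```

Tying each measure to a named business impact keeps the threshold a business decision rather than a technical default.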
Set and Evaluate DQ Service Levels
• Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules
• Data quality SLAs specify the organization’s expectations for response and remediation
• Operational data quality control defined in data quality SLAs
includes:
– Data elements covered by the agreement
– Business impacts associated with data flaws
– Data quality dimensions associated with each data element
– Quality expectations for each data element of the identified dimensions in each application or system in the value chain
– Methods for measuring against those expectations
– (…)
Measure, Monitor & Manage DQ
• DQM procedures depend on available data quality measuring and monitoring services
• 2 contexts for control/measurement of conformance to data quality business rules exist:
– In-stream: collect in-stream measurements while creating data
– In batch: perform batch activities on collections of data instances assembled in a
data set
• Apply measurements at 3 levels of granularity:
– Data element value
– Data instance or record
– Data set
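The three levels of granularity can be illustrated with one check per level, each built on the one below it. The function names, the required fields, and the sample records (including the NSN-like values) are invented for this sketch.

```python
def element_ok(value):
    """Data element value level: present and not blank."""
    return value is not None and str(value).strip() != ""


def record_ok(record, required):
    """Data instance (record) level: every required element passes."""
    return all(element_ok(record.get(field)) for field in required)


def data_set_score(records, required):
    """Data set level: fraction of records that pass the record check."""
    if not records:
        return 1.0
    return sum(1 for r in records if record_ok(r, required)) / len(records)


# Hypothetical records with made-up identifiers.
REQUIRED = ("nsn", "description")
records = [
    {"nsn": "5305-00-123-4567", "description": "SCREW"},
    {"nsn": "", "description": "WASHER"},
    {"nsn": "5310-00-765-4321", "description": None},
    {"nsn": "5306-00-111-2222", "description": "BOLT"},
]
score = data_set_score(records, REQUIRED)
```

The element and record checks suit in-stream measurement as data is created, while the data set score is the kind of figure a batch run would report.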
Overview: Data Quality Tools
• 4 categories of activities:
– Analysis
– Cleansing
– Enhancement
– Monitoring
• Principal tools:
– Data Profiling
– Parsing and Standardization
– Data Transformation
– Identity Resolution and Matching
– Enhancement
– Reporting
DQ Tool Set #1: Data Profiling
• Data profiling is the assessment of value distribution and clustering of values into domains
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the data quality values within a data set
– Exploring relationships that exist between value collections within and across
data sets
• At its most advanced, data profiling takes a series of prescribed
rules from data quality engines. It then assesses the data,
annotates and tracks violations to determine if they comprise new
or inferred data quality rules
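A very small version of the statistical side of profiling is sketched below: value distribution for a column, plus a "shape" pattern per value so that unusual formats stand out. The function names and sample column are hypothetical; commercial profiling tools do far more than this.

```python
from collections import Counter


def profile_column(values):
    """Basic value-distribution statistics for one column."""
    non_null = [v for v in values if v not in (None, "")]
    counts = Counter(non_null)
    null_fraction = (
        (len(values) - len(non_null)) / len(values) if values else 0.0
    )
    return {
        "count": len(values),
        "null_fraction": null_fraction,
        "distinct": len(counts),
        "most_common": counts.most_common(1),
    }


def value_pattern(value):
    """Reduce a value to its shape: digits become 9, letters become A."""
    return "".join(
        "9" if c.isdigit() else "A" if c.isalpha() else c for c in value
    )


# Hypothetical zip code column with one mistyped value (letter O for zero).
zips = ["23220", "23284", "2322O", None, "23220"]
stats = profile_column(zips)
patterns = Counter(value_pattern(z) for z in zips if z)
```

A pattern that occurs only once in a column that is otherwise uniform is a candidate defect worth a closer look, before any rule has been written for it.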
DQ Tool Set #1: Data Profiling, cont’d
• Data profiling vs. data quality: business context and semantic/logical layers
– Data quality is concerned with proscriptive rules
– Data profiling looks for patterns both when rules are adhered to and when rules are violated; able to provide input into the business context layer
• It is incumbent on data profiling services to notify all concerned parties of whatever is discovered
• Profiling can be used to…
– …notify the help desk that valid changes in the data are about to cause an avalanche of “skeptical user” calls
– …notify business analysts of precisely where they should be working today in terms of shifts in the data
Courtesy GlobalID.com
DQ Tool Set #2: Parsing & Standardization
• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
• Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators
• Data standardization example:
– Bringing all the different formats of “street” into a single format, e.g. “STR”, “ST.”, “STRT”, “STREET”, etc.
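The "street" example can be sketched with a single pattern and a substitution. The choice of "ST" as the target format and the function name are assumptions; the slide does not say which form is the standard. A pattern this simple will also rewrite "St." when it means "Saint", which is the kind of case real parsing rules must handle.

```python
import re

# Variants named in the example: STR, ST., STRT, STREET (any letter case).
STREET_VARIANTS = re.compile(r"\b(?:STREET|STRT|STR|ST)\b\.?", re.IGNORECASE)


def standardize_street(address, target="ST"):
    """Replace every recognized street variant with the target format."""
    return STREET_VARIANTS.sub(target, address)


examples = [
    standardize_street("123 Main St."),
    standardize_street("45 Oak strt"),
    standardize_street("9 Elm STREET, Apt 2"),
    standardize_street("7 Stratford Rd"),
]
```

The word boundaries keep the pattern from touching names such as "Stratford" that merely begin with the same letters.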
DQ Tool Set #3: Data Transformation
• Upon identification of data
errors, trigger data rules to
transform the flawed data
• Perform standardization
and guide rule-based
transformations by
mapping data values in
their original formats and
patterns into a target
representation
• Parsed components of a
pattern are subjected to
rearrangement,
corrections, or any
changes as directed by the
rules in the knowledge
base
DQ Tool Set #4: Identity Resolution & Matching
• Data matching enables analysts to identify relationships between records for
de-duplication or group-based processing
• Matching is central to maintaining data consistency and integrity throughout
the enterprise
• The matching process should be used in the initial migration of data into a single repository
• 2 basic approaches to matching:
• Deterministic
– Relies on defined patterns/rules for assigning weights and scores to determine similarity
– Predictable
– Dependent on what the rule developers anticipate
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience -> matchers can improve precision as more
data is analyzed
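The contrast between the two approaches can be sketched as two scoring functions. The deterministic one adds fixed, hand-assigned weights. For the probabilistic one this sketch swaps in a Fellegi-Sunter style log-likelihood weight, which the slide does not name; the m and u probabilities, field names, and sample records are all invented.

```python
import math


def deterministic_score(a, b, weights):
    """Rule-based: add a fixed weight for each field that matches exactly."""
    return sum(w for field, w in weights.items() if a.get(field) == b.get(field))


def probabilistic_weight(a, b, m_probs, u_probs):
    """Sum of log2 likelihood ratios per field.

    m = P(field agrees | records are the same entity)
    u = P(field agrees | records are different entities)
    In practice both are estimated from data and refined over time.
    """
    total = 0.0
    for field, m in m_probs.items():
        u = u_probs[field]
        if a.get(field) == b.get(field):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


# Hypothetical weights, probabilities, and records.
weights = {"name": 60, "zip": 40}
m_probs = {"name": 0.9, "zip": 0.8}
u_probs = {"name": 0.1, "zip": 0.2}

rec_a = {"name": "ACME BOLT CO", "zip": "23220"}
rec_b = {"name": "ACME BOLT CO", "zip": "23220"}
rec_c = {"name": "ACME BOLT CO", "zip": "10001"}
rec_d = {"name": "ZENITH NUT CO", "zip": "10001"}

rule_full = deterministic_score(rec_a, rec_b, weights)
rule_partial = deterministic_score(rec_a, rec_c, weights)
prob_full = probabilistic_weight(rec_a, rec_b, m_probs, u_probs)
prob_partial = probabilistic_weight(rec_a, rec_c, m_probs, u_probs)
prob_none = probabilistic_weight(rec_a, rec_d, m_probs, u_probs)
```

The deterministic weights stay whatever the rule developer chose, while the m and u values can be re-estimated as more matched pairs are reviewed, which is how a probabilistic matcher improves with experience.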
DQ Tool Set #5: Enhancement
• Definition:
– A method for adding value to information by accumulating additional information
about a base set of entities and then merging all the sets of information to
provide a focused view. Improves master data.
• Benefits:
– Enables use of third party data sources
– Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful
• Examples of data enhancements:
– Time/date stamps
– Auditing information
– Contextual information
– Geographic information
– Demographic information
– Psychographic information
DQ Tool Set #6: Reporting
• Good reporting supports:
– Inspection and monitoring of conformance to data quality expectations
– Monitoring performance of data stewards conforming to data quality SLAs
– Workflow processing for data quality incidents
– Manual oversight of data cleansing and correction
• Data quality tools provide dynamic reporting and monitoring
capabilities
• Enables analysts and data stewards to support and drive the
methodology for ongoing DQM and improvement with a single,
easy-to-use solution
• Associate report results with:
– Data quality measurement
– Metrics
– Activity
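As a rough sketch of how report results might be tied to measurements and SLAs, assume each metric is recorded as a (passed, total) count and each SLA as a minimum pass rate. The metric names and thresholds below are invented for illustration.

```python
def conformance_report(measurements, sla_thresholds):
    """Compare measured pass rates with SLA thresholds, one row per metric."""
    report = []
    for metric, (passed, total) in sorted(measurements.items()):
        rate = passed / total if total else 0.0
        threshold = sla_thresholds[metric]
        report.append({
            "metric": metric,
            "rate": round(rate, 4),
            "threshold": threshold,
            "meets_sla": rate >= threshold,
        })
    return report

# Hypothetical measurements: (records passing the check, records checked).
measurements = {"email_completeness": (980, 1000), "zip_validity": (900, 1000)}
sla = {"email_completeness": 0.95, "zip_validity": 0.99}
rows = conformance_report(measurements, sla)
```

Rows where `meets_sla` is False are candidates for a data quality incident in whatever workflow tool is in use.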
1. Data Quality in Context of Data Management
2. DQE Definition
3. DQE Cycle & Contextual Complications
4. DQ Causes and Dimensions
5. Quality and the Data Life Cycle
6. DQE Tool Sets
7. Takeaways and Q&A
Data Quality Strategies
Guiding Principles
• Manage data as a core organizational asset.
• Identify a gold record for all data elements
• All data elements will have a standardized data definition, data type, and
acceptable value domain
• Leverage data governance for the control and performance of DQM
• Use industry and international data standards whenever possible
• Downstream data consumers specify data quality expectations
• Define business rules to assert conformance to data quality expectations
• Validate data instances and data sets against defined business rules
• Business process owners will agree to and abide by data quality SLAs
• Apply data corrections at the original source if possible
• If it is not possible to correct data at the source, forward data corrections
to the owner of the original source. Influence on data brokers to conform
to local requirements may be limited
• Report measured levels of data quality to appropriate data stewards,
business process owners, and SLA managers
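The principles on defining business rules and validating data against them can be sketched as follows. The rule names, the record fields, and the `validate` helper are assumptions made for this example, not part of the slides.

```python
import re

# Hypothetical business rules: each name maps to a predicate over one record.
BUSINESS_RULES = {
    "customer_id is present": lambda r: r.get("customer_id") is not None,
    "state is a 2-letter code": lambda r: bool(
        re.fullmatch(r"[A-Z]{2}", r.get("state") or "")
    ),
    "age is between 0 and 120": lambda r: isinstance(r.get("age"), int)
    and 0 <= r["age"] <= 120,
}

def validate(records, rules=BUSINESS_RULES):
    """Return (record index, rule name) for every rule a record violates."""
    return [
        (i, name)
        for i, record in enumerate(records)
        for name, check in rules.items()
        if not check(record)
    ]

records = [
    {"customer_id": 1, "state": "VA", "age": 44},
    {"customer_id": None, "state": "Virginia", "age": 44},
]
violations = validate(records)
```

Keeping each rule as a named predicate means the same list can be reported to data stewards by rule name.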
Goals and Principles
• To measurably improve the quality of
data in relation to defined business
expectations
• To define requirements and
specifications for integrating data
quality control into the system
development life cycle
• To provide defined processes for
measuring, monitoring, and reporting
conformance to acceptable levels of
data quality
Summary: Data Quality Engineering
Upcoming Events
Data-Ed Online: The Seven Deadly Data Sins - Emerging from Management Purgatory
November 14, 2017 @ 2:00 PM ET/11:00 AM PT
Data-Ed Online: Metadata Strategies - Data Squared
December 13, 2012 @ 2:00 PM ET/11:00 AM PT
Sign up here:
www.datablueprint.com/webinar-schedule
or www.dataversity.net
Data Quality Dimensions
Data Value Quality
Data Representation Quality
Data Model Quality
Data Architecture Quality
Questions?
It’s your turn!
Use the chat feature or Twitter (#dataed) to submit
your questions to Peter now.
10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056

Weitere ähnliche Inhalte

Was ist angesagt?

RWDG Webinar: Build Your Own Data Governance Tools
RWDG Webinar: Build Your Own Data Governance ToolsRWDG Webinar: Build Your Own Data Governance Tools
RWDG Webinar: Build Your Own Data Governance ToolsDATAVERSITY
 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...DATAVERSITY
 
RWDG Slides: Achieving Data Quality with Data Governance
RWDG Slides: Achieving Data Quality with Data GovernanceRWDG Slides: Achieving Data Quality with Data Governance
RWDG Slides: Achieving Data Quality with Data GovernanceDATAVERSITY
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
ADV Slides: Databases vs Hadoop vs Cloud Storage
ADV Slides: Databases vs Hadoop vs Cloud StorageADV Slides: Databases vs Hadoop vs Cloud Storage
ADV Slides: Databases vs Hadoop vs Cloud StorageDATAVERSITY
 
Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!DATAVERSITY
 
DataEd Slides: Data Governance Strategies
DataEd Slides: Data Governance StrategiesDataEd Slides: Data Governance Strategies
DataEd Slides: Data Governance StrategiesDATAVERSITY
 
Data Management vs Data Strategy
Data Management vs Data StrategyData Management vs Data Strategy
Data Management vs Data StrategyDATAVERSITY
 
Implementing the Data Maturity Model (DMM)
Implementing the Data Maturity Model (DMM)Implementing the Data Maturity Model (DMM)
Implementing the Data Maturity Model (DMM)DATAVERSITY
 
Getting Started with Data Stewardship
Getting Started with Data StewardshipGetting Started with Data Stewardship
Getting Started with Data StewardshipDATAVERSITY
 
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...DATAVERSITY
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...DATAVERSITY
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesDATAVERSITY
 
IT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherIT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherDATAVERSITY
 
Big Data Strategies – Organizational Structure and Technology
Big Data Strategies – Organizational Structure and TechnologyBig Data Strategies – Organizational Structure and Technology
Big Data Strategies – Organizational Structure and TechnologyDATAVERSITY
 
Everybody is a Data Steward – Get Over It!
Everybody is a Data Steward – Get Over It!Everybody is a Data Steward – Get Over It!
Everybody is a Data Steward – Get Over It!DATAVERSITY
 
Real-World Data Governance: Modeling Data Governance
Real-World Data Governance: Modeling Data GovernanceReal-World Data Governance: Modeling Data Governance
Real-World Data Governance: Modeling Data GovernanceDATAVERSITY
 
Balancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityBalancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityDATAVERSITY
 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMDATAVERSITY
 
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyData-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyDATAVERSITY
 

Was ist angesagt? (20)

RWDG Webinar: Build Your Own Data Governance Tools
RWDG Webinar: Build Your Own Data Governance ToolsRWDG Webinar: Build Your Own Data Governance Tools
RWDG Webinar: Build Your Own Data Governance Tools
 
DAS Slides: Data Governance - Combining Data Management with Organizational ...
DAS Slides: Data Governance -  Combining Data Management with Organizational ...DAS Slides: Data Governance -  Combining Data Management with Organizational ...
DAS Slides: Data Governance - Combining Data Management with Organizational ...
 
RWDG Slides: Achieving Data Quality with Data Governance
RWDG Slides: Achieving Data Quality with Data GovernanceRWDG Slides: Achieving Data Quality with Data Governance
RWDG Slides: Achieving Data Quality with Data Governance
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
ADV Slides: Databases vs Hadoop vs Cloud Storage
ADV Slides: Databases vs Hadoop vs Cloud StorageADV Slides: Databases vs Hadoop vs Cloud Storage
ADV Slides: Databases vs Hadoop vs Cloud Storage
 
Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!Data Leadership - Stop Talking About Data and Start Making an Impact!
Data Leadership - Stop Talking About Data and Start Making an Impact!
 
DataEd Slides: Data Governance Strategies
DataEd Slides: Data Governance StrategiesDataEd Slides: Data Governance Strategies
DataEd Slides: Data Governance Strategies
 
Data Management vs Data Strategy
Data Management vs Data StrategyData Management vs Data Strategy
Data Management vs Data Strategy
 
Implementing the Data Maturity Model (DMM)
Implementing the Data Maturity Model (DMM)Implementing the Data Maturity Model (DMM)
Implementing the Data Maturity Model (DMM)
 
Getting Started with Data Stewardship
Getting Started with Data StewardshipGetting Started with Data Stewardship
Getting Started with Data Stewardship
 
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
DAS Slides: Building a Data Strategy — Practical Steps for Aligning with Busi...
 
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
Data-Ed Slides: Data Modeling Strategies - Getting Your Data Ready for the Ca...
 
Data-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance StrategiesData-Ed Webinar: Data Governance Strategies
Data-Ed Webinar: Data Governance Strategies
 
IT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights TogetherIT + Line of Business - Driving Faster, Deeper Insights Together
IT + Line of Business - Driving Faster, Deeper Insights Together
 
Big Data Strategies – Organizational Structure and Technology
Big Data Strategies – Organizational Structure and TechnologyBig Data Strategies – Organizational Structure and Technology
Big Data Strategies – Organizational Structure and Technology
 
Everybody is a Data Steward – Get Over It!
Everybody is a Data Steward – Get Over It!Everybody is a Data Steward – Get Over It!
Everybody is a Data Steward – Get Over It!
 
Real-World Data Governance: Modeling Data Governance
Real-World Data Governance: Modeling Data GovernanceReal-World Data Governance: Modeling Data Governance
Real-World Data Governance: Modeling Data Governance
 
Balancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational MaturityBalancing Data and Processes to Achieve Organizational Maturity
Balancing Data and Processes to Achieve Organizational Maturity
 
Data-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDMData-Ed Online Webinar: Business Value from MDM
Data-Ed Online Webinar: Business Value from MDM
 
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the MoneyData-Ed Webinar: Monetizing Data Management - Show Me the Money
Data-Ed Webinar: Monetizing Data Management - Show Me the Money
 

Ähnlich wie Data-Ed Webinar: Data Quality Strategies - From Data Duckling to Successful Swan

Data Quality Strategies
Data Quality StrategiesData Quality Strategies
Data Quality StrategiesDATAVERSITY
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data Blueprint
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringDATAVERSITY
 
Data-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringData-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringDATAVERSITY
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsDATAVERSITY
 
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data Garden
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data GardenData-Ed Slides: Data Architecture Strategies - Constructing Your Data Garden
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data GardenDATAVERSITY
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsDATAVERSITY
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements Data Blueprint
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality RightDATAVERSITY
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture StrategiesDATAVERSITY
 
Data Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityData Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityDATAVERSITY
 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)DATAVERSITY
 
Data Quality Success Stories
Data Quality Success StoriesData Quality Success Stories
Data Quality Success StoriesDATAVERSITY
 
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
DataEd Webinar:  Reference & Master Data Management - Unlocking Business ValueDataEd Webinar:  Reference & Master Data Management - Unlocking Business Value
DataEd Webinar: Reference & Master Data Management - Unlocking Business ValueDATAVERSITY
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramDATAVERSITY
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data Blueprint
 

Ähnlich wie Data-Ed Webinar: Data Quality Strategies - From Data Duckling to Successful Swan (20)

Data Quality Strategies
Data Quality StrategiesData Quality Strategies
Data Quality Strategies
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
 
Data-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality EngineeringData-Ed: Unlock Business Value through Data Quality Engineering
Data-Ed: Unlock Business Value through Data Quality Engineering
 
Data-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality EngineeringData-Ed Webinar: Data Quality Engineering
Data-Ed Webinar: Data Quality Engineering
 
2014 dqe handouts
2014 dqe handouts2014 dqe handouts
2014 dqe handouts
 
Data-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling FundamentalsData-Ed Webinar: Data Modeling Fundamentals
Data-Ed Webinar: Data Modeling Fundamentals
 
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data Garden
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data GardenData-Ed Slides: Data Architecture Strategies - Constructing Your Data Garden
Data-Ed Slides: Data Architecture Strategies - Constructing Your Data Garden
 
Data-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture RequirementsData-Ed Webinar: Data Architecture Requirements
Data-Ed Webinar: Data Architecture Requirements
 
Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements  Data-Ed: Data Architecture Requirements
Data-Ed: Data Architecture Requirements
 
Getting Data Quality Right
Getting Data Quality RightGetting Data Quality Right
Getting Data Quality Right
 
Data Architecture Strategies
Data Architecture StrategiesData Architecture Strategies
Data Architecture Strategies
 
Data Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great AccountabilityData Governance Strategies - With Great Power Comes Great Accountability
Data Governance Strategies - With Great Power Comes Great Accountability
 
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)Data-Ed Slides: Best Practices in Data Stewardship (Technical)
Data-Ed Slides: Best Practices in Data Stewardship (Technical)
 
Data Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-MakingData Analytics: From Basic Skills to Executive Decision-Making
Data Analytics: From Basic Skills to Executive Decision-Making
 
Data Quality Success Stories
Data Quality Success StoriesData Quality Success Stories
Data Quality Success Stories
 
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
DataEd Webinar:  Reference & Master Data Management - Unlocking Business ValueDataEd Webinar:  Reference & Master Data Management - Unlocking Business Value
DataEd Webinar: Reference & Master Data Management - Unlocking Business Value
 
Key Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance ProgramKey Elements of a Successful Data Governance Program
Key Elements of a Successful Data Governance Program
 
Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM Data-Ed: Business Value From MDM
Data-Ed: Business Value From MDM
 
Practical experiences of portfolio management
Practical experiences of portfolio managementPractical experiences of portfolio management
Practical experiences of portfolio management
 
Introduction to CAI
Introduction to CAIIntroduction to CAI
Introduction to CAI
 

Mehr von DATAVERSITY

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...DATAVERSITY
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceDATAVERSITY
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data LiteracyDATAVERSITY
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for YouDATAVERSITY
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?DATAVERSITY
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?DATAVERSITY
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling FundamentalsDATAVERSITY
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectDATAVERSITY
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at ScaleDATAVERSITY
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?DATAVERSITY
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...DATAVERSITY
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsDATAVERSITY
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayDATAVERSITY
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise AnalyticsDATAVERSITY
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best PracticesDATAVERSITY
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?DATAVERSITY
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best PracticesDATAVERSITY
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 

Mehr von DATAVERSITY (20)

Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
Architecture, Products, and Total Cost of Ownership of the Leading Machine Le...
 
Data at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and GovernanceData at the Speed of Business with Data Mastering and Governance
Data at the Speed of Business with Data Mastering and Governance
 
Exploring Levels of Data Literacy
Exploring Levels of Data LiteracyExploring Levels of Data Literacy
Exploring Levels of Data Literacy
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Make Data Work for You
Make Data Work for YouMake Data Work for You
Make Data Work for You
 
Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?Data Catalogs Are the Answer – What is the Question?
Data Catalogs Are the Answer – What is the Question?
 
Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?Data Catalogs Are the Answer – What Is the Question?
Data Catalogs Are the Answer – What Is the Question?
 
Data Modeling Fundamentals
Data Modeling FundamentalsData Modeling Fundamentals
Data Modeling Fundamentals
 
Showing ROI for Your Analytic Project
Showing ROI for Your Analytic ProjectShowing ROI for Your Analytic Project
Showing ROI for Your Analytic Project
 
How a Semantic Layer Makes Data Mesh Work at Scale
How a Semantic Layer Makes  Data Mesh Work at ScaleHow a Semantic Layer Makes  Data Mesh Work at Scale
How a Semantic Layer Makes Data Mesh Work at Scale
 
Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?Is Enterprise Data Literacy Possible?
Is Enterprise Data Literacy Possible?
 
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
The Data Trifecta – Privacy, Security & Governance Race from Reactivity to Re...
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?
 
Data Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and ForwardsData Governance Trends - A Look Backwards and Forwards
Data Governance Trends - A Look Backwards and Forwards
 
Data Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement TodayData Governance Trends and Best Practices To Implement Today
Data Governance Trends and Best Practices To Implement Today
 
2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics2023 Trends in Enterprise Analytics
2023 Trends in Enterprise Analytics
 
Data Strategy Best Practices
Data Strategy Best PracticesData Strategy Best Practices
Data Strategy Best Practices
 
Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?Who Should Own Data Governance – IT or Business?
Who Should Own Data Governance – IT or Business?
 
Data Management Best Practices
Data Management Best PracticesData Management Best Practices
Data Management Best Practices
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 

Kürzlich hochgeladen

GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in managementchhavia330
 
The Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyThe Coffee Bean & Tea Leaf(CBTL), Business strategy case study
The Coffee Bean & Tea Leaf(CBTL), Business strategy case studyEthan lee
 
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...
Lucknow 💋 Escorts in Lucknow - 450+ Call Girl Cash Payment 8923113531 Neha Th...anilsa9823
 
Sales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessSales & Marketing Alignment: How to Synergize for Success
Sales & Marketing Alignment: How to Synergize for SuccessAggregage
 
Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni 2024 1Q Results - 24.04.24 business.
Eni 2024 1Q Results - 24.04.24 business.Eni
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Call Girls in Gomti Nagar - 7388211116 - With room Service
Data-Ed Webinar: Data Quality Strategies - From Data Duckling to Successful Swan

  • 1. Data Quality Strategies: From Data Duckling to Successful Swan
 Peter Aiken, Ph.D.
 • DAMA International President 2009-2013
 • DAMA International Achievement Award 2001 (with Dr. E. F. "Ted" Codd)
 • DAMA International Community Award 2005
 • 33+ years in data management
 • Repeated international recognition
 • Founder, Data Blueprint (datablueprint.com)
 • Associate Professor of IS (vcu.edu)
 • DAMA International (dama.org)
 • 10 books and dozens of articles
 • Experienced w/ 500+ data management practices
 • Multi-year immersions:
 – US DoD (DISA/Army/Marines/DLA)
 – Nokia
 – Deutsche Bank
 – Wells Fargo
 – Walmart
 – …
 Book covers shown: Monetizing Data Management: Unlocking the Value in Your Organization's Most Important Asset (Peter Aiken with Juanita Billings, foreword by John Bottega); The Case for the Chief Data Officer: Recasting the C-Suite to Leverage Your Most Valuable Asset (Peter Aiken and Michael Gorman)
 Copyright 2017 by Data Blueprint
  • 2. Data Quality Strategies
 1. Data Quality in Context of Data Management
 2. DQE Definition
 3. DQE Cycle & Contextual Complications
 4. DQ Causes and Dimensions
 5. Quality and the Data Life Cycle
 6. DQE Tool Sets
 7. Takeaways and Q&A
 Our barn had to pass a foundation inspection
 • Before further construction could proceed
 • No IT equivalent
  • 3. Data Management Practices Hierarchy
 You can accomplish Advanced Data Practices without becoming proficient in the Foundational Data Practices; however, this will:
 • Take longer
 • Cost more
 • Deliver less
 • Present greater risk
 (with thanks to Tom DeMarco)
 • Advanced Data Practices: MDM, Mining, Big Data, Analytics, Warehousing, SOA
 • Foundational Data Practices: Data Platform/Architecture, Data Governance, Data Quality, Data Operations, Data Management Strategy
 • Diagram labels: Technologies, Capabilities
 DMM℠ Structure of 5 Integrated DM Practice Areas
 • Diagram areas: Data Management Strategy, Data Governance, Data Quality, Data Operations, Platform Architecture, Supporting Processes
 • Diagram captions: Manage data coherently; Manage data assets professionally; Maintain fit-for-purpose data, efficiently and effectively; Data life cycle management; Data architecture implementation; Organizational support
  • 4. DMM℠ Structure of 5 Integrated DM Practice Areas (diagram repeated from the previous slide; values overlaid on the diagram: 3 3 33 1)
 The DAMA Guide to the Data Management Body of Knowledge: Data Management Functions
 (from The DAMA Guide to the Data Management Body of Knowledge © 2009 by DAMA International)
 • Good enough to criticize
 – All models are wrong
 – Some models are useful
 • Missing two important concepts
 – Optionality
 – Dependency
  • 5. Overview: Data Quality Engineering
 Data Quality and Data Governance in Context
 • Diagram elements: Organizational Strategy, Data Strategy, Data Governance, Data Quality
 • Data asset support for organizational strategy
 • What the data assets do to support strategy (business goals)
 • How well the data strategy is working (metadata)
 • Governance of quality aspects of data assets
 • Evolutionary feedback about the current focus
  • 6. A Model Specifying Relationships Among Important Terms [Built on definition by Dan Appleton 1983]
 Diagram terms: Fact, Meaning, Data, Request, Information, Use, Intelligence
 1. Each FACT combines with one or more MEANINGS.
 2. Each specific FACT and MEANING combination is referred to as a DATUM.
 3. An INFORMATION is one or more DATA that are returned in response to a specific REQUEST.
 4. INFORMATION REUSE is enabled when one FACT is combined with more than one MEANING.
 5. INTELLIGENCE is INFORMATION associated with its USES.
 Wisdom & knowledge are often used synonymously
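One way to read the five statements above is as a small data model. The following is a minimal sketch, not part of the original slides: the names `Datum` and `answer` and all sample values are hypothetical, chosen only to illustrate how one fact paired with several meanings enables reuse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Datum:
    """A DATUM: one specific FACT combined with one MEANING."""
    fact: str
    meaning: str


def answer(requested_meanings, data):
    """INFORMATION: the data returned in response to a specific REQUEST.

    The request is modeled (as an assumption) as a set of meanings of interest.
    """
    return [d for d in data if d.meaning in requested_meanings]


# Hypothetical sample data: the fact "23284" is reused under two meanings.
data = [
    Datum("23284", "customer ZIP code"),
    Datum("23284", "store ZIP code"),
    Datum("48", "work weeks per year"),
]

information = answer({"store ZIP code"}, data)
meanings_for_fact = {d.meaning for d in data if d.fact == "23284"}
```

Here `meanings_for_fact` holds two entries, which is the situation the slide calls information reuse.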
  • 7. Definitions
 • Quality Data
 – Fit for purpose: meets the requirements of its authors, users, and administrators (adapted from Martin Eppler)
 – Synonymous with information quality, since poor data quality results in inaccurate information and poor business performance
 • Data Quality Management
 – Planning, implementation and control activities that apply quality management techniques to measure, assess, improve, and ensure data quality
 – Entails the "establishment and deployment of roles, responsibilities concerning the acquisition, maintenance, dissemination, and disposition of data" http://www2.sas.com/proceedings/sugi29/098-29.pdf
 ✓ Critical supporting process from change management
 ✓ Continuous process for defining acceptable levels of data quality to meet business needs and for ensuring that data quality meets these levels
 • Data Quality Engineering
 – Recognition that data quality solutions cannot be managed but must be engineered
 – Engineering is the application of scientific, economic, social, and practical knowledge in order to design, build, and maintain solutions to data quality challenges
 – Engineering concepts are generally not known and understood within IT or business!
 Spinach/Popeye story from http://it.toolbox.com/blogs/infosphere/spinach-how-a-data-quality-mistake-created-a-myth-and-a-cartoon-character-10166
 Improving Data Quality during System Migration
 • Challenge
 – Millions of NSN/SKUs maintained in a catalog
 – Key and other data stored in clear text/comment fields
 – Original suggestion was a manual approach to text extraction
 – Left the data structuring problem unsolved
 • Solution
 – Proprietary, improvable text extraction process
 – Converted non-tabular data into tabular data
 – Saved a minimum of $5 million
 – Literally person-centuries of work
  • 8. Determining Diminishing Returns
 Week #   Unmatched Items (% Total)   Ignorable Items (% Total)   Items Matched (% Total)
 1        31.47%                      1.34%                       N/A
 2        21.22%                      6.97%                       N/A
 3        20.66%                      7.49%                       N/A
 4        32.48%                      11.99%                      55.53%
 …        …                           …                           …
 14       9.02%                       22.62%                      68.36%
 15       9.06%                       22.62%                      68.33%
 16       9.53%                       22.62%                      67.85%
 17       9.5%                        22.62%                      67.88%
 18       7.46%                       22.62%                      69.92%
 Quantitative Benefits (slide labels: Before, After)
 Time needed to review all NSNs once over the life of the project:
 – NSNs: 2,000,000
 – Average time to review & cleanse (in minutes): 5
 – Total time (in minutes): 10,000,000
 Time available per resource over a one-year period:
 – Work weeks in a year: 48
 – Work days in a week: 5
 – Work hours in a day: 7.5
 – Work minutes in a day: 450
 – Total work minutes/year: 108,000
 Person-years required to cleanse each NSN once prior to migration:
 – Minutes needed: 10,000,000
 – Minutes available per person/year: 108,000
 – Total person-years: 92.6
 Resource cost to cleanse NSNs prior to migration:
 – Avg salary for SME year (not including overhead): $60,000.00
 – Projected years required to cleanse / total DLA person-years saved: 93
 – Total cost to cleanse / total DLA savings to cleanse NSNs: $5.5 million
  • 9. Quantitative Benefits (the 2,000,000-NSN calculation from the previous slide is repeated here next to a 150,000-NSN scenario)
 Time needed to review all NSNs once over the life of the project:
 – NSNs: 150,000
 – Average time to review & cleanse (in minutes): 5
 – Total time (in minutes): 750,000
 Time available per resource over a one-year period:
 – Work weeks in a year: 48
 – Work days in a week: 5
 – Work hours in a day: 7.5
 – Work minutes in a day: 450
 – Total work minutes/year: 108,000
 Person-years required to cleanse each NSN once prior to migration:
 – Minutes needed: 750,000
 – Minutes available per person/year: 108,000
 – Total person-years: 7
 Resource cost to cleanse NSNs prior to migration:
 – Avg salary for SME year (not including overhead): $60,000.00
 – Projected years required to cleanse / total DLA person-years saved: 7
 – Total cost to cleanse / total DLA savings to cleanse NSNs: $420,000
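The arithmetic behind the Quantitative Benefits figures can be reproduced in a few lines. This is a minimal sketch using only the inputs stated on the slides; the function name `cleansing_estimate` and its parameter names are hypothetical. Note that the slides round the results (92.6 becomes 93 person-years, about 6.9 becomes 7), so the unrounded cost figures differ slightly from the printed totals.

```python
def cleansing_estimate(nsn_count, minutes_per_nsn=5, weeks_per_year=48,
                       days_per_week=5, hours_per_day=7.5,
                       salary_per_year=60_000):
    """Estimate the effort and cost of manually reviewing every NSN once.

    Returns (total_minutes, minutes_per_person_year, person_years, cost).
    """
    total_minutes = nsn_count * minutes_per_nsn
    # 48 weeks * 5 days * 7.5 hours * 60 minutes = 108,000 minutes per person-year
    minutes_per_person_year = weeks_per_year * days_per_week * hours_per_day * 60
    person_years = total_minutes / minutes_per_person_year
    cost = person_years * salary_per_year
    return total_minutes, minutes_per_person_year, person_years, cost


# Scenario from the slides: 2,000,000 NSNs versus 150,000 NSNs.
full = cleansing_estimate(2_000_000)
reduced = cleansing_estimate(150_000)

for label, (minutes, per_year, years, cost) in (("full", full), ("reduced", reduced)):
    print(f"{label}: {minutes:,} min, {years:.1f} person-years, ${cost:,.0f}")
```

With the slide's rounding to 7 person-years, the reduced scenario costs 7 × $60,000 = $420,000, which matches the printed total.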
  • 10. Data Quality Misconceptions
 • You can fix the data
 • Data quality is an IT problem
 • The problem is in the data sources or data entry
 • The data warehouse will provide a single version of the truth
 • The new system will provide a single version of the truth
 • Standardization will eliminate the problem of different "truths" represented in the reports or analysis
 Source: Business Intelligence solutions, Athena Systems
 The Blind Men and the Elephant (Source: John Godfrey Saxe's (1816-1887) version of the famous Indian legend)
 • It was six men of Indostan,
 To learning much inclined,
 Who went to see the Elephant
 (Though all of them were blind),
 That each by observation
 Might satisfy his mind.
 • The First approached the Elephant,
 And happening to fall
 Against his broad and sturdy side,
 At once began to bawl:
 "God bless me! but the Elephant
 Is very like a wall!"
 • The Second, feeling of the tusk
 Cried, "Ho! what have we here,
 So very round and smooth and sharp?
 To me 'tis mighty clear
 This wonder of an Elephant
 Is very like a spear!"
 • The Third approached the animal,
 And happening to take
 The squirming trunk within his hands,
 Thus boldly up he spake:
 "I see," quoth he, "the Elephant
 Is very like a snake!"
 • The Fourth reached out an eager hand,
 And felt about the knee:
 "What most this wondrous beast is like
 Is mighty plain," quoth he;
 "'Tis clear enough the Elephant
 Is very like a tree!"
 • The Fifth, who chanced to touch the ear,
 Said: "E'en the blindest man
 Can tell what this resembles most;
 Deny the fact who can,
 This marvel of an Elephant
 Is very like a fan!"
 • The Sixth no sooner had begun
 About the beast to grope,
 Than, seizing on the swinging tail
 That fell within his scope.
 "I see," quoth he, "the Elephant
 Is very like a rope!"
 • And so these men of Indostan
 Disputed loud and long,
 Each in his own opinion
 Exceeding stiff and strong,
 Though each was partly in the right,
 And all were in the wrong!
  • 11. No universal conception of data quality exists; instead, many differing perspectives compete
 • Problem:
 – Most organizations approach data quality problems in the same way that the blind men approached the elephant: people tend to see only the data that is in front of them
 – Little cooperation across boundaries, just as the blind men were unable to convey their impressions about the elephant to recognize the entire entity
 – Leads to confusion, disputes and narrow views
 • Solution:
 – Data quality engineering can help achieve a more complete picture and facilitate cross-boundary communications
 Quality Data is ... Fit For Purpose
  • 12. Famous Words?
 • Question:
 – Why haven't organizations taken a more proactive approach to data quality?
 • Answer:
 – Fixing data quality problems is not easy
 – It is dangerous -- they'll come after you
 – Your efforts are likely to be misunderstood
 – You could make things worse
 – Now you get to fix it
 • A single data quality issue can grow into a significant, unexpected investment
  • 13. Four ways to make your data sparkle!
 1. Prioritize the task
 – Cleaning data is costly and time consuming
 – Identify mission critical/non-mission critical data
 2. Involve the data owners
 – Seek input of business units on what constitutes "dirty" data
 3. Keep future data clean
 – Incorporate processes and technologies that check every zip code and area code
 4. Align your staff with business
 – Align IT staff with business units
 (Source: CIO, July 1, 2004)
 Structured Data Quality Engineering
 1. Allow the form of the problem to guide the form of the solution
 2. Provide a means of decomposing the problem
 3. Feature a variety of tools simplifying system understanding
 4. Offer a set of strategies for evolving a design solution
 5. Provide criteria for evaluating the quality of the various solutions
 6. Facilitate development of a framework for developing organizational knowledge
  • 14. The DQE Cycle
 • Deming cycle
 • "Plan-do-study-act" or "plan-do-check-act"
 1. Identifying data issues that are critical to the achievement of business objectives
 2. Defining business requirements for data quality
 3. Identifying key data quality dimensions
 4. Defining business rules critical to ensuring high quality data
 The DQE Cycle: (1) Plan
 • Plan for the assessment of the current state and identification of key metrics for measuring quality
 • The data quality engineering team assesses the scope of known issues
 – Determining cost and impact
 – Evaluating alternatives for addressing them
  • 15. The DQE Cycle: (2) Deploy
 • Deploy processes for measuring and improving the quality of data:
 • Data profiling
 – Institute inspections and monitors to identify data issues when they occur
 – Fix flawed processes that are the root cause of data errors or correct errors downstream
 – When it is not possible to correct errors at their source, correct them at their earliest point in the data flow
 The DQE Cycle: (3) Monitor
 • Monitor the quality of data as measured against the defined business rules
 • If data quality meets defined thresholds for acceptability, the processes are in control and the level of data quality meets the business requirements
 • If data quality falls below acceptability thresholds, notify data stewards so they can take action during the next stage
  • 16. The DQE Cycle: (4) Act
 • Act to resolve any identified issues to improve data quality and better meet business expectations
 • New cycles begin as new data sets come under investigation or as new data quality requirements are identified for existing data sets
 DQE Context & Engineering Concepts
 • Can rules be implemented stating that no data can be corrected unless the source of the error has been discovered and addressed?
 • All data must be 100% perfect?
 • Pareto
 – 80/20 rule
 – Not all data is of equal importance
 • Scientific, economic, social, and practical knowledge
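The Pareto point above (not all data is of equal importance) can be turned into a simple prioritization step. The sketch below is an illustration only: the function `pareto_cut`, the data element names, and the impact scores are all hypothetical and do not come from the slides.

```python
def pareto_cut(impact_by_element, share=0.8):
    """Return the top-ranked elements whose combined impact reaches `share` of the total.

    Elements are ranked by impact, highest first, and collected until the
    running total meets or exceeds the requested share.
    """
    total = sum(impact_by_element.values())
    selected, running = [], 0.0
    ranked = sorted(impact_by_element.items(), key=lambda kv: kv[1], reverse=True)
    for name, impact in ranked:
        if running >= share * total:
            break
        selected.append(name)
        running += impact
    return selected


# Hypothetical impact scores (for example, cost of known defects per data element).
impact = {
    "customer_address": 50,
    "product_code": 35,
    "phone_number": 10,
    "fax_number": 3,
    "middle_name": 2,
}

priority_elements = pareto_cut(impact)
```

In this made-up example two of the five elements account for 85% of the impact, so cleansing effort would start with those two.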
  • 17. Two Distinct Activities Support Quality Data
 • Data quality best practices depend on both
 – Practice-oriented activities
 – Structure-oriented activities
 • Practice-oriented activities focus on the capture and manipulation of data
 • Structure-oriented activities focus on the data implementation
  • 18. Practice-Oriented Activities
 • Stem from a failure to apply rigor when capturing/manipulating data, such as:
 – Edit masking
 – Range checking of input data
 – CRC-checking of transmitted data
 • Affect the Data Value Quality and Data Representation Quality
 • Examples of improper practice-oriented activities:
 – Allowing imprecise or incorrect data to be collected when requirements specify otherwise
 – Presenting data out of sequence
 • Typically diagnosed in bottom-up manner: find and fix the resulting problem
 • Addressed by imposing more rigorous data-handling/governance
 Diagram: Practice-oriented activities; Quality of Data Values; Quality of Data Representation
 Knee Surgery
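The three capture-time controls named above (edit masking, range checking, CRC-checking) can each be sketched in a few lines. This is a minimal illustration, not the presenter's implementation: the ZIP code mask, the hour-of-day range, and the function names are assumptions made for the example. `zlib.crc32` is the standard-library CRC-32 routine.

```python
import re
import zlib

# Hypothetical edit mask: a US ZIP code, optionally in ZIP+4 form.
ZIP_MASK = re.compile(r"\d{5}(-\d{4})?")


def matches_mask(value):
    """Edit masking: accept the value only if the whole string fits the mask."""
    return ZIP_MASK.fullmatch(value) is not None


def in_range(value, low, high):
    """Range checking: accept the value only if it lies within [low, high]."""
    return low <= value <= high


def with_crc(payload):
    """Sender side: attach a CRC-32 checksum to the transmitted bytes."""
    return payload, zlib.crc32(payload)


def crc_ok(payload, expected_crc):
    """Receiver side: recompute the checksum and compare it with the one sent."""
    return zlib.crc32(payload) == expected_crc


sent, checksum = with_crc(b"NSN 1234")
print(matches_mask("23284"), in_range(7.5, 0, 24), crc_ok(sent, checksum))
```

All three checks reject bad input at the point of capture or transmission, which is where the slide locates practice-oriented quality.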
  • 19. Structure-Oriented Activities
 • Occur because of data and metadata that have been arranged imperfectly. For example:
 – When the data is in the system but we just can't access it;
 – When a correct data value is provided as the wrong response to a query; or
 – When data is not provided because it is unavailable or inaccessible
 • Developer focus within system boundaries instead of within organization boundaries
 • Affect the Data Model Quality and Data Architecture Quality
 • Examples of improper structure-oriented activities:
 – Providing a correct response but incomplete data to a query because the user did not comprehend the system data structure
 – Costly maintenance of inconsistent data used by redundant systems
 • Typically diagnosed in top-down manner: root cause fixes
 • Addressed through fundamental data structure governance
 Diagram: Structure-oriented activities; Quality of Data Models; Quality of Data Architecture
 New York Turns to Data to Solve Big Tree Problem
 • NYC
 – 2,500,000 trees
 • 11 months from 2009 to 2010
 – 4 people were killed or seriously injured by falling tree limbs in Central Park alone
 • Belief
 – Arborists believe that pruning and otherwise maintaining trees can keep them healthier and make them more likely to withstand a storm, decreasing the likelihood of property damage, injuries and deaths
 • Until recently
 – No research or data to back it up
 http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
  • 20. NYC's Big Tree Problem
 • Question
 – Does pruning trees in one year reduce the number of hazardous tree conditions in the following year?
 • Lots of data but granularity challenges
 – Pruning data recorded block by block
 – Cleanup data recorded at the address level
 – Trees have no unique identifiers
 • After downloading, cleaning, merging, analyzing and intensive modeling
 – Pruning trees for certain types of hazards caused a 22 percent reduction in the number of times the department had to send a crew for emergency cleanups
 • The best data analysis
 – Generates further questions
 • NYC cannot prune each block every year
 – Building block risk profiles: number of trees, types of trees, whether the block is in a flood zone or storm zone
 http://www.computerworld.com/s/article/9239793/New_York_Turns_to_Big_Data_to_Solve_Big_Tree_Problem?source=CTWNLE_nlt_datamgmt_2013-06-05
 Quality Dimensions
  • 21. 4 Dimensions of Data Quality
 An organization’s overall data quality is a function of four distinct components, each with its own attributes:
 • Data Value: the quality of data as stored & maintained in the system
 • Data Representation: the quality of representation for stored values; perfect data values stored in a system that are inappropriately represented can be harmful
 • Data Model: the quality of data logically representing user requirements related to data entities, associated attributes, and their relationships; essential for effective communication among data suppliers and consumers
 • Data Architecture: the coordination of data management activities in cross-functional system development and operations
 Diagram labels: Practice-oriented; Structure-oriented
 Effective Data Quality Engineering
 • Data quality engineering has been focused on operational problem correction
 – Directing attention to practice-oriented data imperfections
 • Data quality engineering is more effective when also focused on structure-oriented causes
 – Ensuring the quality of shared data across system boundaries
 • Data Representation Quality: as presented to the user
 • Data Value Quality: as maintained in the system
 • Data Model Quality: as understood by developers
 • Data Architecture Quality: as an organizational asset
 Diagram axis labels: (closer to the user), (closer to the architect)
  • 22. Full Set of Data Quality Attributes
 Difficult to obtain leverage at the bottom of the falls
  • 23. Frozen Falls
  • 24. Traditional Quality Life Cycle
 Diagram: data acquisition activities; data storage; data usage activities
 Data Life Cycle Model Products
 Diagram phases: Metadata Creation, Metadata Structuring, Metadata Refinement, Data Creation, Data Storage, Data Utilization, Data Manipulation, Data Assessment, Data Refinement
 Diagram products: data architecture & models; populated data models and storage locations; data values; value defects; structure defects; architecture refinements; model refinements; restored data
  • 25. Data Life Cycle Model: Quality Focus
 Diagram: the life cycle phases from the previous slide, annotated with architecture & model quality, architecture quality, model quality, value quality, and representation quality
 Extended data life cycle model with metadata sources and uses
 • Metadata Creation: Define Data Architecture; Define Data Model Structures
 • Metadata Structuring: Implement Data Model Views; Populate Data Model Views
 • Metadata Refinement: Correct Structural Defects; Update Implementation
 • Data Creation: Create Data; Verify Data Values
 • Data Utilization: Inspect Data; Present Data
 • Data Manipulation: Manipulate Data; Update Data
 • Data Assessment: Assess Data Values; Assess Metadata
 • Data Refinement: Correct Data Value Defects; Re-store Data Values
 • Metadata & Data Storage
 Diagram flow labels: data performance metadata; data architecture; data architecture and data models; shared data; updated data; corrected data; architecture refinements; facts & meanings
 Diagram markers: Starting point for new system development; Starting point for existing systems
  • 26. Profile, Analyze and Assess DQ
 • Data assessment using 2 different approaches:
 – Bottom-up
 – Top-down
 • Bottom-up assessment:
 – Inspection and evaluation of the data sets
 – Highlight potential issues based on the results of automated processes
 • Top-down assessment:
 – Engage business users to document their business processes and the corresponding critical data dependencies
 – Understand how their processes consume data and which data elements are critical to the success of the business applications
Define DQ Measures
• Measures development occurs as part of the strategy/design/plan step
• Process for defining data quality measures:
1. Select one of the identified critical business impacts
2. Evaluate the dependent data elements and the data create and update processes associated with that business impact
3. List any associated data requirements
4. Specify the associated dimension of data quality and one or more business rules to use to determine conformance of the data to expectations
5. Describe the process for measuring conformance
6. Specify an acceptability threshold

Set and Evaluate DQ Service Levels
• Data quality inspection and monitoring are used to measure and monitor compliance with defined data quality rules
• Data quality SLAs specify the organization’s expectations for response and remediation
• Operational data quality control defined in data quality SLAs includes:
– Data elements covered by the agreement
– Business impacts associated with data flaws
– Data quality dimensions associated with each data element
– Quality expectations for each data element of the identified dimensions in each application or system in the value chain
– Methods for measuring against those expectations
– (…)

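As a rough illustration of steps 4 to 6 of the measure-definition process (dimension, business rule, measurement, acceptability threshold), a measure might be sketched as follows. The class name `DQMeasure`, the `postal_code` field, the sample records, and the 0.95 threshold are all assumptions made for this example; none of them come from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Record = Mapping[str, object]


@dataclass(frozen=True)
class DQMeasure:
    """Hypothetical container for one data quality measure."""
    name: str
    dimension: str                    # e.g. "completeness" (step 4)
    rule: Callable[[Record], bool]    # business rule: True if the record conforms
    threshold: float                  # acceptability threshold, 0.0-1.0 (step 6)

    def conformance(self, records: Iterable[Record]) -> float:
        """Share of records that satisfy the rule (step 5)."""
        records = list(records)
        if not records:
            return 1.0  # assumption: an empty set has nothing that violates the rule
        passed = sum(1 for r in records if self.rule(r))
        return passed / len(records)

    def meets_threshold(self, records: Iterable[Record]) -> bool:
        return self.conformance(records) >= self.threshold


def has_postal_code(record: Record) -> bool:
    """Illustrative completeness rule: postal_code must be a non-blank string."""
    value = record.get("postal_code")
    return isinstance(value, str) and value.strip() != ""


postal_completeness = DQMeasure(
    name="customer postal code populated",
    dimension="completeness",
    rule=has_postal_code,
    threshold=0.95,  # assumed value; a real threshold comes from the business
)

sample = [
    {"customer_id": 1, "postal_code": "23060"},
    {"customer_id": 2, "postal_code": ""},
    {"customer_id": 3, "postal_code": "23284"},
    {"customer_id": 4, "postal_code": None},
]

score = postal_completeness.conformance(sample)         # 2 of 4 records conform
acceptable = postal_completeness.meets_threshold(sample)
```

A data quality SLA could then list such measures per data element and record what response is expected when `meets_threshold` is false.
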
Measure, Monitor & Manage DQ
• DQM procedures depend on available data quality measuring and monitoring services
• 2 contexts for control/measurement of conformance to data quality business rules exist:
– In-stream: collect in-stream measurements while creating data
– In batch: perform batch activities on collections of data instances assembled in a data set
• Apply measurements at 3 levels of granularity:
– Data element value
– Data instance or record
– Data set

Overview: Data Quality Tools
• 4 categories of activities:
– Analysis
– Cleansing
– Enhancement
– Monitoring
• Principal tools:
– Data Profiling
– Parsing and Standardization
– Data Transformation
– Identity Resolution and Matching
– Enhancement
– Reporting

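The three levels of granularity from the Measure, Monitor & Manage DQ slide (element value, record, data set), and the in-stream versus batch contexts, can be sketched roughly as below. The function names, the required fields, and the sample records are hypothetical and chosen only for illustration.

```python
from typing import Mapping, Sequence

Record = Mapping[str, object]


def element_ok(value: object) -> bool:
    """Element level: a single value must be present and non-blank."""
    return value is not None and str(value).strip() != ""


def record_ok(record: Record, required: Sequence[str]) -> bool:
    """Record level: every required element of one record must pass."""
    return all(element_ok(record.get(field)) for field in required)


def dataset_conformance(records: Sequence[Record], required: Sequence[str]) -> float:
    """Data set level: share of records in the set that pass."""
    if not records:
        return 1.0  # assumption: an empty set counts as conforming
    return sum(record_ok(r, required) for r in records) / len(records)


REQUIRED = ("id", "email")  # assumed required elements

incoming = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": " "},
    {"id": 3, "email": "c@example.com"},
]

# In-stream: check each record at the moment it is created.
accepted, rejected = [], []
for rec in incoming:
    (accepted if record_ok(rec, REQUIRED) else rejected).append(rec)

# In batch: measure the assembled data set as a whole.
batch_score = dataset_conformance(incoming, REQUIRED)
```

The same rule is reused in both contexts; only the moment of measurement differs.
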
DQ Tool Set #1: Data Profiling
• Data profiling is the assessment of value distribution and clustering of values into domains
• Need to be able to distinguish between good and bad data before making any improvements
• Data profiling is a set of algorithms for 2 purposes:
– Statistical analysis and assessment of the quality of data values within a data set
– Exploring relationships that exist between value collections within and across data sets
• At its most advanced, data profiling takes a series of prescribed rules from data quality engines. It then assesses the data, annotates and tracks violations to determine if they comprise new or inferred data quality rules

DQ Tool Set #1: Data Profiling, cont’d
• Data profiling vs. data quality: business context and semantic/logical layers
– Data quality is concerned with proscriptive rules
– Data profiling looks for patterns when rules are adhered to and when rules are violated; able to provide input into the business context layer
• It is incumbent on data profiling services to notify all concerned parties of whatever is discovered
• Profiling can be used to…
– …notify the help desk that valid changes in the data are about to cause an avalanche of “skeptical user” calls
– …notify business analysts of precisely where they should be working today in terms of shifts in the data

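A minimal sketch of the statistical side of profiling (value distribution and clustering of values into patterns) might look like this. The function names and the sample phone numbers are invented for the example, and commercial profiling tools do considerably more.

```python
import re
from collections import Counter
from typing import Optional, Sequence


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits become 9, letters become A."""
    pattern = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", pattern)


def profile_column(values: Sequence[Optional[str]]) -> dict:
    """Basic column profile: counts, distinct values, and pattern frequencies."""
    non_null = [v for v in values if v is not None and v != ""]
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "value_frequencies": Counter(non_null),
        "pattern_frequencies": Counter(value_pattern(v) for v in non_null),
    }


# Hypothetical phone column with one missing value and one odd format.
phones = ["804-555-0100", "804-555-0101", "8045550102", None, "804-555-0100"]
profile = profile_column(phones)
```

A rare pattern in `pattern_frequencies` does not prove a value is bad, but it tells an analyst where to look first.
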
Courtesy GlobalID.com

DQ Tool Set #2: Parsing & Standardization
• Data parsing tools enable the definition of patterns that feed into a rules engine used to distinguish between valid and invalid data values
• Actions are triggered upon matching a specific pattern
• When an invalid pattern is recognized, the application may attempt to transform the invalid value into one that meets expectations
• Data standardization is the process of conforming to a set of business rules and formats that are set up by data stewards and administrators
• Data standardization example:
– Bringing all the different formats of “street” into a single format, e.g. “STR”, “ST.”, “STRT”, “STREET”, etc.

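The “street” example above can be sketched as a small token-based rule. The target form `ST`, the variant list, and the function name are assumptions for illustration; in practice data stewards would maintain the rule set. A rule this simple would also mangle “St. Louis”, which is why real standardization tools parse the address into components first.

```python
# Variants of "street" to be conformed (assumed list, based on the slide's example).
STREET_VARIANTS = {"STR", "ST", "STRT", "STREET"}
TARGET_FORM = "ST"  # assumed standard form


def standardize_street(address: str) -> str:
    """Upper-case the address and replace any street variant with TARGET_FORM.

    Trailing periods or commas on a matched token are dropped.
    """
    standardized = []
    for token in address.upper().split():
        bare = token.rstrip(".,")
        standardized.append(TARGET_FORM if bare in STREET_VARIANTS else token)
    return " ".join(standardized)


examples = ["12 Main Street", "12 Main Str.", "12 Main STRT", "12 Main St."]
results = [standardize_street(a) for a in examples]
```

All four inputs conform to the same representation, which is what makes later matching and de-duplication feasible.
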
DQ Tool Set #3: Data Transformation
• Upon identification of data errors, trigger data rules to transform the flawed data
• Perform standardization and guide rule-based transformations by mapping data values in their original formats and patterns into a target representation
• Parsed components of a pattern are subjected to rearrangement, corrections, or any changes as directed by the rules in the knowledge base

DQ Tool Set #4: Identity Resolution & Matching
• Data matching enables analysts to identify relationships between records for de-duplication or group-based processing
• Matching is central to maintaining data consistency and integrity throughout the enterprise
• The matching process should be used in the initial migration of data into a single repository
• 2 basic approaches to matching:
• Deterministic
– Relies on defined patterns/rules for assigning weights and scores to determine similarity
– Predictable
– Dependent on rule developers’ anticipations
• Probabilistic
– Relies on statistical techniques for assessing the probability that any pair of records represents the same entity
– Not reliant on rules
– Probabilities can be refined based on experience -> matchers can improve precision as more data is analyzed

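The two matching approaches can be contrasted in a short sketch. The deterministic scorer uses hand-assigned field weights. The probabilistic scorer uses agreement probabilities combined under a field-independence assumption (a Fellegi-Sunter-style calculation, swapped in here as one common statistical technique; the slides do not name a method). All weights, probabilities, field names, and records are invented for the example.

```python
import math
from typing import Mapping

Record = Mapping[str, str]

# Deterministic: weights written by a rule developer (assumed values).
RULE_WEIGHTS = {"email": 0.6, "last_name": 0.25, "postal_code": 0.15}

# Probabilistic: per field, (m, u) where
#   m = P(field agrees | same entity), u = P(field agrees | different entities).
# Assumed values; in practice they are estimated from data and refined over time.
M_U = {"email": (0.95, 0.001), "last_name": (0.90, 0.05), "postal_code": (0.85, 0.10)}


def agrees(a: Record, b: Record, field: str) -> bool:
    """Fields agree only if both are present and equal (missing counts as disagreement)."""
    return bool(a.get(field)) and a.get(field) == b.get(field)


def deterministic_score(a: Record, b: Record) -> float:
    """Sum of the rule weights for the fields that agree."""
    return sum(w for field, w in RULE_WEIGHTS.items() if agrees(a, b, field))


def match_probability(a: Record, b: Record, prior: float = 0.01) -> float:
    """Posterior probability that a and b are the same entity."""
    log_odds = math.log(prior / (1 - prior))
    for field, (m, u) in M_U.items():
        if agrees(a, b, field):
            log_odds += math.log(m / u)
        else:
            log_odds += math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))


r1 = {"email": "pat@example.com", "last_name": "SMITH", "postal_code": "23060"}
r2 = {"email": "pat@example.com", "last_name": "SMITH", "postal_code": "23060"}
r3 = {"email": "lee@example.com", "last_name": "JONES", "postal_code": "10001"}

same_score = deterministic_score(r1, r2)
same_prob = match_probability(r1, r2)
diff_prob = match_probability(r1, r3)
```

The deterministic score is predictable but only as good as the weights someone anticipated. The probabilistic score changes as the `M_U` estimates are re-derived from newly reviewed record pairs.
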
DQ Tool Set #5: Enhancement
• Definition:
– A method for adding value to information by accumulating additional information about a base set of entities and then merging all the sets of information to provide a focused view. Improves master data.
• Benefits:
– Enables use of third party data sources
– Allows you to take advantage of the information and research carried out by external data vendors to make data more meaningful and useful
• Examples of data enhancements:
– Time/date stamps
– Auditing information
– Contextual information
– Geographic information
– Demographic information
– Psychographic information

DQ Tool Set #6: Reporting
• Good reporting supports:
– Inspection and monitoring of conformance to data quality expectations
– Monitoring performance of data stewards conforming to data quality SLAs
– Workflow processing for data quality incidents
– Manual oversight of data cleansing and correction
• Data quality tools provide dynamic reporting and monitoring capabilities
• Enables analysts and data stewards to support and drive the methodology for ongoing DQM and improvement with a single, easy-to-use solution
• Associate report results with:
– Data quality measurement
– Metrics
– Activity

Guiding Principles
• Manage data as a core organizational asset.
• Identify a gold record for all data elements
• All data elements will have a standardized data definition, data type, and acceptable value domain
• Leverage data governance for the control and performance of DQM
• Use industry and international data standards whenever possible
• Downstream data consumers specify data quality expectations
• Define business rules to assert conformance to data quality expectations
• Validate data instances and data sets against defined business rules
• Business process owners will agree to and abide by data quality SLAs
• Apply data corrections at the original source if possible
• If it is not possible to correct data at the source, forward data corrections to the owner of the original source. Influence on data brokers to conform to local requirements may be limited
• Report measured levels of data quality to appropriate data stewards, business process owners, and SLA managers

Goals and Principles
• To measurably improve the quality of data in relation to defined business expectations
• To define requirements and specifications for integrating data quality control into the system development life cycle
• To provide defined processes for measuring, monitoring, and reporting conformance to acceptable levels of data quality

Summary: Data Quality Engineering

Upcoming Events
Data-Ed Online: The Seven Deadly Data Sins - Emerging from Management Purgatory
November 14, 2017 @ 2:00 PM ET/11:00 AM PT
Data-Ed Online: Metadata Strategies - Data Squared
December 13, 2012 @ 2:00 PM ET/11:00 AM PT
Sign up here: www.datablueprint.com/webinar-schedule or www.dataversity.net

References & Recommended Reading

Data Quality Dimensions

Data Value Quality

Data Representation Quality

Data Model Quality

Data Architecture Quality

Questions?
It’s your turn! Use the chat feature or Twitter (#dataed) to submit your questions to Peter now.

10124 W. Broad Street, Suite C
Glen Allen, Virginia 23060
804.521.4056