Tackling Data Quality problems requires more than a series of tactical, one-off improvement projects. By their nature, many Data Quality problems extend across and often beyond an organization. Addressing these issues requires a holistic architectural approach combining people, process, and technology. Join Nigel Turner and Donna Burbank as they provide practical ways to control Data Quality issues in your organization.
Data Quality Best Practices
1. Copyright Global Data Strategy, Ltd. 2022
Data Quality Best Practices
Donna Burbank and Nigel Turner
Global Data Strategy, Ltd.
August 25th, 2022
Follow on Twitter @donnaburbank, @nigelturner8
@GlobalDataStrat
Twitter Event hashtag: #DAStrategies
2. Global Data Strategy, Ltd. 2022
Donna Burbank
• Recognized industry expert in information
management with over 25 years of
experience in data strategy, information
management, data modeling, metadata
management, and enterprise architecture
• Managing Director at Global Data Strategy,
Ltd., an international information
management consulting company that
specializes in the alignment of business
drivers with data-centric technology
• Worked with dozens of Fortune 500
companies worldwide in the Americas,
Europe, Asia, and Africa and speaks
regularly at industry conferences
• Excellence in Data Management Award
from DAMA International
• Past President and Advisor to the DAMA
Rocky Mountain chapter
• Co-author of several books on data
management
• Regular contributor to industry
publications
• She can be reached at
donna.burbank@globaldatastrategy.com
Donna is based in Boulder, Colorado, US
Follow on Twitter @donnaburbank
@GlobalDataStrat
3. Global Data Strategy, Ltd. 2022
• Worked in Information Management
(IM) and related areas for over 30
years. Experience has embraced Data
Governance, Information Strategy,
Data Quality, Master Data
Management & Business
Intelligence.
• Principal Consultant, EMEA, for Global
Data Strategy, Ltd.
• Spent much of his career in British
Telecommunications Group (BT)
where he led a series of enterprise-
wide IM & data governance initiatives.
• Has also been VP of Information
Management Strategy at Harte Hanks
Trillium Software, and Principal
Consultant at FromHereOn and IPL.
• Nigel is very active in professional Data
Management organizations and is an
elected Data Management Association
(DAMA) UK Committee member.
• He was the joint winner of DAMA
International’s 2015 Community Award
for the work he initiated and led in
setting up a mentoring scheme in the
UK where experienced DAMA
professionals coach and support newer
data management professionals.
• Nigel is based in Cardiff, Wales, UK.
Follow on Twitter @NigelTurner8
Today’s hashtag: #DAStrategies
Nigel Turner
4. Global Data Strategy, Ltd. 2022
DATAVERSITY Data Architecture Strategies: This Year’s Lineup
• January: Emerging Trends in Data Architecture – What’s the Next Big Thing?
• February: Building a Data Strategy - Practical Steps for Aligning with Business Goals
• March: Master Data Management – Aligning Data, Process, and Governance
• April: Data Governance & Data Architecture: Alignment & Synergies
• May: Improving Data Literacy Around Data Architecture
• June: Business Intelligence & Data Analytics: An Architected Approach
• July: Best Practices in Metadata Management
• August: Data Quality Best Practices – with special guest Nigel Turner
• September: Business-centric Data Modeling
• October: Graph Databases: Benefits & Risks
• December: Enterprise Architecture vs. Data Architecture
5. Global Data Strategy, Ltd. 2022
What We’ll Cover Today – Our Agenda
• Explore the Data Management Lifecycle and why data
quality management is a ‘must have’ across all stages
of the lifecycle
• Outline specific data quality emphases and approaches
that support data quality in specific stages of the Data
Management Lifecycle
6. Global Data Strategy, Ltd. 2022
A Successful Data Strategy links Business Goals with Technology Solutions
“Top-Down” alignment with
business priorities
“Bottom-Up” management &
inventory of data sources
Managing the people, process,
policies & culture around data
Coordinating & integrating
disparate data sources
Leveraging & managing data for
strategic advantage
Data Quality is Part of a Wider Data Strategy
www.globaldatastrategy.com
7. Global Data Strategy, Ltd. 2022
Data Quality Remains a Growing Priority
[Figure: two bar charts of survey responses]
Data Priorities in Place for 2021 (axis 0% to 70%). Categories shown: Data Strategy, Data Quality, Data Architecture, Data Governance, Data Security, Business Intelligence & Reporting, Data Warehouse. Data Quality is #6.
Future Data Priorities in 2022-2023 (axis 0% to 45%). Categories shown: Data Architecture, Metadata Management, Master Data Management, Data Quality, Self-Service Reporting & Analytics, Data Strategy, Data Governance. Data Quality is #4.
From the 2020 DATAVERSITY survey on “Trends in Data Management”, by Donna Burbank & Michelle Knight
Available for download at: https://globaldatastrategy.com/resources/white-papers/
8. Global Data Strategy, Ltd. 2022
Primary Data Quality Challenges
Headline Findings:
• Data quality challenges remain holistic, embracing
people, process and technology issues
• Business challenges include culture change, skills,
training, senior management buy in etc.
• Technical challenges include scaling, handling new
data types, and managing data quality across
multiple platforms and sources
• Challenges embrace all stages of the data lifecycle,
from data creation to data usage and publication
Source: 2022 State of Data Quality, TDWI, James Kobielus
Available for download at https://globaldatastrategy.com/resources/white-papers/
9. Global Data Strategy, Ltd. 2022
Data Quality: the 1:10:100 Principle
$1: Cost of Prevention
$10: Cost of Correction
$100: Cost of Failure
The 1:10:100 Principle
It costs exponentially more
to identify and correct data
quality problems and data
errors the later they are
identified and addressed
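The arithmetic behind the principle can be sketched in a few lines. This is a minimal illustration only: the 1, 10 and 100 multipliers come from the slide, while the function name, the dictionary keys and the 1,000-error scenario are assumptions made for the example.

```python
# Relative unit costs taken from the 1:10:100 principle on the slide.
COST_PER_ERROR = {"prevention": 1, "correction": 10, "failure": 100}


def total_cost(errors_by_point):
    """Sum the relative cost of errors, keyed by the point where each was handled."""
    return sum(COST_PER_ERROR[point] * count for point, count in errors_by_point.items())


# Hypothetical scenario: the same 1,000 errors handled at three different points.
caught_at_entry = total_cost({"prevention": 1000})
fixed_downstream = total_cost({"correction": 1000})
left_to_fail = total_cost({"failure": 1000})

print(caught_at_entry, fixed_downstream, left_to_fail)
```

The same volume of errors becomes ten times more expensive at each later point, which is the case for investing at data entry.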
10. Global Data Strategy, Ltd. 2022
The 1:10:100 Principle – A Use Case from British Telecommunications (BT)
[Diagram: a wrong address (X) causing failures at customer touch points]
• Incorrect or incomplete addresses input on
data entry led to huge costs of failure,
impacting:
• Network connection
• Customer equipment installation
• Customer billing
• Fault repair
• Customer marketing
• Electronic directory enquiries and paper
directories
• Emergency services (999 / 911)
• All of these resulted in failures at different
customer touch points
• Conclusion:
• Minimize data entry errors!
• Any costs incurred in enhancing address
data entry (e.g. business process
changes, drop down menus, MDM etc.)
are insignificant when compared to
downstream failure costs
11. Global Data Strategy, Ltd. 2022
How This Led to Bigger Data Quality Successes in BT
• Address improvement became the first major project in
a data quality improvement program
• The program eventually ran for 10 years – adapting and
evolving to reflect BT’s transforming business
• Over 75 Data Quality improvement projects completed
• Projects ranged from tactical data cleanses to strategic
enterprise wide projects, e.g. Customer MDM (including
address)
• Benefits derived from process efficiencies, capital
expenditure avoidance, better asset utilization, revenue
recovery etc.
• Total audited bottom line benefits exceeded $1 billion
• Praised by Gartner, Forrester, OVUM, Tom Redman and
others for its business alignment & rapid delivery cycles
12. Global Data Strategy, Ltd. 2022
Data Management Lifecycle: What Is It?
• Most living or manufactured objects have a natural lifecycle; as
an asset, data is no different
• Data:
• Is created
• Has a lifespan
• Ultimately reduces in value as its currency declines
• Is eventually discarded or disposed of
• A Data Management Lifecycle can be defined as:
‘A way of describing the different stages data will go through from
design and collection to dissemination and archival / destruction’
(Source: UK Government Data Quality Framework 2020)
• Organizations creating, storing, using and publishing data need to
be aware of the data lifecycle and plan to manage data
throughout its lifecycle
• How data is managed will vary according to its lifecycle stage
13. Global Data Strategy, Ltd. 2022
Data Management Lifecycle
[Diagram: the Data Management Lifecycle as a cycle of six stages]
Stage 1: PLAN
Stage 2: COLLECT / ACQUIRE / INGEST
Stage 3: PREPARE / STORE / MAINTAIN
Stage 4: USE AND PROCESS
Stage 5: SHARE AND PUBLISH
Stage 6: ARCHIVE OR DESTROY
Sources:
• UK Government Data Quality Framework
• DAMA Data Management Body of Knowledge (DMBoK)
14. Global Data Strategy, Ltd. 2022
Data Management Lifecycle and Data Lineage
Data Management Lifecycle
‘A way of describing the different stages data will go through from design and collection to dissemination and archival / destruction.’
Source: UK Government Data Quality Framework 2020
Data Lineage
‘Data lineage… helps to determine the data provenance for your organization. It can provide an ongoing and continuously updated record of where a data asset originates, how it moves through the organization, how it gets transformed, where it’s stored, who accesses it and other key metadata.’
Source: Informatica
The concepts of Data
Management Lifecycle and Data
Lineage overlap in that both:
• Emphasize that data is
managed through a sequence
of processes and activities
• Describe the way data changes
as it passes through the
sequence
But differ in that:
• Lifecycle Management is more
business focused as it
describes the overall maturity
of and use of data from its
creation to disposal
• Data Lineage is more
technically focused as it
describes the physical flow of
data as it moves from its point
of origin to its point of usage
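The lineage idea of recording where a data asset originates, how it is transformed, where it is stored and who accesses it can be sketched as a small data structure. This is an illustrative sketch, not any vendor's lineage model: the class name, field names and sample values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical lineage record for one data asset (illustrative fields only)."""
    asset: str
    origin: str
    stored_in: str
    transformations: list = field(default_factory=list)
    accessed_by: set = field(default_factory=set)

    def add_transformation(self, step):
        """Append one transformation step in the order it was applied."""
        self.transformations.append(step)

    def record_access(self, user):
        """Note that a user or role has accessed the asset."""
        self.accessed_by.add(user)

    def path(self):
        """Return the flow from point of origin, through each step, to storage."""
        return [self.origin, *self.transformations, self.stored_in]


record = LineageRecord(asset="customer_address", origin="crm_entry_form", stored_in="data_warehouse")
record.add_transformation("standardize_postcode")
record.record_access("billing_analyst")
print(record.path())
```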
15. Global Data Strategy, Ltd. 2022
Data Lifecycle and the 1:10:100 Principle
[Diagram: the six lifecycle stages, from PLAN through ARCHIVE OR DESTROY, shown against a rising scale (1, 10, 100) for the cost to correct data errors and problems. The Data Management Lifecycle emphasis moves from Prevention, through Correction, to Cost of Failure Remediation.]
16. Global Data Strategy, Ltd. 2022
Stage 1: PLAN
PURPOSE OF STAGE
Make the business case for and create a plan for a new data source (internal and / or external)
KEY STAGE ACTIVITIES
• Identify business needs & objectives
• Investigate if data already exists
• Develop a business case
• Produce requirements specification, including business rules
• Determine data provider(s) – internal / external (Open & Proprietary)
• Design data creation / ingestion processes
• Conduct impact analysis
DATA QUALITY FOCUS
• Appoint business data owner and data steward to ensure early accountability
• Engage key data stakeholders to establish ‘fitness for purpose’
• Investigate current data sources to confirm that no existing source is already available
• Profile potential data sources to quantify baseline data quality
• Define required data definitions & standards & agree supplier SLA
• Consider data validation methods and rules, e.g. drop down lists
DATA QUALITY CHALLENGES
• Unclear ownership and accountability for new data
• Potential data duplication
• Data fitness for purpose
• Lack of data definitions and standards (format & content)
• Impact on downstream data processing
• Poor design of data collection / acquisition / ingestion
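Profiling a potential data source to quantify baseline data quality can start with a couple of simple ratios. The sketch below is a minimal illustration in plain Python: the function name, the choice of completeness and uniqueness as measures, and the sample rows are assumptions made for the example, not a prescribed profiling method.

```python
def profile(rows, column):
    """Return completeness and uniqueness ratios for one column of a list of dicts."""
    values = [row.get(column) for row in rows]
    # Treat None and empty strings as missing values.
    populated = [v for v in values if v not in (None, "")]
    completeness = len(populated) / len(values) if values else 0.0
    uniqueness = len(set(populated)) / len(populated) if populated else 0.0
    return {"completeness": completeness, "uniqueness": uniqueness}


# Hypothetical sample drawn from a candidate data source.
sample = [
    {"customer_id": "C1", "postcode": "CF10 1AA"},
    {"customer_id": "C2", "postcode": ""},
    {"customer_id": "C3", "postcode": "CF10 1AA"},
    {"customer_id": "C4", "postcode": None},
]

baseline = profile(sample, "postcode")
print(baseline)
```

A baseline like this gives the data owner a number to agree against in a supplier SLA before the source is adopted.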
17. Global Data Strategy, Ltd. 2022
Stage 2: COLLECT / ACQUIRE / INGEST
PURPOSE OF STAGE
Design & implement data entry / acquisition processes and templates
KEY STAGE ACTIVITIES
• Specify and design optimum data entry types
▪ Users
▪ Devices
▪ Applications
▪ IoT etc.
• Determine target platform(s) (existing or new)
• Design and implement source to target mapping
DATA QUALITY FOCUS
• Create data model(s)
• Populate business glossary / data catalog
• Design detailed DQ rules and enforce on data entry / ingestion
• Implement regular source data monitoring
• Establish data quality measures and dashboard
DATA QUALITY CHALLENGES
• Lack of data source metadata
• Errors in manual data entry
• Inconsistent data formats & content
• Incomplete or inaccurate data
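Enforcing DQ rules at the point of data entry or ingestion can be sketched as a validation function that rejects a record before it is stored. This is a minimal sketch under stated assumptions: the field names, the allowed-values set (standing in for a drop down list) and the deliberately loose postcode pattern are hypothetical, not rules from the presentation.

```python
import re

# Hypothetical rules: an allowed-values list (the drop down idea) and a format check.
ALLOWED_COUNTRIES = {"UK", "US"}
POSTCODE_PATTERN = re.compile(r"^[A-Z0-9 ]{3,10}$")


def validate_entry(record):
    """Return a list of rule violations; an empty list means the record is accepted."""
    errors = []
    if not (record.get("street") or "").strip():
        errors.append("street is required")
    if record.get("country") not in ALLOWED_COUNTRIES:
        errors.append("country must be one of the allowed values")
    if not POSTCODE_PATTERN.match(record.get("postcode") or ""):
        errors.append("postcode has an invalid format")
    return errors


good = {"street": "1 High Street", "country": "UK", "postcode": "CF10 1AA"}
bad = {"street": " ", "country": "Wales", "postcode": "??"}

print(validate_entry(good))
print(validate_entry(bad))
```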
18. Global Data Strategy, Ltd. 2022
Stage 3: PREPARE / STORE / MAINTAIN
PURPOSE OF STAGE
Implement a stable policy driven technical environment to store and maintain data
KEY STAGE ACTIVITIES
• Define and implement storage policies (e.g. encryption, compression etc.)
• Publish data security & data privacy policies
• Design & implement ETL / ELT processes
• Integrate data into existing data stores or new data stores
DATA QUALITY FOCUS
• Implement role based access and usage to data
• Embed predefined DQ rules in ETL / ELT routines – Implement DQ rules engine
• Implement DQ dashboards
• Actively steward the data to ensure all data consumer needs are met
• Automatic update of data catalog
DATA QUALITY CHALLENGES
• Inappropriate data access and uses
• Multiple data consumer needs and requirements
• Data transformation (source to target): too often DQ is first considered here
• Metadata management
• Ongoing data quality validation & reporting
19. Global Data Strategy, Ltd. 2022
Stage 4: USE & PROCESS
PURPOSE OF STAGE
Data is stored and processed in a stable environment for use by data consumers
KEY STAGE ACTIVITIES
• Schedule regular (real time / event or time driven) data ingestion and storage processes
• Actively maintain the data within the technical environment
• Monitor use and adherence to data policies
DATA QUALITY FOCUS
• Active data stewardship
• Generate data consumer DQ reports & deploy feedback workflows
• Consider MDM workflows
• Deploy DQ rules engine to enforce DQ rules throughout the lifespan of the data
DATA QUALITY CHALLENGES
• Lack of data quality error monitoring and reporting
• Absence of data consumer feedback channels to report DQ problems
• Inaction on tackling data quality issues
20. Global Data Strategy, Ltd. 2022
Automating Data Quality Business Rules via a DQ Rules Engine
[Diagram: a DATA QUALITY RULES ENGINE providing Real Time Data Validation and Batch Validation across DATA INPUT, SOURCE SYSTEMS, STAGING / ETL LAYER, DATA WAREHOUSE, DATA MARTS and REPORTING LAYER.]
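The idea of one rules engine serving both real time and batch validation can be sketched as a registry of named rules that is applied either to a single record or to a whole set of rows. This is an illustrative sketch only: the class, its method names, the two sample rules and the staging rows are hypothetical and do not represent any particular product.

```python
class RulesEngine:
    """Minimal rules engine: named predicates applied to single records or batches."""

    def __init__(self):
        self.rules = {}

    def register(self, name, predicate):
        """Add a rule; the predicate returns True when a record passes."""
        self.rules[name] = predicate

    def validate(self, record):
        """Real time style check: names of the rules that one record fails."""
        return [name for name, ok in self.rules.items() if not ok(record)]

    def validate_batch(self, records):
        """Batch style check: failure count per rule across many records."""
        failures = {name: 0 for name in self.rules}
        for record in records:
            for name in self.validate(record):
                failures[name] += 1
        return failures


engine = RulesEngine()
engine.register("address_present", lambda r: bool(r.get("address")))
engine.register("amount_non_negative", lambda r: r.get("amount", 0) >= 0)

# Hypothetical rows sitting in a staging area.
staging_rows = [
    {"address": "1 High Street", "amount": 20},
    {"address": "", "amount": 5},
    {"address": "2 Low Road", "amount": -1},
]

print(engine.validate(staging_rows[1]))
print(engine.validate_batch(staging_rows))
```

Defining each rule once and reusing it at both points keeps data entry checks and ETL checks from drifting apart.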
21. Global Data Strategy, Ltd. 2022
Use of Tools & Technology in Data Quality Management
Source: 2022 State of Data Quality, TDWI, James Kobielus
Available for download at https://globaldatastrategy.com/resources/white-papers/
Headline Findings:
• Tools and technologies are
increasingly being acquired
and used to support data
quality management
• Strong focus on Data
Preparation & Transformation
reinforces the finding that
analytics and intelligence are
key drivers for better data
quality management
• Although only 16% currently
have an enterprise wide data
catalog, 45% intend to
implement one within the
next 12 months
Tools enable:
• Profiling
• Monitoring
• Parsing
• Standardizing
• Matching
• Merging
• Cleansing
• Correcting
• Enhancing
• Management of data
quality rules
• Data quality workflows
• Data quality reporting
22. Global Data Strategy, Ltd. 2022
Stage 5: SHARE & PUBLISH
PURPOSE OF STAGE
Data consumers access and modify the data to use in BI, analytics, visualisation etc.
KEY STAGE ACTIVITIES
• Make data available to all authorized consumers and users
• Develop pre-canned reports and / or self-service reporting capabilities
• Generate and maintain user metadata
DATA QUALITY FOCUS
• Publish report catalog & associated metadata
• Publish user focused DQ dashboards
• Ensure DQ workflows and feedback mechanisms are actively managed and acted on
• Apply automated data catalog update processes
DATA QUALITY CHALLENGES
• Continuing data quality issues – accuracy, completeness, uniqueness, consistency etc.
• Active management and update of metadata
23. Global Data Strategy, Ltd. 2022
Stage 6: ARCHIVE & DESTROY
PURPOSE OF STAGE
To manage data that is no longer required for current operational or reporting purposes
KEY STAGE ACTIVITIES
• Establish and operate data archiving and / or destruction policies and processes where data is no longer actively used because:
▪ It is time expired
▪ Legal or regulatory constraints demand action
▪ A specific project ends (e.g. analytics or data science sandpit)
DATA QUALITY FOCUS
• Ensure data retention policies and processes are in place and enforced
• Apply data archive security & privacy policies
• Ensure archived data is actively stewarded and maintained, including metadata
DATA QUALITY CHALLENGES
• Difficulty in identifying data to be archived or destroyed
• Ensuring data archived is secure, tagged and accessible in case of future need
• Loss of potential knowledge / expertise of archived data
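Identifying time expired data under a retention policy can be sketched as a simple date comparison. This is a minimal sketch under stated assumptions: the data categories, the retention periods and the function name are hypothetical, and real retention rules would come from legal, regulatory and business requirements.

```python
from datetime import date, timedelta

# Hypothetical retention periods per data category, in days.
RETENTION_DAYS = {"billing": 7 * 365, "sandbox": 90}


def retention_action(category, last_used, today):
    """Return 'retain' while within the retention period, else 'archive_or_destroy'."""
    limit = timedelta(days=RETENTION_DAYS[category])
    return "retain" if today - last_used <= limit else "archive_or_destroy"


today = date(2022, 8, 25)
print(retention_action("sandbox", date(2022, 8, 1), today))
print(retention_action("sandbox", date(2022, 1, 1), today))
```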
24. Global Data Strategy, Ltd. 2022
Effective Data Lifecycle Management:
Data Quality Implications
• In order to minimize costs of failure in the data management lifecycle, the
most critical stages are Stage 1 (Plan) and Stage 2 (Collect / Acquire / Ingest)
• In these stages Data Quality by Design should be the key objective
• This will ensure:
• From the outset the data is fit for purpose throughout its lifecycle
• Potential data quality issues and problems can be identified early and
preventative actions taken before usage
• Delaying fixing issues to later stages will:
• Lead to operational inefficiencies and poor decision making
• Require more effort to re-engineer the data
• Increase the costs of remediation
• Each stage of the data management lifecycle will require different KPIs and
measures to ensure effective data quality management (e.g. Data Collection vs.
Data Sharing & Publication)
25. Global Data Strategy, Ltd. 2022
Effective Data Lifecycle Management:
Data Governance Implications
• Data Quality is most effectively sustained and enforced through
Data Governance policies and practices
• Data Governance is a critical enabler to ensure end to end data
lifecycle management, including data quality management
• Data accountability needs to be assigned throughout the lifecycle
by:
• Appointing Data Owners and Data Stewards during the Plan stage of
the data management lifecycle to ensure new data creation is
required by the business and is aligned with existing data policies and
standards
• Ensuring policies and processes exist so that key data is governed
throughout its lifecycle
• Accountability may change as the data moves through the lifecycle
(e.g. from processing to sharing, and usage to archiving)
26. Global Data Strategy, Ltd. 2022
Summary
• Data quality management remains a holistic challenge
involving People, Process & Technology
• Data quality management must embrace the entire data
lifecycle from data creation to data disposal
• Even when data is at the later stages of the lifecycle,
thinking about data quality from a data lifecycle
management perspective helps to focus on the root
causes of data quality problems
• Putting focus and effort on the early stages of the data
lifecycle reduces cost of failure impacts and remediation
activities and costs
• Data Governance is the key enabler for sustaining
improved data quality and needs to underpin the entire
data management lifecycle
27. Global Data Strategy, Ltd. 2022
Dataversity Data Architecture Strategies Series:
Related Previous Data Quality Webinars
https://www.dataversity.net/das-webinar-data-quality-best-practices-3/
https://www.dataversity.net/das-slides-data-quality-best-practices-2/
August 2020:
The A2E Methodology for Tackling Data Quality Problems
August 2021:
Designing and applying Business Rules to Support
Data Quality Improvement
29. Global Data Strategy, Ltd. 2022
Who We Are: Business-Focused Data Strategy
Maximize the Organizational Value of Your Data Investment
In today’s business environment, showing rapid time to value for
any technical investment is critical.
But technology and data can be complex. At Global Data Strategy,
we help demystify technical complexity to help you:
• Demonstrate the ROI and business value of data to your
management
• Build a data strategy at your pace to match your unique culture
and organizational style.
• Create an actionable roadmap for “quick wins”, while building
towards a long-term scalable architecture.
Global Data Strategy shares experience from some of the largest
international organizations scaled to the pace of your unique team.
www.globaldatastrategy.com
Global Data Strategy has worked with organizations globally in the
following industries:
Finance · Retail · Social Services · Health Care · Education · Manufacturing
· Government · Public Utilities · Construction · Media & Entertainment ·
Insurance … and more