3. Data Ownership and Accountability
› Data Stewardship is an approach to Data Governance that formalises
accountability for managing information resources on behalf of others and for
the best interests of the organization
› Data Stewardship consists of the people, organisation, and processes to
ensure that the appropriately designated stewards are responsible for the
governed data.
4. Australian companies are not formal
› Stewards may not always be known
as stewards – but they are still
needed.
› Governance should have a low entry
level and not a high compliance cost.
› Carrying out steward tasks should be
made as easy as possible.
› Good stewardship should be
rewarded.
5. Australian IT Staff have DIY attitude
› Excel lets someone build their own
report.
› Too many governance rules can alienate
the DIY staff.
› An organisation wants better sharing of
information and better management of
information assets; but many experts
want to do things their own way.
6. Regulators want greater maturity
In order to ensure that data risk management is not conducted in an
ad hoc and fragmented manner, a regulated entity would typically
adopt a systematic and formalised approach that ensures data risk is
taken into consideration as part of its change management and
business-as-usual processes.
APRA expects that a regulated entity would implement processes that ensure compliance with
regulatory and legal requirements and data risk management requirements. This would typically
include ongoing checks by the compliance function (or equivalent), supported by reporting
mechanisms (e.g. metrics, exceptions) and management reviews.
7. Stewardship across Lines of Business
Data Stewardship Evolution (in order of increasing Business Value)
By IT System
› Pros – Ease of deployment
› Cons – Propagates fragmentation of data, IT-centric
By Organization
› Pros – Alignment with organization structure
› Cons – Propagates fragmentation of data
By Master Data Entity
› Pros – Alignment with enterprise initiatives such as single view and cross-sell/up-sell
› Cons – Organization challenges, requires System of Record (SOR)
Data Stewardship as a Competitive Differentiator
10. Getting Started with Stewardship
Aspiration
› Better data quality
› Reduced application development costs
› Increased productivity
› Reduced compliance issues.
Perspiration
› Subject matter experts are already too
busy
› Installation and training costs
› Extra roles needed for projects
› Takes too long to retrospectively add
governance to existing information.
11. InfoSphere Information Governance Catalog - Glossary
Benefits:
› Aligns the efforts of IT with the goals of the business
› Provides business context and governance to
information technology assets
› Establishes responsibility and accountability
throughout the information development lifecycle
› Accelerates information development
› Dramatically increases business confidence in
information assets
A meaningful directory of governed information
Hierarchical view and navigation
12. Glossary example: NBN
A point of interconnection between the
NBN and the network of an Access
Seeker, as determined by NBN Co and
notified to the Access Seeker.
NBN Co Information Paper
Access Seeker Accreditation
The connection point that allows Retail Service
Providers (RSPs) and Wholesale Service Providers
(WSPs) to connect to the NBN Co access capability.
In the field, this is the physical port on the Ethernet
Fanout Switch (EFS) switch located at NBN Co’s
PoI, where an Access Seeker connects to establish
exchange of traffic with NBN Co’s network.
NBN Co Website Glossary of Terms
Point of Interconnect (POI)
Short and easy to read
Long and technical
13. Business Glossary terms provide a common language description of information used by the
University and relationships to put that information into context
Glossary example: University
14. InfoSphere Information Governance Catalog - Compliance
› Declare the intended behavior of
information
› Leverage business terms for defining
functional scope
› Communicate precise intent for how
information must be managed
throughout its lifecycle:
− Data Discovery
− Data Modeling
− Master Data
− Reference Data
− Data Quality
− Data Archiving
− Data Privacy
− Data Security
− Data Movement
− Data Transformation
− Data Availability
Declare Information Governance Rules and track compliance
Information Governance Policy
Information
Governance
Policies & Rules
16. InfoSphere Information Governance Catalog - Lineage
View end-to-end data lineage and impact analysis across data sources
› One-click view of end-to-end
upstream and downstream data
flows
› Fast display of complex flows
› Advanced filters support
defining scope of displayed
properties
› Business Lineage display
available for non-technical
audiences
› Links to Stewards and Glossary
provide business context for
graph items
Heterogeneous
data flow reporting
17. How Data Lineage Works
They say “We want end to end data lineage”
You deliver…Here you go…
They say “That is too complex!”
You ask ‘What do you really want?’
18. We want to know the rules
To calculate a study load
(EFTSL) for a single subject,
divide the number of credit
points for the subject by 120.
One EFTSL is
equivalent to 100 credit
points and represents a
standard annual full
time load.
The EFTSL of any course can
be determined by dividing its
allocated credit points by 96.
For example, a 12 credit point
course has an EFTSL of 0.125
(12/96 = 0.125).
EFTSL = Macquarie full-time load for a Bachelor degree
is 68 credit points over 3 years (equivalent of 22.667
per year). To calculate your EFTSL divide the unit value
of the unit(s) by 22.667, eg 3 units = 3/22.667 EFTSL =
0.1324 EFTSL.
EFTSL Equivalent Full Time Study Load
Study Load (EFTSL) is a measurement based on a normal full time study
load for a year.
At USC 8 courses
undertaken per year
is equivalent to one
(1) EFTSL.
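The rules quoted above share one shape: study load divided by a full-time annual load, with a different divisor at each institution. A minimal Python sketch of that idea follows. The divisors come from the quotes on this slide; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
# Divisors are taken from the rules quoted on this slide.
# The keys are hypothetical labels, not official identifiers.
FULL_TIME_ANNUAL_LOAD = {
    "credit_points_120": 120,    # "divide the number of credit points ... by 120"
    "credit_points_100": 100,    # "One EFTSL is equivalent to 100 credit points"
    "credit_points_96": 96,      # "dividing its allocated credit points by 96"
    "macquarie_units": 22.667,   # 68 credit points over 3 years
    "usc_courses": 8,            # 8 courses per year is one EFTSL
}


def eftsl(amount, rule):
    """Return the study load for `amount` units under the named rule."""
    return amount / FULL_TIME_ANNUAL_LOAD[rule]


# The two worked examples quoted on the slide.
print(eftsl(12, "credit_points_96"))
print(round(eftsl(3, "macquarie_units"), 4))
```

Keeping the divisor as governed reference data, rather than hard-coding it in each report, is one way a glossary term such as EFTSL can carry its own rule.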
19. Agile Governance
The Big Data Approach is changing the way we govern data – making it
higher risk
TRADITIONAL APPROACH
Govern data to the highest standard. Store it, then use it for multiple purposes.
Data → Govern to Perfection → Repository → Use
BIG DATA APPROACH
Understand data and usage. Govern to the appropriate level. Use it, and iterate.
Data → Explore / Understand → Govern Appropriately → Use
20. Finding Value in MDM
Start the MDM journey knowing what
you can get out of it
21. Maximize 1:1 consumer
relationships
Deliver personalised offers
aligned to unique behaviors,
needs and desires
Brand reputation
Right message every time in
market
Marketing productivity
Increased breadth of digital
channels, emphasis on cross-sell /
up-sell / right-sell opportunities,
understanding and embracing ROMI
Deliver value across all
touch points
Build opportunity for revenue
growth throughout marketing
value chain
360 Degree View of the Customer
Understanding, responding and maximizing each
unique customer relationship
Optimize marketing mix
Model and plan balancing needs of
channels, probability of ROI success and
resource constraints
Customer growth and retention
Demanding customers, commoditised
products and crowded competitive
marketplace
Define MDM Value
23. Increased engagement
Increased revenue
Decreased risk
Less ‘gut feel’
More data (when used effectively)
Increase on Churn retention rate
(no discounting required)
More newsletter article clicks
More articles read per session
Lookalike acquisition model
increasing conversion
Strong Ad revenue growth 20%
10%
Linkage: audience connections
Any hard links across accounts, Consumer & Household, Fuzzy matching, Enrichment (Single Customer View)
News Corp Example
Presentation to IBM SolutionConnect Event Sydney 2014
24. Household relationships
› Inspect potential household members
and link to confirm relationships.
Employment Relationships
› Inspect relationships between
companies and staff.
Using MDM Relationship Inspector
Relationship graph labels: Joseph’s Household; Wife of; Daughter of; Son of; Is Married to; Is the Subsidiary of; Supplies Product to; Is the Owner of; Has an Account with; Is Employed by
26. Consuming Applications
Australia, NZ, China, India, Portal
Source records:
› Kate Lamb, 32 George Street, Perth, 6000
› Kate Jones, Perth, WA 6000, 12/06/1970
› Catherine Jones, 44 Station Street, Perth, WA
› Mrs K Lamb, 32 St. George, 06/12/1970
› Dr Katherine Lamb, 23 George St, Perth, 6000, 06/12/1970
› Miss C Jones, Station Street, Perth, Western Australia, 6000, 12/06/1970
Person Entity: Dr. Katherine Lamb
Composite View: Dr Katherine Lamb, 32 George St, Perth, WA 6000, DOB: 12/06/1970
ANZ Bank
› Trying to match customer records across 40 core banking systems and 32 countries.
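The source records above disagree on the order of day and month in the date of birth (12/06/1970 against 06/12/1970) and on how the address is written. A minimal Python sketch of how a matching rule might tolerate both follows. It is an illustration only, not how any MDM product matches records; the function names, the record layout and the 0.6 threshold are assumptions.

```python
from difflib import SequenceMatcher


def dob_keys(dob):
    """Return the date as written plus its day/month-swapped form."""
    first, second, year = dob.split("/")
    return {(first, second, year), (second, first, year)}


def dob_may_match(a, b):
    """True when two dates agree directly or after swapping day and month."""
    return bool(dob_keys(a) & dob_keys(b))


def text_similarity(a, b):
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_match(rec_a, rec_b, threshold=0.6):
    """Flag two records as a candidate pair for steward review.

    The threshold is an illustrative assumption, not a tuned value.
    """
    return (
        dob_may_match(rec_a["dob"], rec_b["dob"])
        and text_similarity(rec_a["address"], rec_b["address"]) >= threshold
    )


print(dob_may_match("12/06/1970", "06/12/1970"))
```

A rule like this only proposes candidates; the steward review described later in the deck decides whether the records really belong to one person.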
30. 360 Degree View
› The 360 degree portal view of a customer as an MDM deliverable
31. MDM Success as shown by ANZ bank
$50 million to
synchronise master data
across all core banking
applications
$5 million to create a
golden customer record
2 Data Stewards to
review candidate
matches and submit
data quality fixes
MDM registry
management that is
constantly improved
using Steward feedback.
32. MDM Stewardship made easy
› The Steward can review what the merged/collapsed customer records will look
like. This is still a “virtual record” and rules can be tweaked and fine tuned.
33. The Benefits of Customer Matching
Media Organisation
› Matched 16.4m customer records
› Found 2.7m duplicates
› Found 8m potential household
relationships
Financial Services 2 Day PoC
› Just under 200K customer records
› Legacy system matched 561 records
› MDM PoC matched 3318 automatically
› A further 5840 potential duplicates
found
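The figures above can be turned into two simple ratios. The short Python sketch below uses only the numbers on this slide; the variable names are illustrative.

```python
# Media organisation: share of matched records that were duplicates.
media_duplicate_rate = 2.7e6 / 16.4e6

# Financial services PoC: automatic matches compared with the legacy system.
poc_uplift = 3318 / 561

print(f"Duplicate share: {media_duplicate_rate:.1%}")
print(f"PoC matched {poc_uplift:.1f} times as many records as the legacy system")
```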
34. Critical Success Factors for MDM
› Start with a 360 Degree View use case as this can use a “Best Guess”
customer registry.
› Get in place a platform of stewardship and quality improvement around the
initial registry.
› Move to more complex use cases such as MDM applications and MDM
synchronisation on top of this foundation.
36. Finding Data Quality Problems is now Easy
A data quality assessment identifies problems before the design and
build phase
Low Dates
19/10/1918
High Dates
31/12/9999
Missing
Dates
Columns
without nulls
Columns we
can ignore
Blank
Values
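The callouts above (low dates, high dates, missing dates, blank values) are the kind of counts a column profile produces. A minimal Python sketch of such a profile for a date column follows. It assumes DD/MM/YYYY strings as shown on the slide; the function name and the low and high bounds are assumptions chosen for illustration.

```python
from datetime import date, datetime


def profile_dates(values, low=date(1920, 1, 1), high=date(2100, 1, 1)):
    """Count missing, blank, invalid, low, high and ok dates in a column."""
    counts = {"missing": 0, "blank": 0, "invalid": 0, "low": 0, "high": 0, "ok": 0}
    for raw in values:
        if raw is None:
            counts["missing"] += 1
        elif raw.strip() == "":
            counts["blank"] += 1
        else:
            try:
                parsed = datetime.strptime(raw.strip(), "%d/%m/%Y").date()
            except ValueError:
                counts["invalid"] += 1
                continue
            if parsed < low:
                counts["low"] += 1
            elif parsed > high:
                counts["high"] += 1
            else:
                counts["ok"] += 1
    return counts


sample = ["19/10/1918", "31/12/9999", None, "", "15/03/2001"]
print(profile_dates(sample))
```

A placeholder such as 31/12/9999 may be a deliberate "no end date" convention, so a high count is a prompt for a question to the steward, not automatically an error.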
37. Cross System Assessment Example
Making Cross System profiling easier:
› Distributed heterogeneous sources
› Handle situations where there is no documentation on data structures
› Gain a rapid understanding of data relationships
› Create data quality metrics from profiling
› Detect confidential data elements
Cost Prohibitive Alternative Solutions:
› Manual spot checking of data
› Hand coding
How do you understand enterprise data relationships?
38. Data Quality Example
What happens when identifying data quality rules is an IT-led process:
Table Data Steward | Source Table Name | Source Column Name | Error Text | Error Condition Number
Risk Data Coordinator | Dim_Facility | AccountBaseNumber | has length outside acceptable range | 20105701
Risk Data Coordinator | Dim_Facility | AccountBaseNumber | is null | 20105702
Risk Data Coordinator | Dim_Facility | AccountName | is null | 20105801
Risk Data Coordinator | Dim_Facility | AccountNumber | has length outside acceptable range | 20105601
Risk Data Coordinator | Dim_Facility | AccountNumber | is null | 20105602
Risk Data Coordinator | Dim_Facility | AccountOpenDate | is in future | 20106301
Risk Data Coordinator | Dim_Facility | ApplicationScore | has value = 0 | 20107801
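Rules like the ones listed above are easy to express in code, which is part of why an IT-led list tends to describe technical conditions and not business impact. A minimal Python sketch follows. The column names, error texts and error condition numbers are from the slide; the rule structure and function name are assumptions, and the two length rules are left out because the slide does not give the acceptable range.

```python
from datetime import date

# (column, error text, error condition number, check that is True on failure)
RULES = [
    ("AccountBaseNumber", "is null", 20105702, lambda v, today: v is None),
    ("AccountName", "is null", 20105801, lambda v, today: v is None),
    ("AccountNumber", "is null", 20105602, lambda v, today: v is None),
    ("AccountOpenDate", "is in future", 20106301,
     lambda v, today: v is not None and v > today),
    ("ApplicationScore", "has value = 0", 20107801, lambda v, today: v == 0),
]


def check_row(row, today, table="Dim_Facility", steward="Risk Data Coordinator"):
    """Return one exception tuple per failed rule for a single row (a dict)."""
    return [
        (steward, table, column, text, number)
        for column, text, number, failed in RULES
        if failed(row.get(column), today)
    ]


row = {
    "AccountBaseNumber": "123",
    "AccountName": None,
    "AccountNumber": "9",
    "AccountOpenDate": date(2030, 1, 1),
    "ApplicationScore": 0,
}
for exception in check_row(row, today=date(2024, 1, 1)):
    print(exception)
```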
39. Defining the Business Impact is Important
REQUESTED_FLD – Medium
› The REQUESTED_FLD column is for past, current and future requests for grant money. The length frequencies reveal some very large requests: a 12 digit request for 2014 and five records with an 11 digit request.
› Further investigation is required to determine whether these are valid values. Due to the large requests, it appears summarised data may be incorrectly included in the dashboard, which would be performing its own aggregation and totalling.
RDO_REF – High
› RDO_REF has three different versions of an empty field. It has 145 values set to “#N/A”, 39 set to “NA” and 676 set to <null>.
› It is not desirable to have three different versions of “non applicable” turning up in dashboard reporting, so either the source needs to be cleaned up to be consistent or an ETL data load rule is needed to convert all three to the same value of “N/A” – “Non Applicable”.
RDO_REF – Medium
› There are two main patterns of data for values in the RDO_REF column, and this usually indicates different rules at different times. There are 6557 values in the format ANNNNNNN, such as R0015838, and 1178 values in the format NNNN, such as 1279.
› This mixture of alphanumeric codes and numeric codes may not belong together in dashboard reporting.
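The two RDO_REF findings can be illustrated with a small Python sketch: one function for the suggested load rule that converts the three empty variants to “N/A”, and one that derives the ANNNNNNN and NNNN style patterns a profiling tool reports. The function names are hypothetical, and this is a sketch of the idea, not the ETL rule used on the project.

```python
import re

# The two non-null spellings of "empty" reported for RDO_REF.
EMPTY_MARKERS = {"#N/A", "NA"}


def normalise_empty(value):
    """Map None, '#N/A' and 'NA' to the single value 'N/A'."""
    if value is None or value.strip() in EMPTY_MARKERS:
        return "N/A"
    return value


def pattern_of(value):
    """Profile-style pattern: letters become 'A', digits become 'N'."""
    letters_done = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"[0-9]", "N", letters_done)


print(normalise_empty("#N/A"), normalise_empty(None))
print(pattern_of("R0015838"), pattern_of("1279"))
```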
40. Attaching a cost to a DQ Rule
BirthDate is null or zero
BirthDate age is out of bounds
If this rule is
important then what
is the business
impact of it failing?
Why should
managers and
stewards care?
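One way to answer that question is to count the failures of each rule and multiply by an agreed cost per failed record. The Python sketch below assumes a maximum plausible age of 120 years and treats the cost per failure as an input supplied by the business; both are hypothetical, as are the function names.

```python
from datetime import date


def birthdate_failures(birthdates, today, max_age_years=120):
    """Count records failing the two BirthDate rules on the slide.

    None or 0 counts as 'null or zero'; max_age_years is an assumed bound.
    """
    null_or_zero = 0
    out_of_bounds = 0
    for value in birthdates:
        if value is None or value == 0:
            null_or_zero += 1
            continue
        had_birthday = (today.month, today.day) >= (value.month, value.day)
        age = today.year - value.year - (0 if had_birthday else 1)
        if age < 0 or age > max_age_years:
            out_of_bounds += 1
    return {"null_or_zero": null_or_zero, "out_of_bounds": out_of_bounds}


def estimated_impact(failures, cost_per_failure):
    """Total cost given a business-supplied cost per failed record."""
    return sum(failures.values()) * cost_per_failure


sample = [None, 0, date(1970, 6, 12), date(1850, 1, 1), date(2030, 1, 1)]
failures = birthdate_failures(sample, today=date(2024, 6, 1))
print(failures, estimated_impact(failures, cost_per_failure=5))
```

The cost figure has to come from managers and stewards, which is the point of the slide: the rule itself is easy, the business impact is the part that needs an owner.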
42. Putting Data Quality into business terms
Defining the Impact
Vendor item code data was provided
in all data files. Results showed a
minimum match of 28.6% and maximum
match of 100%. Net content and unit
of measure data was provided in all
files. Matching varied from 0% to 99.6%
for the two fields.
Varying vendor item code formats
and special characters such as dots
and dashes are found to be used
frequently but are often not
supported by healthcare IT systems
nor used in supplier systems.
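The formatting problem described above (dots, dashes and other special characters in vendor item codes) is one reason match rates vary so widely. A minimal Python sketch of normalising codes before comparing them follows. The function names and the sample codes are made up for illustration and are not from the study quoted on the slide.

```python
import re


def normalise_item_code(code):
    """Upper-case the code and drop everything except letters and digits."""
    return re.sub(r"[^A-Za-z0-9]", "", code).upper()


def match_rate(left_codes, right_codes):
    """Share of left codes found in the right list after normalisation."""
    if not left_codes:
        return 0.0
    right = {normalise_item_code(c) for c in right_codes}
    hits = sum(1 for c in left_codes if normalise_item_code(c) in right)
    return hits / len(left_codes)


# Hypothetical codes: the first pair differs only by punctuation and case.
print(match_rate(["AB-123", "XY.9"], ["ab123", "ZZ1"]))
```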
44. Stewardship Business Process Example
Detect DQ Exception → Steward Opens Exception → Steward Repairs Data → Data Quality Change Request submitted → Data Quality Change Approved → Support fix data quality problem in source
The Stewardship Center is where a team of stewards log in and review the data
that failed data quality checks. It manages a team of stewards, subject matter
experts and support staff so they can investigate and fix problems.
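The process on this slide can be sketched as an ordered list of steps. The Python below assumes the flow is strictly linear, which the slide does not state, and the class and function names are hypothetical; it is not the Stewardship Center's own workflow model.

```python
from enum import Enum


class Step(Enum):
    DETECTED = "Detect DQ Exception"
    OPENED = "Steward Opens Exception"
    REPAIRED = "Steward Repairs Data"
    CHANGE_REQUESTED = "Data Quality Change Request submitted"
    CHANGE_APPROVED = "Data Quality Change Approved"
    FIXED_IN_SOURCE = "Support fix data quality problem in source"


# Enum members iterate in definition order, which is the order on the slide.
ORDER = list(Step)


def next_step(current):
    """Return the step after `current`, or None at the end of the flow."""
    position = ORDER.index(current)
    return ORDER[position + 1] if position + 1 < len(ORDER) else None


step = Step.DETECTED
while step is not None:
    print(step.value)
    step = next_step(step)
```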
45. Manage stewards: View and collaborate on MDM and DQF data
quality problems in the Stewardship Center
46. A steward can
accept or reject a
data change
A fix can be
applied
automatically or
manually
Data work flow: Set up custom stewardship workflows
47. Let Stewards Multi Task
DW Load Exceptions
MDM Duplicate Candidates
Reference Data Checks
48. Data Quality Success Factors
› Focus on data quality issues with a real impact.
› Make it easy to collect data quality metrics.
› Make it easy to be a steward across different facets of data quality.
› Put in a combination of people, processes and tools that lets you tackle data
quality in a consistent way.
› Make your stewards more useful.
› Make your non-stewards better stewards.
Editor's Notes
A discovery tool at the table and column level shows missing values and out of range values.