2. Do You Know What's in the
Data You're Consuming?
Ken O’Connor – Kenoconnordata.com
6th Nov 2012
Copyright Kenoconnordata.com 2012
3. As food consumers, we are provided with facts
about the food we’re consuming – it’s the law
Ingredients – the basic facts
Allergy Information –
Can mean life or death to some
Nutrition Information
Enables us to make “informed
choices” about the food we buy
We don’t all use the food facts given to us –
Those who choose/need to control their
diet are in a position to do so
4. We know where food such as beef comes from…
Traceability –
Hugely important to restore
confidence in beef following the
Mad Cow disease (BSE) crisis
5. We know that our food has not been tampered with, since it
left its “trusted source”
Tamperproof lids and seals –
Introduced after the Tylenol
poisonings, which killed 7 people in
Chicago in 1982
Best Before / Use by date
6. What do you know about the data you depend on?
• Data consumers are seldom provided with facts about
the data feeding their critical business processes
• Most data consumers assume the data input to their
business processes is “right”, or “OK”.
• They often assume it is the job of the IT function to
ensure the data is “right”.
• Almost all data consumers are also data producers –
unaware of their role in the data supply chain
7. In order to trust data, and to confidently base
business decisions on it, I believe…
As data consumers, you and I have the right to expect facts
about the data provided to us. We should:
• Know what’s in the data we’re consuming
• Know where it comes from
• Know the quality controls applied to it
8. What basic facts do you need to know about the data you
consume?
Let’s look at a profile of “Customer Date of birth” as an example…
Do you spot anything unusual about these dates of birth?
Data Content Facts
Total number of customer records: 2,500,000
Data field name: Date of Birth
Age ranges - based on date of birth
Age Range            Count        Percentage
0-19                 200,000      8.00%
20-59                1,800,000    72.00%
60-99                310,000      12.40%
100-119              44,000       1.76%
120-169              20,500       0.82%
170+                 500          0.02%
No Date of birth     125,000      5.00%
Total                2,500,000    100.00%
Could Marketing use this data to target 20 to 59 year olds?
Could this data be used to calculate pension annuities?
Data that may be fit for one purpose may not be fit for a different purpose
Armed with basic facts – the data consumer can make an informed choice
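A profile like the one above can be produced with very little code. The following is a minimal sketch, not the tooling behind the slide: the function names and the sample dates are hypothetical, and the "Future date of birth" band is an added assumption to catch dates later than the profiling date.

```python
from collections import Counter
from datetime import date

# Age bands taken from the slide; both bounds are inclusive.
AGE_BANDS = [(0, 19), (20, 59), (60, 99), (100, 119), (120, 169)]

def age_on(dob, as_of):
    """Whole years between a date of birth and the profiling date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)

def band_for(dob, as_of):
    """Map one date of birth (or None) to an age band label."""
    if dob is None:
        return "No Date of birth"
    age = age_on(dob, as_of)
    if age < 0:
        return "Future date of birth"  # assumption: not a band on the slide
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "170+"

def profile_dates_of_birth(dobs, as_of):
    """Return {band: (count, percentage)} for a list of dates of birth."""
    total = len(dobs)
    if total == 0:
        return {}
    counts = Counter(band_for(dob, as_of) for dob in dobs)
    return {band: (n, round(100 * n / total, 2)) for band, n in counts.items()}

# Hypothetical sample records, profiled as of the presentation date.
sample = [date(2000, 1, 1), None, date(1800, 1, 1), date(1950, 6, 30)]
result = profile_dates_of_birth(sample, date(2012, 11, 6))
```

Publishing the resulting counts alongside the data is what lets each consumer decide whether it suits their own purpose.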
9. Data profiling helps – but does it provide the facts we need?
Data Content Facts
Total number of customer records: 2,500,000
Data field name: Date of Birth
Age ranges - based on date of birth
Age Range Count Percentage
0-19 200,000 8.00%
20-59 1,800,000 72.00%
60-99 310,000 12.40%
100-119 44,000 1.76%
120-169 20,500 0.82%
170+ 500 0.02%
No Date of birth 125,000 5.00%
Total 2,500,000 100.00%
1. Accuracy? No – we cannot tell if the dates of birth are accurate
2. Completeness? Yes – 95% complete
3. Validity? Perhaps valid dates – but could a customer be 170+?
4. Timeliness? No – No indication of the currency of the data
5. Consistency? No – No indication
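The completeness figure, and a rough indicator for validity, follow directly from the counts in the profile. The sketch below works the arithmetic; treating ages of 120 or more as implausible is an assumption made here for illustration, not a rule stated on the slide.

```python
# Counts copied from the profile on the slide.
profile = {
    "0-19": 200_000,
    "20-59": 1_800_000,
    "60-99": 310_000,
    "100-119": 44_000,
    "120-169": 20_500,
    "170+": 500,
    "No Date of birth": 125_000,
}

# Assumption (not from the slide): ages of 120 or more are implausible.
IMPLAUSIBLE_BANDS = ("120-169", "170+")

total = sum(profile.values())
populated = total - profile["No Date of birth"]

# Completeness: share of records that have any date of birth at all.
completeness_pct = 100 * populated / total

# Plausibility: share of records whose date of birth implies an unlikely age.
implausible = sum(profile[band] for band in IMPLAUSIBLE_BANDS)
implausible_pct = 100 * implausible / total

print(f"Completeness: {completeness_pct:.2f}%")
print(f"Implausible ages: {implausible:,} records ({implausible_pct:.2f}%)")
```

Accuracy, timeliness and consistency cannot be derived this way; they need reference data, load dates or a second source to compare against.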
10. Data content facts add a “smell” to data defects
One thing worse than a square peg
not fitting in a round hole… a square
peg that does fit in a round hole…
It’s not “fit for purpose” – it’s a data defect
Data defects are not like software bugs –
they seldom cause a system to fail.
Data defects are more like natural gas:
• Colourless
• Odourless
• Potentially deadly
11. Where does your data come from?
Nicola Askham wrote an excellent blog post recently about
“The data faeries” – does this sound familiar to you?
• Team A: Our data is loaded up by IT
• IT: No we don't touch that data, it's a manual data load
by Team B
• Team B: We just send the spreadsheet to Team A -
we're sure that they load the data
• Team A: No we really don't load up that data…
• Most people don’t know where their data comes from
• They assume it is always there, and is “OK”
• Too few are aware of their role in the data supply chain
12. The FSA expects you to know where your data
comes from… “data provenance”
http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf
13. The FSA expects you to understand how your data
is transformed…
But don’t sweat the small stuff – the FSA advice is to focus on your most
critical data
http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf
14. Where does your data come from? Data Provenance /
Traceability / Lineage – The “bucket brigade” model
Imagine if someone in the “bucket brigade” chain
• Thought the water was for him and drank it
• Used the water on his garden
• Turned off the tap
• Started the fire deliberately…
The chain is useless if the bucket is empty when it reaches the fire
15. Turn your data supply chain into a “bucket brigade”
Everyone must understand:
• Why the data is ultimately required
• The importance of their role & their dependence on others
• Where they get their data from and who they provide it to
• What the data should contain and what it does contain
• If the data is not right – they should raise a data defect!
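Comparing what the data should contain with what it does contain can be automated at each hand-off in the chain. This is a minimal sketch under stated assumptions: the function name, the field names and the idea of an agreed record count are hypothetical, and a real check would reflect the business rules agreed between supplier and consumer.

```python
def find_defects(rows, required_fields, expected_count):
    """Compare what a delivery should contain with what it does contain.

    rows            -- list of dicts, one per record received
    required_fields -- fields every record is expected to populate
    expected_count  -- record count the supplier said it was sending
    Returns a list of human-readable defect descriptions (empty if none).
    """
    defects = []
    if len(rows) != expected_count:
        defects.append(
            f"expected {expected_count} records, received {len(rows)}"
        )
    for index, row in enumerate(rows):
        for name in required_fields:
            if row.get(name) in (None, ""):
                defects.append(f"record {index}: missing {name}")
    return defects

# Hypothetical delivery: one record short, one date of birth left blank.
delivery = [
    {"customer_id": 1, "date_of_birth": "1980-01-01"},
    {"customer_id": 2, "date_of_birth": ""},
]
defects = find_defects(delivery, ["customer_id", "date_of_birth"], 3)
for defect in defects:
    print("DATA DEFECT:", defect)
```

The point of the sketch is the habit, not the code: each link in the chain checks the bucket before passing it on, and reports a defect instead of quietly forwarding it.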
16. Where to start… Learn from the Chilean mine rescue…
Trace a single critical data element end to end through your
data supply chain – this will highlight challenges to overcome
• How do we assign data ownership?
• How do we agree data definitions?
• How do we specify business rules?
• How do we measure data quality?
• How do we govern the above?
17. You know what’s in the data and where it comes
from… now what do you do with it?
Apply appropriate controls to your spreadsheets
http://www.clusterseven.com/external-research/2010/7/20/spreadsheets-and-solvency-ii-financial-services-authority-uk.html
18. All industries have critical data…
• Health
• Pharmaceutical
• Banking
• Insurance
• Aviation
• …
Data consumers in all industries need to know:
• What’s in the data they’re consuming
• Where it comes from
• What quality controls have been applied to it
19. The food industry reacted to crises…
• Tylenol poisonings
• Mad Cow disease (BSE) crisis
Regulators are reacting to the 2008 financial crisis…
They increasingly expect evidence that:
- You can trust your data
- They can trust your data
Solvency II, Basel III, Dodd-Frank, UCITS, MiFID II, CRD IV...
A perfect storm - a Frankenstorm of regulation…
- all expecting evidence of data provenance
- all expecting evidence of a data quality (DQ) management process
20. JFK quoted George Bernard Shaw…
Other people, he said, "see things and . . . say 'Why?' . . .
But I dream things that never were-- and I say: 'Why not?'"
I dream of a time…
- When all critical data is accompanied by facts about
that data (Data Quality Information / provenance).
- When we will look back on the days when data
consumers had few facts about the data they were
consuming – and regulators tolerated it.
and I say: 'Why not now?'
Visit www.clearinformation.org - a good place to start
John F Kennedy – Address before the Irish Parliament, June 28th 1963
http://www.jfklibrary.org/Research/Ready-Reference/JFK-Speeches/Address-Before-the-Irish-Parliament-June-28-1963.aspx
21. Your new approach to data…
When you return to your office, I would like you to start
asserting your rights
• Ask for facts about the data provided to you
• Provide facts about the data you provide to others
The norm must become “Here is the data, and here are the
facts (Data Quality Information / provenance) about the
data”
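One way to picture "the data, and the facts about the data" travelling together is a small record that accompanies each delivery. The sketch below is illustrative only: the class name, its fields, the source description, the as-of date and the listed control are hypothetical, while the record counts reuse the Date of Birth profile from earlier in the deck.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DataFacts:
    """Facts published alongside a data field (hypothetical structure)."""
    field_name: str
    source: str          # provenance: where the data comes from
    as_of: date          # timeliness: how current the data is
    record_count: int
    missing_count: int
    controls: list = field(default_factory=list)  # quality controls applied

    @property
    def completeness_pct(self):
        """Percentage of records in which the field is populated."""
        if self.record_count == 0:
            return 0.0
        populated = self.record_count - self.missing_count
        return 100 * populated / self.record_count

# Counts from the Date of Birth profile; all other values are made up.
facts = DataFacts(
    field_name="Date of Birth",
    source="hypothetical customer master extract",
    as_of=date(2012, 11, 6),
    record_count=2_500_000,
    missing_count=125_000,
    controls=["date format validated on entry"],
)
print(f"{facts.field_name}: {facts.completeness_pct:.2f}% complete, "
      f"source: {facts.source}, as of {facts.as_of.isoformat()}")
```

A consumer who receives such a record can ask the two questions from the profile slide, whether the data suits marketing or annuity calculations, before using it.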
22. Ken O’Connor
Email: Ken@Kenoconnordata.com
Twitter: Kenoconnordata
LinkedIn: ie.linkedin.com/in/kenoconnor00
Ken O'Connor is an independent data consultant with over 30 years of
hands-on experience in the field. Ken specialises in helping organisations
meet the data quality management challenges presented by data-intensive
programmes such as data conversions, data migrations, data population and
regulatory programmes such as Solvency II, Basel II / III, Single Customer
View and Anti-Money Laundering. Ken provides practical data quality and
data governance advice at his popular blog: http://kenoconnordata.com