Copyright Kenoconnordata.com 2012
Do You Know What's in the Data You're Consuming?

Ken O’Connor – Kenoconnordata.com
6th Nov 2012




As food consumers, we are provided with facts about the food we’re consuming – it’s the law:
• Ingredients – the basic facts
• Allergy information – can mean life or death to some
• Nutrition information – enables us to make “informed choices” about the food we buy

We don’t all use the food facts given to us, but those who choose (or need) to control their diet are in a position to do so.


We know where food such as beef comes from…

Traceability – hugely important in restoring confidence in beef after the Mad Cow disease (BSE) crisis




We know that our food has not been tampered with since it left its “trusted source”:
• Tamperproof lids and seals – introduced after the Tylenol poisonings killed 7 people in Chicago in 1982
• Best Before / Use By dates




What do you know about the data you depend on?

• Data consumers are seldom provided with facts about
  the data feeding their critical business processes

• Most data consumers assume the data feeding their
  business processes is “right” or “OK”.

• They often assume it is the job of the IT function to
  ensure the data is “right”.

• Almost all data consumers are also data producers, yet many
  are unaware of their role in the data supply chain.



To trust data, and to base business decisions on it with
confidence, I believe…

As data consumers, you and I have the right to expect facts
  about the data provided to us. We should:

• Know what’s in the data we’re consuming

• Know where it comes from

• Know the quality controls applied to it




What basic facts do you need to know about the data you consume?
Let’s look at a profile of “Customer Date of Birth” as an example…

Do you spot anything unusual about these dates of birth?
           Data Content Facts
Total number of customer records:               2,500,000
Data field name:                              Date of Birth
Age ranges - based on date of birth
Age Range                             Count       Percentage
0-19                                200,000            8.00%
20-59                             1,800,000           72.00%
60-99                               310,000           12.40%
100-119                              44,000            1.76%
120-169                              20,500            0.82%
170+                                    500            0.02%
No Date of birth                    125,000            5.00%
Total                             2,500,000         100.00%

• Could Marketing use this data to target 20- to 59-year-olds?
• Could this data be used to calculate pension annuities?

Data that is fit for one purpose may not be fit for a different purpose.
Armed with basic facts, the data consumer can make an informed choice.
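A profile like the one above is simple to produce. The Python sketch below assumes the dates of birth arrive as `datetime.date` values (`None` where missing) and that age is measured at a chosen reference date; the function names, the band table and the extra bucket for future-dated births are illustrative assumptions, not the author's tooling.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Optional

# Age bands from the "Data Content Facts" profile: (low, high, label).
# high=None means the band is open-ended.
AGE_BANDS = [
    (0, 19, "0-19"),
    (20, 59, "20-59"),
    (60, 99, "60-99"),
    (100, 119, "100-119"),
    (120, 169, "120-169"),
    (170, None, "170+"),
]


def age_on(dob: date, as_of: date) -> int:
    """Age in whole years at the reference date as_of."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def band_for(age: int) -> str:
    """Map an age to its band label; negative ages (future births) get their own bucket."""
    for low, high, label in AGE_BANDS:
        if age >= low and (high is None or age <= high):
            return label
    return "Future date of birth"  # assumption: not a band in the original profile


def profile_age_ranges(dobs: Iterable[Optional[date]], as_of: date) -> dict:
    """Return {band label: (count, percentage of all records)}."""
    counts: Counter = Counter()
    for dob in dobs:
        label = "No Date of birth" if dob is None else band_for(age_on(dob, as_of))
        counts[label] += 1
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


# Example: three hypothetical records profiled as at the date of the talk.
sample = [date(1980, 5, 1), None, date(1840, 1, 1)]
for label, (count, pct) in profile_age_ranges(sample, date(2012, 11, 6)).items():
    print(f"{label:<18}{count:>3}{pct:>8.2f}%")
```

Note that the profile only reports what the field contains; deciding whether a 170-year-old customer is acceptable is left to the consumer, which is exactly why the facts need to travel with the data.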
Data profiling helps – but does it provide the facts we need?
           Data Content Facts
Total number of customer records:               2,500,000
Data field name:                              Date of Birth
Age ranges - based on date of birth
Age Range                             Count       Percentage
0-19                                200,000            8.00%
20-59                             1,800,000           72.00%
60-99                               310,000           12.40%
100-119                              44,000            1.76%
120-169                              20,500            0.82%
170+                                    500            0.02%
No Date of birth                    125,000            5.00%
Total                             2,500,000         100.00%

1. Accuracy?                No – we cannot tell whether the dates of birth are accurate
2. Completeness?            Yes – 95% complete
3. Validity?                Perhaps – they may be valid dates, but could a customer really be 170+?
4. Timeliness?              No – no indication of the currency of the data
5. Consistency?             No – no indication of whether the data is consistent
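Of the five questions above, only completeness (and a plausibility check that approximates validity) can be answered from the profile counts alone; accuracy, timeliness and consistency need information the profile does not carry. A minimal Python sketch using the counts from the table, assuming a hypothetical business rule that treats an age of 120 or more as implausible:

```python
# Counts copied from the "Data Content Facts" profile above.
PROFILE = {
    "0-19": 200_000,
    "20-59": 1_800_000,
    "60-99": 310_000,
    "100-119": 44_000,
    "120-169": 20_500,
    "170+": 500,
    "No Date of birth": 125_000,
}

MISSING = "No Date of birth"
# Hypothetical business rule: a customer aged 120 or over is implausible.
IMPLAUSIBLE_BANDS = ("120-169", "170+")


def completeness(profile: dict) -> float:
    """Share of all records that have a date of birth."""
    total = sum(profile.values())
    return (total - profile[MISSING]) / total


def implausible_share(profile: dict) -> float:
    """Share of populated records that fail the plausibility rule."""
    populated = sum(profile.values()) - profile[MISSING]
    return sum(profile[band] for band in IMPLAUSIBLE_BANDS) / populated


print(f"Completeness: {completeness(PROFILE):.1%}")  # Completeness: 95.0%
print(f"Implausible ages: {implausible_share(PROFILE):.2%}")  # Implausible ages: 0.88%
```

A high completeness score says nothing about accuracy: the 21,000 records aged 120+ all count as “complete”.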
Data content facts add a “smell” to data defects

One thing worse than a square peg not fitting in a round hole… is a
square peg that does fit in a round hole.
It’s not “fit for purpose” – it’s a data defect.

Data defects are not like software bugs –
they seldom cause a system to fail.
Data defects are more like natural gas:
• Colourless
• Odourless
• Potentially deadly
Where does your data come from?
Nicola Askham wrote an excellent blog post recently about
  “The data faeries” – does this sound familiar to you?

• Team A: Our data is loaded up by IT
• IT: No we don't touch that data, it's a manual data load
  by Team B
• Team B: We just send the spreadsheet to Team A -
  we're sure that they load the data
• Team A: No we really don't load up that data…

• Most people don’t know where their data comes from
• They assume it is always there, and is “OK”
• Too few are aware of their role in the data supply chain
The FSA expects you to know where your data
    comes from… “data provenance”




http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf

The FSA expects you to understand how your data
  is transformed…




  But don’t sweat the small stuff – the FSA’s advice is to focus on your most
  critical data.
http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf

Where does your data come from? Data Provenance /
Traceability / Lineage – The “bucket brigade” model




 Imagine if someone in the “bucket brigade” chain:
 • Thought the water was for him and drank it
 • Used the water on his garden
 • Turned off the tap
 • Started the fire deliberately…
 The brigade is useless if the bucket is empty when it reaches the fire.
Turn your data supply chain into a “bucket brigade”




Everyone must understand:
• Why the data is ultimately required
• The importance of their role & their dependence on others
• Where they get their data from and who they provide it to
• What the data should contain and what it does contain
• If the data is not right, they should raise a data defect!
Where to start… learn from the Chilean mine rescue…
Trace a single critical data element end to end through your
data supply chain – this will highlight the challenges to overcome:

• How do we assign data ownership?
• How do we agree on data definitions?
• How do we specify business rules?
• How do we measure data quality?
• How do we govern the above?
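Tracing one data element end to end can start as a simple list of hops from source to consumer. The sketch below is hypothetical: the `Hop` structure, the system names and the gap checks are illustrative, and it assumes each hop has a single upstream source (a true bucket-brigade chain). Walking back from the consumer and flagging hops with no owner or no documented business rule surfaces the ownership and business-rule questions listed above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hop:
    """One link in the supply chain of a single critical data element."""
    source: str
    target: str
    owner: Optional[str]          # accountable person/team; None = nobody known
    business_rule: Optional[str]  # documented transformation or check; None = undocumented


def trace_back(hops: List[Hop], consumer: str) -> List[Hop]:
    """Walk upstream from the consumer to the origin; return hops in source-to-consumer order."""
    by_target = {hop.target: hop for hop in hops}  # assumes one upstream source per target
    path: List[Hop] = []
    seen = set()
    node = consumer
    while node in by_target and node not in seen:  # `seen` guards against accidental loops
        seen.add(node)
        hop = by_target[node]
        path.append(hop)
        node = hop.source
    return list(reversed(path))


def gaps(path: List[Hop]) -> List[str]:
    """List the ownership and business-rule gaps along the traced path."""
    issues = []
    for hop in path:
        if hop.owner is None:
            issues.append(f"{hop.source} -> {hop.target}: no data owner")
        if hop.business_rule is None:
            issues.append(f"{hop.source} -> {hop.target}: no documented business rule")
    return issues


# Hypothetical chain for "Customer Date of Birth".
chain = [
    Hop("Branch spreadsheet", "Customer DB", owner="Team B", business_rule=None),
    Hop("Customer DB", "Data warehouse", owner="IT", business_rule="copied as-is"),
    Hop("Data warehouse", "Pensions report", owner=None, business_rule="age = report date - DOB"),
]
path = trace_back(chain, "Pensions report")
print(" -> ".join([path[0].source] + [hop.target for hop in path]))
for issue in gaps(path):
    print(issue)
```

In practice the hard part is populating `chain` at all; the “data faeries” conversation earlier shows how often nobody can say who loads what.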




You know what’s in the data and where it comes
from… now what do you do with it?

Apply appropriate controls to your spreadsheets



http://www.clusterseven.com/external-research/2010/7/20/spreadsheets-and-solvency-ii-financial-services-authority-uk.html
All industries have critical data…
•   Health
•   Pharmaceutical
•   Banking
•   Insurance
•   Aviation
•   …

Data consumers in all industries need to know:
• What’s in the data they’re consuming
• Where it comes from
• What quality controls have been applied to it


The food industry reacted to crises…
• Tylenol poisonings
• Mad Cow disease (BSE) crisis

Regulators are reacting to the 2008 financial crisis…
They increasingly expect evidence that:
   - You can trust your data
   - They can trust your data
Solvency II, Basel III, Dodd-Frank, UCITS, MiFID II, CRD IV...
A perfect storm – a “Frankenstorm” of regulation…
- all expecting evidence of data provenance
- all expecting evidence of a data quality (DQ) management process


JFK quoted George Bernard Shaw…
   “Other people, he said "see things and . . . say 'Why?' . . .
   But I dream things that never were-- and I say: 'Why not?'"

 I dream of a time…
    - When all critical data is accompanied by facts about
    that data (Data Quality Information / provenance).
    - When we will look back on the days when data
    consumers had few facts about the data they were
    consuming – and regulators tolerated it.
 and I say: 'Why not now?'
 Visit www.clearinformation.org - a good place to start

John F. Kennedy – Address before the Irish Parliament, June 28th 1963

http://www.jfklibrary.org/Research/Ready-Reference/JFK-Speeches/Address-Before-the-Irish-Parliament-June-28-1963.aspx
Your new approach to data…



When you return to your office, I would like you to start
  asserting your rights
• Ask for facts about the data provided to you
• Provide facts about the data you provide to others

The norm must become “Here is the data, and here are the
  facts (Data Quality Information / provenance) about the
  data”
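One way to make “here are the facts” concrete is to ship a small facts label with every extract, much like a food label. The `DataFacts` record below is a hypothetical sketch, not an established standard; its fields simply mirror the three things a data consumer should know (what's in the data, where it comes from, and the quality controls applied). The example values reuse the profile counts from earlier; the provenance chain, extract date and control name are invented for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class DataFacts:
    """Hypothetical 'facts label' that travels with a data extract."""
    data_element: str
    record_count: int
    populated_count: int
    extracted_on: date            # lets the consumer judge timeliness
    provenance: List[str]         # systems the data passed through, origin first
    quality_controls: List[str] = field(default_factory=list)

    @property
    def completeness(self) -> float:
        return self.populated_count / self.record_count if self.record_count else 0.0

    def render(self) -> str:
        controls = "; ".join(self.quality_controls) or "none recorded"
        return "\n".join([
            f"Data element:     {self.data_element}",
            f"Records:          {self.record_count:,}",
            f"Completeness:     {self.completeness:.1%}",
            f"Extracted on:     {self.extracted_on.isoformat()}",
            f"Provenance:       {' -> '.join(self.provenance)}",
            f"Quality controls: {controls}",
        ])


facts = DataFacts(
    data_element="Customer Date of Birth",
    record_count=2_500_000,
    populated_count=2_375_000,
    extracted_on=date(2012, 11, 6),                                      # illustrative
    provenance=["Branch spreadsheet", "Customer DB", "Data warehouse"],  # illustrative
    quality_controls=["valid calendar date check"],                      # illustrative
)
print(facts.render())
```

An empty `quality_controls` list renders as “none recorded”, which is itself a fact the consumer deserves to see.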



Ken O’Connor


Email: Ken@Kenoconnordata.com
Twitter: Kenoconnordata
Linkedin: ie.linkedin.com/in/kenoconnor00

Ken O'Connor is an independent data consultant with over 30 years of
hands-on experience in the field. Ken specialises in helping organisations
meet the data quality management challenges presented by data-intensive
programmes such as data conversions, data migrations, data population and
regulatory programmes such as Solvency II, Basel II / III, Single Customer
View and Anti-Money Laundering. Ken provides practical data quality and
data governance advice at his popular blog: http://kenoconnordata.com


