Data Quality
Perception
data brewery
Dallas Data Brewery, June 2013
Topic
■ What is "high quality data"?
■ What are data quality expectations?
that you, or people and businesses you know, have
■ Business issues and data quality
How do you deal with it?
■ What happens when you ignore it?
What is data quality?
Dimensions
■ completeness – data provided
■ accuracy – reflecting real world
■ credibility – regarded as true
■ timeliness – up-to-date
■ consistency – matching facts across datasets
■ integrity – valid references between datasets
... and there are more
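As an illustration of the integrity dimension, here is a minimal sketch that measures how many references in one dataset point to a record that exists in another. The function name `integrity`, the supplier IDs and the registry are hypothetical and are not taken from the talk.

```python
from typing import Hashable, Iterable


def integrity(references: Iterable[Hashable], known_keys: Iterable[Hashable]) -> float:
    """Share of references that resolve to a key in the other dataset."""
    references = list(references)
    if not references:
        # Nothing to check, so nothing is broken (an assumption of this sketch).
        return 1.0
    known = set(known_keys)
    valid = sum(1 for ref in references if ref in known)
    return valid / len(references)


# Hypothetical data: supplier IDs used in contracts vs. a supplier registry.
contract_suppliers = ["S1", "S2", "S9", "S1"]
supplier_registry = {"S1", "S2", "S3"}

ratio = integrity(contract_suppliers, supplier_registry)
print(f"integrity: {ratio:.0%}")
```

The other dimensions could be measured with similar ratios, each needing its own definition of what counts as a valid value.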
Fallacies
■ “good data are error-free and valid”
■ “improving quality means cleansing”
■ “it is an IT problem”
■ “it can be fixed”
Short Story:
Completeness
Open Public Procurements
from this...
... to this:
http://tendre.sme.sk
[Chart: completeness over time, monthly from 2005-3 to 2010-9; vertical axis from 0% ("none") to 100% ("have it all"); higher is better]
Quality measure
completeness: 55%
what percentage of the field is filled
and successfully processed?
type 1 + type 2
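The completeness measure described above can be sketched as a simple ratio. This is a minimal illustration, not the method used for the procurement data: the function name `completeness`, the sample values and the choice of `float` as the processing step are all assumptions.

```python
from typing import Any, Callable, Iterable


def completeness(values: Iterable[Any], parse: Callable[[Any], Any]) -> float:
    """Share of values that are filled AND successfully processed by `parse`."""
    values = list(values)
    if not values:
        return 0.0
    ok = 0
    for value in values:
        # An empty or missing value counts as "not filled".
        if value is None or str(value).strip() == "":
            continue
        # A filled value that cannot be processed does not count either.
        try:
            parse(value)
        except (ValueError, TypeError):
            continue
        ok += 1
    return ok / len(values)


# Hypothetical contract prices as they might arrive from a scraped source.
prices = ["1200.50", "", None, "n/a", "300", "45 000", "17.5", "980"]

ratio = completeness(prices, float)
print(f"completeness: {ratio:.0%}")
```

Counting only values that survive processing is what separates this measure from a plain "not empty" check: a field can be filled for every record and still be unusable.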
[Chart: completeness over time, monthly from 2005-3 to 2010-9; vertical axis from 0% ("none") to 100% ("have it all"); higher is better]
Quality measure
completeness: 88%
What does that mean:
“high quality data?”
?
85% ?
Conclusion
appropriate for a given
purpose
Data Project
■ define data quality requirements
■ measure during development
■ provide data quality report
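The three steps above can be sketched as a comparison of measured values against stated requirements. The thresholds, the measured numbers and the function name `quality_report` are illustrative assumptions, not figures from a real project.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, float, Optional[float], str]


def quality_report(requirements: Dict[str, float],
                   measured: Dict[str, float]) -> List[Row]:
    """Compare each required dimension with its measured value."""
    rows: List[Row] = []
    for dimension, required in requirements.items():
        actual = measured.get(dimension)
        if actual is None:
            status = "not measured"
        elif actual >= required:
            status = "ok"
        else:
            status = "below requirement"
        rows.append((dimension, required, actual, status))
    return rows


# Hypothetical requirements, defined for the purpose the data serves.
requirements = {"completeness": 0.85, "integrity": 0.99, "timeliness": 0.90}
# Hypothetical values measured during development.
measured = {"completeness": 0.88, "integrity": 0.97}

for dimension, required, actual, status in quality_report(requirements, measured):
    shown = "n/a" if actual is None else f"{actual:.0%}"
    print(f"{dimension:<13} required {required:.0%}  measured {shown:<4}  {status}")
```

Because the requirements come first, the same measured value can pass for one purpose and fail for another.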
More topics
■ Data quality measurement
indicators, probes
■ Data quality management
roles, processes, impact
■ Data cleansing
Thank You
stefan@freshdata.sk ■ Stiivi
www.meetup.com/dallas-data-brewery

Dallas Data Brewery Meetup #2: Data Quality Perception