A STEP TOWARDS A DATA QUALITY THEORY
The Third International Workshop on Data Science Engineering and its Applications (DSEA 2019)
In conjunction with
The Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS-2019)
Granada, Spain. October 22-25, 2019.
Janis Bicevskis, Anastasija Nikiforova, Zane Bicevska, Ivo Oditis, Girts Karnitis
Faculty of Computing, University of Latvia
Anastasija.Nikiforova@lu.lv
 Def. I: «Quality» is a desirable goal to be achieved
through management of the production process.
 Def. II: «Data quality» is a relative concept, largely
dependent on specific requirements resulting from the
data use.
late 60’s
data quality issues were first studied by statisticians, who proposed a mainly
mathematical theory for handling duplicates in statistical data sets
late 80’s
data quality attracted the attention of management researchers
early 90’s
computer scientists began their own research, focusing on data stored in
databases and data warehouses and examining how to define, measure and
improve the quality of different types of data, relating the concept
of “data quality” to “data quality dimensions”
nowadays
almost 30 years later, with data everywhere and their volume growing
significantly, the issue is still relevant and, unfortunately, has
not yet been solved
2017
Organizations believe poor data quality to be responsible
for an average of $15 million per year in losses (Gartner)
Data quality weaknesses can lead to huge losses
The aggregate economic impact from applications based
on open data across the EU27 economy is estimated to
be €140 billion annually.
2016
Decisions resulting from bad data cost the US economy
$3.1 trillion per year (IBM)
A BRIEF INSIGHT INTO THE HISTORY
[open] data are usually used by a wide audience that
may not have deep knowledge of IT or data quality,
so a solution should be simple enough
to let individual users take part in
the analysis of «third-party» data
for their own purposes
DATA QUALITY
Solution: previously proposed user-oriented data object-driven
approach
(Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, 2019)
!!! The same data may be
of sufficient quality in one case
BUT
completely useless under other
circumstances.
RELATED RESEARCH
Problem I: necessity to involve data quality experts at every stage of data quality analysis
process.
Solution: data object-driven approach to data quality evaluation (Bicevskis, Bicevska, Nikiforova, Oditis, 2018)
Problem II: absence of data quality theory.
* «… This state of affairs has led to much confusion within the data quality
community and is even more bewildering for those who are new to the
discipline and more importantly to business stakeholders…» (DAMA UK,
2018)
** In different proposals, dimensions of the same name can have different
semantics and vice versa. (Batini, 2016)
Example I: (Kerr, et al., 2007):
New Zealand’s healthcare data:
 6 data quality dimensions,
 24 characteristics
 69 data quality criteria.
Example II: (Dahbi et al., 2018; Weiskopf et al.,
2013):
 2 data quality dimensions: accuracy
and completeness
 Most of the theoretical researches are characterized by a wide range of data and information quality
dimensions:
✘ theoretical studies of data quality have not yet provided a unified system of data quality concepts*;
✘ the exact meaning of each dimension and how it should be assessed is still under discussion**;
✘ different proposals often use the same name for semantically different dimensions and
vice versa;
✘ sometimes the difference between some dimensions is almost unnoticeable;
✘ each dimension can be supplied with one or more metrics that vary from one solution to another;
✘ the set of dimensions and their definitions are often useful only for a particular solution.
Question: how to relate a particular dimension (and which one?) to a particular use-case?
SUMMARY
 This research is of a theoretical nature, the main objectives of which are:
 to provide a clear and straightforward definition of data quality concepts to ensure that all stakeholders perceive
them equally,
 to provide a language family that will describe the data quality requirements and assess the quality of data, taking
into account the various possible uses of the data and their variability over time.
 to provide a formalisation of the previously proposed practical solution to take a step towards a theory of data
quality, which hasn’t been proposed yet, despite numerous attempts.
TDQM data quality lifecycle: data quality definition → data quality measuring → data quality analysis → data quality improvement
MAIN PRINCIPLES OF THE
PROPOSED SOLUTION
• Each specific application can have its own specific DQ checks;
• DQ requirements can be formulated on several levels, from informal text
in natural language (PIM) to an automatically executable model,
SQL statements or program code (PSM);
• DQ can be checked at various stages of data processing;
• The DQ definition language is a graphical DSL:
  - the diagrams are easy to read, create, understand and edit even for non-IT and non-DQ experts;
  - its syntax and semantics can easily be applied to any new IS.
!!! All three components are defined by using a graphical
domain-specific language (DSL)**
** Three DSL families were developed as graphical languages based on the
capabilities of the modelling platform DIMOD
DATA QUALITY MODEL
instead of dimensions
1. DATA OBJECT (DO) - the set of values of the parameters that characterize a real-life object
• primary data object - the initial DO whose quality is analysed;
• secondary data object - a DO that determines the context for analysis of the primary DO;
• both primary and secondary DOs may contain an unlimited number of data sub-objects.
* Many objects of the same structure form a class of data objects
** There is usually one primary data object, but the number of secondary data objects is not limited and is determined by the
nature of the primary data object and the specific use-case.
2. DATA QUALITY REQUIREMENTS - conditions that must be met for a
data object to be considered of high quality.
** May contain informal or formalized implementation-independent descriptions of conditions
3. DATA QUALITY MEASURING PROCESS - the procedure to be followed to assess the quality of the data
[Figure: a data object shown as a set of parameter values d1, d2, d3, d4, …, dn]
ARCHITECTURE OF DATA QUALITY SYSTEM
• A DO is a set of attribute values that characterize
one real object.
• The address of an attribute value of a single data object,
<dataObjectName.attributeName>, is used at the stage of
determining data quality requirements.
• A DO can be formulated at different levels of abstraction:
  - from a formal language grammar
  - to definitions of variables in programming languages.
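DATA OBJECT
[Figure: data objects used in the example. Legend: primary DO, secondary data object, data sub-object]
inputMessage (primary DO): studentName varchar, courseCode varchar, Assessment enumerable, Date date
Students (secondary data object): Name varchar, progCode varchar
  Success (data sub-object of Students): courseCode varchar, Assessment enumerable, Date date
Programs (secondary data object): Code varchar, Name varchar
  Courses (data sub-object of Programs): Code varchar, Name varchar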
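As an illustration only, the data objects in the figure could be written down as plain Python dataclasses. The class names (SuccessEntry, Student, Course, Program, InputMessage), the use of strings for the "enumerable" and "date" types, and the sample values are assumptions made for this sketch; the presentation itself defines data objects with a graphical DSL, not with Python.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SuccessEntry:
    """Data sub-object Success of Students (class name is hypothetical)."""
    courseCode: str
    Assessment: str  # "enumerable" in the figure; a plain string here
    Date: str        # "date" in the figure; an ISO string here for brevity


@dataclass
class Student:
    """One instance of the Students data object class."""
    Name: str
    progCode: str
    Success: List[SuccessEntry] = field(default_factory=list)


@dataclass
class Course:
    """Data sub-object Courses of Programs."""
    Code: str
    Name: str


@dataclass
class Program:
    """One instance of the Programs data object class."""
    Code: str
    Name: str
    Courses: List[Course] = field(default_factory=list)


@dataclass
class InputMessage:
    """The message whose quality is checked in the example."""
    studentName: str
    courseCode: str
    Assessment: str
    Date: str


# Hypothetical sample instances.
students = [Student("Anna", "CS")]
programs = [Program("CS", "Computer Science", [Course("DB101", "Databases")])]
input_message = InputMessage("Anna", "DB101", "passed", "2019-10-22")

# Attribute access mirrors the <dataObjectName.attributeName> address form.
print(input_message.studentName)
print(programs[0].Courses[0].Code)
```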
• To express contextual quality requirements, the addresses of the
secondary data object’s parameters are used in
the appropriate conditions:
<secondaryDataObjectName(instanceIdent).attributeName>.
• If the secondary data object has to be found
by its attribute values, a search command similar
to the one for the primary data object is used:
<instanceIdent = seekInst(secondaryObjectName, expression)>.
 When processing a data object class:
 instances of the data object class are selected,
 examining the fulfilment of the quality requirements for each individual instance.
 The instance processing cycle is determined by users.
The most commonly used options:
• If quality is analysed for all instances of a DO:
all class instances are reviewed by changing the address
<dataObjectName(instanceIdent).attributeName>,
which is (a) first calculated by selecting the first instance
with <instanceIdent = getFirst(dataObjectName)>,
(b) followed by a transition to the next instance
with <instanceIdent = getNext(dataObjectName)>.
• If quality is analysed for only one instance of a DO:
a dynamically calculated address
<instanceIdent = seekInst(dataObject, expression)> is used.
If an instance of the DO is found, then (a) a reference to the
DO is stored in the variable instanceIdent and
(b) the value TRUE is returned to the environment;
otherwise FALSE is returned and NULL is stored in the variable.
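The two access patterns can be sketched in Python as follows. This is a minimal in-memory illustration, not the authors' implementation: the DataObjectClass wrapper, the use of list indexes as instanceIdent, and the sample data are assumptions. Where the slides describe seekInst as returning TRUE/FALSE and filling a variable, the sketch returns the identifier or None.

```python
from typing import Any, Callable, Dict, List, Optional

Instance = Dict[str, Any]


class DataObjectClass:
    """Hypothetical in-memory stand-in for a class of data objects."""

    def __init__(self, instances: List[Instance]) -> None:
        self.instances = instances
        self._pos = -1  # cursor used by get_first / get_next

    def get_first(self) -> Optional[int]:
        """Counterpart of getFirst: identifier of the first instance, or None."""
        self._pos = 0
        return 0 if self.instances else None

    def get_next(self) -> Optional[int]:
        """Counterpart of getNext: identifier of the next instance, or None."""
        self._pos += 1
        return self._pos if self._pos < len(self.instances) else None

    def seek_inst(self, expression: Callable[[Instance], bool]) -> Optional[int]:
        """Counterpart of seekInst: first instance satisfying the expression."""
        for ident, inst in enumerate(self.instances):
            if expression(inst):
                return ident
        return None  # models "FALSE is returned and NULL is stored"


students = DataObjectClass([
    {"Name": "Anna", "progCode": "CS"},
    {"Name": "Janis", "progCode": ""},
])

# Option 1: review all instances of the class.
failed = []
ident = students.get_first()
while ident is not None:
    if not students.instances[ident]["progCode"]:
        failed.append(students.instances[ident]["Name"])
    ident = students.get_next()

# Option 2: address a single instance found by its attribute values.
found = students.seek_inst(lambda s: s["Name"] == "Anna")
missing = students.seek_inst(lambda s: s["Name"] == "Zane")
```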
QUALITY SPECIFICATION FOR DATA
OBJECT’S CLASS
 DQ requirements are defined by using logical
expressions.
 The names of DO attributes/ fields serve as
operands in the logical expressions.
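One way to picture requirements as logical expressions over attribute names is a list of labelled predicates evaluated against every instance of a class. The requirement labels, the check_class helper and the sample Courses data below are hypothetical and only illustrate the idea; the resulting list plays the role of a simple test protocol.

```python
from typing import Any, Callable, Dict, List, Tuple

Instance = Dict[str, Any]
Requirement = Tuple[str, Callable[[Instance], bool]]

# Hypothetical requirements: attribute names (Name, Code) are the operands.
requirements: List[Requirement] = [
    ("Name is filled in", lambda do: bool(do.get("Name"))),
    ("Code is filled in", lambda do: bool(do.get("Code"))),
]


def check_class(instances: List[Instance],
                reqs: List[Requirement]) -> List[Tuple[int, str]]:
    """Evaluate every requirement for every instance.

    Returns (instance index, violated requirement) pairs.
    """
    protocol = []
    for ident, inst in enumerate(instances):
        for label, predicate in reqs:
            if not predicate(inst):
                protocol.append((ident, label))
    return protocol


courses = [
    {"Code": "DB101", "Name": "Databases"},
    {"Code": "", "Name": "Algebra"},
    {"Code": "X", "Name": ""},
]
protocol = check_class(courses, requirements)
```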
PRE-CONDITION QUALITY DEFINITIONS
[Diagram: pre-condition checks]
1. Check Student
instStudent = seekInst(Students, 'Students.Name = inputMessage.studentName')
2. Check Program
instProgram = seekInst(Programs, 'Programs.Code = Students(instStudent).progCode')
3. Check Course
instCourse = seekInst(Programs(instProgram).Courses, 'Courses.Code = inputMessage.courseCode')
YES: a successful check passes control to the next step.
NO: a failed check leads to one of the Send Message steps:
sendMessage(17, inputMessage.studentName)
sendMessage(18, inputMessage.courseCode)
sendMessage(19, inputMessage.courseCode)
• The pre-condition verifies (bold lines in «DO»):
  - whether the student to whom inputMessage
applies exists;
  - whether the student is registered in a training
program;
  - whether the course specified in inputMessage
belongs to that training program.
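The three pre-condition checks can be sketched as one Python function over in-memory data. The seek_inst helper, the dictionary layout and the message texts are assumptions for illustration; the slides use numeric message codes whose texts are not given, so the sketch returns its own descriptive strings instead.

```python
from typing import Any, Callable, Dict, List, Optional

Instance = Dict[str, Any]


def seek_inst(instances: List[Instance],
              expression: Callable[[Instance], bool]) -> Optional[Instance]:
    """Return the first instance satisfying the expression, or None."""
    for inst in instances:
        if expression(inst):
            return inst
    return None


def check_precondition(msg: Instance,
                       students: List[Instance],
                       programs: List[Instance]) -> Optional[str]:
    """Return None if all checks pass, otherwise a (hypothetical) message."""
    # Check Student: does the student named in the message exist?
    student = seek_inst(students, lambda s: s["Name"] == msg["studentName"])
    if student is None:
        return f"student {msg['studentName']} not found"

    # Check Program: is the student registered in a known program?
    program = seek_inst(programs, lambda p: p["Code"] == student["progCode"])
    if program is None:
        return f"student {student['Name']} is not registered in a program"

    # Check Course: does the course belong to that program?
    course = seek_inst(program["Courses"],
                       lambda c: c["Code"] == msg["courseCode"])
    if course is None:
        return f"course {msg['courseCode']} is not part of program {program['Code']}"
    return None


students = [{"Name": "Anna", "progCode": "CS", "Success": []}]
programs = [{"Code": "CS", "Name": "Computer Science",
             "Courses": [{"Code": "DB101", "Name": "Databases"}]}]

ok = check_precondition({"studentName": "Anna", "courseCode": "DB101"},
                        students, programs)
bad = check_precondition({"studentName": "Anna", "courseCode": "XX999"},
                         students, programs)
```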
 A concrete DO or a class of DO is used as an
input for a quality verification process.
 The quality verification process creates a test
protocol.
EXAMPLE: POST-CONDITION
QUALITY DEFINITIONS
[Diagram: post-condition checks]
1. Seek Student
instStudent = seekInst(Students, 'Students.Name = inputMessage.studentName')
2. Check Course Insert
instSuccess = seekInst(Students(instStudent).Success, 'Success.courseCode = inputMessage.courseCode')
NO: sendMessage(21, inputMessage.courseCode)
3. Check Assessment Insert
Success(instSuccess).Assessment = inputMessage.Assessment
NO: sendMessage(22, inputMessage.Assessment)
4. Check Date Insert
Success(instSuccess).Date = inputMessage.Date
NO: sendMessage(23, inputMessage.Date)
YES: a successful check passes control to the next step.
• The post-condition is executed after Data_Input and
verifies (thin arrows in Fig. «DO»):
  - whether a new instance has been added to
the Success sub-object of the Students data object;
  - whether the new instance contains the
corresponding course assessment;
  - whether the new instance contains the
corresponding exam date.
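A post-condition of this kind can be sketched in Python as a check that runs after the input has been stored. The dictionary layout, the seek_inst helper and the returned message texts are assumptions for illustration, not part of the presentation.

```python
from typing import Any, Callable, Dict, List, Optional

Instance = Dict[str, Any]


def seek_inst(instances: List[Instance],
              expression: Callable[[Instance], bool]) -> Optional[Instance]:
    """Return the first instance satisfying the expression, or None."""
    for inst in instances:
        if expression(inst):
            return inst
    return None


def check_postcondition(msg: Instance,
                        students: List[Instance]) -> Optional[str]:
    """Return None if the stored Success entry matches the input message."""
    # Seek Student
    student = seek_inst(students, lambda s: s["Name"] == msg["studentName"])
    if student is None:
        return f"student {msg['studentName']} not found"

    # Check Course Insert: a Success entry for the course must exist.
    success = seek_inst(student["Success"],
                        lambda e: e["courseCode"] == msg["courseCode"])
    if success is None:
        return f"no Success entry for course {msg['courseCode']}"

    # Check Assessment Insert
    if success["Assessment"] != msg["Assessment"]:
        return f"assessment {msg['Assessment']} was not stored"

    # Check Date Insert
    if success["Date"] != msg["Date"]:
        return f"date {msg['Date']} was not stored"
    return None


msg = {"studentName": "Anna", "courseCode": "DB101",
       "Assessment": "passed", "Date": "2019-10-22"}
stored = [{"Name": "Anna", "progCode": "CS", "Success": [
    {"courseCode": "DB101", "Assessment": "passed", "Date": "2019-10-22"}]}]
not_stored = [{"Name": "Anna", "progCode": "CS", "Success": []}]

ok = check_postcondition(msg, stored)
bad = check_postcondition(msg, not_stored)
```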
EXPERIENCE OF EVALUATION
OF OPEN DATA QUALITY
• structured and semi-structured
open data sets provided by
different data publishers;
• the data quality requirements
formulated for each data set vary
from very simple to fairly complex.
• In total: 25 data sets, of which 23 (92%) have at least several data quality issues;
• The most frequent data quality issues:
✘ missing values, even for the primary parameters;
✘ doubtful/invalid dates;
✘ issues in interrelated parameters;
✘ multiple notations for the same object;
✘ values that don’t belong to the list of valid values;
✘ contextual data quality issues such as missing values and conflicting values.
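Some of the issue types found in the open data sets (missing values, invalid dates, values outside a list of valid values) lend themselves to simple per-record checks. The sketch below is only an illustration: the find_issues function, the field names, the ISO date format and the sample record are all assumptions, not data or tooling from the study.

```python
from datetime import datetime
from typing import Any, Dict, List, Set


def find_issues(record: Dict[str, Any],
                required: List[str],
                date_fields: List[str],
                valid_values: Dict[str, Set[str]]) -> List[str]:
    """Return a list of issue descriptions for one record."""
    issues = []

    # Missing values in required parameters.
    for name in required:
        if record.get(name) in (None, ""):
            issues.append(f"{name}: missing value")

    # Invalid dates (ISO format YYYY-MM-DD is an assumption of this sketch).
    for name in date_fields:
        value = record.get(name)
        if value in (None, ""):
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            issues.append(f"{name}: invalid date")

    # Values that do not belong to the list of valid values.
    for name, allowed in valid_values.items():
        value = record.get(name)
        if value not in (None, "") and value not in allowed:
            issues.append(f"{name}: value not in the list of valid values")

    return issues


# Hypothetical record with one issue of each kind.
record = {"name": "", "registered": "2019-02-30", "status": "actve"}
issues = find_issues(
    record,
    required=["name"],
    date_fields=["registered"],
    valid_values={"status": {"active", "closed"}},
)
```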
• The research proposes a data object-driven theory of data quality that arose from previous studies and addresses their
lack of formalization.
• The end-user who is interested in data quality analysis according to his or her needs is placed at the centre of
the data quality analysis.
• The most significant advantages:
  - all concepts of the proposed data quality theory are straightforward;
  - the proposed approach is an «external» mechanism that allows describing the DQ and verifying the applicability of data to a
specific use case independently of the IS that accumulates and processes the data;
  - the use of graphical DSLs simplifies interaction by allowing multiple stakeholders to be involved;
  - designing diagrams is fairly simple, so it is assumed that DQ analysis can be performed even by non-IT and non-DQ experts;
  - the application of the proposed solution to the analysis of “third-party” data sets demonstrates its simplicity and
effectiveness.
RESULTS
THANK YOU FOR ATTENTION!
For more information, see ResearchGate
See also anastasijanikiforova.com
For questions or any other queries, contact me via email - Anastasija.Nikiforova@lu.lv
Article: Bicevskis, J., Nikiforova, A., Bicevska, Z., Oditis, I., & Karnitis, G. (2019, October). A step towards a data
quality theory. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security
(SNAMS) (pp. 303-308). IEEE.

 
Artificial Intelligence for open data or open data for artificial intelligence?
Artificial Intelligence for open data or open data for artificial intelligence?Artificial Intelligence for open data or open data for artificial intelligence?
Artificial Intelligence for open data or open data for artificial intelligence?
 
Overlooked aspects of data governance: workflow framework for enterprise data...
Overlooked aspects of data governance: workflow framework for enterprise data...Overlooked aspects of data governance: workflow framework for enterprise data...
Overlooked aspects of data governance: workflow framework for enterprise data...
 
Data Quality as a prerequisite for you business success: when should I start ...
Data Quality as a prerequisite for you business success: when should I start ...Data Quality as a prerequisite for you business success: when should I start ...
Data Quality as a prerequisite for you business success: when should I start ...
 
Framework for understanding quantum computing use cases from a multidisciplin...
Framework for understanding quantum computing use cases from a multidisciplin...Framework for understanding quantum computing use cases from a multidisciplin...
Framework for understanding quantum computing use cases from a multidisciplin...
 
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
Data Lake or Data Warehouse? Data Cleaning or Data Wrangling? How to Ensure t...
 
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...Putting FAIR Principles in the Context of Research Information: FAIRness for ...
Putting FAIR Principles in the Context of Research Information: FAIRness for ...
 
Open data hackathon as a tool for increased engagement of Generation Z: to h...
Open data hackathon as a tool for increased engagement of Generation Z:  to h...Open data hackathon as a tool for increased engagement of Generation Z:  to h...
Open data hackathon as a tool for increased engagement of Generation Z: to h...
 
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
Barriers to Openly Sharing Government Data: Towards an Open Data-adapted Inno...
 
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRISCombining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
Combining Data Lake and Data Wrangling for Ensuring Data Quality in CRIS
 
The role of open data in the development of sustainable smart cities and smar...
The role of open data in the development of sustainable smart cities and smar...The role of open data in the development of sustainable smart cities and smar...
The role of open data in the development of sustainable smart cities and smar...
 
Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...Data security as a top priority in the digital world: preserve data value by ...
Data security as a top priority in the digital world: preserve data value by ...
 
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
IoTSE-based Open Database Vulnerability inspection in three Baltic Countries:...
 
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
ShoBeVODSDT: Shodan and Binary Edge based vulnerable open data sources detect...
 
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERSOPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
OPEN DATA: ECOSYSTEM, CURRENT AND FUTURE TRENDS, SUCCESS STORIES AND BARRIERS
 
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
 
Towards enrichment of the open government data: a stakeholder-centered determ...
Towards enrichment of the open government data: a stakeholder-centered determ...Towards enrichment of the open government data: a stakeholder-centered determ...
Towards enrichment of the open government data: a stakeholder-centered determ...
 
Atvērto datu potenciāls
Atvērto datu potenciālsAtvērto datu potenciāls
Atvērto datu potenciāls
 

A step towards a data quality theory

  • 1. A STEP TOWARDS A DATA QUALITY THEORY. The Third International Workshop on Data Science Engineering and its Applications (DSEA 2019), in conjunction with the Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS-2019). Granada, Spain, October 22-25, 2019. Janis Bicevskis, Anastasija Nikiforova, Zane Bicevska, Ivo Oditis, Girts Karnitis. Faculty of Computing, University of Latvia. Anastasija.Nikiforova@lu.lv
  • 2. A BRIEF INSIGHT INTO THE HISTORY
      • Def. I: «Quality» is a desirable goal to be achieved through management of the production process.
      • Def. II: «Data quality» is a relative concept, largely dependent on the specific requirements resulting from the data use.
      • Late 60’s: data quality issues were first researched by statisticians, who mainly proposed mathematical theory for handling duplicates in statistical data sets.
      • Late 80’s: the data quality issue attracted management researchers.
      • Early 90’s: computer science researchers began their own research, focusing on data stored in databases and data warehouses, examining how to define, measure and improve the quality of different types of data, and relating the concept of “data quality” to the “data quality dimension”.
      • Nowadays: almost 30 years later, with data everywhere and their amount increasing significantly, the issue is still popular and relevant but, unfortunately, has not yet been solved.
      • Data quality weaknesses can lead to huge losses:
          • 2016: decisions resulting from bad data cost the US economy $3.1 trillion per year (IBM).
          • 2017: organizations believe poor data quality to be responsible for an average of $15 million per year in losses (Gartner).
          • The aggregate economic impact from applications based on open data across the EU27 economy is estimated to be €140 billion annually.
  • 3. DATA QUALITY
      • [Open] data are usually used by a wide audience that may not have deep knowledge of IT or data quality, so a solution should be simple enough to let particular users take part in the analysis of «third-party» data for their own purposes.
      • !!! The same data may be of sufficient quality in one case BUT completely useless under other circumstances.
      • Solution: the previously proposed user-oriented data object-driven approach (Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, 2019).
  • 4. RELATED RESEARCHES
      • Problem I: the necessity to involve data quality experts at every stage of the data quality analysis process. Solution: the data object-driven approach to data quality evaluation (Bicevskis, Bicevska, Nikiforova, Oditis, 2018).
      • Problem II: the absence of a data quality theory. Most theoretical research is characterized by a wide range of data and information quality dimensions:
          ✘ theoretical data quality studies have not yet provided a unified system of data quality concepts*;
          ✘ the exact meaning of each dimension and how it should be assessed is still under discussion**;
          ✘ different proposals often use the same notation for semantically different dimensions and vice versa;
          ✘ sometimes the difference between some dimensions is almost unnoticeable;
          ✘ each dimension can be supplied with one or more metrics that vary from one solution to another;
          ✘ the number of dimensions and their definitions are often useful for only one particular solution.
      • * «… This state of affairs has led to much confusion within the data quality community and is even more bewildering for those who are new to the discipline and more importantly to business stakeholders…» (DAMA UK, 2018)
      • ** In different proposals, dimensions of the same name can have different semantics and vice versa (Batini, 2016).
      • Example I (Kerr, et al., 2007): New Zealand’s healthcare data: 6 data quality dimensions, 24 characteristics, 69 data quality criteria.
      • Example II (Dahbi et al., 2018; Weiskopf et al., 2013): 2 data quality dimensions: accuracy and completeness.
      • Question: how to relate a particular dimension (and which one?) to a particular use case?
  • 5. SUMMARY. This research is of a theoretical nature. Its main objectives are:
      • to provide a clear and straightforward definition of data quality concepts, so that all stakeholders perceive them equally;
      • to provide a language family that describes data quality requirements and assesses the quality of data, taking into account the various possible uses of the data and their variability over time;
      • to formalise the previously proposed practical solution, taking a step towards a theory of data quality, which has not been proposed yet despite numerous attempts.
  • 6. MAIN PRINCIPLES OF THE PROPOSED SOLUTION
      • TDQM data quality lifecycle: data quality definition, data quality measuring, data quality analysis, data quality improvement.
      • Each specific application can have its own specific DQ checks.
      • DQ requirements can be formulated on several levels: from informal text in natural language (PIM) to an automatically executable model, SQL statements or program code (PSM).
      • DQ can be checked at various stages of data processing.
      • The DQ definition language is a graphical DSL:
          • the diagrams are easy to read, create, understand and edit even for non-IT and non-DQ experts;
          • the syntax and semantics can be easily applied to any new IS.
  • 7. DATA QUALITY MODEL instead of dimensions
      • 1. DATA OBJECT (DO): the set of values of the parameters that characterize a real-life object.
          • primary data object: the initial DO whose quality is analysed;
          • secondary data object: a DO that determines the context for the analysis of the primary DO;
          • both primary and secondary DOs may contain an unlimited number of data sub-objects.
          * Many objects of the same structure form a class of data objects.
          ** There is usually one primary data object, but the number of secondary data objects is not limited and is determined by the nature of the primary data object and the specific use case.
      • 2. DATA QUALITY REQUIREMENTS: conditions that must be met for a data object to be considered of high quality. They may contain informal or formalized implementation-independent descriptions of conditions.
      • 3. DATA QUALITY MEASURING PROCESS: the procedure to be followed to assess the quality of the data.
      • !!! All three components are defined by using a graphical domain-specific language (DSL). Three DSL families were developed as graphic languages based on the possibilities of the modelling platform DIMOD.
      • [Figure: a data object shown as parameter values d1, d2, d3, d4, ..., dn]
  • 8. ARCHITECTURE OF DATA QUALITY SYSTEM
  • 9. DATA OBJECT
      • A DO is a set of attribute values that characterize one real object.
      • The address of an attribute value of a single data object is <dataObjectName.attributeName>; it is used at the stage of determining data quality requirements.
      • A DO can be formulated at different levels of abstraction: from a formal language grammar to definitions of variables in programming languages.
      • [Figure: data object diagram. inputMessage (studentName varchar, courseCode varchar, Assessment enumerable, Date date); Students (Name varchar, progCode varchar) with sub-object Success (courseCode varchar, Assessment enumerable, Date date); Programs (Code varchar, Name varchar) with sub-object Courses (Code varchar, Name varchar). Legend: primary DO, secondary data object, data sub-object.]
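The addressing scheme <dataObjectName.attributeName> can be illustrated with a minimal Python sketch. This is not the authors' graphical DSL: the `DataObject` class, the `resolve` helper and the sample values are hypothetical, and only the inputMessage attribute names come from the slides.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DataObject:
    """A named set of attribute values describing one real-life object (hypothetical model)."""
    name: str
    attributes: dict[str, Any] = field(default_factory=dict)


def resolve(address: str, objects: dict[str, DataObject]) -> Any:
    """Resolve an address of the form 'dataObjectName.attributeName' to its value."""
    object_name, attribute_name = address.split(".", 1)
    return objects[object_name].attributes[attribute_name]


# Sample values are invented for illustration only.
input_message = DataObject(
    "inputMessage",
    {
        "studentName": "Anna",
        "courseCode": "DS101",
        "Assessment": "passed",
        "Date": "2019-10-22",
    },
)
objects = {input_message.name: input_message}

print(resolve("inputMessage.studentName", objects))
```

Keeping attribute values in a plain dictionary mirrors the idea that a data object is only a set of values, independent of the information system that stores them.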
  • 10. QUALITY SPECIFICATION FOR DATA OBJECT’S CLASS
      • To express contextual quality requirements, the addresses of the secondary data object’s parameters are used in the appropriate conditions: <secondaryDataObjectName(instanceIdent).attributeName>.
      • If the secondary data object should be searched for by its attribute values, a search command similar to the one for the primary data object is used: <instanceIdent = seekInst(secondaryObjectName, expression)>.
      • When processing a data object class, instances of the class are selected and the fulfilment of the quality requirements is examined for each individual instance.
      • The instance processing cycle is determined by users. The most commonly used options:
          • If quality is analysed for all instances of a DO: all class instances are reviewed by changing the address <dataObjectName(instanceIdent).attributeName>, which is (a) calculated first by selecting the first instance using <instanceIdent = getFirst(dataObjectName)>, (b) followed by a transition to the next instance using <instanceIdent = getNext(dataObjectName)>.
          • If quality is analysed for only one instance of a DO: a dynamically calculated address <instanceIdent = seekInst(dataObject, expression)> is used. If an instance of the DO is found, (a) a reference to it is inserted into the variable instanceIdent and (b) the value TRUE is returned to the environment; otherwise FALSE is returned and NULL is inserted into the variable.
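The two instance processing options can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the `DataObjectClass` name is hypothetical, the search expression is passed as a Python predicate rather than a text expression, and a returned instance or `None` stands in for the TRUE/FALSE result plus the instanceIdent variable.

```python
from typing import Any, Callable, Optional

Instance = dict[str, Any]


class DataObjectClass:
    """Hypothetical container for all instances of one data object class."""

    def __init__(self, name: str, instances: list[Instance]) -> None:
        self.name = name
        self.instances = instances
        self._cursor = -1  # position of the instance returned last

    def get_first(self) -> Optional[Instance]:
        """Counterpart of getFirst: select the first instance, or None if there is none."""
        self._cursor = 0
        return self.instances[0] if self.instances else None

    def get_next(self) -> Optional[Instance]:
        """Counterpart of getNext: move to the next instance, or None at the end."""
        self._cursor += 1
        if self._cursor < len(self.instances):
            return self.instances[self._cursor]
        return None

    def seek_inst(self, predicate: Callable[[Instance], bool]) -> Optional[Instance]:
        """Counterpart of seekInst: first instance satisfying the predicate, else None."""
        for instance in self.instances:
            if predicate(instance):
                return instance
        return None


# Invented sample data: the second student violates a "Name must be filled in" requirement.
students = DataObjectClass(
    "Students",
    [{"Name": "Anna", "progCode": "CS"}, {"Name": "", "progCode": "CS"}],
)

# Option 1: review all instances of the class.
failed = []
inst = students.get_first()
while inst is not None:
    if not inst["Name"]:
        failed.append(inst)
    inst = students.get_next()
print(len(failed))

# Option 2: analyse a single, dynamically located instance.
print(students.seek_inst(lambda s: s["Name"] == "Anna"))
```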
  • 11. PRE-CONDITION QUALITY DEFINITIONS
      • When processing a data object class, instances of the class are selected and the fulfilment of the quality requirements is examined for each individual instance.
      • DQ requirements are defined by using logical expressions.
      • The names of DO attributes/fields serve as operands in the logical expressions.
      • The pre-condition verifies (bold lines in «DO»):
          • whether the student to whom inputMessage applies exists;
          • whether the student is registered to any training program;
          • whether the course specified in inputMessage belongs to the training program.
      • Flowchart steps:
          • Check Student: instStudent = seekInst(Students, 'Students.Name = inputMessage.studentName')
          • Check Course: instProgram = seekInst(Programs, 'Programs.Code = Students(instStudent).progCode')
          • Check Course: instCourse = seekInst(Programs(instProgram).Courses, 'Courses.Code = inputMessage.courseCode')
          • Each check has YES and NO branches. The Send Message steps are: sendMessage(17, inputMessage.studentName), sendMessage(18, inputMessage.courseCode), sendMessage(19, inputMessage.courseCode).
      • [Figure: data object diagram with inputMessage, Students (sub-object Success) and Programs (sub-object Courses)]
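The pre-condition chain can be sketched in Python. The function names and the list-of-dictionaries data layout are hypothetical. The pairing of message codes with checks (17 for the student, 18 for the program, 19 for the course) and stopping at the first failed check are assumptions, since the flattened flowchart does not show which NO branch leads to which message.

```python
from typing import Any, Callable, Optional

Instance = dict[str, Any]


def seek_inst(
    instances: list[Instance], predicate: Callable[[Instance], bool]
) -> Optional[Instance]:
    """Counterpart of seekInst: first instance satisfying the predicate, else None."""
    for instance in instances:
        if predicate(instance):
            return instance
    return None


def check_precondition(
    input_message: Instance, students: list[Instance], programs: list[Instance]
) -> list[tuple[int, Any]]:
    """Return (message code, value) pairs; an empty list means the pre-condition holds."""
    # Check Student: does the student named in the message exist?
    student = seek_inst(students, lambda s: s["Name"] == input_message["studentName"])
    if student is None:
        return [(17, input_message["studentName"])]

    # Check program: is the student registered to a known training program?
    program = seek_inst(programs, lambda p: p["Code"] == student["progCode"])
    if program is None:
        return [(18, input_message["courseCode"])]  # code 18 here is an assumption

    # Check Course: does the course belong to that program?
    course = seek_inst(
        program["Courses"], lambda c: c["Code"] == input_message["courseCode"]
    )
    if course is None:
        return [(19, input_message["courseCode"])]  # code 19 here is an assumption
    return []


# Invented sample data for illustration.
students = [{"Name": "Anna", "progCode": "CS"}]
programs = [{"Code": "CS", "Name": "Computer Science", "Courses": [{"Code": "DS101"}]}]
message = {"studentName": "Anna", "courseCode": "DS101"}

print(check_precondition(message, students, programs))
```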
  • 12. EXAMPLE: POST-CONDITION QUALITY DEFINITIONS
      • A concrete DO or a class of DOs is used as input for a quality verification process.
      • The quality verification process creates a test protocol.
      • The post-condition is executed after Data_Input and verifies (thin arrows in Fig. «DO»):
          • whether a new instance has been added to the Success sub-object of the Students data object;
          • whether the new instance contains the corresponding course assessment;
          • whether the new instance contains the corresponding exam date.
      • Flowchart steps:
          • Seek Student: instStudent = seekInst(Students, 'Students.Name = inputMessage.studentName')
          • Check Course Insert: instSuccess = seekInst(Students(instStudent).Success, 'Success.courseCode = inputMessage.courseCode')
          • Check Assessment Insert: Success(instSuccess).Assessment = inputMessage.Assessment
          • Check Date Insert: Success(instSuccess).Date = inputMessage.Date
          • Each check has YES and NO branches. The Send Message steps are: sendMessage(21, inputMessage.courseCode), sendMessage(22, inputMessage.Assessment), sendMessage(23, inputMessage.Date).
      • [Figure: data object diagram with inputMessage, Students (sub-object Success) and Programs (sub-object Courses)]
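A Python sketch of the post-condition follows, under stated assumptions: the function names and data layout are hypothetical, the student lookup is assumed to succeed because the pre-condition has already passed, and the assessment and date checks are both evaluated once the Success record is found. Message codes follow the arguments shown on the slide (21 for the course code, 22 for the assessment, 23 for the date).

```python
from typing import Any, Callable, Optional

Instance = dict[str, Any]


def seek_inst(
    instances: list[Instance], predicate: Callable[[Instance], bool]
) -> Optional[Instance]:
    """Counterpart of seekInst: first instance satisfying the predicate, else None."""
    for instance in instances:
        if predicate(instance):
            return instance
    return None


def check_postcondition(
    input_message: Instance, students: list[Instance]
) -> list[tuple[int, Any]]:
    """Return (message code, value) pairs; an empty list means the data were stored correctly."""
    student = seek_inst(students, lambda s: s["Name"] == input_message["studentName"])
    if student is None:
        # Assumption: the pre-condition guarantees the student exists.
        raise LookupError("student not found; pre-condition expected to have passed")

    # Check Course Insert: was a Success record added for this course?
    success = seek_inst(
        student["Success"], lambda r: r["courseCode"] == input_message["courseCode"]
    )
    if success is None:
        return [(21, input_message["courseCode"])]

    messages: list[tuple[int, Any]] = []
    # Check Assessment Insert: does the stored assessment match the message?
    if success["Assessment"] != input_message["Assessment"]:
        messages.append((22, input_message["Assessment"]))
    # Check Date Insert: does the stored exam date match the message?
    if success["Date"] != input_message["Date"]:
        messages.append((23, input_message["Date"]))
    return messages


# Invented sample data: the stored date differs from the one in the message.
students = [
    {
        "Name": "Anna",
        "progCode": "CS",
        "Success": [
            {"courseCode": "DS101", "Assessment": "passed", "Date": "2019-10-21"}
        ],
    }
]
message = {
    "studentName": "Anna",
    "courseCode": "DS101",
    "Assessment": "passed",
    "Date": "2019-10-22",
}

print(check_postcondition(message, students))
```

The returned list plays the role of a small test protocol: one entry per violated requirement.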
  • 13. EXPERIENCE OF EVALUATION OF OPEN DATA QUALITY
      • Structured and semi-structured open data sets provided by different data publishers were analysed.
      • The data quality requirements formulated for each data set vary from very simple to fairly complex.
      • In total: 25 data sets, of which 23 (92%) have at least several data quality issues.
      • The most frequent data quality issues:
          ✘ lack of values even for the primary parameters;
          ✘ doubtful or invalid dates;
          ✘ issues in interrelated parameters;
          ✘ multiple notations for the same object;
          ✘ values that do not belong to the list of valid values;
          ✘ contextual data quality issues such as lack of values and conflicting values.
  • 14. RESULTS
      • The research proposes a data object-driven theory of data quality, which arose from previous studies and eliminates their lack of formalization.
      • An end-user who is interested in analysing data quality according to his or her own needs is placed at the centre of the data quality analysis.
      • The most significant advantages:
          • all concepts of the proposed data quality theory are straightforward;
          • the proposed approach is an «external» mechanism that allows describing the DQ and verifying the applicability of data to a specific use case independently of the IS that accumulates and processes the data;
          • the use of graphical DSLs simplifies the interaction process by allowing multiple stakeholders to be involved;
          • designing diagrams is fairly simple, so it is assumed that DQ analysis can be performed even by non-IT and non-DQ experts;
          • applying the proposed solution to the analysis of “third-party” data sets demonstrates its simplicity and effectiveness.
  • 15. THANK YOU FOR ATTENTION! For more information, see ResearchGate and anastasijanikiforova.com. For questions or any other queries, contact me via email: Anastasija.Nikiforova@lu.lv. Article: Bicevskis, J., Nikiforova, A., Bicevska, Z., Oditis, I., & Karnitis, G. (2019, October). A step towards a data quality theory. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 303-308). IEEE.