A first step towards defining roles and formalizing responsibilities of key players and stakeholders for ensuring and improving data quality and usability of digital Earth Science datasets
New Paradigm for Ensuring and Improving Data Quality and Usability
1. A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders
Ge Peng
NOAA’s Cooperative Institute for Climate and Satellites – North Carolina (CICS-NC), NC State University, and NOAA’s National Centers for Environmental Information (NCEI)
In Collaboration with Nancy Ritchey, Kenneth Casey, Edward Kearns, Jeffrey Privette, Drew Saunders, Philip Jones, Tom Maycock, and Steve Ansari
Version 20160515
CC-BY-SA 4.0
POC: gpeng@cicsnc.org
2. What Is Data Quality? Who Should Care?
• How good or bad a data product is.
• All key players: everyone who develops, creates, produces, stewards, manages, publishes, or serves the product
• Other major product stakeholders (including sponsors, power users, and management)
• General users
What Is Data Usability?
• How easy or hard it is to understand and use a data product.
3. Quality – How good or bad something is
• Product quality – the degree to which the data product is produced and described correctly.
• Stewardship quality – the degree to which the data product is preserved and cared for properly.
Steward – A person managing or caring for others’ assets
• A role incorporating processes, policies, guidelines, and responsibilities for administering an organization’s data in compliance with policy and/or regulatory obligations.
• Requires expert domain knowledge, general knowledge of relevant domains, and the intention to ensure and improve the stewardship of other people’s datasets.
  - Data steward: a role responsible for managing both the dataset and its metadata
  - Scientific steward: a role responsible for managing data quality and usability
  - Technology steward: a role responsible for managing tools and systems
(Source: Chisholm 2014; Peng et al. 2016)
4. Something about Stewards
• Stewards are stewardship roles assigned to domain subject matter experts (SMEs) who have general knowledge of other relevant domains.
  - SMEs are people with extensive knowledge of and experience in their own domains.
  - The role of SME is earned, not assigned.
• Stewards need to have a mindset of caring for other people’s assets (e.g., data products) and be capable of communicating within and across domains.
• One person could be assigned more than one stewardship role.
(Source: Chisholm 2014; Peng et al. 2016)
5. Ensuring and improving data quality and usability throughout the life cycle of a dataset
• Old days – one person
  - Primarily done by data producers
  - Usability, i.e., ease of use, is usually not taken into consideration
  - Information about data quality procedures or practices is hard to come by
  - Data choices are limited for users, and users have no choice but to wait for the release of the dataset
• Nowadays – an integrated team
  - Need to be more scalable
  - Need to be more integrated
  - Need to be more timely
  - Information about methods and results needs to be readily available, in an easy-to-understand and interoperable format
  - Users have many choices, and they do not have to wait for or use your data
7. Product Quality, Stewardship Quality, and Use/Service Quality
• Product Quality – Data Producers: Define/Create/Obtain
• Stewardship Quality – Stewards: Maintain/Preserve/Document/Access
• Use/Service Quality – Data Providers/Users: Use/Service
Food Quality
• Requirements
• Production/distribution
• Info on product specs
• Storage, transport, re-distribution
• Product packing/labels
• Cooking instructions
• Stores/restaurants/homes
• Derived products
• Timeliness/Presentation
Data Quality: Producers, Middlemen, Providers
A shared responsibility in ensuring quality!
8. So We All Have To Talk To Each Other – That Is The Problem!
(Another example: adapting the ISO Open Archival Information System (OAIS) Reference Model (RM) for long-term preservation)
Functional entities: Data Production, Ingest, Metadata, Documentation, Archive, Dissemination, Access, Service, Data Use
Roles: Data Producer, Metadata Specialist, Access POC, Science POC, User Service POC, Access Specialist, Archive POC, Data Consumer
Stakeholders, including Sponsors and Management
• We do not speak the same language
• We do not communicate through the same channels
Potential interfaces in knowledge domains
9. Why Do We Need to Define Roles of Stewards?
Data Producer, Metadata POC
Adapting ISO Data Quality (DQ) Metadata Standard
10. Why Do We Need to Define Roles of Stewards?
Stewards help capture DQ information and convey it in the form of DQ metadata!
Data Producer, Metadata POC
Adapting ISO Data Quality (DQ) Metadata Standard
11. Why Do We Need to Define Responsibilities of Key Players and Stakeholders?
Data Producer, Program Managers, Metadata POC, Stewardship Management
Adapting ISO Data Quality (DQ) Metadata Standards
• Creating and improving DQ metadata and documentation is beyond the current job scope and expertise of data providers and metadata curators.
• Defining responsibilities will help facilitate the process!
• It will help raise awareness of data quality and usability and improve the related requirements.
“You are responsible for the quality of your data. So you should provide us with the DQ metadata!”
“You are responsible for metadata. You should create the DQ metadata yourself!”
12. First Step in Formalizing Roles and High-Level Responsibilities
13. Data Producer
Roles and responsibilities, within the context of ensuring and improving dataset quality (DQ) and usability
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help Ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
14. Data Steward
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Scientific Steward
• Assess/improve Data Quality – Evaluation/verification
• Promote and improve Data Usability – Characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
Technology Steward
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrade
• Ensure and Improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
15. End-User
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Manager
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
Sponsor
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Data Distributor
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
16. Data Originator
• Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
• Ensure Data Quality during production – screening/assurance
• Assess and improve Data Quality – verification/validation
• Ensure Data Integrity – creation/staging
• Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
• Ensure Production Sustainability
• Help Ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
• Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
Data Steward
• Ensure Data Integrity – ingest and archive
• Ensure and improve Data Provenance and Traceability
• Improve Data Quality metadata
• Ensure and improve archiving requirements
Technology Steward
• Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrade
• Ensure and Improve Data Accessibility and Discoverability
• Promote and improve Data Interoperability
• Ensure and improve software and system requirements
Scientific Steward
• Assess/improve Data Quality – Evaluation/verification
• Promote and improve Data Usability – Characterization
• Help ensure and improve Data Quality metadata
• Ensure and improve data quality and usability requirements
Documentation
• Capture
• Convey
• Be traceable
• Be transparent
• Be machine-readable
• Be human-understandable
Quality Rating
• Assess
• Improve
• Be transparent
• Be quantifiable
• Be machine-readable
• Be human-understandable
• Understandable info for users
• Actionable info for management
• Integrable tags for machines
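The Quality Rating requirements above (quantifiable, machine-readable, human-understandable, with actionable info for management and integrable tags for machines) can be illustrated with a minimal sketch. It assumes a hypothetical rating record: the class name, field names, the 1-5 scale, the tag format, and the "needs attention" threshold are all illustrative assumptions, not part of this presentation or of any metadata standard. The aspects "product", "stewardship", and "service" follow the maturity aspects named in the End-User responsibilities.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical quality-rating record. The field names, the 1-5 scale, and
# the tag format are illustrative assumptions, not a published standard.
@dataclass
class QualityRating:
    dataset_id: str               # hypothetical dataset identifier
    aspect: str                   # e.g., "product", "stewardship", or "service"
    level: int                    # quantifiable score on an assumed 1-5 scale
    method: str = "unspecified"   # how the rating was assessed (transparency)
    max_level: int = 5

    def __post_init__(self) -> None:
        # Reject scores outside the assumed scale.
        if not 1 <= self.level <= self.max_level:
            raise ValueError(f"level must be between 1 and {self.max_level}")

    def to_json(self) -> str:
        """Machine-readable form of the full record."""
        return json.dumps(asdict(self), sort_keys=True)

    def to_tag(self) -> str:
        """Compact tag that another system could index or filter on."""
        return f"{self.aspect}-quality:{self.level}/{self.max_level}"

    def summary(self) -> str:
        """Human-understandable sentence for users."""
        return (f"Dataset {self.dataset_id}: {self.aspect} quality rated "
                f"{self.level} of {self.max_level} ({self.method}).")

    def needs_attention(self, threshold: int = 3) -> bool:
        """Actionable flag for management; the threshold is an assumption."""
        return self.level < threshold


rating = QualityRating("example-dataset", "stewardship", 3, method="self-assessment")
print(rating.to_tag())           # stewardship-quality:3/5
print(rating.summary())
print(rating.needs_attention())  # False
```

Deriving the JSON record, the tag, and the sentence from a single record is one way to keep the machine-readable and human-understandable views consistent; it is a design choice of this sketch, not a requirement stated in the presentation.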
One person may wear several hats!
End-User
• Request Transparency in data quality procedures and practices
• Request Provenance of the data product
• Request evaluation results of product, stewardship, and service maturity of the data product
• Provide feedback on Quality and Usability of the data product
Data Distributor
• Ensure and improve Representation of data quality information
• Ensure and improve Traceability of data quality information
• Ensure user feedback
• Help improve data quality and usability requirements
Sponsor
• Define Data Quality and Usability requirements
• Require data quality oversight and monitoring
• Encourage Transparency in data quality procedures and practices
Manager
• Help increase awareness of Data Quality and Usability
• Help improve data quality and usability requirements
• Help ensure Data Interoperability
17. Take Away Messages
• Ensuring data quality is an end-to-end process and a shared responsibility of all key players (data producers, managers/stewards, providers/publishers) and other major stakeholders (sponsors, power users, and management).
• Effective stewardship of scientific data requires:
  - Expert domain knowledge in data management, technology, and science,
  - Continuous oversight from all stewards, and
  - Open and continuous communication among key players and stakeholders
• Defining roles and responsibilities of key players and stakeholders will help facilitate the process of
  - Ensuring and improving dataset quality and usability
  - Capturing and conveying information about data quality
18. Acknowledgement
The idea of using food quality as an analog for data quality originated in one of our family dinner-table discussions. I thank my family for the beneficial discussions that followed, for allowing me to use them as “Guinea Pigs”, and for their helpful comments!
To cite this presentation:
Peng, G., 2015: A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. Updated: May 15, 2016. Slideshare. Access date: mm/dd/yyyy.
View Latest Version of This Presentation: http://tinyurl.com/RolesRs-DQU
Related Presentation: Stewards – Knowledge and Communication Hub http://tinyurl.com/Stewards-Hub
19. Image sources
http://www.busyinbrooklyn.com/wp-content/uploads/2013/09/USDA_GRADES.jpg;
http://www.kaleelbrothers.com/images/Fresh-Produce.png;
http://www.pgabeef.com/images/storage_chart.gif;
https://www.colorado.gov/pacific/sites/default/files/u/6556/Egg-Grading.JPG;
http://www.hickmanseggs.com/w3/wp-content/uploads/2014/04/egg_size.jpg;
https://c2.staticflickr.com/8/7159/6801729225_82e823a5d6_z.jpg;
http://www.thepoultrysite.com/articles/contents/09-12CobbChicks1.jpg;
http://www.topratedsteakhouses.com/wp-content/uploads/2013/12/Grilled-Beef-with-Tomato.jpg;
http://cdn2.hubspot.net/hub/66214/file-15223310-jpg/images/wearingmanyhats.jpg
References
Chisholm, M., 2014: Data Stewards versus Subject Matter Experts and Data Managers. Information Management. Version: May 28, 2014. [Available online at: http://www.information-management.com/news/news/data-stewards-versus-subject-matter-experts-and-data-managers-10025704-1.html.]
Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016: Scientific Stewardship in the Open Data and Big Data Era – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. D-Lib Magazine, 22. doi: 10.1045/may2016-peng. [Available online at: http://dlib.org/dlib/may16/peng/05peng.html.]