A New Paradigm for Ensuring and Improving Dataset Quality and Usability
– Roles and Responsibilities of Stewards and Other Major Product Stakeholders

Ge Peng
NOAA’s Cooperative Institute for Climate and Satellite – North Carolina (CICS-NC)
NC State University and NOAA’s National Centers for Environmental Information (NCEI)

In Collaboration with
Nancy Ritchey, Kenneth Casey, Edward Kearns, Jeffrey Privette,
Drew Saunders, Philip Jones, Tom Maycock, and Steve Ansari

Version 20160515    CC-BY-SA 4.0    POC: gpeng@cicsnc.org

What Is Data Quality?
•  How good or bad a data product is.

What Is Data Usability?
•  How easy or hard a data product is to understand and use.

Who Should Care?
•  All Key Players - everyone who develops, creates, produces, stewards, manages, publishes, or serves the product
•  Other major product stakeholders (including sponsors, power users, and management)
•  General users

Quality - How good or bad something is
•  Product quality – degree to which the data product is produced and described correctly.
•  Stewardship quality – degree to which the data product is preserved and cared for properly.

Steward - A person managing or caring for others’ assets
•  A role that incorporates processes, policies, guidelines, and responsibilities for administering an organization’s data in compliance with policy and/or regulatory obligations.
•  Requires expert domain knowledge, general knowledge of other relevant domains, and the intention to ensure and improve the stewardship of other people’s datasets.
•  Data steward:
   •  A role responsible for managing both dataset and metadata
•  Scientific steward:
   •  A role responsible for managing data quality and usability
•  Technology steward:
   •  A role responsible for managing tools and systems
(Source: Chisholm 2014; Peng et al. 2016)

Something about Stewards
•  Stewards are stewardship roles assigned to domain subject matter experts (SMEs) who have general knowledge of other relevant domains.
   •  SMEs are people with extensive knowledge and experience in their local domains.
   •  The role of SME is gained and not assigned.
•  Stewards need to have a mindset of caring for other people’s assets (e.g., data products) and be capable of communicating within and across domains.
•  One person could be assigned more than one stewardship role.
(Source: Chisholm 2014; Peng et al. 2016)

Ensuring and improving data quality and usability throughout the life cycle of a dataset
•  Old days – one person
   •  Primarily done by data producers
   •  Usability, i.e., ease of use, is usually not taken into consideration
   •  Information about procedures or practices on data quality is hard to come by
   •  Data choice is limited for users, and users have no choice but to wait for the release of the dataset
•  Nowadays – an integrated team
   •  Need to be more scalable
   •  Need to be more integrated
   •  Need to be more timely
   •  Information about methods and results needs to be readily available, in an easy-to-understand and interoperable format
   •  Users have many choices and they do not have to wait for or use your data

A Quality Example We Can All Relate To
Product Quality | Stewardship Quality | Use/Service Quality
Data Quality
•  Data Producers: Define/Create/Obtain
•  Stewards: Maintain/Preserve/Document/Access
•  Data Providers/Users: Use/Service
Food Quality
•  Requirements
•  Production/distribution
•  Info on product specs
•  Storage, transport, re-distribution
•  Product packing/labels
•  Cooking instruction
•  Stores/restaurants/homes
•  Derived products
•  Timeliness/Presentation
Producers | Middlemen | Providers
A shared responsibility in ensuring quality!

So We All Have To Talk To Each Other – That Is The Problem!
(another example: adapting ISO OAIS RM for long-term preservation)
Functional Entities: Data Production; Ingest; Metadata; Documentation; Archive; Dissemination; Access; Service; Data Use
Roles: Data Producer; Metadata Specialist; Access POC; Science POC; User Service POC; Access Specialist; Archive POC; Data Consumer
Stakeholders including Sponsors and Management
•  We do not talk in the same language
•  We do not communicate in the same channel
Potential interfaces in knowledge domains

Why Do We Need to Define Roles of Stewards?
Adapting ISO Data Quality (DQ) Metadata Standard
Data Producer | Metadata POC
Stewards help capture and convey DQ info into the context of DQ metadata!

Why Do We Need to Define Responsibilities of Key Players and Stakeholders?
Adapting ISO Data Quality (DQ) Metadata Standards
Data Producer | Program Managers | Metadata POC | Stewardship Management
•  Creating and improving DQ metadata and documentation is beyond the current job scope and expertise of data providers and metadata curators.
•  Defining responsibilities will help facilitate the process!
•  It will help raise awareness and improve requirements of data quality and usability.
“You are responsible for data quality of your data. So you should provide us with the DQ metadata!”
“You are responsible for metadata. You should create the DQ metadata yourself!”

First Step in Formalizing Roles and High-Level Responsibilities

Roles and Responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
Data Producer
•  Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
•  Ensure Data Quality during production – screening/assurance
•  Assess and improve Data Quality – verification/validation
•  Ensure Data Integrity – creation/staging
•  Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
•  Ensure Production Sustainability
•  Help ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
•  Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use

Roles and Responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
Data Steward
•  Ensure Data Integrity – ingest and archive
•  Ensure and improve Data Provenance and Traceability
•  Improve Data Quality metadata
•  Ensure and improve archiving requirements
Scientific Steward
•  Assess/improve Data Quality – Evaluation/verification
•  Promote and improve Data Usability – Characterization
•  Help ensure and improve Data Quality metadata
•  Ensure and improve data quality and usability requirements
Technology Steward
•  Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrade
•  Ensure and improve Data Accessibility and Discoverability
•  Promote and improve Data Interoperability
•  Ensure and improve software and system requirements

Roles and Responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
End-User
•  Request Transparency in data quality procedures and practices
•  Request Provenance of the data product
•  Request evaluation results of product, stewardship, and service maturity of the data product
•  Provide feedback on Quality and Usability of the data product
Manager
•  Help increase awareness of Data Quality and Usability
•  Help improve data quality and usability requirements
•  Help ensure Data Interoperability
Sponsor
•  Define Data Quality and Usability requirements
•  Require data quality oversight and monitoring
•  Encourage Transparency in data quality procedures and practices
Data Distributor
•  Ensure and improve Representation of data quality information
•  Ensure and improve Traceability of data quality information
•  Ensure user feedback
•  Help improve data quality and usability requirements

Roles and Responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
One person may wear several hats!
Data Originator
•  Ensure and improve Scientific Quality of the data product - defining and documenting data product accuracy, precision, uncertainty sources and estimates
•  Ensure Data Quality during production – screening/assurance
•  Assess and improve Data Quality – verification/validation
•  Ensure Data Integrity – creation/staging
•  Help ensure Preservability - providing information about data product (time, space, size, variables, etc.)
•  Ensure Production Sustainability
•  Help ensure Transparency - providing information on data source, algorithm and processing steps, and error estimates/sources
•  Ensure and improve Data Usability - providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
Data Steward
•  Ensure Data Integrity – ingest and archive
•  Ensure and improve Data Provenance and Traceability
•  Improve Data Quality metadata
•  Ensure and improve archiving requirements
Technology Steward
•  Ensure Data Integrity – ingest, archive retrieval, data access, and file system and technology upgrade
•  Ensure and improve Data Accessibility and Discoverability
•  Promote and improve Data Interoperability
•  Ensure and improve software and system requirements
Scientific Steward
•  Assess/improve Data Quality – Evaluation/verification
•  Promote and improve Data Usability – Characterization
•  Help ensure and improve Data Quality metadata
•  Ensure and improve data quality and usability requirements
Documentation
•  Capture
•  Convey
•  Be traceable
•  Be transparent
•  Be machine-readable
•  Be human-understandable
Quality Rating
•  Assess
•  Improve
•  Be transparent
•  Be quantifiable
•  Be machine-readable
•  Be human-understandable
•  Understandable info for users
•  Actionable info for management
•  Integrable tags for machines
End-User
•  Request Transparency in data quality procedures and practices
•  Request Provenance of the data product
•  Request evaluation results of product, stewardship, and service maturity of the data product
•  Provide feedback on Quality and Usability of the data product
Data Distributor
•  Ensure and improve Representation of data quality information
•  Ensure and improve Traceability of data quality information
•  Ensure user feedback
•  Help improve data quality and usability requirements
Sponsor
•  Define Data Quality and Usability requirements
•  Require data quality oversight and monitoring
•  Encourage Transparency in data quality procedures and practices
Manager
•  Help increase awareness of Data Quality and Usability
•  Help improve data quality and usability requirements
•  Help ensure Data Interoperability

Take Away Messages
•  Ensuring data quality is an end-to-end process and a shared responsibility of all key players (data producers, managers/stewards, providers/publishers) and other major stakeholders (sponsors, power users, and management).
•  Effective stewardship of scientific data requires:
   •  Expert domain knowledge in data management, technology, and science
   •  Continuous oversight from all stewards, and
   •  Open and continuous communication among key players and stakeholders
•  Defining roles and responsibilities of key players and stakeholders will help facilitate the process of
   •  Ensuring and improving dataset quality and usability
   •  Capturing and conveying information about data quality

Acknowledgement
The idea of using food quality as an analog for data quality originated from one of the family dinner table discussions. I thank my family for the beneficial discussions that followed, for allowing me to use them as “Guinea Pigs”, and for their helpful comments!

To cite this presentation
Peng, G., 2015: A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. Updated: May 15, 2016. Slideshare. Access date: mm/dd/yyyy.

View Latest Version of This Presentation
http://tinyurl.com/RolesRs-DQU

Related Presentation: Stewards – Knowledge and Communication Hub
http://tinyurl.com/Stewards-Hub

Image source
http://www.busyinbrooklyn.com/wp-content/uploads/2013/09/USDA_GRADES.jpg;
http://www.kaleelbrothers.com/images/Fresh-Produce.png;
http://www.pgabeef.com/images/storage_chart.gif;
https://www.colorado.gov/pacific/sites/default/files/u/6556/Egg-Grading.JPG;
http://www.hickmanseggs.com/w3/wp-content/uploads/2014/04/egg_size.jpg;
https://c2.staticflickr.com/8/7159/6801729225_82e823a5d6_z.jpg;
http://www.thepoultrysite.com/articles/contents/09-12CobbChicks1.jpg;
http://www.topratedsteakhouses.com/wp-content/uploads/2013/12/Grilled-Beef-with-Tomato.jpg;
http://cdn2.hubspot.net/hub/66214/file-15223310-jpg/images/wearingmanyhats.jpg

References
Chisholm, M., 2014: Data Stewards versus Subject Matter Experts and Data Managers. Information Management. Version: May 28, 2014. [Available online at: http://www.information-management.com/news/news/data-stewards-versus-subject-matter-experts-and-data-managers-10025704-1.html.]
Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016: Scientific Stewardship in the Open Data and Big Data Era - Roles and Responsibilities of Stewards and Other Major Product Stakeholders. D-Lib Magazine, 22. doi: 10.1045/may2016-peng. [Available online at: http://dlib.org/dlib/may16/peng/05peng.html.]

 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service OnlineCALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
CALL ON ➥8923113531 🔝Call Girls Chinhat Lucknow best sexual service Online
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Data-Analysis for Chicago Crime Data 2023
Data-Analysis for Chicago Crime Data  2023Data-Analysis for Chicago Crime Data  2023
Data-Analysis for Chicago Crime Data 2023
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

New Paradigm for Ensuring and Improving Data Quality and Usability

  • 1. A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders
    Ge Peng
    NOAA’s Cooperative Institute for Climate and Satellite – North Carolina (CICS-NC), NC State University and NOAA’s National Centers for Environmental Information (NCEI)
    In collaboration with Nancy Ritchey, Kenneth Casey, Edward Kearns, Jeffrey Privette, Drew Saunders, Philip Jones, Tom Maycock, and Steve Ansari
    Version 20160515; CC-BY-SA 4.0; POC: gpeng@cicsnc.org
  • 2. What Is Data Quality? Who Should Care?
    • How good or bad a data product is.
    • All key players: everyone who develops, creates, produces, stewards, manages, publishes, or serves the product
    • Other major product stakeholders (including sponsors, power users, and management)
    • General users
    What Is Data Usability?
    • How easy or hard a data product is to understand and use.
  • 3. Quality: how good or bad something is
    • Product quality: the degree to which the data product is produced and described correctly.
    • Stewardship quality: the degree to which the data product is preserved and cared for properly.
    Steward: a person managing or caring for others’ assets
    • A role that incorporates processes, policies, guidelines, and responsibilities for administering an organization’s data in compliance with policy and/or regulatory obligations.
    • Requires expert domain knowledge, general knowledge of other relevant domains, and the intention to ensure and improve the stewardship of other people’s datasets.
      - Data steward: a role responsible for managing both the dataset and its metadata
      - Scientific steward: a role responsible for managing data quality and usability
      - Technology steward: a role responsible for managing tools and systems
    (Source: Chisholm 2014; Peng et al. 2016)
  • 4. Something about Stewards
    • Stewardship roles are assigned to domain subject matter experts (SMEs) who have general knowledge of other relevant domains.
      - SMEs are people with extensive knowledge and experience in their local domains.
      - The role of SME is gained, not assigned.
    • Stewards need a mindset of caring for other people’s assets (e.g., data products) and must be capable of communicating within and across domains.
    • One person could be assigned more than one stewardship role.
    (Source: Chisholm 2014; Peng et al. 2016)
  • 5. Ensuring and improving data quality and usability throughout the life cycle of a dataset
    • Old days: one person
      - Primarily done by data producers
      - Usability, i.e., ease of use, is usually not taken into consideration
      - Information about data quality procedures or practices is hard to come by
      - Data choice is limited, and users have no choice but to wait for the release of the dataset
    • Nowadays: an integrated team
      - Need to be more scalable
      - Need to be more integrated
      - Need to be more timely
      - Information about methods and results needs to be readily available, in an easy-to-understand and interoperable format
      - Users have many choices, and they do not have to wait for or use your data
  • 6. A Quality Example We Can All Relate To
  • 7. A shared responsibility in ensuring quality!
    Product Quality; Stewardship Quality; Use/Service Quality
    Data Quality
    • Data Producers: Define/Create/Obtain
    • Stewards: Maintain/Preserve/Document/Access
    • Data Providers/Users: Use/Service
    Food Quality (Producers; Middlemen; Providers)
    • Requirements
    • Production/distribution
    • Info on product specs
    • Storage, transport, re-distribution
    • Product packing/labels
    • Cooking instruction
    • Stores/restaurants/homes
    • Derived products
    • Timeliness/Presentation
  • 8. So We All Have To Talk To Each Other – That Is The Problem! (another example: adapting ISO OAIS RM for long-term preservation)
    Functional entities: Data Production; Ingest; Metadata Documentation; Archive; Dissemination Access Service; Data Use
    Roles: Data Producer; Metadata Specialist; Access POC; Science POC; User Service POC; Access Specialist; User Service POC; Archive POC; Science POC; Data Consumer
    Stakeholders including Sponsors and Management
    • We do not talk in the same language
    • We do not communicate in the same channel
    Potential interfaces in knowledge domains
  • 9. Why Do We Need to Define Roles of Stewards?
    Data Producer; Metadata POC
    Adapting ISO Data Quality (DQ) Metadata Standard
  • 10. Why Do We Need to Define Roles of Stewards?
    Stewards help capture and convey DQ info into the context of DQ metadata!
    Data Producer; Metadata POC
    Adapting ISO Data Quality (DQ) Metadata Standard
  • 11. Why Do We Need to Define Responsibilities of Key Players and Stakeholders?
    Data Producer; Program Managers; Metadata POC; Stewardship Management
    Adapting ISO Data Quality (DQ) Metadata Standards
    • Creating and improving DQ metadata and documentation is beyond the current job scope and expertise of data providers and metadata curators.
    • Defining responsibilities will help facilitate the process!
    • It will help raise awareness of, and improve requirements for, data quality and usability.
    “You are responsible for data quality of your data. So you should provide us with the DQ metadata!”
    “You are responsible for metadata. You should create the DQ metadata yourself!”
  • 12. First Step in Formalizing Roles and High-Level Responsibilities
  • 13. Roles and responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
    Data Producer
    • Ensure and improve Scientific Quality of the data product: defining and documenting data product accuracy, precision, uncertainty sources and estimates
    • Ensure Data Quality during production: screening/assurance
    • Assess and improve Data Quality: verification/validation
    • Ensure Data Integrity: creation/staging
    • Help ensure Preservability: providing information about data product (time, space, size, variables, etc.)
    • Ensure Production Sustainability
    • Help ensure Transparency: providing information on data source, algorithm and processing steps, and error estimates/sources
    • Ensure and improve Data Usability: providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
  • 14. Roles and responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
    Data Steward
    • Ensure Data Integrity: ingest and archive
    • Ensure and improve Data Provenance and Traceability
    • Improve Data Quality metadata
    • Ensure and improve archiving requirements
    Scientific Steward
    • Assess/improve Data Quality: evaluation/verification
    • Promote and improve Data Usability: characterization
    • Help ensure and improve Data Quality metadata
    • Ensure and improve data quality and usability requirements
    Technology Steward
    • Ensure Data Integrity: ingest, archive retrieval, data access, and file system and technology upgrade
    • Ensure and improve Data Accessibility and Discoverability
    • Promote and improve Data Interoperability
    • Ensure and improve software and system requirements
  • 15. Roles and responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
    End-User
    • Request Transparency in data quality procedures and practices
    • Request Provenance of the data product
    • Request evaluation results of product, stewardship, and service maturity of the data product
    • Provide feedback on Quality and Usability of the data product
    Manager
    • Help increase awareness of Data Quality and Usability
    • Help improve data quality and usability requirements
    • Help ensure Data Interoperability
    Sponsor
    • Define Data Quality and Usability requirements
    • Require data quality oversight and monitoring
    • Encourage Transparency in data quality procedures and practices
    Data Distributor
    • Ensure and improve Representation of data quality information
    • Ensure and improve Traceability of data quality information
    • Ensure user feedback
    • Help improve data quality and usability requirements
  • 16. Roles and responsibilities within the context of ensuring and improving dataset quality (DQ) and usability
    One person may wear several hats!
    Data Originator
    • Ensure and improve Scientific Quality of the data product: defining and documenting data product accuracy, precision, uncertainty sources and estimates
    • Ensure Data Quality during production: screening/assurance
    • Assess and improve Data Quality: verification/validation
    • Ensure Data Integrity: creation/staging
    • Help ensure Preservability: providing information about data product (time, space, size, variables, etc.)
    • Ensure Production Sustainability
    • Help ensure Transparency: providing information on data source, algorithm and processing steps, and error estimates/sources
    • Ensure and improve Data Usability: providing information about the product (update frequency, latency, variable attributes, etc.) and guidance on data use
    Data Steward
    • Ensure Data Integrity: ingest and archive
    • Ensure and improve Data Provenance and Traceability
    • Improve Data Quality metadata
    • Ensure and improve archiving requirements
    Technology Steward
    • Ensure Data Integrity: ingest, archive retrieval, data access, and file system and technology upgrade
    • Ensure and improve Data Accessibility and Discoverability
    • Promote and improve Data Interoperability
    • Ensure and improve software and system requirements
    Scientific Steward
    • Assess/improve Data Quality: evaluation/verification
    • Promote and improve Data Usability: characterization
    • Help ensure and improve Data Quality metadata
    • Ensure and improve data quality and usability requirements
    End-User
    • Request Transparency in data quality procedures and practices
    • Request Provenance of the data product
    • Request evaluation results of product, stewardship, and service maturity of the data product
    • Provide feedback on Quality and Usability of the data product
    Data Distributor
    • Ensure and improve Representation of data quality information
    • Ensure and improve Traceability of data quality information
    • Ensure user feedback
    • Help improve data quality and usability requirements
    Sponsor
    • Define Data Quality and Usability requirements
    • Require data quality oversight and monitoring
    • Encourage Transparency in data quality procedures and practices
    Manager
    • Help increase awareness of Data Quality and Usability
    • Help improve data quality and usability requirements
    • Help ensure Data Interoperability
    Documentation
    • Capture
    • Convey
    • Be traceable
    • Be transparent
    • Be machine-readable
    • Be human-understandable
    Quality Rating
    • Assess
    • Improve
    • Be transparent
    • Be quantifiable
    • Be machine-readable
    • Be human-understandable
    • Understandable info for users
    • Actionable info for management
    • Integrable tags for machines
    Version: 20160515; CC-BY-SA 4.0; POC: gpeng@cicsnc.org
  • 17. Take Away Messages
    • Ensuring data quality is an end-to-end process and a shared responsibility of all key players (data producers, managers/stewards, providers/publishers) and other major stakeholders (sponsors, power users, and management).
    • Effective stewardship of scientific data requires:
      - Expert domain knowledge in data management, technology, and science
      - Continuous oversight from all stewards, and
      - Open and continuous communication among key players and stakeholders
    • Defining roles and responsibilities of key players and stakeholders will help facilitate the process of
      - Ensuring and improving dataset quality and usability
      - Capturing and conveying information about data quality
  • 18. Acknowledgement
    The idea of using food quality as an analogy for data quality originated from a family dinner table discussion. I thank my family for the beneficial discussions that followed, for allowing me to use them as “Guinea Pigs”, and for their helpful comments!
    To cite this presentation
    Peng, G., 2015: A New Paradigm for Ensuring and Improving Dataset Quality and Usability – Roles and Responsibilities of Stewards and Other Major Product Stakeholders. Updated: May 15, 2016. Slideshare. Access date: mm/dd/yyyy.
    View Latest Version of This Presentation
    http://tinyurl.com/RolesRs-DQU
    Related Presentation: Stewards – Knowledge and Communication Hub
    http://tinyurl.com/Stewards-Hub
  • 19. Image source
    http://www.busyinbrooklyn.com/wp-content/uploads/2013/09/USDA_GRADES.jpg
    http://www.kaleelbrothers.com/images/Fresh-Produce.png
    http://www.pgabeef.com/images/storage_chart.gif
    https://www.colorado.gov/pacific/sites/default/files/u/6556/Egg-Grading.JPG
    http://www.hickmanseggs.com/w3/wp-content/uploads/2014/04/egg_size.jpg
    https://c2.staticflickr.com/8/7159/6801729225_82e823a5d6_z.jpg
    http://www.thepoultrysite.com/articles/contents/09-12CobbChicks1.jpg
    http://www.topratedsteakhouses.com/wp-content/uploads/2013/12/Grilled-Beef-with-Tomato.jpg
    http://cdn2.hubspot.net/hub/66214/file-15223310-jpg/images/wearingmanyhats.jpg
    References
    Chisholm, M., 2014: Data Stewards versus Subject Matter Experts and Data Managers. Information Management. Version: May 28, 2014. [Available online at: http://www.information-management.com/news/news/data-stewards-versus-subject-matter-experts-and-data-managers-10025704-1.html.]
    Peng, G., N. A. Ritchey, K. S. Casey, E. J. Kearns, J. L. Privette, D. Saunders, P. Jones, T. Maycock, and S. Ansari, 2016: Scientific Stewardship in the Open Data and Big Data Era - Roles and Responsibilities of Stewards and Other Major Product Stakeholders. D-Lib Magazine, 22. doi: 10.1045/may2016-peng. [Available online at: http://dlib.org/dlib/may16/peng/05peng.html.]