Mechanisms for
Data Quality and Validation
in Citizen Science
A. Wiggins, G. Newman, R. Stevenson & K. Crowston
Presented by Nathan Prestopnik
Motivation

 Data quality and validation are a primary concern
  for most citizen science projects
   More contributors = more opportunities for error

 There has been no review of appropriate data
  quality and validation mechanisms
   Diverse projects face similar challenges

 Contributors’ skills and scale of participation are
  important considerations in ensuring quality
Methods

 Survey
   Questionnaire with 70 items, all optional
   63 completed questionnaires representing 62 projects
   Mostly small-to-medium sized projects in US, Canada,
    UK; most focus on monitoring and observation

 Inductive development of framework
   Based on survey results and authors’ direct experience
    with citizen science projects
Survey: Resources

 FTEs: 0 – 50+
   Average: 2.4; Median: 1
   Often small fractions of several individuals’ time

 Annual budgets: $125 - $1,000,000
   Average: $105,000; Median: $35,000; Mode: $20,000
   Up to 5 different funding sources, usually grants,
    in-kind contributions (staff time), & private donations

 Age/duration: -1 to 100 years
   Average age: 13 years; Median: 9 years; Mode: 2 years
Survey: Methods Used
Method                                                n    Percentage
Expert review                                         46      77%
Photo submissions                                     24      40%
Paper data sheets submitted along with online entry   20      33%
Replication/rating by multiple participants           14      23%
QA/QC training program                                13      22%
Automatic filtering of unusual reports                11      18%
Uniform equipment                                     9       15%
Validation planned but not yet implemented            5       8%
Replication/rating, by the same participant           2       3%
Rating of established control items                   2       3%
None                                                  2       3%
Not sure/don’t know                                   2       3%
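
The percentages above appear to be computed against a base of 60 responses rather than the 63 completed questionnaires. That base is inferred from the counts (for example, 46/60 is about 77%) and is not stated on the slide. A minimal sketch that recomputes several rows under that assumption:

```python
# Counts copied from the table above. BASE = 60 is an inferred assumption,
# not a figure stated in the survey slides.
BASE = 60

counts = {
    "Expert review": 46,
    "Photo submissions": 24,
    "Paper data sheets submitted along with online entry": 20,
    "Replication/rating by multiple participants": 14,
    "QA/QC training program": 13,
    "Automatic filtering of unusual reports": 11,
    "Uniform equipment": 9,
}


def percentage(n, base=BASE):
    """Return n as a whole-number percentage of base."""
    return round(100 * n / base)


percentages = {method: percentage(n) for method, n in counts.items()}
for method, pct in percentages.items():
    print(f"{method}: {pct}%")
```

Because projects could report more than one method, the percentages are not expected to sum to 100.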
Survey: Combining Methods
Methods                                      n    Percentage
Single method                                10      17%
Multiple methods, up to 5 (average 2.5)      45      75%
Expert review + Automatic filtering          11      18%
Expert review + Paper data sheets            10      17%
Expert review + Photos                       14      23%
Expert review + Photos + Paper data sheets   6       10%
Expert review + Replication, multiple        10      17%
Survey: Resources & Methods
 Number of validation methods and staff are
  positively correlated (r² = 0.11)
   More staffing = more supervisory capacity

 Number of validation methods and budget are
  negatively correlated (r² = -0.15)
   If larger budgets mean more contributors, this
    constrains scalability of multiple methods
   Larger projects may use fewer but more sophisticated
    mechanisms
   Suggests that human-supervised methods don’t scale
Survey: Other Validation Options
 “Please describe any additional validation methods
  used in your project”
   Several projects rely on personal knowledge of
    contributing individuals for data quality
     Not scientifically robust, but understandably relevant
   Most comments referred to details of expert review
     Reinforces the perceived value of expertise
   The reporting interface and its associated error-checking
    are often overlooked, but provide important initial data
    verification
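
As an illustration of error-checking in a reporting interface, the sketch below validates one submitted observation before it is accepted. The field names (species, count, latitude, longitude, observed_on) and the rules are hypothetical and stand in for whatever a given project's form collects; they do not come from the survey.

```python
from datetime import date


def check_report(report, today=None):
    """Return a list of problems found in one submitted observation.

    All field names here are hypothetical placeholders for a project's form.
    """
    today = today or date.today()
    problems = []

    # Required free-text field must not be blank.
    if not str(report.get("species", "")).strip():
        problems.append("species is required")

    # Counts must be whole numbers and cannot be negative.
    count = report.get("count")
    if not isinstance(count, int) or count < 0:
        problems.append("count must be a non-negative whole number")

    # Coordinates must fall inside the valid geographic range.
    lat, lon = report.get("latitude"), report.get("longitude")
    if not isinstance(lat, (int, float)) or not -90 <= lat <= 90:
        problems.append("latitude must be between -90 and 90")
    if not isinstance(lon, (int, float)) or not -180 <= lon <= 180:
        problems.append("longitude must be between -180 and 180")

    # Observations cannot be dated in the future.
    observed = report.get("observed_on")
    if not isinstance(observed, date) or observed > today:
        problems.append("observation date must not be in the future")

    return problems
```

Returning every problem at once, rather than stopping at the first, lets the form show the contributor all corrections in a single pass.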
Choosing Mechanisms

 Data characteristics to consider when choosing
  mechanisms to ensure quality
   Accuracy and precision: taxonomic, spatial, temporal,
    etc.
   Error prevention: malfeasance (gaming the system),
    inexperience, data entry errors, etc.

 Evaluate assumptions about error and accuracy
   Where does error originate? How do mechanisms
    address this? At what step in the research process?
    How transparent are data review and its outcomes? How
    much data will be reviewed? In how much detail?
Mechanisms: Protocols
Mechanism                                    Process   Type/Detail
QA project plans                             Before    SOP in some areas
Repeated samples/tasks                       During    By multiple participants, single participant, or experts (calibration)
Tasks involving control items                During    Contributions compared to known states
Uniform/calibrated equipment                 During    Used for measurements; cost/scale tradeoff; who pays?
Paper data sheets + online entry*            During    Extended details, verifying data entry accuracy
Digital vouchers*                            During    Photos, audio, specimens/archives
Data triangulation, normalization, mining*   After     Corroboration from other data sources; statistical & computer science methods
Data documentation*                          After     Provide metadata about processes
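
Two of the protocol mechanisms above lend themselves to a short sketch: repeated tasks by multiple participants, and tasks involving control items. The version below uses a simple majority vote and a simple accuracy score. Both functions, their names, and the agreement threshold are illustrative assumptions, not methods reported by the surveyed projects.

```python
from collections import Counter


def consensus(labels, min_agreement=0.5):
    """Return the most common label if its share exceeds min_agreement.

    Returns None when there are no labels or agreement is too weak.
    """
    if not labels:
        return None
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes / len(labels) > min_agreement else None


def control_accuracy(answers, known):
    """Share of control items answered in line with their known state.

    answers: item id -> label given by one participant
    known:   item id -> established label for each control item
    Only control items the participant actually answered are scored.
    """
    scored = [item for item in known if item in answers]
    if not scored:
        return None
    correct = sum(answers[item] == known[item] for item in scored)
    return correct / len(scored)
```

For example, `consensus(["robin", "robin", "sparrow"])` returns `"robin"`, while an even split such as `consensus(["robin", "sparrow"])` returns `None` and could be routed to expert review.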
Mechanisms: Participants

Mechanism                                       Process          Types/Details
Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
Participant testing                             Before, During   Following training; Pre/test-retest
Rating participant performance                  During, After    Unknown to participant; Known to participant
Filtering of unusual reports                    During, After    Automatically; Manually
Contacting participants about unusual reports   After            May alienate/educate contributors
Automatic recognition                           After            Techniques for image/text processing
Expert review                                   After            By professionals, experienced contributors, or multiple parties
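
Automatic filtering of unusual reports can take many forms. The sketch below shows one common statistical option, a modified z-score based on the median absolute deviation, applied to a list of numeric values such as counts. The technique, the function name, and the conventional 3.5 threshold are assumptions chosen for illustration; the slides do not say which filters the surveyed projects use.

```python
from statistics import median


def flag_unusual(values, threshold=3.5):
    """Return the indexes of values whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation from the median.
    """
    if not values:
        return []
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # At least half the values equal the median, so the score is
        # undefined; fall back to flagging anything that differs from it.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if 0.6745 * abs(v - med) / mad > threshold]
```

Flagged reports are candidates for manual follow-up rather than automatic rejection, since an unusual report may still be correct.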
Discussion

 Need to pay more attention to the way that data are
  created: not just protocols, but also qualities of the data
  such as accuracy and precision

 Clear need for quality/validation mechanisms for
  analysis, not only for data collection/processing
   Data mining techniques
   Spatio-temporal modeling

 Scalability of validation may be limited
   May need to plan different quality management
    techniques based on expected/actual project growth
Future Work

 Most projects worry more about contributor
  expertise than appropriate analysis methods
   Resources are needed to support suitable analysis
    approaches and tools

 Comparative evaluation of the efficacy of the data
  quality and validation mechanisms identified
   Develop a QA/QC planning and evaluation tool

 Develop examples of appropriate data
  documentation for citizen science projects
   Necessary for peer review, data re-use
Thanks!

 Nate Prestopnik

 DataONE working group on Public Participation in
  Scientific Research

 US NSF grants 09-43049 & 11-11107

Streamlining Python Development: A Guide to a Modern Project Setup
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 

Mechanisms for Data Quality and Validation in Citizen Science

  • 1. Mechanisms for Data Quality and Validation in Citizen Science
      A. Wiggins, G. Newman, R. Stevenson & K. Crowston
      Presented by Nathan Prestopnik
  • 2. Motivation
      - Data quality and validation are a primary concern for most citizen science projects
          - More contributors = more opportunities for error
      - There has been no review of appropriate data quality and validation mechanisms
          - Diverse projects face similar challenges
      - Contributors’ skills and scale of participation are important considerations in ensuring quality
  • 3. Methods
      - Survey
          - Questionnaire with 70 items, all optional
          - 63 completed questionnaires representing 62 projects
          - Mostly small-to-medium sized projects in US, Canada, UK; most focus on monitoring and observation
      - Inductive development of framework
          - Based on survey results and authors’ direct experience with citizen science projects
  • 4. Survey: Resources
      - FTEs: 0 – 50+
          - Average: 2.4; Median: 1
          - Often small fractions of several individuals’ time
      - Annual budgets: $125 - $1,000,000
          - Average: $105,000; Median: $35,000; Mode: $20,000
          - Up to 5 different funding sources, usually grants, in-kind contributions (staff time), & private donations
      - Age/duration: -1 to 100 years
          - Average age: 13 years; Median: 9 years; Mode: 2 years
  • 5. Survey: Methods Used
      Method                                                n    Percentage
      Expert review                                         46   77%
      Photo submissions                                     24   40%
      Paper data sheets submitted along with online entry   20   33%
      Replication/rating by multiple participants           14   23%
      QA/QC training program                                13   22%
      Automatic filtering of unusual reports                11   18%
      Uniform equipment                                     9    15%
      Validation planned but not yet implemented            5    8%
      Replication/rating, by the same participant           2    3%
      Rating of established control items                   2    3%
      None                                                  2    3%
      Not sure/don’t know                                   2    3%
  • 6. Survey: Combining Methods
      Methods                                       n    Percentage
      Single method                                 10   17%
      Multiple methods, up to 5 (average 2.5)       45   75%
      Expert review + Automatic filtering           11   18%
      Expert review + Paper data sheets             10   17%
      Expert review + Photos                        14   23%
      Expert review + Photos + Paper data sheets    6    10%
      Expert review + Replication, multiple         10   17%
  • 7. Survey: Resources & Methods
      - Number of validation methods and staff are positively correlated (r2 = 0.11)
      - More staffing = more supervisory capacity
      - Number of validation methods and budget are negatively correlated (r2 = -0.15)
      - If larger budgets mean more contributors, this constrains scalability of multiple methods
      - Larger projects may use fewer but more sophisticated mechanisms
      - Suggests that human-supervised methods don’t scale
  • 8. Survey: Other Validation Options
      - “Please describe any additional validation methods used in your project”
      - Several projects rely on personal knowledge of contributing individuals for data quality
      - Not scientifically robust, but understandably relevant
      - Most comments referred to details of expert review
      - Reinforces the perceived value of expertise
      - Reporting interface and associated error-checking is often overlooked, but provides important initial data verification
  • 9. Choosing Mechanisms
      - Data characteristics to consider when choosing mechanisms to ensure quality
      - Accuracy and precision: taxonomic, spatial, temporal, etc.
      - Error prevention: malfeasance (gaming the system), inexperience, data entry errors, etc.
      - Evaluate assumptions about error and accuracy
      - Where does error originate? How do mechanisms address this? At what step in the research process? How transparent is data review and outcomes? How much data will be reviewed? In how much detail?
  • 10. Mechanisms: Protocols
      Mechanism                                    Process   Type/Detail
      QA project plans                             Before    SOP in some areas
      Repeated samples/tasks                       During    By multiple participants, single participant, or experts (calibration)
      Tasks involving control items                During    Contributions compared to known states
      Uniform/calibrated equipment                 During    Used for measurements; cost/scale tradeoff; who pays?
      Paper data sheets + online entry*            During    Extended details, verifying data entry accuracy
      Digital vouchers*                            During    Photos, audio, specimens/archives
      Data triangulation, normalization, mining*   After     Corroboration from other data sources; statistical & computer science methods
      Data documentation*                          After     Provide metadata about processes
  • 11. Mechanisms: Participants
      Mechanism                                       Process          Types/Details
      Participant training                            Before, During   Initial; Ongoing; Formal QA/QC
      Participant testing                             Before, During   Following training; Pre/test-retest
      Rating participant performance                  During, After    Unknown to participant; Known to participant
      Filtering of unusual reports                    During, After    Automatically; Manually
      Contacting participants about unusual reports   After            May alienate/educate contributors
      Automatic recognition                           After            Techniques for image/text processing
      Expert review                                   After            By professionals, experienced contributors, or multiple parties
  • 12. Discussion
      - Need to pay more attention to the way that data are created: not just protocols but also qualities of data such as accuracy and precision
      - Clear need for quality/validation mechanisms for analysis, not only for data collection/processing
      - Data mining techniques
      - Spatio-temporal modeling
      - Scalability of validation may be limited
      - May need to plan different quality management techniques based on expected/actual project growth
  • 13. Future Work
      - Most projects worry more about contributor expertise than appropriate analysis methods
      - Resources are needed to support suitable analysis approaches and tools
      - Comparative valuation of the efficacy of the data quality and validation mechanisms identified
      - Develop a QA/QC planning and evaluation tool
      - Develop examples of appropriate data documentation for citizen science projects
      - Necessary for peer review, data re-use
  • 14. Thanks!
      - Nate Prestopnik
      - DataONE working group on Public Participation in Scientific Research
      - US NSF grants 09-43049 & 11-11107
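
The slides list "automatic filtering of unusual reports" as a mechanism but do not say how any project implements it. As a rough illustration only, the sketch below flags a reported count that lies far from the median of earlier counts and routes it to expert review. The function names, the example numbers, and the median/MAD rule with a cutoff of 3 are all assumptions made for this sketch, not something taken from the survey.

```python
from statistics import median

def is_unusual(history, new_count, k=3.0):
    """Return True if new_count is more than k scaled MADs from the median.

    history: earlier counts for the same species/site (hypothetical data).
    k: cutoff chosen for illustration; a real project would tune it.
    """
    med = median(history)
    # Median absolute deviation: a spread estimate that resists outliers.
    mad = median(abs(c - med) for c in history)
    if mad == 0:
        # All earlier counts are (nearly) identical; flag any different value.
        return new_count != med
    # 1.4826 rescales the MAD to be comparable to a standard deviation.
    return abs(new_count - med) / (1.4826 * mad) > k

def triage(history, new_count):
    """Accept ordinary reports; send unusual ones to a human reviewer."""
    return "expert_review" if is_unusual(history, new_count) else "accept"

history = [3, 4, 5, 4, 6, 5, 4]  # hypothetical earlier counts
print(triage(history, 5))
print(triage(history, 40))
```

Pairing the filter with expert review mirrors the most common combination reported in the survey, and keeps the human-supervised step limited to the reports that are flagged.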

Editor's notes

  1. Rating = classification or judgment tasks; admittedly not the clearest wording, but no one corrected this in text responses. Percentage = percentage of responding projects that use each method.
  2. Percentage = percentage of responding projects that use this combination of methods. There were a few other combinations that a handful of projects used; these were the dominant ones. We were surprised to see so many projects using photos, as they are hard to use and store, and by the frequency of using paper data sheets.
  3. Note that we did ask about numbers of contributions, but the units of contribution for each project (and even the way they count volunteers) were so different that they couldn’t be used for analysis.
  4. Split framework of mechanisms in two for ease of viewing; these are methods that address the protocol as the presumed source of error. Starred items address errors arising from both protocols and participants.
  5. These methods all address expected errors from participants, focusing primarily on skill evaluation and filtering or review for unusual reports.
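
The notes define each percentage as the share of responding projects, but the slides never state the denominator. The sketch below checks one reading: that about 60 projects answered the methods question. That figure is an inference from the reported numbers (all questionnaire items were optional), not a value given in the deck, and the variable names are illustrative.

```python
# ASSUMPTION: 60 responding projects for this item; inferred, not stated in the slides.
ASSUMED_RESPONDENTS = 60

# (n, reported percentage) pairs copied from slides 5 and 6.
reported = {
    "Expert review": (46, 77),
    "Photo submissions": (24, 40),
    "Paper data sheets + online entry": (20, 33),
    "Replication/rating by multiple participants": (14, 23),
    "QA/QC training program": (13, 22),
    "Automatic filtering of unusual reports": (11, 18),
    "Uniform equipment": (9, 15),
    "Validation planned but not yet implemented": (5, 8),
    "Replication/rating, by the same participant": (2, 3),
    "Single method": (10, 17),
    "Multiple methods": (45, 75),
    "Expert review + Photos + Paper data sheets": (6, 10),
}

def pct(n, total=ASSUMED_RESPONDENTS):
    """Share of responding projects, rounded to a whole percent."""
    return round(100 * n / total)

# Rows whose recomputed percentage disagrees with the slide.
mismatches = [name for name, (n, p) in reported.items() if pct(n) != p]
print(mismatches)
```

An empty list means every row is consistent with the assumed denominator. It does not prove that 60 is the true number of respondents.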