This presentation argues for making science reproducible by documenting data provenance, and it describes an R implementation that does so. It identifies two challenges: standard R tools do not collect provenance, and specialized tools have a steep learning curve. It then presents an approach in which R scripts are instrumented to record provenance (data and process dependencies) as they execute. The result is a directed acyclic graph database of that provenance, which can be explored and visualized.
Aaron Ellison: Analytic Web
1. The Analytic Web
an R implementation
Barbara S. Lerner, Mount Holyoke College
Emery R. Boose, Harvard University
Leon J. Osterweil, University of Massachusetts
Aaron M. Ellison, Harvard University
2. Motivation
Create reproducible science by documenting data provenance: the processes used to create, modify, visualize, analyze, and synthesize data
Challenges:
Standard tools (e.g., R) do not collect provenance
Specialized tools (e.g., Kepler) have steep learning curve
Computer scientists are interested in control flow, data flow, and abstraction; ecologists are interested in other things
Lack of community standards
How much information to collect, manage, store, and use
4. From R Scripts to Provenance Graphs
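[Diagram: the scientist instruments an R Script to produce an Instrumented R Script. The Instrumented R Script runs in the R Environment (R Interpreter plus RDataTracker), which writes provenance to a DDG Database. DDG Explorer presents the stored DDG as a Textual DDG or a Visual DDG. Legend: R Scripts, R Environment, Provenance.]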
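The end product of this pipeline is a DDG: a directed acyclic graph whose nodes are the script's processes and the data they read and write. The Python sketch below illustrates that data structure only. It does not reproduce RDataTracker's actual storage format or API, which this slide does not show. The class `ProvenanceGraph`, its methods, and all step and data names are hypothetical. The sketch records steps as they "execute", checks that the result is acyclic, and answers a lineage query: which steps and inputs produced this figure? Versioned data names such as `raw@1` are also an assumption. Without them, a step that overwrites its own input would create a cycle.

```python
class ProvenanceGraph:
    """Hypothetical in-memory provenance graph (not RDataTracker's format)."""

    def __init__(self):
        self.kind = {}    # node name -> "procedure" or "data"
        self.inputs = {}  # node name -> set of nodes it was derived from

    def _link(self, child, parent):
        self.inputs.setdefault(child, set()).add(parent)

    def add_step(self, procedure, used, produced):
        """Record that `procedure` read the `used` data and wrote `produced`.

        Assumption: each data name denotes one version (e.g. "raw@1"), so
        re-assigning a variable creates a new node instead of a cycle.
        """
        self.kind[procedure] = "procedure"
        for d in used:
            self.kind.setdefault(d, "data")
            self._link(procedure, d)   # data flows into the procedure
        for d in produced:
            self.kind[d] = "data"
            self._link(d, procedure)   # the procedure creates the data

    def lineage(self, node):
        """Return every node that `node` transitively depends on."""
        seen, stack = set(), [node]
        while stack:
            for parent in self.inputs.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def is_acyclic(self):
        """Kahn's algorithm: the graph is a DAG iff every node can be ordered."""
        indegree = {n: 0 for n in self.kind}
        children = {n: [] for n in self.kind}
        for child, parents in self.inputs.items():
            for p in parents:
                indegree[child] += 1
                children[p].append(child)
        ready = [n for n, k in indegree.items() if k == 0]
        ordered = 0
        while ready:
            n = ready.pop()
            ordered += 1
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:
                    ready.append(c)
        return ordered == len(indegree)


# Usage: a hypothetical three-step analysis script.
g = ProvenanceGraph()
g.add_step("read.csv", used=["plots.csv"], produced=["raw@1"])
g.add_step("clean", used=["raw@1"], produced=["tidy@1"])
g.add_step("plot", used=["tidy@1"], produced=["fig.pdf"])
print(sorted(g.lineage("fig.pdf")))
print(g.is_acyclic())
```

The sketch checks acyclicity with Kahn's algorithm because a cycle means the recorded provenance cannot be ordered in time. A fuller tracker might also record timestamps, values, and script locations for each node. Those details are omitted here.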