Auditing and Maintaining Provenance in
Software Packages
Quan Pham (1), Tanu Malik (2), Ian Foster (1, 2)
(1) Department of Computer Science and (2) Computation Institute,
The University of Chicago,
Chicago, IL 60637, USA
quanpt@cs.uchicago.edu, tanum@ci.uchicago.edu
Presented by Boris Glavic
Illinois Institute of Technology
IPAW14
June 10, 2014
Provenance in Software Packages June, 10th
, 2014 1 / 29
Outline
1 Introduction
2 Software Pipeline Use Case
3 CDE-SP: Software Provenance in CDE
4 Experiment and Evaluation
5 Related Work
6 Conclusion
Current Solutions for Ensuring Reproducibility and Issues
1 Publish source code and data
− GitHub, Figshare, Research Compendia
Pros: (in many cases) easy to accomplish
× Cons: need to recompile and re-execute
2 Publish software package including source code, data, and
environment dependencies
− CDE, RunMyCode.org
Pros: re-execute without installation
× Cons: not easy to combine and merge shared packages
3 Publish a virtual machine image (VMI) that includes OS, source code,
data, and environment
− Cloud BioLinux (NEBC), Swift Appliance (RDCEP)
Pros: no additional modules or components needed to rerun
× Cons: too hard to provision and understand
Reproducibility Problem
Our philosophy:
"... releasing shoddy VMs is easy to do, but it doesn’t help you learn how
to do a better job of reproducibility along the way. Releasing software
pipelines, however crappy, is on the path towards better reproducibility."
C. Titus Brown [1]
Reproducibility problem: How can we make it easy to combine and
merge shared packages while correctly attributing authorship of software
packages?
Goal: no need to provision VMIs, and no falling back to publishing only source code and data.
[1] http://ivory.idyll.org/blog/vms-considered-harmful.html
Problem Scope
Use CDE [2] to capture and create portable software packages
Extend, partially reuse, and combine CDE packages to create new
reproducible software pipelines
Attribute authorship of software packages in new software pipelines
CDE has an OVERLAP conflict!
[2] Guo, P.J., Engler, D.: CDE: using system call interposition to automatically create
portable software packages. USENIX Association, Portland, OR (2011)
CDE
Create a portable software package
without installation, configuration, or elevated privileges
Audit mode to create a CDE package
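In audit mode, CDE runs the target program under ptrace-based system call interposition and copies every file the program touches into a cde-root directory that mirrors the files' original absolute paths. The Python sketch below illustrates only that copying step, assuming the tracer has already produced the list of accessed paths; copy_into_cde_root is a hypothetical helper, not part of CDE.

```python
import os
import shutil


def copy_into_cde_root(accessed_paths, cde_root):
    """Copy each accessed regular file into cde_root, mirroring its
    absolute path (e.g. /usr/lib/libfoo.so -> <cde_root>/usr/lib/libfoo.so).
    Returns the list of destination paths (hypothetical helper)."""
    copied = []
    for src in accessed_paths:
        if not os.path.isfile(src):
            continue  # skip directories and paths that no longer exist
        # Strip the leading "/" so the path nests under cde_root.
        dst = os.path.join(cde_root, os.path.relpath(os.path.abspath(src), "/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copies content plus permission bits and timestamps
        copied.append(dst)
    return copied
```

Mirroring absolute paths is what lets the packaged program later find the same files at the "same" locations once its path arguments are redirected into cde-root.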
CDE - Execution Mode
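In execution mode, cde-exec traces the packaged program again, but rewrites the path arguments of file system calls so that they resolve inside cde-root rather than on the host. A minimal sketch of that rewrite for absolute paths follows; redirect is a hypothetical helper, and the relative paths, symlinks, and working-directory tracking that a real tracer must handle are deliberately left out.

```python
import posixpath


def redirect(path, cde_root):
    """Rewrite an absolute path from a system call argument so that it
    resolves inside cde_root; paths already inside cde_root are kept."""
    norm = posixpath.normpath(path)  # collapse "..", "." and duplicate slashes
    root = posixpath.normpath(cde_root)
    if norm == root or norm.startswith(root + "/"):
        return norm  # already points into the package
    return posixpath.join(root, norm.lstrip("/"))
```

For example, a request for /usr/lib/libc.so.6 inside a package rooted at /home/u/cde-package/cde-root is served from /home/u/cde-package/cde-root/usr/lib/libc.so.6.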
Software Pipelines Contain CDE Packages
A software pipeline consists of many individual software modules
A software module depends on externally developed libraries
A software module is often packaged together with specific versions of
libraries
RDCEP Use Case
Alice, Bob, and Charlie are scientists at the Center for Robust Decision
Making on Climate and Energy Policy (RDCEP)
A (Alice) develops data integration methods to produce higher-resolution
datasets depicting inferred land use over time.
B (Bob) develops computational models for model-based comparative
analysis. B’s software environment includes A’s software modules,
which produce the high-resolution datasets.
C (Charlie) uses A’s and B’s software modules within data-intensive
computing methods that run them in parallel.
The Center wants to predict future yields of staple agricultural
commodities given changes in the climate.
Figure:
A’s package: Retrieve data → Aggregation → Generate images
B’s package (built from A’s): Retrieve data → Aggregation → Generate images → Model-based analysis
C’s package (merged from B’s): Parallel init → Aggregation → Generate images → Model-based analysis → Parallel summary
A’s Experiment & Package
A’s package
  cde-root/
    <path to A’s files>/
      a-experiment.sh
      retrieve-data
      aggregation
      generate-image
      f1, f2, a-output
    <path to common libs>/
      libc.so
Re-execute A’s experiment:
  cde-exec a-experiment.sh
Contents of a-experiment.sh:
  ./retrieve-data f1
  ./aggregation f1 f2
  ./generate-image f2 a-output
B’s Experiment & Package
B’s package
  cde-root/
    <path to A’s files>/
      [...]
    <path to B’s files>/
      b-experiment.sh
      analysis
      b-output
    <path to common libs>/
      libc.so
Re-execute B’s experiment:
  cde-exec b-experiment.sh
Contents of b-experiment.sh:
  cd <path to A’s experiment>
  cde-exec a-experiment.sh
  cd <path to B’s files>
  ./analysis <path to A’s files>/a-output b-output
C’s Experiment & Package
C’s package
  cde-root/
    <path to A’s files>/
      [...]
    <path to B’s files>/
      [...]
    <path to C’s files>/
      c-experiment.sh
      parallel-init
      parallel-summary
      c-output
    <path to common libs>/
      libc.so
Re-execute C’s experiment:
  cde-exec c-experiment.sh
Contents of c-experiment.sh:
  parallel-init <path to A’s files>/f4
  cd <path to A’s files>
  cde-exec ./aggregation f4 f5
  cde-exec ./generate-image f5 f6
  cd <path to B’s files>
  cde-exec ./analysis <path to A’s files>/f6 f7
  cd <path to C’s files>
  ./parallel-summary <path to B’s files>/f7 c-output
Dependency Overlap in Multiple cde-root Directories
File Overlap of Different Linux Distributions
        RH           SUSE         U12          U13
Amz     5498 / 23k   3184 / 11k   1203 / 5.4k  1819 / 5.5k
RH                   3861 / 12k   1654 / 6.6k  2223 / 6.3k
SUSE                              1245 / 3.9k  2085 / 6.4k
U12                                            8226 / 24k
Table 1: Ratio of differing files that have the same path in 5 popular AMIs. The
denominator is the number of files having the same path in two distributions, and the
numerator is the number of files with the same path but a different md5 checksum.
Manual pages in the /usr/share/ directory are omitted.
Amz Amazon Linux AMI
RH Red Hat Enterprise Linux 6.4
SUSE SUSE Linux Enterprise Server 11
U12 Ubuntu Server 12.04.3 LTS
U13 Ubuntu Server 13.10
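The counts in Table 1 can, in principle, be reproduced by checksumming every file under two root file systems and comparing the paths they share. The sketch below assumes each distribution's root has been unpacked or mounted at a local directory; md5_of, checksum_tree, and overlap_ratio are hypothetical names, and treating /usr/share/man/ as the location of the omitted manual pages is an assumption.

```python
import hashlib
import os


def md5_of(path):
    """MD5 checksum of a file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def checksum_tree(root):
    """Map each regular file under root, as an absolute in-image path,
    to its MD5 checksum. Symlinks and unreadable files are skipped."""
    sums = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                sums["/" + os.path.relpath(full, root)] = md5_of(full)
            except OSError:
                continue  # e.g. permission denied
    return sums


def overlap_ratio(sums_a, sums_b, skip_prefix="/usr/share/man/"):
    """Return (numerator, denominator) as in Table 1: files with the same
    path but a different checksum, over all files with the same path."""
    shared = [p for p in sums_a.keys() & sums_b.keys() if not p.startswith(skip_prefix)]
    differing = sum(1 for p in shared if sums_a[p] != sums_b[p])
    return differing, len(shared)
```

A pair such as Amazon Linux and RHEL would then yield a tuple like the 5498 / 23k entry, where the first number counts paths that would collide with different content if both roots were merged into one cde-root.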
Redirection in Multiple cde-root Directories
CDE-SP
CDE-SP: Enhanced CDE that includes software provenance
Describe tools and methods to audit, store, and query provenance
Provenance queries
Determine the environment under which a dependency was built
Examine which dependencies must be present
Determine whether packages in a pipeline can satisfy the dependencies of a new package
Attribute authorship of software packages in a pipeline
Combine and validate authorship from stored provenance
CDE-SP Audit
Objectives
Capture additional details of the origins of a library or a binary
Use these details for compiling and creating software pipelines
Methods
Create a dependency tree
The system calls of each process are monitored
Whenever a process executes a file system call, a dependency of that
process is recorded
A dependency can be a data file or a shared library
Extract information about binaries and required shared libraries
The file, ldd, strings, and objdump UNIX commands
uname -a and the getpwuid(getuid()) function
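The audit step records which shared libraries a binary needs, and who ran it on which machine. A sketch, assuming the text output of ldd has already been captured, that turns glibc-style ldd lines into a library-name-to-path map, and collects user and OS strings comparable to those from getpwuid(getuid()) and uname -a. parse_ldd and machine_and_agent are hypothetical helpers; the returned key names simply mirror the meta.agent and meta.machine keys of Table 2.

```python
import os
import pwd


def parse_ldd(output):
    """Map each shared-library name in glibc-style ldd output to its
    resolved path, or to None when ldd reports no path."""
    libs = {}
    for raw in output.splitlines():
        line = raw.strip()
        if not line or line in ("statically linked", "not a dynamic executable"):
            continue
        if "=>" in line:
            # e.g. "libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f...)"
            name, _, rest = line.partition("=>")
            target = rest.strip().split(" (")[0].strip()
            if not target or target == "not found" or target.startswith("("):
                target = None
            libs[name.strip()] = target
        else:
            # e.g. "/lib64/ld-linux-x86-64.so.2 (0x...)" or "linux-vdso.so.1 (0x...)"
            path = line.split(" (")[0].strip()
            if path.startswith("/"):
                libs[os.path.basename(path)] = path
            else:
                libs[path] = None  # virtual objects such as the vDSO have no file
    return libs


def machine_and_agent():
    """User and OS strings comparable to getpwuid(getuid()) and uname -a."""
    uid = os.getuid()
    try:
        user = pwd.getpwuid(uid).pw_name
    except KeyError:
        user = str(uid)  # uid has no passwd entry (common in containers)
    return {"meta.agent": user, "meta.machine": " ".join(os.uname())}
```

On Linux, the input could come from subprocess.run(["ldd", path], capture_output=True, text=True).stdout.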
Storage
Store provenance within the package itself
Use LevelDB: a fast and light-weight key-value storage library
Encode the UNIX process identifier (PID) together with its spawn time in the key
Key                          Value        Explanation
pid.PID1.exec.TIME           PID2         PID1 wasTriggeredBy PID2
pid.PID.[path, pwd, args]    VALUES       Other properties of PID
io.PID.action.IO.TIME        FILE(PATH)   PID wasGeneratedBy / wasUsedBy FILE(PATH)
meta.agent                   USERNAME     User information
meta.machine                 OSNAME       Operating system distribution
Table 2: LevelDB key-value pairs that store file and process provenance. Words in capital letters are arguments.
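LevelDB keeps keys sorted in byte order, so a key scheme like the one in Table 2 lets all records for one process be read with a single prefix scan. The sketch below uses a small in-memory sorted store as a stand-in for LevelDB (it is not the LevelDB API), plus hypothetical record_exec and record_io helpers. It interprets IO in io.PID.action.IO.TIME as the kind of access (for example read or write), which is an assumption.

```python
from bisect import bisect_left


class SortedKV:
    """In-memory stand-in for LevelDB (NOT its API): iteration is in
    sorted key order, which is what makes prefix scans cheap in LevelDB."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def scan(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        keys = sorted(self._data)
        i = bisect_left(keys, prefix)
        while i < len(keys) and keys[i].startswith(prefix):
            yield keys[i], self._data[keys[i]]
            i += 1


def record_exec(db, pid, parent_pid, time, path, pwd, args):
    """Store an exec event following the pid.* rows of Table 2."""
    # pid.PID1.exec.TIME -> PID2, i.e. PID1 wasTriggeredBy PID2
    db.put(f"pid.{pid}.exec.{time}", str(parent_pid))
    # pid.PID.[path, pwd, args] -> other properties of the process
    db.put(f"pid.{pid}.path", path)
    db.put(f"pid.{pid}.pwd", pwd)
    db.put(f"pid.{pid}.args", " ".join(args))


def record_io(db, pid, io, time, file_path):
    """Store a file access following the io.* row of Table 2."""
    # io is assumed to be the access kind, e.g. "read" or "write".
    # Timestamps should be fixed-width strings so byte order matches time order.
    db.put(f"io.{pid}.action.{io}.{time}", file_path)
```

With a Python LevelDB binding such as plyvel, put and iterator(prefix=...) play the same roles on bytes keys.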
Query
LevelDB provides a minimal API for querying
Simple, light-weight query interface
Input: a program whose dependencies need to be retrieved
Output: a GraphViz file displaying file and process dependencies
Use a depth-first search algorithm to create a dependency tree with the
input program as its root
Exclusion option to remove uninteresting dependencies:
/lib/, /usr/lib/, /usr/share/, /etc/
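The query described above amounts to a depth-first traversal from the requested program that skips the excluded prefixes and writes each tree edge in GraphViz DOT syntax. A sketch, assuming the provenance has already been loaded into a dictionary mapping each node (a process binary or file path) to the nodes it depends on; dependency_dot is a hypothetical name.

```python
# Prefixes the slide lists as uninteresting dependencies.
EXCLUDED_PREFIXES = ("/lib/", "/usr/lib/", "/usr/share/", "/etc/")


def dependency_dot(graph, root, exclude=EXCLUDED_PREFIXES):
    """Depth-first search from root over graph (node -> list of dependencies),
    returning the resulting dependency tree as a GraphViz DOT string."""
    edges = []
    visited = set()

    def visit(node):
        visited.add(node)
        for dep in graph.get(node, ()):
            if dep.startswith(exclude):
                continue  # prune system libraries and configuration files
            if dep not in visited:  # keep only tree edges
                edges.append((node, dep))
                visit(dep)

    visit(root)

    def quote(name):
        return '"' + name.replace('"', '\\"') + '"'

    lines = ["digraph deps {"]
    lines += [f"  {quote(a)} -> {quote(b)};" for a, b in edges]
    lines.append("}")
    return "\n".join(lines)
```

The output can be rendered with GraphViz, for example dot -Tpng deps.dot -o deps.png.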
Authorship of Software Modules
Combine authorship of the contributing packages
Validate authorship from the provenance stored in the original
package
Generate the subgraph associated with the part of the new package derived from the original package
Use subgraph isomorphism (NP-hard) to validate it against the original
provenance graph
Match process nodes whose binaries and working directories have the
same paths
Match file nodes with the same path
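Because process nodes are matched by binary path and working directory and file nodes by path, each node of the new subgraph has few candidate counterparts, which can keep the NP-hard search manageable in practice. A sketch of a backtracking check, assuming both provenance graphs are given as labeled node dictionaries and edge lists; find_embedding is a hypothetical name, and it tests the non-induced form of subgraph isomorphism (every edge of the new subgraph must appear in the original graph; extra original edges are allowed).

```python
def find_embedding(sub_nodes, sub_edges, orig_nodes, orig_edges):
    """Backtracking search for an injective map from the new subgraph's nodes
    to the original graph's nodes that preserves labels and every edge.

    Nodes are dicts id -> label, where a label is ("process", binary_path, cwd)
    or ("file", path); edges are (source_id, target_id) pairs.
    Returns the mapping as a dict, or None if the subgraph does not embed.
    """
    order = list(sub_nodes)
    orig_edge_set = set(orig_edges)
    # Label matching prunes candidates: only same-path nodes can correspond.
    candidates = {
        n: [m for m, label in orig_nodes.items() if label == sub_nodes[n]]
        for n in order
    }
    mapping, used = {}, set()

    def consistent(n, m):
        # Check each subgraph edge touching n whose other endpoint is mapped.
        for u, v in sub_edges:
            if u != n and v != n:
                continue
            src = m if u == n else mapping.get(u)
            dst = m if v == n else mapping.get(v)
            if src is not None and dst is not None and (src, dst) not in orig_edge_set:
                return False
        return True

    def backtrack(i):
        if i == len(order):
            return True
        n = order[i]
        for m in candidates[n]:
            if m in used or not consistent(n, m):
                continue
            mapping[n] = m
            used.add(m)
            if backtrack(i + 1):
                return True
            del mapping[n]
            used.discard(m)
        return False

    return dict(mapping) if backtrack(0) else None
```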
Experiments
Performance of CDE-SP
Auditing performance overhead
Disk storage increase
Provenance query runtime
Redirection overhead when multiple UUID-based directories are
created
Compare the lightweight virtualization approach of CDE-SP with
Kameleon [3], a heavyweight virtualization approach used for
reproducibility
Experiments were run on an Ubuntu 12.04 LTS workstation with 8 GB of
RAM and an 8-core Intel processor clocked at 1600 MHz.
[3] Emeras, J., Richard, O., Bzeznik, B.: Reconstructing the software environment of
an experiment with Kameleon (2011)
Performance & Size Overhead
Pipeline with two applications: Aggregation and Generate Image
2.1% slowdown of CDE-SP compared with CDE, whose own virtualization overhead is 0-30% [4]
The LevelDB database (236 kB, a 0.03% increase in package size) contains
approximately 12,000 key-value pairs
         Create package (s)   Execution (s)   Disk usage        Provenance query (s)
CDE      852.6 ± 2.4          568.8 ± 2.4     732 MB            -
CDE-SP   870.5 ± 2.5          569.5 ± 1.8     732 MB + 236 kB   0.4 ± 0.03
Table 3: The overhead of CDE-SP is negligible compared with CDE
[4] Guo, P.J., Engler, D.: CDE: using system call interposition to automatically create
portable software packages. USENIX Association, Portland, OR (2011)
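The percentages quoted on this slide follow from the mean values in Table 3. Reading the 2.1% slowdown as the package-creation overhead is an interpretation, since the slide does not say which column it refers to; the disk figure uses 1 MB = 1000 kB.

```latex
\begin{align*}
\text{create package:}\quad & \frac{870.5 - 852.6}{852.6} = \frac{17.9}{852.6} \approx 2.1\% \\
\text{execution:}\quad & \frac{569.5 - 568.8}{568.8} = \frac{0.7}{568.8} \approx 0.12\% \\
\text{disk usage:}\quad & \frac{236\ \text{kB}}{732\ \text{MB}} = \frac{236}{732\,000} \approx 0.03\%
\end{align*}
```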
Redirection Overhead in CDE-SP
The output of Aggregation was pipelined into the input of Generate Image
Three output files of the Aggregation package were moved to the Generate Image
package
Two cross-package execve() system calls
Less than 1% slowdown for CDE-SP
Kameleon
Use the Kameleon engine to make a bare-bones VM appliance
Custom-written YAML-formatted recipes
Custom-written macrosteps and microsteps
Kameleon can create virtual machine appliances in different formats
for different Linux distributions
Generates bash scripts to create an initial virtual image of a Linux
distribution
Populates the image with additional Linux packages
Populates the image with the content of a CDE-SP package
CDE-SP Vs Kameleon
[Bar chart: time in seconds for Kameleon and CDE-SP]
Figure 1: Overhead when using CDE with a Kameleon VM appliance
Related Work
Research Objects package scientific workflows with auxiliary
information about the workflows, including provenance information and
metadata such as authors and version
CDE and Sumatra can capture an execution environment in a
lightweight fashion
SystemTap, a kernel-based tracing mechanism, performs better than
ptrace but needs to run at a higher
privilege level
Provenance-to-Use (PTU) and ReproZip include provenance in
self-contained software packages
Conclusion
CDE does not encapsulate provenance of associated dependencies in
a software package
The lack of information about the origins of dependencies in a
software package creates issues when constructing software pipelines
from packages
CDE-SP can include software provenance as part of a software
package
CDE-SP can use software package provenance to build software
pipelines
CDE-SP can maintain provenance when used to construct software
pipelines
Acknowledgments
Neil Best at The University of Chicago
Joshua Elliott at Columbia University
Justin Wozniak at Argonne National Laboratory
Allison Brizius at the RDCEP Center
NSF grants SES-0951576 and GEO-1343816
 
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
WBDB 2012 - "Big Data Provenance: Challenges and Implications for Benchmarking"
 
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
DEBS 2013 - "Ariadne: Managing Fine-Grained Provenance on Data Streams"
 
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
SIGMOD 2013 - Patricia's talk on "Value invention for Data Exchange"
 
TaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data MiningTaPP 2013 - Provenance for Data Mining
TaPP 2013 - Provenance for Data Mining
 
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...
TaPP 2014 Talk Boris - A Generic Provenance Middleware for Database Queries, ...
 

KĂŒrzlich hochgeladen

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)Areesha Ahmad
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Servicemonikaservice1
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learninglevieagacer
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑Damini Dixit
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Servicenishacall1
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...ssuser79fe74
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flyPRADYUMMAURYA1
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsSĂ©rgio Sacani
 
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRL
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRLKochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRL
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRLkantirani197
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticssakshisoni2385
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)Areesha Ahmad
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY1301aanya
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfSumit Kumar yadav
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learninglevieagacer
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Silpa
 

KĂŒrzlich hochgeladen (20)

Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts ServiceJustdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
Justdial Call Girls In Indirapuram, Ghaziabad, 8800357707 Escorts Service
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
High Profile 🔝 8250077686 📞 Call Girls Service in GTB Nagar🍑
 
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
9999266834 Call Girls In Noida Sector 22 (Delhi) Call Girl Service
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
 
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit flypumpkin fruit fly, water melon fruit fly, cucumber fruit fly
pumpkin fruit fly, water melon fruit fly, cucumber fruit fly
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune WaterworldsBiogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
Biogenic Sulfur Gases as Biosignatures on Temperate Sub-Neptune Waterworlds
 
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRL
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRLKochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRL
Kochi ❀CALL GIRL 84099*07087 ❀CALL GIRLS IN Kochi ESCORT SERVICE❀CALL GIRL
 
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceuticsPulmonary drug delivery system M.pharm -2nd sem P'ceutics
Pulmonary drug delivery system M.pharm -2nd sem P'ceutics
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
biology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGYbiology HL practice questions IB BIOLOGY
biology HL practice questions IB BIOLOGY
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
Locating and isolating a gene, FISH, GISH, Chromosome walking and jumping, te...
 

IPAW14 presentation: Quan, Tanu, Ian

  • 1. Auditing and Maintaining Provenance in Software Packages
    Quan Pham (1), Tanu Malik (2), Ian Foster (1,2)
    (1) Department of Computer Science and (2) Computation Institute, The University of Chicago, Chicago, IL 60637, USA
    quanpt@cs.uchicago.edu, tanum@ci.uchicago.edu
    Presented by Boris Glavic, Illinois Institute of Technology
    IPAW14, June 10th, 2014
  • 2. Outline
    1 Introduction
    2 Software Pipeline Use Case
    3 CDE-SP: Software Provenance in CDE
    4 Experiment and Evaluation
    5 Related Work
    6 Conclusion
  • 3. Current Solutions for Ensuring Reproducibility and Their Issues
    1 Publish source code and data
      − GitHub, Figshare, Research Compendia
      Pros: (in many cases) easy to accomplish
      × Cons: need to recompile and re-execute
    2 Publish a software package including source code, data, and environment dependencies
      − CDE, RunMyCode.org
      Pros: re-execute without installation
      × Cons: not easy to combine and merge shared packages
    3 Publish a virtual machine image (VMI) that includes the OS, source code, data, and environment
      − Cloud BioLinux (NEBC), Swift Appliance (RDCEP)
      Pros: no additional modules or components needed to rerun
      × Cons: too hard to provision and understand
  • 4. Reproducibility Problem
    Our philosophy:
    "... releasing shoddy VMs is easy to do, but it doesn't help you learn how to do a better job of reproducibility along the way. Releasing software pipelines, however crappy, is on the path towards better reproducibility." (C. Titus Brown [1])
    Reproducibility problem: How can we make it easy to combine and merge shared packages while correctly attributing authorship of software packages?
    There is no need to provision VMIs or to publish only source code and data.
    [1] http://ivory.idyll.org/blog/vms-considered-harmful.html
  • 5. Problem Scope
    Use CDE [2] to capture and create a portable software package
    Extend, partially reuse, and combine CDE packages to create new reproducible software pipelines
    Attribute authorship of software packages in new software pipelines
    CDE has an OVERLAP conflict!
    [2] Guo, P.J., Engler, D.: CDE: Using system call interposition to automatically create portable software packages. USENIX Association, Portland, OR (2011)
  • 6. CDE
    Create a portable software package without installation, configuration, or privilege permissions
    Audit mode to create a CDE package
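Audit mode works like this: every file the traced program touches is copied into a cde-root/ directory at the same absolute path. The package then mirrors the parts of the host file system the program needs. The sketch below assumes the list of accessed paths is already known. CDE gathers that list by intercepting system calls, which is not shown here. The helper name package_files is hypothetical, and this is an illustration, not CDE's code.

```python
import os
import shutil

def package_files(accessed_paths, package_dir):
    """Copy each accessed file to <package_dir>/cde-root/<its absolute path>.

    Hypothetical helper: CDE itself learns the paths by intercepting system
    calls; here they are passed in. Assumes POSIX paths; ignores symlinks.
    """
    root = os.path.join(package_dir, "cde-root")
    copied = []
    for path in accessed_paths:
        src = os.path.abspath(path)
        # Drop the leading "/" so os.path.join nests the path under cde-root
        # (join would otherwise restart at the absolute component).
        dst = os.path.join(root, src.lstrip("/"))
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also keeps timestamps and mode bits
        copied.append(dst)
    return copied
```

For example, a call with the hypothetical path "/usr/lib/libfoo.so" and package directory "pkg" would produce pkg/cde-root/usr/lib/libfoo.so.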
  • 14. CDE - Execution Mode
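In execution mode, cde-exec redirects the file paths a program requests into the package's cde-root. The packaged copies are then used instead of the host's files. The sketch below shows only that path rewrite. It assumes a single package root and absolute POSIX paths, and it ignores symlinks. A real implementation also needs rules for paths that should not be redirected, such as device files. The function name redirect is hypothetical.

```python
import posixpath

def redirect(requested_path, cde_root):
    """Rewrite an absolute path requested by a packaged program into cde_root.

    Sketch only: assumes POSIX absolute paths and ignores symlinks, relative
    paths, and paths that should stay on the host.
    """
    normalized = posixpath.normpath(requested_path)
    if not posixpath.isabs(normalized):
        raise ValueError(f"sketch handles absolute paths only: {requested_path!r}")
    # "/usr/lib/libc.so.6" -> "<cde_root>/usr/lib/libc.so.6"
    return posixpath.join(cde_root, normalized.lstrip("/"))
```

For example, redirect("/usr/lib/../lib/libc.so.6", "/home/b/pkg/cde-root") returns "/home/b/pkg/cde-root/usr/lib/libc.so.6".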
  • 20. Software Pipelines Contain CDE Packages
    A software pipeline consists of many individual software modules
    A software module depends on externally developed libraries
    A software module is often packaged together with specific versions of libraries
  • 21. RDCEP Use Case
    Alice (A), Bob (B), and Charlie (C) are scientists at the Center for Robust Decision Making on Climate and Energy Policy (RDCEP)
    A develops data integration methods to produce higher-resolution datasets depicting inferred land use over time.
    B develops computational models to do model-based comparative analysis. B's software environment includes A's software modules to produce high-resolution datasets.
    C uses A's and B's software modules within data-intensive computing methods to run them in parallel.
    The Center wants to predict future yields of staple agricultural commodities given changes in the climate.
    [Figure: nested packages: A's package; B's package (from A's); C's package (merged from B's). Steps shown: retrieve data, aggregation, generate images, model-based analysis, parallel init, parallel summary]
  • 22. A's Experiment & Package
    A's package:
      cde-root/
        <path to A's files>/
          a-experiment.sh, retrieve-data, aggregation, generate-image, f1, f2, a-output
        <path to common libs>/
          libc.so
    Re-execute A's experiment:
      cde-exec a-experiment.sh
    cat a-experiment.sh:
      ./retrieve-data f1
      ./aggregation f1 f2
      ./generate-image f2 a-output
  • 23. B's Experiment & Package
    B's package:
      cde-root/
        <path to A's files>/ [...]
        <path to B's files>/
          b-experiment.sh, analysis, b-output
        <path to common libs>/
          libc.so
    Re-execute B's experiment:
      cde-exec b-experiment.sh
    cat b-experiment.sh:
      cd <path to A's experiment>
      cde-exec a-experiment.sh
      cd <path to B's files>
      ./analysis <path to A's files>/a-output b-output
  • 24. C's Experiment & Package
    C's package:
      cde-root/
        <path to A's files>/ [...]
        <path to B's files>/ [...]
        <path to C's files>/
          c-experiment.sh, parallel-init, parallel-summary, c-output
        <path to common libs>/
          libc.so
    Re-execute C's experiment:
      cde-exec c-experiment.sh
    cat c-experiment.sh:
      parallel-init <path to A's files>/f4
      cd <path to A's files>
      cde-exec ./aggregation f4 f5
      cde-exec ./generate-image f5 f6
      cd <path to B's files>
      cde-exec ./analysis <path to A's files>/f6 f7
      cd <path to C's files>
      ./parallel-summary <path to B's files>/f7 c-output
  • 25. Dependency Overlap in Multiple cde-root Directories
  • 26. File Overlap of Different Linux Distributions
    Table 1: Ratio of different files having the same path in 5 popular AMIs. The denominator is the number of files having the same path in two distributions, and the numerator is the number of files with the same path but a different md5 checksum. Manual pages in the /usr/share/ directory are omitted.

            RH            SUSE          U12           U13
    Amz     5498 / 23k    3184 / 11k    1203 / 5.4k   1819 / 5.5k
    RH                    3861 / 12k    1654 / 6.6k   2223 / 6.3k
    SUSE                                1245 / 3.9k   2085 / 6.4k
    U12                                               8226 / 24k

    Amz: Amazon Linux AMI
    RH: Red Hat Enterprise Linux 6.4
    SUSE: SUSE Linux Enterprise Server 11
    U12: Ubuntu Server 12.04.3 LTS
    U13: Ubuntu Server 13.10
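Each cell of Table 1 can be computed from two path-to-checksum listings, one per distribution. The sketch below assumes each listing is a dict that maps an absolute path to the file's md5 hex digest. It also assumes the excluded manual pages are the paths under /usr/share/man/; the slide only says "manual pages in /usr/share/", so that exact prefix is a guess. The function names are hypothetical.

```python
import hashlib

def md5_of(path, chunk_size=1 << 20):
    """Return the md5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def overlap(listing_a, listing_b, exclude_prefixes=("/usr/share/man/",)):
    """Return (numerator, denominator) as defined in the Table 1 caption.

    denominator: number of paths present in both distributions
    numerator:   number of those shared paths whose md5 checksums differ
    """
    shared = [
        p for p in listing_a.keys() & listing_b.keys()
        if not p.startswith(exclude_prefixes)  # str.startswith accepts a tuple
    ]
    differing = sum(1 for p in shared if listing_a[p] != listing_b[p])
    return differing, len(shared)
```

A Table 1 cell such as 5498 / 23k is this (numerator, denominator) pair, computed over the full listings of the two AMIs.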
  • 27. Redirection in Multiple cde-root Directories
  • 28. CDE-SP
    CDE-SP: an enhanced CDE that includes software provenance
    Describes tools and methods to audit, store, and query provenance
    Provenance queries:
      Determine the environment under which a dependency was built
      Examine the dependencies that must be present
      Answer whether packages in a pipeline can satisfy a new package
    Attribute authorship of software packages in a pipeline:
      Combine and validate authorship from stored provenance
  • 29. CDE-SP Audit
    Objectives:
      Capture additional details of the origins of a library or a binary
      Use these details for compiling and creating software pipelines
    Methods:
      Create a dependency tree
        The system calls of processes are monitored
        Whenever a process executes a file system call, a dependency of that process is recorded
        A dependency can be a data file or a shared library
      Extract information about binaries and required shared libraries
        The file, ldd, strings, and objdump UNIX commands
        uname -a and the function getpwuid(getuid())
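The extraction step can be sketched in Python. The first function parses the text that ldd prints; the line shapes handled are the common glibc ones. The second function uses platform.uname() as a stand-in for uname -a and pwd.getpwuid(os.getuid()) as the Python counterpart of getpwuid(getuid()). Both function names and the returned dict layout are assumptions for illustration. The pwd module is POSIX-only.

```python
import os
import platform
import pwd  # POSIX only

def parse_ldd(output):
    """Parse `ldd` stdout into {library name: resolved path, or None}.

    Handles the common glibc line shapes:
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x...)
        libfoo.so => not found
        linux-vdso.so.1 (0x...)
        /lib64/ld-linux-x86-64.so.2 (0x...)
    """
    libs = {}
    for line in output.splitlines():
        line = line.strip()
        if "=>" in line:
            name, _, rest = line.partition("=>")
            target = rest.split("(")[0].strip()
            libs[name.strip()] = None if target in ("", "not found") else target
        elif "(" in line:
            name = line.split("(")[0].strip()
            # A bare absolute path (e.g. the dynamic loader) is its own location.
            libs[name] = name if name.startswith("/") else None
        # Other lines (e.g. "statically linked") carry no library entry.
    return libs

def machine_and_agent():
    """Collect host and user details, approximating `uname -a` and getpwuid(getuid())."""
    u = platform.uname()
    try:
        user = pwd.getpwuid(os.getuid()).pw_name
    except KeyError:  # uid without a passwd entry, e.g. in some containers
        user = str(os.getuid())
    return {
        "machine": f"{u.system} {u.node} {u.release} {u.version} {u.machine}",
        "agent": user,
    }
```

In practice, parse_ldd would be fed the stdout of running ldd on each packaged binary, for example via subprocess.run(["ldd", path], capture_output=True, text=True).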
  • 32. Storage
    Store provenance within the package itself
    Use LevelDB: a fast and lightweight key-value storage library
    Encode in the key the UNIX process identifier along with the spawn time
    Table 2: LevelDB key-value pairs that store file and process provenance. Capitalized words are arguments.

    Key                          Value        Explanation
    pid.PID1.exec.TIME           PID2         PID1 wasTriggeredBy PID2
    pid.PID.[path, pwd, args]    VALUES       Other properties of PID
    io.PID.action.IO.TIME        FILE(PATH)   PID wasGeneratedBy / wasUsedBy FILE(PATH)
    meta.agent                   USERNAME     User information
    meta.machine                 OSNAME       Operating system distribution
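The sketch below mimics the key layout of Table 2, with an ordinary dict standing in for LevelDB; no LevelDB binding is assumed. Only the key shapes come from Table 2. Several details are assumptions for illustration: the class and method names, the "." separator, TIME as zero-padded nanoseconds, and IO as an operation name such as "read" or "write".

```python
import time

class ProvenanceStore:
    """Dict-backed stand-in for CDE-SP's LevelDB store, using Table 2's key shapes."""

    def __init__(self):
        self.kv = {}

    @staticmethod
    def ts(t=None):
        # Zero-padding makes byte order (LevelDB's default ordering) match time order.
        value = time.time_ns() if t is None else t
        return f"{value:020d}"

    def record_exec(self, pid1, pid2, t=None):
        # pid.PID1.exec.TIME -> PID2
        self.kv[f"pid.{pid1}.exec.{self.ts(t)}"] = str(pid2)

    def record_props(self, pid, path, pwd, args):
        # pid.PID.[path, pwd, args] -> VALUES
        self.kv[f"pid.{pid}.path"] = path
        self.kv[f"pid.{pid}.pwd"] = pwd
        self.kv[f"pid.{pid}.args"] = " ".join(args)

    def record_io(self, pid, io, file_path, t=None):
        # io.PID.action.IO.TIME -> FILE(PATH); "action" is literal, IO is an argument
        self.kv[f"io.{pid}.action.{io}.{self.ts(t)}"] = file_path

    def record_meta(self, username, osname):
        self.kv["meta.agent"] = username
        self.kv["meta.machine"] = osname

    def scan(self, prefix):
        """Sorted (key, value) pairs under a prefix, like a LevelDB range scan."""
        return [(k, v) for k, v in sorted(self.kv.items()) if k.startswith(prefix)]
```

Include the trailing dot when scanning (e.g. "io.101."). Otherwise a prefix for PID 10 would also match PIDs 100 to 109.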
  • 33. Query
    LevelDB provides a minimal API for querying
    Simple, lightweight query interface
      Input: a program whose dependencies need to be retrieved
      Output: a GraphViz file displaying file and process dependencies
    Use a depth-first search algorithm to create a dependency tree with the input program as its root
    Exclusion option to remove uninteresting dependencies: /lib/, /usr/lib/, /usr/share/, /etc/
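The query can be sketched as a depth-first search over an adjacency map. The search starts at the input program, skips dependencies under the excluded prefixes, and emits GraphViz DOT text. The in-memory edges dict and the function name are assumptions; in CDE-SP the edges would be read from the LevelDB records. A dependency shared by two parents is visited once, but both edges are kept.

```python
EXCLUDED = ("/lib/", "/usr/lib/", "/usr/share/", "/etc/")

def _q(name):
    """Quote a node name for DOT, escaping embedded double quotes."""
    return '"' + name.replace('"', '\\"') + '"'

def dependency_dot(edges, root, excluded=EXCLUDED):
    """Return DOT text for the dependencies reachable from `root`.

    edges: dict mapping a node (program or file path) to the nodes it depends on.
    Children under an excluded prefix are pruned together with their subtrees.
    """
    lines = ["digraph deps {"]
    visited = set()

    def dfs(node):
        visited.add(node)
        for child in edges.get(node, []):
            if child.startswith(excluded):
                continue  # uninteresting system dependency
            lines.append(f"  {_q(node)} -> {_q(child)};")
            if child not in visited:
                dfs(child)

    dfs(root)
    lines.append("}")
    return "\n".join(lines)
```

The returned text can be written to a .dot file and rendered with GraphViz, e.g. `dot -Tpng deps.dot -o deps.png`. Very deep graphs would need an explicit stack instead of recursion.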
  • 34. Authorship of Software Modules
    Combine authorship of the contributing packages
    Validate authorship from the provenance stored in the original package
      Generate the subgraph associated with the part of the new package
      Use subgraph isomorphism (NP-hard) to validate against the original provenance graph
        Match provenance nodes of processes with the same paths of their binaries and working directories
        Match provenance nodes of files with the same path
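The slide matches process nodes by (binary path, working directory) and file nodes by path. Assume these labels are unique within each graph. Then checking that the new package's subgraph occurs in the original provenance graph reduces to a label lookup plus edge containment, which sidesteps the NP-hard general case. The sketch implements only that simplified, label-unique check. It is not a general subgraph isomorphism algorithm, and it raises an error when labels repeat. The node and edge encodings and the function names are assumptions.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical labels: ("proc", binary_path, working_dir) or ("file", path)
Label = Tuple[str, ...]
Edge = Tuple[Hashable, Hashable]

def _index_by_label(nodes: Dict[Hashable, Label]) -> Dict[Label, Hashable]:
    index: Dict[Label, Hashable] = {}
    for node_id, label in nodes.items():
        if label in index:
            raise ValueError(f"label {label} is not unique; the simplified check does not apply")
        index[label] = node_id
    return index

def is_labeled_subgraph(sub_nodes: Dict[Hashable, Label], sub_edges: Set[Edge],
                        orig_nodes: Dict[Hashable, Label], orig_edges: Set[Edge]) -> bool:
    """True if every subgraph node maps by label to an original node and every
    subgraph edge maps to an original edge. Node ids (e.g. PIDs) may differ."""
    _index_by_label(sub_nodes)  # unique labels here make the mapping injective
    orig_index = _index_by_label(orig_nodes)
    mapping = {}
    for node_id, label in sub_nodes.items():
        if label not in orig_index:
            return False  # a process or file the original package never recorded
        mapping[node_id] = orig_index[label]
    return all((mapping[u], mapping[v]) in orig_edges for u, v in sub_edges)
```

Nodes are matched through labels, not ids, because PIDs and other identifiers change between the original run and the run that produced the new package.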
  • 35. Experiments
    Performance of CDE-SP:
      Auditing performance overhead
      Disk storage increase
      Provenance query runtime
      Redirection overhead when multiple UUID-based directories are created
    Compare the lightweight virtualization approach of CDE-SP with Kameleon [3], a heavyweight virtualization approach used for reproducibility
    Experiments were run on an Ubuntu 12.04 LTS workstation with 8 GB of RAM and an 8-core Intel(R) processor clocked at 1600 MHz.
    [3] Emeras, J., Richard, O., Bzeznik, B.: Reconstructing the software environment of an experiment with Kameleon (2011)
  • 36. Performance & Size Overhead
    Pipeline with two applications: Aggregation and Generate Image
    2.1% slowdown of CDE-SP vs. 0-30% CDE virtualization overhead [4]
    LevelDB database size: 236 kB (0.03% package size increase), containing approximately 12,000 key-value pairs
    Table 3: The performance overhead of CDE-SP compared with CDE is negligible

              Create Package (s)   Execution (s)   Disk Usage        Provenance Query (s)
    CDE       852.6 ± 2.4          568.8 ± 2.4     732 MB            -
    CDE-SP    870.5 ± 2.5          569.5 ± 1.8     732 MB + 236 kB   0.4 ± 0.03

    [4] Guo, P.J., Engler, D.: CDE: Using system call interposition to automatically create portable software packages. USENIX Association, Portland, OR (2011)
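The overhead percentages can be checked against the means in Table 3. The 2.1% figure matches the package-creation (auditing) column, so that seems to be the slowdown the slide refers to. The size ratio below assumes decimal units (1 kB = 10^3 bytes, 1 MB = 10^6 bytes); binary units give about 0.031%, which also rounds to 0.03%.

```python
def relative_overhead(baseline, measured):
    """Relative increase of `measured` over `baseline`, as a fraction."""
    return (measured - baseline) / baseline

create = relative_overhead(852.6, 870.5)    # Create Package means (s)
execute = relative_overhead(568.8, 569.5)   # Execution means (s)
size = 236e3 / 732e6                        # LevelDB size / package size, decimal units assumed

print(f"create: {create:.1%}, execute: {execute:.2%}, size: {size:.2%}")
# prints: create: 2.1%, execute: 0.12%, size: 0.03%
```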
  • 37. Redirection Overhead in CDE-SP
    The output of Aggregation was pipelined to the input of Generate Image
      3 output files of the Aggregation package were moved to the Generate Image package
      2 cross-package execve() system calls
    Less than a 1% slowdown for CDE-SP
  • 38. Kameleon
    Use the Kameleon engine to make a bare-bones VM appliance
      Self-written YAML-formatted recipes
      Self-written macrosteps and microsteps
    Kameleon can create virtual machine appliances in different formats for different Linux distributions
      Generates bash scripts to create an initial virtual image of a Linux distribution
      Populates the image with more Linux packages
      Populates the image with the content of a CDE-SP package
  • 39. CDE-SP vs. Kameleon
    [Figure 1: Overhead when using CDE with a Kameleon VM appliance (time in seconds, axis 0 to 1600, for Kameleon and CDE-SP)]
  • 40. Related Work
    Research Objects: package scientific workflows with auxiliary information about the workflows, including provenance information and metadata such as the authors and the version
    CDE and Sumatra can capture an execution environment in a lightweight fashion
    SystemTap, being a kernel-based tracing mechanism, performs better than ptrace but needs to run at a higher privilege level
    Provenance-to-Use (PTU) and ReproZip include provenance in self-contained software packages
  • 41. Conclusion
    CDE does not encapsulate provenance of associated dependencies in a software package
    The lack of information about the origins of dependencies in a software package creates issues when constructing software pipelines from packages
    CDE-SP can include software provenance as part of a software package
    CDE-SP can use software package provenance to build software pipelines
    CDE-SP can maintain provenance when used to construct software pipelines
  • 42. Acknowledgments
    Neil Best at The University of Chicago
    Joshua Elliott at Columbia University
    Justin Wozniak at Argonne National Laboratory
    Allison Brizius at the RDCEP Center
    NSF grants SES-0951576 and GEO-1343816