FAIR Computational Workflows
1. FAIR Computational Workflows
Professor Carole Goble
The University of Manchester UK
EU Research Infrastructures ELIXIR, IBISBA, EOSC-Life
BioExcel Centre of Excellence
Software Sustainability Institute UK
FAIRDOM Consortium
carole.goble@manchester.ac.uk
SERC Swedish e-Science Research Center Annual Meeting 13 May 2022
2. What is a Computational Workflow?
Multi-step processes for data analytics, data
processing pipelines and simulation sweeps
Linking computational steps
• Data flow between steps
• Control flow of steps
Handle data and processing dependencies
Operating over computational infrastructure
Come in many different flavours
Drug Discovery
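The idea of steps linked by data flow can be sketched in a few lines of Python. This is only an illustration, not how any particular workflow system is implemented: the step names (fetch, clean, summarise) and the STEPS table are hypothetical, and the control flow is simply the dependency order computed by the standard-library graphlib module (Python 3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical three-step pipeline; each step is a plain function.
def fetch():
    return [3, 1, 2]

def clean(raw):
    return sorted(raw)

def summarise(cleaned):
    return {"n": len(cleaned), "max": max(cleaned)}

# Data flow: each step lists the steps whose outputs it consumes.
STEPS = {
    "fetch": (fetch, []),
    "clean": (clean, ["fetch"]),
    "summarise": (summarise, ["clean"]),
}

def run_workflow(steps):
    """Run steps in dependency order, passing outputs along the data flow."""
    graph = {name: set(deps) for name, (_, deps) in steps.items()}
    results = {}
    # Control flow: static_order() yields every step after its dependencies.
    for name in TopologicalSorter(graph).static_order():
        func, deps = steps[name]
        results[name] = func(*(results[d] for d in deps))
    return results

results = run_workflow(STEPS)
print(results["summarise"])
```

A real workflow management system adds what this sketch leaves out: scheduling onto computational infrastructure, data staging, retries and provenance capture.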
3. What is a Computational Workflow?
Encoded method less dependent on implementations
inputs
outputs
tools, CLI,
containers,
workflows
Precise
specification
Software
Execution
WfMS
Engine
Workflow
Abstraction
Access to computational infrastructure and datasets,
tool interoperability, processing portability and
optimisation, data wrangling.
Composition
different
codes,
languages,
third parties
4. What is a Computational Workflow?
Inter-twingled mix and matching
Scripting
environments
Interactive Electronic
Research Notebooks
Workflow
Management Systems &
execution platforms
https://s.apache.org/existing-workflow-systems
300+ Systems
General and
Specialised
Interactive & exploratory
analysis with Human in the
Loop
Production, automated,
workflow-integrated
software
Tool chaining,
Batch processing,
Job Control
5. What is a Computational Workflow System?
From frameworks to web based analysis platforms, hybrid cloud deployment
Graph of jobs for
automatic
parallelisation, DIY
package &
containerisation
installation, auto-documentation
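How a graph of jobs allows automatic parallelisation can be illustrated with a small sketch. The job names and the JOBS table are hypothetical, and the sketch only groups jobs into batches that could run concurrently; it does not reflect the scheduler of any specific system.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: job -> set of jobs it must wait for.
JOBS = {
    "align_a": set(),
    "align_b": set(),
    "merge": {"align_a", "align_b"},
    "report": {"merge"},
}

def parallel_batches(jobs):
    """Group jobs into batches whose members have no unmet dependencies."""
    sorter = TopologicalSorter(jobs)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        # Every job returned here is independent of the others in the batch.
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches

batches = parallel_batches(JOBS)
print(batches)
```

Here the two alignment jobs land in the same batch, so an engine would be free to run them at the same time before the merge step.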
Online portals users build
and reuse workflows
around publicly available or
user-uploaded data and
pre-wrapped, pre-installed
tools.
Communities cluster
around systems
Typically depends on:
• Support for specific data
types
• Support for specific codes
• Support for kinds of
workflow
• Skills level of workflow
developer
• Popularity
6. Why Computational Workflows?
prepare, analyze, share increasing volumes of complex data
CryoEM Image Analysis
Metagenomic Pipelines
Drug Discovery
Protein Ligand MD
Simulation
Genome Annotation
High Throughput Sequencing
[Fabrice Allain
JOBIM2021]
[Romain Dallet
JOBIM2021]
[Adam Hospital]
[Rob Finn]
[Carlos Oscar Sorzano Sanchez]
7. Why Computational Workflows?
data collection and model simulation
SERC: Data-driven computational
materials design, DCMD
Automatic workflow, data collection, and
development of open-data infrastructure
8. Why Computational Workflows?
SARS-CoV-2 allelic-variant surveillance
Automated repetitive monitoring of structured
data from the European COVID-19 Data Portal and
national SARS-CoV-2 sequencing datasets.
Scalable - global distributed PULSAR compute
network
• Improved data quality
• Uniformly analysed data for downstream
analysis & visualisation
• Submission of data to public archives
Ported tried and tested transparent methods
• EMERGEN, French SARS-CoV-2 genomic
surveillance
https://covid19.galaxyproject.org
10. Why Computational Workflow Systems?
Reproducibility
Regulation
Transparency
Documented Method
Labour saving
Productivity
Reliability
Sustained
Knowledge sharing
Scholarly Objects
Pool of know-how
Reuse, repurpose
Variant-based
Democratisation
of computational
analysis & methods
Upfront cost for downstream benefits
Benefits best when a community buys in and workflows are supported
11. Why FAIR Computational Workflows?
The FAIR Data Principles
RDA FAIR Data Maturity Model. Specification and Guidelines
https://zenodo.org/record/3909563#.YORYkUzTX19
Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The
FAIR Guiding Principles for scientific data management
and stewardship. Sci Data 3, 160018 (2016).
https://doi.org/10.1038/sdata.2016.18
https://www.go-fair.org/fair-principles/
12. Why FAIR Computational Workflows?
The FAIR Data Principles
Enable automation
• Persistent human-readable and machine-actionable linked metadata
• Community standards
• Persistent identifiers
• Licensing and access rules
• Access protocols
• Register/index/search
13. Why FAIR Computational Workflows?
Developer/User viewpoints
How can I find already existing workflows?
Can I access them? Public or private? Git repository?
What language is it written in?
Can I rework it to use my tool?
Is it well enough described so I can understand it?
Can I use it?
Can I reuse it in our infrastructure?
Does it make FAIR data?
Will I get credit for it?
Can I track that credit?
How easy is it to be FAIR?
14. What are the FAIR Principles for Workflows?
Hybrid Processual Digital Objects
FAIR Method Objects
FAIR Software Objects
FAIR Data
In and Out
FAIR Enabling
Services
C. Goble, S. Cohen-Boulakia, S.
Soiland-Reyes, D. Garijo, Y. Gil, M.R.
Crusoe, K. Peters & D. Schober. FAIR
computational workflows. Data
Intelligence 2(2020), 108–121.
https://doi.org/10.1162/dint_a_00033
15. What are FAIR Principles for Workflows?
Hybrid Processual Digital Objects
Method “Data” Objects
Workflows as
FAIR Software
FAIR+R and FAIR++
Quality, maturity, maintainability
The principles revised
Workflows as
FAIR Digital Objects
Data-like method objects
Associated objects
The principles adapted
Workflows as
FAIR Data Instruments
FAIRification of the dataflow
The data principles supported
Workflow Objects
Software Objects
Data FAIRification
FAIR enabling services
Services
16. FAIR for Research Software
Processual Digital Objects
https://www.rd-alliance.org/groups/fair-4-researchsoftware-fair4rs-wg
FAIR for Research Software (FAIR4RS)
working group
Katz et al., Patterns 2, 2021
https://fairsharing.org/4100
17. FAIR Principles for Workflows
Hybrid Processual Digital Objects
Usable and Reusable
Living & reusable parts
versioned, forked, cloned
parts recycled
limited lifespans
citable credit
executability, reproducibility,
portability
testing, maturity
quality, maintainability
FAIR+R FAIR++
Composition & agency
Abstractions
specification
implementation
instantiation
run result
modularisation
FAIR parts & dependencies
propagation of FAIR properties
18. Findable
Search engine supported
Public, private & DOI support
Different workflow languages
and systems
Git integration for repos
Versioning & snapshots
Described by metadata
licensing
authors
& credit
analytics
access
search
versions & status
other
workflows
200+
Workflows
90+
Teams
10+
different
systems
19. What Workflow Metadata?
Metadata for machines & people
Common metadata
about the workflow,
tools & parameters
Canonical workflow
description of the
steps of the workflow
Type the input and
outputs of the steps
Run Provenance
RO-Crate format for
packaging a workflow,
its metadata and
companion objects
(links to containers,
data etc) for exchange,
archiving, reporting,
citing.
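To make the packaging idea concrete, the following sketch builds a minimal RO-Crate-style metadata document as a Python dictionary. It follows the general shape of an RO-Crate 1.1 ro-crate-metadata.json file (a metadata descriptor, a root dataset and a workflow entity), but it is not a validated crate: the specification asks for further properties such as a description and a publication date. The function name make_workflow_crate and all example values (file name, workflow name, licence, author) are assumptions made for illustration.

```python
import json

def make_workflow_crate(workflow_file, name, license_url, author_name):
    """Build a minimal RO-Crate 1.1 style metadata document for one workflow."""
    return {
        "@context": "https://w3id.org/ro/crate/1.1/context",
        "@graph": [
            {   # Metadata descriptor: marks this file as RO-Crate metadata.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {   # Root dataset: the package as a whole.
                "@id": "./",
                "@type": "Dataset",
                "name": name,
                "license": {"@id": license_url},
                "mainEntity": {"@id": workflow_file},
                "hasPart": [{"@id": workflow_file}],
            },
            {   # The workflow itself, typed so that registries can recognise it.
                "@id": workflow_file,
                "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
                "name": name,
                "author": {"@id": "#author"},
            },
            {"@id": "#author", "@type": "Person", "name": author_name},
        ],
    }

# Hypothetical example values.
crate = make_workflow_crate(
    "workflow.cwl",
    "Example variant-calling workflow",
    "https://spdx.org/licenses/Apache-2.0",
    "A. Researcher",
)
print(json.dumps(crate, indent=2))
```

Because the metadata is plain JSON-LD, it can be read by people and processed by machines, and companion objects (containers, test data) would be added as further entities in the same graph.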
FAIR Digital
Object
Open Communities
https://youtu.be/Rsuxn0m4bIM
21. Accessible (2)
A2. metadata are accessible, even when the workflow is no longer available
Enough metadata that a workflow is read-reproducible as a method description if it no longer runs
Metadata preservation belts
and braces
republish in another archive
22. Interoperable (1 & 2)
WfMS interoperability: describe
workflows independently of WfMS.
Platform independent pipeline
exchange and comparison.
Workflow Composability: Software interoperates
through APIs and metadata standards (FAIR4RS).
Workflow-ready tools.
Tested & validated canonical workflow blocks.
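One way to picture workflow composability is as a check that the typed output of one block matches the typed input of the next. The sketch below is an assumption-laden illustration: the Step class, the check_link function and the type labels (FASTQ, BAM, VCF) are hypothetical and are not taken from CWL, WDL or the BioExcel Building Blocks.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    """A workflow block described only by its typed input and output ports."""
    name: str
    inputs: dict = field(default_factory=dict)    # port name -> type label
    outputs: dict = field(default_factory=dict)   # port name -> type label

def check_link(producer, out_port, consumer, in_port):
    """Return a list of problems found when wiring one output to one input."""
    problems = []
    if out_port not in producer.outputs:
        problems.append(f"{producer.name} has no output '{out_port}'")
    if in_port not in consumer.inputs:
        problems.append(f"{consumer.name} has no input '{in_port}'")
    if not problems and producer.outputs[out_port] != consumer.inputs[in_port]:
        problems.append(
            f"type mismatch: {producer.outputs[out_port]} -> "
            f"{consumer.inputs[in_port]}"
        )
    return problems

# Hypothetical blocks with hypothetical type labels.
align = Step("align", inputs={"reads": "FASTQ"}, outputs={"alignment": "BAM"})
call = Step("call_variants", inputs={"alignment": "BAM"}, outputs={"variants": "VCF"})

print(check_link(align, "alignment", call, "alignment"))  # compatible link
print(check_link(call, "variants", align, "reads"))       # incompatible link
```

The check relies only on the port descriptions, not on how either block is implemented, which is the property that clean interfaces and shared metadata standards are meant to provide.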
https://openwdl.org/
https://www.commonwl.org
Design for FAIR
Workflow Reuse
Licence
combinations
Access permissions
Clean interfaces
BioExcel Building Blocks
biomolecular simulation tools
https://workflowhub.eu/projects/11
23. Reusable and Usable
Composability + Associated Objects + Metadata + FAIR Services
Reusable – “can be understood, modified, built upon or incorporated into other workflows”
Usable – “can be executed”
Containers & Packaging
Testing & monitoring
checker workflows
test data
https://crs4.github.io/life_monitor/
https://openebench.bsc.es/dashboard
Is a workflow reusable if it’s
• resource greedy
• needs special resources
• needs unavailable data
• cannot be ported or run by
anyone other than the
developers?
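The role of checker workflows and test data can be sketched as follows: run the workflow on a small known input and compare a digest of the output with a stored expected value. Everything here is hypothetical, including the stand-in workflow_under_test function and the test input; it is not the mechanism used by LifeMonitor or OpenEBench.

```python
import hashlib

def workflow_under_test(lines):
    """Stand-in for a real workflow: drop blank lines, upper-case the rest."""
    return [line.strip().upper() for line in lines if line.strip()]

def digest(lines):
    """SHA-256 digest of the output, used to compare runs."""
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()

# Hypothetical test data; the expected digest would be stored with it.
TEST_INPUT = ["acgt", "", " ttga "]
EXPECTED_DIGEST = digest(["ACGT", "TTGA"])

def check_workflow(workflow, test_input, expected_digest):
    """Run the workflow on test data and report whether it is still usable."""
    try:
        actual = digest(workflow(test_input))
    except Exception as exc:  # the workflow no longer executes
        return {"passed": False, "reason": f"execution failed: {exc}"}
    if actual != expected_digest:
        return {"passed": False, "reason": "output digest mismatch"}
    return {"passed": True, "reason": "output matches expected digest"}

report = check_workflow(workflow_under_test, TEST_INPUT, EXPECTED_DIGEST)
print(report)
```

Run periodically, a check like this distinguishes a workflow that merely exists from one that can still be executed, which is the difference between reusable and usable drawn above.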
24. Data FAIRification & FAIR Data by Design
Metadata generated for data products, Assisted by WfMS and tools
Reviewing
Curation
Certification
Governance
Best Practice
Golden
Examples
Canonical
workflows
Design for
FAIR Data
and Reuse
nf-core
26. How can we Workflow FAIR Assist?
Workflow
developers
Tool and
data set
providers
Workflow readiness
FAIR Unit Testing
Brack, et al (2021). 10 Simple Rules for
making a software tool workflow-ready.
https://doi.org/10.5281/zenodo.5636487
Descriptions
Register in WorkflowHub
Best Practice
WfMS
platforms
Programmatic access
Automate FAIRness
FAIR Software
FAIR enabling Service
Use well documented
FAIR enabling and
FAIR workflows
credit the makers!
Users
28. Summary: FAIR Computational Workflows
Hybrid Processual Digital Objects
FAIR takes a village*
*Borgman, C. L., & Bourne, P. E. (2021). Why it takes a village to manage and share data. Harvard Data Science Review (under Review), arXiv:2109.01694v1.
Building Communities
of Practice
29. Acknowledgements
The WorkflowHub Club, Bioschemas Community, RO-Crate
Community, CWL Community, Galaxy Europe, EOSC-Life
and ELIXIR Tools Platform.
Special Thanks
Stian Soiland-Reyes (U Manchester / U Amsterdam)
Paul Brack, Stuart Owen, Finn Bacall, Alan Williams (U Manchester)
Björn Grüning (U Freiburg)
Frederik Coppens (VIB)
Sarah Jones (GEANT)
Herve Menager (Pasteur Institute)
Sarah Cohen-Boulakia (U Paris-Saclay)
Dan Katz (U Illinois Urbana-Champaign)
Simone Leo (CRS4)
Laura Rodriguez-Navas (BSC)
José Mª Fernández (BSC)
Workflow Community Initiative https://workflows.community/about
EOSC-Life https://www.eosc-life.eu/
ELIXIR http://elixir-europe.org
RO-Crate https://www.researchobject.org/ro-crate/
WorkflowHub https://workflowhub.eu/ and workflowhub.org
Galaxy Europe https://galaxyproject.eu/
Bioschemas https://bioschemas.org
Common Workflow Language https://www.commonwl.org/
Life Monitor https://crs4.github.io/life_monitor/