PA webinar on benefits & costs of FAIR implementation in life sciences
1. Benefits and costs of
FAIR Implementation
for the life sciences industry
Moderated by: Ian Harrow (Pistoia Alliance)
Panelists:
James Malone (SciBite) Filip Pattyn (OntoForce)
Alexandra Grebe de Barron (Bayer) Drashtti Vasant (Bayer)
3. ©PistoiaAlliance
FAIR Guiding Principles at-a-Glance
Findable:
• F1 (meta)data are assigned a globally
unique and persistent identifier
• F2 data are described with rich metadata
• F3 metadata clearly and explicitly include the
identifier of the data it describes
• F4 (meta)data are registered or indexed in a
searchable resource
Interoperable:
• I1 (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation
• I2 (meta)data use vocabularies that follow FAIR
principles
• I3 (meta)data include qualified references to other
(meta)data
Accessible:
• A1 (meta)data are retrievable by their identifier using a
standardized communications protocol
• A1.1 the protocol is open, free, and universally
implementable
• A1.2 the protocol allows for an authentication and
authorization procedure, where necessary
• A2 metadata are accessible, even when the data are
no longer available
Reusable:
• R1 (meta)data are richly described with a plurality of
accurate and relevant attributes
• R1.1 (meta)data are released with a clear and
accessible data usage license
• R1.2 (meta)data are associated with detailed
provenance
• R1.3 (meta)data meet domain-relevant community
standards
Source: The FAIR Guiding Principles for scientific data management and stewardship. Wilkinson MD et al 2016 doi.org/10.1038/sdata.2016.18
4. Poll Question 1
Where is your workplace?
A) A biopharmaceutical company
B) An agriculture or food company
C) A technology provider
D) An academic institution
E) Other
5. Poll Question 2
How mature is FAIR implementation in your workplace?
A) Minimal understanding of FAIR guidelines
B) Good understanding but minimal FAIR implementation
C) FAIR implementation is well underway
D) Mature FAIR implementation in selected areas of my organisation
E) Mature and systematic implementation of FAIR across my organisation
6. ©PistoiaAlliance
Our Expert Panel
• James Malone
– CTO at SciBite, a semantic
technology company.
Previously, Lead ontologist at
EMBL-EBI. Worked on Open
Targets & EBI’s linked data
platform. PhD in Machine
Learning in Bioinformatics.
• Alexandra Grebe de Barron
– IT business partner for Real
World Evidence at Bayer.
Works closely with scientists
across all functions to make
data FAIR for advanced
analytics. PhD in Molecular
Genetics.
• Filip Pattyn
– Scientific lead at
ONTOFORCE, a semantic
technology company.
Previously, a Consultant at
Menapi Informatics. Worked on
ICT and bioinformatics. PhD in
Applied Informatics in Medical
Sciences.
• Drashtti Vasant
– IT Business Partner for
Translational Sciences at
Bayer. Currently leading a
project to enable data
integration of pre-clinical
studies. Worked at the
European Bioinformatics
Institute and Thomson Reuters.
Alexandra Grebe de Barron and Drashtti Vasant are known as the “FAIR Ladies” at Bayer
10. ©PistoiaAlliance
The Cost of unFAIR
• Cost of not doing FAIR – the cost of lost opportunity – is very high
• May 2018 EC report on cost-benefit estimated missed opportunity to be >€10 Billion
• Suggests barriers persist:
“The fact that the FAIR principles
are not common practice yet is due
to numerous reasons.”
“Despite the significant annual cost…many research performing organisations and
infrastructures are still reluctant to apply the FAIR principles and share the datasets
because of real or perceived costs, mostly related to time investment and money.”
11. ©PistoiaAlliance
Across Industries
• Life sciences is a good
starting point as there is so
much open data
• But not just a life science
problem
• The problem persists even
across organisations who
do not open their data
12. ©PistoiaAlliance
Technical Debt
• There exists a lot of historic data with intrinsic value
• Q: Is tomorrow’s data always going to be more valuable than today’s?
• Automating as much of this as possible seems to be the sweet spot for historic data
• Retrospective, manual curation is
expensive and likely impossible:
• much of the metadata is missing
• data generators have moved on
• commercial technology is no longer
supported
• These challenges teach us why
prospective FAIR is valuable.
13. ©PistoiaAlliance
Budgeting for Serendipity
• Structuring data for reuse should open up possibilities we
can’t conceive today
• Ishino et al (1987) reported repeat sequences that were
accidentally cloned as part of gene sequencing work
• Mojica et al (1993) is often the go-to first publication on
CRISPR, but Mojica made the connection with the Ishino work after
‘trawling literature’^
• Value of hypothesis-free + hypothesis-driven research
• Data needs to be ‘broadly reusable’ to increase the
opportunity now and in future
https://www.broadinstitute.org/what-broad/areas-focus/project-spotlight/crispr-timeline
^https://www.cell.com/cell/pdf/S0092-8674(15)01705-5.pdf
14. ©PistoiaAlliance
Cost of Representing Biology
• “Machine readable” representations get very complex, very quickly
• Knowing the future use up front is very hard: what do we represent?
15. ©PistoiaAlliance
The EBI RDF Linked Data Platform
• Spectrum of semantics - knowing up front the future use is very hard
• Schema.org vs OWL modeling
• FAIR is not simply a ‘rebranding’ of the semantic web (Mons et al, 2017)
• What can we justifiably simplify vs what is unsimplifiable
• Coordination took real effort (plus other cost to transform, maintain)
• Significant coordination activity even across 6 groups (and big
advantage that UniProt RDF already existed and we had previously
worked on Atlas RDF)
• Was really only achievable with minimum budget because the data
was already well annotated
• (This does not mean we shouldn’t try.)
16. ©PistoiaAlliance
Cost of Culture Change
• Curation has always been an underfunded, underappreciated research
activity
• Most value is in producing data, summary analysis, actionable insights
• Peer review already has ‘issues’
• Investing in technology necessary but not sufficient
• People require investment
• Involve data generators in these conversations
17. ©PistoiaAlliance
FAIR as a Machine Learning Enabler
• Creating training data, wrangling it, etc. is one of the
biggest parts of ML
• Labeled training & test sets are a crucial step and need
generating or obtaining
• Creating a new data set (e.g. subsetting
a few different sets to create a new one) is also expensive
• FAIR can help to:
1. Get you the data in the first place
2. Help you understand how you can use it (i.e. what is the
license)
3. Perform feature extraction by making those features more
readily extractable
4. Incorporate domain heuristics (e.g. from ontologies used
to describe data)
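The four points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method from the webinar: the `DatasetRecord` class, the `ALLOWED_LICENSES` policy, the example identifiers and the ontology term IDs are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical licence policy for this sketch; not taken from the webinar.
ALLOWED_LICENSES = {"CC0-1.0", "CC-BY-4.0"}


@dataclass
class DatasetRecord:
    identifier: str                    # F1: persistent identifier
    license: Optional[str] = None      # R1.1: data usage licence
    # Ontology-backed annotations, e.g. {"tissue": "<term id>"}
    annotations: Dict[str, str] = field(default_factory=dict)


def usable_for_training(records: List[DatasetRecord]) -> List[DatasetRecord]:
    """Keep only records whose licence is known and permitted."""
    return [r for r in records if r.license in ALLOWED_LICENSES]


def extract_features(record: DatasetRecord, names: List[str]) -> List[Optional[str]]:
    """Read named annotations as features; missing ones become None."""
    return [record.annotations.get(name) for name in names]


# Illustrative records only; identifiers and term IDs are made up for the sketch.
records = [
    DatasetRecord("doi:10.0000/example-1", "CC-BY-4.0",
                  {"tissue": "UBERON:0002107", "species": "NCBITaxon:9606"}),
    DatasetRecord("doi:10.0000/example-2", None,
                  {"tissue": "UBERON:0002107"}),
]

usable = usable_for_training(records)
features = [extract_features(r, ["tissue", "species"]) for r in usable]
```

When the licence and the annotations travel with the data, the filtering and feature-extraction steps reduce to simple lookups. A record without a licence is dropped rather than guessed at.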
18. ©PistoiaAlliance
Cost effective ways to think about FAIR
• Ask third party vendors you use if they support FAIR (and how) – includes
technology providers through to CROs
• Agree on your metadata standards across org and stick to them
• Involve data generators in your discussions(!)
• If you/your group are wrangling data for machine learning, think about
‘putting back’ the clean-up you do
• Let any license/data usage live with the data
• If you are developing knowledge graphs, think about the schema you design
• For data capture, think about hooking up to existing ontology standards
where suitable
• Automate annotation where feasible using technology
Cost: $ to $$
19. ©PistoiaAlliance
Increasingly FAIR
• Ensure FAIR data is shared across an organization to demonstrate
value
• Fund public curation in support of FAIR
• Use of FAIR-compatible metadata in ELNs
• Mandate minimum metadata for every experiment (requires
automated FAIR metric tests)
Cost: $$ to $$$
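A mandate for minimum metadata is only enforceable if it can be checked automatically. The sketch below shows one possible shape for such a check; the `REQUIRED_FIELDS` list and the record layout are assumptions made for illustration, not a standard or a tool named in the webinar.

```python
from typing import Dict, List

# Hypothetical minimum metadata set; a real organisation would agree its own.
REQUIRED_FIELDS = ("identifier", "title", "license", "provenance")


def missing_metadata(record: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


def passes_minimum_metadata(record: Dict[str, object]) -> bool:
    """True when every required field is present and non-empty."""
    return not missing_metadata(record)


# Illustrative experiment records (made-up values).
complete = {
    "identifier": "exp-0001",
    "title": "Dose response assay",
    "license": "internal-use-only",
    "provenance": "ELN entry, instrument run 17",
}
incomplete = {"identifier": "exp-0002", "title": "  ", "license": None}

gaps = missing_metadata(incomplete)
```

A check like this could run when an ELN entry is saved, so that gaps are reported to the data generator while the context is still fresh.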
21. ©PistoiaAlliance
FAIRness as a cost-based measurement
How to assess FAIRness of a data source?
When is a dataset FAIR enough?
Filip Pattyn, PhD
Filip.pattyn@ontoforce.com
22. ©PistoiaAlliance
Simple as counting the principles?
Checklist applied to Data source 1 and to Data source 2:
☐ F1. ☐ F2. ☐ F3. ☐ F4.
☐ A1. ☐ A1.1. ☐ A1.2. ☐ A2.
☐ I1. ☐ I2. ☐ I3.
☐ R1. ☐ R1.1. ☐ R1.2. ☐ R1.3.
Total count per data source
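The naive approach this slide questions, ticking off principles and comparing totals, can be sketched as follows. The two example data sources and the principles they are assumed to meet are invented for illustration.

```python
from typing import Iterable

# The fifteen FAIR principle labels from Wilkinson et al. (2016).
FAIR_PRINCIPLES = (
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
)


def count_principles(met: Iterable[str]) -> int:
    """Count distinct principles ticked off, rejecting unknown labels."""
    ticked = set(met)
    unknown = ticked - set(FAIR_PRINCIPLES)
    if unknown:
        raise ValueError(f"unknown principle labels: {sorted(unknown)}")
    return len(ticked)


# Hypothetical assessments of two data sources.
source_1 = ["F1", "F2", "A1", "R1.1"]
source_2 = ["F1", "F4", "I1", "I2", "I3"]

total_1 = count_principles(source_1)
total_2 = count_principles(source_2)
```

A plain count treats every principle as equally important and says nothing about the effort a user still has to spend, which is the limitation the following slides address.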
24. ©PistoiaAlliance
How to measure FAIRness?
• Measuring FAIRness
–Clear definition of what is being measured and why one
wants to measure it.
–Describe what a valid result is and how one obtains it, so that the
measurement is reproducible
• Qualities of a good measurement
–: able to distinguish differences
25. ©PistoiaAlliance
What’s the rationale behind FAIR?
• (Re-)use data for multiple purposes
• What’s the impact for the end-user? Who’s the audience?
• More FAIRness should mean fewer hurdles to solve a use case
26. ©PistoiaAlliance
When is a dataset FAIR or FAIR enough?
• Propagation of FAIRness
–I2. (meta)data use vocabularies that follow FAIR principles
27. ©PistoiaAlliance
More FAIR means less effort
• What’s the effort needed to make a data source more FAIR so one
can solve a single or multiple use cases?
• Effort quantified as a cost
–Time
–Human and machine resources
• Unit of measure
–Price ($)
• Potential to calculate the Return On Investment (ROI) on FAIR data
–Who benefits when a data source is more FAIR? They no longer have to make the
effort.
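The idea of quantifying effort as a price and deriving an ROI can be illustrated with a small calculation. The formula used here, (avoided cost minus investment) divided by investment, is the conventional ROI definition and an assumption on our part; all hours, rates and the number of data consumers are made-up figures.

```python
def effort_cost(hours: float, hourly_rate: float, machine_cost: float = 0.0) -> float:
    """Effort expressed as a price: human time plus machine resources."""
    return hours * hourly_rate + machine_cost


def roi(avoided_cost: float, investment: float) -> float:
    """Conventional ROI: net gain divided by the investment."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (avoided_cost - investment) / investment


# One-off effort by the data provider to make a source more FAIR (made-up numbers).
investment = effort_cost(hours=100, hourly_rate=80.0, machine_cost=2000.0)

# Effort that each of five data consumers no longer has to spend.
consumers = 5
avoided = consumers * effort_cost(hours=40, hourly_rate=80.0)

fair_roi = roi(avoided, investment)
```

With these illustrative numbers the investment is 10,000, the avoided cost is 16,000 and the ROI is 0.6. The benefit accrues to the consumers, which is why the slide asks who benefits.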
30. ©PistoiaAlliance
FAIR enough means less effort
[Figure: data access through an application, a graphical UI, or an API]
• ROI of FAIR enough data
• Data Consumers can
–solve use cases that couldn’t be
solved before
–solve use cases with much less
effort
31. ©PistoiaAlliance
FAIR enough to bring value
[Figure: cost versus time, showing a 1st effort and a 2nd effort, each followed by maintenance, and the value of the 1st effort and of the 1st & 2nd effort for end-users and data scientists]
32. ©PistoiaAlliance
Food for thought
Price vs. Time of data transformations > Unit of cost
–Faster with a more expensive, skilled data scientist
–Slower with a less expensive junior data scientist
–Manual vs. automated
[Figure: resources versus time, ranging from “fast but expensive” to “slow but inexpensive”]
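The trade-off between price and time can be made concrete by comparing options on both axes. The `Option` class and every figure below are hypothetical and serve only to show why a single agreed unit of cost is needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    hours: float         # elapsed working time for the transformation
    hourly_rate: float   # price per hour of the resource

    @property
    def cost(self) -> float:
        """Total price of the transformation for this option."""
        return self.hours * self.hourly_rate


# Made-up figures for three ways of doing the same data transformation.
options = [
    Option("senior data scientist", hours=10, hourly_rate=150.0),
    Option("junior data scientist", hours=40, hourly_rate=50.0),
    Option("automated pipeline", hours=2, hourly_rate=300.0),
]

cheapest = min(options, key=lambda o: o.cost)
fastest = min(options, key=lambda o: o.hours)
```

In this example the senior scientist is faster yet also cheaper overall than the junior one, so comparing hourly rates alone would mislead. The unit of cost has to combine price and time.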
33. ©PistoiaAlliance
Food for thought
Data source FAIRness evolution
[Figure: FAIRness ($) of a data source over time: data generation with initial use cases A, then new use cases B and C; after technological advancements, data generation with initial use cases D, then new use cases E and F]
34. ©PistoiaAlliance
FAIRness as a cost-based measurement
• Pragmatic, no over-engineering
• Use case and user oriented …. & dependent > not fixed
• Ratio scale
• Calculate ROI of FAIRness
Consensus units of cost
Hans Constandt
Bérénice Wulbrecht
Kenny Knecht, PhD
Paul Vauterin, PhD
Filip Pattyn, PhD
filip@ontoforce.com
+32 486 739 129
www.disqover.com
www.ontoforce.com
36. ©PistoiaAlliance
Why FAIR in Pharma
[Figure: scientific discovery and medical care, linked by EHR and AI]
Digitalization to overcome the gap towards translational medicine
37. ©PistoiaAlliance
Not having FAIR research data costs the European Economy
10.2 - 26 bn EUR every year
3 - 9 % of all research expenditure
Written by PwC: https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
Impact on research activities:
• Time spent on data collection, integration, analysis, registration,
publication and indexing
• Cost of storage for duplicated data
• Licence fees due to lack of open access to FAIR data
Impact on collaboration:
• Redundant research
• Lack of clarity about licenses and data use conditions
• Cross-fertilization
Impact on innovation:
• Develop innovative services
• Create new business models
• Number of patents filed
• Use of machine science
• Job creation
Allocation of 2.5% of R&D
expenditure into FAIR
implementation would yield a
positive ROI.
38. ©PistoiaAlliance
Q: When are we done with it?
A: Never - as long as we innovate.
Q: How much does it cost to make all our R&D data FAIR until 2022?
A: Depends on the use case.
Cost of FAIR implementation:
• Make legacy data FAIR
• Make data generation FAIR
• Create awareness, educate, change mindset, incentivise
• Set up FAIR ecosystem
39. ©PistoiaAlliance
FAIR ecosystem: deliverables of a FAIR data service team
as described in the FAIR action plan
FAIR digital objects
• data/metadata
• software/code/algorithms
• protocols
• models
• licenses
• other research outputs
FAIR components
• skills and investment
• policies
• data mgmt plans (DMPs)
• persistent identifiers
• standards
• metrics
FAIR services
• curation and stewardship
• data lifecycle management
• long-term preservation
• file format transformation
• data protection / security
• handover plans for discontinued services
40. ©PistoiaAlliance
Skills needed to support the implementation of FAIR
The FAIR data service team
– Business Analyst (Strategic mindset)
– Curator/Domain Expert
– Service Engineer/Developer
– Data/Ontology Engineer
– Product Manager
– System Architect
– Data Steward
– Data Scientist
42. ©PistoiaAlliance
PORTIN - Bayer
Case Study 1
Game Changer within translational data integration. Platform for access to clinical,
biomarker and biosample data from Bayer-sponsored interventional and non-
interventional clinical trials.
Easy access to all available clinical, biomarker and biosample data.
All data are semantically integrated within a common repository.
Data privacy questions and informed consents are considered
appropriately and in context.
PORTIN is agnostic to data sources, types, or variety of data owners.
It enables scientists to search for patient cohorts within or across studies.
Reduced FTEs
Additional revenue generated (insights)
Savings on hardware costs = ~3 mio € p.a. till phase 2 (predicted profit: 350 mio € after phase 3)
43. ©PistoiaAlliance
IMI eTox
Case Study 2
The eTOX project broke ground in that it enabled pharmaceutical companies to share their
data on the toxicity of drug-like compounds for the first time on a large scale. This resulted
in the creation of a large database, which can now be mined for further insights, including
predictions on whether or not a particular compound is likely to have an adverse effect on
patients.
Tox studies data and in silico models expected to:
enable 10% spend reduction for 1% of INDs and enable better decisions
enable 10% spend reduction for 10% target and candidate selection and lead optimization
Overall expected impact in 5 years = ~82 million euros
[Figure: IMI impact, combining value of R&D project, direct product outputs, investment into IMI projects, probability of success, development cost, and reach, relevance, reputation]
44. ©PistoiaAlliance
IMI AETIONOMY
Case Study 3
The AETIONOMY consortium chose to seek molecular characteristics of Alzheimer’s
disease (AD) and Parkinson’s disease (PD) that might contribute to a ‘taxonomy’ of
these conditions, and help our community move towards a precision-medicine
approach.
The project has developed innovative computational tools to manage and interpret
the complex healthcare and research data environment.
Identified groups of patients that differ significantly from each other.
New information about both the diseases
Insights into new disease models
Evaluate new data mining approaches
Validate new mechanistic disease hypotheses
45. ©PistoiaAlliance
Summary
Research is a key driver of productivity and economic growth
Redundant research does not contribute to science
Collaboration, especially public-private, is the KEY for
successful research output and innovation
The time, storage and license-fee costs that researchers incur to
manually read and understand metadata could be brought down to
almost zero with FAIR data
A sustainable FAIR ecosystem is the foundation for
advanced data analytics and AI
Change in mindset is a battle half won
Data and service providers need to be part of the change
Puts the “patient” in the center
47. Data for AI Models:
The Past, The Present, The Future
Join us for the next Pistoia Alliance AI Center of Excellence
webinar:
Presented by:
Prof. John Overington, CIO of the Medicines Discovery Catapult
Thursday June 6th, 11 am EST/ 4pm GMT