Benefits and costs of
FAIR Implementation
for the life sciences industry
Moderated by: Ian Harrow (Pistoia Alliance)
Panelists:
James Malone (SciBite)
Filip Pattyn (OntoForce)
Alexandra Grebe de Barron (Bayer)
Drashtti Vasant (Bayer)
This webinar is being recorded
©PistoiaAlliance
FAIR Guiding Principles at-a-Glance
Findable:
• F1 (meta)data are assigned a globally
unique and persistent identifier
• F2 data are described with rich metadata
• F3 metadata clearly and explicitly include the
identifier of the data it describes
• F4 (meta)data are registered or indexed in a
searchable resource
Interoperable:
• I1 (meta)data use a formal, accessible, shared, and
broadly applicable language for knowledge
representation
• I2 (meta)data use vocabularies that follow FAIR
principles
• I3 (meta)data include qualified references to other
(meta)data
Accessible:
• A1 (meta)data are retrievable by their identifier using a
standardized communications protocol
• A1.1 the protocol is open, free, and universally
implementable
• A1.2 the protocol allows for an authentication and
authorization procedure, where necessary
• A2 metadata are accessible, even when the data are
no longer available
Reusable:
• R1 (meta)data are richly described with a plurality of
accurate and relevant attributes
• R1.1 (meta)data are released with a clear and
accessible data usage license
• R1.2 (meta)data are associated with detailed
provenance
• R1.3 (meta)data meet domain-relevant community
standards
Source: The FAIR Guiding Principles for scientific data management and stewardship. Wilkinson MD et al 2016 doi.org/10.1038/sdata.2016.18
Poll Question 1
Where is your workplace?
A) A biopharmaceutical company
B) An agriculture or food company
C) A technology provider
D) An academic institution
E) Other
Poll Question 2
How mature is FAIR implementation in your workplace?
A) Minimal understanding of FAIR guidelines
B) Good understanding but minimal FAIR implementation
C) FAIR implementation is well underway
D) Mature FAIR implementation in selected areas of my organisation
E) Mature and systematic implementation of FAIR across my organisation
Our Expert Panel
• James Malone
– CTO at SciBite, a semantic
technology company.
Previously, Lead ontologist at
EMBL-EBI. Worked on Open
Targets & EBI’s linked data
platform. PhD in Machine
Learning in Bioinformatics.
• Alexandra Grebe de Barron
– IT business partner for Real
World Evidence at Bayer.
Works closely with scientists
across all functions to make
data FAIR for advanced
analytics. PhD in Molecular
Genetics.
• Filip Pattyn
– Scientific lead at
ONTOFORCE, a semantic
technology company.
Previously, a Consultant at
Menapi Informatics. Worked on
ICT and bioinformatics. PhD in
Applied Informatics in Medical
Sciences.
• Drashtti Vasant
– IT Business Partner for
Translational Sciences at
Bayer. Currently leading a
project to enable data
integration of pre-clinical
studies. Worked at the
European Bioinformatics
Institute and Thomson Reuters.
Alexandra Grebe de Barron and Drashtti Vasant are known as the “FAIR Ladies” at Bayer
Cost effective FAIR
James Malone
@scibite
@jamesmalone
More than the sum of its parts
https://eventhorizontelescope.org/
How do we make all of the components actually
happen? Where are the pinch points?
The Cost of unFAIR
• Cost of not doing FAIR – the cost of lost opportunity – is very high
• May 2018 EC cost-benefit report estimated the missed opportunity to be >€10 billion per year
• Suggests barriers persist:
“The fact that the FAIR principles
are not common practice yet is due
to numerous reasons.”
“Despite the significant annual cost…many research performing organisations and
infrastructures are still reluctant to apply the FAIR principles and share the datasets
because of real or perceived costs, mostly related to time investment and money.”
Across Industries
• Life sciences is a good starting point because so much of its data is open
• But this is not just a life sciences problem
• The problem persists even across organisations that do not open their data
Technical Debt
• There exists a lot of historic data with intrinsic value
• Q: Is tomorrow’s data always going to be more valuable than today’s?
• Automating as much of this as possible seems to be the sweet spot for historic data
• Retrospective, manual curation is expensive and likely impossible:
• much of the metadata is missing
• data generators have moved on
• commercial technology is no longer supported
• These challenges teach us why prospective FAIR is valuable
Budgeting for Serendipity
• Structuring data for reuse should open up possibilities we
can’t conceive today
• Ishino et al. (1987) reported repeat sequences that were
accidentally cloned as part of gene sequencing work
• Mojica et al. (1993) is often the go-to first publication on
CRISPR, but the connection with the Ishino work was made only after
‘trawling the literature’^
• Value of hypothesis-free + hypothesis-driven research
• Data needs to be ‘broadly reusable’ to increase the
opportunity now and in future
https://www.broadinstitute.org/what-broad/areas-focus/project-spotlight/crispr-timeline
^https://www.cell.com/cell/pdf/S0092-8674(15)01705-5.pdf
Cost of Representing Biology
• “Machine readable” representations get very complex, very quickly
• Knowing the future use up front is very hard: what do we represent?
The EBI RDF Linked Data Platform
• Spectrum of semantics - knowing up front the future use is very hard
• Schema.org vs OWL modeling
• FAIR is not simply a ‘rebranding’ of the semantic web (Mons et al., 2017)
• What can we justifiably simplify vs what is unsimplifiable
• Coordination took real effort (plus other cost to transform, maintain)
• Significant coordination activity even across 6 groups (and big
advantage that UniProt RDF already existed and we had previously
worked on Atlas RDF)
• Was really only achievable with minimum budget because the data
was already well annotated
• (Does not mean we shouldn’t try.)
Cost of Culture Change
• Curation has always been an underfunded, underappreciated research
activity
• Most value is in producing data, summary analysis, actionable insights
• Peer review already has ‘issues’
• Investing in technology necessary but not sufficient
• People require investment
• Involve data generators in these conversations
FAIR as a Machine Learning Enabler
• Creating training data, wrangling it, etc. is one of the
biggest parts of ML
• Labeled training and test sets are a crucial step and need
generating or obtaining
• Creating a new data set (e.g. subsetting a few different
sets to create a new one) is also expensive
• FAIR can help to:
1. Get you the data in the first place
2. Help you understand how you can use it (i.e. what is the
license)
3. Perform feature extraction by making those features more
readily extractable
4. Incorporate domain heuristics (e.g. from ontologies used
to describe data)
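Point 4 can be illustrated with a toy example. The sketch below assumes a tiny, made-up child-to-parent hierarchy (`PARENT`) standing in for a real ontology, and the helper names `ancestors` and `ontology_features` are hypothetical. It shows one way an ontology annotation could be expanded into its ancestor terms so that related samples share features.

```python
# Toy stand-in for an ontology: each term maps to a single parent.
# Real ontologies are larger and usually allow several parents per term.
PARENT = {
    "hepatocyte": "epithelial cell",
    "epithelial cell": "cell",
    "neuron": "neural cell",
    "neural cell": "cell",
}


def ancestors(term):
    """Walk up the hierarchy and collect every ancestor of a term."""
    found = []
    while term in PARENT:
        term = PARENT[term]
        found.append(term)
    return found


def ontology_features(annotation):
    """Expand one annotation into the term itself plus all its ancestors."""
    return {annotation, *ancestors(annotation)}


liver_sample = ontology_features("hepatocyte")
brain_sample = ontology_features("neuron")

# Terms shared by both samples only become visible after the expansion.
shared = liver_sample & brain_sample
print(sorted(liver_sample))
print(sorted(shared))
```

Without the hierarchy, "hepatocyte" and "neuron" are just two unrelated strings; with it, a model can also see what the two annotations have in common.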
Cost effective ways to think about FAIR
• Ask third party vendors you use if they support FAIR (and how) – includes
technology providers through to CROs
• Agree on your metadata standards across org and stick to them
• Involve data generators in your discussions(!)
• If you/your group are wrangling data for machine learning, think about
‘putting back’ the clean-up you do
• Let any license/data usage live with the data
• If you are developing knowledge graphs, think about the schema you design
• For data capture, think about hooking up to existing ontology standards
where suitable
• Automate annotation where feasible using technology
Cost scale: $ to $$
Increasingly FAIR
• Ensure FAIR data is shared across an organization to demonstrate
value
• Fund public curation in support of FAIR
• Use of FAIR-compatible metadata in ELNs
• Mandate minimum metadata for every experiment (requires
automated FAIR metric tests)
Cost scale: $$ to $$$
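A mandated metadata minimum is only enforceable if it can be checked automatically. The sketch below is one minimal way to do that, not a standard FAIR metric test. The field list in `REQUIRED_FIELDS`, the function name `missing_metadata`, and the example record are all assumptions made for illustration; a real list would come from the organisation's agreed metadata standard.

```python
from typing import List

# Hypothetical minimum metadata fields for every experiment.
REQUIRED_FIELDS = ("identifier", "title", "license", "creator", "created")


def missing_metadata(record: dict) -> List[str]:
    """Return the required fields that are absent or empty in a record."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = record.get(name)
        # Treat None, empty strings and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


experiment = {
    "identifier": "https://example.org/experiment/0001",  # made-up identifier
    "title": "Dose response assay",
    "license": "",  # present but empty, so it counts as missing
    "creator": "A. Scientist",
}

print(missing_metadata(experiment))  # ['license', 'created']
```

A check like this could run when an experiment is saved in an ELN, so gaps are caught while the data generator is still available to fill them.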
Each step brings cost and benefit: the objective should be to
reach the resolution you need to make
sense of the data
FAIRness as a cost-based measurement
How to assess FAIRness of a data source?
When is a dataset FAIR enough?
Filip Pattyn, PhD
Filip.pattyn@ontoforce.com
Simple as counting the principles?
Data source 1: ☐ F1. ☐ F2. ☐ F3. ☐ F4. ☐ A1. ☐ A1.1. ☐ A1.2. ☐ A2. ☐ I1. ☐ I2. ☐ I3. ☐ R1. ☐ R1.1. ☐ R1.2. ☐ R1.3. (total count)
Data source 2: ☐ F1. ☐ F2. ☐ F3. ☐ F4. ☐ A1. ☐ A1.1. ☐ A1.2. ☐ A2. ☐ I1. ☐ I2. ☐ I3. ☐ R1. ☐ R1.1. ☐ R1.2. ☐ R1.3. (total count)
Difficult to compare
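A small sketch of why a raw count is hard to compare. The two sets of fulfilled principles below are invented for illustration, and `fair_count` is a hypothetical helper, not an established FAIR metric.

```python
PRINCIPLES = [
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
]


def fair_count(fulfilled):
    """Count how many of the 15 principles a data source ticks off."""
    return sum(1 for principle in PRINCIPLES if principle in fulfilled)


# Made-up assessments of two data sources.
source_1 = {"F1", "F2", "F4", "A1", "A1.1", "R1.1"}
source_2 = {"F1", "A1", "I1", "I2", "I3", "R1.2"}

count_1 = fair_count(source_1)
count_2 = fair_count(source_2)

# Same total, but largely different principles behind it.
only_in_1 = sorted(source_1 - source_2)
only_in_2 = sorted(source_2 - source_1)

print(count_1, count_2)
print(only_in_1)
print(only_in_2)
```

Both sources score the same, yet one is mostly findable and the other mostly interoperable, so the total says little about which one fits a given use case.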
How to measure FAIRness?
• Measuring FAIRness
–Give a clear definition of what is being measured and why one
wants to measure it
–Describe what a valid result is and how one obtains it, so that it is
reproducible
• Qualities of a good measurement
–Able to distinguish differences
What’s the rationale behind FAIR?
• (Re-)use data for multiple purposes
• What’s the impact for the end-user? Who’s the audience?
• More FAIRness should mean fewer hurdles to solve a use case
When is a dataset FAIR or FAIR enough?
• Propagation of FAIRness
–I2. (meta)data use vocabularies that follow FAIR principles
More FAIR means less effort
• What’s the effort needed to make a data source more FAIR so one
can solve a single or multiple use cases?
• Effort quantified as a cost
–Time
–Human and machine resources
• Unit of measure
–Price ($)
• Potential to calculate the Return On Investment (ROI) on FAIR data
–Who benefits when a data source is more FAIR? Those who no longer have to make the
effort.
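The cost-based view above can be written down as simple arithmetic. The sketch below assumes effort is priced as hours times an hourly rate and uses the usual definition of ROI, (benefit - investment) / investment. All figures and the function names `effort_cost` and `roi` are invented for illustration.

```python
def effort_cost(hours: float, hourly_rate: float) -> float:
    """Price an effort in money: time spent multiplied by the cost of that time."""
    return hours * hourly_rate


def roi(benefit: float, investment: float) -> float:
    """Return on investment as a ratio; 1.5 means a 150% return."""
    if investment <= 0:
        raise ValueError("investment must be positive")
    return (benefit - investment) / investment


# Made-up example: one team spends 40 hours making a data source more FAIR.
investment = effort_cost(hours=40, hourly_rate=100.0)

# Made-up benefit: five later use cases each avoid 20 hours of data wrangling.
saved_per_use_case = effort_cost(hours=20, hourly_rate=100.0)
benefit = 5 * saved_per_use_case

print(investment)                 # 4000.0
print(benefit)                    # 10000.0
print(roi(benefit, investment))   # 1.5
```

The same calculation with a single use case gives a negative ROI, which matches the point that the return depends on how many consumers no longer have to repeat the effort.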
More FAIR means less effort
[Figure labels: transformations; more FAIR; application; graphical UI; API]
Different types of effort
[Figure labels: transformations; more FAIR; application; graphical UI; API]
FAIR enough means less effort
[Figure labels: application; graphical UI; API]
• ROI of FAIR enough data
• Data Consumers can
–solve use cases that couldn’t be
solved before
–solve use cases with much less
effort
FAIR enough to bring value
[Figure: cost over time, showing a 1st and a 2nd effort, each followed by maintenance, against the value of the 1st effort and of the 1st & 2nd efforts for data scientists and end users]
Food for thought
Price vs. time of data transformations → unit of cost
–Faster with a more expensive, skilled data scientist
–Slower with a less expensive, junior data scientist
–Manual vs. automated
[Figure: resources vs. time, contrasting “fast but expensive” with “slow but inexpensive”]
Food for thought
Data source FAIRness evolution
[Figure: FAIRness ($) axis; labels: data generation, initial use cases A, new use cases B, new use cases C, data generation, initial use cases D, new use cases E, new use cases F, technological advancements]
FAIRness as a cost-based measurement
• Pragmatic, no over-engineering
• Use case and user oriented, and dependent on both → not fixed
• Ratio scale
• Calculate ROI of FAIRness
Consensus units of cost
Hans Constandt
Bérénice Wulbrecht
Kenny Knecht, PhD
Paul Vauterin, PhD
Filip Pattyn, PhD
filip@ontoforce.com
+32 486 739 129
www.disqover.com
www.ontoforce.com
Benefits and costs of
FAIR implementation
for the life sciences
industry
Pistoia Alliance Debates
May 2019
FAIR ladies:
Alexandra Grebe de Barron
Drashtti Vasant
Why FAIR in Pharma
[Figure labels: scientific discovery; medical care; chemical structure; EHR; AI]
Digitalization to overcome the gap towards translational medicine
Not having FAIR research data costs the European economy
10.2 - 26 bn EUR every year
(3 - 9 % of all research expenditure)
Written by PwC: https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1
Impact on research activities:
• Time spent on data collection, integration, analysis, registration,
publication and indexing
• Cost of storage for duplicated data
• Licence fees due to lack of open access to FAIR data
Impact on collaboration:
• Redundant research
• Lack of clarity about licenses and data use conditions
• Cross-fertilization
Impact on innovation:
• Develop innovative services
• Create new business models
• Number of patents filed
• Use of machine science
• Job creation
Allocation of 2.5% of R&D expenditure to FAIR
implementation would yield a positive ROI.
When are we done with it? Never - as long as we innovate.
How much does it cost to make all our R&D data FAIR by 2022? Depends on the use case.
Cost of FAIR implementation:
• Make legacy data FAIR
• Make data generation FAIR
• Create awareness, educate, change mindset, incentivise
• Set up FAIR ecosystem
FAIR ecosystem: deliverables of a FAIR data service team
(as described in the FAIR action plan)
FAIR digital objects
data/metadata
software/code/algorithms
protocols
models
licenses
other research outputs
FAIR components
skills and investment
policies
data mgmt plans (DMPs)
persistent identifiers
standards
metrics
FAIR services
curation and stewardship
data lifecycle management
long-term preservation
file format transformation
data protection / security
handover plans for discontinued services
Skills needed to support the implementation of FAIR
The FAIR data service team
– Business Analyst (Strategic mindset)
– Curator/Domain Expert
– Service Engineer/Developer
– Data/Ontology Engineer
– Product Manager
– System Architect
– Data Steward
– Data Scientist
Business Value
Benefits of FAIR Data
• Innovation
• Better prediction
• Reduced trial length
• Early market access
• Generate insights
Case Study 1: PORTIN - Bayer
Game Changer within translational data integration. Platform for access to clinical,
biomarker and biosample data from Bayer-sponsored interventional and non-
interventional clinical trials.
Easy access to all available clinical, biomarker and biosample data.
All data are semantically integrated within a common repository.
Data privacy questions and informed consents are considered
appropriately and in context.
PORTIN is agnostic to data sources, types, or variety of data owners.
It enables scientists to search for patient cohorts within or across studies.
Reduced FTEs
Additional revenue generated (insights)
Savings on hardware costs = ~3 million € p.a. until phase 2 (predicted profit: 350 million € after phase 3)
Case Study 2: IMI eTOX
The eTOX project broke ground in that it enabled pharmaceutical companies to share their
data on the toxicity of drug-like compounds for the first time on a large scale. This resulted
in the creation of a large database, which can now be mined for further insights, including
predictions on whether or not a particular compound is likely to have an adverse effect on
patients.
Tox studies data and in silico models expected to:
enable 10% spend reduction for 1% of INDs and enable better decisions
enable 10% spend reduction for 10% target and candidate selection and lead optimization
Overall expected impact in 5 years = ~82 million euros
[Figure: IMI impact expressed in terms of value of R&D project, probability of success, development cost, direct product outputs, investment into IMI projects, and reach, relevance, reputation]
Case Study 3: IMI AETIONOMY
The AETIONOMY consortium chose to seek molecular characteristics of Alzheimer’s
disease (AD) and Parkinson’s disease (PD) that might contribute to a ‘taxonomy’ of
these conditions, and help our community move towards a precision-medicine
approach.
The project has developed innovative computational tools to manage and interpret
the complex healthcare and research data environment.
Identified groups of patients that differ significantly from each other.
New information about both diseases
Insights into new disease models
Evaluate new data mining approaches
Validate new mechanistic disease hypotheses
Summary
Research is a key driver of productivity and economic growth
Redundant research does not contribute to science
Collaboration, especially public-private, is the KEY for
successful research output and innovation
The time, storage and license-fee costs that researchers incur to
manually read and understand metadata could drop to
almost zero with FAIR data
A sustainable FAIR ecosystem is the foundation for
advanced data analytics and AI
Change in mindset is a battle half won
Data and service providers need to be part of the change
Puts the “patient” in the center
Audience Q&A
Please use the Question function in GoToMeeting
Data for AI Models:
The Past, The Present, The Future
Join us for the next Pistoia Alliance AI Center of Excellence
webinar:
Presented by:
Prof. John Overington, CIO of the Medicines Discovery Catapult
Thursday June 6th, 11 am EST/ 4pm GMT
info@pistoiaalliance.org @pistoiaalliance www.pistoiaalliance.org

Ravak dropshipping via API with DroFx.pptx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

PA webinar on benefits & costs of FAIR implementation in life sciences

  • 1. Benefits and costs of FAIR Implementation for the life sciences industry Moderated by: Ian Harrow (Pistoia Alliance) Panelists: James Malone (SciBite) Filip Pattyn (OntoForce) Alexandra Grebe de Barron (Bayer) Drashtti Vasant (Bayer)
  • 2. This webinar is being recorded
  • 3. ©PistoiaAlliance FAIR Guiding Principles at-a-Glance Findable: • F1 (meta)data are assigned a globally unique and persistent identifier • F2 data are described with rich metadata • F3 metadata clearly and explicitly include the identifier of the data it describes • F4 (meta)data are registered or indexed in a searchable resource Interoperable: • I1 (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation • I2 (meta)data use vocabularies that follow FAIR principles • I3 (meta)data include qualified references to other (meta)data Accessible: • A1 (meta)data are retrievable by their identifier using a standardized communications protocol • A1.1 the protocol is open, free, and universally implementable • A1.2 the protocol allows for an authentication and authorization procedure, where necessary • A2 metadata are accessible, even when the data are no longer available Reusable: • R1 (meta)data are richly described with a plurality of accurate and relevant attributes • R1.1 (meta)data are released with a clear and accessible data usage license • R1.2 (meta)data are associated with detailed provenance • R1.3 (meta)data meet domain-relevant community standards Source: The FAIR Guiding Principles for scientific data management and stewardship. Wilkinson MD et al 2016 doi.org/10.1038/sdata.2016.18
  • 4. Poll Question 1 Where is your workplace? A) A biopharmaceutical company B) An agriculture or food company C) A technology provider D) An academic institution E) Other
  • 5. Poll Question 2 How mature is FAIR implementation in your workplace? A) Minimal understanding of FAIR guidelines B) Good understanding but minimal FAIR implementation C) FAIR implementation is well underway D) Mature FAIR implementation in selected areas of my organisation E) Mature and systematic implementation of FAIR across my organisation
  • 6. Our Expert Panel • James Malone – CTO at SciBite, a semantic technology company. Previously, Lead ontologist at EMBL-EBI. Worked on Open Targets & EBI’s linked data platform. PhD in Machine Learning in Bioinformatics. • Alexandra Grebe de Barron – IT business partner for Real World Evidence at Bayer. Works closely with scientists across all functions to make data FAIR for advanced analytics. PhD in Molecular Genetics. • Filip Pattyn – Scientific lead at ONTOFORCE, a semantic technology company. Previously, a Consultant at Menapi Informatics. Worked on ICT and bioinformatics. PhD in Applied Informatics in Medical Sciences. • Drashtti Vasant – IT Business Partner for Translational Sciences at Bayer. Currently leading a project to enable data integration of pre-clinical studies. Worked at the European Bioinformatics Institute and Thomson Reuters. • Alexandra and Drashtti are known as the “FAIR Ladies” at Bayer.
  • 7. Cost-effective FAIR. James Malone (@scibite, @jamesmalone)
  • 8. More than the sum of its parts https://eventhorizontelescope.org/
  • 9. How do we make all of the components actually happen? Where are the pinch points?
  • 10. The Cost of unFAIR • Cost of not doing FAIR – the cost of lost opportunity – is very high • May 2018 EC cost-benefit report estimated the missed opportunity to be >€10 billion • Suggests barriers persist: “The fact that the FAIR principles are not common practice yet is due to numerous reasons.” “Despite the significant annual cost…many research performing organisations and infrastructures are still reluctant to apply the FAIR principles and share the datasets because of real or perceived costs, mostly related to time investment and money.”
  • 11. Across Industries • Life sciences is a good starting point because so much of its data is open • But not just a life science problem • The problem persists even across organisations that do not open their data
  • 12. Technical Debt • There exists a lot of historic data with intrinsic value • Q: Is tomorrow’s data always going to be more valuable than today’s? • Automating as much of this as possible seems to be the sweet spot for historic data • Retrospective, manual curation is expensive and likely impossible: much of the metadata is missing; data generators have moved on; commercial technology is no longer supported • These challenges teach us why prospective FAIR is valuable
  • 13. Budgeting for Serendipity • Structuring data for reuse should open up possibilities we can’t conceive today • Ishino et al (1987) reported repeat sequences that were accidentally cloned as part of gene sequencing work • Mojica et al (1993) is often the go-to first publication on CRISPR, but the connection with the Ishino work was made only after ‘trawling literature’^ • Value of hypothesis-free + hypothesis-driven research • Data needs to be ‘broadly reusable’ to increase the opportunity now and in future https://www.broadinstitute.org/what-broad/areas-focus/project-spotlight/crispr-timeline ^https://www.cell.com/cell/pdf/S0092-8674(15)01705-5.pdf
  • 14. Cost of Representing Biology • “Machine readable” representations get very complex, very quickly • Knowing the future use up front is very hard: what do we represent?
  • 15. The EBI RDF Linked Data Platform • Spectrum of semantics - knowing the future use up front is very hard • Schema.org vs OWL modeling • FAIR is not simply a ‘rebranding’ of the semantic web (Mons et al, 2017) • What can we justifiably simplify vs what is unsimplifiable • Coordination took real effort (plus other costs to transform and maintain) • Significant coordination activity even across 6 groups (with the big advantage that UniProt RDF already existed and we had previously worked on Atlas RDF) • Was really only achievable with a minimum budget because the data was already well annotated • (Does not mean we shouldn’t try.)
  • 16. Cost of Culture Change • Curation has always been an underfunded, underappreciated research activity • Most value is in producing data, summary analysis, actionable insights • Peer review already has ‘issues’ • Investing in technology is necessary but not sufficient • People require investment • Involve data generators in these conversations
  • 17. FAIR as a Machine Learning Enabler • Creating training data, wrangling it, etc. is one of the biggest parts of ML • Labeled training & test sets are a crucial step and need generating or obtaining • Creating a new data set (e.g. subsetting a few different sets to create a new one) is also expensive • FAIR can help to: 1. Get you the data in the first place 2. Help you understand how you can use it (i.e. what the license is) 3. Perform feature extraction by making those features more readily extractable 4. Incorporate domain heuristics (e.g. from ontologies used to describe data)
  • 18. Cost effective ways to think about FAIR • Ask third party vendors you use if they support FAIR (and how) – includes technology providers through to CROs • Agree on your metadata standards across the org and stick to them • Involve data generators in your discussions(!) • If you/your group are wrangling data for machine learning, think about ‘putting back’ the clean-up they do • Let any license/data usage live with the data • If you are developing knowledge graphs, think about the schema you design • For data capture, think about hooking up to existing ontology standards where suitable • Automate annotation where feasible using technology (Cost scale on slide: $ to $$)
  • 19. Increasingly FAIR • Ensure FAIR data is shared across an organization to demonstrate value • Fund public curation in support of FAIR • Use of FAIR-compatible metadata in ELNs • Mandate minimum metadata for every experiment (requires automated FAIR metric tests) (Cost scale on slide: $$ to $$$)
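
The “automated FAIR metric tests” mentioned on slide 19 could, in their simplest form, be a check that every experiment record carries an agreed minimum set of metadata fields. The sketch below is only an illustration of that idea: the field names in `REQUIRED_FIELDS`, the example record and the `example.org` identifier are assumptions made here, not a standard named in the webinar.

```python
# Hypothetical minimum-metadata policy; the field names are illustrative assumptions.
REQUIRED_FIELDS = ("identifier", "title", "license", "provenance", "vocabulary")


def missing_metadata(record: dict) -> list:
    """Return the required fields that are absent, None, or blank in the record."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def passes_minimum_metadata(record: dict) -> bool:
    """True when no required field is missing."""
    return not missing_metadata(record)


# Made-up record, as it might be exported from an ELN.
experiment = {
    "identifier": "https://example.org/experiments/0001",  # placeholder identifier
    "title": "Dose-response assay",
    "license": "",  # blank, so it counts as missing
    "provenance": "ELN entry exported by instrument software",
}

if __name__ == "__main__":
    print(missing_metadata(experiment))
```

A check of this kind only tests for presence; whether a value is a persistent identifier or a term from a FAIR vocabulary would need further, domain-specific tests.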
  • 20. Each step brings cost & benefit: the objective should be to produce the resolution you need to make sense of the data
  • 21. FAIRness as a cost-based measurement. How to assess FAIRness of a data source? When is a dataset FAIR enough? Filip Pattyn, PhD, Filip.pattyn@ontoforce.com
  • 22. Simple as counting the principles? [Slide shows the same checklist for Data source 1 and Data source 2, each ending in a total count: F1, F2, F3, F4, A1, A1.1, A1.2, A2, I1, I2, I3, R1, R1.1, R1.2, R1.3]
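
The checklist on slide 22 can be written down as a naive score: tick the principles a data source satisfies and count them. The sketch below does exactly that, with two made-up checklists, to show one reason a raw count may be too coarse: two sources can reach the same total while satisfying entirely different principles. The example sets are assumptions for illustration, not assessments of real data sources.

```python
# The 15 FAIR principle identifiers as listed on slide 3.
FAIR_PRINCIPLES = (
    "F1", "F2", "F3", "F4",
    "A1", "A1.1", "A1.2", "A2",
    "I1", "I2", "I3",
    "R1", "R1.1", "R1.2", "R1.3",
)


def count_principles(satisfied: set) -> int:
    """Naive FAIRness score: the number of principles ticked on the checklist."""
    unknown = set(satisfied) - set(FAIR_PRINCIPLES)
    if unknown:
        raise ValueError(f"Unknown principle identifiers: {sorted(unknown)}")
    return len(set(satisfied))


# Made-up checklists for two data sources.
data_source_1 = {"F1", "F2", "A1", "I1"}
data_source_2 = {"F4", "A2", "R1.1", "R1.2"}

if __name__ == "__main__":
    # Same total count for both sources...
    print(count_principles(data_source_1), count_principles(data_source_2))
    # ...although they share no satisfied principle.
    print(sorted(data_source_1 & data_source_2))
```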
  • 24. How to measure FAIRness? • Measuring FAIRness – Clear definition of what is being measured and why one wants to measure it – Describe what a valid result is and how one obtains it, so that it is reproducible • Qualities of a good measurement –: able to distinguish differences
  • 25. What’s the rationale behind FAIR? • (Re-)use data for multiple purposes • What’s the impact for the end-user? Who’s the audience? • More FAIRness should mean fewer hurdles to solve a use case
  • 26. When is a dataset FAIR or FAIR enough? • Propagation of FAIRness – I2. (meta)data use vocabularies that follow FAIR principles
  • 27. More FAIR means less effort • What’s the effort needed to make a data source more FAIR so one can solve a single or multiple use cases? • Effort quantified as a cost – Time – Human and machine resources • Unit of measure – Price ($) • Potential to calculate the Return On Investment (ROI) on FAIR data – Who benefits when a data source is more FAIR? They no longer have to make the effort themselves.
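
Slide 27 quantifies effort as a price and suggests calculating ROI from it. A minimal sketch of that arithmetic follows, assuming the benefit is the effort that data consumers no longer have to spend; the hours, rates and number of consumer teams are invented for illustration, and the standard ROI formula (benefit minus cost, divided by cost) is used here rather than any formula given in the webinar.

```python
def effort_cost(hours: float, hourly_rate: float, machine_cost: float = 0.0) -> float:
    """Effort expressed as a price: human time plus machine resources."""
    return hours * hourly_rate + machine_cost


def roi(benefit: float, cost: float) -> float:
    """Return on investment as a ratio: (benefit - cost) / cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return (benefit - cost) / cost


# Hypothetical numbers: one-off effort to make a data source more FAIR.
fairification_cost = effort_cost(hours=40, hourly_rate=100, machine_cost=500)

# Hypothetical benefit: three consumer teams each avoid 30 hours of data wrangling.
avoided_effort = 3 * effort_cost(hours=30, hourly_rate=100)

if __name__ == "__main__":
    print(fairification_cost, avoided_effort, roi(avoided_effort, fairification_cost))
```

With these made-up inputs the one-off cost is 4500, the avoided effort is 9000 and the ROI is 1.0, that is, the investment is recovered twice over.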
  • 28. More FAIR means less effort [Diagram labels: transformations; more FAIR; application; graphical UI; API]
  • 29. Different types of effort [Diagram labels: transformations; more FAIR; application; graphical UI; API]
  • 30. FAIR enough means less effort • ROI of FAIR enough data • Data consumers can – solve use cases that couldn’t be solved before – solve use cases with much less effort [Diagram labels: application; graphical UI; API]
  • 31. FAIR enough to bring value [Chart of cost over time; labels: 1st effort, 2nd, 1st & 2nd effort, maintenance, value, end-users, data scientists]
  • 32. Food for thought: price vs. time of data transformations > unit of cost – Faster with a more expensive, skilled data scientist – Slower with a less expensive junior data scientist – Manual vs. automated [Chart of resources against time, from “fast but expensive” to “slow but inexpensive”]
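
The price-versus-time trade-off on slide 32 can be made concrete by pricing each way of doing a transformation and picking the cheapest one that still meets a deadline. This is a sketch under invented numbers: the `Option` class, the hours and the hourly rates are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Option:
    """One way of carrying out a data transformation (hypothetical figures)."""
    name: str
    hours: float
    hourly_rate: float

    @property
    def price(self) -> float:
        return self.hours * self.hourly_rate


def cheapest_within(options: Sequence[Option], max_hours: float) -> Optional[Option]:
    """Cheapest option that finishes within max_hours, or None if none does."""
    feasible = [o for o in options if o.hours <= max_hours]
    return min(feasible, key=lambda o: o.price) if feasible else None


OPTIONS = [
    Option("senior data scientist", hours=10, hourly_rate=150),  # fast but expensive
    Option("junior data scientist", hours=40, hourly_rate=30),   # slow but inexpensive
]

if __name__ == "__main__":
    for deadline in (20, 50):
        choice = cheapest_within(OPTIONS, deadline)
        print(deadline, choice.name if choice else None)
```

With a tight deadline only the faster, more expensive option is feasible; with a loose one the slower, cheaper option wins, which is why the unit of cost has to account for both price and time.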
  • 33. Food for thought: data source FAIRness evolution [Chart of FAIRness ($); labels: data generation, initial use cases, new use cases, points A to F, technological advancements]
  • 34. FAIRness as a cost-based measurement • Pragmatic, no over-engineering • Use-case and user oriented (and dependent), so not fixed • Ratio scale • Calculate ROI of FAIRness • Consensus units of cost. Hans Constandt; Bérénice Wulbrecht; Kenny Knecht, PhD; Paul Vauterin, PhD; Filip Pattyn, PhD, filip@ontoforce.com, +32 486 739 129, www.disqover.com, www.ontoforce.com
  • 35. Benefits and costs of FAIR implementation for the life sciences industry. PISTOIA alliance debates, May 2019. FAIR ladies: Alexandra Grebe de Barron, Drashtti Vasant
  • 36. Why FAIR in Pharma: digitalization to overcome the gap towards translational medicine [Diagram labels: scientific discovery; medical care; EHR; AI; a chemical structure]
  • 37. 3 - 9 % of all research expenditure: not having FAIR research data costs the European economy 10.2 - 26 bn EUR every year. Written by PwC: https://publications.europa.eu/en/publication-detail/-/publication/d375368c-1a0a-11e9-8d04-01aa75ed71a1 • Impact on research activities: time spent on data collection, integration, analysis, registration, publication and indexing; cost of storage for duplicated data; licence fees due to lack of open access to FAIR data • Impact on collaboration: redundant research; lack of clarity about licenses and data use conditions; cross-fertilization • Impact on innovation: develop innovative services; create new business models; number of patents filed; use of machine science; job creation • Allocation of 2.5% of R&D expenditure into FAIR implementation would yield a positive ROI.
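
A quick way to read the figures on slide 37 together: if non-FAIR data costs 3 to 9% of research expenditure and 2.5% is invested in FAIR implementation, the investment breaks even once it recovers a share of that cost equal to the investment divided by the cost. The sketch below works this out for both ends of the range. It assumes the 3 to 9% figure can be applied as a loss rate to the same expenditure base as the 2.5% investment, which is a simplification made here and not the method of the cited report.

```python
def break_even_share(investment_pct: float, unfair_cost_pct: float) -> float:
    """Share of the non-FAIR cost that must be recovered for ROI to reach zero."""
    if unfair_cost_pct <= 0:
        raise ValueError("unfair_cost_pct must be positive")
    return investment_pct / unfair_cost_pct


INVESTMENT_PCT = 2.5                # from the slide: 2.5% of R&D expenditure
UNFAIR_COST_RANGE_PCT = (3.0, 9.0)  # from the slide: 3 - 9% of research expenditure

if __name__ == "__main__":
    for cost_pct in UNFAIR_COST_RANGE_PCT:
        share = break_even_share(INVESTMENT_PCT, cost_pct)
        print(f"non-FAIR cost {cost_pct:.0f}% -> break-even at {share:.0%} recovered")
```

Under this simplification the investment pays off if it recovers roughly five sixths of the loss at the low end of the range, or a little over a quarter at the high end.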
  • 38. When are we done with it? Never - as long as we innovate. How much does it cost to make all our R&D data FAIR by 2022? Depends on the use case. Cost of FAIR implementation: • Make legacy data FAIR • Make data generation FAIR • Create awareness, educate, change mindset, incentivise • Set up FAIR ecosystem
  • 39. FAIR ecosystem: deliverables of a FAIR data service team (as described in the FAIR action plan) • FAIR digital objects: data/metadata; software/code/algorithms; protocols; models; licenses; other research outputs • FAIR components: skills and investment; policies; data management plans (DMPs); persistent identifiers; standards; metrics • FAIR services: curation and stewardship; data lifecycle management; long-term preservation; file format transformation; data protection / security; handover plans for discontinued services
  • 40. The FAIR data service team: skills needed to support the implementation of FAIR – Business Analyst (strategic mindset) – Curator/Domain Expert – Service Engineer/Developer – Data/Ontology Engineer – Product Manager – System Architect – Data Steward – Data Scientist
  • 41. Business Value: Benefits of FAIR Data • Innovation • Better prediction • Reduced trial length • Early market access • Generate insights
  • 42. PORTIN - Bayer Case Study 1 • Game changer within translational data integration • Platform for access to clinical, biomarker and biosample data from Bayer-sponsored interventional and non-interventional clinical trials • Easy access to all available clinical, biomarker and biosample data • All data are semantically integrated within a common repository • Data privacy questions and informed consents are considered appropriately and in context • PORTIN is agnostic to data sources, types, or variety of data owners • It enables scientists to search for patient cohorts within or across studies • Reduced FTEs, additional revenue generated (insights) and savings on hardware costs = ~3 million € per year until phase 2 (predicted profit: 350 million € after phase 3)
  • 43. IMI eTox - Case Study 2 • The eTOX project broke ground in that it enabled pharmaceutical companies to share their data on the toxicity of drug-like compounds for the first time on a large scale. This resulted in the creation of a large database, which can now be mined for further insights, including predictions on whether or not a particular compound is likely to have an adverse effect on patients. • Tox studies data and in silico models are expected to: enable a 10% spend reduction for 1% of INDs and enable better decisions; enable a 10% spend reduction for 10% of target and candidate selection and lead optimization • Overall expected impact in 5 years = ~82 million euros [Figure: IMI impact model combining value of R&D project, direct product outputs, investment into IMI projects, probability of success, development cost, and reach, relevance, reputation]
  • 44. IMI AETIONOMY - Case Study 3 • The AETIONOMY consortium chose to seek molecular characteristics of Alzheimer’s disease (AD) and Parkinson’s disease (PD) that might contribute to a ‘taxonomy’ of these conditions, and help our community move towards a precision-medicine approach. • The project has developed innovative computational tools to manage and interpret the complex healthcare and research data environment. • Identified groups of patients that differ significantly from each other • New information about both diseases • Insights into new disease models • Evaluate new data mining approaches • Validate new mechanistic disease hypotheses
  • 45. Summary • Research is a key driver of productivity and economic growth • Redundant research does not contribute to science • Collaboration, especially public-private, is the KEY for successful research output and innovation • Costs in time/storage/license fees spent by researchers to manually read and understand metadata could be brought down to almost zero by FAIR data • A sustainable FAIR ecosystem is the foundation for advanced data analytics and AI • Change in mindset is a battle half won • Data and service providers need to be part of the change • Puts the “patient” in the center
  • 46. Audience Q&A Please use the Question function in GoToMeeting
  • 47. Data for AI Models: The Past, The Present, The Future Join us for the next Pistoia Alliance AI Center of Excellence webinar: Presented by: Prof. John Overington, CIO of the Medicines Discovery Catapult Thursday June 6th, 11 am EST/ 4pm GMT