Presentation about new indicators for innovation missions, focusing on the UK mission to transform the prevention, diagnosis and treatment of chronic diseases with AI, given at the EMAEE conference, University of Sussex, 5 June 2019.
2. Introduction Data and methods Findings Conclusion
My story [Spoiler alert]
➢ Mission oriented innovation policies hold promise… and risks
How do we design them, target them and evaluate them?
➢ Natural language processing and text mining can be used to produce
indicators about mission-oriented policies
This includes simple indicators of activity in a mission field, and more complex
indicators of disciplinary crossover, entry by new actors, diversity in trajectories...
➢ We develop experimental indicators for a UK mission: Transform the
prevention, diagnosis and treatment of chronic diseases with AI
There seems to be a rationale for the mission, but significant activity is already taking
place, involving new actors and diverse discipline mixes and application domains.
■ What will this policy add?
Context
➢ Innovation missions are back in fashion [BEIS, 2017, Mazzucato,
2018]
➢ Idea: set up a challenge and provide resources to address it.
➢ Many rationales:
Productivity puzzle [Cantner & Vannuccini, 2018]
Emergence failures [Gustafsson & Autio, 2011]
Low additionality in existing instruments [Frenken, 2017]
Directionality [Aghion et al, 2014]
Technology - society alignment [Sarewitz, 2017, Schot & Steinmueller, 2018]
[AI mini-interlude]
➢ Why does AI appear in (almost) every innovation mission?
AI: Technology systems that are able to react flexibly (‘intelligently’?) to diverse
situations, thus informing or automating action [Agrawal et al, 2018, Mateos-Garcia,
2018]
■ Most recent example of a General Purpose Technology? [Cockburn et al, 2018,
Klinger et al, 2018]
Perception of some deployment failures: application is not necessarily happening where
it could create most (social) value
■ Unequal access to data and skills
■ Need to develop new processes
■ Regulatory and cultural barriers
■ Irresponsible innovation and unsuitable diffusion
Innovation missions could help diffuse AI responsibly into lagging sectors and applications
Examples
[Figures: example missions from the European Union and the United Kingdom]
Missions are nothing new
Mowery, Nelson and Martin, 2010
Modern mission features (& challenges)
➢ Mission selection
Missions should address genuine societal challenges
Missions should be ambitious…
...and feasible
➢ Implementation
Avoid looking for technology fixes for ‘wicked’ societal challenges
Enable bottom-up experimentation and learning
Avoid capture by new or old vested interests
➢ Evaluation
Seek to change the structure of research and technology fields
Need to attribute social, health, educational… impacts
Evidence gaps
➢ Mission: Use data, Artificial Intelligence and innovation to
transform the prevention, early diagnosis and treatment of chronic
diseases by 2030
How do we find AI?
■ We have no codes for emerging technologies
How do we monitor diffusion?
■ We don’t have a taxonomy of application domains
How do we measure impact?
■ We tend to measure outputs (not impacts) and to ignore relational dimensions of activity
(field formation).
We need to go beyond traditional data sources and analytical methods to inform these new
innovation policies
New data for new policies
Nesta, 2019
Research questions
➢ Can we use open (text) data to develop policy-relevant indicators
about an innovation mission (AI + Chronic Diseases)?
Levels of activity
Mission composition
■ What disciplines are participating?
■ Who is involved?
■ What is the diversity of trajectories being pursued?
Descriptive, policy oriented analysis as part of EURITO, a Horizon 2020 project.
The paper contains many more indicators. Here we focus on those with an evolutionary flavour.
All this for a mission that was only announced recently.
Gateway to Research
➢ Data on UK research council funding from 2006 to the present.
We work with ~37,000 research grants.
[Diagram: Projects, Organisations, Outputs; predictive model trained on discipline-labelled data]
Defining a mission field
Use data, Artificial Intelligence and innovation to transform the
prevention, early diagnosis and treatment of chronic diseases by 2030
Artificial Intelligence: Technology systems that are able to react flexibly (‘intelligently’?) to diverse situations, thus informing or automating action. This generally involves the use of machine learning algorithms.
Chronic diseases:
- Take decades to become fully established
- Require a long-term and systematic approach to prevention and treatment
Examples: cardiovascular diseases, mainly heart disease and stroke; cancer; chronic respiratory diseases; and diabetes.
Looking for AI (options)
TITLE: Computer models for CRLM progression assessment based on histopathological image
scans
====
In cancer diagnosis and treatment procedures, qualitative histopathological evaluation on
microscopic samples from tumour tissue is regarded as the gold standard for confirmative
diagnosis of almost all types of cancer. For CRLM, the assessment of pathological tumour
regression after preoperative chemotherapy is mostly based on estimating the proportion
of tumour cells in relation to the total tumour area (including the latter tumour
necrosis, fibrosis and other regressive changes) as well as biologically relevant
histology features, in particular the tumour invasion front. These properties determine
pathological tumour regression or grade of response to chemotherapy and provide valuable
prognostic information on the risk for cancer progression. Currently, this
histopathological evaluation is performed by expert pathologists through visual
assessment of the tumour slides. This is often time-consuming and expensive due to the
large amount of slides to be reviewed and the limited availability of subspecialised
liver pathologists. Moreover, visual evaluations are inherently subject to inter- and
intra-observer variability, and may be unacceptably inconsistent and imprecise, with
negative impact in the actual diagnosis and future treatment planning.
This project aims at developing an intelligent computer system that enables automatic,
precise, objective and reproducible assessment of CRLM tumour regression and precise
characterisation of the tumour invasion front based on the digital scans of resected CRLM
tumour tissue slides, by integrating beyond the state-of-the-art, specifically designed
computer vision, image processing and machine learning schemes. The outcome is useful in
both clinical and research domains, by providing additional reference in the association
between tumour regression and chemotherapy treatment for prognostic purposes in clinical
practice, and enabling better understanding on the mechanism of tumour progression in the
liver - and hence gain valuable knowledge that can be used to counteract it.
We want to automatically identify
projects related to our two mission
components (AI and chronic
diseases). We do this by studying
their abstracts.
Options:
Keyword search [Cockburn et al, 2018]
✓ Intuitive and easy to explain
✖ Fragile
Topic modelling [Klinger et al, 2018]
✓ Comprehensive, probabilistic
✖ Hard to tune, interpret and evaluate
Supervised machine learning [Mann &
Puttmann, 2017]
✓ Yields probabilities
✖ Requires labelled dataset
Looking for AI (our approach)
➢ We adopt an expanded keyword search
We expand an initial seed-list of terms with semantically similar terms (synonyms)
based on a word embeddings model trained on our data [Mikolov et al, 2013]
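The expansion step can be sketched roughly as follows. This is a minimal illustration, not the code used in the study: the three-dimensional vectors, the `expand_terms` helper and the 0.95 similarity cut-off are all invented for the example, whereas in practice the vectors would come from an embeddings model trained on the project abstracts.

```python
import math

# Toy word vectors, invented for illustration only. Real vectors would come
# from a word embeddings model trained on the project abstracts.
TOY_VECTORS = {
    "machine_learning": (0.9, 0.1, 0.0),
    "deep_learning": (0.8, 0.2, 0.1),
    "neural_network": (0.85, 0.15, 0.05),
    "diabetes": (0.0, 0.9, 0.2),
    "insulin": (0.1, 0.8, 0.3),
    "medieval_history": (0.0, 0.1, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_terms(seeds, vectors, min_similarity=0.95):
    """Return the seed terms plus every vocabulary term close to a seed."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue  # a seed missing from the vocabulary cannot be expanded
        for term, vector in vectors.items():
            if cosine(vectors[seed], vector) >= min_similarity:
                expanded.add(term)
    return expanded


ai_terms = expand_terms({"machine_learning"}, TOY_VECTORS)
print(sorted(ai_terms))
```

With these toy vectors the AI seed picks up the two neighbouring AI terms and none of the disease or unrelated terms. A looser cut-off would widen the list at the cost of precision.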
Seed terms
AI: [artificial_intelligence, machine_learning…]
Chronic diseases: list of diseases crowdsourced from Wikipedia.
Similarity query
Expand list with semantically similar terms (similarity depends on window/similarity arguments)
Classification
Tag projects featuring terms in the seed or expanded list
We train multiple models based on combinations of parameters (grid search) and measure the % of models that identify a project in our categories.
We evaluate precision and recall for different thresholds of ‘model support’ for a classification. We select projects with >60% model support.
Precision: 70%, Recall: 77%
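The ‘model support’ idea can be illustrated with a small sketch. Everything here is hypothetical: the project identifiers, the five model outputs and the hand-labelled set are invented, and the function names are ours. Only the 60% threshold comes from the slide.

```python
def model_support(project_id, model_predictions):
    """Share of models whose tagged set contains the project."""
    votes = sum(project_id in tagged for tagged in model_predictions)
    return votes / len(model_predictions)


def select_projects(project_ids, model_predictions, threshold=0.6):
    """Keep projects tagged by more than `threshold` of the models."""
    return {
        p for p in project_ids
        if model_support(p, model_predictions) > threshold
    }


def precision_recall(selected, labelled_positive):
    """Precision and recall of a selection against a hand-labelled set."""
    true_positives = len(selected & labelled_positive)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(labelled_positive) if labelled_positive else 0.0
    return precision, recall


# Hypothetical outputs of five keyword models, one per parameter combination
# in the grid search. Each set holds the projects that model tagged as AI.
model_predictions = [
    {"p1", "p2", "p3"},
    {"p1", "p2"},
    {"p1", "p3", "p4"},
    {"p1", "p2", "p4"},
    {"p1", "p2", "p5"},
]
projects = {"p1", "p2", "p3", "p4", "p5", "p6"}
hand_labelled_ai = {"p1", "p2", "p3"}  # invented validation labels

selected = select_projects(projects, model_predictions)
precision, recall = precision_recall(selected, hand_labelled_ai)
print(sorted(selected), precision, round(recall, 2))
```

Repeating `precision_recall` over a range of thresholds gives the trade-off curve from which a cut-off such as 60% can be chosen.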
Looking for AI (examples)
TITLE: Managing the Data Explosion in
Post-Genomic Biology with Fast Bayesian
Computational Methods
====
ABSTRACT EXCERPT: Rapid technological advances in
molecular biology are providing an unprecedented
opportunity to investigate the basic processes of
life. This `post-genomic' phase of molecular biology
has resulted in an explosion of typically high
dimensional structured data from new technologies for
transcriptomics (microarrays), proteomics and
metabolomics. Such data requires novel mathematical,
statistical and computational methods for their
interpretation and analysis. This proposal focuses on
the development of statistical and computational
methods for the analysis of such data, using novel
approaches from the fields of machine learning and
nonparametric Bayesian statistics. The project
involves a close collaboration of scientists with
expertise in machine learning and statistics,
bioinformatics and molecular biology. The new
software tools will be developed in the context of
real-world scientific problems, such as: elucidating
signalling networks in plant stress responses;
metabolic regulation in
TITLE: An investigation into the provision of
ICT to support behavioural monitoring of
children with autism
====
ABSTRACT EXCERPT: Autism is a developmental
disorder, which usually diagnosed in children before
the age of three years old. It is estimated to
affect approximately 1 in 100 children and is
characterised by a range of behavioural excesses and
deficits in the areas of social imagination,
communication and social interaction. These
behaviours often pose significant challenges to
parents or caregivers and affect the development of
positive and adaptive skills.
A number of approaches exist that aim to reduce such
socially challenging behaviours while increasing
socially significant ones. The most notable
evidence-based approach is Applied Behaviour
Analysis (ABA). Using the core principles of ABA,
Behaviour Analysts (BAs) aim to increase/decrease
behaviours through a process of understanding the
events that occur before and after a behaviour. By
understanding these events (the function),
procedures can be implemented, which attempt to
modify the behaviour. As a result, maintaing careful
data collection are
TITLE: Intelligent and Personalised Risk
Stratification and Early Diagnosis of Lung
Cancer
====
ABSTRACT EXCERPT: Lung cancer is the second most
common cancer in both males and females, and has
a very poor prognosis, causing >35,000 of
cancer-related deaths each year (nearly 100 every
day). This is due to the mostly very late-stage
diagnosis of cancer: nearly 50% of all lung
cancer cases are only diagnosed at very late
Stage IV where no curative treatment exists. The
annual cost of lung cancer to the UK economy is
estimated to be around £2.4 billion, taking
into account the cost of treatment and premature
death, the cost to business of sick leave and of
unpaid care by friends and family. It eclipses
the cost of any other cancer, and continues to
present a significant economic and healthcare
burden. There is currently no national lung
cancer screening programme in the UK, as current
tests are deemed to be inadequate, and not
outweighing risks associated with screening that
involves radiation exposure.
In some cases we capture projects where the treatment of chronic diseases is just an application
area. In others, it is the focus of the project.
Trajectory mapping
➢ We want to analyse the composition of the AI ‘mission field’: what
are its constituent trajectories and how have they evolved over time?
Traditional technological trajectory analyses use citations [Kim and Shin, 2018]
Less suitable for us: projects do not cite each other, and citations from the papers
they generate arrive with a lag (which could create bias in any case)
We opt for a semantic approach
■ Assume that clusters of words contain components of a trajectory
We use hierarchical topic modelling [Gerlach et al, 2018]
✓ It models relations between documents and words as a network and looks for
communities in it
✓ Clusters words into topics and documents into categories
✓ Number of topics estimated automatically (vs. LDA).
Levels of activity
Is the mission necessary?
We have identified 1180 AI projects, 4390
Chronic disease projects, and 83 active mission
field projects (budget £66m).
☐ Chronic diseases are underrepresented in
AI and AI is underrepresented in chronic
diseases [all differences in proportions
significant]
Consistent with the idea of a lag in AI diffusion
that could motivate the mission.
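The slide does not say which test was used to compare proportions. One common choice is a two-proportion z-test, sketched below as an assumption. The counts in the example are invented and are not the figures from the study.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided p-value under the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented counts: chronic disease projects among AI projects (50 of 1,000)
# versus among all other projects (4,000 of 36,000).
z, p_value = two_proportion_z_test(50, 1_000, 4_000, 36_000)
print(round(z, 2), p_value)
```

A negative `z` with a small p-value would indicate that chronic disease projects are underrepresented in the first group relative to the second.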
Evolution of activity
But...
Levels of activity (projects and funding) in the active mission field have been growing rapidly for
some time.
What will be the additionality of the recently announced mission?
Composition (disciplines)
Is the mission field disciplinarily
diverse?
We use our predicted disciplines per project.
☐ Top figure classifies each project into its
‘top discipline’. The active mission field
combines relevant disciplines more than its
components (as expected)
☐ Bottom figure considers entropy in the
discipline mix for each project. Active
mission field projects tend to be more
diverse.
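Both measures can be sketched as follows, assuming each project comes with a set of predicted discipline probabilities. The discipline names and values are invented, and the base-2 logarithm is our choice, since the slide does not state one.

```python
import math


def top_discipline(discipline_probs):
    """Discipline with the highest predicted probability."""
    return max(discipline_probs, key=discipline_probs.get)


def shannon_entropy(weights):
    """Shannon entropy (in bits) of a set of non-negative weights."""
    total = sum(weights)
    if total <= 0:
        return 0.0
    entropy = 0.0
    for weight in weights:
        if weight > 0:
            share = weight / total
            entropy -= share * math.log2(share)
    return entropy


# Invented predictions for two hypothetical projects
single_discipline = {"computing": 1.0, "medicine": 0.0}
mixed_discipline = {"computing": 0.5, "medicine": 0.5}

print(top_discipline(single_discipline))
print(shannon_entropy(single_discipline.values()))
print(shannon_entropy(mixed_discipline.values()))
```

A project concentrated in one discipline scores zero, and the score rises as the predicted mix becomes more even.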
Composition (actors)
Does the mission field attract new
entrants?
We measure the number of years in the data and the number of
projects for organisations in different
categories.
☐ On average, active mission field
participants have been active longer (but
the level of aggregation is very high)
☐ When considering ‘specialist’
organisations, we find that mission field
specialists entered more recently.
The picture is more complicated than ‘mission,
new entrants’
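One reading of these two measures is sketched below. It counts ‘years in the data’ from an organisation's first grant to a reference year, which is our assumption. The organisations, years and function name are invented for illustration.

```python
from collections import defaultdict


def organisation_activity(grants, reference_year):
    """Years since first appearance and project count per organisation."""
    first_year = {}
    project_count = defaultdict(int)
    for organisation, start_year in grants:
        project_count[organisation] += 1
        earliest = first_year.get(organisation, start_year)
        first_year[organisation] = min(start_year, earliest)
    return {
        organisation: {
            "years_in_data": reference_year - first_year[organisation] + 1,
            "projects": project_count[organisation],
        }
        for organisation in first_year
    }


# Invented (organisation, start year) pairs, one per funded project
grants = [
    ("University A", 2007),
    ("University A", 2012),
    ("University A", 2018),
    ("Institute B", 2016),
    ("Institute B", 2017),
]
activity = organisation_activity(grants, reference_year=2019)
print(activity)
```

Averaging these values within each category (AI, chronic diseases, active mission field) would give the comparison described on the slide.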
Composition (trajectories)
A semantic analysis of the active
mission field
☐ We fit a hierarchical topic model on the
abstracts of the potential mission field (all
AI and chronic disease related projects).
■ The analysis reveals hierarchies of
topics and document categories in the
corpus:
■ Level 0: 423 topics, 373 clusters
■ Level 1: 62 topics, 58 clusters
■ Level 2: 16 topics, 11 clusters
■ We get a topic mix (topics and
weights) for each document (project)
[Figure: hierarchical topic model linking words (aggregated into topics) and projects (aggregated into clusters). Source: https://topsbm.github.io]
Composition (trajectories)
What are the thematic components
of the mission field?
We distinguish between AI-derived topics and
Chronic-disease derived ones.
☐ AI topics have a stronger presence in the
mission field (representing more generality
in technology / focus on digital
applications and technologies?)
■ Preliminary exploration suggests that
topics related to cancer treatment,
imaging, creation of databases have a
stronger presence.
Composition (trajectories)
How are these thematic
components evolving?
We consider the total weight of topics from
different sources, and the entropy of the yearly
topic mix in the active mission field.
☐ The set of research activities is diversified, and our
initial measure of diversity (entropy) is stable
after an initial ‘jump’, with activity increasing
since the early 2010s. Interestingly,
this precedes the ‘Deep Learning boom’
since 2012.
Recap
➢ Mission oriented innovation policies hold promise… and risks
How do we design them, target them and evaluate them?
➢ Natural language processing and text mining can be used to produce
indicators about mission-oriented policies
This includes simple indicators of activity in a mission field, and more complex
indicators of disciplinary crossover, entry by new actors, diversity in trajectories...
➢ We develop experimental indicators for a UK mission: Transform the
prevention, diagnosis and treatment of chronic diseases with AI
There seems to be a rationale for the mission, but significant activity is already taking
place, involving new actors and diverse discipline mixes and application domains.
■ What will this policy add?
Limitations
➢ We rely on research funding data
What about patents and business activity?
➢ We need to motivate our selection of parameters
...but the results that we reported are robust to changes
➢ Gateway to Research abstracts are short and generic
...but using publications from projects would make our analysis less timely
➢ Our ‘controls’ are imperfect
We have compared the active mission field with AI and Chronic Diseases - can we use
another applied control to compare patterns of activity?
➢ Organisations captured at a high level of aggregation
Difficult to identify novel entrants with university-level information: use individual-level
data?
Next steps
➢ Put all the code and data in https://github.com/nestauk
➢ Validate results with domain experts and policy-makers
In particular, adapt the chronic disease definition to reflect UK policy goals
➢ Incorporate new data sources and (network) indicators
➢ Monitor the effects of the UK AI mission
Does the mission change the observed trends and patterns?
➢ Compare activity in UK with EU
Carry out a similar analysis with H2020 data and compare results
➢ Expand the query system to explore other missions with different
syntaxes
Analyse this: “Ensure that people can enjoy at least 5 extra healthy, independent years of
life by 2035, while narrowing the gap between the experience of the richest and poorest”
😬😬😬