Our classification technique uses a deep CNN to classify skin lesions: an image is transformed by the network into a probability distribution over clinical skin-disease classes. The CNN was pretrained on a large generic image dataset and fine-tuned on a dataset of over 129,000 skin lesions spanning 2,032 diseases. Integrating data from many such sources is key to the future of digital medicine, but data quality, availability, and privacy remain open challenges. Techniques such as distributed learning and homomorphic encryption can help address privacy concerns while enabling large-scale data sharing and analysis.
Big data and machine learning: opportunities for precision medicine and the risks of a new «AI winter»
1. Laboratory for Biomedical Informatics
Big data and machine learning: opportunities for precision medicine and the risks of a new «AI winter»
Riccardo Bellazzi
Università di Pavia
ICS Maugeri Pavia
3. Our classification technique is a deep CNN. Data flow is from left to right: an image of a skin lesion (for example, melanoma) is sequentially warped into a probability distribution over clinical classes of skin disease using the Google Inception v3 CNN architecture, pretrained on the ImageNet dataset (1.28 million images over 1,000 generic object classes) and fine-tuned on our own dataset of 129,450 skin lesions comprising 2,032 different diseases.
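The last step of such a pipeline, turning the network's raw class scores into a probability distribution over disease classes, is a softmax layer. A minimal sketch of that step (the class names and logit values here are illustrative, not taken from the actual model):

```python
import math

def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three illustrative lesion classes
classes = ["melanoma", "nevus", "seborrheic keratosis"]
logits = [2.1, 0.3, -1.2]
probs = softmax(logits)

for c, p in zip(classes, probs):
    print(f"{c}: {p:.3f}")
```

The outputs are non-negative and sum to one, so the most probable class can be read off directly, and the full distribution conveys the model's uncertainty.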
4. Machine learning over public data sets, electronic health records, and image data:
• To help predict outcomes and support medical decisions
• To learn about what differences in people are important for predicting disease
• To understand the disease traits caused by a gene variant
• To help interpret features in medical images and tissues
5. Genome / Genotype: biomarkers (DNA sequence, epigenetics)
Exposome / Expotype: environmental risk factors (pollution, radiation, toxic agents, …)
Phenome / Phenotype: anatomy, physiological and biochemical parameters (cholesterol, temperature, glucose, heart rate, …)
Social media / integrated personal health record / research repositories
Data collection in the exposomics world (by F. Martin): surveys, GIS, biomarkers, smartphones, sensors, wearables, Electronic Health Records, …
8. Informatics for Integrating Biology and the Bedside (i2b2)
• An open-source software program developed at the NIH research center "i2b2"
• More than 70 million patients; more than 100 million patients
• i2b2 Academic Users' Group: a wide community that contributes to i2b2 development
9. i2b2 projects: the "Pavia" initiative
HOSPITAL INSTALLATIONS
1. IRCCS ICS Maugeri (Pavia - IT)
2. IRCCS Fondazione Policlinico (Milan - IT)
3. Papa Giovanni XXIII (Bergamo - IT)
4. Shahid Rajaei Heart Center (Tehran - Iran)
5. Centre hospitalier universitaire vaudois (Lausanne - CH)
RESEARCH PROJECTS AND NETWORKS
1. Rete Ematologica Lombarda
2. Italian MDS Network (REL, FISM, GROM-L)
3. Centro Nazionale di Adroterapia Oncologica
4. Registro Italiano Autismo
5. "NONCADO" project (Regione Lombardia)
6. FP7-INHERITANCE (La Coruña - ES)
7. IMI-SUMMIT (Lund - SE)
8. MOSAIC (ASL, Maugeri Pavia - IT, Valencia - ES, Athens - GR)
EXPERIMENTAL INSTALLATIONS
1. IRCCS Mondino (Pavia - IT)
2. IRCCS San Matteo (Pavia - IT)
10. The Onco-i2b2 project
Data flow: clinical patient-management data come from the hospital information system (HIS), and laboratory research samples come from the biobank. Patient IDs are matched (CRC), and anonymized data and anonymized samples are loaded into i2b2 for use by researchers.
12. Temporal text mining from clinical reports
• Most clinical information is stored in unstructured text
• Systems that automatically extract and display relevant events can help physicians search for specific information and improve medical decisions
[Timeline figure: ECG (2009-01-07), echocardiogram (2009-02-12)]
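To make the extraction step concrete, here is a minimal sketch that pulls (date, event) pairs out of free text with regular expressions and orders them on a timeline. The report text, event vocabulary, and function name are invented for illustration; real clinical NLP systems are far more sophisticated:

```python
import re
from datetime import date

# Hypothetical vocabulary of clinical events to look for
EVENTS = ["ECG", "echocardiogram"]

def extract_timeline(text):
    """Return (date, event) pairs found in a clinical report, sorted by time."""
    timeline = []
    for event in EVENTS:
        # matches phrases like "echocardiogram on 2009-02-12"
        pattern = event + r"\s+on\s+(\d{4})-(\d{2})-(\d{2})"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            y, mo, d = (int(g) for g in m.groups())
            timeline.append((date(y, mo, d), event))
    return sorted(timeline)

report = ("Patient underwent ECG on 2009-01-07. "
          "Echocardiogram on 2009-02-12 showed mild regurgitation.")
for when, what in extract_timeline(report):
    print(when, what)
```

Sorting by date is what turns scattered mentions in the narrative into the kind of event timeline the slide describes.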
16. Ready for a third AI winter?
• 1970 – First neural networks
• 1989 – Expert systems
• 20xx – Deep learning (?)
18. Physics of the Medical Record
Handling Time in Health Record Studies
George Hripcsak, MD, MS
Biomedical Informatics, Columbia University
Medical Informatics Services, NewYork-Presbyterian
Biomedical Informatics: discovery and impact
19. Data quality
"All medical record information should be regarded as suspect; much of it is fiction." (Burnum ... Ann Intern Med 1989)
"Data shall be used only for the purpose for which they were collected. If no purpose was defined prior to the collection of the data, then the data should not be used." (van der Lei ... Method Inform Med 1991)
21. Hard challenges
• Quality of the data
• Ambiguous or unknown meaning: "insufficient physical exercise" vs. "low physical exercise"
• Accuracy: 50-100% accuracy [Hogan JAMIA 1997]; "… 36 year old man …" vs. "… 27 year old woman …"
• Completeness: mostly missing
• Complexity: disease ontologies
• Bias
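The completeness problem above can be made measurable: for each field, compute the fraction of records that actually carry a value. A minimal sketch (the records and field names are invented for illustration):

```python
# Hypothetical extracted EHR records; None marks a missing value
records = [
    {"age": 36,   "sex": "M", "smoking": None,    "hba1c": 7.2},
    {"age": 27,   "sex": "F", "smoking": None,    "hba1c": None},
    {"age": None, "sex": "F", "smoking": "never", "hba1c": None},
]

def completeness(records):
    """Fraction of non-missing values per field across all records."""
    fields = records[0].keys()
    return {f: sum(r[f] is not None for r in records) / len(records)
            for f in fields}

for field, frac in completeness(records).items():
    print(f"{field}: {frac:.0%} complete")
```

A report like this is a first data-quality check before any modeling; fields that are mostly missing, as the slide warns, often cannot support the analysis at all.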
22. Missing
• Data are noisy and mostly missing
• Sampled when sick
• Implicit information
[Figure: two glucose (mg/dl) time series over time, illustrating sparse and irregular sampling]
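A common, if crude, way to fill gaps in such a series is linear interpolation between observed samples. A minimal sketch (the sample times and glucose values are invented):

```python
def interpolate(times, values, t):
    """Linearly interpolate a value at time t from irregular samples.

    Assumes times is sorted and t lies within [times[0], times[-1]].
    """
    for (t0, v0), (t1, v1) in zip(zip(times, values),
                                  zip(times[1:], values[1:])):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
    raise ValueError("t outside observed range")

# Hypothetical glucose samples (hours, mg/dl), irregularly spaced
times = [0, 6, 24]
glucose = [90, 150, 110]
print(interpolate(times, glucose, 12))   # estimate between the 6 h and 24 h samples
```

Note that interpolation cannot repair the bias the slide points to: values are sampled when patients are sick, so the gaps are not missing at random, and any imputation must be interpreted with that in mind.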
23. From patient to computable representation:
Truth: health status of the patient
→ observe & interpret →
Concept: clinician's or patient's conception
→ author →
Record: EHR/PHR
→ read →
Concept: 2nd clinician's conception of the patient (or self, lawyer, compliance, ...)
→ process →
Model: computable representation
24. The same chain (Truth → Concept → Record → Concept → Model), annotated with its failure modes: Error is introduced at the observe & interpret, author, and read transitions, and much information remains Implicit, never making it into the record.
28. Observational Health Data Sciences and Informatics (OHDSI)
• OHDSI is an international effort coordinated by Columbia to collect a billion patient records for observational research
• Now 52 databases & 682 million records
• Discover drug side effects and new uses of drugs for 1000s of drugs and effects
• For patients: given my disease and medications, what is my risk of side effects?
• Clinical experiments, tools, data nodes, analytic methods, infrastructure (terminology, data model)
• OHDSI Collaborators
29. How OHDSI works
• Source data warehouse, with identifiable patient-level data → ETL → standardized, de-identified patient-level database (OMOP CDM v5)
• Summary statistics results repository (OHDSI.org)
• Evidence viewpoints (the Bradford Hill considerations): strength, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, analogy
• Use cases: comparative effectiveness; predictive modeling
• OHDSI Data Partners and the OHDSI Coordinating Center: standardized large-scale analytics, analysis results, analytics development and testing, research and education, data network support
i) Shared conceptual data model; ii) shared electronic formats; iii) shared data management platforms; iv) shared analytics
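The ETL step maps each site's source records into the common data model so that the same analysis code can run at every data partner. A minimal sketch of the idea, using an invented source layout and a drastically simplified stand-in for the OMOP CDM (real CDM tables, fields, and mapping conventions are much richer):

```python
# Hypothetical source row, as one site's warehouse might store it
source_row = {"pat_id": "A-17", "sex": "F", "birth_year": 1958, "dx_code": "E11.9"}

# Illustrative mappings; real OMOP ETLs map source codes to standard concept IDs
CONCEPT_MAP = {"E11.9": 201826}        # assumed concept ID for type 2 diabetes
GENDER_MAP = {"F": 8532, "M": 8507}    # assumed OMOP gender concept IDs

def to_cdm(row, person_id):
    """Transform one source row into simplified person / condition records."""
    person = {
        "person_id": person_id,        # de-identified surrogate key, not pat_id
        "gender_concept_id": GENDER_MAP[row["sex"]],
        "year_of_birth": row["birth_year"],
    }
    condition = {
        "person_id": person_id,
        "condition_concept_id": CONCEPT_MAP[row["dx_code"]],
    }
    return person, condition

person, condition = to_cdm(source_row, person_id=1)
print(person)
print(condition)
```

The design point is that identifiable source keys stay behind at the site, while downstream analytics see only standardized concept IDs and surrogate person IDs, which is what makes network-wide, de-identified analysis possible.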
34.
- Codify and preserve useful knowledge
- Learn how best to share and disseminate findings
- Critically assess the quality of the evidence in decision-making
- Provide a rationale or explanation for the recommendation
- Assess confidence in the recommendation
- Describe the data and knowledge sources and the reasoning model
- Learn from experience