Philips john huffman

1
Philips Healthcare Informatics
A Perspective on Big Data, Analytics and AI
John Huffman, CTO
Philips Healthcare Informatics
September 2016, Utrecht, NL

2
A Little Bit About My Background
35 years or so of AI, reasoning and knowledge integration
• Joined Thinking Machines when it was founded in the early 80’s
– Worked with Danny Hillis and Brewster Kahle on the Connection Machine
• MCC (US Fifth Generation Project)
– Worked with Doug Lenat on AI and CYC (comprehensive common sense
knowledge and reasoning project)
▪ Liaison to NLP and CHI groups
• Progressively worked on systems of integrated information, knowledge
representation, workflow and integrated decision support through start-ups
(usually my own) and finally larger companies
– Aware, SGI, Stentor, Poiesis Informatics, Philips

3
Lots of Hype Around Big Data…
Many companies getting into the fray…

4
Many Opinions on Where We Are
Has anyone actually leveraged this in healthcare?

5
Advanced Analytics Process*
Multi-Stage Process
*CRISP-DM – Cross-Industry Standard Process for Data Mining

6
Too much focus on one component…
Multi-Stage Process
*CRISP-DM – Cross-Industry Standard Process for Data Mining

8
Analytics Lifecycle Overview
[Diagram. Big Data Platform: Data Ingestion, Landing Zone, Data Processing, ETLed Processed Zone, Data Cleaning, Cleaned Data, Anonymized data Repository. Data Science: Data Scientist, Model training, Model Evaluation, Model Repository, Production.]

9
Analytics Lifecycle (more detail)
[Diagram labels. Big Data platform: Original raw data, ETLs, ETLed data, Anonymized data, Data Access, Processing, Domain Services, Data Preparation Phase. Data Science Platform (Analytics and ML), hosted solution: Data Science Hosted Cluster (Create Model) with Feature Eng., ML Framework, ML R lib, ML Py lib, ML Algos, IPs, REST ML APIs, Predictive model creation, Scripts and Model Rep.; Model Staging Hosted Cluster (Evaluation) with Predictive Model Evaluator, Model Evaluator Service, Evaluate Model; Production Cluster with Models, ML Scoring Service, Predictive Analytical Apps, Operationalize Model. Role: Proposition Owner.]

10
Challenges in Data Collection and Processing
Before any analytics can start…
• Data Identification, Collection and Preparation
– Domain knowledge important to discriminate relevant data
• ETL – extracting relevant data from raw data
• Massaging – pre-processing the data
– [Automatic] annotation of data (e.g. masking of bones in a chest X-ray)
• Normalization of the data
– Especially complex when data is received from multiple sources
• Aggregation of data
– For purpose of statistical analysis
• Note – All the above steps must be done on the same set of technologies that
will be present during the deployment of the resultant model
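The preparation steps above (extract, normalize across sources, aggregate) can be sketched in a few lines. This is a minimal illustration, not the pipeline from the talk: the two sources, the field names and the choice of min-max scaling are all assumptions made for the example.

```python
from statistics import mean

# Hypothetical records from two sources that report weight in different units.
SOURCE_A = [{"patient": "p1", "weight_kg": 70.0}, {"patient": "p2", "weight_kg": 82.5}]
SOURCE_B = [{"patient": "p3", "weight_lb": 154.0}, {"patient": "p4", "weight_lb": 198.0}]

LB_PER_KG = 2.20462


def extract(record):
    """ETL step: keep the relevant fields and convert to one unit (kg)."""
    if "weight_kg" in record:
        kg = record["weight_kg"]
    else:
        kg = record["weight_lb"] / LB_PER_KG
    return {"patient": record["patient"], "weight_kg": kg}


def min_max_normalize(values):
    """Normalization step: rescale values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def prepare(sources):
    """Run extraction, normalization and aggregation over all sources."""
    rows = [extract(r) for src in sources for r in src]
    weights = [r["weight_kg"] for r in rows]
    for row, scaled in zip(rows, min_max_normalize(weights)):
        row["weight_scaled"] = scaled
    # Aggregation step: summary statistics for later analysis.
    summary = {"n": len(rows), "mean_weight_kg": mean(weights)}
    return rows, summary


rows, summary = prepare([SOURCE_A, SOURCE_B])
```

Keeping the unit conversion and scaling in plain functions makes it easier to run exactly the same code at deployment time, which is the point of the note on this slide.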

11
Training and Validating the Model
Which method is appropriate?
• Effective model creation requires an understanding of the nuances and strengths of
different methods
– Selection of the right method depending on the task
▪ Classification/Regression/Clustering/Dimensionality reduction…
• Identification and computation of the metric(s) to evaluate the model
– Requires training and test data
• Ensure there is no overfitting
• Validate the model
– On extended data sets, cohort variation
• Fine tune the parameters of the model
• Note – All the above steps must be done on the same set of technologies that will be
present during the deployment
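One way to check for overfitting is to hold out test data and compare the metric on both sets. The sketch below is an assumed toy setup, not the speaker's method: synthetic one-dimensional data with 20% label noise, a 1-nearest-neighbour classifier (which memorizes its training set) and accuracy as the metric.

```python
import random


def make_data(n, noise, rng):
    """Synthetic data: label is 1 when x > 0.5, flipped with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data


def split(data, test_fraction, rng):
    """Shuffle a copy of the data and split it into training and test sets."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda row: abs(row[0] - x))
    return nearest[1]


def accuracy(train, rows):
    """Metric: fraction of rows whose predicted label matches the true label."""
    hits = sum(1 for x, y in rows if predict_1nn(train, x) == y)
    return hits / len(rows)


rng = random.Random(0)
train, test = split(make_data(200, noise=0.2, rng=rng), 0.25, rng)

train_acc = accuracy(train, train)  # the model has memorized these points
test_acc = accuracy(train, test)    # unseen data gives the honest estimate
gap = train_acc - test_acc          # a large gap suggests overfitting
```

A perfect training score next to a clearly lower test score is the signal to simplify the model or tune its parameters before validating on extended data sets.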

12
Challenges in Deployment and Operations
• Installation (On-Premise, Cloud, Hybrid)
• Configuration
• Health Monitoring
• Auto-Scaling
• Multi-Tenancy
• Disaster Recovery
• Licensing
• Performance Monitoring
• Metering and Billing
• Upgrades
• Snapshots
• Certificate Management
• Resource Utilization and Trending
• Privacy and Security

13
These Methods Are Not New
Decades- to centuries-old technologies
• Neural Networks
– (1943) by Warren McCulloch and Walter Pitts, originally called threshold
logic
• Deep Learning
– (1965) Ivakhnenko and Lapa, papers in 1971 already described deep
networks with 8 layers trained by the group method of data handling
algorithm
• Random Decision Forest
– (1995) Ho
• Big Data (MapReduce)
– 2000-2004 various papers, underlying methods well-known in the mid-90’s.
Apache Hadoop (open source) has been available since 2006 (release 1.0 in 2011)
• Bayesian methods
– Bayes lived in the 1700’s. Naïve Bayes methods since the 50’s

14
Some Lessons from AI History
Well-known that data is much more important than method…
• Just Google
– “More data and simple algorithms beat complex analytics methods”
• This is well-known from expert system and AI experience
– “Brittleness”
▪ Application of models on data outside the training domain
frequently fails in unusual, unexpected ways
– Marvin Minsky, “Society of Mind”
▪ Complex and intelligent behavior comes from the orchestration of
simple agents
• Without a broad, semantically interoperable, clean data repository – complex
analytics, decision support algorithms, and workflow optimizations cannot be
derived
• Data is the intellectual property in this domain
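The brittleness point can be shown with a toy example. The sketch below is an illustration under assumed conditions, not material from the talk: a straight line is fitted by least squares to y = x² on the interval [0, 1] and then applied at x = 10, far outside the training domain.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def truth(x):
    """The real (nonlinear) relationship the model is trying to capture."""
    return x * x


# Training domain: 21 evenly spaced points in [0, 1].
train_xs = [i / 20 for i in range(21)]
slope, intercept = fit_line(train_xs, [truth(x) for x in train_xs])


def predict(x):
    return slope * x + intercept


# Worst error inside the training domain versus the error far outside it.
in_domain_error = max(abs(predict(x) - truth(x)) for x in train_xs)
out_of_domain_error = abs(predict(10.0) - truth(10.0))
```

Inside the training range the line is a reasonable approximation; at x = 10 the error is orders of magnitude larger, and nothing in the model signals that it has left the domain it was trained on.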

15
Analytics Stack
Analytics is a set of tools – not a solution
[Diagram labels: Analytical Apps, Clinical Image Analytics, Clinical Text Analytics, 3rd Party Apps, REST Machine Learning APIs, R SDK, Py SDK, JDBC/ODBC, General ML Algorithms, IPs, Deep Learning libraries, NLP building blocks, Distributed Processing framework, Data Repositories (S3, HDFS, Hive…), Model Rep., Scripts Rep.]
• Provide easy-to-use SDKs (R and Python)
• Prebaked thin-client development environments
– RStudio and Jupyter
• All ML capabilities are exposed via RESTful APIs
• Provide higher-level abstraction APIs for clinical text and clinical images
• Provide building blocks for NLP and DL frameworks
• Host research IP assets
• Persist the models and scripts in repositories (shared across development
and deployment clusters)
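The last point, a model repository shared across development and deployment clusters, can be sketched as a small versioned store. Everything here is hypothetical: the `ModelRepository` class, the JSON layout and the model name are invented for illustration, and a temporary local directory stands in for shared storage such as S3 or HDFS.

```python
import json
import tempfile
from pathlib import Path


class ModelRepository:
    """Hypothetical file-backed store; one directory per model, one file per version."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def versions(self, name):
        """Return the sorted version numbers stored for a model."""
        model_dir = self.root / name
        if not model_dir.is_dir():
            return []
        return sorted(int(p.stem[1:]) for p in model_dir.glob("v*.json"))

    def save(self, name, params):
        """Persist parameters as the next version and return its number."""
        version = len(self.versions(name)) + 1
        model_dir = self.root / name
        model_dir.mkdir(exist_ok=True)
        (model_dir / f"v{version}.json").write_text(json.dumps(params))
        return version

    def load(self, name, version=None):
        """Load a given version, or the latest one when none is specified."""
        available = self.versions(name)
        if not available:
            raise KeyError(name)
        if version is None:
            version = available[-1]
        return json.loads((self.root / name / f"v{version}.json").read_text())


with tempfile.TemporaryDirectory() as shared_storage:
    # The development cluster writes two versions of a model.
    dev = ModelRepository(shared_storage)
    first = dev.save("readmission_risk", {"slope": 0.8, "intercept": -0.1})
    second = dev.save("readmission_risk", {"slope": 0.9, "intercept": -0.2})

    # The deployment cluster opens the same storage and reads from it.
    prod = ModelRepository(shared_storage)
    latest = prod.load("readmission_risk")
    pinned = prod.load("readmission_risk", version=1)
```

Pinning a version on the deployment side while development keeps writing new ones is one way to keep model staging and production decoupled.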

16
Philips Approach - HSDP
Analytics and Big Data are an integrated component of the platform
A tailored set of capabilities and tools, optimized for rapid prototyping
and development of healthcare and health-related applications
• Connect – Manages, updates, monitors and remotely controls smart devices
• Store – Acquire, access and manage personal data from devices and
applications through a cloud-hosted repository
• Authorize – Securely identifies users, authorizes consent, ensures data
privacy and tracks user activity
• Share – Standardizes interfaces between HealthSuite enabled applications
and devices with third-party systems
• Orchestrate – Provides functionality to help complete routine tasks and
coordinate communications among users
• Host – Provides managed infrastructure to monitor the health of systems
and performance of applications
• Analyze – Offers the foundational infrastructure to build decision
support algorithms and machine learning applications
