HOW UCARE.AI PREDICTS HEALTHCARE COSTS
To improve patient experience and reduce payment challenges
DATAx Singapore, 5th March 2019
Koh, D. (2018, December 19). Parkway Pantai hospitals launch AI-powered predictive hospital bill estimation system, Healthcare IT News. Retrieved from https://www.healthcareitnews.com
WHY IS HOSPITALIZATION COST PREDICTION IMPORTANT?
World Bank. Current health expenditure per capita, PPP (current international $). Retrieved from https://data.worldbank.org on 25 Feb 2019
Healthcare costs have risen and are projected to keep rising
Baker, J.A. (2018, April 07). Singapore ranks high in report on medical inflation in Asia, Channel News Asia. Retrieved from https://www.channelnewsasia.com
(2018, November 18). Medical inflation in Singapore to hit 10% in 2019, Singapore Business Review. Retrieved from https://sbr.com.sg
Aon. 2019 Global Medical Trend Rates Report. Retrieved from https://www.aon.com on 12 Feb 2018
TransUnion. Survey finds healthcare providers benefit from up-front cost estimates, yet many patients find it difficult to secure such information. Retrieved from https://www.newsroom.transunion on 25 Feb 2018
For Providers:
Up-front cost estimates improve patient experience and retention
For Patients:
Estimates provide a view of the potential bill and reduce payment challenges with providers
INTENT
Provide potential patients with (more) accurate estimates
Prefer overestimation to underestimation
Constraint: minimal disruption (i.e., plug-and-play)
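The slides do not say how the preference for overestimation was implemented. One standard way to encode it is the pinball (quantile) loss, which charges more per unit for an under-estimate than for an over-estimate. A minimal sketch, assuming an illustrative tau=0.75; `pinball_loss` is a hypothetical helper, not uCare's code:

```python
def pinball_loss(y_true, y_pred, tau=0.75):
    """Mean pinball (quantile) loss; tau > 0.5 penalises under-estimates more.

    Each unit of under-estimate costs tau and each unit of over-estimate
    costs (1 - tau), so the loss is minimised by the tau-th quantile of the bill.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for actual, predicted in zip(y_true, y_pred):
        error = actual - predicted  # > 0 means the bill was under-estimated
        total += max(tau * error, (tau - 1) * error)
    return total / len(y_true)


bills = [1000.0, 2000.0]
print(pinball_loss(bills, [900.0, 1900.0]))   # under by 100 each: 75.0
print(pinball_loss(bills, [1100.0, 2100.0]))  # over by 100 each: 25.0
```

Training against this loss nudges a model toward estimates that sit above the median bill, which matches the stated preference.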
PROBLEM
Poor accuracy with existing system
Costly to update models
Low frequency of updates
HOW DID UCARE SOLVE THE PROBLEMS?
[Figure: uCare system architecture]
Provider side: Patient, Doctor, Case and Estimate data → Encryption Server (uCare cipher)
uCare side: Decryption Server (uCare cipher) → Staging → Extract → Data Storage → Transform & Load
Modelling: Data Preparation + Feature Engineering + Machine Learning + Validation → ML Model Storage
Serving: Publishing Server → Artefact Storage → Deployment Server → API Server (with Biz Logic Layer) → APIs for Hospital 1, Hospital 2, Hospital 4
Supporting: Log Storage and retrieval; Feedback loop
Tools: Airflow, Jenkins, Azure Blob, PyTorch, pandas
v3
├── dataprep
│ ├── prep_patient.py
│ ├── prep_doctor.py
│ ├── prep_case.py
│ ├── ...
│ ├── ...
│ ├── ...
├── feateng
│ ├── feat_encoding.py
│ ├── ...
├── ml
│ ├── validation
│ │ ├── train_val_split.py
│ │ ├── validate.py
│ │ ├── ...
│ ├── serve
│ │ ├── api.py
│ │ ├── package_api.py
│ ├── train.py
│ ├── ...
├── utils
│ ├── io.py
│ ├── notification.py
│ ├── ...
tests
├── dataprep
├── ...
Data Preparation (dataprep), Feature Engineering (feateng), Machine Learning (ml), Utilities (utils), Unit tests (tests)
DATA VALIDATION & INGESTION
Provider environment: Encrypt
uCare environment: Decrypt, Data Validation, Extract, Load, Transform
Data Validation
- Schema, format checks
- Duplicate, null value checks
- Volume checks
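The validation checks above can be sketched over a batch of records held as dicts. The schema, the `case_id` primary key, the ISO date format, and the volume bounds are all hypothetical; a production pipeline would more likely run equivalent checks in pandas or a dedicated validation tool:

```python
from collections import Counter
from datetime import date

# Hypothetical schema for an incoming extract: column -> expected Python type.
EXPECTED_SCHEMA = {
    "case_id": str,
    "admission_date": str,  # expected in ISO format, YYYY-MM-DD
    "icd10": str,
    "bill_amount": float,
}


def _is_iso_date(text):
    try:
        date.fromisoformat(text)
    except ValueError:
        return False
    return True


def validate_batch(rows, schema=EXPECTED_SCHEMA, min_rows=1, max_rows=1_000_000):
    """Return a list of problems found in a batch of records (empty list = pass)."""
    problems = []

    # Volume check: an unusually small or large batch often means an upstream failure.
    if not min_rows <= len(rows) <= max_rows:
        problems.append(f"volume: got {len(rows)} rows, expected {min_rows}-{max_rows}")

    for i, row in enumerate(rows):
        # Schema check: each record must have exactly the expected columns.
        missing, extra = schema.keys() - row.keys(), row.keys() - schema.keys()
        if missing or extra:
            problems.append(f"row {i}: missing {sorted(missing)}, unexpected {sorted(extra)}")
            continue
        for col, expected_type in schema.items():
            value = row[col]
            if value is None:  # null check
                problems.append(f"row {i}: null {col}")
            elif not isinstance(value, expected_type):  # type check
                problems.append(f"row {i}: {col} is not {expected_type.__name__}")
            elif col == "admission_date" and not _is_iso_date(value):  # format check
                problems.append(f"row {i}: bad date format {value!r}")

    # Duplicate check on the (hypothetical) primary key.
    for key, count in Counter(row.get("case_id") for row in rows).items():
        if count > 1:
            problems.append(f"duplicate case_id {key!r} ({count} rows)")
    return problems
```

Rejecting a batch when the returned list is non-empty keeps bad extracts out of the Extract/Load/Transform steps.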
DATA PREPARATION
Categoricals
- Converting ICD9 to ICD10*
- Standardization (e.g., gender, room)
Numerics
- Filling missing values
- Handling outliers
Etc. (e.g., text, images, EHRs)
Merging across multiple datasets
Data Augmentation
- External data (e.g., doctor data from SMC)
- Internal features (e.g., disease embeddings)
* International Classification of Diseases
VALIDATION
- Random split/CV*
- Time-based split/CV
Random split/CV
- Shuffle data (randomly)
- Split into k folds and cross-validate
Time-based split/CV
- Sort data by time (e.g., admission date)
- Split into time periods (e.g., yearly, half-yearly)
- Train only on data from the period(s) before the validation period
* (K-fold) cross-validation
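The two schemes can be sketched in plain Python. `kfold_indices` and `time_based_splits` are hypothetical helpers; the time-based version uses an expanding window (all earlier data trains, the next period validates), which is one reading of "train only on data from period(s) before":

```python
import random
from datetime import date


def kfold_indices(n, k=5, seed=0):
    """Random k-fold CV: shuffle row indices, then use each fold once for validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, val in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, val


def time_based_splits(dates, periods):
    """Time-based CV: validate on each period, training only on earlier records."""
    order = sorted(range(len(dates)), key=lambda i: dates[i])
    for start, end in periods:
        train = [i for i in order if dates[i] < start]
        val = [i for i in order if start <= dates[i] < end]
        if train and val:  # skip periods with nothing to train on or validate
            yield train, val


# Usage: half-yearly validation periods over admission dates
admission_dates = [date(2016, 3, 1), date(2016, 9, 1), date(2017, 3, 1), date(2017, 9, 1)]
half_years = [(date(2017, 1, 1), date(2017, 7, 1)), (date(2017, 7, 1), date(2018, 1, 1))]
for train_idx, val_idx in time_based_splits(admission_dates, half_years):
    print(train_idx, val_idx)  # prints [0, 1] [2], then [0, 1, 2] [3]
```

Unlike the random split, no validation record in the time-based scheme ever has a training record dated after it.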
VALIDATION (is so important there are two slides!)
- 19% of patients had >1 admission (38% of cases)
- In validation, >100 cases had ICD10/TOSP codes not seen before
- Medical prices increase over time
Cannot use the future to predict the past: validation results differ greatly depending on the split
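Each of these three issues can be checked before trusting a validation score. A sketch, assuming each case is a dict with hypothetical `patient_id`, `icd10`, `tosp` and `bill` fields: it counts patients appearing on both sides of the split, validation cases whose codes never appear in training, and the ratio of mean bills as a crude signal of price drift:

```python
def _mean_bill(rows):
    return sum(r["bill"] for r in rows) / len(rows)


def split_diagnostics(train_rows, val_rows):
    """Summarise how a validation set differs from the training set it is paired with."""
    train_patients = {r["patient_id"] for r in train_rows}
    seen_icd10 = {r["icd10"] for r in train_rows}
    seen_tosp = {r["tosp"] for r in train_rows}
    return {
        # Repeat patients on both sides can leak information into a random split.
        "patients_in_both": len({r["patient_id"] for r in val_rows} & train_patients),
        # Codes the model has never seen during training.
        "val_cases_unseen_codes": sum(
            1 for r in val_rows if r["icd10"] not in seen_icd10 or r["tosp"] not in seen_tosp
        ),
        # > 1.0 suggests bills in the validation period are higher on average.
        "mean_bill_ratio": _mean_bill(val_rows) / _mean_bill(train_rows),
    }
```

Running it on both a random split and a time-based split makes the difference between the two visible.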
FEATURE ENGINEERING
Datetime
- Time differences (e.g., age, length of stay (LOS))
- Details (e.g., hour, day, month)
Categoricals
- Grouping (e.g., ICD10, TOSP)
- Encoding (e.g., one-hot, binary)
Numerics
- Normalization, binning
- Statistical transforms (skewed data)
Interactions
- E.g., age-gender, age-surgery
- E.g., conditional probabilities
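A sketch of one feature from each group, for a single case stored as a dict. Field names are hypothetical; grouping ICD-10 codes by truncating to the 3-character category (e.g., I21.4 to I21) is one possible grouping among several, and the age-band-by-gender string is a simple interaction feature:

```python
import math
from datetime import date, datetime


def age_at(birth, on):
    """Whole years between a birth date and a later date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def one_hot(value, categories):
    return [1 if value == c else 0 for c in categories]


def binary_encode(value, categories):
    """Binary-encode the category's 1-based index (0 = unseen category)."""
    index = categories.index(value) + 1 if value in categories else 0
    width = max(1, len(categories).bit_length())
    return [(index >> bit) & 1 for bit in reversed(range(width))]


def engineer_features(case):
    admit = case["admission_time"]  # datetime
    age = age_at(case["birth_date"], admit.date())
    return {
        # Datetime: time differences and details
        "age": age,
        "los_days": (case["discharge_date"] - admit.date()).days,
        "admit_hour": admit.hour,
        "admit_weekday": admit.weekday(),  # Monday = 0
        "admit_month": admit.month,
        # Categoricals: group ICD-10 codes to their 3-character category
        "icd10_category": case["icd10"].split(".")[0][:3],
        # Numerics: log1p compresses a right-skewed amount
        "log_prev_bill": math.log1p(case["prev_bill"]),
        # Interaction: 10-year age band crossed with gender
        "age_band_x_gender": f"{age // 10 * 10}s_{case['gender']}",
    }


case = {
    "birth_date": date(1980, 6, 15),
    "admission_time": datetime(2018, 3, 10, 14, 30),
    "discharge_date": date(2018, 3, 14),
    "icd10": "I21.4",
    "prev_bill": 1000.0,
    "gender": "F",
}
print(engineer_features(case))
```

Binary encoding needs only about log2 of the category count in columns, which helps for high-cardinality codes where one-hot would be very wide.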
MACHINE LEARNING
Data Weightage
- Recency (e.g., linear, exponential decay)
- Domain knowledge
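Recency weighting can be sketched as per-sample weights that decay with a case's age in days. The linear and exponential schemes below are assumptions about what "linear, exponential" refers to, and the `horizon` and `half_life` values are illustrative:

```python
def recency_weights(ages_in_days, scheme="exponential", half_life=365.0, horizon=730.0):
    """Sample weights that favour recent cases.

    linear: weight falls from 1 to 0 over `horizon` days (floored at 0)
    exponential: weight halves every `half_life` days
    """
    if scheme == "linear":
        return [max(0.0, 1.0 - a / horizon) for a in ages_in_days]
    if scheme == "exponential":
        return [0.5 ** (a / half_life) for a in ages_in_days]
    raise ValueError(f"unknown scheme: {scheme}")


print(recency_weights([0, 365, 730]))                 # [1.0, 0.5, 0.25]
print(recency_weights([0, 365, 730, 1000], "linear"))  # [1.0, 0.5, 0.0, 0.0]
```

Many learners accept per-sample weights at fit time, so weights like these can be passed in without changing the model itself.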
Models / Objective functions
- E.g., KNN, LR, DT, NN, RNN
- E.g., MSE, MAE, Huber, Fair
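The four objectives differ in how hard they punish large residuals, which matters for bills with long right tails. A per-residual sketch with illustrative `delta` and `c` parameters; gradient-boosting libraries such as LightGBM expose Huber and Fair as built-in objectives, but these plain functions are only for comparison:

```python
import math


def mse(r):
    return r * r  # quadratic: dominated by outliers


def mae(r):
    return abs(r)  # linear: robust, but not differentiable at 0


def huber(r, delta=1.0):
    """Quadratic for |r| <= delta, linear beyond it."""
    a = abs(r)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def fair(r, c=1.0):
    """Smooth: about r**2 / 2 near zero, growing roughly linearly for large |r|."""
    x = abs(r) / c
    return c * c * (x - math.log1p(x))


def mean_loss(loss, y_true, y_pred, **params):
    return sum(loss(t - p, **params) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_loss(huber, [0.0, 0.0], [3.0, 0.5]))  # 1.3125
```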
Feature Selection
- “Classic” feature importance
- Permutation, drop column
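Permutation importance measures how much the error rises when one feature column is shuffled, breaking its link to the target without retraining. A sketch, assuming `predict` is any callable over a list of feature rows (lists) and `metric` is an error where lower is better; drop-column importance is similar but retrains without the column, so it costs one fit per feature and is not shown:

```python
import random


def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean increase in error when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = metric(y, predict(X))
    importances = []
    for col in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            increases.append(metric(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model ignores scores exactly zero, while a feature it relies on shows a positive increase in error.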
Parameter Tuning
- Random / grid search
- Bayesian optimization
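Grid search tries every combination, while random search samples a fixed budget of combinations, which scales better as the number of parameters grows. A sketch, assuming `objective` maps a parameter dict to a validation error (lower is better); Bayesian optimization instead fits a surrogate model to past trials to choose the next one and is not shown here:

```python
import random
from itertools import product


def grid_search(objective, space):
    """Evaluate every combination in `space` (dict of name -> candidate values)."""
    names = list(space)
    candidates = (dict(zip(names, combo)) for combo in product(*space.values()))
    best = min(candidates, key=objective)
    return best, objective(best)


def random_search(objective, space, n_trials=20, seed=0):
    """Evaluate `n_trials` random combinations and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```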
DEPLOYMENT
Train → Validate → Deploy → Feedback → Optimize
Complexity (e.g., infra, security, networks, modelling, monitoring)
Versioning and Rollback
- Model versions, e.g., 20180101, 20180201, 20180301, latest
Abstraction via APIs (to the provider environment)
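The slide shows dated model versions next to a "latest" tag. A minimal in-memory sketch of that idea, assuming "latest" is an alias for the most recently published version and that rollback moves the alias back one step; `ModelRegistry` and its methods are hypothetical, and real artefacts would live in storage such as Azure Blob rather than a dict:

```python
class ModelRegistry:
    """In-memory sketch of dated model versions with a movable 'latest' alias."""

    def __init__(self):
        self._versions = {}  # version tag (e.g. "20180101") -> model artefact
        self._history = []   # tags in the order they became 'latest'

    def publish(self, version, model):
        if version in self._versions:
            raise ValueError(f"version {version} already exists")
        self._versions[version] = model
        self._history.append(version)

    @property
    def latest(self):
        return self._history[-1] if self._history else None

    def get(self, version="latest"):
        tag = self.latest if version == "latest" else version
        return self._versions[tag]

    def rollback(self):
        """Point 'latest' back to the previous version; the newer artefact is kept."""
        if len(self._history) < 2:
            raise RuntimeError("nothing to roll back to")
        self._history.pop()
        return self.latest


registry = ModelRegistry()
for tag, model in [("20180101", "m1"), ("20180201", "m2"), ("20180301", "m3")]:
    registry.publish(tag, model)
registry.rollback()  # 'latest' now points at 20180201
```

Because the API server only asks for "latest", a rollback needs no change on the provider side.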
MISCELLANEOUS TIPS
Take time to learn from domain experts and users
More data and/or features != Better model
Proper engineering practices make (everyone’s) life easier
OVERALL OUTCOMES
Accuracy and usability improvements
- Reduced MAE (55%), RMSE (60%), and underestimation %
- Improved query latency (< 1 sec)
- Customization at institution level
Additional features
- Integration of business logic layer for strategic purposes
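The accuracy metrics above can be computed as follows. Errors are taken as estimate minus actual, so a negative error is an underestimate; reading the slide's 55% and 60% as relative reductions against the previous system is an assumption, which `pct_reduction` illustrates:

```python
import math


def estimate_metrics(actual, estimated):
    """MAE, RMSE and share of underestimates (%) for bill estimates."""
    errors = [e - a for a, e in zip(actual, estimated)]  # positive = overestimate
    n = len(errors)
    return {
        "mae": sum(abs(x) for x in errors) / n,
        "rmse": math.sqrt(sum(x * x for x in errors) / n),
        "underestimation_pct": 100.0 * sum(x < 0 for x in errors) / n,
    }


def pct_reduction(old, new):
    """Relative reduction of a metric, in percent."""
    return 100.0 * (old - new) / old


print(estimate_metrics([100, 200, 300, 400], [110, 190, 300, 420]))
print(pct_reduction(200.0, 90.0))  # 55.0
```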
OVERALL OUTCOMES
Minimal disruption to front-line users
- API-based, existing front-end used—no training required
Reduced cost, increased frequency of updates
- System built for daily updates
Improved user & customer experience
- Based on anecdotal evidence
KEY TAKEAWAYS
Building useful data products is a team effort
Be humble of the data—don’t assume anything
Machine learning is < 20% of the effort; the methodology and engineering process matter more
THANK YOU!
hello@ucare.ai

Predicting Hospital Bills at Pre-admission (slides uploaded by Eugene Yan Ziyou)
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelBoston Institute of Analytics
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 

Kürzlich hochgeladen (20)

Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
Decoding the Heart: Student Presentation on Heart Attack Prediction with Data...
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Networking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptxNetworking Case Study prepared by teacher.pptx
Networking Case Study prepared by teacher.pptx
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
IBEF report on the Insurance market in India
IBEF report on the Insurance market in IndiaIBEF report on the Insurance market in India
IBEF report on the Insurance market in India
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
why-transparency-and-traceability-are-essential-for-sustainable-supply-chains...
 
What To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptxWhat To Do For World Nature Conservation Day by Slidesgo.pptx
What To Do For World Nature Conservation Day by Slidesgo.pptx
 
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptxmodul pembelajaran robotic Workshop _ by Slidesgo.pptx
modul pembelajaran robotic Workshop _ by Slidesgo.pptx
 
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
6 Tips for Interpretable Topic Models _ by Nicha Ruchirawat _ Towards Data Sc...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Real-Time AI Streaming - AI Max Princeton
Real-Time AI  Streaming - AI Max PrincetonReal-Time AI  Streaming - AI Max Princeton
Real-Time AI Streaming - AI Max Princeton
 
Decoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis ProjectDecoding Patterns: Customer Churn Prediction Data Analysis Project
Decoding Patterns: Customer Churn Prediction Data Analysis Project
 
Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptxThe Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
The Power of Data-Driven Storytelling_ Unveiling the Layers of Insight.pptx
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis modelDecoding Movie Sentiments: Analyzing Reviews with Data Analysis model
Decoding Movie Sentiments: Analyzing Reviews with Data Analysis model
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 

Predicting Hospital Bills at Pre-admission

  • 1. HOW UCARE.AI PREDICTS HEALTHCARE COSTS To improve patient experience and reduce payment challenges DATAx Singapore, 5th March 2019
  • 2. Koh D. (2018, December 19). Parkway Pantai hospitals launch AI-powered predictive hospital bill estimation system, Healthcare IT News. Retrieved from https://www.healthcareitnews.com
  • 3. WHY IS HOSPITALIZATION COST PREDICTION IMPORTANT?
  • 4. Current health expenditure per capita, PPP (current international $), World Bank, Retrieved from https://data.worldbank.org on 25 Feb 2019 Healthcare costs have risen, and will rise more in the future
  • 5. Baker, J.A. (2018, April 07). Singapore ranks high in report on medical inflation in Asia, Channel News Asia. Retrieved from https://www.channelnewsasia.com
  • 6. (2018, November 18). Medical inflation in Singapore to hit 10% in 2019, Singapore Business Review. Retrieved from https://sbr.com.sg
  • 7. 2019 Global Medical Trend Rates Report, Aon. Retrieved from https://www.aon.com on 12 Feb 2018
  • 8. Survey finds healthcare providers benefit from up-front cost estimates, yet many patients find it difficult to secure such information, TransUnion. Retrieved from https://www.newsroom.transunion on 25 Feb 2018. For Providers: Up-front cost estimates improve patient experience and retention.
  • 9. For Patients: Estimates provide a view of the potential bill and reduce payment challenges with providers.
  • 10. INTENT: Provide potential patients with (more) accurate estimates. Prefer overestimation to underestimation. Constraint: minimal disruption (i.e., plug-and-play).
  • 11. PROBLEM: Poor accuracy of the existing system. Costly to update models. Low frequency of updates.
  • 12. HOW DID UCARE SOLVE THE PROBLEMS?
  • 13. [Architecture diagram] Components: Patient, Doctor, Case, and Estimate data; Encryption Server and Decryption Server (uCare cipher); Staging; Extract, Transform & Load into Data Storage; Data Preparation + Feature Engineering + Machine Learning + Validation; ML Model Storage; Publishing Server; Artefact Storage; Deployment Server; API Server with per-hospital APIs (Hospital 1, Hospital 2, Hospital 4); Biz Logic Layer; log storage and retrieval; feedback loop. Tools shown: Airflow, Jenkins, Azure Blob, PyTorch, pandas.
  • 14. Project structure:
        v3
        ├── dataprep                  (Data Preparation)
        │   ├── prep_patient.py
        │   ├── prep_doctor.py
        │   ├── prep_case.py
        │   ├── ...
        ├── feateng                   (Feature Engineering)
        │   ├── feat_encoding.py
        │   ├── ...
        ├── ml                        (Machine Learning)
        │   ├── validation
        │   │   ├── train_val_split.py
        │   │   ├── validate.py
        │   │   ├── ...
        │   ├── serve
        │   │   ├── api.py
        │   │   ├── package_api.py
        │   ├── train.py
        │   ├── ...
        ├── utils                     (Utilities)
        │   ├── io.py
        │   ├── notification.py
        │   ├── ...
        tests                         (Unit tests)
        ├── dataprep
        ├── ...
  • 15. DATA VALIDATION & INGESTION. [Diagram: Encrypt (provider environment) → Decrypt (uCare environment) → Extract, Load, Transform] Data validation: schema and format checks; duplicate and null-value checks; volume checks.
  • 16. DATA PREPARATION. Categoricals: converting ICD9 to ICD10*; standardization (e.g., gender, room). Numerics: filling missing values; handling outliers. Other data (e.g., text, images, EHRs). Merging across multiple datasets. Data augmentation: external data (e.g., doctor data from SMC); internal features (e.g., disease embeddings). *International Classification of Diseases
  • 17. VALIDATION. Random split/CV*: shuffle the data randomly, then split it into k folds and cross-validate. Time-based split/CV: sort the data by time (e.g., admission date), split it into time periods (e.g., yearly, half-yearly), and train only on data from the period(s) before the validation period. [Diagram: training vs. validation folds; train set vs. validation set] *(K-fold) cross-validation
  • 18. VALIDATION (so important it gets two slides!): 19% of patients had more than one admission (38% of cases). In validation, more than 100 cases had ICD10/TOSP codes not seen before. Medical prices increase over time. You cannot use the future to predict the past: validation results differ greatly between random and time-based splits.
  • 19. FEATURE ENGINEERING. Datetime: time differences (e.g., age, LOS); details (e.g., hour, day, month). Categoricals: grouping (e.g., ICD10, TOSP); encoding (e.g., one-hot, binary). Numerics: normalization, binning; statistical transforms (for skewed data). Interactions: e.g., age-gender, age-surgery; e.g., conditional probabilities.
  • 21. MACHINE LEARNING. Data weightage: recency (e.g., linear, exponential); domain knowledge. Models / objective functions: e.g., KNN, LR, DT, NN, RNN; e.g., MSE, MAE, Huber, Fair. Feature selection: “classic” feature importance; permutation, drop-column. Parameter tuning: random / grid search; Bayesian optimization.
  • 22. DEPLOYMENT. Train → Validate → Deploy → Feedback → Optimize. Complexity (e.g., infra, security, networks, modelling, monitoring). Versioning and rollback (20180101, 20180201, 20180301, latest). Abstraction via APIs (provider environment).
  • 23. MISCELLANEOUS TIPS: Take time to learn from domain experts and users. More data and/or features != better model. Proper engineering practices make (everyone’s) life easier.
  • 24. OVERALL OUTCOMES. Accuracy and usability improvements: reduced MAE (55%), RMSE (60%), and underestimation percentage; improved query latency (< 1 sec); customization at the institution level. Additional features: integration of a business logic layer for strategic purposes.
  • 25. OVERALL OUTCOMES. Minimal disruption to front-line users: API-based and the existing front end is used, so no training is required. Reduced cost and increased frequency of updates: the system is built for daily updates. Improved user and customer experience (based on anecdotal evidence).
  • 26. KEY TAKEAWAYS: Building useful data products is a team effort. Be humble about the data: don't assume anything. Machine learning is < 20% of the effort; the methodology and engineering process matter more.
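The data validation checks on slide 15 cover schema and format, duplicates and nulls, and volume. They could look roughly like the sketch below. It is a minimal illustration, not uCare's pipeline. The column names in `EXPECTED_SCHEMA`, the ISO date format and the volume bounds are all hypothetical. A production system would more likely run such checks on a pandas DataFrame.

```python
from collections import Counter
from datetime import date

# Hypothetical schema (column -> expected Python type); not taken from the talk.
EXPECTED_SCHEMA = {"case_id": str, "admission_date": str, "icd10": str, "bill": float}


def _is_iso_date(text):
    """Return True if text parses as an ISO 8601 date such as '2018-03-01'."""
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def validate_batch(rows, schema, min_rows, max_rows, key="case_id"):
    """Return human-readable problems found in a batch of dict records."""
    problems = []
    # Volume check: a batch far smaller or larger than usual often means an upstream failure.
    if not min_rows <= len(rows) <= max_rows:
        problems.append(f"volume: {len(rows)} rows outside [{min_rows}, {max_rows}]")
    for i, row in enumerate(rows):
        # Schema check: exactly the expected columns must be present.
        if set(row) != set(schema):
            problems.append(f"schema: row {i} has columns {sorted(row)}")
            continue
        for col, expected_type in schema.items():
            value = row[col]
            if value is None:  # null check
                problems.append(f"null: row {i} column {col}")
            elif not isinstance(value, expected_type):  # type check
                problems.append(f"format: row {i} column {col} is {type(value).__name__}")
            elif col == "admission_date" and not _is_iso_date(value):  # date format check
                problems.append(f"format: row {i} column {col} {value!r}")
    # Duplicate check on the record key.
    for k, n in Counter(row.get(key) for row in rows).items():
        if n > 1:
            problems.append(f"duplicate: {key}={k} appears {n} times")
    return problems


batch = [
    {"case_id": "A1", "admission_date": "2018-03-01", "icd10": "K35.8", "bill": 8200.0},
    {"case_id": "A1", "admission_date": "2018-03-02", "icd10": None, "bill": 5100.0},
    {"case_id": "A2", "admission_date": "03/04/2018", "icd10": "I21.9", "bill": 23000.0},
]
for problem in validate_batch(batch, EXPECTED_SCHEMA, min_rows=1, max_rows=1000):
    print(problem)
# null: row 1 column icd10
# format: row 2 column admission_date '03/04/2018'
# duplicate: case_id=A1 appears 2 times
```

The function returns a list of problems rather than raising on the first one. This lets a single run report everything that is wrong with a batch before it is loaded.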
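Slide 16 names three data preparation steps: standardizing categoricals, filling missing numerics and handling outliers. The slide does not say which methods were used. The sketch below assumes three common choices: a lookup table for gender codes, median imputation, and clipping to Tukey's IQR fences. `GENDER_MAP` and the length-of-stay values are hypothetical.

```python
import statistics

# Hypothetical mapping from raw gender codes to a standard vocabulary.
GENDER_MAP = {"m": "M", "male": "M", "f": "F", "female": "F"}


def standardize_gender(raw):
    """Map free-text gender values to 'M', 'F', or 'U' (unknown)."""
    if raw is None:
        return "U"
    return GENDER_MAP.get(raw.strip().lower(), "U")


def fill_missing(values):
    """Replace None with the median of the observed values."""
    median = statistics.median(v for v in values if v is not None)
    return [median if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Clip values to the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


print([standardize_gender(g) for g in ["Male", " f ", None, "X"]])  # ['M', 'F', 'U', 'U']
stays = fill_missing([2, 3, None, 4, 3, 60])  # hypothetical lengths of stay in days
print(stays)                 # [2, 3, 3, 4, 3, 60]
print(clip_outliers(stays))  # [2, 3, 3, 4, 3, 4.875]
```

Mapping unknown codes to an explicit "U" keeps them visible to the model instead of silently dropping rows. With real hospital data the clipping factor `k` would need tuning, since genuinely long stays are not always errors.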
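Slide 17 contrasts two validation schemes. A random split shuffles the data and cross-validates over k folds. A time-based split trains only on periods before the one being validated. Below is a minimal sketch of both. The record field `admission_date`, the quarterly sample data and the half-yearly boundaries are hypothetical. The time-based variant uses an expanding window, meaning all earlier history is kept; that choice is an assumption.

```python
import random
from datetime import date


def random_kfold(records, k, seed=0):
    """Random split/CV: shuffle, cut into k folds, and validate on each fold in turn."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for fold in folds:
        held_out = set(fold)
        train = [records[j] for j in idx if j not in held_out]
        yield train, [records[j] for j in fold]


def time_based_splits(records, boundaries, key="admission_date"):
    """Time-based split/CV: for each period [start, end), train only on earlier records."""
    ordered = sorted(records, key=lambda r: r[key])
    for start, end in zip(boundaries, boundaries[1:]):
        train = [r for r in ordered if r[key] < start]
        val = [r for r in ordered if start <= r[key] < end]
        if train and val:  # skip periods with no history or no validation cases
            yield train, val


# Hypothetical admissions, one per quarter in 2016-2017, validated half-yearly.
cases = [{"admission_date": date(y, m, 1)} for y in (2016, 2017) for m in (1, 4, 7, 10)]
half_years = [date(2016, 7, 1), date(2017, 1, 1), date(2017, 7, 1), date(2018, 1, 1)]
for train, val in time_based_splits(cases, half_years):
    print(len(train), len(val))  # prints 2 2, then 4 2, then 6 2
```

In the random scheme, later admissions routinely land in the training folds while earlier ones are validated. In the time-based scheme every training case strictly precedes every validation case, which mirrors how the model is used in production.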
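Slide 18's observations, repeat admissions and ICD10/TOSP codes unseen in training, suggest checking a candidate split before trusting its scores. The sketch below is one way to do that and is not the speakers' code. The `patient_id` and `icd10` field names and the sample records are hypothetical.

```python
def split_diagnostics(train, val, code_key="icd10", patient_key="patient_id"):
    """Count validation cases that a split can make look easier or harder than they are."""
    seen_codes = {r[code_key] for r in train}
    seen_patients = {r[patient_key] for r in train}
    return {
        "val_cases": len(val),
        # Cases whose diagnosis/procedure code never appears in training.
        "unseen_code_cases": sum(r[code_key] not in seen_codes for r in val),
        # Cases from patients who also have an admission in the training set.
        "repeat_patient_cases": sum(r[patient_key] in seen_patients for r in val),
    }


train = [
    {"patient_id": "P1", "icd10": "K35.8"},
    {"patient_id": "P2", "icd10": "I21.9"},
]
val = [
    {"patient_id": "P1", "icd10": "K80.2"},
    {"patient_id": "P3", "icd10": "I21.9"},
]
print(split_diagnostics(train, val))
# {'val_cases': 2, 'unseen_code_cases': 1, 'repeat_patient_cases': 1}
```

A high `repeat_patient_cases` count under a random split hints that validation scores may be optimistic. A high `unseen_code_cases` count shows how often the model must generalize to new codes.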
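Several feature types listed on slide 19 can be illustrated on a single case record: time differences, datetime details, grouping, binning, a skew transform, one-hot encoding and an age-gender interaction. This is a sketch under stated assumptions, not the talk's feature set. Every field name is hypothetical (`expected_discharge`, `surgeon_case_volume`, `room`), as are the room categories in `ROOM_TYPES`. Because the estimate is made before admission, the sketch uses a hypothetical expected discharge date rather than the actual one, which would leak the outcome.

```python
import math
from datetime import date, datetime

# Hypothetical room categories, for illustration only.
ROOM_TYPES = ("A1", "B1", "B2", "C")


def engineer_features(case):
    admit = case["admit_time"]
    birth = case["birth_date"]
    # Time differences: age in whole years at admission, expected length of stay in days.
    age = admit.year - birth.year - ((admit.month, admit.day) < (birth.month, birth.day))
    los_days = (case["expected_discharge"] - admit.date()).days
    features = {
        "age": age,
        "expected_los_days": los_days,
        # Datetime details.
        "admit_hour": admit.hour,
        "admit_weekday": admit.weekday(),  # Monday == 0
        "admit_month": admit.month,
        # Binning: ten-year age bands, capped at 90+.
        "age_band": min(age // 10, 9),
        # Grouping: the first three characters of an ICD-10 code give its category.
        "icd10_category": case["icd10"][:3],
        # Statistical transform for a right-skewed count.
        "log_surgeon_volume": math.log1p(case["surgeon_case_volume"]),
    }
    # One-hot encoding of room type.
    for room in ROOM_TYPES:
        features[f"room_{room}"] = int(case["room"] == room)
    # Interaction: age band crossed with gender.
    features["age_band_x_gender"] = f"{features['age_band']}_{case['gender']}"
    return features


f = engineer_features({
    "admit_time": datetime(2018, 3, 5, 14, 30),
    "expected_discharge": date(2018, 3, 8),
    "birth_date": date(1970, 6, 15),
    "gender": "F", "room": "B1", "icd10": "K35.8", "surgeon_case_volume": 120,
})
print(f["age"], f["expected_los_days"], f["admit_weekday"], f["icd10_category"], f["age_band_x_gender"])
# 47 3 0 K35 4_F
```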
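Slide 21 mentions recency-based data weightage (linear or exponential) and objective functions including Huber and Fair. The sketch below shows both ideas in plain Python. The half-life of 365 days and the five-year linear horizon are hypothetical defaults, not values from the talk. In practice, gradient-boosting libraries such as LightGBM offer `huber` and `fair` objectives and accept per-sample weights directly.

```python
import math


def recency_weights(ages_days, scheme="exponential", half_life_days=365.0, horizon_days=1825.0):
    """Down-weight older cases; ages_days is how long ago each case was admitted."""
    if scheme == "linear":  # weight falls linearly to 0 at horizon_days
        return [max(0.0, 1.0 - a / horizon_days) for a in ages_days]
    if scheme == "exponential":  # weight halves every half_life_days
        return [0.5 ** (a / half_life_days) for a in ages_days]
    raise ValueError(f"unknown scheme: {scheme}")


def huber(residual, delta=1.0):
    """Quadratic near zero, linear in the tails: less sensitive to outliers than MSE."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)


def fair(residual, c=1.0):
    """Fair loss c^2 * (|r|/c - ln(1 + |r|/c)): smooth, and roughly linear for large |r|."""
    a = abs(residual) / c
    return c * c * (a - math.log1p(a))


def weighted_loss(y_true, y_pred, weights, loss):
    """Weighted mean of a per-case loss."""
    total = sum(w * loss(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return total / sum(weights)


print(recency_weights([0, 365, 730]))                  # [1.0, 0.5, 0.25]
print(recency_weights([0, 912.5, 2000], "linear"))     # [1.0, 0.5, 0.0]
print(weighted_loss([10, 10], [9, 13], [1, 1], huber))  # 1.5
```

Recency weights let a model track rising medical prices (slide 18) without discarding older cases entirely. The robust losses keep a few very large bills from dominating training.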
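Slide 22 shows models versioned by date (20180101, 20180201, 20180301) with a movable "latest" pointer that supports rollback. A minimal in-memory sketch of that idea follows. The `ModelRegistry` class and the artefact paths are hypothetical. The talk's setup stores artefacts in Azure Blob and deploys via Jenkins, neither of which is modelled here.

```python
from datetime import datetime


class ModelRegistry:
    """Hypothetical registry: immutable dated versions plus a movable 'latest' pointer."""

    def __init__(self):
        self._versions = {}  # version tag (YYYYMMDD) -> artefact location
        self._latest = None  # tag that "latest" currently resolves to

    def publish(self, tag, artefact):
        datetime.strptime(tag, "%Y%m%d")  # reject malformed tags (raises ValueError)
        if tag in self._versions:
            raise ValueError(f"version {tag} already exists; versions are immutable")
        self._versions[tag] = artefact
        self._latest = tag  # the newly published version becomes "latest"

    def rollback(self, tag=None):
        """Point "latest" at tag, or at the version just before the current one."""
        tags = sorted(self._versions)  # YYYYMMDD strings sort chronologically
        if tag is None:
            i = tags.index(self._latest)
            if i == 0:
                raise ValueError("no earlier version to roll back to")
            tag = tags[i - 1]
        if tag not in self._versions:
            raise KeyError(tag)
        self._latest = tag
        return tag

    def resolve(self, tag="latest"):
        return self._versions[self._latest if tag == "latest" else tag]


registry = ModelRegistry()
for tag in ("20180101", "20180201", "20180301"):
    registry.publish(tag, f"models/{tag}.pt")  # hypothetical artefact paths
print(registry.resolve())            # models/20180301.pt
print(registry.rollback())           # 20180201
print(registry.resolve())            # models/20180201.pt
print(registry.resolve("20180301"))  # models/20180301.pt
```

Keeping old versions immutable and only moving the pointer makes a rollback a cheap metadata change. Clients that always call the API for "latest" need no changes either way.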
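Slide 24 reports relative reductions in MAE and RMSE and a lower underestimation percentage. The sketch shows how such metrics are commonly computed. It assumes "underestimation %" means the share of cases where the estimate falls below the actual bill. The numbers in the example are made up and are not uCare's results.

```python
import math


def error_report(y_true, y_pred):
    """MAE, RMSE, and share of cases underestimated (prediction below actual bill)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]  # positive = overestimate
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "underestimation_pct": 100.0 * sum(e < 0 for e in errors) / n,
    }


def reduction_pct(old, new):
    """Relative reduction of a metric, in percent."""
    return 100.0 * (old - new) / old


report = error_report([100, 200, 300, 400], [110, 190, 300, 440])
print(report["mae"], report["underestimation_pct"])  # 15.0 25.0
print(reduction_pct(20.0, 9.0))                       # 55.0
```

RMSE penalizes large misses more heavily than MAE, so the two can move by different amounts. Tracking the underestimation share separately reflects the stated preference for overestimating rather than underestimating bills.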