The American Diabetes Association states that 29.1 million Americans and more than 300 million people worldwide have diabetes. Managing diabetes medication is always challenging. Following the doctor's prescription, patients take an insulin dose one hour before breakfast, lunch or dinner. In practice, however, the insulin intake may need to change with the blood glucose level, the calorie intake on a given day, and other factors.
This talk explains how a real-time Big Data pipeline and recommendation engine can be used to suggest insulin intake for diabetic patients in near real time. Based on the calorie intake and blood glucose level collected from patients, together with a generated dataset, an insulin dosage can be recommended, which helps patients avoid over- or under-dosing. The healthcare industry needs medication recommender systems, and there is a growing trend of applications that help doctors by recommending medication based on a patient's historical data. This also helps facilitate a doctor-friendly, hospital-free environment for users all over the world.
This talk delves into a diabetes medication recommender system built with Databricks and Spark. Databricks supports HIPAA-compliant deployment for processing PHI data. The talk covers building a secure pipeline with encrypted data and an end-to-end recommendation system using Structured Streaming and IoT data flowing from sensors.
Building Real-Time Data Pipeline for Diabetes Medication Recommender System Using Databricks with Arivoli Tirouvingadam Jayaradha Natarajan
1. Building Real-Time Data Pipeline for Diabetes Medication Recommender System Using Databricks
#DevSAIS17
Arivoli Tirouvingadame
Data Platform Engineer, Qventus
Jayaradha Natarajan
Sr. Data Engineer, Change Healthcare
2. $whoami
• Jayaradha Natarajan
Sr. Data Engineer, Change Healthcare
www.github.com/jayaradha
Open Source Committer
https://l10n.gnome.org/teams/ta/
Organizer, Data Riders meetup group
www.meetup.com/datariders
Arivoli Tirouvingadame
Data Platform Engineer, Qventus
http://www.github.com/olisource
Organizer, Data Riders meetup group
www.meetup.com/datariders
3. AI/ML in Healthcare
“AI will be ubiquitous in healthcare
by 2025”
https://www.techemergence.com/machine-learning-in-healthcare-executive-consensus/
5. “We are in the early days of AI
assisting Physicians better
prescribe medication”
https://www.truveris.com/resources/ai-in-healthcare-helping-physicians-better-prescribe-treatments
7. A day in the life of a diabetes patient
[Diagram: Problem, Challenge, Symptoms]
8. How can we prescribe Diabetes
medication better in near
real-time?
9. Solution
- Use a Big Data pipeline to collect the patient's blood glucose level and medication before/after food, and predict better medication in near real time
[Diagram, three stages:
Data Collection: Collect sensor data (wearable devices)
Model: Model data using ML algorithms
Predict: Predict medication & alert patient's mobile device]
11. Data ingestion
o Typically, raw data can be structured, semi-structured or unstructured, with or without errors
o IoT devices (e.g. Continuous Glucose Monitors) produce structured data, with or without errors
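A minimal sketch of what "structured data with/without errors" can mean at ingestion time: each raw line is parsed as JSON, and anything malformed or incomplete is set aside instead of stopping the pipeline. The field names (patient_id, timestamp, glucose_mg_dl) and the function name are assumptions for illustration; the talk does not specify a message schema.

```python
import json
from typing import Iterable, List, Tuple

# Hypothetical required fields; the talk does not give the sensor schema.
REQUIRED_FIELDS = ("patient_id", "timestamp", "glucose_mg_dl")


def parse_readings(lines: Iterable[str]) -> Tuple[List[dict], List[str]]:
    """Split raw JSON lines into parsed readings and rejected raw lines."""
    good: List[dict] = []
    bad: List[str] = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Not valid JSON at all: keep the raw line for later inspection.
            bad.append(line)
            continue
        if isinstance(record, dict) and all(f in record for f in REQUIRED_FIELDS):
            good.append(record)
        else:
            # Valid JSON, but not an object or missing a required field.
            bad.append(line)
    return good, bad
```

Keeping the rejected lines, rather than dropping them, leaves room to audit how often a device sends bad data.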
12. Data Storage and Cleansing
[Diagram: Sensor Data, Raw Data Storage, Data Cleansing, Cleansed Data Storage, Model Storage, Recommendation/Score storage]
[Data fields: Age, Calorie intake, Blood glucose level]
13. Data Cleansing and modeling
o Data cleansing uses statistical analysis tools to read and audit data based on
a list of pre-defined constraints.
[Diagram: Streaming Data, Range check, Validate Data, Split, Training data, Test data]
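The range check, validation and train/test split named on this slide could look roughly like the sketch below. The three fields come from the earlier storage slide (age, calorie intake, blood glucose level); the key names, the numeric bounds and the 80/20 split are assumptions chosen only to make the example concrete. They are not clinical guidance and are not taken from the talk.

```python
import random
from typing import List, Tuple

# Illustrative bounds only (assumed, not clinical guidance, not from the talk).
RANGES = {
    "age": (0, 120),
    "calorie_intake": (0, 10000),
    "glucose_mg_dl": (20, 600),
}


def in_range(record: dict) -> bool:
    """True if every constrained field is present, numeric, and within bounds."""
    for field, (low, high) in RANGES.items():
        value = record.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if not low <= value <= high:
            return False
    return True


def split_train_test(
    records: List[dict], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[dict], List[dict]]:
    """Shuffle a copy of the records with a fixed seed, then split it."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed keeps the split reproducible, which makes it easier to compare models trained on different days.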
15. Reference Architecture
[Diagram: EMR; Raw Data, Transformation/Cleansing, Clean Data, Train, Model]
16. Reference Architecture
[Diagram: EMR; Raw Data, Transformation/Cleansing, Clean Data, Train, Model, Prediction]
17. Architecture components
o Kafka: Get sensor data in real-time from Wearable devices
o Apache Spark: Ingest raw data through Kafka. Use Structured Streaming (data verification, validation, cleansing, enrichment, etc.) and store the result in S3 buckets
o MLlib: Process data stored in S3 buckets via machine learning libraries, so that insulin intake can be recommended
o AWS: Deploy the model and other related services in EC2, EMR, etc.
o Mobile or Web App: Notify patients with medication recommendation
o D3/Tableau: Visualize via charts/dashboards
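As a rough sketch of the Kafka and Spark components listed above, the PySpark code below reads JSON sensor readings from a Kafka topic with Structured Streaming, drops rows that fail to parse, and appends the result as Parquet files. The schema fields, topic, broker address and paths are all assumptions; the talk does not give them. It also assumes the Spark Kafka source package (spark-sql-kafka) is available on the cluster. Model training with MLlib is not shown.

```python
# Assumed schema (DDL string); the talk does not specify the message fields.
READING_SCHEMA = (
    "patient_id STRING, event_time TIMESTAMP, "
    "glucose_mg_dl DOUBLE, calorie_intake DOUBLE"
)


def kafka_source_options(bootstrap_servers: str, topic: str) -> dict:
    """Options for Spark's built-in Kafka source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def start_ingest(spark, bootstrap_servers: str, topic: str,
                 out_path: str, checkpoint_path: str):
    """Read JSON readings from Kafka, drop unparseable rows, append Parquet."""
    # Imported here so this module loads even where PySpark is not installed.
    from pyspark.sql.functions import col, from_json

    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )
    readings = (
        # Kafka delivers the payload as bytes in the "value" column.
        raw.select(from_json(col("value").cast("string"), READING_SCHEMA).alias("r"))
        .select("r.*")
        # from_json yields nulls for malformed input, so filter those rows out.
        .where(col("glucose_mg_dl").isNotNull())
    )
    return (
        readings.writeStream.format("parquet")
        .option("path", out_path)  # e.g. an S3 location (hypothetical)
        .option("checkpointLocation", checkpoint_path)
        .outputMode("append")
        .start()
    )
```

A call might look like start_ingest(spark, "broker:9092", "cgm-readings", "s3a://example-bucket/clean/", "s3a://example-bucket/checkpoints/"), where every argument value is a placeholder.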
18. Pain points
o Maintaining multiple root accounts for Dev, Pre-Prod and Prod environments is expensive
o Choosing HIPAA-compliant services is hard (most serverless technologies are not HIPAA compliant)
o We have to build a secure network from scratch and maintain it (for example, using Terraform, CloudFormation, etc.)
o End-to-end encryption: Data-in-flight and Data-at-rest encryption
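To make the last point concrete, here is a hedged sketch of the kind of settings involved, not the configuration used in the talk. For data in flight, Spark's Kafka source forwards options prefixed with "kafka." to the Kafka client, so TLS can be requested there. For data at rest, the Hadoop S3A connector has properties for server-side encryption. The helper names are hypothetical, and the exact S3A property names depend on the Hadoop version and platform, so treat them as assumptions to verify against the documentation for your environment.

```python
def kafka_tls_options(truststore_path: str, truststore_password: str) -> dict:
    """Kafka client TLS settings, passed through Spark via the 'kafka.' prefix."""
    return {
        "kafka.security.protocol": "SSL",
        "kafka.ssl.truststore.location": truststore_path,
        "kafka.ssl.truststore.password": truststore_password,
    }


def s3a_sse_kms_conf(kms_key_arn: str) -> dict:
    """Spark conf entries asking S3A to use SSE-KMS for written objects.

    Assumption: these are the older S3A property names; newer Hadoop
    releases also accept fs.s3a.encryption.algorithm / fs.s3a.encryption.key.
    """
    return {
        "spark.hadoop.fs.s3a.server-side-encryption-algorithm": "SSE-KMS",
        "spark.hadoop.fs.s3a.server-side-encryption.key": kms_key_arn,
    }
```

The first dictionary would be merged into the Kafka reader options, and the second applied when building the Spark session or cluster configuration.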
19. HIPAA Challenges
o HIPAA requires Healthcare Data to be protected.
o Ensure the confidentiality, integrity, and availability of Protected
Health Information (PHI) created, received, maintained, or
transmitted.
o Protect against any reasonably anticipated threats and hazards to the
security or integrity of PHI.
o Protect against reasonably anticipated uses or disclosures of PHI not
permitted by the Privacy Rule.
22. Databricks – Kafka - Connector
[Diagram: Kafka Connector, Raw Data, Spark to clean data, Cleansed Data, Train, Spark ML data, Model, Prediction]
23. Deployment
o Hybrid only or single tenant
o Selected AWS BAA HIPAA services
o Databricks auxiliary services (web app and cluster management software) would be in a Databricks-owned AWS account and run on dedicated VPC instances.
o Spark clusters would continue to be deployed to the customer's AWS account and on dedicated instances.
o End-to-end encryption: Data-in-flight and Data-at-rest encryption
o Logging and Monitoring
o Audit
https://docs.databricks.com/user-guide/advanced/hipaa-compliant-deployment.html
29. Future directions
o Health: Extend it to any medication management solution and to emergency medication management
o Wellness: Predict calorie intake
o Fitness: Predict workouts needed to be done
30. Acknowledgements
- Catherine Crofts, PhD, Auckland University of Technology, Auckland, New Zealand
https://www.researchgate.net/profile/Catherine_Crofts
- Baba Medicals (India)
- http://reference.medscape.com/drug/humalog-insulin-lispro-999005
31. References
o https://www.esri.com/
o https://risk.lexisnexis.com/
o https://symphonyhealth.prahs.com/
o https://docs.databricks.com/user-guide/advanced/hipaa-compliant-deployment.html
o https://databricks.com/blog/2017/05/18/taking-apache-sparks-structured-structured-streaming-to-production.html
o https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html
o https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html