RWE & Patient Analytics
Leveraging Databricks
A Use Case
Harini Gopalakrishnan & Martin Longpre
Sanofi
Disclaimer
• The views and opinions expressed in this presentation are those of
the individual presenters and should not be attributed to any
organization with which the presenters are employed or affiliated
• All registered trademarks cited are property of their respective
owners.
Agenda
Harini Gopalakrishnan – 20 minutes
▪ What are real-world evidence and real-world data
▪ Advanced analytics in RWE generation
▪ Security and privacy of our data
▪ Our journey – a conceptual view of the architecture
and what we have achieved
Martin Longpre – 20 minutes
▪ Databricks implementation: our customization
▪ Demo
▪ Looking forward: where we want to partner for
improvements
Q&A – 20 minutes
Defining the Problem: Real World Data and
Evidence
Context: How do we define RWE & RWD?
Real World Data (RWD) is a term used to
describe health care related data that are
collected outside the context of
randomized clinical trials (RCTs).
Real world evidence (RWE) is defined as the
insight or knowledge derived from the analysis
of real world data, conducted to respond to a
specific research question
RWE leverages analytics on RWD to discover, develop, deliver and
provide new insights on healthcare interventions
Examples of Real-world data sources
~130 TB (EHR/Claims)
~2,000 TB per month in versions and transformations
Analysis in RWE: Advanced analytics methodology
Traditional analytics
• Traditional RWE statistics, meta-analysis, data modelling, propensity-score matching
Advanced analytics
• Predictive modelling, unsupervised clustering, rule extraction, model bootstrapping,
natural language processing, machine learning
Machine learning: a computer
program is said to learn from
experience (partially captured
within data), when its performance
increases with experience
Supervised techniques: examples
• Logistic regression
• Markov chain
• Bayesian network
• K-nearest neighbour
Unsupervised techniques: examples
• K-means clustering
• Hierarchical ascendant classification
• Factorial analyses
• Non-negative matrix factorization
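As an illustration of the unsupervised family listed above, here is a minimal K-means sketch in plain Python. It is not the presenters' pipeline: the function name `kmeans`, the two features (age, visits per year) and the six synthetic records are assumptions made purely for the example.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain K-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignment[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


# Hypothetical, synthetic feature vectors: (age, visits per year).
patients = [(34, 2), (36, 3), (35, 2), (71, 11), (69, 12), (73, 10)]
labels, centers = kmeans(patients, k=2)
```

On this toy input the three younger, low-utilisation records and the three older, high-utilisation records end up in separate clusters. Real phenotypic clustering would need feature scaling and far more dimensions.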
Innovation in evidence generation
Uses of RWE – why is it valuable
https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
The driving reasons for leveraging RWD and RWE more in recent years include:
• Easy availability of compute resources for big data
• Availability of curated, high-quality data sources, both internally and externally
Real world evidence influences all aspects of a pharma value chain
• Regulatory decision making
• Reimbursement decisions
• Clinical guidelines
Transforming RWD to Evidence: Use case in action
AI-based indication searching approach that relies on real-world data, bringing higher confidence and reducing biases
Data is always privacy preserved and de-identified
Sanofi: Novel Indications via AI —
Finding new treatment indications for an
approved therapy is of immense value to
pharma for drug re-purposing efforts,
R&D candidate prioritization, and overall
productivity. Sanofi wanted to develop an
AI based indication searching approach
that relies on real-world data thus
bringing a higher confidence and
reducing biases. Sanofi applied
unsupervised machine learning to create
a phenotypic cluster of patients in order
to identify relevant indications that
worked across clusters. The pipeline
crunched nearly 17 million patients with
2,700 characteristics derived from
electronic health records (EHRs). The
initial results of the novel approach
recovered 90% of known indications and
identified many more deemed credible by
development teams producing a higher
level of confidence in results and a
reduction in cost and time to market, with
fewer, faster and more targeted trials,
while minimizing attrition and risk.
https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award
Winner of the Gartner Award 2020 for Innovation in Health care and
Lifesciences
Trust in the data and in the analysis being performed is a MUST
“Patients and consumers have a
significant role to play in the
collection of real-world data and
generation of real-world evidence,
but to be effective, patient and
consumer engagement approaches
would include considering them
partners and capturing outcomes that
are important to them”
▪ Patient consent is a must
▪ Privacy-preserving linkage must be performed; encryption is a key aspect
▪ Establish a trusted patient relationship to explain the usage of data and consent (e.g., secondary use of primary data)
▪ Data should not be used beyond the intended purpose; governance around usage is a must
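One common building block for privacy-preserving linkage is a keyed hash: each record carries a token derived from an identifier and a secret key, so two datasets can be joined on the token without exposing the identifier. The sketch below uses Python's standard `hmac` module. It is a generic illustration, not the method used in the presentation; the function name `linkage_token`, the normalisation rule and the sample identifier are assumptions.

```python
import hashlib
import hmac


def linkage_token(patient_identifier: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 token for an identifier (hypothetical helper)."""
    # Normalise so that trivial formatting differences map to the same token.
    normalized = patient_identifier.strip().lower()
    return hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Illustrative values only; a real key would be held by a trusted party.
demo_key = b"demo-key-not-for-production"
token = linkage_token("MRN-0001", demo_key)
```

Whoever holds the key can regenerate tokens, so keeping the key outside the analytics environment is what keeps the analysis side unable to re-identify records.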
Our Architecture & implementation
Key aspects of an RWE Ecosystem
Data Management
▪ Secure data storage – triple encrypted with audited access control
▪ Full data lineage – complete history of every data transformation
▪ Data pipeline – designed for high-performance handling of big data
Analytics
▪ Self-service tools – filtering and querying tools for feasibility and descriptive information
▪ Interactive tools – dashboards and applications for study execution
▪ Low-level tools – R, Python and SQL for comparative analysis and advanced analytics
Access Control
▪ Multi-tenant configuration – provide each organization with their own namespace
▪ User provisioning – role-based access controlled by each organization
▪ Inherited data permissions – transformed data retains access control
Auditing and Monitoring
▪ Full auditing of user actions – log each action and generate reports
▪ Comprehensive monitoring – performance, usage, and custom actions
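The idea that transformed data retains access control can be sketched as a rule applied whenever a derived dataset is created. The example assumes one possible rule, namely that a group may read a derived dataset only if it may read every source dataset. The slides do not state the actual rule, and the dataset and group names are hypothetical.

```python
def inherited_groups(source_permissions):
    """Groups allowed on a derived dataset: those allowed on every source."""
    group_sets = [set(groups) for groups in source_permissions]
    if not group_sets:
        return set()  # no sources, so nothing to inherit
    return set.intersection(*group_sets)


# Hypothetical source datasets and the groups allowed to read them.
sources = {
    "ehr_raw": {"rwe_analysts", "data_engineers"},
    "claims_raw": {"rwe_analysts", "finance"},
}
cohort_access = inherited_groups(sources.values())
```

Here only `rwe_analysts` can read a cohort built from both sources, because it is the only group with access to each of them.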
What does our system offer?
▪ Powerful compute resources to handle billions of rows of data
▪ Complete history of all data updates, with the ability to bind to specific versions
▪ Complete data traceability – every transform and resulting data set is captured
▪ Robust data security and access control for all data and projects
▪ Ability to manage metadata, reference data and master data
▪ Built on a scalable data lake
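To make version binding and traceability concrete, the sketch below models each dataset version as an immutable record that names its transform and its parent versions. The class `DatasetVersion`, the `lineage` helper and the dataset names are assumptions for illustration; they do not describe the platform's internal data model.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetVersion:
    """One immutable version of a dataset plus the step that produced it."""
    name: str
    version: int
    transform: str
    parents: tuple = ()

    @property
    def fingerprint(self) -> str:
        # The hash covers the parents, so a change upstream changes it too.
        payload = json.dumps(
            [self.name, self.version, self.transform,
             [p.fingerprint for p in self.parents]]
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def lineage(dataset):
    """List every step that led to `dataset`, oldest ancestor first."""
    steps = []
    for parent in dataset.parents:
        steps.extend(lineage(parent))
    steps.append(f"{dataset.name}@v{dataset.version} <- {dataset.transform}")
    return steps


# Hypothetical datasets: a study binds to version 3 of the raw data.
raw_v3 = DatasetVersion("ehr_raw", 3, "ingest")
cohort = DatasetVersion("diabetes_cohort", 1, "filter", (raw_v3,))
history = lineage(cohort)
```

Binding the cohort to `raw_v3` rather than to "latest" is what lets an analysis be reproduced after the raw data has been refreshed.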
Data is always privacy preserved and de-identified. We do not own the key for re-identification within this ecosystem
The Conceptual Architecture
Disclaimer: For example purposes only
[Figure: conceptual architecture diagram. Labels: Internal Sources; External Sources; Clinical Bioinformatics; Data lake (Sanofi AWS), secured and access controlled at the data level; Secured and traceable Sanofi-controlled environment; Self-Service Analysis; Advanced Analytics; Data Augmentation; Visualization / Dashboards; Artificial Intelligence/ML; NLP; Standardized analytical workflows; Cohort Definitions and Data Modelling; Conventional Studies; Insights; Other Internal Platforms; External Collaboration (Data and Analysis Collaboration*) with Societies and Consortia, Academic Institutions, and Regulatory Agencies.]
https://aws.amazon.com/blogs/industries/sanofi-webinar-performing-end-to-end-real-world-evidence-generation-with-traceability-and-transparency-on-aws/
When do we use Databricks?
▪ Exploratory use cases: projects where we need to run AI/ML workflows that require GPUs, custom libraries, or NLP/sentiment analysis
▪ Cross-functional teams working on a specific project, with both internal and external stakeholders
▪ Flexibility: users can manage their own cluster profiles, sizing up and down based on policy
▪ Data ingestion pipelines: migrating away from AWS Glue and Batch for cost and performance reasons, with a 30% improvement in costs and productivity
▪ Delta Lake: under analysis; today the data is managed directly as Parquet on S3
▪ SQL Analytics: under evaluation
Databricks Customization (1/2)
Passthrough for Security
▪ Usage of our Azure AD configuration
▪ One AD group per data type
▪ Deactivation of the DBFS file system for end users (DBFS does not align with our data restriction policies)
▪ All data access is predefined and available through /mnt
Gitlab integration
▪ Integration of the Databricks Repos feature, connected directly to our enterprise Gitlab services
▪ Usage of CI/CD pipelines for deploying scripts and tasks
Cluster Policies
▪ Cluster names suffixed with the policy names for audit and monitoring
▪ Limit the types of worker and driver for better budget management
▪ Enforce cluster termination with default values based on projects/use cases (managed by cluster policies)
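The cluster policy points above could be expressed in a Databricks cluster policy definition along the following lines. This is a sketch built as a Python dictionary that serialises to the policy JSON. The policy name, instance types and the 60-minute timeout are assumed values, not the presenters' configuration, and `name_matches_policy` is a hypothetical local helper that only mirrors the naming rule for illustration.

```python
import json
import re

policy_name = "rwe-exploratory"  # hypothetical policy name

policy_definition = {
    # Enforce auto-termination with a fixed value that users cannot change.
    "autotermination_minutes": {"type": "fixed", "value": 60, "hidden": True},
    # Limit worker and driver instance types for budget control.
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.xlarge", "m5.2xlarge"],
        "defaultValue": "m5.xlarge",
    },
    "driver_node_type_id": {"type": "allowlist", "values": ["m5.xlarge"]},
    # Require cluster names to end with the policy name for audit purposes.
    "cluster_name": {"type": "regex", "pattern": f".*-{policy_name}$"},
}

policy_json = json.dumps(policy_definition, indent=2)


def name_matches_policy(cluster_name: str) -> bool:
    """Local check of the naming rule, independent of Databricks itself."""
    pattern = policy_definition["cluster_name"]["pattern"]
    return re.fullmatch(pattern, cluster_name) is not None
```

The resulting JSON would be supplied as the definition when creating the policy; the attribute names should be checked against the current Databricks policy reference before use.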
Databricks Customization (2/2)
Instance Profiling
▪ Only used for specific use cases, mostly for RStudio
IAM roles and policies
▪ Fully integrated with our AWS stack
▪ IAM roles set up for S3 bucket access
▪ One home folder per user created by default (internal process)
Demo and Future
Demo
Improvements
▪ Support for RStudio
▪ Data access control and policy propagation to restrict unauthorized use of data (no lineage on data today)
Summary: Our Journey and Benefits
▪ Started from a traditional warehouse 3 years ago to create an end-to-end ecosystem for evidence generation and insights
▪ Helped move away from conventional to more advanced analytical approaches, leveraging the power of big data and cloud
▪ Delivered several evidence-generating studies, i.e., studies at scale that have impacted all aspects of the pharma value chain with demonstrable ROI
https://www.dovepress.com/cr_data/article_fulltext/s160000/160029/img/jmdh-160029_F003.jpg

Weitere ähnliche Inhalte

Was ist angesagt?

Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
Jeffrey T. Pollock
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
Kai Wähner
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
Denodo
 

Was ist angesagt? (20)

Architecting a datalake
Architecting a datalakeArchitecting a datalake
Architecting a datalake
 
Lakehouse in Azure
Lakehouse in AzureLakehouse in Azure
Lakehouse in Azure
 
Modern Data architecture Design
Modern Data architecture DesignModern Data architecture Design
Modern Data architecture Design
 
Data Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to MeshData Mesh Part 4 Monolith to Mesh
Data Mesh Part 4 Monolith to Mesh
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
Introducing Azure SQL Database
Introducing Azure SQL DatabaseIntroducing Azure SQL Database
Introducing Azure SQL Database
 
Kappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology ComparisonKappa vs Lambda Architectures and Technology Comparison
Kappa vs Lambda Architectures and Technology Comparison
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Activate Data Governance Using the Data Catalog
Activate Data Governance Using the Data CatalogActivate Data Governance Using the Data Catalog
Activate Data Governance Using the Data Catalog
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Building Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics PrimerBuilding Lakehouses on Delta Lake with SQL Analytics Primer
Building Lakehouses on Delta Lake with SQL Analytics Primer
 
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)
Introduction to Data Virtualization (session 1 from Packed Lunch Webinar Series)
 
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
Intuit's Data Mesh - Data Mesh Leaning Community meetup 5.13.2021
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Building your Datalake on AWS
Building your Datalake on AWSBuilding your Datalake on AWS
Building your Datalake on AWS
 
Data Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital TransformationData Architecture Strategies: Data Architecture for Digital Transformation
Data Architecture Strategies: Data Architecture for Digital Transformation
 
Data Migration to Azure
Data Migration to AzureData Migration to Azure
Data Migration to Azure
 
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solutionDifferentiate Big Data vs Data Warehouse use cases for a cloud solution
Differentiate Big Data vs Data Warehouse use cases for a cloud solution
 

Ähnlich wie RWE & Patient Analytics Leveraging Databricks – A Use Case

Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
Matt Barnes
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
Denodo
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Denodo
 
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WPOptimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
Radium Communications
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
Denodo
 

Ähnlich wie RWE & Patient Analytics Leveraging Databricks – A Use Case (20)

Transforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform StrategyTransforming GE Healthcare with Data Platform Strategy
Transforming GE Healthcare with Data Platform Strategy
 
Maven and google pharma r&d (1)
Maven and google pharma r&d  (1)Maven and google pharma r&d  (1)
Maven and google pharma r&d (1)
 
FDA News Webinar - Inspection Intelligence
FDA News Webinar - Inspection IntelligenceFDA News Webinar - Inspection Intelligence
FDA News Webinar - Inspection Intelligence
 
FDA News Webinar - Inspection Intelligence
FDA News Webinar - Inspection IntelligenceFDA News Webinar - Inspection Intelligence
FDA News Webinar - Inspection Intelligence
 
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
tranSMART Community Meeting 5-7 Nov 13 - Session 5: Recent tranSMART Lessons ...
 
Roadmap to next generation digital lab
Roadmap to next generation digital labRoadmap to next generation digital lab
Roadmap to next generation digital lab
 
Enabling patient-centricity-pfizer
Enabling patient-centricity-pfizerEnabling patient-centricity-pfizer
Enabling patient-centricity-pfizer
 
Enabling Patient Centricity for Pfizer through AWS Cloud (LFS301-S-i) - AWS r...
Enabling Patient Centricity for Pfizer through AWS Cloud (LFS301-S-i) - AWS r...Enabling Patient Centricity for Pfizer through AWS Cloud (LFS301-S-i) - AWS r...
Enabling Patient Centricity for Pfizer through AWS Cloud (LFS301-S-i) - AWS r...
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
 
Bridging Health Care and Clinical Trial Data through Technology
Bridging Health Care and Clinical Trial Data through TechnologyBridging Health Care and Clinical Trial Data through Technology
Bridging Health Care and Clinical Trial Data through Technology
 
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
BDW Chicago 2016 - Ramu Kalvakuntla, Sr. Principal - Technical - Big Data Pra...
 
Forrester2019
Forrester2019Forrester2019
Forrester2019
 
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
Implementar una estrategia eficiente de gobierno y seguridad del dato con la ...
 
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and ClinicalBig Data Analytics for Healthcare Decision Support- Operational and Clinical
Big Data Analytics for Healthcare Decision Support- Operational and Clinical
 
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WPOptimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
Optimizing_Customer_Lifecycle_with_Big_Data_Analytics_4079WP
 
Regulatory Intelligence
Regulatory IntelligenceRegulatory Intelligence
Regulatory Intelligence
 
Five Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data GovernanceFive Things to Consider About Data Mesh and Data Governance
Five Things to Consider About Data Mesh and Data Governance
 
Building an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-MakingBuilding an Intelligent Biobank to Power Research Decision-Making
Building an Intelligent Biobank to Power Research Decision-Making
 
Enterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for HealthcareEnterprise Analytics: Serving Big Data Projects for Healthcare
Enterprise Analytics: Serving Big Data Projects for Healthcare
 
Computer Software Assurance (CSA): Understanding the FDA’s New Draft Guidance
Computer Software Assurance (CSA): Understanding the FDA’s New Draft GuidanceComputer Software Assurance (CSA): Understanding the FDA’s New Draft Guidance
Computer Software Assurance (CSA): Understanding the FDA’s New Draft Guidance
 

Mehr von Databricks

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
Databricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Databricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
Databricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
Databricks
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Databricks
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Databricks
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
Databricks
 

Mehr von Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack Detection
 
Jeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and QualityJeeves Grows Up: An AI Chatbot for Performance and Quality
Jeeves Grows Up: An AI Chatbot for Performance and Quality
 
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + FugueIntuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
Intuitive & Scalable Hyperparameter Tuning with Apache Spark + Fugue
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload Deployment
 
Improving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot InstancesImproving Apache Spark for Dynamic Allocation and Spot Instances
Improving Apache Spark for Dynamic Allocation and Spot Instances
 

Kürzlich hochgeladen

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
nirzagarg
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
vexqp
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
wsppdmt
 
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...Reconciling Conflicting Data Curation Actions:  Transparency Through Argument...
Reconciling Conflicting Data Curation Actions: Transparency Through Argument...
Bertram Ludäscher
 
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling ManjurJual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
Jual Cytotec Asli Obat Aborsi No. 1 Paling Manjur
ptikerjasaptiker
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
gajnagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
怎样办理圣路易斯大学毕业证(SLU毕业证书)成绩单学校原版复制
vexqp
 

Kürzlich hochgeladen (20)

Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...
Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...Top profile Call Girls In Bihar Sharif [ 7014168258 ] Call Me For Genuine Mod...

RWE & Patient Analytics Leveraging Databricks – A Use Case

  • 1. RWE & Patient Analytics Leveraging Databricks: A Use Case. Harini Gopalakrishnan & Martin Longpre, Sanofi
  • 2. Disclaimer
      ▪ The views and opinions expressed in this presentation are those of the individual presenters and should not be attributed to any organization with whom the presenters are employed or affiliated
      ▪ All registered trademarks cited are property of their respective owners.
  • 3. Agenda
      Harini Gopalakrishnan (20 minutes)
      ▪ What are real-world evidence and real-world data
      ▪ Advanced analytics in RWE generation
      ▪ Security and privacy of our data
      ▪ Our journey: a conceptual view of the architecture and what we have achieved
      Martin Longpre (20 minutes)
      ▪ Databricks implementation: our customization
      ▪ Demo
      ▪ Look forward: where we want to partner for improvements
      Q&A (20 minutes)
  • 4. Defining the Problem: Real World Data and Evidence
  • 5. Context: How do we define RWE & RWD
      ▪ Real-world data (RWD) is a term used to describe health-care-related data that are collected outside the context of randomized clinical trials (RCTs).
      ▪ Real-world evidence (RWE) is defined as the insight or knowledge derived from the analysis of real-world data, conducted to respond to a specific research question.
      ▪ RWE leverages analytics on RWD to discover, develop, deliver and provide new insights on healthcare interventions.
      ▪ Examples of real-world data sources
      ▪ ~130 TB (EHR/claims)
      ▪ ~2000 TB per month in versions, transformations
  • 6. Analysis in RWE: Advanced analytics methodology (innovation in evidence generation)
      Traditional analytics
      ▪ Traditional RWE statistics, meta-analysis, data modelling, propensity-score matching
      Advanced analytics
      ▪ Predictive modelling, unsupervised clustering, rule extraction, model bootstrapping, natural language processing, machine learning
      ▪ Machine learning: a computer program is said to learn from experience (partially captured within data) when its performance increases with experience
      Supervised techniques, examples
      ▪ Logistic regression
      ▪ Markov chain
      ▪ Bayesian network
      ▪ K-nearest neighbour
      Non-supervised techniques, examples
      ▪ K-means clustering
      ▪ Hierarchical ascendant classification
      ▪ Factorial analyses
      ▪ Non-negative matrix factorization
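To make the "K-means clustering" entry on slide 6 concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is not taken from the presentation: the function names, the two patient features and the toy values are all assumptions chosen for illustration, and a real RWE pipeline would run a library implementation on far larger data.

```python
import random


def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, centroids[i])),
    )


def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct data points
    for _ in range(iterations):
        # Assignment step: group every point under its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members.
        updated = []
        for i, members in enumerate(clusters):
            if members:
                updated.append(tuple(sum(d) / len(members) for d in zip(*members)))
            else:
                updated.append(centroids[i])  # keep the centroid of an empty cluster
        if updated == centroids:  # converged
            break
        centroids = updated
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels


# Hypothetical, already-scaled patient features: (visits per year, chronic conditions)
patients = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = kmeans(patients, k=2)
```

On this toy data the first three and the last three patients end up in separate clusters; which cluster receives which label number depends on the random initialisation.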
  • 7. Uses of RWE: why is it valuable
      ▪ Real-world evidence influences all aspects of a pharma value chain: regulatory decision making, reimbursement decisions, clinical guidelines
      The driving reasons for leveraging them more recently include:
      ▪ Ease of availability of compute resources for big data
      ▪ Availability of curated and high-quality data sources, both internally and externally
      https://www.healthcatalyst.com/insights/real-world-data-chief-driver-drug-development
  • 8. Transforming RWD to Evidence: Use case in action
      ▪ AI-based indication searching approach that relies on real-world data, thus bringing a higher confidence and reducing biases
      ▪ Data is always privacy preserved and de-identified
      ▪ Sanofi: Novel Indications via AI. Finding new treatment indications for an approved therapy is of immense value to pharma for drug repurposing efforts, R&D candidate prioritization, and overall productivity. Sanofi wanted to develop an AI-based indication searching approach that relies on real-world data, thus bringing a higher confidence and reducing biases. Sanofi applied unsupervised machine learning to create a phenotypic cluster of patients in order to identify relevant indications that worked across clusters. The pipeline crunched nearly 17 million patients with 2,700 characteristics derived from electronic health records (EHRs). The initial results of the novel approach recovered 90% of known indications and identified many more deemed credible by development teams, producing a higher level of confidence in results and a reduction in cost and time to market, with fewer, faster and more targeted trials, while minimizing attrition and risk.
      https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award
  • 9. Winner of the Gartner Award 2020 for Innovation in Healthcare and Life Sciences
      https://www.gartner.com/en/newsroom/press-releases/2020-11-17-gartner-announces-winners-of-the-2020-gartner-healthcare-and-life-sciences-eye-on-innovation-award
  • 10. Trust in the data and in the analysis being performed is a MUST
      “Patients and consumers have a significant role to play in the collection of real-world data and generation of real-world evidence, but to be effective, patient and consumer engagement approaches would include considering them partners and capturing outcomes that are important to them”
      ▪ Patient consent is a must
      ▪ Privacy-preserved linkage must be performed; encryption is a key aspect
      ▪ Establish a trusted patient relationship to explain the usage of data and consent (e.g. secondary use of primary data)
      ▪ Data should not be used beyond the intended purpose; governance around the usage is a must
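One common way to approach the "privacy-preserved linkage" point on slide 10 is to replace direct identifiers with keyed hashes, so that records from two sources can be joined on a token without the identifier travelling with the data. The sketch below is an assumption for illustration only, not the method described by the presenters: `linkage_token`, the key and the sample records are hypothetical, and production record linkage usually adds a trusted third party, key management and fuzzy matching.

```python
import hashlib
import hmac


def linkage_token(secret_key: bytes, *identifiers: str) -> str:
    """Keyed hash (HMAC-SHA256) of normalised identifiers.

    Without the secret key the token cannot be recomputed from guessed
    identifiers, which is the point of using HMAC instead of a plain hash.
    """
    normalised = "|".join(part.strip().lower() for part in identifiers)
    return hmac.new(secret_key, normalised.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key, held by the linkage service and never stored in the data lake.
KEY = b"example-key-not-for-production"

# Hypothetical records from two sources, tokenised before analysts see them.
claims = {linkage_token(KEY, "Jane Doe", "1980-02-01"): {"claims_cost": 1200}}
ehr = {linkage_token(KEY, " jane doe ", "1980-02-01"): {"diagnosis": "asthma"}}

# Join on the token only; no name or birth date appears in the linked output.
linked = {tok: {**claims[tok], **ehr[tok]} for tok in claims.keys() & ehr.keys()}
```

Normalising case and whitespace before hashing lets the same person match across sources that format names differently; anything beyond that (typos, name changes) needs a more elaborate scheme.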
  • 11. Our Architecture & implementation
  • 12. Key aspects of an RWE Ecosystem
      Data Management
      ▪ Secure data storage: triple encrypted with audited access control
      ▪ Full data lineage: complete history of every data transformation
      ▪ Data pipeline: designed for high-performance handling of big data
      Analytics
      ▪ Self-service tools: filtering and querying tools for feasibility and descriptive information
      ▪ Interactive tools: dashboards and applications for study execution
      ▪ Low-level tools: R, Python and SQL for comparative analysis and advanced analytics
      Access Control
      ▪ Multi-tenant configuration: provide each organization with its own namespace
      ▪ User provisioning: role-based access controlled by each organization
      ▪ Inherited data permissions: transformed data retains access control
      Auditing and Monitoring
      ▪ Full auditing of user actions: log each action and generate reports
      ▪ Comprehensive monitoring: performance, usage, and custom actions
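The "full data lineage" and "inherited data permissions" items on slide 12 can be pictured with a small data structure. The following is only a sketch under stated assumptions: the `Dataset` class, the `derive` and `lineage` helpers and the group names are hypothetical, and taking the intersection of the parents' groups is one conservative reading of "transformed data retains access control", not a description of Sanofi's platform.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    allowed_groups: frozenset  # groups permitted to read this data set
    parents: tuple = ()        # data sets this one was derived from
    transform: str = "source"  # description of the step that produced it


def derive(name, transform, *inputs):
    """Create a derived data set that inherits the strictest access of its inputs."""
    if not inputs:
        raise ValueError("a derived data set needs at least one input")
    # Only groups allowed on every input may read the result.
    allowed = frozenset.intersection(*(d.allowed_groups for d in inputs))
    return Dataset(name, allowed, parents=inputs, transform=transform)


def lineage(dataset):
    """Every transformation that led to `dataset`, oldest first."""
    steps = []
    for parent in dataset.parents:
        steps.extend(lineage(parent))
    steps.append(f"{dataset.transform} -> {dataset.name}")
    return steps


# Hypothetical sources and access groups.
claims = Dataset("claims", frozenset({"rwe_claims", "rwe_admin"}))
ehr = Dataset("ehr", frozenset({"rwe_ehr", "rwe_admin"}))
cohort = derive("asthma_cohort", "join on linkage token", claims, ehr)
```

Here `cohort.allowed_groups` contains only `rwe_admin`, and `lineage(cohort)` lists the two sources followed by the join, so both the access rule and the history travel with the derived data.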
  • 13. What does our system offer?
      ▪ Powerful compute resources to handle billions of rows of data
      ▪ Complete history of all data updates, with ability to bind to specific versions
      ▪ Complete data traceability: every transform and resulting data set is captured
      ▪ Robust data security and access control for all data and projects
      ▪ Ability to manage metadata, reference data and master data
      ▪ Built on a scalable data lake
  • 14. The conceptual architecture (disclaimer: for example purposes only)
      ▪ Data is always privacy preserved and de-identified. We do not own the KEY for re-identification within this ecosystem
      ▪ Diagram labels, sources: Internal Sources (Clinical, Bioinformatics), External Sources, Other Internal Platforms
      ▪ Diagram labels, data lake (Sanofi AWS): secured and access controlled at the data level; secured and traceable Sanofi-controlled environment
      ▪ Diagram labels, analytics: Self Service Analysis, Advanced Analytics, Data Augmentation, Visualization / Dashboards, Artificial Intelligence/ML, Standardized analytical workflows, Cohort Definitions and Data Modelling, Conventional Studies, (NLP)
      ▪ Diagram labels, outputs: Insights; Data and Analysis Collaboration*; External Collaboration (Societies and Consortia, Academic Institutions, Regulatory Agencies)
      https://aws.amazon.com/blogs/industries/sanofi-webinar-performing-end-to-end-real-world-evidence-generation-with-traceability-and-transparency-on-aws/
  • 15. When do we use Databricks
      ▪ Exploratory use cases: projects where we need to run AI/ML workflows that require GPUs, custom libraries, or NLP / sentiment analysis
      ▪ Cross-functional teams working on a specific project, with both internal and external stakeholders
      ▪ Flexibility: ability for users to manage their own cluster profiles and size up and down based on policy
      ▪ Data ingestion pipelines: migrating away from AWS Glue and Batch for cost and performance reasons, with a 30% improvement in costs and productivity
      ▪ Delta Lake: under analysis; today the data is managed directly as Parquet on S3
      ▪ SQL Analytics: under evaluation
  • 16. Databricks Customization (1/2)
      Passthrough for Security
      ▪ Usage of our Azure AD configuration
      ▪ One AD group per data type
      ▪ Deactivation of the DBFS file system for end users (DBFS does not align with our data restriction policies)
      ▪ All data access is predefined and available through /mnt
      GitLab integration
      ▪ Integration of the Databricks Repos feature, connected directly to our enterprise GitLab services
      ▪ Usage of CI/CD pipelines for deploying scripts and tasks
      Cluster Policies
      ▪ Cluster names suffixed with the policy names for audit and monitoring
      ▪ Limit the types of worker and driver for better budget management
      ▪ Enforce the termination of clusters with default values based on projects/use cases (managed by cluster policies)
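The three cluster-policy points on slide 16 (restricted worker and driver types, enforced auto-termination, policy-suffixed cluster names) can be sketched as follows. The `POLICY` dictionary follows the Databricks cluster policy definition format as I understand it (`allowlist` and `range` rules on cluster attributes); the policy name, instance types, limits and both helper functions are hypothetical and should be checked against the current Databricks documentation rather than read as the presenters' configuration.

```python
# Hypothetical policy name, instance types and limits, for illustration only.
POLICY_NAME = "rwe-exploration"
POLICY = {
    # Limit the worker and driver types for budget management.
    "node_type_id": {
        "type": "allowlist",
        "values": ["m5.xlarge", "m5.2xlarge"],
        "defaultValue": "m5.xlarge",
    },
    "driver_node_type_id": {
        "type": "allowlist",
        "values": ["m5.xlarge"],
        "defaultValue": "m5.xlarge",
    },
    # Enforce termination of idle clusters, with a per-project default.
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,
        "maxValue": 120,
        "defaultValue": 30,
    },
}


def violations(cluster_spec, policy=POLICY):
    """Local pre-check of a cluster spec against the policy (hypothetical helper)."""
    problems = []
    for attribute, rule in policy.items():
        value = cluster_spec.get(attribute, rule.get("defaultValue"))
        if rule["type"] == "allowlist" and value not in rule["values"]:
            problems.append(f"{attribute}={value!r} is not in {rule['values']}")
        elif rule["type"] == "range" and not rule["minValue"] <= value <= rule["maxValue"]:
            problems.append(
                f"{attribute}={value!r} is outside {rule['minValue']}..{rule['maxValue']}"
            )
    return problems


def audited_cluster_name(base_name, policy_name=POLICY_NAME):
    """Suffix the cluster name with the policy name so audit logs show the policy."""
    suffix = f"-{policy_name}"
    return base_name if base_name.endswith(suffix) else base_name + suffix
```

For example, `violations({"node_type_id": "p3.8xlarge", "autotermination_minutes": 600})` reports two problems, while a spec that relies on the defaults reports none. Databricks itself enforces the policy at cluster creation; a local check like this would only give earlier feedback, for instance in a CI/CD job.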
  • 17. Databricks Customization (2/2)
      Instance Profiling
      ▪ Only used for specific use cases, mostly for RStudio
      IAM roles and policies
      ▪ Fully integrated into our AWS stack
      ▪ IAM roles set up for S3 bucket access
      ▪ One home folder per user created by default (internal process)
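The "IAM roles set up for S3 bucket access" and "one home folder per user" points on slide 17 could translate into a per-user IAM policy document along these lines. This is a sketch under assumptions: the bucket name, the `home/<user>` prefix layout and the function name are hypothetical, and the presentation does not show the actual policies. The statement structure and the `s3:prefix` condition key are standard IAM.

```python
import json


def home_folder_policy(bucket: str, user: str) -> dict:
    """IAM policy document limiting a role to one user's home prefix in a bucket."""
    prefix = f"home/{user}"  # hypothetical folder layout
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is granted on the bucket, restricted to the home prefix.
                "Sid": "ListOwnHomeFolder",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                # Object-level access is granted only under the home prefix.
                "Sid": "ReadWriteOwnHomeFolder",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and user names.
policy_document = home_folder_policy("example-rwe-workspace", "jdoe")
policy_json = json.dumps(policy_document, indent=2)
```

The resulting JSON could be attached to the role behind an instance profile; generating it from a function keeps the per-user folders consistent when users are provisioned by an internal process.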
  • 19. Improvements
      ▪ Support for RStudio
      ▪ Data access control and policy propagation to restrict unauthorized use of data; no lineage on data
  • 20. Summary: Our Journey and benefits
      ▪ Started from a traditional warehouse 3 years ago to create an end-to-end ecosystem for evidence generation and insights
      ▪ Helped move away from conventional to more advanced analytical approaches, leveraging the power of big data and cloud
      ▪ Delivered several evidence-generating studies, i.e. studies at scale that have impacted all aspects of the pharma value chain with demonstrable ROI
      https://www.dovepress.com/cr_data/article_fulltext/s160000/160029/img/jmdh-160029_F003.jpg
  • 21. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.