Welcome to Sanitas
04.09.2017 Zürich R User meetup
822’500 customers
750 employees
#1 in customer satisfaction [1]
1.4 mn customer contacts
2.8 bn assets under management
61 mn CHF operating profit
Founded in 1958
2.7 bn CHF total revenues
2.5 bn CHF total paid-out claims
[1] K-Tipp survey no. 15/2016
Source: annual report 2016
Zürich R User meetup
R at Sanitas – Workflow, Problems and Solutions
Patrik Lengacher
Zürich, 04. September 2017
Source: Photoshopped adaptation of http://www.ecns.cn/cns-wire/2015/02-26/155833.shtml/
Who am I?
11/6/2017 Zürich R – R at Sanitas
Patrik Lengacher
Data Manager
Analytics
MSc, ETH, Mathematics
BSc, ETH, Mathematics
Sanitas – Health Insurance
Layzapp AG – Start up
Accenture – Consulting
Paul Scherrer Institute
github.com/plengacher
linkedin.com/in/plengacher
patrik.lengacher@{sanitas,gmail}.com
Agenda
Zürich R – R at Sanitas
The data part of the data science workflow
Data Preparation
Modeling
Operationalize
Reproducibility
Corporate identity & corporate design (CI/CD)
The data part of the data science workflow
Data Sources
Data Prep: Optimize data
  Data blending
  Data cleansing
  Feature engineering
  …
Modeling: Apply statistics / machine learning
  Prototyping
  Training & testing
  Validation
  Visualization
  …
Operationalize: Deploy models
  Dashboards
  Reports
  Presentations
  …
Data Preparation: Old vs. New
Firewalls and tool incompatibilities prevented a clean workflow. Changes in the IT infrastructure and the use of our RStudio Server now let R interact directly with the data sources.
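A minimal sketch of what direct access from R to a data source can look like. It assumes the DBI package, with an in-memory SQLite database (via RSQLite) standing in for the real sources, which the slides do not name. The table and column names are hypothetical.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the actual data source.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical example table; not real Sanitas data.
claims <- data.frame(
  customer_id = c(1L, 1L, 2L),
  amount_chf  = c(120.50, 80.00, 310.25)
)
dbWriteTable(con, "claims", claims)

# Query the source directly from R instead of exporting files between tools.
totals <- dbGetQuery(
  con,
  "SELECT customer_id, SUM(amount_chf) AS total_chf
   FROM claims
   GROUP BY customer_id
   ORDER BY customer_id"
)

dbDisconnect(con)
```

With a different backend, only the `dbConnect()` call would need to change; the query code would stay the same.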
Modeling: Old vs. New
When the local machine ran out of resources, isolated solutions were created. By their nature, these solutions were not integrated into our infrastructure.
Operationalize (I/II): Old vs. New
Results of ad hoc requests were written as comments into the source files. Furthermore, emailing the results became a time-consuming, repetitive task.
Operationalize (II/II): Old vs. New
Results of ad hoc requests were written as comments into the source files. Furthermore, emailing the results became a time-consuming, repetitive task.
Reproducibility (I/III): Old vs. New
Markdown, Shiny and MicroStrategy help us distribute results, presentations and reports. These tools help keep the documents up to date.
Reproducibility (II/III): Old vs. New
Version control and a SanitasR package containing the commonly used functions help us reproduce our results.
Reproducibility (III/III): Old vs. New
Version control and a SanitasR package containing the commonly used functions help us reproduce our results.
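To illustrate the idea of a shared package of commonly used functions, here is a sketch of the kind of small helper such a package might export. The function name `format_chf` and its behaviour are assumptions for illustration; the slides do not list what SanitasR actually contains.

```r
# Hypothetical helper of the kind a shared internal package might export.
# Name and behaviour are assumptions; the slides do not list SanitasR's functions.
format_chf <- function(x, digits = 2) {
  stopifnot(is.numeric(x))
  # Fixed-point notation with an apostrophe as thousands separator.
  paste("CHF", formatC(x, format = "f", digits = digits, big.mark = "'"))
}

amounts <- format_chf(c(1234.5, 61e6))
```

Keeping such a function in one version-controlled package, rather than copying it between scripts, means every report formats amounts the same way and a fix reaches all users at once.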
CI/CD: Old vs. New
Defining colors and themes helps to get our plots CI/CD ready.
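One way to define colors and a theme once and reuse them across plots is sketched below with ggplot2. The hex values and the names `corporate_colors`, `theme_corporate` and `scale_colour_corporate` are placeholders, since the slides do not give the actual corporate colors or say which plotting package is used.

```r
library(ggplot2)

# Placeholder hex values; the actual corporate colors are not given in the slides.
corporate_colors <- c(primary = "#006633", secondary = "#99CC33", neutral = "#666666")

# Hypothetical theme: start from a built-in ggplot2 theme and adjust a few elements.
theme_corporate <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold", colour = corporate_colors[["primary"]]),
      panel.grid.minor = element_blank(),
      legend.position = "bottom"
    )
}

# Discrete colour scale that applies the palette in a fixed order.
scale_colour_corporate <- function(...) {
  scale_colour_manual(values = unname(corporate_colors), ...)
}

# Example plot on a built-in dataset with three groups, matching the three colors.
p <- ggplot(mtcars, aes(wt, mpg, colour = factor(cyl))) +
  geom_point() +
  labs(title = "Example plot", colour = "Cylinders") +
  theme_corporate() +
  scale_colour_corporate()
```

Each plot then needs only the two extra calls at the end to follow the same design.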