Training an Army of Data Scientists
Marco Blume – Trading Director
marco.blume@pinnacle.com
RStudio Conf 2018
Who is Pinnacle?
• Large online international sportsbook
• ~450 employees in 6 offices
• Been around 20 years!
• Unique model that relies heavily on data science
• Risk Management, Trading
• Similar to Financial Markets
Several Packages on CRAN related to our domain
• Pinnacle.API
• Odds.Converter
• Pinnacle.Data
• Other open source contributions
Who is Pinnacle?
Avid users of R technologies and RStudio products
• RStudio Server Pro
• RStudio Connect
• Tidyverse!
• RMarkdown
• On the bleeding edge of R community users
Very complex modelling problems
• Sports Models
• Trading Algorithms
• High transactional systems
• Professional Algorithm Developers and Data Scientists
Why an Army of Data Scientists?
Every aspect of the business needs to be data-driven
• Finance / Payment providers
• Marketing
• Customer Service
• Business to Business
• Many “micro-problems” to solve, not enough Data Scientists
Our Idea:
• Every department needs Data Scientists
• Focus on Tidyverse
• Offer internal and external training to the entire company (around 450 staff)
• Train Junior Data Scientists to do data analysis and produce RMD to communicate results
Training an Army of Data Scientists
Our Target Audience
• Many non-technical employees in various positions
• Never written a line of code
• Many without college degrees
• Example: 15 years in Customer Service
• Similar talks, such as Mine Çetinkaya-Rundel’s keynote at useR! 2017, focus on more technical students
Our Approach:
• DataCamp as the basis for external training, with a defined curriculum
• Internal training with 4 levels, based on Master the Tidyverse by Garrett Grolemund
DataCamp
Why we like it:
• Self-paced
• Quality instructors and content
• Many topics
• Micro-Courses
BUT…
• For us, the curriculum was not ordered well
• We defined our own DataCamp curriculum chapter by chapter
Level 1:
Data Camp – Current Curriculum
Time: 8 hrs.
• Introduction to R
• Ch. 3 Matrices
• Ch. 4 Factors
• Ch. 6 Lists
• Introduction to the tidyverse
Level 2:
Data Camp – Current Curriculum
Time: 18 hrs.
• Data Visualization with ggplot2 (Part 1)
• Ch. 3 qplot and wrap-up
• Data Manipulation in R with dplyr
• Importing data in R Part 1
• Ch. 1 Importing data from flat files with utils
• Ch. 4 Reproducible Excel work with XLConnect
• Introduction to R
• Ch. 4 Factors
• Working with the RStudio IDE Part 1
• Importing and Cleaning Data in R case studies
Level 3:
Data Camp – Current Curriculum
Time: 25 hrs.
• Data Visualization with ggplot2 (Part 2)
• Cleaning Data in R
• Reporting with R Markdown
• Ch. 4 Configuring R Markdown (optional)
• Introduction to R
• Ch. 3 Matrices
• Ch. 6 Lists
• Working with the RStudio IDE Part 2
• Intermediate R
• Exploratory data analysis in R case study
Level 4:
Data Camp – Current Curriculum
Time: 25 hrs.
• Joining Data in R with dplyr
• Intermediate R Practice
• String Manipulation in R with stringr
• Data Visualization with ggplot2 (Part 3)
• Writing Functions in R
Level 5:
• Case study
• With the help of a mentor, you can develop a capstone project that results in an R Markdown document or a Shiny application.
Data Camp - Lessons Learned
DataCamp “ReadCamp” package, available on GitHub:
https://github.com/marcoblume/readcamp
Additional Internal Support
• Community of R experts eager to help
• #r-programming (~100 users)
• Many internal packages
• ggplot theme / RMD template
• RStudio Server Pro
• Admins can fix difficult install / config issues for users
• Basic environment works out of box
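A shared ggplot theme like the one listed above could look roughly like the sketch below. The function name theme_company, the styling choices, and the toy data are assumptions made for illustration; they are not taken from Pinnacle's internal packages.

```r
library(ggplot2)

# Hypothetical shared theme: one place to define the common look of every chart.
theme_company <- function(base_size = 12) {
  theme_minimal(base_size = base_size) +
    theme(
      plot.title = element_text(face = "bold"),
      legend.position = "bottom",
      panel.grid.minor = element_blank()
    )
}

# Toy data, invented for this example.
bets <- data.frame(
  sport = c("Soccer", "Tennis", "Basketball"),
  wagers = c(120, 80, 95)
)

# A trainee only has to add one line to get the common look.
p <- ggplot(bets, aes(x = sport, y = wagers)) +
  geom_col() +
  labs(title = "Wagers by sport (toy data)", x = NULL, y = "Wagers") +
  theme_company()
```

Shipping the theme in an internal package means a new trainee gets consistent charts without learning the details of theme().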
Lessons Learned
• RStudio Server Pro
• Allows us to set up / manage environment for Junior DS
• Control access to data / audit
• RStudio Connect
• Easy deployment / sharing
• Anyone can become a Junior Data Scientist – any background
• Motivation is key (use FUN datasets not mpg / iris)
• Experts / previous trainees helping
• Internal ecosystem of packages to build upon
Lessons Learned
• Focus on TIDYVERSE only
• ggplot very important to master
• RMD is central to our business now
• Common template and theme make it easier
to read and interpret
• Communication is key
• Wrappers around data
• No SQL required
• Customize curriculum based on feedback
and business needs
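The "wrappers around data, no SQL required" point can be sketched as follows. This is a minimal illustration under stated assumptions: get_tickets, its arguments, and the tickets_table data frame are hypothetical, and the in-memory data frame stands in for whatever database query a real wrapper would run internally.

```r
library(dplyr)

# Stand-in for a database table (invented data). A real wrapper would run
# the query internally so that trainees never see the SQL.
tickets_table <- data.frame(
  ticket_id = 1:5,
  department = c("Customer Service", "Marketing", "Customer Service",
                 "Finance", "Marketing"),
  opened = as.Date(c("2018-01-03", "2018-01-10", "2018-01-15",
                     "2018-01-20", "2018-01-25")),
  stringsAsFactors = FALSE
)

# Hypothetical wrapper: optional filters in, tidy data frame out.
get_tickets <- function(dept = NULL, since = NULL) {
  out <- tickets_table
  if (!is.null(dept)) {
    out <- filter(out, department == dept)
  }
  if (!is.null(since)) {
    out <- filter(out, opened >= as.Date(since))
  }
  out
}

# Trainee code: plain tidyverse verbs on the wrapper's result.
recent_by_department <- get_tickets(since = "2018-01-10") %>%
  count(department)
```

The design choice is that access rules and query logic live in one maintained function, while the trainee works only with the dplyr verbs covered in the curriculum.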
Success Stories
“About a year ago, I was offered the possibility to enroll in a paid-by-the-company R training. Being the kind of person who likes going beyond the so-called comfort zone, I decided to take on the challenge. I come from a humanistic background and math was never my favorite subject in school. After some time learning R, I realized that it is not that different from learning any other language. I usually tell myself: ‘If you were able to learn Russian, you are for sure able to learn R!’”
Success Stories
“I was a CSD manager in Pinnacle for 15 years until I was offered a new position as a Junior BI Analyst. I did not doubt to accept the new post as it gave me the opportunity to pursue a new career. I feel excited about starting this new path. The combination of my expertise within the CSD department and the R-tools that I am learning to use will help me analyze data in a more efficient way. I look forward to continue learning and becoming a better analyst!”
Contact us – We are Hiring!
Email: recruitment@pinnacle.com
Twitter: @PinnacleSports
