3. • Large online international sportsbook
• ~450 employees in 6 offices
• Been around 20 years!
• Unique model that relies heavily on data science
• Risk Management, Trading
• Similar to Financial Markets
4. Who is Pinnacle?
Several packages on CRAN related to our domain
• pinnacle.API
• odds.converter
• pinnacle.data
• Other open source contributions
Avid users of R technologies and RStudio products
• RStudio Server Pro
• RStudio Connect
• Tidyverse!
• RMarkdown
• On the bleeding edge of R community users
5. Why an Army of Data Scientists?
Very complex modelling problems
• Sports Models
• Trading Algorithms
• Highly transactional systems
• Professional Algorithm Developers and Data Scientists
Every aspect of the business needs to be data-driven
• Finance / Payment providers
• Marketing
• Customer Service
• Business to Business
• Many “micro-problems” to solve, not enough Data Scientists
6. Training an Army of Data Scientists
Our Idea:
• Every department needs Data Scientists
• Focus on Tidyverse
• Offer internal and external training to the entire company
(around 450 staff)
• Train Junior Data Scientists to do data analysis and
produce R Markdown (RMD) reports to communicate results
7. Training an Army of Data Scientists
Our Target Audience
• Many non-technical employees in various positions
• Never written a line of code
• Many without college degrees
• Example: Customer Service 15 years
• Similar talks, such as Mine Çetinkaya-Rundel’s keynote at useR! 2017,
focus on more technical students
8. Training an Army of Data Scientists
Our Approach:
• DataCamp as the basis for external training, with a defined
curriculum
• Internal training with 4 levels, based on Master the Tidyverse
by Garrett Grolemund
9. DataCamp
Why we like it:
• Self-paced
• Quality instructors and content
• Many topics
• Micro-Courses
BUT…
• For us, the curriculum was not ordered well
• We defined our own DataCamp curriculum chapter by
chapter
10. DataCamp – Current Curriculum
Level 1:
Time: 8 hrs.
• Introduction to R
• Ch. 3 Matrices
• Ch. 4 Factors
• Ch. 6 Lists
• Introduction to the tidyverse
11. DataCamp – Current Curriculum
Level 2:
Time: 18 hrs.
• Data Visualization with ggplot2 (Part 1)
• Ch. 3 qplot and wrap-up
• Data Manipulation in R with dplyr
• Importing data in R Part 1
• Ch. 1 Importing data from flat files with utils
• Ch. 4 Reproducible Excel work with XLConnect
• Introduction to R
• Ch. 4 Factors
• Working with the RStudio IDE Part 1
• Importing and Cleaning Data in R case studies
12. DataCamp – Current Curriculum
Level 3:
Time: 25 hrs.
• Data Visualization with ggplot2 (Part 2)
• Cleaning Data in R
• Reporting with R Markdown
• Ch. 4 Configuring R Markdown (optional)
• Introduction to R
• Ch. 3 Matrices
• Ch. 6 Lists
• Working with the RStudio IDE Part 2
• Intermediate R
• Exploratory data analysis in R case study
13. DataCamp – Current Curriculum
Level 4:
Time: 25 hrs.
• Joining Data in R with dplyr
• Intermediate R Practice
• String Manipulation in R with stringr
• Data Visualization with ggplot2 (Part 3)
• Writing Functions in R
Level 5:
• Case study
• With the help of a Mentor you can develop
a capstone project that results in an
R Markdown report or a Shiny application.
17. DataCamp – Lessons Learned
DataCamp “ReadCamp” package, available on GitHub:
https://github.com/marcoblume/readcamp
18. Additional Internal Support
• Community of R experts eager to help
• #r-programming channel, ~100 users
• Many internal packages
• ggplot theme / RMD template
• RStudio Server Pro
• Admins can fix difficult install / config issues for users
• Basic environment works out of box
19. Lessons Learned
• RStudio Server Pro
• Allows us to set up / manage the environment for Junior DS
• Control access to data / audit
• RStudio Connect
• Easy deployment / sharing
• Anyone can become a Junior Data Scientist – any background
• Motivation is key (use FUN datasets not mpg / iris)
• Experts / previous trainees helping
• Internal ecosystem of packages to build upon
20. Lessons Learned
• Focus on TIDYVERSE only
• ggplot very important to master
• RMD is central to our business now
• Common template and theme make it easier
to read and interpret
• Communication is key
• Wrappers around data
• No SQL required
• Customize curriculum based on feedback
and business needs
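A minimal sketch of what "Wrappers around data / No SQL required" could look like for a trainee. Everything in it is an assumption made for illustration: the function name get_wagers(), the column names, and the small in-memory table that stands in for a real database are hypothetical and are not taken from Pinnacle's internal packages. It uses base R only so that it runs without any extra packages; a real wrapper would more likely run the query inside the function and return a tibble.

```r
# Hypothetical data wrapper. The function name, columns and values below
# are made up for illustration; they are not an actual internal package.

# Stand-in for a warehouse table. A real wrapper would run the SQL inside
# the function (for example through a database connection) instead.
.fake_wagers <- data.frame(
  wager_id = 1:6,
  sport    = c("Soccer", "Tennis", "Soccer", "Hockey", "Tennis", "Soccer"),
  placed   = as.Date(c("2018-01-01", "2018-01-01", "2018-01-02",
                       "2018-01-02", "2018-01-03", "2018-01-03")),
  stake    = c(10, 25, 5, 40, 15, 20),
  stringsAsFactors = FALSE
)

# Return wagers placed between `from` and `to` (inclusive),
# optionally restricted to a set of sports.
get_wagers <- function(from, to, sports = NULL) {
  from <- as.Date(from)
  to   <- as.Date(to)
  stopifnot(from <= to)

  keep <- .fake_wagers$placed >= from & .fake_wagers$placed <= to
  if (!is.null(sports)) {
    keep <- keep & .fake_wagers$sport %in% sports
  }

  out <- .fake_wagers[keep, , drop = FALSE]
  rownames(out) <- NULL
  out
}

# The trainee only sees a function call and a data frame, never the query.
soccer      <- get_wagers("2018-01-01", "2018-01-02", sports = "Soccer")
total_stake <- sum(soccer$stake)
```

The design choice is that date handling, filtering and access rules live in one maintained function, so a Junior Data Scientist can go straight to analysis and reporting.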
21. Success Stories
“About a year ago, I was offered the possibility to enroll in a paid-
by-the-company R training. Being the kind of person who likes
going beyond the so-called comfort zone, I decided to take on the
challenge. I come from a humanistic background and math was
never my favorite subject in school. After some time learning R, I
realized that it is not that different from learning any other
language. I usually tell myself: ‘If you were able to learn Russian,
you are for sure able to learn R!’”
22. Success Stories
“I was a CSD manager in Pinnacle for 15 years until I was offered a new
position as a Junior BI Analyst. I did not doubt to accept the new post as it
gave me the opportunity to pursue a new career. I feel excited about starting
this new path. The combination of my expertise within the CSD department
and the R-tools that I am learning to use will help me analyze data in a more
efficient way. I look forward to continue learning and becoming a better
analyst!”
23. Contact us – We are Hiring!
Email: recruitment@pinnacle.com
Twitter: @PinnacleSports