This talk will present R as a programming language suited for solving data analysis and modeling problems, MLflow as an open source project to help organizations manage their machine learning lifecycle and the intersection of both by adding support for R in MLflow. It will be highly interactive and touch on some of the technical implementation choices taken while making R available in MLflow. It will also demonstrate using MLflow tracking, projects, and models directly from R as well as reusing R models in MLflow to interoperate with other programming languages and technologies.
5. MLflowMLflow
“Helps teams manage their machine learning lifecycle.”
Tracking : Track experiments to record and compare params and results.
Projects : Reuse and reproduce code to share or transfer to production.
Models : Manage and deploy models from across libraries and platforms.
7. R LanguageR Language
R is a programming language and free so ware environment
for statistical computing and graphics.
Interface language diagram by John Chambers - Rick Becker useR 2016.
8. R CommunityR Community
Provides a rich package archive provided in and :
to manipulate data, to analyze clusters, to
visualize data, etc.
CRAN Bioconductor
dplyr cluster ggplot2
Daily downloads of CRAN packages.
9. R LanguageR Language
Language features I would highlighting:
2.1.1 Vectors
2.1.4 Expression objects
2.1.8 Promise objects
2.1.9 Dot-dot-dot
3.1.4 Operators
cran.r project.org/doc/manuals/R-lang.html
11. How to NOT write R codeHow to NOT write R code
This is how I would have written R code as a so ware engineer before
knowing R:
# Select columns subset
data = data.frame(mtcars$cyl, mtcars$hp)
colnames(data) = c("cyl", "hp")
# Transform each row
for (idx in 1:nrow(data)) {
data$cyl[idx] = data$cyl[idx] + 2
}
# One column at a time to use the CPU cache efficiently
for (idx in 1:nrow(data)) {
data$hp[idx] = data$hp[idx] + 20
}
12. 2.1.1 Vectors2.1.1 Vectors
Everything is a vector in R:
# Select columns subset
data = data.frame(mtcars$cyl, mtcars$hp)
colnames(data) = c("cyl", "hp")
# Transform each row
data$cyl = data$cyl + 2
# One column at a time to use the CPU cache efficiently
data$hp = data$hp + 20
13. 2.1.9 Dot-dot-dot2.1.9 Dot-dot-dot
Dynamic parameters using the ... parameter:
# Select columns subset
data = data.frame(cyl = mtcars$cyl, hp = mtcars$hp)
# Transform each row
data$cyl = data$cyl + 2
# One column at a time to use the CPU cache efficiently
data$hp = data$hp + 20
14. 2.1.4/2.1.8 Expression and Promise objects2.1.4/2.1.8 Expression and Promise objects
One can lazily evaluate operations and operate over expressions:
# Select columns subset
data = select(mtcars, cyl, hp)
# Transform each row
data = mutate(data, cyl = cyl + 2)
# One column at a time to use the CPU cache efficiently
data = mutate(data, hp = hp + 20)
15. 3.1.4 Operators3.1.4 Operators
Use <- for assignment, or the newer %>% pipe:
data mtcars %>%
select(mtcars, cyl, hp) %>%
mutate(data, cyl = cyl + 2) %>%
mutate(data, hp = hp + 20)
20. Tracking - ImplicitTracking - Implicit
Implicit MLflow run:
library(mlflow)
# Log a parameter (key value pair)
mlflow_log_param("param1", 5)
# Log a metric; metrics can be updated throughout the run
mlflow_log_metric("foo", 1)
mlflow_log_metric("foo", 2)
mlflow_log_metric("foo", 3)
# Log an artifact (output file)
writeLines("Hello world!", "output.txt")
mlflow_log_artifact("output.txt")
Run terminates when the R session finishes or by running:
mlflow_end_run()
Useful when sourcing files.
21. Tracking - ExplicitTracking - Explicit
Explicit MLflow run:
library(mlflow)
with(mlflow_start_run(), {
# Log a parameter (key value pair)
mlflow_log_param("param1", 5)
# Log a metric; metrics can be updated throughout the run
mlflow_log_metric("foo", 1)
mlflow_log_metric("foo", 2)
mlflow_log_metric("foo", 3)
# Log an artifact (output file)
writeLines("Hello world!", "output.txt")
mlflow_log_artifact("output.txt")
})
22. Tracking - SourcesTracking - Sources
mlflow_run("R/tracking.R")
Or adding the following to tracking.R in RStudio 1.2:
# !source mlflow mlflow_run(entry_point = .file)