As Atlassian continues to scale to more and more customers, demand for our legendary support continues to grow. Atlassian needs to balance the staffing levels needed to service this increasing support ticket volume against the budgetary constraints needed to keep the business healthy. Automated ticket volume forecasting is at the centre of this delicate balance.
Automatic Forecasting using Prophet, Databricks, Delta Lake and MLflow
2. PERRY STEPHENSON | SENIOR SOFTWARE ENGINEER | ATLASSIAN
Automatic Forecasting
Creating a robust, fault-tolerant, auditable and reproducible forecasting pipeline
3. Why am I speaking?
Empowered End User
Building pipelines with end-user tooling means I can Get S#!* Done™ all by myself
Reproducibility
Lots of people talk about it. I did something about it.
I ❤ Databricks
It makes my life easier and makes me look good in front of colleagues and managers
20. DELTA LAKE GIVES YOU VERSIONED TABLES FOR REPRODUCIBLE DATA SCIENCE
22. merge into zone_myteam.my_table as existing
using my_latest_data as new
  on existing.date_name = new.date_name
  and existing.platform = new.platform
  and existing.customer_region = new.customer_region
when matched then
  update set *
when not matched then
  insert *
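The effect of the merge above can be sketched in base R without a cluster. This is only an illustration of the upsert semantics, not the pipeline's code: the two data frames, the `tickets` column and the `upsert` helper are hypothetical stand-ins, and the sketch assumes each key combination appears at most once in the new data.

```r
# Hypothetical stand-ins for zone_myteam.my_table and my_latest_data
keys <- c("date_name", "platform", "customer_region")

existing <- data.frame(
  date_name = c("2020-01", "2020-02"),
  platform = c("cloud", "cloud"),
  customer_region = c("APAC", "APAC"),
  tickets = c(100, 120),
  stringsAsFactors = FALSE
)

new <- data.frame(
  date_name = c("2020-02", "2020-03"),
  platform = c("cloud", "cloud"),
  customer_region = c("APAC", "APAC"),
  tickets = c(125, 90),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking "update set *" / "insert *"
upsert <- function(existing, new, keys) {
  # Build one composite key string per row from the join columns
  key_of <- function(df) do.call(paste, c(unname(as.list(df[keys])), sep = "\r"))
  # "when matched": existing rows with a matching key are replaced wholesale
  kept <- existing[!(key_of(existing) %in% key_of(new)), , drop = FALSE]
  # "when not matched": new rows with unseen keys are simply added
  result <- rbind(kept, new)
  result <- result[do.call(order, unname(as.list(result[keys]))), , drop = FALSE]
  rownames(result) <- NULL
  result
}

merged <- upsert(existing, new, keys)
print(merged)
```

The 2020-02 row is overwritten with the newer value, the 2020-03 row is added, and the 2020-01 row is left untouched.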
25. Support Ticket Forecasting Pipeline
Update Modelling Dataset
- Creates a new version every time it runs
- Latest version is always the most accurate
- Can recover any previous version of the training dataset
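Recovering an earlier version of a Delta table is usually done with a time travel query. The sketch below only builds the SQL string, so it runs anywhere; the helper name `delta_snapshot_query` is hypothetical, and the table name is borrowed from the merge slide as an assumption about where the training data lives.

```r
# Hypothetical helper: build a Delta time travel query for a table
delta_snapshot_query <- function(table, version = NULL, timestamp = NULL) {
  if (!is.null(version) && !is.null(timestamp)) {
    stop("Specify either a version or a timestamp, not both")
  }
  if (!is.null(version)) {
    # Read the table exactly as it was at a given commit version
    sprintf("SELECT * FROM %s VERSION AS OF %d", table, as.integer(version))
  } else if (!is.null(timestamp)) {
    # Read the table as it was at a point in time
    sprintf("SELECT * FROM %s TIMESTAMP AS OF '%s'", table, timestamp)
  } else {
    # No snapshot requested: read the latest version
    sprintf("SELECT * FROM %s", table)
  }
}

# Table name assumed for illustration only
query <- delta_snapshot_query("zone_myteam.my_table", version = 3)
print(query)

# On a Databricks cluster the string could then be run with, for example,
# SparkR::sql(query). Available versions can be listed with DESCRIBE HISTORY.
```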
29. Support Ticket Forecasting Pipeline
Train Forecast Models
- Written in R, using Facebook Prophet
- Trains a model for every ticket grouping, stores model + metadata in MLflow
- Takes an argument for “forecast_date” to allow backfilling, defaults to month end
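The “forecast_date” default can be sketched in base R. The slide does not say which month end is meant, so this sketch assumes the last day of the month containing the run date; the function names are hypothetical.

```r
# Last day of the month that contains date d
month_end <- function(d) {
  d <- as.Date(d)
  first_of_month <- as.Date(format(d, "%Y-%m-01"))
  # Stepping by "month" from the 1st is always well defined
  first_of_next <- seq(first_of_month, by = "month", length.out = 2)[2]
  first_of_next - 1
}

# Hypothetical argument handling: an explicit date enables backfilling,
# otherwise fall back to the (assumed) current month end
resolve_forecast_date <- function(forecast_date = NULL, today = Sys.Date()) {
  if (is.null(forecast_date)) month_end(today) else as.Date(forecast_date)
}

default_date <- resolve_forecast_date(today = as.Date("2020-02-10"))
backfill_date <- resolve_forecast_date("2019-12-31")
print(default_date)
print(backfill_date)
```

Passing `today` explicitly keeps the function deterministic, which makes a backfill of an old month reproduce the same forecast date on every run.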
44. Support Ticket Forecasting Pipeline
Score Forecast Models
- Reads from MLflow (two passes to recover params), builds an execution plan
- Scores each forecast and prepares aggregates for consumption
- Appends/overwrites results in our data lake, with history maintained using Delta
- Includes MLflow links for every row in the forecast table
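Preparing aggregates from a daily forecast might look like the base R sketch below. The data is made up, and the column names `ds` and `yhat` are assumed from Prophet's usual output; the `aggregate_forecast` helper is hypothetical and not the pipeline's own code.

```r
# Made-up daily forecast covering Q1 2020, 10 tickets per day
daily <- data.frame(
  ds = seq(as.Date("2020-01-01"), as.Date("2020-03-31"), by = "day"),
  yhat = 10
)

# Hypothetical helper: sum daily forecasts into weekly/monthly/quarterly buckets
aggregate_forecast <- function(daily, period = c("week", "month", "quarter")) {
  period <- match.arg(period)
  ds <- daily$ds
  month_num <- as.integer(format(ds, "%m"))
  bucket <- switch(period,
    # Monday of the week containing each date (wday: 0 = Sunday)
    week = ds - (as.POSIXlt(ds)$wday + 6L) %% 7L,
    # First day of the month
    month = as.Date(format(ds, "%Y-%m-01")),
    # First day of the quarter
    quarter = as.Date(sprintf("%s-%02d-01", format(ds, "%Y"),
                              3L * ((month_num - 1L) %/% 3L) + 1L))
  )
  out <- aggregate(daily["yhat"], by = list(period_start = bucket), FUN = sum)
  out$period <- period
  out
}

weekly <- aggregate_forecast(daily, "week")
monthly <- aggregate_forecast(daily, "month")
quarterly <- aggregate_forecast(daily, "quarter")
print(monthly)
```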
45. READING FROM MLFLOW
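library(mlflow)

# First pass: list the runs recorded against the experiment
required_forecasts <- mlflow_list_run_infos(experiment_id = 611628)

# Second pass: fetch each run to recover its params.
# seq_len() skips the loop when there are no runs; 1:nrow() would iterate over c(1, 0).
for (i in seq_len(nrow(required_forecasts))) {
  run_details <- mlflow_get_run(required_forecasts$run_uuid[i])
  run_params <- run_details$params[[1]]
  … … …
  # not shown: unpack params and score model
  # not shown: weekly/monthly/quarterly aggregations
  # not shown: add MLflow URL
}
# not shown: union and upload all forecasts at once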
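The "unpack params" step is not shown on the slide. One way it could work, assuming `run_params` is a data frame with `key` and `value` columns (the shape the MLflow R client is understood to return), is sketched below with made-up parameter names and values.

```r
# Made-up params in the assumed key/value shape; MLflow stores values as strings
run_params <- data.frame(
  key = c("forecast_date", "ticket_group", "horizon_days"),
  value = c("2020-02-29", "cloud_apac", "90"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn the key/value rows into a named list
unpack_params <- function(params) {
  stats::setNames(as.list(params$value), params$key)
}

p <- unpack_params(run_params)

# Values come back as strings, so convert them before use
forecast_date <- as.Date(p$forecast_date)
horizon_days <- as.integer(p$horizon_days)
print(p$ticket_group)
```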
46. Support Ticket Forecasting Pipeline
Score Forecast Models
- Uploads all results to an in-memory temporary table
- Deletes all records from the final table with the same forecast_date
- Merges changes into the forecast output table
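The delete-then-merge pattern above makes a rerun for the same forecast_date safe to repeat. A base R sketch of that behaviour follows; the tables, the `ticket_group` and `yhat` columns and the `publish_forecast` helper are hypothetical.

```r
# Hypothetical final table holding last month's forecast
final <- data.frame(
  forecast_date = as.Date(c("2020-01-31", "2020-01-31")),
  ticket_group = c("a", "b"),
  yhat = c(10, 20),
  stringsAsFactors = FALSE
)

# Hypothetical staged batch (the temporary table) for a new forecast_date
staged <- data.frame(
  forecast_date = as.Date(c("2020-02-29", "2020-02-29")),
  ticket_group = c("a", "b"),
  yhat = c(11, 21),
  stringsAsFactors = FALSE
)

publish_forecast <- function(final, staged) {
  # Delete every final row that shares a forecast_date with the staged batch
  kept <- final[!(final$forecast_date %in% staged$forecast_date), , drop = FALSE]
  # Merge the staged rows in; after the delete they are all inserts
  out <- rbind(kept, staged)
  rownames(out) <- NULL
  out
}

once <- publish_forecast(final, staged)
twice <- publish_forecast(once, staged)  # rerunning the same batch
print(twice)
```

Running the same batch twice leaves four rows rather than six. One plausible reason for deleting first is that a rerun producing fewer ticket groupings would otherwise leave stale rows behind.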