The Bosch Center for Artificial Intelligence provides AI services to Bosch’s business units and manufacturing plants. We strive to generate value for our customers by deploying machine learning in their products, services, and processes across domains such as Manufacturing, Engineering, Supply Chain Management, and Intelligent Services.
4. Robert Bosch – a worldwide leading IoT Company
• 268 manufacturing sites
• 1000s of assembly lines
• 409,881 associates
• 60 countries
• 460 local subsidiaries
Four business sectors
• Mobility Solutions
• Industrial Technology
• Energy & Building Technology
• Consumer Goods
Bosch Center for Artificial Intelligence
• Sunnyvale, Pittsburgh, Renningen, Tübingen, Haifa, Bangalore, Shanghai
6. Manufacturing Analytics using Spark
Self-Serve Analytics Pipeline
• Automate data pipelining and preparation
• Centralize data storage across assembly lines and plants
• Scalable compute and storage resources
• Standard analytics dashboards
• Self-service analysis
• Advanced analytics tools like root cause analysis
[Architecture diagram components: Bosch Manufacturing Plants, Kafka, Hadoop File System, Data Preparation, Root Cause Analysis, Apache Impala, Tableau Extracts, Tableau Server]
7. Manufacturing Analytics using Spark
Why are parts failing quality checks?
[Diagram: Process 1, Process 2, Process 3, Process 4, Process 5]
Potential root causes
• Measured process parameters
• Machine configurations
• Tools and components used
• Locations visited
Target of interest
Identify quality test failures for certain parts.
8. Manufacturing Analytics using Spark
Root Cause Analysis: Modules
• Part graph generation: The assembly process of every unique part is represented as a graph.
• Feature extraction: Features are extracted from the part graph.
• Feature matrix generation: Target variables are mapped to features.
• Root cause modeling: Statistical models are applied to extract potential root causes.
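The first module, part graph generation, might look roughly like the sketch below. It assumes a hypothetical flat record format `(part_id, timestamp, location)`; the record layout and the name `build_part_graphs` are illustrative and not taken from the talk.

```python
from collections import defaultdict


def build_part_graphs(records):
    """Group (part_id, timestamp, location) records into one graph per part.

    Each graph is a dict with an ordered node list (locations visited)
    and an edge list linking consecutive locations.
    """
    visits = defaultdict(list)
    for part_id, timestamp, location in records:
        visits[part_id].append((timestamp, location))

    graphs = {}
    for part_id, events in visits.items():
        # Order the visits in time, then link each location to the next one.
        nodes = [location for _, location in sorted(events)]
        edges = list(zip(nodes, nodes[1:]))
        graphs[part_id] = {"nodes": nodes, "edges": edges}
    return graphs


# Hypothetical records, deliberately out of order.
records = [
    ("B6788098", 2, "LOC 2"),
    ("B6788098", 1, "LOC 1"),
    ("A6678B34", 1, "LOC 1"),
    ("B6788098", 3, "LOC 3"),
]
graphs = build_part_graphs(records)
print(graphs["B6788098"]["edges"])
```

A production version would presumably attach parameters, tests, and tools to each node, and would run per part on a cluster rather than in a single process.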
9. Manufacturing Analytics using Spark
Root Cause Analysis: Sample code
Part graphs (columns PART_ID, PART_GRAPH): B6788098, FF556828, A6678B34
Sample part graph: LOC 1, LOC 2, LOC 3, LOC 4, each with Parameters, Tests, Tools etc.
Feature extractor
Features
PART_ID FEATURES
B6788098 [f1, f2]
FF556828 [f1, f2, f3, f4]
A6678B34 [f2, f3]
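The feature extraction step on this slide could be sketched as follows. Each location node carries parameters, tests, and tools, as in the sample part graph, but the field names, the example values, and the `location.kind.name` naming scheme are assumptions made for illustration.

```python
def extract_features(part_graph):
    """Flatten one part graph into a sorted list of feature names."""
    features = []
    for node in part_graph["nodes"]:
        for kind in ("parameters", "tests", "tools"):
            # A node may lack some kinds entirely; treat those as empty.
            for name in node.get(kind, {}):
                features.append(f"{node['location']}.{kind}.{name}")
    return sorted(features)


# Hypothetical part graphs; values are invented.
part_graphs = {
    "B6788098": {"nodes": [
        {"location": "LOC1", "parameters": {"torque": 4.2}, "tools": {"driver_7": 1}},
    ]},
    "A6678B34": {"nodes": [
        {"location": "LOC1", "parameters": {"torque": 3.9}},
        {"location": "LOC2", "tests": {"leak_test": "fail"}},
    ]},
}

feature_table = {pid: extract_features(graph) for pid, graph in part_graphs.items()}
for part_id, feats in feature_table.items():
    print(part_id, feats)
```

As in the table above, different parts end up with feature lists of different lengths, which is what makes the later matrix generation step non-trivial.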
11. Manufacturing Analytics using Spark
Root Cause Analysis: Computational Complexity
▪ The volume of computations needed to identify root causes on a monthly basis:
• Total assembly lines: ~10,000
• Avg. # of parts produced (per assembly line): ~2 million
• Avg. # of data records in HDFS (per assembly line): ~30 billion
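To get a feel for these magnitudes, the approximate figures can simply be multiplied out. This assumes all three numbers refer to the same time window, which the slide does not state explicitly.

```python
# Approximate figures from the slide.
assembly_lines = 10_000
parts_per_line = 2_000_000
records_per_line = 30_000_000_000

total_parts = assembly_lines * parts_per_line            # parts across all lines
total_records = assembly_lines * records_per_line        # records across all lines
records_per_part = records_per_line // parts_per_line    # average records per part

print(f"{total_parts:.1e} parts, {total_records:.1e} records, "
      f"{records_per_part} records per part")
```

Under that assumption, each part is described by about 15,000 records on average, across roughly 20 billion parts.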
12. Manufacturing Analytics using Spark
Root Cause Analysis: The Challenge
Feature matrix generation
Dependent features
PART_ID FEATURES
B6788098 [f1, f2]
FF556828 [f1, f2, f3, f4]
A6678B34 [f2, f3]
Independent features
PART_ID FEATURES
B6788098 [g1]
FF556828 [g1, g5, g6]
A6678B34 [g1, g2]
X =
DEPENDENT INDEPENDENT
f1 [[g1], [g1], [g1]]
f2 [[g1, None], [g1, None], [g1, g2]]
f3 [[None, None], [g5, g6], [None, None]]
Challenge
• How to scale feature matrix generation for products with increasing volumes?
Solution
• Replaced loops with Python functional constructs such as map, filter, reduce, and partial functions.
Before: 7 hours
After: 2 hours
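A minimal sketch of how map, filter, reduce, and partial can replace explicit loops when building such a matrix is shown below. The column rule used here (the union of independent features over the parts that carry a given dependent feature) is an assumption, so the output is not meant to reproduce the matrix X from the slide, whose exact construction is not spelled out. The helper names are hypothetical.

```python
from functools import partial, reduce


def align_row(part_id, independents, columns):
    """Return the part's independent features in column order, None where absent."""
    present = set(independents.get(part_id, []))
    return [col if col in present else None for col in columns]


def rows_for(dep, dependents, independents):
    """Build the (dependent feature, rows) pair for one dependent feature."""
    # filter: keep only the parts that carry this dependent feature.
    parts = list(filter(lambda pid: dep in dependents[pid], dependents))
    # reduce: columns are the union of independent features over those parts.
    feature_sets = map(lambda pid: set(independents.get(pid, [])), parts)
    columns = sorted(reduce(set.union, feature_sets, set()))
    # partial + map: align every part to the shared column order.
    to_row = partial(align_row, independents=independents, columns=columns)
    return dep, list(map(to_row, parts))


def feature_matrix(dependents, independents):
    all_deps = sorted(reduce(set.union, map(set, dependents.values()), set()))
    build = partial(rows_for, dependents=dependents, independents=independents)
    return dict(map(build, all_deps))


dependents = {"B6788098": ["f1", "f2"],
              "FF556828": ["f1", "f2", "f3", "f4"],
              "A6678B34": ["f2", "f3"]}
independents = {"B6788098": ["g1"],
                "FF556828": ["g1", "g5", "g6"],
                "A6678B34": ["g1", "g2"]}

matrix = feature_matrix(dependents, independents)
print(matrix["f1"])
```

Because each dependent feature is handled by a pure function of its inputs, the outer `map` is the natural place to distribute the work, for example over Spark executors.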
14. Large Scale Forecasting using Spark: Background and Motivation
Team
▪ Collaboration between controllers, programmers, data engineers, and data scientists
Goal
• Automatically generate sales forecasts
• Increase efficiency, objectivity, and accuracy
• Improve financial decision making for Bosch
Results
• Monthly forecast of KPIs (>300,000 time series; target 3-4M time series)
• Combination of 15+ cutting-edge mathematical models (with two different data transformations) in one tool
• Automated model selection and hierarchically consistent forecasts
15. Large Scale Forecasting using Spark
Scale of the task
15+ companies under the Bosch group
• Each company has a specific business structure
• First application is revenue forecasting
• Revenue can be broken down by customer, product, region, and business division
• Forecasts are needed monthly, immediately after the month-closing calculations.
Task: Millions of forecasts within a few hours
• Assume we have 1 million time series
• 5 models per time series → 5M forecasts
• ~5 seconds per model → compute time of ~25M seconds
• 1000s of cores needed
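The back-of-the-envelope estimate can be reproduced in a few lines. The 3-hour window is an assumption standing in for "a few hours", and the calculation assumes perfect scaling with no scheduling or I/O overhead.

```python
import math


def cores_needed(n_series, models_per_series, seconds_per_model, window_hours):
    """Cores required to finish all forecasts within the window, assuming perfect scaling."""
    total_core_seconds = n_series * models_per_series * seconds_per_model
    return math.ceil(total_core_seconds / (window_hours * 3600))


n_forecasts = 1_000_000 * 5  # series x models per series
cores = cores_needed(1_000_000, 5, 5, window_hours=3)  # window_hours is an assumption
print(n_forecasts, cores)
```

With these inputs the result lands in the low thousands of cores, in line with the slide's conclusion.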
16. Large Scale Forecasting Using Spark
Technical Architecture
1. Create Hierarchical Time Series
2. Automated Model Selection using AI
3. AI-based Time Series Forecast
4. Consolidate Hierarchy
Traditional Models, Hybrid Models, Hierarchical Models, State Space Models
Kubernetes
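The slides do not say how the hierarchy is consolidated. One common approach is bottom-up aggregation, where forecasts for the lowest-level series are summed into their parents so that every level adds up. The sketch below illustrates only that idea; the hierarchy, the node names, and the numbers are invented.

```python
def bottom_up(children, leaf_forecasts, node):
    """Return the forecast for `node` as the element-wise sum of its leaves."""
    if node in leaf_forecasts:
        return leaf_forecasts[node]
    child_forecasts = [bottom_up(children, leaf_forecasts, c) for c in children[node]]
    return [sum(values) for values in zip(*child_forecasts)]


# Hypothetical revenue hierarchy: total -> region -> country.
children = {"total": ["europe", "asia"], "europe": ["de", "fr"], "asia": ["cn"]}
# Two-month forecasts for the leaf series (made-up values).
leaf_forecasts = {"de": [10.0, 11.0], "fr": [5.0, 6.0], "cn": [8.0, 9.0]}

consistent = {node: bottom_up(children, leaf_forecasts, node)
              for node in ("total", "europe", "asia")}
print(consistent)
```

By construction, every parent forecast equals the sum of its children, which is what hierarchical consistency requires.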
17. Large Scale Forecasting using Spark
Why R?
• The latest and most popular forecasting models are published in R.
• We can utilize these packages via user-defined functions in Spark.
Why Spark?
▪ The task is embarrassingly parallel!
• Each core receives one time series and the names of the models to be applied.
• It computes the forecasts.
• The combined results are returned to the master node.
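The pattern of one series plus a list of model names per worker can be illustrated without a cluster. In the sketch below a thread pool stands in for Spark executors, and three simple baseline forecasters stand in for the R models; all names and data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical baseline models; the talk's actual models come from R packages.
def naive(series, horizon):
    return [series[-1]] * horizon


def mean(series, horizon):
    return [sum(series) / len(series)] * horizon


def drift(series, horizon):
    slope = (series[-1] - series[0]) / (len(series) - 1)
    return [series[-1] + slope * step for step in range(1, horizon + 1)]


MODELS = {"naive": naive, "mean": mean, "drift": drift}


def forecast_task(task, horizon=2):
    """Work done by one worker: one series, several named models."""
    series_id, series, model_names = task
    return series_id, {name: MODELS[name](series, horizon) for name in model_names}


tasks = [
    ("series_a", [1.0, 2.0, 3.0], ["naive", "drift"]),
    ("series_b", [4.0, 4.0, 10.0], ["mean"]),
]

# Tasks share no state, so a plain parallel map is enough.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = dict(pool.map(forecast_task, tasks))
print(results)
```

Since no task depends on another, the same function could be handed to a distributed map with only the result collection changing.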
18. Large Scale Forecasting using Spark
Sparklyr vs. SparkR
▪ Sparklyr
• Accepts data frames
• Returns data frames
▪ SparkR
• Accepts data frames or lists
• Returns data frames or lists
• More flexibility
Sparklyr UDF API
▪ spark_apply(): applies a function to each row or group of a SparkDataFrame
19. Large Scale Forecasting using Spark
Spark – lessons learned
▪ User-defined functions (UDFs) in SparkR via spark.lapply()
▪ UDFs over lists are more flexible
• Enables changing the models and using heterogeneous data without much change to the overall architecture
▪ Use SparkR::spark.addFile to send files needed on all executors
▪ SparkR::spark.lapply() fails on a list with more than ~46k elements (solved in JIRA issue [SPARK-25234])
20. Large Scale Forecasting using Spark
Performance Gains
*computation time for 1893 time series
21. Thank you!
Abhirup Mallik (Bosch)
Abishek Prasanna (Bosch)
Jeff Thompson (Bosch)
Kasia Vitanachy (Bosch)
Lisa Marion Garcia (Bosch)
Matthew Jones (Bosch)
Nicolas Douard (Virtue Foundation)
Patrick Emmerich (Bosch)
Phil Gaudreau (LinkedIn)
Ruobing Chen (Facebook)
Sascha Vetter (Bosch)
Zichu Li (University of Rochester)