Building a Real-Time Feature Store at iFood
2. Building a Real-Time Feature Store at iFood
Daniel Galinkin
ML Platform Tech Lead
3. Agenda
▪ iFood and AI: what the iFood mission is, and how we use AI
▪ What is a Feature Store: what a Feature Store is, and why it is important for solving AI problems
▪ How iFood built its Feature Store: by leveraging Spark, Databricks, and Delta Tables
5. Biggest foodtech in Latin America (we're in Brazil, Mexico, and Colombia)
▪ ~30 million orders per month
▪ 800+ cities, across all Brazilian states
▪ 100,000+ restaurants
6. AI Everywhere
Discovery
▪ Restaurant recommendations
▪ Dish recommendations
Logistics
▪ Optimize driver allocation
▪ Estimate the delivery time
▪ Find the most efficient route
Marketing
▪ Optimize the use of marketing ads
▪ Optimize the use of coupons
9. What are features?
▪ Any kind of data used to train an ML model
▪ Feature types:
▪ State features: did the user have a coupon at the time?
▪ Aggregate features: average ticket price in the last 30 days for the user
▪ External features: was it raining at the time?
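A minimal sketch, with illustrative names that are not iFood's schema, of how these three feature kinds could be modeled as data:

sealed trait Feature { def entityId: String; def name: String }

// State feature: a fact about the entity at a point in time
case class StateFeature(entityId: String, name: String, value: Boolean) extends Feature

// Aggregate feature: a value computed over a window of events
case class AggregateFeature(entityId: String, name: String, value: Double, windowDays: Int) extends Feature

// External feature: data joined in from outside the company's own event streams
case class ExternalFeature(entityId: String, name: String, value: String) extends Feature

val examples = Seq(
  StateFeature("user-42", "HadCouponAtOrderTime", value = true),
  AggregateFeature("user-42", "AvgTicketPrice", value = 25.8, windowDays = 30),
  ExternalFeature("order-7", "WeatherAtOrderTime", "raining"))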
10. What is a feature store?
▪ The feature store is the central place in an organization to query for features
▪ Features are mostly used by machine learning algorithms
▪ They can also be useful for other applications
▪ For example, you could use the average ticket price for a user to show a high-end or low-end list of restaurants
11. Feature store requirements
▪ General:
▪ Low latency (access & calculation)
▪ Access control
▪ Versioning
▪ Scalability
▪ Easy API for data access
▪ Machine Learning:
▪ Backfilling
▪ "Time travel": snapshots of historical feature values
13. iFood Software Architecture
Streaming as a first-class citizen
[Diagram: the Orders, Payments, Fleet location, Sessions, Coupons, and Notifications microservices publish to a real-time events central bus, which feeds the Real-time Data Lake, the Feature Store, and the Aggregation Service]
14. iFood Real-time Data Lake Architecture
▪ Kafka storage is expensive
▪ Retention is limited
▪ Full event history enables recalculation and backfilling for features
▪ Delta tables provide a cheap storage option
▪ Delta tables can double as either batch or streaming sources
[Diagram: realtime events central bus feeding the data lake streaming jobs, which write the data lake streaming Delta table]
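A short sketch of that last point, assuming an active SparkSession and a hypothetical table path: the same Delta table serves as a bounded batch source and as an unbounded streaming source.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.getOrCreate()

// Batch read: the full event history, e.g. to recalculate or backfill a feature
val orderHistory = spark.read.format("delta").load("/data-lake/orders")

// Streaming read: the same table as a continuous source for downstream jobs
val orderStream = spark.readStream.format("delta").load("/data-lake/orders")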
15. iFood Feature Store Architecture
[Diagram: the aggregation jobs consume the Kafka bus and the data lake streaming Delta table, while the historic backfilling jobs read the same Delta table using feature metadata stored in DynamoDB; their outputs flow through the real-time materialization job into the real-time Redis storage and through the historic materialization job into the historic Delta table storage]
16. iFood Feature Store Architecture
The aggregation jobs
[Same diagram as slide 15, highlighting the aggregation jobs fed by the Kafka bus and the data lake streaming Delta table]
17. iFood Feature Store Architecture
The aggregation jobs
▪ Features are usually combinations of (modeled in the sketch after this slide):
▪ Source: orders stream
▪ Window range: last 30 days
▪ Grouping key: by each user
▪ Value: ticket price
▪ Filter: during lunch
▪ Aggregation type: average
[Diagram: aggregation jobs consuming the Kafka bus and the data lake streaming Delta table]
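A minimal sketch of such a combination expressed as data; the field names and example values are illustrative, not iFood's actual schema.

case class FeatureDefinition(
  source: String,              // e.g. the orders stream
  windowRangeDays: Int,        // e.g. 30 for "last 30 days"
  groupingKey: String,         // e.g. "user_id" for "by each user"
  valueColumn: String,         // e.g. "ticket_price"
  filterExpr: Option[String],  // e.g. a "during lunch" predicate
  aggregationType: String)     // e.g. "avg"

val avgLunchTicket30Days = FeatureDefinition(
  source = "orders",
  windowRangeDays = 30,
  groupingKey = "user_id",
  valueColumn = "ticket_price",
  filterExpr = Some("hour(order_ts) BETWEEN 11 AND 14"),
  aggregationType = "avg")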
18. iFood Feature Store Architecture
The aggregation jobs
▪ With Spark Streaming, you can only execute one groupBy operation per dataframe/job
▪ Each combination of grouping key and window range results in a new dataframe
▪ That means increased costs and operational complexity

ordersStreamDF
  .groupBy(col("user_id"), window(col("order_ts"), "1 day"))
  .agg(sum("ticket"))

ordersStreamDF
  .groupBy(col("user_id"), window(col("order_ts"), "3 days"))
  .agg(sum("ticket"))

ordersStreamDF
  .groupBy(col("user_id"), window(col("order_ts"), "7 days"))
  .agg(sum("ticket"))
19. iFood Feature Store Architecture
The aggregation jobs
▪ We store the intermediate state for several aggregation types over a fixed, smaller window
▪ We then combine those intermediate results to emit results for several window sizes at once
▪ This also allows us to use the same code and the same job to calculate historical and real-time features
20. iFood Feature Store Architecture
The aggregation jobs - Two-step aggregation logic
Orders streaming source, bucketed into 1-day windows (the stored intermediate state):
D-6: 1, D-5: 2, D-4: 3, D-3: 0, D-2: 1, D-1: 1, D-0: 2
3-day windows, combined from the daily buckets:
D-6 to D-4: 6, D-5 to D-3: 5, D-4 to D-2: 4, D-3 to D-1: 2, D-2 to D-0: 4
7-day window:
D-6 to D-0: 10
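The same combination step in plain Scala, using the daily sums from the diagram above: the stored intermediate state is the 1-day buckets, and the larger windows are derived from them on emit.

val dailySums = Seq(1, 2, 3, 0, 1, 1, 2)                  // D-6 .. D-0

// 3-day windows: sliding sums over the daily buckets
val threeDayWindows = dailySums.sliding(3).map(_.sum).toSeq
// => Seq(6, 5, 4, 2, 4)  (D-6..D-4, D-5..D-3, D-4..D-2, D-3..D-1, D-2..D-0)

// 7-day window: the sum of all seven buckets
val sevenDayWindow = dailySums.sum                         // => 10 (D-6..D-0)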
21. iFood Feature Store Architecture
The aggregation jobs
▪ How to express that?
▪ flatMapGroupsWithState
▪ Flexibility in storing state and expressing calculation logic
▪ That allows us to combine dozens of jobs into one

// Pseudocode outlining the combined aggregation job
def combineAggregations(
    sourceDF: DataFrame,
    groupByKeys: Seq[String],
    windowStep: Long,
    combinationRules: Seq[CombinationRule]): DataFrame = {
  putStateAndOutputPlaceholdersToFitCombinedSchema(sourceDF)
    .groupByKey(row => combineGroupKeys(row, groupByKeys))
    .flatMapGroupsWithState((key, miniBatchIterator, state) => {
      miniBatchIterator.foreach(row => {
        // First step: keep per-step intermediate values in the managed state
        if (inputWindowEnd(row) > newestOutputWindowEnd(state)) {
          moveStateRangeForward(state)
        }
        if (inputRowIsInStateRange(row, state)) {
          firstStepUpdateIntermediateValue(row, state)
        }
      })
      // Second step: combine the intermediate values into every requested window size
      combinationRules.foreach(combinationRule => {
        secondStepCalculateFinalResultBasedOnIntermediateValues(combinationRule, state)
      })
      yieldAnOutputRowBasedOnTheResults(state)
    })
}
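For reference, a self-contained sketch of the same pattern that compiles against Spark Structured Streaming. It is deliberately simplified: the state is just a map of daily ticket sums per user, and the source path and Order schema are assumptions, not iFood's generic combination-rule engine.

import java.sql.Timestamp
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

case class Order(userId: String, ticket: Double, orderTs: Timestamp)
case class DailyBuckets(sums: Map[Long, Double])                   // epoch day -> summed ticket
case class FeatureRow(userId: String, featureName: String, featureValue: Double)

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Streaming source; path and schema are hypothetical
val orders: Dataset[Order] = spark.readStream
  .format("delta").load("/data-lake/orders").as[Order]

def emitWindows(userId: String,
                rows: Iterator[Order],
                state: GroupState[DailyBuckets]): Iterator[FeatureRow] = {
  // First step: fold the new rows into per-day intermediate sums kept in state
  val previous = state.getOption.getOrElse(DailyBuckets(Map.empty))
  val updated = rows.foldLeft(previous) { (acc, order) =>
    val day = order.orderTs.getTime / 86400000L
    DailyBuckets(acc.sums.updated(day, acc.sums.getOrElse(day, 0.0) + order.ticket))
  }
  state.update(updated)
  if (updated.sums.isEmpty) Iterator.empty
  else {
    // Second step: combine the daily buckets into several window sizes at once
    val newestDay = updated.sums.keys.max
    def windowSum(days: Int): Double =
      updated.sums.collect { case (day, sum) if day > newestDay - days => sum }.sum
    Iterator(
      FeatureRow(userId, "TicketSum1Day", windowSum(1)),
      FeatureRow(userId, "TicketSum3Days", windowSum(3)),
      FeatureRow(userId, "TicketSum7Days", windowSum(7)))
  }
}

val features = orders
  .groupByKey(_.userId)
  .flatMapGroupsWithState(OutputMode.Update, GroupStateTimeout.NoTimeout)(emitWindows)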
22. iFood Feature Store Architecture
The aggregation jobs
Input (orders source):
Order ID | Customer ID | Date       | ...
...      | 1           | 2020-01-01 | ...

Output (feature rows):
Entity   | Entity ID | Date       | Feat. Name   | Feat. Value
Customer | 1         | 2020-01-01 | NOrders1Day  | 2
Customer | 1         | 2020-01-01 | NOrders3Days | 6
Customer | 1         | 2020-01-01 | NOrders7Days | 10
23. iFood Feature Store Architecture
The materialization jobs
[Same diagram as slide 15, highlighting the materialization jobs: the real-time materialization job writes to the Redis storage and the historic materialization job writes to the historic Delta table storage, both fed from the Kafka bus]
24. iFood Feature Store Architecture
The materialization jobs
▪ Feature update commands are stored to a Kafka topic; think CDC or log tailing
▪ "Update feature F for entity E at row R with value V"
▪ Using the Delta table storage, we use MERGE INTO and the map_concat function to stay flexible (sketched after the example tables below)
Update command: Entity=Customer, Entity ID=1, Date=2020-01-01, Feat. Name=AvgTicketPrice30Days, Feat. Value=25.8
Merged table:
Entity   | Entity ID | Date       | Features Map
Customer | 1         | 2020-01-01 | AvgTicketPrice30Days -> 25.8

Update command: Entity=Customer, Entity ID=2, Date=2020-02-01, Feat. Name=NOrders30Days, Feat. Value=17
Merged table:
Entity   | Entity ID | Date       | Features Map
Customer | 1         | 2020-01-01 | AvgTicketPrice30Days -> 25.8
Customer | 2         | 2020-02-01 | NOrders30Days -> 17

Update command: Entity=Customer, Entity ID=1, Date=2020-01-01, Feat. Name=NOrders30Days, Feat. Value=3
Merged table:
Entity   | Entity ID | Date       | Features Map
Customer | 1         | 2020-01-01 | AvgTicketPrice30Days -> 25.8, NOrders30Days -> 3
Customer | 2         | 2020-02-01 | NOrders30Days -> 17
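A hedged sketch of that MERGE INTO + map_concat upsert using the Delta Lake Scala API; the table path, column names, and the duplicate-key policy are assumptions rather than iFood's actual code.

import io.delta.tables.DeltaTable
import org.apache.spark.sql.{DataFrame, SparkSession}

def upsertFeatures(spark: SparkSession, updates: DataFrame): Unit = {
  // map_concat fails on duplicate keys by default in Spark 3;
  // LAST_WIN keeps the newest value when a feature already exists in the map
  spark.conf.set("spark.sql.mapKeyDedupPolicy", "LAST_WIN")

  DeltaTable.forPath(spark, "/feature-store/historic")      // hypothetical path
    .as("t")
    .merge(
      updates.as("s"),                                      // columns: entity, entity_id, date, features (a map)
      "t.entity = s.entity AND t.entity_id = s.entity_id AND t.date = s.date")
    .whenMatched()
    .updateExpr(Map("features" -> "map_concat(t.features, s.features)"))
    .whenNotMatched()
    .insertAll()
    .execute()
}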
25. iFood Feature Store Architecture
The materialization jobs
▪ Consumers are free to materialize them into their database of choice
▪ For ML, we use:
▪ A Delta table for historic feature values
▪ A Redis cluster for low-latency real-time access
[Diagram: the Kafka bus feeds the real-time materialization job, which writes to the Redis storage, and the historic materialization job, which writes to the historic Delta table storage]
26. iFood Feature Store Architecture
The backfilling jobs
[Same diagram as slide 15, highlighting the historic backfilling jobs]
27. iFood Feature Store Architecture
The backfilling jobs
▪ How do we calculate features for streaming data registered before the creation of a feature?
▪ Use a metadata database to store the creation time of each feature
▪ Run a backfilling job to create feature values up to the feature creation time
▪ Start the streaming job to emit results from values that arrive after the creation date
(a sketch of this split follows the diagram below)
[Diagram: the historic backfilling jobs read the data lake streaming Delta table, using feature creation metadata from DynamoDB, while the aggregation jobs consume the Kafka bus]
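A minimal sketch of that backfill/streaming split, assuming an active SparkSession, hypothetical paths and column names, and a feature creation time looked up from the metadata store.

import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit

val spark = SparkSession.builder.getOrCreate()
import spark.implicits._

// Creation time of the feature, normally read from the metadata store (DynamoDB at iFood)
val featureCreatedAt = Timestamp.valueOf("2020-06-01 00:00:00")

// 1) Backfill: batch-read the full event history from the data lake Delta table and
//    compute the feature for events that happened before the feature existed
val backfillInput = spark.read.format("delta").load("/data-lake/orders")
  .filter($"order_ts" < lit(featureCreatedAt))

// 2) Going forward: stream only the events at or after the creation date,
//    running the same aggregation logic as a streaming job
val streamingInput = spark.readStream.format("delta").load("/data-lake/orders")
  .filter($"order_ts" >= lit(featureCreatedAt))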
28. Lessons learned & best practices
▪ Delta tables double as streaming or batch sources
▪ OPTIMIZE is a must for streaming jobs saving to a Delta table
▪ Either in auto mode, or as a separate process
▪ When starting a brand new job from a streaming Delta table source, the file reading order is not guaranteed
▪ This is even more noticeable after running OPTIMIZE (which you should!)
▪ If the event processing order is important for your job, either use Trigger.Once to process the first historical batch, or process each partition sequentially, in order
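A hedged sketch of the Trigger.Once approach, assuming an active SparkSession and hypothetical paths: drain the existing history of the Delta source as a single batch, then restart the same query without the trigger for continuous processing.

import org.apache.spark.sql.streaming.Trigger

val firstBatch = spark.readStream
  .format("delta")
  .load("/data-lake/orders")
  .writeStream
  .format("delta")
  .option("checkpointLocation", "/checkpoints/orders-features")
  .trigger(Trigger.Once())                       // process everything available, then stop
  .start("/feature-store/historic")

firstBatch.awaitTermination()

// Compact the small files the streaming job writes (run periodically, or enable auto-optimize on Databricks)
spark.sql("OPTIMIZE delta.`/feature-store/historic`")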
29. Lessons learned & best practices
▪ flatMapGroupsWithState is really powerful
▪ State management should be handled with care
▪ foreachBatch is really powerful
▪ Please note it can be triggered on an empty DataFrame, though
▪ Be sure to use correct partition pruning when using the MERGE INTO operation
▪ Be careful with parameter changes between job restarts
▪ StreamTest really helps with unit tests, debugging, and raising the bar
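A minimal sketch of the foreachBatch guard implied above, reusing the hypothetical upsertFeatures merge from the materialization slide; paths and names are illustrative.

import org.apache.spark.sql.{DataFrame, SparkSession}

val spark = SparkSession.builder.getOrCreate()

// Hypothetical stream of feature-update commands
val featureUpdates: DataFrame =
  spark.readStream.format("delta").load("/feature-store/updates")

featureUpdates.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // foreachBatch can be invoked with an empty micro-batch; skip the expensive merge in that case
    if (!batch.isEmpty) {
      // MERGE INTO the historic Delta table; keep a partition-pruning predicate
      // (e.g. on the date column) in the merge condition so only touched partitions are rewritten
      upsertFeatures(spark, batch)
    }
  }
  .option("checkpointLocation", "/checkpoints/materialization")
  .start()
  .awaitTermination()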
30. Positive outcomes
▪ Unified codebase for historical and real-time features: 50% less code
▪ Unified jobs for historical and real-time features: from dozens of jobs to around 10
▪ Huge batch ETL jobs are replaced by much smaller streaming clusters
▪ Though they run 24/7
▪ Delta tables allow for isolation between read and write operations