Serverless machine learning architectures at Helixa
Data Science Milan meetup
15th December, 2020
Gianmario Spacagna, Luc Mioulet
AI team at Helixa
About Us
Gianmario Spacagna
@gm_spacagna
MBA, MSc in Software Engineering of
Distributed Systems
Chief Scientist, Helixa
Luc Mioulet
@lmioulet
PhD Signal Processing
Machine Learning Engineer, Helixa
In the next hour you will learn
about
1. Overview of serverless services in AWS
2. The Helixa ML system powering a platform used
by thousands of marketers around the globe
3. Map/reduce serverless architectures
Cloud Providers Disclaimer
The following examples focus on the AWS stack, but other cloud providers offer
similar services.
It is not part of this talk to compare different cloud solutions.
The content of this talk was
updated a month ago, before the
recent changes introduced by
Amazon
AWS Disclaimer
Serverless:
How to build and run applications without
thinking about servers
Dynamic allocation of resources by the cloud
provider
Traditional Serverful Way:
Serverless Way:
Source: https://serverless-stack.com/chapters/what-is-serverless.html
Philosophy behind serverless
"If a tree falls in a forest and no one is around
to hear it, does it make a sound?"
“If a server runs in the cloud and no
one is around to use it, does it need to
incur any costs?”
WinterClouds
Major serverless services available in AWS
Docker container
execution.
Script execution in
response to events.
Full list available at https://aws.amazon.com/serverless/
Orchestration of
components and
microservices
Queuing +
publisher/subscriber
message services.
NoSQL Key-Value
database.
REST API
management
service.
Query service to
analyze data at scale
using standard SQL
(like PrestoDB).
ETL service to crawl and
process large datasets on a
fully managed Spark
environment.
Lambda function:
listing files in a specified S3 directory
Event object → Python script → Result object
Lambda cost: $1.04 / million requests
S3 LIST request cost: $5 / million
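The Lambda function described on this slide can be sketched roughly as follows. This is a minimal illustration, not the code shown in the talk: the event keys `bucket` and `prefix`, the helper `list_s3_keys`, and the optional `s3_client` argument (added so the handler can be exercised without AWS credentials) are all assumptions.

```python
def list_s3_keys(s3_client, bucket, prefix):
    """Return all object keys under s3://bucket/prefix, following pagination."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def handler(event, context, s3_client=None):
    """Lambda entry point: event object in, result object out."""
    if s3_client is None:
        import boto3  # imported lazily so the module loads without boto3 installed
        s3_client = boto3.client("s3")
    keys = list_s3_keys(s3_client, event["bucket"], event["prefix"])
    return {"count": len(keys), "keys": keys}
```

Each page returned by `list_objects_v2` holds at most 1000 keys and counts as one LIST request, so a large directory costs several LIST requests per invocation.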
Serverless.com application framework
Hybrid solution for:
Benefits of Serverless architectures
Secure Scalable Cheap
Always available Worry free Low maintenance
The Helixa Market Research Platform
About Helixa
Helixa is an audience
intelligence platform that
uses Machine Learning to
provide accurate and
timely consumer
insights for modern
market research
Audience
:
Size: 1.5M / 223M represented population
Top Influencers: François Chollet (201x), Ben Hamner (114x), George Hotz (106x)
Top Media: Cifar News (65x), The Hacker News (31x), AngelList (28x)
Top Products and Companies: TensorFlow (107x), Waymo (66x), Airbnb Engineering (55x)
Demographics
18-40 years old
Male
U.S. and India
Platform Requirements
● Multiple datasets
● Accurate consumer insights
● Real-time analytics, delivered quickly
● Always available
● Minimum infrastructure maintenance
● Cost effective
Helixa ML System Overview
Helixa end-to-end pipeline
Insights Engine
Other Analytics
Tools
Audience
Projection
Real-time
analytics
applications
Common
Data Model
Data
Processing
Data Integrations
Data
Contents
Embedding
Entity
Resolution
Taxonomy
Categorization
Users Digital
DNA
Traits
Classifiers
Latent Interests
Augmentation
Machine Learning
jobs
Helixa architecture
Data Ingestions
ML Cloud Services
Pre-trained models External APIs
ML Libraries
ML pipelines
Model repository
Production
DB
Microservices
Data Lake
Batch Jobs
Analytics
applications
Batch inference
Model repository and evaluation metrics
Training and hyper-parameters tuning
Analysis and Research
ML libraries
Data Labeling
Feature Store
Feature Engineering
Data Lake
Tech stack and tools
In this talk we
will focus on
The Data Lake(house)
Native Cloud Object (Data) Storage
Benefits:
● Cheaper
● Elastic
● Highly available
● Performant
Hadoop HDFS
Artifacts are saved in S3 and crawled by Glue
Athena is used to build logical views on top of them such as:
● Retrieve the latest version of the artifact
● Aggregate multiple partitions of the same artifact
● Filter and merge with other tables
● Export snapshots of the views as versioned parquet datasets
Data Lake(house) using Glue and Athena
Feature Store Partitions (X)
S3 bucket
❏ users
❏ features
❏ feature_family=text_embedding
❏ timestamp=2020-10-14-12-58
❏ _metadata.json
❏ part000.parquet
❏ part001.parquet
❏ ...
❏ timestamp=2020-09-18-18-35
❏ ...
❏ feature_family=picture_embedding
❏ ...
❏ feature_family=category_counts
❏ ...
❏ items
❏ other entities
Parquet data indexed by user_id
Metadata containing info on how
the features were created
Partition by set of features
generated by the same job
Creation time
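Because the `timestamp=` partition uses a zero-padded `YYYY-MM-DD-HH-MM` format, the most recent version of a feature family can be found with a plain string comparison. The sketch below illustrates that idea on a list of S3 keys; the function name `latest_partition_keys` and the regular expression are assumptions made for illustration, not part of the Helixa codebase.

```python
import re

TIMESTAMP_RE = re.compile(r"timestamp=(\d{4}-\d{2}-\d{2}-\d{2}-\d{2})/")


def latest_partition_keys(keys, feature_family):
    """Keep only the keys of the most recent timestamp partition of a feature family."""
    marker = f"feature_family={feature_family}/"
    by_timestamp = {}
    for key in keys:
        if marker not in key:
            continue
        match = TIMESTAMP_RE.search(key)
        if match:
            by_timestamp.setdefault(match.group(1), []).append(key)
    if not by_timestamp:
        return []
    # Zero-padded timestamps sort chronologically as plain strings.
    return sorted(by_timestamp[max(by_timestamp)])
```

The same selection is what a "latest version" logical view would express in SQL on top of the crawled table.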
Label Store Partitions (y)
S3 bucket
❏ users
❏ labels
❏ variable=gender
❏ source=first_name
❏ timestamp=2020-10-14-12-58
❏ _metadata.json
❏ part000.parquet
❏ part001.parquet
❏ ...
❏ source=public_profile
❏ ...
❏ variable=age
❏ items
❏ other entities
Partition by the variable we are
trying to predict
Partition by the source of ground
truth
Label management for weak learning done via
Prediction Store Partitions (y_pred)
S3 bucket
❏ users
❏ predictions
❏ variable=gender
❏ model=xgbc
❏ timestamp=2020-11-05-17-22
❏ _metadata.json
❏ part000.parquet
❏ part001.parquet
❏ ...
❏ model=cnn
❏ ...
❏ variable=age
❏ ...
❏ items
❏ other entities
Partition by the identifier of the
model used to predict
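The three stores (features, labels, predictions) share one convention: an entity folder, a store folder, then ordered `name=value` partitions ending with the creation timestamp. A minimal sketch of a prefix builder follows, assuming a hypothetical helper named `partition_prefix`; the partition names and values are the ones shown on the slides.

```python
def partition_prefix(entity, store, partitions):
    """Build a Hive-style S3 prefix from an ordered list of (name, value) pairs."""
    parts = [entity, store]
    for name, value in partitions:
        text = str(value)
        # Slashes or equal signs inside a value would break the partition layout.
        if "/" in text or "=" in text:
            raise ValueError(f"invalid partition value: {value!r}")
        parts.append(f"{name}={text}")
    return "/".join(parts) + "/"


features_prefix = partition_prefix(
    "users", "features",
    [("feature_family", "text_embedding"), ("timestamp", "2020-10-14-12-58")],
)
labels_prefix = partition_prefix(
    "users", "labels",
    [("variable", "gender"), ("source", "first_name"), ("timestamp", "2020-10-14-12-58")],
)
predictions_prefix = partition_prefix(
    "users", "predictions",
    [("variable", "gender"), ("model", "xgbc"), ("timestamp", "2020-11-05-17-22")],
)
```

Keeping the partition order explicit (a list rather than a dict of keyword arguments) makes the directory depth of each store obvious at the call site.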
The Development
Platforms for managing the ML lifecycle
● Training
● Predictions
● Model serving
● Model repository
● Experiments
tracking
● Evaluation metrics
Production
● Dev data
versioning and
linkage
● Automated
evaluation reports
● Collaborative
experiments
● Deep Learning
computing
environment
R&D
R&D workflow
Pull
Notebooks and data stored
and shared in S3
Data cache
Dev unix
machine in
the cloud
Notebook name matching branch ID
Install the latest version of
the code
Develop code locally using
professional IDEs
Feature branches
matching Jira key
Gitflow
branching model
Commit and
push
EC2 memory-optimized machines (r4 or r5 family)
EBS volume of 250GB of storage
Alluxio and Jupyter services to start at boot time
200GB reserved for the Alluxio cache
S3 buckets mounted locally in --readonly mode using the FUSE API
Read parquet data in multi-processing using Dask directly from the local file system instead of
using the S3 boto API
cache configuration
Research & Development data: ~1TB
We only focus on 15% of data every month (~150GB)
Re-access of the data for every kernel restart (~5 times a day)
Data science team members (~5 people)
Datasets spread into files of ~120MB each
=> roughly 1.2k files and 500k read requests every month
We observed a speed-up of between 3x and 5x using Alluxio
+ all of the benefits of accessing the S3 data from the POSIX API
benefits for the R&D
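The estimate on this slide can be reproduced with a few lines of arithmetic. The sketch assumes that every kernel restart re-reads each file of the monthly working set once and that 1 GB is 1000 MB; the slide does not state the number of active days per month, so the last line only derives what the quoted 500k figure would imply under those assumptions.

```python
monthly_working_set_gb = 150   # ~15% of ~1TB
file_size_mb = 120
restarts_per_day = 5
team_size = 5

# Number of files in the monthly working set (roughly 1.2k).
files = monthly_working_set_gb * 1000 // file_size_mb

# Reads per day if each restart re-reads every file once (assumption).
reads_per_day = files * restarts_per_day * team_size

# Active days per month implied by the quoted ~500k monthly reads (derived, not from the slide).
implied_active_days = 500_000 / reads_per_day
```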
Processing large datasets with EMR
Picture source: https://dimensionless.in/different-ways-to-manage-apache-spark-applications-on-amazon-emr/
Ephemeral clusters on spot instances can dramatically reduce the cost of operations
+ SparkMagic
SUBMIT JOB
The Deployment
Automate code with task-oriented
containerized jobs
Picture source: https://medium.com/@davidstevens_16424/make-my-day-ta-science-easier-e16bc50e719c
All of the analysis findings are moved into production-quality
modules, with entry points declared in makefiles for tasks such as:
● Data preparations
● Feature extractions
● Model selection / tuning
● Evaluations
● Model Inference
● Predictions post-processing
Automate tasks execution using Continuous
Integration (CI)
Picture source: https://deploybot.com/blog/the-expert-guide-to-continuous-integration
On commit
Code tests
Evaluation reports
Builds & Deployment
On release
Automated code testing pyramid
Unit tests
● Single methods of data
processing utils and
major components
● Replace “assertEqual”
with uncertainty ranges
on predictions
70%
Integration tests
● Black-box testing single
jobs
● Subset of component
integrations (e.g.
transformers followed
by model predictions)
20%
End-to-end tests
● Static and small dataset
● Dry runs of the execution
plan
● Check APIs work
seamlessly through every
stage of the pipeline
10%
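The advice to replace "assertEqual" with uncertainty ranges can be sketched with the standard `unittest` module. The model below, `predict_age`, is a hypothetical stand-in with made-up coefficients; only the testing style is the point.

```python
import unittest


def predict_age(features):
    """Hypothetical stand-in for a trained model: a fixed linear score."""
    return 18.0 + 0.5 * features["account_age_years"] + 2.0 * features["follows_tech_news"]


class PredictAgeTest(unittest.TestCase):
    def test_prediction_within_tolerance(self):
        prediction = predict_age({"account_age_years": 10, "follows_tech_news": 1})
        # Accept any value inside a tolerance band rather than one exact number.
        self.assertAlmostEqual(prediction, 25.0, delta=1.0)

    def test_prediction_stays_in_valid_domain(self):
        prediction = predict_age({"account_age_years": 0, "follows_tech_news": 0})
        self.assertGreaterEqual(prediction, 18.0)
        self.assertLessEqual(prediction, 40.0)
```

Saved in a test module, these cases run with `python -m unittest`. A tolerance band keeps the test stable when the model is retrained and its outputs shift slightly.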
Blue/Green Deployment
GREEN
INFRASTRUCTURE
BLUE
INFRASTRUCTURE
PRODUCTION APIs
1. Deploy new
release
2. Update pointer
and reset state
No downtime, safe production rollback, and easy A/B testing
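The two steps of a blue/green release reduce to keeping two environments and moving a single pointer between them. The class below is a toy in-memory sketch of that pointer logic; `BlueGreenRouter` and its method names are assumptions for illustration and do not correspond to any AWS API.

```python
class BlueGreenRouter:
    """Tracks which of two environments the production pointer targets."""

    def __init__(self, blue_release, green_release=None):
        self.releases = {"blue": blue_release, "green": green_release}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, release):
        """Step 1: install the new release on the idle environment."""
        self.releases[self.idle] = release

    def switch(self):
        """Step 2: repoint production; calling it again rolls back."""
        if self.releases[self.idle] is None:
            raise RuntimeError("idle environment has no release deployed")
        self.live = self.idle

    def serving(self):
        return self.releases[self.live]
```

Since the previous release stays installed on the now-idle environment, a rollback is just another `switch()`.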
Different model prediction serving patterns,
different architectures
Offline training Online training
On-demand Microservices,
REST API
Real time
streaming analysis
Online Learning
Batch Batch AutomatedML
We will only cover serverless offline training patterns in this presentation.
Batch model serving: Embarrassingly parallel
data processing with AWS batch
Source: https://spotinst.com/blog/cost-efficient-batch-computing-on-spot-instances-aws-batch-integration/
Jobs
Data batches
~ a few GBs
each
Output storage
Model serving via microservices
SERVERLESS CHOICE
Cheap and simple solution for
deploying containers without having
to care about the infrastructure
Limits as of today:
Max 4 vCPUs and 30GB of RAM
OR
SERVERFUL CHOICE
Advanced, customizable, powerful,
widespread solutions for containers
orchestration on pools of EC2
instances
Requires infrastructure management
AWS EC2
How do containers scale for a varying
real-time request load?
Number of requests per second
capacity
unexpected sudden burst
Over-provisioning cost
Training pipeline
Real-time serverless model serving
Lookup user
and model info
Get users
features
trigger
Update
metainfo
and configs
REST
request
Get model
Package requirements
EFS
read libraries
return predictions
save model
Build and deploy
Comparison for real-time applications
Criterion | Container microservices | Serverless functions (Lambda)
Horizontal scaling | Autoscaling rules based on predicted load and capacity | Elastic, based on real-time demand
Provisioning time | Minutes | Immediate, or seconds on a cold start
Burst concurrency | Depends on available resources | 3000 + additional 500 every minute
Cost efficiency | Pay for the over-provisioning | Only pay for what you use (10x cheaper in our use cases)
Vertical scaling | Limited by instance types | Limited to 3GB and 2 CPUs
Execution timeout | Unlimited | 15 minutes
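The burst-concurrency figure quoted for Lambda (3000 immediately, plus 500 more every minute) translates into a simple ramp. The sketch below uses only those two numbers from the comparison; the function names are hypothetical, and any account-level or regional concurrency cap is deliberately ignored.

```python
import math

INITIAL_BURST = 3000      # concurrent executions available immediately
EXTRA_PER_MINUTE = 500    # additional concurrent executions granted each minute


def burst_capacity(minutes_elapsed):
    """Concurrent executions available after a number of whole minutes."""
    return INITIAL_BURST + EXTRA_PER_MINUTE * int(minutes_elapsed)


def minutes_to_reach(target_concurrency):
    """Whole minutes needed before the ramp covers the target concurrency."""
    if target_concurrency <= INITIAL_BURST:
        return 0
    return math.ceil((target_concurrency - INITIAL_BURST) / EXTRA_PER_MINUTE)
```

For example, a sudden burst that needs 5000 concurrent executions would be fully served after four minutes of ramp-up under these figures.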
Pick the best of both worlds
Orchestrating functions and microservices
with Step Functions
Workflows defined as a finite-state
machine, with plug-and-play integration
with most AWS services:
AWS Batch ECS
Sagemaker
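A Step Functions workflow is written in the Amazon States Language, a JSON document listing the states and their transitions. The sketch below builds a two-step definition as a Python dictionary and checks that every transition points at a defined state. The state names, ARNs, and job names are hypothetical placeholders; only the field names (`StartAt`, `States`, `Type`, `Resource`, `Next`, `End`, `Parameters`) follow the States Language.

```python
import json

# Hypothetical resources: replace the ARNs and names with real ones.
state_machine = {
    "Comment": "Illustrative two-step workflow",
    "StartAt": "ExtractFeatures",
    "States": {
        "ExtractFeatures": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract-features",
            "Next": "BatchInference",
        },
        "BatchInference": {
            "Type": "Task",
            # Service integration that waits for the AWS Batch job to finish.
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
                "JobName": "batch-inference",
                "JobQueue": "arn:aws:batch:us-east-1:123456789012:job-queue/example",
                "JobDefinition": "arn:aws:batch:us-east-1:123456789012:job-definition/example:1",
            },
            "End": True,
        },
    },
}


def undefined_targets(definition):
    """Return transition targets that do not match any defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    targets.update(s["Next"] for s in states.values() if "Next" in s)
    return sorted(t for t in targets if t not in states)


definition_json = json.dumps(state_machine, indent=2)
```

The resulting `definition_json` string is what would be supplied as the state machine definition when creating it.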
Monitoring and Alerting
Data sanity checks
What to check:
- Value ranges
- Null values
- Anomalies
- Data distribution
Tools:
- Apache Griffin
- Amazon deequ
- Great expectations
Centralized logging with the ELK stack
Generate Logs Aggregation &
Transformation
Storage & Indexing Visualization & Analysis
Infrastructure Monitoring and Alerting
Basic Monitoring
AWS resources and custom
metrics generated by your
applications and services
General Infra Monitoring
Cloud-scale monitoring of
logs, metrics and traces from
distributed, dynamic and
hybrid infrastructure.
Serverless Monitoring
All-in-one performance
management tool down to
the single lines of code
specifically designed for
serverless applications.
KPIs and Metrics Dashboard
KPIs over time such as:
● Distribution shifts
● Model drift
● Utilization
● Coverage
Analytics dashboard on top of
Athena SQL queries
Custom programmatic dashboards
with interactive charts
Design patterns for Map/Reduce in
serverless fashion
MapReduce with PyWren futures
PyWren will serialize and ship local Python code to be executed in lambda functions in the cloud
and return the list of deserialized results back to the driver
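The map, gather, reduce flow described here can be tried locally before moving to Lambda. The sketch below is a local stand-in that uses `concurrent.futures` threads instead of Lambda invocations; it is not PyWren code. PyWren's README uses `pywren.default_executor()`, the executor's `map` method, and `pywren.get_all_results` for the same three steps. The word-count `mapper` is a made-up example.

```python
from concurrent.futures import ThreadPoolExecutor


def mapper(chunk):
    """Map step: count the words in one chunk of text."""
    return len(chunk.split())


def map_reduce(chunks, max_workers=4):
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # Map: one future per input, like one Lambda invocation per input.
        futures = [executor.submit(mapper, chunk) for chunk in chunks]
        # Gather: block until every result is back on the driver, in input order.
        results = [future.result() for future in futures]
    # Reduce: runs on the driver.
    return sum(results)
```

The driver stays a single process in both cases; only the place where `mapper` executes changes.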
MapReduce with SFN parallel sync tasks
* A single Lambda function only supports up to 10 concurrent executions when invoked synchronously
As many parallel mappers as we want
but
Maximum 10 concurrent synchronous
lambda invocations!
MapReduce with SFN queue polling
...
Mapper2
Mapper1
Mapper n
SQS queue
Poll the queue
Driver
* StepFunctions has a limit of 1000 transitions/second and a max execution history size of 25k events.
No limitations on async
lambda invocations
but
max 1000 transitions/second
max 25k events in the
execution history
MapReduce with SFN activity callbacks
Source: https://semantive.com/part-2-asynchronous-actions-within-aws-step-functions-without-servers/
...
...
mapper1 mapper n
Get activity token and wait for
mapper activity to complete
Start mapper activity asynchronously
with the corresponding token
Send activity task success
driver
Unlimited parallel executions
MapReduce with S3 events
No limitations on async
lambda invocations
but
Relies on IO side effects
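In the S3-events pattern, every mapper writes its output to S3, and each write notification triggers a small function that decides whether the reducer can start. The sketch below shows that decision logic in isolation. The `mapped/` prefix, the `expected_parts` count, and the injected `list_keys` callable (standing in for an S3 listing) are assumptions; the `Records[].s3.bucket.name` and `Records[].s3.object.key` fields follow the S3 event notification format.

```python
from urllib.parse import unquote_plus


def created_keys(event):
    """Extract (bucket, key) pairs from an S3 notification event."""
    return [
        # Object keys arrive URL-encoded in S3 notifications.
        (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def on_mapper_output(event, list_keys, expected_parts, output_prefix="mapped/"):
    """Triggered by each mapper's write; asks for the reduce step once all parts exist."""
    for bucket, key in created_keys(event):
        if not key.startswith(output_prefix):
            continue
        outputs = list(list_keys(bucket, output_prefix))
        if len(outputs) >= expected_parts:
            return {"action": "reduce", "inputs": sorted(outputs)}
    return {"action": "wait"}
```

Because the trigger depends on files appearing in S3, two mappers finishing at nearly the same time may both see the full set and both request the reduce step, so the reducer should be idempotent.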
MapReduce with DynamoDB events
job_id | task_id | task_status | task_type | task_dependencies | function_name | payload
mr_example | init | | lambda | [] | lambda_init | {}
mr_example | map_1 | | lambda | [init] | lambda_map | {input_path: chunk1, output_path: dir}
mr_example | map_2 | | lambda | [init] | lambda_map | {input_path: chunk2, output_path: dir}
mr_example | reduce | | lambda | [map_1, map_2] | lambda_reduce | {dir_path: dir}
New or update events will trigger
Coordinator
Init
Update
Map_1
Update
Map_2
Update
Reduce
Update
submitted
submitted
submitted
submitted
completed
completed
completed
completed
No limitations on
async invocations
but
DynamoDB
reads/writes
throttled
Job metadata
limited to 400 KB
Job DynamoDB table
Fill with job meta-information and
dependencies
Run job entry point
(external service)
Job completed callback
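The coordinator's job in the DynamoDB pattern is to react to each table update by finding tasks whose dependencies have all completed. The sketch below reproduces that scheduling rule on an in-memory list shaped like the job table; the `pending` status and the function names are assumptions (the slides only show `submitted` and `completed`), and no DynamoDB call is made.

```python
def runnable_tasks(tasks):
    """Return ids of pending tasks whose dependencies are all completed."""
    completed = {t["task_id"] for t in tasks if t["task_status"] == "completed"}
    return [
        t["task_id"]
        for t in tasks
        if t["task_status"] == "pending"
        and all(dep in completed for dep in t["task_dependencies"])
    ]


def run_job(tasks):
    """Simulate the coordinator: each wave runs every task that is ready."""
    waves = []
    while True:
        ready = runnable_tasks(tasks)
        if not ready:
            break
        for task in tasks:
            if task["task_id"] in ready:
                task["task_status"] = "completed"  # stands in for submit + callback
        waves.append(ready)
    return waves


job = [
    {"task_id": "init", "task_status": "pending", "task_dependencies": []},
    {"task_id": "map_1", "task_status": "pending", "task_dependencies": ["init"]},
    {"task_id": "map_2", "task_status": "pending", "task_dependencies": ["init"]},
    {"task_id": "reduce", "task_status": "pending", "task_dependencies": ["map_1", "map_2"]},
]
execution_waves = run_job(job)
```

In the real pattern each wave would be triggered by table update events rather than a loop, but the dependency check is the same.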
Final Remarks
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems
Only a small fraction of real-world ML systems is composed of the ML Code.
The required surrounding infrastructure is vast and complex.
Facebook new motto in 2014
Facebook original motto
Different tools
Embrace the serverless paradigm
Download the Non-Technical Guide
Topics covered:
✅ Getting started with understanding the technology
✅ Designing the right ML product
✅ Planning under uncertainty
✅ Building a balanced ML team
www.helixa.ai/machine-learning-guide-2020
Software and Machine Learning Engineer
We are looking for engineers:
● Interested in deploying ML to production
● Willing to learn about cloud, code optimization and serverless
technologies
Requisites:
● Bachelor’s degree or above in computer science or
software/computer/IT engineering fields.
● Knowledge of Pydata stack
● Knowledge of SQL
● Understanding of ML concepts
Contact: lmioulet@helixa.ai
Q&A
Appendix A:
Summary Steps
Steps to Managing the ML Product Lifecycle
1. Familiarize yourself with the whole lifecycle and the most popular tools and libraries.
2. Adopt a platform such as MLflow to track and version models and experiments.
3. Notebooks are good for exploration, but the implementation should live in a codebase.
4. Make analysis, code, and infrastructure reproducible, and avoid manual operations.
5. Communicate analysis results effectively, summarizing only what is relevant.
6. Invest in automated tests at different integration levels.
7. Exploit Continuous Integration (CI) to automate builds and releases.
8. Deliver models and components inside Docker containers when possible.
9. Centralize log collection for debugging and troubleshooting.
10. Monitor the infrastructure health using specific tools.
11. Consider a strategy for implementing governance and auditability.
Steps to migrate to Serverless architectures
1. Reverse Conway's law: "Organizations produce software that resembles their organizational
communication structures".
2. Divide your architecture into separate and simple services.
3. Adopt the serverless.com framework to make it easier to develop Lambda functions.
4. Pick the most suitable serverless MapReduce architecture for your needs.
5. Enjoy your team having fun with simplified and scalable deployments.
6. Make a report to your boss showing the substantial cost savings.

Weitere Àhnliche Inhalte

Was ist angesagt?

Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksMatei Zaharia
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...Databricks
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4Sid Anand
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...Amazon Web Services
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Databricks
 
Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...DataWorks Summit/Hadoop Summit
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...Databricks
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta LakeDatabricks
 
Large Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkLarge Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkJen Aman
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaDatabricks
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Databricks
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsDatabricks
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservicesBigstep
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?DataWorks Summit
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...Spark Summit
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIDatabricks
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Spark Summit
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...DataWorks Summit
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 

Was ist angesagt? (20)

Lessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at DatabricksLessons from Large-Scale Cloud Software at Databricks
Lessons from Large-Scale Cloud Software at Databricks
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep... Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
Cloud Computing Was Built for Web Developers—What Does v2 Look Like for Deep...
 
Svccg nosql 2011_v4
Svccg nosql 2011_v4Svccg nosql 2011_v4
Svccg nosql 2011_v4
 
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
AWS Partner Webcast - Hadoop in the Cloud: Unlocking the Potential of Big Dat...
 
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
Accelerating Deep Learning Training with BigDL and Drizzle on Apache Spark wi...
 
Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...Real time, streaming advanced analytics, approximations, and recommendations ...
Real time, streaming advanced analytics, approximations, and recommendations ...
 
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
New Developments in the Open Source Ecosystem: Apache Spark 3.0, Delta Lake, ...
 
Hyperspace for Delta Lake
Hyperspace for Delta LakeHyperspace for Delta Lake
Hyperspace for Delta Lake
 
Large Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On SparkLarge Scale Multimedia Data Intelligence And Analysis On Spark
Large Scale Multimedia Data Intelligence And Analysis On Spark
 
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks DeltaEnd-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
End-to-End Spark/TensorFlow/PyTorch Pipelines with Databricks Delta
 
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
Continuous Applications at Scale of 100 Teams with Databricks Delta and Struc...
 
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUsChoose Your Weapon: Comparing Spark on FPGAs vs GPUs
Choose Your Weapon: Comparing Spark on FPGAs vs GPUs
 
Data Lake and the rise of the microservices
Data Lake and the rise of the microservicesData Lake and the rise of the microservices
Data Lake and the rise of the microservices
 
When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?When OLAP Meets Real-Time, What Happens in eBay?
When OLAP Meets Real-Time, What Happens in eBay?
 
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...How to Boost 100x Performance for Real World Application with Apache Spark-(G...
How to Boost 100x Performance for Real World Application with Apache Spark-(G...
 
Enancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AIEnancing Threat Detection with Big Data and AI
Enancing Threat Detection with Big Data and AI
 
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East ta...
 
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...Quick! Quick! Exploration!: A framework for searching a predictive model on A...
Quick! Quick! Exploration!: A framework for searching a predictive model on A...
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 

Ähnlich wie Serverless machine learning architectures at Helixa

GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...
GDG Cloud Southlake #16: Priyanka Vergadia: Scalable Data Analytics in Google...James Anderson
 
Tech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsTech leaders guide to effective building of machine learning products
Tech leaders guide to effective building of machine learning productsGianmario Spacagna
 
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...
Considerations for Abstracting Complexities of a Real-Time ML Platform, Zhenz...HostedbyConfluent
 
Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Scaling AI/ML with Containers and Kubernetes
Scaling AI/ML with Containers and Kubernetes Tushar Katarki
 
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on Databricks[DSC Europe 23] Petar Zecevic - ML in Production on Databricks
[DSC Europe 23] Petar Zecevic - ML in Production on DatabricksDataScienceConferenc1
 
Vertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsVertex AI: Pipelines for your MLOps workflows
Vertex AI: Pipelines for your MLOps workflowsMĂĄrton Kodok
 
Infrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentInfrastructure Agnostic Machine Learning Workload Deployment
Infrastructure Agnostic Machine Learning Workload DeploymentDatabricks
 
Path to continuous delivery
Path to continuous deliveryPath to continuous delivery
Path to continuous deliveryAnirudh Bhatnagar
 
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...
Simplified Machine Learning Architecture with an Event Streaming Platform (Ap...Kai WĂ€hner
 
Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Azure Data Explorer deep dive - review 04.2020
Azure Data Explorer deep dive - review 04.2020Riccardo Zamana
 
Confluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with ReplyConfluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with Replyconfluent
 
201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0201906 02 Introduction to AutoML with ML.NET 1.0
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
 
DataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopDataPalooza: ML & IoT Workshop
DataPalooza: ML & IoT WorkshopAmazon Web Services
 
201908 Overview of Automated ML
201908 Overview of Automated ML201908 Overview of Automated ML
201908 Overview of Automated MLMark Tabladillo
 
Accelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSAccelerate ML Deployment with H2O Driverless AI on AWS
Accelerate ML Deployment with H2O Driverless AI on AWSSri Ambati
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Bring Your Own Recipes Hands-On Session
Bring Your Own Recipes Hands-On Session Sri Ambati
 

Serverless machine learning architectures at Helixa

  • 1. Serverless machine learning architectures at Helixa Data Science Milan meetup 15th December, 2020 Gianmario Spacagna, Luc Mioulet AI team at Helixa
  • 2. About Us Gianmario Spacagna @gm_spacagna MBA, MSc in Software Engineering of Distributed Systems Chief Scientist, Helixa Luc Mioulet @lmioulet PhD Signal Processing Machine Learning Engineer, Helixa
  • 3. In the next hour you will learn about 1. Overview of serverless services in AWS 2. The Helixa ML system powering a platform used by thousands of marketers around the globe 3. Map/reduce serverless architectures
  • 4. Cloud Providers Disclaimer: The following examples focus on the AWS stack, but other cloud providers offer similar services. It is not part of this talk to compare different cloud solutions.
  • 5. AWS Disclaimer: The content of this talk was updated a month ago, before the recent changes introduced by Amazon.
  • 6. Serverless: How to build and run applications without thinking about servers
  • 7. Dynamic allocation of resources by the cloud provider Traditional Serverful Way: Serverless Way: Source: https://serverless-stack.com/chapters/what-is-serverless.html
  • 8. Philosophy behind serverless "If a tree falls in a forest and no one is around to hear it, does it make a sound?" “If a server runs in the cloud and no one is around to use it, does it need to incur any costs?” WinterClouds
  • 9. Major serverless services available in AWS: Docker container execution. Script execution in response to events. Orchestration of components and microservices. Queuing + publisher/subscriber message services. NoSQL key-value database. REST API management service. Query service to analyze data at scale using standard SQL (like PrestoDB). ETL service to crawl and process large datasets on a fully managed Spark environment. Full list available at https://aws.amazon.com/serverless/
  • 10. Lambda function: listing files in a specified S3 directory. Event object, Python script, Result object. Lambda cost: $1.04 / million requests. S3 LIST request cost: $5 / million
  • 12. Benefits of Serverless architectures: Secure, Scalable, Cheap, Always available, Worry free, Low maintenance
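The script from slide 10 is not reproduced in the transcript. The following is a minimal sketch of what such a handler could look like, assuming the event carries hypothetical `bucket` and `prefix` keys. The optional `s3_client` argument is an addition for local testing and is not part of the Lambda handler contract.

```python
def lambda_handler(event, context, s3_client=None):
    """List object keys under an S3 prefix (the event keys are hypothetical)."""
    if s3_client is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3_client = boto3.client("s3")

    kwargs = {"Bucket": event["bucket"], "Prefix": event["prefix"]}
    keys = []
    while True:
        # Each call is one S3 LIST request and returns at most 1,000 keys.
        response = s3_client.list_objects_v2(**kwargs)
        keys.extend(obj["Key"] for obj in response.get("Contents", []))
        if not response.get("IsTruncated"):
            break
        # Ask for the next page of results.
        kwargs["ContinuationToken"] = response["NextContinuationToken"]
    return {"files": keys, "count": len(keys)}
```

Because every page is a separate LIST request, prefixes with many objects incur the S3 request cost once per page rather than once per invocation.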
  • 13. The Helixa Market Research Platform
  • 14. About Helixa: Helixa is an audience intelligence platform that uses Machine Learning to provide accurate and timely consumer insights for modern market research
  • 15. Audience: Size: 1.5M / 223M represented population. Top Influencers: François Chollet, Ben Hamner, George Hotz (201x, 114x, 106x). Top Media: Cifar News, The Hacker News, AngelList (65x, 31x, 28x). Top Products and Companies: Tensorflow, Waymo, Airbnb Engineering (107x, 66x, 55x). Demographics: 18-40 years old, Male, U.S. and India
  • 16. Platform Requirements: Multiple datasets; Accurate consumer insights; Fast, real-time analytics; Always available; Minimum infrastructure maintenance; Cost effective
  • 17. Helixa ML System Overview
  • 18. Helixa end-to-end pipeline Insights Engine Other Analytics Tools Audience Projection Real-time analytics applications Common Data Model Data Processing Data Integrations Data Contents Embedding Entity Resolution Taxonomy Categorization Users Digital DNA Traits Classifiers Latent Interests Augmentation Machine Learning jobs
  • 19. Helixa architecture Data Ingestions ML Cloud Services Pre-trained models External APIs ML Libraries ML pipelines Model repository Production DB Microservices Data Lake Batch Jobs Analytics applications
  • 20. Batch inference Model repository and evaluation metrics Training and hyper-parameters tuning Analysis and Research ML libraries Data Labeling Feature Store Feature Engineering Data Lake Tech stack and tools In this talk we will focus on
  • 22. Native Cloud Object (Data) Storage Benefits: ● Cheaper ● Elastic ● Highly available ● Performant Hadoop HDFS
  • 23. Data Lake(house) using Glue and Athena: Artifacts are saved in S3 and crawled by Glue. Athena is used to build logical views on top of them, such as: ▪ Retrieve the latest version of the artifact ▪ Aggregate multiple partitions of the same artifact ▪ Filter and merge with other tables ▪ Export snapshot of the views as versioned parquet datasets
  • 24. Feature Store Partitions (X) S3 bucket ❏ users ❏ features ❏ feature_family=text_embedding ❏ timestamp=2020-10-14-12-58 ❏ _metadata.json ❏ part000.parquet ❏ part001.parquet ❏ … ❏ timestamp=2020-09-18-18-35 ❏ ... ❏ feature_family=picture_embedding ❏ ... ❏ feature_family=category_counts ❏ ... ❏ items ❏ other entities. Annotations: Parquet data indexed by user_id; Metadata containing info on how the features were created; Partition by set of features generated by the same job; Creation time
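The partition layout on this slide follows the Hive-style `key=value` naming convention. As a rough illustration, assuming the nesting `entity/features/feature_family=.../timestamp=.../`, the prefix of a partition and the latest available version could be derived as below. The helper names are hypothetical, and the talk itself states that Athena views are used to retrieve the latest version, so this is only a plain-Python sketch of the same idea.

```python
def partition_prefix(entity, feature_family, timestamp):
    """Build the key prefix of one feature-store partition."""
    return (
        f"{entity}/features/"
        f"feature_family={feature_family}/timestamp={timestamp}/"
    )


def latest_timestamp(prefixes):
    """Return the most recent timestamp found among partition prefixes.

    Timestamps such as 2020-10-14-12-58 are zero-padded, so the
    lexicographic maximum is also the chronological maximum.
    """
    stamps = [
        part.split("=", 1)[1]
        for prefix in prefixes
        for part in prefix.strip("/").split("/")
        if part.startswith("timestamp=")
    ]
    return max(stamps)
```

The same pattern would apply to the label and prediction stores by swapping the partition keys (`variable`, `source`, `model`).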
  • 25. Label Store Partitions (y) S3 bucket ❏ users ❏ labels ❏ variable=gender ❏ source=first_name ❏ timestamp=2020-10-14-12-58 ❏ _metadata.json ❏ part000.parquet ❏ part001.parquet ❏ … ❏ source=public_profile ❏ ... ❏ variable=age ❏ items ❏ other entities. Annotations: Partition by the variable we are trying to predict; Partition by the source of ground truth; Label management for weak learning done via
  • 26. Prediction Store Partitions (y_pred) S3 bucket ❏ users ❏ predictions ❏ variable=gender ❏ model=xgbc ❏ timestamp=2020-11-05-17-22 ❏ _metadata.json ❏ part000.parquet ❏ part001.parquet ❏ … ❏ model=cnn ❏ ... ❏ variable=age ❏ ... ❏ items ❏ other entities. Annotation: Partition by the identifier of the model used to predict
  • 28. Platforms for managing the ML lifecycle ● Training ● Predictions ● Model serving ● Model repository ● Experiments tracking ● Evaluation metrics Production ● Dev data versioning and linkage ● Automated evaluation reports ● Collaborative experiments ● Deep Learning computing environment R&D
  • 29. R&D workflow Pull Notebooks and data stored and shared in S3 Data cache Dev unix machine in the cloud Notebook name matching branch ID Install the latest version of the code Develop code locally using professional IDEs Feature branches matching Jira key Gitflow branching model Commit and push
  • 30. Alluxio cache configuration: EC2 memory-optimized machines (r4 or r5 family); EBS volume with 250GB of storage; Alluxio and Jupyter services started at boot time; 200GB reserved for the Alluxio cache; S3 buckets mounted locally in --readonly mode using the FUSE API; Parquet data read with multi-processing using Dask directly from the local file system instead of through the S3 boto API
  • 31. Benefits for the R&D: Research & Development data: ~1TB. We only focus on 15% of the data every month (~150GB). The data is re-accessed at every kernel restart (~5 times a day). Data science team members: ~5 people. Datasets are spread into files of ~120MB each => roughly 1.2k files and 500k read requests every month. We observed a speed-up of between 3x and 5x using Alluxio, plus all of the benefits of accessing the S3 data through the POSIX API
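The figures on this slide can be reproduced with a short back-of-the-envelope calculation. The number of working days per month is not given on the slide; the value of 16 used here is an assumption chosen because it lands on the quoted ~500k read requests.

```python
total_data_gb = 1000        # ~1TB of R&D data
monthly_share_pct = 15      # ~15% of the data is used each month
file_size_mb = 120          # average file size
restarts_per_day = 5        # kernel restarts per person per day
team_size = 5               # data science team members
working_days = 16           # ASSUMPTION: not stated on the slide

active_gb = total_data_gb * monthly_share_pct // 100   # data used per month
n_files = active_gb * 1000 // file_size_mb             # files read per restart
reads_per_month = n_files * restarts_per_day * team_size * working_days

print(active_gb, n_files, reads_per_month)
```

Every one of those reads would otherwise be an S3 GET over the network, which is the traffic the local cache is meant to absorb.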
  • 32. Processing large datasets with EMR Picture source: https://dimensionless.in/different-ways-to-manage-apache-spark-applications-on-amazon-emr/ Ephemeral clusters on spot instances can dramatically reduce the cost of operations + SparkMagic SUBMIT JOB
  • 34. Automate code with task-oriented containerized jobs Picture source: https://medium.com/@davidstevens_16424/make-my-day-ta-science-easier-e16bc50e719c All of the analysis findings are moved into production-quality modules, with entry points declared in makefiles for tasks such as: ● Data preparations ● Feature extractions ● Model selection / tuning ● Evaluations ● Model Inference ● Predictions post-processing
  • 35. Automate tasks execution using Continuous Integration (CI) Picture source: https://deploybot.com/blog/the-expert-guide-to-continuous-integration On commit Code tests Evaluation reports Builds & Deployment On release
  • 36. Automated code testing pyramid Unit tests ● Single methods of data processing utils and major components ● Replace “assertEqual” with uncertainty ranges on predictions 70% Integration tests ● Black-box testing single jobs ● Subset of component integrations (e.g. transformers followed by model predictions) 20% End-to-end tests ● Static and small dataset ● Dry runs of the execution plan ● Check APIs work seamlessly through every stage of the pipeline 10%
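The advice to replace "assertEqual" with uncertainty ranges can be illustrated with the standard `unittest` module. In this sketch `predict_age` is a hypothetical stand-in for a trained model, and the tolerance of 1.0 is an arbitrary illustrative choice.

```python
import unittest


def predict_age(features):
    """Stand-in for a trained model (hypothetical; returns a fixed score)."""
    return 31.7


class PredictionRangeTest(unittest.TestCase):
    def test_prediction_within_tolerance(self):
        prediction = predict_age({"user_id": 42})
        # assertEqual(prediction, 32.0) would break on any retraining noise;
        # accept any value within +/- 1.0 of the expected one instead.
        self.assertAlmostEqual(prediction, 32.0, delta=1.0)
```

The test still fails when the model drifts outside the accepted range, but it tolerates the small numerical differences that retraining or library upgrades introduce.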
  • 37. Blue/Green Deployment GREEN INFRASTRUCTURE BLUE INFRASTRUCTURE PRODUCTION APIs 1. Deploy new release 2. Update pointer and reset state No downtime, safe production rollback, and easy A/B testing
  • 38. Different model prediction serving patterns, different architectures Offline training Online training On-demand Microservices, REST API Real time streaming analysis Online Learning Batch Batch AutomatedML We will only cover serverless offline training patterns in this presentation.
  • 39. Batch model serving: Embarrassingly parallel data processing with AWS Batch Source: https://spotinst.com/blog/cost-efficient-batch-computing-on-spot-instances-aws-batch-integration/ Jobs Data batches ~ a few GBs each Output storage
  • 40. Model serving via microservices SERVERLESS CHOICE Cheap and simple solution for deploying containers without having to care about the infrastructure Limits as of today: Max 4 vCPUs and 30GB of RAM OR SERVERFUL CHOICE Advanced, customizable, powerful, widespread solutions for container orchestration on pools of EC2 instances Requires infrastructure management AWS EC2
  • 41. How do containers scale for real-time varying requests load? Number of requests per second capacity unexpected sudden burst Over-provisioning cost
  • 42. Training pipeline Real-time serverless model serving Lookup user and model info Get users features trigger Update metainfo and configs REST request Get model Package requirements EFS read libraries predictions return save model Build and deploy
  • 43. Comparison for real-time applications. Horizontal scaling: autoscaling rules based on predicted load and capacity vs. elastic, based on real-time demand. Provisioning time: minutes vs. immediate, or seconds on a cold start. Burst concurrency: depends on available resources vs. 3000 + an additional 500 every minute. Cost efficiency: pay for the over-provisioning vs. only pay for what you use (10x cheaper in our use cases). Vertical scaling: limited by instance types vs. limited to 3GB and 2 CPUs. Execution timeout: unlimited vs. 15 minutes.
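As a small worked example of the burst concurrency figure quoted on this slide (3000 plus an additional 500 every minute), the sketch below computes the available concurrency after a number of minutes. The function name and the optional `account_limit` cap are hypothetical, and the figures are the ones from the slide, which may differ by region or have changed since.

```python
def burst_capacity(minutes_elapsed, initial_burst=3000,
                   increment_per_minute=500, account_limit=None):
    """Concurrency available after a sustained burst of `minutes_elapsed` minutes.

    Defaults are the figures quoted on the slide; `account_limit` is an
    assumed optional cap, not something stated in the talk.
    """
    capacity = initial_burst + increment_per_minute * minutes_elapsed
    if account_limit is None:
        return capacity
    return min(capacity, account_limit)


capacity_at_start = burst_capacity(0)
capacity_after_4_minutes = burst_capacity(4)
capacity_with_cap = burst_capacity(10, account_limit=6000)
```

With these defaults, a load that needs 5000 concurrent executions is reachable after roughly four minutes of sustained traffic.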
  • 44. Pick the best of both worlds
  • 45. Orchestrating functions and microservices with Step Functions Workflows defined as a finite-state machine, with plug-and-play integration with most of the AWS services: AWS Batch ECS SageMaker
  • 47. Data sanity checks What to check: - Value ranges - Null values - Anomalies - Data distribution Tools: - Apache Griffin - Amazon Deequ - Great Expectations
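The checks listed on this slide (value ranges, null values, distribution) can be sketched in plain Python without any of the named tools. This is only an illustration: the function name, the thresholds, and the sample column are all assumptions, and a dedicated tool would cover far more cases.

```python
import math
import statistics


def sanity_report(values, lower, upper, reference_mean, max_mean_shift):
    """Run basic checks on one numeric column given as a list (None = null)."""
    present = [v for v in values if v is not None and not math.isnan(v)]
    out_of_range = [v for v in present if not lower <= v <= upper]
    mean = statistics.fmean(present) if present else float("nan")
    return {
        # Share of missing entries in the column.
        "null_fraction": 1 - len(present) / len(values) if values else 0.0,
        # Values outside the allowed [lower, upper] interval.
        "out_of_range": out_of_range,
        "mean": mean,
        # Crude distribution check: has the mean moved too far from a reference?
        "mean_shifted": abs(mean - reference_mean) > max_mean_shift,
    }


# Hypothetical "age" column with one null and one implausible value.
report = sanity_report(
    values=[23, 35, None, 41, 130, 29],
    lower=0, upper=120,
    reference_mean=35.0, max_mean_shift=10.0,
)
```

A pipeline step could fail fast when `out_of_range` is non-empty or `mean_shifted` is true, before any model is trained or served on the data.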
  • 48. Centralized logging with the ELK stack Generate Logs Aggregation & Transformation Storage & Indexing Visualization & Analysis
  • 49. Infrastructure Monitoring and Alerting Basic Monitoring AWS resources and custom metrics generated by your applications and services General Infra Monitoring Cloud-scale monitoring of logs, metrics and traces from distributed, dynamic and hybrid infrastructure. Serverless Monitoring All-in-one performance management tool down to the single lines of code specifically designed for serverless applications.
  • 50. KPIs and Metrics Dashboard KPIs over time such as: ● Distribution shifts ● Model drift ● Utilization ● Coverage Analytics dashboard on top of Athena SQL queries Custom programmatic dashboards with interactive charts
  • 51. Design patterns for Map/Reduce in serverless fashion
  • 52. MapReduce with PyWren futures PyWren will serialize and ship local Python code to be executed in lambda functions in the cloud and return the list of deserialized results back to the driver
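The driver-side pattern described here (submit mappers, collect futures, reduce the results) can be sketched locally with Python's standard `concurrent.futures`. The thread pool is only a local stand-in for remote execution in Lambda functions, not PyWren itself, and the word-count task is an invented example.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def mapper(chunk):
    """Map step: count the words in one chunk of text."""
    return len(chunk.split())


def reducer(left, right):
    """Reduce step: combine two partial counts."""
    return left + right


chunks = ["serverless map reduce", "lambda functions in the cloud", "driver"]

# Local stand-in for remote execution: each mapper call returns a future.
with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(mapper, chunk) for chunk in chunks]
    # The driver blocks until every mapper result has come back.
    partial_counts = [future.result() for future in futures]

total_words = reduce(reducer, partial_counts, 0)
```

The appeal of the futures style is that the driver code stays the same shape whether the mappers run in local threads or in cloud functions.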
  • 53. MapReduce with SFN parallel sync tasks * A single Lambda function only supports up to 10 concurrent executions when invoked synchronously. As many parallel mappers as we want, but a maximum of 10 concurrent synchronous Lambda invocations!
  • 54. MapReduce with SFN queue polling ... Mapper2 Mapper1 Mapper n SQS queue Poll the queue Driver * StepFunctions has a limit of 1000 transitions/second and a max execution history size of 25k events. No limitations on async lambda invocations but max 1000 transitions/second max 25k events in the execution history
  • 55. MapReduce with SFN activity callbacks Source: https://semantive.com/part-2-asynchronous-actions-within-aws-step-functions-without-servers/ ... ... mapper1 mapper n Get activity token and wait for mapper activity to complete Start mapper activity asynchronously with the corresponding token Send activity task success to the driver Unlimited parallel executions
  • 56. MapReduce with S3 events No limitations on async Lambda invocations, but relies on IO side effects
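A minimal sketch of the S3-event pattern: each mapper writes an output object, the resulting notification triggers a function, and that function starts the reducer only once every expected output exists. The nested `Records[].s3.object.key` layout follows the S3 notification event format; the bucket name, key naming scheme (`part-<i>.json`), and both helper functions are hypothetical. Real events carry URL-encoded keys, which this sketch ignores.

```python
def completed_map_keys(event, output_prefix):
    """Extract the object keys under output_prefix from an S3 notification event."""
    keys = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        if key.startswith(output_prefix):
            keys.append(key)
    return keys


def should_start_reduce(existing_keys, expected_chunks, output_prefix):
    """True once every expected mapper output is present (assumed naming scheme)."""
    expected = {f"{output_prefix}part-{i}.json" for i in range(expected_chunks)}
    return expected.issubset(set(existing_keys))


# Hypothetical event for the second of two mapper outputs.
event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"},
                "object": {"key": "job/output/part-1.json"}}}
    ]
}
new_keys = completed_map_keys(event, "job/output/")
already_written = ["job/output/part-0.json"]
ready = should_start_reduce(already_written + new_keys,
                            expected_chunks=2, output_prefix="job/output/")
```

This makes the "relies on IO side effects" caveat concrete: coordination state lives in which objects exist in the bucket, so a real handler would list the prefix rather than trust a single event.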
  • 57. MapReduce with DynamoDB events Job DynamoDB table columns: job_id, task_id, task_status, task_type, task_dependencies, function_name, payload. Rows (task_status shown separately): (mr_example, init, lambda, [], lambda_init, {}); (mr_example, map_1, lambda, [init], lambda_map, {input_path: chunk1, output_path: dir}); (mr_example, map_2, lambda, [init], lambda_map, {input_path: chunk2, output_path: dir}); (mr_example, reduce, lambda, [map_1, map_2], lambda_reduce, {dir_path: dir}). New or update events will trigger the Coordinator: Init, Update, Map_1, Update, Map_2, Update, Reduce, Update. Each of the four tasks moves from submitted to completed. Fill with job meta-information and dependencies; run job entry point (external service); job completed callback. No limitations on async invocations, but DynamoDB reads/writes are throttled and job metadata is limited to 400 KB.
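The coordinator logic implied by the job table can be sketched in memory: on every update it looks for tasks whose dependencies are all completed and submits them. The task ids and dependencies are taken from the slide; the `pending` status, the `runnable_tasks` helper, and the loop that simulates table update events are assumptions made for illustration, and no DynamoDB calls are involved.

```python
def runnable_tasks(tasks):
    """Return ids of pending tasks whose dependencies are all completed."""
    completed = {t["task_id"] for t in tasks if t["task_status"] == "completed"}
    return [
        t["task_id"]
        for t in tasks
        if t["task_status"] == "pending"  # assumed initial status
        and all(dep in completed for dep in t["task_dependencies"])
    ]


job = [
    {"task_id": "init", "task_status": "pending", "task_dependencies": []},
    {"task_id": "map_1", "task_status": "pending", "task_dependencies": ["init"]},
    {"task_id": "map_2", "task_status": "pending", "task_dependencies": ["init"]},
    {"task_id": "reduce", "task_status": "pending",
     "task_dependencies": ["map_1", "map_2"]},
]

# Simulate the stream of table updates: each round stands for one coordinator run.
execution_order = []
while True:
    ready = runnable_tasks(job)
    if not ready:
        break
    for task in job:
        if task["task_id"] in ready:
            # In the real flow this would go submitted -> completed asynchronously.
            task["task_status"] = "completed"
    execution_order.append(ready)
```

Both mappers become runnable in the same round, which is where the parallelism comes from; the reducer only appears once both are marked completed.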
  • 59. Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems Only a small fraction of real-world ML systems is composed of the ML Code. The required surrounding infrastructure is vast and complex.
  • 60. Facebook new motto in 2014 Facebook original motto
  • 63. Download the Non-Technical Guide Topics covered: ✅ Getting started with understanding the technology ✅ Designing the right ML product ✅ Planning under uncertainty ✅ Building a balanced ML team www.helixa.ai/machine-learning-guide-2020
  • 64. Software and Machine Learning Engineer We are looking for engineers: ● Interested in deploying ML to production ● Willing to learn about cloud, code optimization and serverless technologies Requisites: ● Bachelor’s degree or above in computer science or software/computer/IT engineering fields. ● Knowledge of Pydata stack ● Knowledge of SQL ● Understanding of ML concepts Contact: lmioulet@helixa.ai
  • 65. Q&A
  • 67. Steps to Managing the ML Product Lifecycle 1. Familiarize yourself with the whole lifecycle and the most popular tools and libraries. 2. Adopt a platform such as MLflow to track and version models and experiments. 3. Notebooks are good for exploration, but the implementation should be in a codebase. 4. Make analysis, code and infrastructure reproducible, and avoid manual operations. 5. Communicate analysis results effectively, summarizing only what is relevant. 6. Invest in automated tests at different integration levels. 7. Exploit Continuous Integration (CI) for automating builds and releases. 8. Deliver models and components inside Docker containers, when possible. 9. Centralize log collection for debugging and troubleshooting. 10. Monitor the infrastructure health using specific tools. 11. Consider a strategy for implementing Governance and Auditability.
  • 68. Steps to migrate to Serverless architectures 1. Reverse Conway’s law: “Organizations produce software that resembles their organizational communication structures”. 2. Divide your architecture into separate and simple services. 3. Adopt the serverless.com framework to make it easier to develop Lambda functions. 4. Pick the most suitable serverless MapReduce architecture for your needs. 5. Enjoy your team having fun with simplified and scalable deployments. 6. Make a report to your boss showing the consistent amount of saved costs.

Editor's notes

  1. Secure Scalable Cheaper Always available Worry Free No maintenance needed
  2. Helixa is a platform using machine learning to provide audience insights for modern market Helixa is a forward-thinking audience intelligence platform that uses ethical AI and Machine Learning technology to connect data sources. We build research platforms for the 21st century with more depth and detail than you would ever get from a single source research platform. The results are incredibly nuanced, timely and meaningful insights about the audiences that matter to your business.
  3. Ben Hamner: co-founder and CTO of Kaggle. George Hotz: American security hacker (iOS and PlayStation jailbreaks). François Chollet: deep learning expert and creator of Keras. CIFAR: Canadian research organization tackling science and humanity problems, including with the use of AI. The Hacker News: cybersecurity news. Waymo: self-driving transportation startup.
  4. Multiple datasets (source agnostic) Anonymous privacy-preserving (no personal identifiers) Ethical and responsible design (mitigate biases and fairness) Accurate estimations of consumers behaviour (statistical significance) Fast analytics calculation and aggregation of insights (a few seconds) Always available, no downtime
  5. Common Data Model Twitter Data Processing ML System Overview Entity Resolution Content Embeddings Automated Categorization Users’ Digital DNA Universal Trait Classifiers Latent Interests Augmentation Audience projection model Insights Engine
  6. Data lake: unstructured data, raw data, DS Data warehouse: highly structured, BI Data lakehouse: ACID, BI support, easily accessible
  7. HDFS: Apache data storage system. A name node manages the nodes that run data nodes: the name node keeps track of the data and the data nodes manage loading and storing. Requires managing a cluster and costly EBS volumes. S3: AWS object storage with managed scaling. The only drawback of S3 is some limitations: file size (5TB) and
  8. Ingestion, crawl to discover partitions, update of the catalog. Partitions are exactly like Hive.
  9. At Helixa we care about various objects, such as users, items and entities.
 Users have multiple features describing them. Every time the feature extraction process is run, it generates a new timestamp. Each timestamp contains...
  10. Next iteration of snorkel is snorkel flow: E2E ML platform integrating the concepts of snorkel.
  11. Luc
  12. Not a serverless part, but reduces costs compared to using sagemaker. Also reduces boundaries from notebook to code, because we use code from a repo within the notebook
  13. EMR recommended configurations: Favor r-family instance types. Use a dedicated instance for the driver and spot instances for workers. Set the "maximizeResourceAllocation": "true" property (it calculates the maximum compute and memory resources available for an executor on an instance in the core instance group, then sets the corresponding spark-defaults settings based on this information). Avoid dynamic allocation; run one job at a time.
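For reference, the `maximizeResourceAllocation` setting mentioned in this note belongs to the `spark` configuration classification on EMR. The sketch below only builds and prints that configuration fragment as JSON; how it is passed to the cluster (console, CLI, or SDK) is left out, and the variable name is arbitrary.

```python
import json

# Configuration fragment for the "spark" classification on EMR.
# Only the property named in the note is set; everything else is left at defaults.
emr_configurations = [
    {
        "Classification": "spark",
        "Properties": {"maximizeResourceAllocation": "true"},
    }
]

configuration_json = json.dumps(emr_configurations, indent=2)
print(configuration_json)
```

Note that the value is the string "true", not a boolean, since EMR configuration properties are string-valued.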
  14. Luc
  15. Makefiles make your code task-oriented, explicitly stating which steps are necessary to perform a given task and making it easy for any user to run those tasks without having to copy and paste verbose shell commands. Make commands should have as few arguments as possible, ideally none. An alternative is to use DVC pipelines.
  16. Leveraging commands in the Makefile, the Continuous Integration (CI) system can automate the build and test at every commit and the release and deployment at every pull request.
  17. What we mean by code tests.
  18. Online training is not part of the talk.
  19. In the case of offline batch forecasting, a good deployment model is the use of AWS Batch.
  20. In the case of offline on-demand prediction, use microservices. Docker runtime support will be dropped by Kubernetes in favor of runtimes that implement the Container Runtime Interface (CRI).
  21. Docker can now replace the EFS volume. This architecture requires smart orchestration and the development of a map-reduce architecture specific to a task.
  22. Docker jobs will be for long-running, memory-intensive jobs. AWS Lambda functions can do an equivalent job if well written.
  23. Luc
  24. Split in half drifts and shifts, utilization
  25. PyWren serializes and runs local Python code in Lambda functions and returns the results back to the driver.
  26. Unfortunately a single Lambda function only supports up to 10 concurrent executions when invoked synchronously.
  27. * StepFunctions has a limit of 1000 transitions/second and a max execution history size of 25k events.
  28. F8 conference in 2014
  29. Add