Running R on AWS Lambda by Ana-Maria Niculescu




  1. About Numeract LLC
     ● Data Science and Economics / Finance consulting services, open source projects
     ● Technology stack: (Postgre)SQL, R, Python, Docker, AWS
     Contributors
     ● Mike Badescu, PhD Economics, Lehigh University
     ● Ana Niculescu, MSc Statistics for Smart Data, ENSAI, Rennes
     ● Teodor Ciuraru
     github.com/numeract/aws-lambda-r
  2. Agenda
     ● Motivation
     ● APIs
     ● AWS
     ● Running R on Amazon Lambda
     ● Conclusions
  3. Motivation: Overview
     Deploying R in production is not easy, but it is a good problem to have. There are several ways to create a deployment package for AWS Lambda and to expose it through an API, but which one fits R best? This presentation compares the currently available approaches and presents a custom solution.
  4. Motivation: Our use case
     ● Shiny Dashboard for data visualisation and model validation
     ● In the backend, the Shiny App was also:
       ○ reading the data
       ○ performing data cleaning
       ○ computing features
       ○ building prediction models
     ● The client also needed access to the calculation process alone, to use it outside (and independently of) the Shiny App
  5. Motivation: Why not rewrite the needed code?
     ● Rewriting the code in another programming language is not efficient
     ● Certain algorithms may not be available in other programming languages
     ● Even when they are available, they may not produce the same results as in R
  6. APIs: The answer to our problem
     ● Modularize the application
     ● Programming language independent
     ● Common concept: programmers know how to work with APIs
     ● Common "data language": JSON (but not the only one)
  7. Of R and APIs: Can we serve API requests from R?
     ● Not directly, as R is not a web server
     ● We need a wrapper / server that:
       ○ receives requests and hands them to R
       ○ passes any response from R to the client
     Candidate approaches
     ● Web server: plumber and OpenCPU
     ● Serverless: running R on AWS Lambda -> wrap R functions inside FaaS (Function as a Service)
  8. Wrapping up the requirements: Getting ready to roll up our sleeves
     The App task engine should:
     ● get a request id
     ● read data from the DB
     ● process data for 5-20 seconds
     ● return the results
     Our challenge: how to use this code in production?
     ● it needs to be triggered
     ● future demand is uncertain and irregular
     ● the R code is still in development
     ● clicking through an interface to re-deploy the code after every change was not an option
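
The request flow listed on this slide can be sketched as a single handler function. This is a minimal illustration written in Python, not the authors' R code: `FAKE_DB`, `handle_request`, and the mean computation that stands in for the 5-20 second processing step are all hypothetical names and placeholders.

```python
import json
from statistics import mean

# Hypothetical in-memory stand-in for the database (an assumption, not from the slides).
FAKE_DB = {"req-001": [1.0, 2.0, 3.0, 6.0]}


def handle_request(event: dict) -> dict:
    """Get a request id, read its data, process it, and return the results."""
    request_id = event.get("request_id")
    if request_id not in FAKE_DB:
        return {"statusCode": 404, "body": json.dumps({"error": "unknown request id"})}
    values = FAKE_DB[request_id]  # read data from the "DB"
    # Placeholder for the real processing step (feature computation, prediction, ...).
    result = {"request_id": request_id, "n": len(values), "mean": mean(values)}
    return {"statusCode": 200, "body": json.dumps(result)}
```

Keeping the handler a pure function of the incoming event makes it easy to trigger from any source, which matters when demand is irregular.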
  9. Amazon Web Services: In brief
     AWS is a collection of pay-as-you-go cloud services. The services we needed:
     ● IAM (Identity and Access Management) - for security
     ● VPC - a logically isolated virtual network
     ● EC2 - a virtual machine
     ● API Gateway
     ● S3 - storage
     ● Lambda - serverless application service
  10. Amazon Web Services: Lambda, Function as a Service
     AWS Lambda is a serverless compute service that launches and executes code when it is explicitly triggered by an event (API), and stays up ONLY as long as the code runs.
     ● Helps you build apps while minimizing infrastructure maintenance
     ● Keeps the focus on what’s important: data engineering/analysis, not DevOps
     ● Pay only for what you use: supports ad-hoc requests
     ● Horizontally scalable
     Image source: https://dwdraju.medium.com/python-function-on-aws-lambda-with-api-gateway-endpoint-288eae7617cb
  11. R on Lambda: How difficult can it be?
     It turns out that running R directly in AWS Lambda is not that intuitive. Since December 2018, AWS Lambda has supported custom runtimes, which allow us to use almost any programming language, including R.
  12. R on Lambda: Solutions
     Our new approach:
     ● Use a base R custom runtime provided by Bakdata
     ● Copy the additional R packages we need into the deployment package provided to Lambda
     Other approaches:
     ● Previously we used the package rpy2 to run R code from Python
     ● R package lambda.r: triggers a Lambda function from R
  13. AWS-Lambda-R: The architecture
     It looks simple, but you don’t want to click through all that every time you re-deploy the code, especially when you have multiple releases.
  14. AWS-Lambda-R: Let’s not forget about security
     Image source: https://aws.amazon.com/security/
  15. Automation: aws-lambda-r scripts
     We needed a fast way to launch the entire infrastructure. Our approach: a series of shell scripts that launch all the AWS infrastructure needed to run R on AWS Lambda.
     Top view:
     ● Deploy R function on AWS Lambda
     ● Configure access to AWS Lambda
  16. Automation: aws-lambda-r implementation details
     The scripts:
     1. use your settings to create a VPC, S3, authorization policies
     2. install and compile R packages
     3. create the zip file to load in AWS Lambda and save it to S3
     4. create the Lambda function and deploy the zip file
     5. configure AWS API Gateway to allow accessing the code over the web
     The scripts use AWS CLI through (Git)Bash => available on all platforms
  17. R example: A simple function that runs in Lambda
  18. Command Line: Running the scripts...
  19. Command Line: API requests
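
Once API Gateway exposes the function over the web, a client in any language can call it with a JSON payload. The following is a minimal sketch using only the Python standard library; the URL and the `request_id` field are hypothetical placeholders, not values from the slides or from the aws-lambda-r repository.

```python
import json
import urllib.request

# Hypothetical endpoint: replace with the invoke URL that API Gateway reports.
API_URL = "https://example.invalid/prod/my-r-function"


def build_request(payload: dict, url: str = API_URL) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the deployed function."""
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request({"request_id": "req-001"})
# To actually call the API (needs a real URL and network access):
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.loads(response.read()))
```

The timeout of 30 seconds is an arbitrary choice that leaves headroom over the 5-20 second processing time mentioned earlier.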
  20. Pros
     At the end of the setup, you will have the entire infrastructure to run R on AWS Lambda, without worrying about EC2 instances or scalability issues.
     ● use the AWS Command Line Interface - no need for clicks anymore
     ● pay-as-you-go
     ● fast deployment after each release
     ● easy to adapt to automatically deploy code written in Python or JavaScript (if needed in the future)
  21. Limitations
     ● Lambda function memory allocation: 128 MB to 10,240 MB
     ● Function timeout: 15 minutes
     ● Maximum deployment package (zip) size: 250 MB unzipped
       ○ this is the most important limitation, as it prevents using large R packages
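
Because the package size limit is the one most likely to bite, it can be worth checking an archive before uploading it. Below is a minimal sketch in Python, assuming the 250 MB limit applies to the unzipped contents; `unzipped_size`, `fits_in_lambda`, and the demo archive are hypothetical and not part of the aws-lambda-r scripts.

```python
import io
import zipfile

# 250 MB limit from the slide, assumed here to apply to the unzipped contents.
MAX_UNZIPPED_BYTES = 250 * 1024 * 1024


def unzipped_size(zip_source) -> int:
    """Return the total uncompressed size, in bytes, of all files in a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        return sum(info.file_size for info in archive.infolist())


def fits_in_lambda(zip_source, limit: int = MAX_UNZIPPED_BYTES) -> bool:
    """True if the archive's uncompressed contents fit within the limit."""
    return unzipped_size(zip_source) <= limit


# Build a tiny archive in memory to demonstrate the check.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("R/library/example/DESCRIPTION", "Package: example\n" * 100)

print(unzipped_size(buffer), fits_in_lambda(buffer))
```

`zip_source` can be a file path or a file-like object, so the same check works on a zip file on disk before it is saved to S3.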
  22. Finally: Where Data Science stops and Data Engineering begins...
     ● Each project has unique requirements and constraints
     ● AWS was great for our needs, and so was Lambda, especially since it became more flexible through custom execution environments and layers
     ● The scripts still run in production, making the client happy
     ● It is worth automating something in order to be able to focus on what’s more important
  23. Thank you!
     github.com/numeract/aws-lambda-r
     anamaria.niculescu@numeract.com
     linkedin.com/in/anamarianiculescu/
     github.com/AnaNiculescu36