This document discusses deploying deep learning models using TensorFlow and AWS Lambda. It begins with an overview of data science project teams and challenges in deployment. It then introduces serverless computing using AWS Lambda as a solution. Key points are that Lambda is triggered by events and TensorFlow models can be deployed using their meta graph and checkpoint files. The document demonstrates running a deep learning model on Lambda triggered by API Gateway.
2. Agenda
● The Golden Egg
● Goals
● Data Science Project Teams
● Challenges
● Some Solutions
● Conclusions
3. About me
● I’m not a ‘Math Guru’
● I know enough to be dangerous, I know enough to know that I can be
dangerous, and I know that I will make mistakes (especially when I’m
tired and hungry).
5. Deep Learning - A Primer
Slide by Andrew Ng, all rights reserved.
6. The Data Science Deliverable
A deep learning model! This is the application “file” that needs to be
updated. An updated model could come from any framework.
7. Data Science Teams - The New Way
Finance Manager:
- Data Scientist
- Accountant
- Tax and Compliance
- Treasury
IT Manager - Data Gurus:
- Analytics
- Data Engineers
- Business Intelligence
- Compliance
9. Enter “Serverless Computing”
From Wikipedia: “Serverless computing is a cloud
computing execution model in which the cloud provider
dynamically manages the allocation of machine
resources.”
10. Some Notes on AWS Lambda
- AWS Lambda is triggered by events
- Example events:
- Microbatch with DynamoDB Streams (new file on S3, webhook)
- Batch with CloudWatch Events (cron jobs, web scraping)
- API calls with API Gateway (RESTful API)
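The API Gateway trigger above can be sketched as a minimal Python Lambda handler. The event shape follows the API Gateway proxy integration; the payload field names and the stubbed "prediction" are hypothetical stand-ins for a real model call.

```python
import json

def lambda_handler(event, context):
    """Entry point AWS Lambda invokes for an API Gateway proxy event.

    event["body"] carries the raw request payload as a JSON string; the
    returned dict must include a statusCode and a string body.
    """
    payload = json.loads(event.get("body") or "{}")
    features = payload.get("features", [])
    # In a real deployment this is where the deep learning model would
    # score `features`; here we echo a stubbed prediction instead.
    prediction = {"prediction": sum(features)}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(prediction),
    }
```

API Gateway maps the HTTP request into `event` and turns the returned dict back into an HTTP response, so the function itself stays framework-free.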
11. TensorFlow Models
Meta graph
A protocol buffer that saves the complete TensorFlow graph: all variables, operations,
collections, etc. This file has a .meta extension.
Checkpoint file and Data file
The data file (.data-xyz) is a binary file containing the values of all the weights, biases, gradients and
other saved variables. The checkpoint file (.ckpt) simply records which checkpoint files were saved most recently.
SavedModel
TensorFlow’s format for saving a servable model.
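The file layout above can be illustrated without importing TensorFlow itself. Assuming the TensorFlow 1.x `tf.train.Saver` convention, one save produces a `.meta` graph file, `.index` and `.data-*` variable files, plus a small plain-text `checkpoint` file recording the latest prefix. A sketch that writes and parses such a `checkpoint` file:

```python
import os
import re
import tempfile

def expected_artifacts(prefix):
    """File names tf.train.Saver typically writes for one checkpoint prefix."""
    return [
        prefix + ".meta",                 # serialized graph (protocol buffer)
        prefix + ".index",                # maps variable names to data shards
        prefix + ".data-00000-of-00001",  # binary variable values (weights, biases, ...)
    ]

def latest_checkpoint(checkpoint_file):
    """Parse the plain-text 'checkpoint' file and return the newest prefix."""
    with open(checkpoint_file) as fh:
        text = fh.read()
    match = re.search(r'model_checkpoint_path:\s*"([^"]+)"', text)
    return match.group(1) if match else None

# Demonstration: write a 'checkpoint' file the way the Saver would.
tmpdir = tempfile.mkdtemp()
ckpt = os.path.join(tmpdir, "checkpoint")
with open(ckpt, "w") as fh:
    fh.write('model_checkpoint_path: "model.ckpt-1000"\n')
    fh.write('all_model_checkpoint_paths: "model.ckpt-1000"\n')

print(latest_checkpoint(ckpt))                      # model.ckpt-1000
print(expected_artifacts(latest_checkpoint(ckpt)))
```

For a Lambda deployment, these files are what gets bundled (or fetched from S3 at cold start) so the graph can be restored and served inside the function.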
The advantage of deep learning is that, generally speaking, the models get better as we throw more data at them.
Talking points:
- Move from prescriptive to predictive analytics
- Deliver a machine learning or deep learning model that will allow organizations to automate processes
- Visualizations are still important, but are used for telling a data story during EDA and for visualizing how models are behaving in real time
Predictive analytics looks at the historical trends in data to provide insights. Organization members are then tasked to optimize processes to improve organizational results based on trends. However, companies need to automate tasks (remove the human from the actual task execution) based on certain indicators. In this case, visualizations are used in EDA to better understand the data with the goal of creating and deploying machine learning and deep learning models that can automate certain organization processes.
Talking points:
- Organizations realize they need to automate their processes, and that automation must come from real-time analysis of data points
- The deliverable is not just a BI dashboard anymore; the deliverable is a deployable machine learning or deep learning model
- Embedding a data science team member into the group increases value
As mentioned, historically data science teams have been isolated from the rest of the organization.
Successful data-driven organizations embed their data scientists into various business groups. For example: data extraction and loading into a warehouse table are done by engineering teams; however, a data science liaison, embedded within a certain department or relevant company-wide project, can help data engineers improve the schema definition for the data being exposed, which could save valuable time during the exploratory data analysis phase. Data engineers can create tables using their favorite Extract, Transform and Load (ETL) tools to remove not-a-number (NaN) rows, remove irrelevant columns such as database PK/FKs, etc.
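The ETL cleanup described above can be sketched in a few lines; the column names and NaN markers here are hypothetical, standing in for whatever the warehouse extract actually contains.

```python
import csv
import io

def clean_rows(reader, drop_columns=("customer_pk", "region_fk")):
    """Drop database key columns and skip rows with missing/NaN values."""
    header = next(reader)
    keep = [i for i, name in enumerate(header) if name not in drop_columns]
    yield [header[i] for i in keep]
    for row in reader:
        trimmed = [row[i] for i in keep]
        # Skip rows carrying missing or not-a-number markers.
        if any(v in ("", "NaN", "nan") for v in trimmed):
            continue
        yield trimmed

# Demonstration on an in-memory CSV extract.
raw = io.StringIO(
    "customer_pk,region_fk,widgets_sold,revenue\n"
    "1,10,5,100.0\n"
    "2,11,NaN,250.0\n"
    "3,12,7,175.5\n"
)
cleaned = list(clean_rows(csv.reader(raw)))
print(cleaned)  # header plus the two complete rows
```

In practice this logic would live in the ETL tool itself; the point is that deciding *which* columns are keys and *which* rows are salvageable is exactly where the embedded data scientist adds value.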
Conversely, the data scientist could help the person telling the data story (who could be anyone in the group, including herself) understand which features are relevant, how certain normalizations were completed without delving into the technical details, etc. “This was the only customer that bought a widget in Atlanta, so the attributes for this person were adjusted to not skew the dataset in their favor.”
Talking points:
- Support data source imports from multiple sources
- EDA is needed as a first step to build and deliver artifacts that automate business processes. Artifacts in this context are machine learning and deep learning models.
- Data engineers and DevOps need access to the data science hub to streamline their own processes
Traditional teams use Excel spreadsheets, among other tools, and are flying back and forth with emails, chat applications or external project management solutions. Even if all users work within shared environments such as Google Docs or Office 365, teams have no way of sharing all files and tools within one common environment, particularly for exploratory data analysis, since viewing and editing files within these environments are constrained to a certain set of file formats. Moreover, certain organizations and individuals prefer one language over another. For example, a data science team working with the Finance department may lean toward the R programming language, while the data science team working with the marketing department may lean toward the Python environment. In both cases, users may use multiple tools for one language: some individuals may prefer RStudio for R, and others may prefer using R with Jupyter Notebooks. Server management is important to optimize compute resources.