Better Data with Machine Learning and Serverless
Jonathan LeBlanc (Director of Developer Advocacy @ Box)
Twitter: @jcleblanc
Email: jleblanc@box.com
Agenda for Today
Building Blocks: How are these systems built?
Best Practices: How do we architect the solution?
Security Considerations: How do we ensure data security?
Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com
Part 1: Building Blocks
1 What Machine Learning Isn’t
1 Components of the System
Serverless Framework: Provides the compute and data management from the stored data location to the machine learning engine.
Machine Learning System: Provides the data enhancement capabilities that improve the underlying source data’s metadata (information about information).
1 Why Serverless?
On Demand: Machine learning resources are only required when files need
processing, which may be infrequent.
No hosting: You don’t have to run or manage any servers, containers, or VMs of
your own.
Pricing based on use: Execution resources are only run (and charged for) based
on your use, typically resulting in very low server costs.
Different stack options: Multiple serverless systems exist to fit stack needs,
including numerous open source options.
1 Components of the System
Webhook / Event Pump System: Handles notifications to the middleware layer
when a new file should be processed.
Middleware Layer: Handles communication between the data source and
machine learning systems.
Metadata Layer: The storage facility for machine learning data responses.
Token Downscoping System: Allows you to pass tightly scoped read / write
tokens through multiple uncontrolled system layers.
1 How a Data / ML System Works
Cloud Data: Data store & initial metadata
Serverless Framework: Callback handler and code execution
Machine Learning: Data processor and enhancer
Connections: Webhook, Metadata, Execute, Callback
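The interaction between these three components can be sketched as a single middleware function. This is a minimal sketch, not the presenter's implementation: `handleWebhook`, `mlClient.analyze`, and `metadataStore.save` are hypothetical names, the ordering of the steps is one reading of the diagram, and the ML system and metadata store are stubbed in memory so the example runs on its own.

```javascript
// Hypothetical middleware: receives a webhook event, sends the file to an
// ML system, and writes the response back as metadata on the source file.
// The mlClient and metadataStore interfaces are assumptions for illustration.
const handleWebhook = (event, mlClient, metadataStore) => {
  // Webhook: the data store notifies us that a file needs processing.
  const { fileId, fileName } = event;
  if (!fileId) {
    return { statusCode: 400, body: 'Missing fileId' };
  }

  // Execute: pass the file reference to the machine learning system.
  const labels = mlClient.analyze(fileId, String(fileName));

  // Metadata: store the ML response alongside the original file.
  metadataStore.save(fileId, { labels });

  return { statusCode: 200, body: 'Event processed' };
};

// In-memory stand-ins so the sketch is runnable without external services.
const fakeMlClient = {
  analyze: (fileId, fileName) =>
    fileName.endsWith('.png') ? ['image'] : ['document'],
};
const saved = new Map();
const fakeMetadataStore = {
  save: (fileId, metadata) => saved.set(fileId, metadata),
};

const result = handleWebhook(
  { fileId: '123', fileName: 'scan.png' },
  fakeMlClient,
  fakeMetadataStore
);
console.log(result.statusCode, saved.get('123'));
```

Passing the ML client and metadata store in as arguments keeps the function independent of any particular vendor, which also makes it straightforward to test.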
1 Common Serverless Frameworks
AWS Lambda:
https://aws.amazon.com/lambda/
Azure Functions:
https://azure.microsoft.com/en-us/services/functions/
Google Cloud Functions:
https://cloud.google.com/functions/
IronFunctions:
https://github.com/iron-io/functions
OpenWhisk:
https://openwhisk.apache.org/
Fission:
https://fission.io/
Considerations
1. Your stack
2. Pricing / free use
3. Supported languages
4. Regional support
1 Machine Learning Frameworks
Audio / Video / Image
• [video] MS Video Indexer
• [audio] Voicebase
• [face] Hive AI
• [image] Clarifai
• [image] Google Vision
• [mixed] IBM Watson
• [moderation] MS Content Moderator
• [face] Kairos
• [audio] AT&T Speech
• [image] Amazon Rekognition
Text Extraction
• [id] Acuant
• [invoice] Rossum.AI
• [contract] eBrevia
• [lease] Leverton
• [resume] Textkernel
• [prediction] AmazonML
• [analysis] Aylien
• [classification] MonkeyLearn
• [natural language] ApiAI
• [sentiment] AlchemyText
Open Source
• TensorFlow
• Keras
• Scikit-learn
• MS Cognitive Toolkit
• Theano
• Caffe
• Torch
• Accord.NET
Part 2: Best Practices
2 Program Logic and Serverless Separation
Serverless function agnostic: The core logic of the function should be separate
from the serverless requirements. Thin handlers / routers may be written on
top of the core logic to maintain separation.
Service deployments: To allow for deployment amongst numerous serverless
technologies, systems like serverless.com may be utilized.
Testability: The separation of concerns allows you to test the function
separately from the container.
Handler: Separate handler from core program logic for testability.
// API Gateway Handler
exports.handler = (event, context, callback) => {
  // Check for valid event
  if (isValidEvent(event)) {
    // Core logic lives outside the handler and is responsible for calling back
    processEvent(event, callback);
  } else {
    callback(null, { statusCode: 200, body: 'Event received but invalid' });
  }
};
AWS Lambda Handler
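One way to keep the handler thin is to put the core logic in plain functions that know nothing about Lambda, and let the handler only translate between the event and those functions. A minimal sketch: the bodies of `isValidEvent` and `processEvent` below are assumptions made for illustration, since the slide does not show them.

```javascript
// Core logic: plain functions with no dependency on the serverless runtime.
// Both bodies are assumptions for illustration.
const isValidEvent = (event) =>
  Boolean(event && (event.body || event.queryStringParameters));

const processEvent = (event) => {
  const payload = typeof event.body === 'string' ? JSON.parse(event.body) : {};
  return { processed: true, fileId: payload.fileId || null };
};

// Thin handler: the only part that knows about the Lambda callback signature.
const handler = (event, context, callback) => {
  if (!isValidEvent(event)) {
    return callback(null, { statusCode: 200, body: 'Event received but invalid' });
  }
  let result;
  try {
    result = processEvent(event);
  } catch (err) {
    return callback(err);
  }
  return callback(null, { statusCode: 200, body: JSON.stringify(result) });
};

// The core logic can be exercised directly, with no container or gateway.
console.log(processEvent({ body: '{"fileId":"42"}' }));
```

Because `processEvent` takes a plain object and returns a plain object, it can be unit tested or moved to another serverless provider without touching the handler code.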
2 Dealing with Cold Starts
What is it: The latency experienced when a function is triggered and there
isn’t a warm / idle container available to run it. A container is automatically
dropped after a period of inactivity.
Options: You can either keep the container warm through memory increases
and calls, or deal with the cold start.
Fewer libraries: The more libraries that are used the longer it will take to start
the container.
Smaller functions: Writing smaller functions decreases start time.
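Whichever option is chosen, code at module scope runs once per container rather than once per call, so expensive setup is usually cached there and reused while the container stays warm. A minimal sketch of that pattern; `createMlClient` is a hypothetical stand-in for any costly initialization, such as loading a large SDK.

```javascript
// Hypothetical expensive setup (e.g. loading a large SDK or opening a connection).
let initCount = 0;
const createMlClient = () => {
  initCount += 1;
  return { analyze: (fileId) => ({ fileId, labels: ['example'] }) };
};

// Cached at module scope: survives between invocations in a warm container.
let mlClient = null;
const getMlClient = () => {
  if (mlClient === null) {
    mlClient = createMlClient(); // only paid for on a cold start
  }
  return mlClient;
};

const handler = (event, context, callback) => {
  const result = getMlClient().analyze(event.fileId);
  callback(null, { statusCode: 200, body: JSON.stringify(result) });
};

// Two invocations in the same (warm) process trigger a single initialization.
handler({ fileId: 'a' }, {}, () => {});
handler({ fileId: 'b' }, {}, () => {});
console.log('initializations:', initCount);
```

Creating the client lazily, on first use, also keeps the cold start itself short for invocations that never need it.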
2 Exit Callback Hygiene
Error logging: With many serverless environments proper callback use will
provide full data logging.
Reliability: Failing to exit properly can result in your function executing until a
timeout is hit. Timeouts may also cause subsequent invocations to require a
cold start, which results in additional latency.
Cost: If a timeout occurs, you will be charged for the entire timeout time.
// Success Callback
callback(null, { statusCode: 200, body: 'Event processed' });
// Error Callback
callback({ statusCode: 400, body: 'Event error' });
Processing AWS Lambda Exit Callbacks
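One way to make sure every code path exits through the callback is to wrap the core logic so that both return values and thrown errors are routed to it. A minimal sketch, assuming synchronous core logic; `withExitCallback` and `processEvent` are hypothetical names.

```javascript
// Wraps core logic so the Lambda-style callback is always invoked,
// whether the logic returns normally or throws.
const withExitCallback = (logic) => (event, context, callback) => {
  let body;
  try {
    body = logic(event);
  } catch (err) {
    // Log before exiting so the failure is visible in the function's logs.
    console.error('Event error:', err.message);
    return callback(err);
  }
  return callback(null, { statusCode: 200, body: JSON.stringify(body) });
};

// Hypothetical core logic used only to exercise both paths.
const processEvent = (event) => {
  if (!event.fileId) {
    throw new Error('Missing fileId');
  }
  return { fileId: event.fileId, status: 'processed' };
};

const handler = withExitCallback(processEvent);

const calls = [];
const record = (err, response) => calls.push({ err, response });
handler({ fileId: '42' }, {}, record); // success path
handler({}, {}, record);               // error path
console.log(calls.length);
```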
2 Writing Stateless Single Purpose Functions
Error isolation: Debugging and error handling are
easier with function / concern isolation.
Scaling: With monolithic functions, you have to
optimize the entire function for all of its elements,
rather than only the specific functionality receiving the
most calls / traffic.
Planning and testing: It’s easier to plan and write
test plans for functions with singular concerns.
/**
 * Check for a valid event.
 * @param {object} indexerEvent - indexer event
 * @return {boolean} - true if valid event
 */
const isValidEvent = (indexerEvent) => {
  // Coerce to a boolean so the return value matches the documented type
  return Boolean(indexerEvent.body || indexerEvent.queryStringParameters);
};
Valid Event Function
Part 3: Security Considerations
3 Security Considerations
Serverless use consideration: Are serverless systems a viable / approved
mechanism within your organization?
Token exposure: Many API auth systems are token based, with broadly scoped
tokens, leading to the potential of token leakage.
Credential exposure: With the use of numerous APIs, each with auth
credentials, we have the potential of credential leakage.
Sensitive information exposure: Data is being passed through multiple systems
and we have to be aware of how the information is used / stored.
3 Middleware System
Serverless Solution
All compute functionality is offloaded to the serverless
framework.
On-prem Solution
All compute functionality (and connection to the ML
system) is run off of existing internal servers.
3 Protecting Credentials
Use Secure Storage: Use a secure system to store API credentials or tokens,
such as the AWS Systems Manager Parameter Store.
Least Privilege Principle: Functions requiring access to credentials should follow
the least privilege principle, meaning they have access to only as much data as
they absolutely need.
Separate Environment Credentials: Credentials used in a more open developer
environment should not be the same as those used in a production deployment.
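These three points can be combined by resolving credentials at runtime from a secure store, under a name that includes the deployment environment. A minimal sketch: the `/ml-pipeline/<env>/<name>` naming scheme and the `secretStore.get` interface are assumptions, and an in-memory object stands in for a real secure store.

```javascript
// Builds an environment-specific parameter name so development and production
// never share a credential. The path layout is a hypothetical convention.
const parameterName = (env, name) => `/ml-pipeline/${env}/${name}`;

// Fetches only the single credential a function needs (least privilege),
// from whatever secure store is injected.
const loadCredential = (secretStore, env, name) => {
  const value = secretStore.get(parameterName(env, name));
  if (value === undefined) {
    throw new Error(`Credential not found: ${parameterName(env, name)}`);
  }
  return value;
};

// In-memory stand-in for a secure store, for illustration only.
const fakeStore = {
  values: {
    '/ml-pipeline/dev/ml-api-key': 'dev-key',
    '/ml-pipeline/prod/ml-api-key': 'prod-key',
  },
  get(key) {
    return this.values[key];
  },
};

const env = 'dev'; // in a real function this might come from process.env
console.log(parameterName(env, 'ml-api-key'));
```

Note that the example prints only the parameter name, never the credential value, so the secret does not end up in the function's logs.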
3 Token Downscoping
Access Token: Fully scoped access token
Downscoped Token: Tightly scoped child token
Channel Transmission: Transmit through uncontrolled channels
3 Token Downscoping Components
Tightly scoped for single file: A token should only be scoped for the item
needed for processing, such as a file.
Short lived: Downscoped tokens should only live for their natural useful time
(e.g. 1 hour).
Revocable: Downscoped tokens may be revoked before natural expiration
through the API.
Split read / write functions: To further scope token exposure, separate read /
write tokens can be issued.
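One way to express a downscoping request is as an OAuth 2.0 token exchange (RFC 8693), in which a parent token is exchanged for a child token limited to a scope and a resource. The sketch below only builds such a request body and sends nothing; the scope names, the resource URL, and the `buildDownscopeRequest` helper are hypothetical, and the exact parameters and token lifetime depend on the API being used.

```javascript
// Hypothetical scope names: one for read access, one for write access.
const SCOPES = new Map([
  ['read', 'file_read'],
  ['write', 'file_write'],
]);

// Builds the form body for an OAuth 2.0 token exchange (RFC 8693) request that
// trades a fully scoped token for one limited to a single file and operation.
const buildDownscopeRequest = (parentToken, fileUrl, mode) => {
  const scope = SCOPES.get(mode);
  if (scope === undefined) {
    throw new Error(`Unknown mode: ${mode}`);
  }
  return new URLSearchParams({
    grant_type: 'urn:ietf:params:oauth:grant-type:token-exchange',
    subject_token: parentToken,
    subject_token_type: 'urn:ietf:params:oauth:token-type:access_token',
    scope,             // split read / write tokens
    resource: fileUrl, // tightly scoped to a single file
  });
};

// Separate read and write requests for the same (hypothetical) file.
const fileUrl = 'https://api.example.com/files/123';
const readRequest = buildDownscopeRequest('PARENT_TOKEN', fileUrl, 'read');
const writeRequest = buildDownscopeRequest('PARENT_TOKEN', fileUrl, 'write');
console.log(readRequest.get('scope'), writeRequest.get('scope'));
```

Only the child token returned by such an exchange would be handed to the serverless function or ML system; the parent token never leaves the trusted layer.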
3 Sensitive Information Exposure
Data in the files: What information is being transmitted through the channels in
the files, and is it sensitive information?
Are channels secure: Are all connections between your systems, the serverless
framework, and the machine learning system secure?
How the ML system handles data: Does the machine learning system store any
data long-term, and how secure is that storage?
Logging sensitive information: Are you logging sensitive information during
general program flow unintentionally?
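For the logging question in particular, one common safeguard is to pass everything through a redaction step before it is written to the log. A minimal sketch; the list of sensitive key names and the `redact` and `logEvent` helpers are assumptions to adapt to the data actually being handled.

```javascript
// Keys treated as sensitive; this list is an assumption and should be adapted.
const SENSITIVE_KEYS = ['token', 'access_token', 'password', 'authorization', 'pan'];

// Returns a deep copy of `value` with sensitive fields replaced by a marker.
const redact = (value) => {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.includes(key.toLowerCase())
          ? [key, '[REDACTED]']
          : [key, redact(inner)]
      )
    );
  }
  return value;
};

// Log through the redactor instead of logging raw events.
const logEvent = (label, event) =>
  console.log(label, JSON.stringify(redact(event)));

const event = {
  fileId: '123',
  headers: { Authorization: 'Bearer abc' },
  auth: { token: 'secret-token' },
};
logEvent('received', event);
```

Key-based redaction only catches fields with known names; sensitive values embedded in free text, such as file contents, need separate handling.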
3 Tokenisation Specification
Data Request: Sensitive information request
Cloud Data API: Data hosting service API
Secure Data Vault: Secure vault hosting data files
1. PAN
2. PAN
3. Token / Status
4. Token / Status
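Reading the numbered labels as a round trip (an interpretation of the diagram), the PAN (primary account number) travels from the requester through the API to the vault, and only a token and a status travel back. A minimal sketch with an in-memory vault; `createVault`, `tokenize`, `detokenize`, and `handleDataRequest` are hypothetical names, and a real vault would add encryption, access control, and persistent storage.

```javascript
const crypto = require('crypto');

// In-memory stand-in for a secure data vault (illustration only).
const createVault = () => {
  const tokenToPan = new Map();
  return {
    // Receive the PAN, store it, and return a random token plus a status.
    tokenize(pan) {
      const token = crypto.randomBytes(16).toString('hex');
      tokenToPan.set(token, pan);
      return { token, status: 'ok' };
    },
    // Only the vault can map a token back to the original value.
    detokenize(token) {
      return tokenToPan.get(token);
    },
  };
};

// The API receives the PAN from the requester, delegates to the vault,
// and returns only the token and status.
const handleDataRequest = (vault, pan) => vault.tokenize(pan);

const vault = createVault();
const pan = '4111111111111111'; // commonly used test card number, not real data
const response = handleDataRequest(vault, pan);
console.log(response.status, response.token.length);
```

Because the token is random rather than derived from the PAN, systems that only ever see the token cannot recover the original value without access to the vault.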
Wrapup Topics
Building Blocks: How are these systems built?
Best Practices: How do we architect the solution?
Security Considerations: How do we ensure data security?
Better Data with Machine Learning and Serverless
Slides: http://bit.ly/ato-bdml

Weitere ähnliche Inhalte

Was ist angesagt?

Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
SPTechCon
 
Android FakeID Vulnerability
Android FakeID VulnerabilityAndroid FakeID Vulnerability
Android FakeID Vulnerability
Mark Laubender
 

Was ist angesagt? (12)

Community call: Develop multi tenant apps with the Microsoft identity platform
Community call: Develop multi tenant apps with the Microsoft identity platformCommunity call: Develop multi tenant apps with the Microsoft identity platform
Community call: Develop multi tenant apps with the Microsoft identity platform
 
Lecture 11. Microsoft mobile services
Lecture 11. Microsoft mobile servicesLecture 11. Microsoft mobile services
Lecture 11. Microsoft mobile services
 
Live Identity Services Drilldown - PDC 2008
Live Identity Services Drilldown - PDC 2008Live Identity Services Drilldown - PDC 2008
Live Identity Services Drilldown - PDC 2008
 
7 Deadly Sins in Azure AD App Development
7 Deadly Sins in Azure AD App Development7 Deadly Sins in Azure AD App Development
7 Deadly Sins in Azure AD App Development
 
Power Apps community call-June 2020
Power Apps community call-June 2020Power Apps community call-June 2020
Power Apps community call-June 2020
 
IBM Connect 2014 - AD204: What's new in the IBM Domino Objects: By Example
IBM Connect 2014 - AD204: What's new in the IBM Domino Objects: By ExampleIBM Connect 2014 - AD204: What's new in the IBM Domino Objects: By Example
IBM Connect 2014 - AD204: What's new in the IBM Domino Objects: By Example
 
Microsoft Sharepoint 2013 : The Ultimate Enterprise Collaboration Platform
Microsoft Sharepoint 2013 : The Ultimate Enterprise Collaboration PlatformMicrosoft Sharepoint 2013 : The Ultimate Enterprise Collaboration Platform
Microsoft Sharepoint 2013 : The Ultimate Enterprise Collaboration Platform
 
Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
Tutorial: Building Apps for SharePoint 2013 Inside and Outside of the Firewal...
 
Codendi Administration Guide
Codendi Administration GuideCodendi Administration Guide
Codendi Administration Guide
 
Getting started with ibm worklight tips
Getting started with ibm worklight tipsGetting started with ibm worklight tips
Getting started with ibm worklight tips
 
JUDCon 2014: Gearing up for mobile development with AeroGear
JUDCon 2014: Gearing up for mobile development with AeroGearJUDCon 2014: Gearing up for mobile development with AeroGear
JUDCon 2014: Gearing up for mobile development with AeroGear
 
Android FakeID Vulnerability
Android FakeID VulnerabilityAndroid FakeID Vulnerability
Android FakeID Vulnerability
 

Ähnlich wie Better Data with Machine Learning and Serverless

(CMP406) Amazon ECS at Coursera: A general-purpose microservice
(CMP406) Amazon ECS at Coursera: A general-purpose microservice(CMP406) Amazon ECS at Coursera: A general-purpose microservice
(CMP406) Amazon ECS at Coursera: A general-purpose microservice
Amazon Web Services
 

Ähnlich wie Better Data with Machine Learning and Serverless (20)

JavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the ClientJavaScript App Security: Auth and Identity on the Client
JavaScript App Security: Auth and Identity on the Client
 
Integration-Monday-Serverless-Slackbots-with-Azure-Durable-Functions
Integration-Monday-Serverless-Slackbots-with-Azure-Durable-FunctionsIntegration-Monday-Serverless-Slackbots-with-Azure-Durable-Functions
Integration-Monday-Serverless-Slackbots-with-Azure-Durable-Functions
 
Top 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous DatabaseTop 20 FAQs on the Autonomous Database
Top 20 FAQs on the Autonomous Database
 
Building Complex Business Processes 3.7
Building Complex Business Processes 3.7Building Complex Business Processes 3.7
Building Complex Business Processes 3.7
 
Mastering the Lightning Framework - Part 2
Mastering the Lightning Framework - Part 2Mastering the Lightning Framework - Part 2
Mastering the Lightning Framework - Part 2
 
Durable Functions vs Logic App : la guerra dei workflow!!
Durable Functions vs Logic App : la guerra dei workflow!!Durable Functions vs Logic App : la guerra dei workflow!!
Durable Functions vs Logic App : la guerra dei workflow!!
 
Elk ruminating on logs
Elk ruminating on logsElk ruminating on logs
Elk ruminating on logs
 
Sumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with KubernetesSumo Logic Cert Jam - Advanced Metrics with Kubernetes
Sumo Logic Cert Jam - Advanced Metrics with Kubernetes
 
The top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scaleThe top 3 challenges running multi-tenant Flink at scale
The top 3 challenges running multi-tenant Flink at scale
 
How to Make OpenStack Heat Better based on Our One Year Production Journey
How to Make OpenStack Heat Better based on Our One Year Production JourneyHow to Make OpenStack Heat Better based on Our One Year Production Journey
How to Make OpenStack Heat Better based on Our One Year Production Journey
 
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
Application Performance Troubleshooting 1x1 - Part 2 - Noch mehr Schweine und...
 
Using extended events for troubleshooting sql server
Using extended events for troubleshooting sql serverUsing extended events for troubleshooting sql server
Using extended events for troubleshooting sql server
 
DCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at NetflixDCSF19 Container Security: Theory & Practice at Netflix
DCSF19 Container Security: Theory & Practice at Netflix
 
Getting Started with Serverless Architectures
Getting Started with Serverless ArchitecturesGetting Started with Serverless Architectures
Getting Started with Serverless Architectures
 
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
(ATS3-PLAT07) Pipeline Pilot Protocol Tips, Tricks, and Challenges
 
JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31 JUG Poznan - 2017.01.31
JUG Poznan - 2017.01.31
 
10 tips for Cloud Native Security
10 tips for Cloud Native Security10 tips for Cloud Native Security
10 tips for Cloud Native Security
 
Amazon ECS at Coursera: A unified execution framework while defending against...
Amazon ECS at Coursera: A unified execution framework while defending against...Amazon ECS at Coursera: A unified execution framework while defending against...
Amazon ECS at Coursera: A unified execution framework while defending against...
 
(CMP406) Amazon ECS at Coursera: A general-purpose microservice
(CMP406) Amazon ECS at Coursera: A general-purpose microservice(CMP406) Amazon ECS at Coursera: A general-purpose microservice
(CMP406) Amazon ECS at Coursera: A general-purpose microservice
 
Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3Querying and Analyzing Data in Amazon S3
Querying and Analyzing Data in Amazon S3
 

Mehr von Jonathan LeBlanc

Mehr von Jonathan LeBlanc (20)

Modern Cloud Data Security Practices
Modern Cloud Data Security PracticesModern Cloud Data Security Practices
Modern Cloud Data Security Practices
 
Box Authentication Types
Box Authentication TypesBox Authentication Types
Box Authentication Types
 
Understanding Box UI Elements
Understanding Box UI ElementsUnderstanding Box UI Elements
Understanding Box UI Elements
 
Understanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scopingUnderstanding Box applications, tokens, and scoping
Understanding Box applications, tokens, and scoping
 
The Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments GloballyThe Future of Online Money: Creating Secure Payments Globally
The Future of Online Money: Creating Secure Payments Globally
 
Modern API Security with JSON Web Tokens
Modern API Security with JSON Web TokensModern API Security with JSON Web Tokens
Modern API Security with JSON Web Tokens
 
Creating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from ScratchCreating an In-Aisle Purchasing System from Scratch
Creating an In-Aisle Purchasing System from Scratch
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication Media
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile Payments
 
Node.js Authentication and Data Security
Node.js Authentication and Data SecurityNode.js Authentication and Data Security
Node.js Authentication and Data Security
 
PHP Identity and Data Security
PHP Identity and Data SecurityPHP Identity and Data Security
PHP Identity and Data Security
 
Secure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication MediaSecure Payments Over Mixed Communication Media
Secure Payments Over Mixed Communication Media
 
Protecting the Future of Mobile Payments
Protecting the Future of Mobile PaymentsProtecting the Future of Mobile Payments
Protecting the Future of Mobile Payments
 
Future of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable SecurityFuture of Identity, Data, and Wearable Security
Future of Identity, Data, and Wearable Security
 
Kill All Passwords
Kill All PasswordsKill All Passwords
Kill All Passwords
 
BattleHack Los Angeles
BattleHack Los Angeles BattleHack Los Angeles
BattleHack Los Angeles
 
Building a Mobile Location Aware System with Beacons
Building a Mobile Location Aware System with BeaconsBuilding a Mobile Location Aware System with Beacons
Building a Mobile Location Aware System with Beacons
 
Identity in the Future of Embeddables & Wearables
Identity in the Future of Embeddables & WearablesIdentity in the Future of Embeddables & Wearables
Identity in the Future of Embeddables & Wearables
 
Internet Security and Trends
Internet Security and TrendsInternet Security and Trends
Internet Security and Trends
 
Rebuilding Commerce
Rebuilding CommerceRebuilding Commerce
Rebuilding Commerce
 

Kürzlich hochgeladen

Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Kürzlich hochgeladen (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 

Better Data with Machine Learning and Serverless

  • 1. Better Data with Machine Learning and Serverless Jonathan LeBlanc (Director of Developer Advocacy @ Box) Twitter: @jcleblanc Email: jleblanc@box.com
  • 2. Agenda for Today Building Blocks: How are these systems built? Best Practices: How do we architect the solution? Security Considerations: How do ensure data security? Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com
  • 4. 1 What Machine Learning Isn’t Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com
  • 5. 1 Components of the System Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com Serverless Framework Provides the compute and data management from stored data location to machine learning engine. Machine Learning System Provides the data enhancement capabilities which improves the underlying source data’s metadata (information about information).
  • 6. 1 Why Serverless? Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com On Demand: Machine learning ties are only required when files need processing, which may be infrequent. No hosting: You don’t have to run or manage any servers, containers, or VMs of your own. Pricing based on use: Execution resources are only run (and charged for) based on your use, typically resulting in very low server costs. Different stack options: Multiple serverless systems exist to fit stack needs, including numerous open source options.
  • 7. 1 Components of the System Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com Webhook / Event Pump System: Handles notifications to the middleware layer when a new file should be processed. Middleware Layer: Handles communication between the data source and machine learning systems. Metadata Layer: The storage facility for machine learning data responses. Token Downscoping System: Allows you to pass tightly scoped read / write tokens through multiple uncontrolled system layers.
  • 8. 1 How a Data / ML System Works Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com Cloud Data Data store & initial metadata Serverless Framework Callback handler and code execution Machine Learning Data processor and enhancer Webhook Metadata Execute Callback
  • 9. 1 Common Serverless Frameworks Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com AWS Lambda: https://aws.amazon.com/lambda/ Azure Functions: https://azure.microsoft.com/en-us/services/functions/ Google Cloud Functions: https://cloud.google.com/functions/ IronFunctions: https://github.com/iron-io/functions OpenWhisk: https://openwhisk.apache.org/ Fission: https://fission.io/ Considerations 1. Your stack 2. Pricing / free use 3. Supported languages 4. Regional support
  • 10. 1 Machine Learning Frameworks Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com Audio / Video / Image • [video] MS Video Indexer • [audio] Voicebase • [face] Hive AI • [image] Clarifai • [image] Google Vision • [mixed] IBM Watson • [moderation] MS Content Moderator • [face] Kairos • [audio] AT&T Speech • [image] Amazon Rekognition Text Extraction • [id] Acuant • [invoice] Rossum.AI • [contract] eBrevia • [lease] Leverton • [resume] TextKernal • [prediction] AmazonML • [analysis] Aylien • [classification] MonkeyLearn • [natural language] ApiAI • [sentiment] AlchemyText Open Source • TensorFlow • Keras • Scikit-learn • MS Cognitive Toolkit • Theano • Caffe • Torch • Accord.NET
  • 11. Part 2: Best Practices
  • 12. 2 Program Logic and Serverless Separation Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com Serverless function agnostic: The core logic of the function should be separate from the serverless requirements. Thin handlers / routers may be written on top of the core logic to maintain separation. Service deployments: To allow for deployment amongst numerous serverless technologies, systems like serverless.com may be utilized. Testability: The separation of concerns allows you to test the function separately from the container. Handler: Separate handler from core program logic for testability.
  • 13. // API Gateway Handler exports.handler = (event, context, callback) => { // Check for valid event if (isValidEvent()) { processEvent(); } else { callback(null, { statusCode: 200, body: 'Event received but invalid' }); } }; AWS Lambda Handler Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com
  • 14. 2 Dealing with Cold Starts Jonathan LeBlanc • Director of Developer Advocacy @ Box • Twitter: @jcleblanc • Email: jleblanc@box.com What is it: The latency experienced when a function is triggered, which only runs when there isn’t a warn / idle container. A container is automatically dropped after a period of inactivity. Options: You can either keep the container warm through memory increases and calls, or deal with the cold start. Fewer libraries: The more libraries that are used the longer it will take to start the container. Smaller functions: Writing smaller functions decreases start time.
  • 15. 2 Exit Callback Hygiene
      Error logging: In many serverless environments, proper callback use will provide full data logging.
      Reliability: Failing to exit properly can result in your function executing until a timeout is hit. Timeouts may also cause subsequent invocations to require a cold start, which results in additional latency.
      Cost: If a timeout occurs, you will be charged for the entire timeout period.
  • 16. Processing AWS Lambda Exit Callbacks
      // Success callback
      callback(null, { statusCode: 200, body: 'Event processed' });

      // Error callback
      callback({ statusCode: 400, body: 'Event error' });
  • 17. 2 Writing Stateless Single Purpose Functions
      Error isolation: Debugging and error handling are easier with function / concern isolation.
      Scaling: With monolithic functions, you have to optimize the entire function for all of its elements, rather than just the specific functionality receiving the most calls / traffic.
      Planning and testing: It’s easier to plan and write test plans for functions with singular concerns.
  • 18. Valid Event Function
      /**
       * Check for a valid event.
       * @param {object} indexerEvent - indexer event
       * @return {boolean} - true if valid event
       */
      const isValidEvent = (indexerEvent) => {
        // Boolean() ensures a true / false result rather than the body or query object itself
        return Boolean(indexerEvent && (indexerEvent.body || indexerEvent.queryStringParameters));
      };
  • 19. Part 3: Security Considerations
  • 20. 3 Security Considerations
      Serverless use consideration: Are serverless systems a viable / approved mechanism within your organization?
      Token exposure: Many API auth systems are token based, with broadly scoped tokens, leading to the potential of token leakage.
      Credential exposure: With the use of numerous APIs, each with auth credentials, we have the potential of credential leakage.
      Sensitive information exposure: Data is being passed through multiple systems and we have to be aware of how the information is used / stored.
  • 21. 3 Middleware System
      Serverless Solution: All compute functionality is offloaded to the serverless framework.
      On-prem Solution: All compute functionality (and the connection to the ML system) is run off of existing internal servers.
  • 22. 3 Protecting Credentials
      Use Secure Storage: Use a secure system to store API credentials or tokens, such as the AWS Systems Manager Parameter Store.
      Least Privilege Principle: Functions requiring access to credentials should follow the least privilege principle, meaning they have access to only as much data as they absolutely need.
      Separate Environment Credentials: Credentials used in a more open developer environment should not be the same as those used in a production deployment.
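A minimal sketch of reading a credential from the Parameter Store, with separate names per environment. It assumes the injected `ssmClient` exposes the AWS SDK for JavaScript v2 call shape (`getParameter(params).promise()`); the `/stage/ml-middleware/key` path layout and both function names are hypothetical.

```javascript
// `ssmClient` is assumed to expose the AWS SDK for JavaScript v2 call shape:
// getParameter(params).promise() resolving to { Parameter: { Value } }.
const loadSecret = async (ssmClient, name) => {
  const result = await ssmClient
    .getParameter({ Name: name, WithDecryption: true })
    .promise();
  return result.Parameter.Value;
};

// Environment-specific names keep development and production credentials apart.
// The path layout is hypothetical.
const secretName = (stage, key) => `/${stage}/ml-middleware/${key}`;
```

Passing the client in as an argument keeps the function testable with a stub, and the function's execution role can then be limited to reading only the parameter paths it needs.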
  • 23. 3 Token Downscoping
      Access Token: Fully scoped access token
      Downscoped Token: Tightly scoped child token
      Channel Transmission: Transmit through uncontrolled channels
  • 24. 3 Token Downscoping Components
      Tightly scoped for single file: A token should only be scoped for the item needed for processing, such as a file.
      Short lived: Downscoped tokens should only live for their natural useful time (e.g. 1 hour).
      Revocable: Downscoped tokens may be revoked before natural expiration through the API.
      Split read / write functions: To further scope token exposure, separate read / write tokens can be issued.
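As an illustration of single-file scoping and split read / write tokens, the sketch below builds the form body for an OAuth 2.0 token exchange request. The slides do not name an API; the example assumes the Box token exchange shape (a POST to `https://api.box.com/oauth2/token`, with `item_download` and `item_upload` as scopes), and `buildDownscopeRequest`, the token value, and the file ID are placeholders.

```javascript
// Builds the form body for an OAuth 2.0 token exchange (downscoping) request.
// Assumption: the Box API shape; nothing is sent over the network in this sketch.
const buildDownscopeRequest = (accessToken, fileId, scopes) => {
  const body = new URLSearchParams();
  body.set('grant_type', 'urn:ietf:params:oauth:grant-type:token-exchange');
  body.set('subject_token', accessToken);
  body.set('subject_token_type', 'urn:ietf:params:oauth:token-type:access_token');
  body.set('scope', scopes.join(' '));
  // Restrict the child token to a single file.
  body.set('resource', `https://api.box.com/2.0/files/${fileId}`);
  return body;
};

// Separate read and write tokens limit what any one leaked token can do.
const readRequest = buildDownscopeRequest('PARENT_TOKEN', '12345', ['item_download']);
const writeRequest = buildDownscopeRequest('PARENT_TOKEN', '12345', ['item_upload']);
```

The fully scoped parent token stays inside the middleware; only the child token returned by the exchange would be handed to the machine learning system.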
  • 25. 3 Sensitive Information Exposure
      Data in the files: What information is being transmitted through the channels in the files, and is it sensitive information?
      Are channels secure: Are all connections between your systems, the serverless framework, and the machine learning system secure?
      How the ML system handles data: Does the machine learning system store any data long-term, and how secure is that storage?
      Logging sensitive information: Are you logging sensitive information during general program flow unintentionally?
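One guard against unintentional logging is to redact known sensitive fields before anything is written to the log. This is a sketch under assumptions: the key list and the `redact` helper are hypothetical, and a real deployment would tune the list to its own payloads.

```javascript
// Hypothetical list of field names that should never reach the logs.
const SENSITIVE_KEYS = new Set(['token', 'access_token', 'password', 'pan', 'authorization']);

// Returns a deep copy of the value with sensitive fields replaced by a marker.
const redact = (value) => {
  if (Array.isArray(value)) {
    return value.map(redact);
  }
  if (value !== null && typeof value === 'object') {
    return Object.fromEntries(
      Object.entries(value).map(([key, inner]) =>
        SENSITIVE_KEYS.has(key.toLowerCase()) ? [key, '[REDACTED]'] : [key, redact(inner)]
      )
    );
  }
  return value;
};

// Log the redacted copy, never the raw event.
console.log(JSON.stringify(redact({ fileId: '123', auth: { token: 'abc' } })));
```

Key matching is case-insensitive and recursive, so nested objects and arrays are covered, but values hidden inside free-form strings are not.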
  • 26. 3 Tokenisation Specification
      Data Request: Sensitive information request
      Cloud Data API: Data hosting service API
      Secure Data Vault: Secure vault hosting data files
      Flow: 1. PAN (Data Request to Cloud Data API) • 2. PAN (Cloud Data API to Secure Data Vault) • 3. Token / Status (Secure Data Vault to Cloud Data API) • 4. Token / Status (Cloud Data API to Data Request)
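The vault side of the tokenisation flow (a PAN, or primary account number, goes in and a token plus status comes back) can be sketched with an in-memory stand-in. Everything here is illustrative: the `TokenVault` class, the digit-length check, and the status strings are assumptions, and a real vault would use hardened, persistent storage.

```javascript
const crypto = require('crypto');

// In-memory stand-in for the secure data vault (illustrative only).
class TokenVault {
  constructor() {
    this.tokensByPan = new Map();
    this.pansByToken = new Map();
  }

  // The vault receives a PAN and returns a token plus a status.
  tokenize(pan) {
    // Simple digit-length check; the accepted range is an assumption.
    if (!/^\d{12,19}$/.test(pan)) {
      return { status: 'rejected', token: null };
    }
    if (!this.tokensByPan.has(pan)) {
      // The token is random, so it reveals nothing about the PAN.
      const token = crypto.randomBytes(16).toString('hex');
      this.tokensByPan.set(pan, token);
      this.pansByToken.set(token, pan);
    }
    return { status: 'ok', token: this.tokensByPan.get(pan) };
  }

  // Only the vault can map a token back to the original PAN.
  detokenize(token) {
    return this.pansByToken.get(token) || null;
  }
}
```

Downstream systems, including the machine learning service, would only ever see the token, so a leak outside the vault does not expose the underlying account number.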
  • 27. Wrap-up Topics
      Building Blocks: How are these systems built?
      Best Practices: How do we architect the solution?
      Security Considerations: How do we ensure data security?
  • 28. Better Data with Machine Learning and Serverless Slides: http://bit.ly/ato-bdml Jonathan LeBlanc (Director of Developer Advocacy @ Box) Twitter: @jcleblanc Email: jleblanc@box.com