Fast Machine Learning
Development with MongoDB
Jane Fine
Director of Product Marketing, Analytics - MongoDB
Spoke is a modern ticketing system for managing workplace requests. It uses
machine learning to automatically answer questions and assign requests to the
right teams.
● Started in August, 2016.
● Based in SF. Funded by Greylock and Accel.
● Small, fast-moving engineering team.
What is Spoke
Demo Flow
● Problems: natural language processing problems
● Challenge: customized machine learning models for every client
○ Need to learn quickly (near real time) from user interactions
○ 1000s of ML models
● MongoDB: very useful in scaling up our ML
Spoke: Overview of Challenges
Machine Learning Approach
Machine Learning Problem: Team Triaging
Problem: pick the right team based on the text and context of the request
Challenge: each client has different teams, so pretraining is not possible;
the model must learn from demonstration
Implication => a separate ML model for each client
Traditional ML vs Adaptive Approach
Claim: most ML-driven early teams and startups are in the second bucket
Low data and low query volume domain
Startups must build quickly and adapt to users to show utility
Traditional ML Pipeline | Adaptive ML Pipeline
Adapting with Online Machine Learning
● Online learning: Update the model at each time step as the data
sequentially arrives
● For the first year Spoke built quickly by using online learning to deliver a
slick product experience
○ Users see the utility because the system learns in real time!
● Easy to serve and scale using MongoDB
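The update-per-event idea can be sketched as a multiclass perceptron that adjusts its weights each time the correct team for a request becomes known. This is only an illustration under stated assumptions, not Spoke's actual model: the class name `OnlinePerceptron`, the team labels, and the sparse feature format are all hypothetical.

```javascript
// Minimal online multiclass perceptron (illustrative; not Spoke's actual model).
// Features are sparse objects of the form { featureIndex: value }.
class OnlinePerceptron {
  constructor(labels, numFeatures) {
    this.labels = labels;
    // One weight vector per label, initialised to zeros.
    this.weights = labels.map(() => new Float32Array(numFeatures));
  }

  score(labelIdx, features) {
    let s = 0;
    for (const [i, v] of Object.entries(features)) s += this.weights[labelIdx][i] * v;
    return s;
  }

  predict(features) {
    let best = 0;
    for (let k = 1; k < this.labels.length; k++) {
      if (this.score(k, features) > this.score(best, features)) best = k;
    }
    return this.labels[best];
  }

  // One learning step, run as soon as the true label for a request is known.
  update(features, trueLabel) {
    const predicted = this.predict(features);
    if (predicted === trueLabel) return; // already correct: leave weights alone
    const t = this.labels.indexOf(trueLabel);
    const p = this.labels.indexOf(predicted);
    for (const [i, v] of Object.entries(features)) {
      this.weights[t][i] += v; // pull the true label's score up
      this.weights[p][i] -= v; // push the wrongly predicted label's score down
    }
  }
}

const model = new OnlinePerceptron(["IT", "HR"], 4);
model.update({ 0: 1, 1: 1 }, "HR"); // e.g. a payroll question corrected to HR
model.update({ 2: 1, 3: 1 }, "IT"); // e.g. a laptop request confirmed as IT
```

Because each update touches only the weights for the features in one request, the model can be refreshed immediately after every user interaction.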
Serving flow
Online Learning with MongoDB
Training flow
Storing ML models in Mongo
● Simple schema for storing client ML models
for team routing and other product features
● Sub-500 ms fetch for models up to 5 MB
● Tip: keep a separate shard for this
collection to isolate rest of application DB
from
clientMLModelSchema:
{
  client: {
    ref: <clientId>
  },
  onlineModels: {
    teamRouting: {
      model: {}
    },
    …
  },
}
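A sketch of how a document in this shape might be built and updated. The helper names, the `clientMLModels` collection name, and the idea that the model serializes to a plain object are assumptions made for illustration.

```javascript
// Build a per-client document in the shape of clientMLModelSchema.
// `serializedModel` is assumed to be a plain-object form of the model.
function buildClientModelDoc(clientId, serializedModel) {
  return {
    client: { ref: clientId },
    onlineModels: { teamRouting: { model: serializedModel } },
  };
}

// Arguments for an updateOne call that replaces only the team-routing model,
// leaving any other entries under onlineModels untouched.
function teamRoutingUpdate(clientId, serializedModel) {
  return {
    filter: { "client.ref": clientId },
    update: { $set: { "onlineModels.teamRouting.model": serializedModel } },
    options: { upsert: true }, // create the document on a client's first event
  };
}

// Usage in mongosh or with a driver (collection name is hypothetical):
//   const { filter, update, options } = teamRoutingUpdate(clientId, model);
//   db.clientMLModels.updateOne(filter, update, options);
```

Targeting the nested path with `$set` keeps each write limited to one model rather than rewriting the whole document.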
Online Learning: Tips
● Use feature hashing to get bounded model size
○ E.g. a linear model with a hash of size 10k for a 5 way classification problem
=> model size = 200KB.
● Load test your setup to ensure it works for your QPS
● Gotchas:
○ Concurrency: when two training events arrive at the same time,
one may be ignored (avoidable by using queues)
○ No guarantees for deep neural nets
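The feature-hashing tip and its size arithmetic can be sketched as follows. The FNV-1a hash, the tokenizer, and the 4-byte weight size are assumptions chosen for illustration; the 10k bucket count and 5 classes come from the slide.

```javascript
// Feature hashing: map arbitrary tokens into a fixed number of buckets so the
// weight matrix has a bounded size however large the vocabulary grows.
const NUM_BUCKETS = 10000;  // "hash of size 10k" from the slide
const NUM_CLASSES = 5;      // 5-way classification
const BYTES_PER_WEIGHT = 4; // assumption: 32-bit float weights

// FNV-1a style 32-bit hash over UTF-16 code units (any stable hash would do).
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

// Turn text into a sparse { bucketIndex: count } feature object.
function hashFeatures(text, numBuckets = NUM_BUCKETS) {
  const features = {};
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    const bucket = fnv1a(token) % numBuckets;
    features[bucket] = (features[bucket] || 0) + 1;
  }
  return features;
}

// A linear model stores one weight per (bucket, class) pair:
// 10,000 x 5 x 4 bytes = 200,000 bytes, i.e. about 200 KB.
const modelSizeBytes = NUM_BUCKETS * NUM_CLASSES * BYTES_PER_WEIGHT;
```

Distinct tokens can collide in the same bucket; that is the trade-off accepted in exchange for a fixed model size.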
Augmenting TensorFlow with MongoDB
● Years later, we developed a batch training environment with TensorFlow
○ Batch learning is maintainable, retrainable, and allows deep NNs
● Still, online learning provides better UX due to immediate update
● Achieved a good compromise: use the Mongo-based online learning model
for the first few hundred responses, then silently switch to batch
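The hand-off between the two models could be as simple as a threshold check. The threshold value and the field names on `client` below are hypothetical; the slide only says "first few hundred responses".

```javascript
// Choose which model serves a client: the online model while feedback is
// scarce, the batch-trained model once enough responses have accumulated.
const SWITCH_THRESHOLD = 300; // assumption standing in for "a few hundred"

function chooseModel(client) {
  const useBatch =
    client.responseCount >= SWITCH_THRESHOLD && client.batchModel != null;
  return useBatch
    ? { kind: "batch", model: client.batchModel }
    : { kind: "online", model: client.onlineModel };
}
```

Falling back to the online model whenever no batch model exists means a new client is always served by something that learns immediately.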
Spoke Tech Stack
Mongo ML Capabilities
Multiple Data Models and Access Patterns in MongoDB
Rich Queries: Point | Range | Geospatial | Faceted Search | Aggregations | JOINs | Graph Traversals
Data Models: JSON Documents | Tabular | Key-Value | Text | Graph | Geospatial
Example: Text Classification
Data Model 1:
key-value
raw text input
whole corpus
0b917217ae
7fef14c0b3
cb9eadad9a
Data Model 2: tabular
matrix: one row per article, one column per word in article
          word1  word2  word3
article1  1      0      2
article2  0      1      0
article3  0      1      1
TF-IDF Vectorization
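As a sketch of the vectorization step, the term counts in the table can be converted to TF-IDF weights. The formula used here (raw count times ln(N / document frequency)) is one common variant and an assumption; libraries often add smoothing or normalization and produce different numbers.

```javascript
// Term counts from the table: rows are articles, columns are word1..word3.
const counts = [
  [1, 0, 2], // article1
  [0, 1, 0], // article2
  [0, 1, 1], // article3
];

// TF-IDF with tf = raw count and idf = ln(N / df), where df is the number of
// articles containing the word.
function tfidf(matrix) {
  const n = matrix.length;
  const df = matrix[0].map((_, j) => matrix.filter((row) => row[j] > 0).length);
  return matrix.map((row) =>
    row.map((tf, j) => (df[j] === 0 ? 0 : tf * Math.log(n / df[j])))
  );
}

const weighted = tfidf(counts);
```

Here word1 appears in only one of the three articles, so it receives a higher per-occurrence weight than word2 or word3, which each appear in two.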
Data Model 3: JSON documents
extract keywords and topics and enrich
LDA Topic extraction
{
  "_id": "0b917217ae",
  "title": "Document Model Design Patterns",
  "text": blob,
  "topics": [ "Models", "MVC" ],
  "top_words": [ "join", "embed", "one-to-many" ],
  "model": {
    "location": ...,
    "last_updated": Timestamp("05-29-19 00:00:00"),
    "confidence": Decimal128("0.9123"),
    ...
  },
  ...
}
Example: Text Classification & Graph Traversal
Data Model 4: graph
tree/hierarchy of topics modeled as a graph
Hierarchical Clustering
{
  "_id": "0b917217ae",
  "title": "Document Model Design Patterns",
  "text": blob,
  "parent": "Databases",
  "topics": [ "Models", "MVC" ],
  "top_words": [ "join", "embed", "one-to-many" ],
  "model": {
    "location": ...,
    "last_updated": Timestamp("05-29-19 00:00:00"),
    "confidence": Decimal128("0.9123"),
    ...
  },
  ...
}
db.topics.insert( { _id: "Models", parent: "Databases" } )
db.topics.insert( { _id: "Storage", parent: "Databases" } )
db.topics.insert( { _id: "MVCC", parent: "Databases" } )
db.topics.insert( { _id: "Databases", parent: "Programming" } )
db.topics.insert( { _id: "Languages", parent: "Programming" } )
db.topics.insert( { _id: "Programming", parent: null } )
Programming
  Languages
  Databases
    Models
    Storage
    MVCC
$graphLookup
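A sketch of how the topic hierarchy might be traversed with the `$graphLookup` aggregation stage, assuming all six topic documents live in a single `topics` collection. The helper name and the output field names `ancestors` and `depth` are hypothetical.

```javascript
// Aggregation pipeline that collects every ancestor of one topic by
// following `parent` links upward through the topics collection.
function ancestorsPipeline(topicId) {
  return [
    { $match: { _id: topicId } },
    {
      $graphLookup: {
        from: "topics",             // collection to traverse
        startWith: "$parent",       // begin at this topic's parent
        connectFromField: "parent", // follow each matched document's parent...
        connectToField: "_id",      // ...to the document with that _id
        as: "ancestors",            // output array field
        depthField: "depth",        // 0 marks the direct parent
      },
    },
  ];
}

// mongosh usage: db.topics.aggregate(ancestorsPipeline("MVCC"))
```

For the hierarchy on the slide, the "MVCC" topic's ancestors would be "Databases" (its direct parent) and "Programming".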
Indexing in MongoDB
Index Types
• Primary Index
– Every collection has a primary key index
• Compound Index
– Index against multiple keys in the document
• Multikey Index
– Index into arrays
• Text Indexes
– Support for text searches
• Geospatial Indexes
– 2d & 2dsphere indexes for spatial geometries
• Hashed Indexes
– Hash-based values for sharding
Index Features
• TTL Indexes
– Single-field indexes; documents are deleted when they expire
• Unique Indexes
– Ensure a value is not duplicated
• Partial Indexes
– Expression-based indexes, allowing indexes on subsets of data
• Case-Insensitive Indexes
– Support case-insensitive string comparisons via a collation
• Sparse Indexes
– Only index documents that have the given field
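The index types and features above map onto `createIndex` key patterns and options roughly as sketched below. The field names and the `requests` collection are hypothetical; the primary `_id` index exists automatically and needs no definition.

```javascript
// Index definitions as [keys, options] pairs, in the shape createIndex accepts.
const indexSpecs = [
  [{ clientId: 1, createdAt: -1 }, {}],                  // compound
  [{ tags: 1 }, {}],                                     // multikey when `tags` holds arrays
  [{ description: "text" }, {}],                         // text
  [{ location: "2dsphere" }, {}],                        // geospatial
  [{ clientId: "hashed" }, {}],                          // hashed
  [{ createdAt: 1 }, { expireAfterSeconds: 3600 }],      // TTL: expire after one hour
  [{ email: 1 }, { unique: true }],                      // unique
  [{ status: 1 }, { partialFilterExpression: { status: "open" } }], // partial
  [{ name: 1 }, { collation: { locale: "en", strength: 2 } }],      // case-insensitive
  [{ nickname: 1 }, { sparse: true }],                   // sparse
];

// mongosh usage:
//   indexSpecs.forEach(([keys, options]) => db.requests.createIndex(keys, options));
```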
Scalability & Distributed Processing
Process large volumes of data in parallel
• Queries and aggregations run in parallel
• Data is returned in parallel
• Automatically scale beyond the constraints of a single node
• Optimized for query patterns and data locality
• Transparent to applications and tools
[Diagram: sharded cluster]
Intelligent Data Distribution: Workload Isolation
Enable different workloads on the same data
• Combine operational and analytical workloads on a single platform
• No data movement or duplication
• Extract insights in real time to enrich applications
• MongoDB Atlas - Analytics Nodes
[Diagram: a single replica set serving the Application, with workloads labeled TRANSACTIONAL, Operational Analytics, ANALYTICAL, and ML & AI]
Text Search
MongoDB Text Search
1. Create Text Index
db.restaurants.createIndex( { description: "text" } )
2. Match Text
db.restaurants.find( { $text: { $search: "java coffee shop" } } )
3. Score and Sort Results
db.restaurants.find(
  { $text: { $search: "java coffee shop" } },
  { score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
Index any field whose value is a string or an array of string elements
A collection can only have one text search index, but that index can cover
multiple fields
Optionally Specify Language:
Text Indexing
Text Matching
$text will tokenize the search string using whitespace and most punctuation as
delimiters, and perform a logical OR of all such tokens in the search string.
Search for a Single Word
Match Any of the Words
Search for a Phrase
Negations
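The four matching modes correspond to different `$search` strings. A sketch, reusing the `restaurants` collection from the earlier slide; the `textFilter` helper and the example strings are illustrative.

```javascript
// Build a $text filter from a search string.
const textFilter = (search) => ({ $text: { $search: search } });

const examples = {
  singleWord: textFilter("coffee"),
  anyOfTheWords: textFilter("java coffee shop"), // java OR coffee OR shop
  phrase: textFilter("\"coffee shop\""),         // exact phrase, in escaped quotes
  negation: textFilter("java shop -coffee"),     // java or shop, excluding coffee
};

// mongosh usage: db.restaurants.find(examples.phrase)
```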
Scoring and Sorting: Control Search Results with Weights
Weight is the significance of the field (default = 1)
For each indexed field, MongoDB multiplies the number of matches by the
weight and sums the results → score of the document
Use the "textScore" metadata for projections, sorts, and conditions subsequent
to the $match stage that includes the $text operation.
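The weighted sum described above can be illustrated with a toy scorer. This is a deliberate simplification: it does exact token matching with no stemming or stop words, and MongoDB derives its final `textScore` from such a sum with further adjustments, so real scores will differ. The function name and sample document are hypothetical.

```javascript
// For each indexed field: (number of matching tokens) x (field weight),
// summed over all indexed fields.
function simpleWeightedScore(doc, terms, weights) {
  let score = 0;
  for (const [field, weight] of Object.entries(weights)) {
    const tokens = String(doc[field] ?? "")
      .toLowerCase()
      .split(/\W+/)
      .filter(Boolean);
    const matches = tokens.filter((t) => terms.includes(t)).length;
    score += matches * weight;
  }
  return score;
}

const doc = { title: "Java Coffee", description: "A coffee shop near the station" };
// title: 2 matches x 10 = 20; description: 2 matches x 1 = 2; total 22
const score = simpleWeightedScore(doc, ["java", "coffee", "shop"], {
  title: 10,
  description: 1,
});
```

The example shows why weights matter: two matches in the heavily weighted title contribute ten times as much as two matches in the description.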
Spoke: Knowledge Base Search with ML
● A user asks Spoke a question and expects a real-time response
○ Find the best knowledge-base answer among 1000s of answers
● We use a combination of ML algorithms to determine the right answer
○ Scoring each answer independently is not an option due to latency
● Candidate generation to the rescue!
How Spoke uses Text Search
● Use MongoDB text search to select the top k highest-scoring articles
● Run the extensive ML-based search only on those k articles
○ Works as long as the right answer is in the top k
○ Lets us use the latest ML algorithms without worrying too much about latency
● Tip: set your MongoDB text index weights carefully by fine-tuning
{ title: 10,
body: 2,
keywords: 6...}
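The two-stage flow can be sketched as cheap candidate generation followed by a costlier rerank. Everything named here is hypothetical: the `articles` collection, the `rerank` helper, and `toyMlScore`, which merely stands in for a real ML scorer.

```javascript
// Stage 1 (cheap): text search returns the top-k candidates by textScore.
// In mongosh this could look like:
//   db.articles.createIndex(
//     { title: "text", body: "text", keywords: "text" },
//     { weights: { title: 10, body: 2, keywords: 6 } })
//   db.articles.find({ $text: { $search: q } }, { score: { $meta: "textScore" } })
//     .sort({ score: { $meta: "textScore" } }).limit(k)

// Stage 2 (expensive): score only those k candidates with the ML model.
function rerank(question, candidates, mlScore) {
  return candidates
    .map((article) => ({ article, mlScore: mlScore(question, article) }))
    .sort((a, b) => b.mlScore - a.mlScore);
}

// Stand-in for an expensive model: fraction of question words found in the body.
function toyMlScore(question, article) {
  const tokenize = (s) => s.toLowerCase().split(/\W+/).filter(Boolean);
  const bodyWords = new Set(tokenize(article.body));
  const words = tokenize(question);
  return words.filter((w) => bodyWords.has(w)).length / words.length;
}

const candidates = [
  { title: "VPN setup", body: "How to install the VPN client" },
  { title: "Password reset", body: "Reset your password from the login page" },
];
const best = rerank("how do I reset my password", candidates, toyMlScore)[0].article;
```

The cost of the second stage scales with k rather than with the size of the knowledge base, which is what keeps latency bounded.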
Future Directions
What Spoke is working on
● Understand user queries and take actions
○ “I need access to Salesforce” => “issue_license(user, software=salesforce)”
● Assign custom labels to user questions
○ Allow customers to add labels to their requests
■ {“hardware”, “software”, “licensing”, “urgent”},
■ {“benefits”, “payroll”, “immigration”}
○ Specific custom labels for each client stored in MongoDB
○ Automatically predict the right labels for requests
What MongoDB is working on: Full Text Search (Beta)
● Based on Apache Lucene 8
● Integrated into MongoDB Atlas
● Separate process co-located with mongod
● Shard-aware
● Indexing = collection scan -> steady state
How Do I use it?
Create a cluster on MongoDB Atlas using 4.2 RC (M30+)
Create a Full Text Index via the MongoDB Atlas UI or API
Query Index via $searchBeta operator using MongoDB Compass or shell,
add to your existing aggregation pipelines
What MongoDB is working on: Atlas Data Lake (beta)
● Serverless: no infrastructure to set up and manage
● Usage-based pricing: only pay for the queries you run
● On-demand: no need to load data; bring your own S3 bucket
● Auto-scalable: parallel execution delivers performance for large and
complex queries across multiple user sessions
● Multi-format: JSON, BSON, CSV, TSV, Avro, Parquet
● Integrated with Atlas: users are managed by Atlas, enabled via Atlas
console
● The best tools to work with your data: the MongoDB Query Language
enables flexible and efficient data access; integrates with Compass,
MongoDB Shell, and MongoDB drivers
[Diagram labels: Transactional, in-app analytics, Operational Analytics, Aggregations, Machine Learning and AI, Data Lake; nodes: Primary, Secondary, Secondary, Analytics, Analytics]
Q&A

MongoDB .local London 2019: Fast Machine Learning Development with MongoDB
