Storing eBay's Media Metadata on MongoDB
Yuri Finkelstein
Lead Platform Services Architect
yfinkelstein@ebay.com
John Feibusch
Lead DBA Engineer
jfeibusc@ebay.com
May 2013
About eBay Platform Services
▪ Platform Services is an organization within the larger eBay Platform org. It develops and operates common services used by the web applications running on the eBay Platform:
• Media Storage platform services: image blobs and metadata
• Unified Monitoring platform: logs and metrics
• User Behavior Tracking
• Ad Content management and analytics
• Messaging and other middleware services
Platform Services and Media Metadata Service Requirements
▪ Platform Services is a DevOps organization
• We develop, we test, we deploy, we operate, we monitor
• Whatever we are responsible for, we own and understand through the full depth of the stack
• Therefore, we require transparency of the components we build on
• Transparency at the level of source code visibility is ideal
Key Requirements
▪ Key requirements of the Media Metadata Service
• 99.999% availability
• Strictly defined invocation latency at the 95th percentile
• Simultaneous operation in multiple data centers with short replication latency
• Reliable writes: synchronous writes to at least 2 nodes
• Read-write workload with a read/write ratio of roughly 10:1
• Agility: fluid metadata content and constantly changing business requirements
• Terabyte scale: billions of small entities to store and query
• Extreme scalability: the number of pictures on eBay is constantly growing
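To make the availability target concrete, the downtime budget it implies can be computed directly. This is only a small sketch: the function name is illustrative, and the single input taken from the slide is the 99.999% figure.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # non-leap year


def downtime_budget_minutes(availability):
    """Return the yearly downtime (in minutes) allowed by an availability ratio."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    # 99.999% ("five nines") leaves only a few minutes of downtime per year.
    print(round(downtime_budget_minutes(0.99999), 2))
```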
Enter MongoDB
▪ We have been operating MongoDB in this project for over a year now
▪ Sharded cluster in 2 data centers
▪ Service nodes are built in Java and use Morphia and the Mongo driver
▪ MongoS runs on the service nodes
▪ During the first year we matured the cluster on writes only; this year we are taking on reads
▪ Reads come from user-facing web applications with strong SLA requirements
▪ For reads, the client first queries with SlaveOK=true; if the required document is not found, it flips to SlaveOK=false and reads from the primary
[Diagram: sharded cluster with axes labeled Shards and Replicas, spanning DC1 and DC2; each Metadata Service Node stacks Service Layer, Morphia, Mongo Driver, and MongoS. Legend: S – service instance; P – primary mongod; H – hidden member]
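The read path described on this slide (try a secondary first, fall back to the primary on a miss) can be sketched as follows. This is a minimal illustration, not the service's actual code: `read_secondary` and `read_primary` are hypothetical callables standing in for driver reads issued with SlaveOK=true and SlaveOK=false.

```python
def read_with_fallback(doc_id, read_secondary, read_primary):
    """Try a secondary first; on a miss, retry against the primary.

    Both callables are hypothetical stand-ins: each takes a document ID and
    returns the document, or None when it is not found.
    Returns a (document, source) pair.
    """
    doc = read_secondary(doc_id)
    if doc is not None:
        return doc, "secondary"
    # A miss on a secondary may only mean replication lag, so ask the primary.
    return read_primary(doc_id), "primary"


if __name__ == "__main__":
    # In-memory stand-ins: the secondary has not yet replicated document 2.
    primary = {1: {"_id": 1}, 2: {"_id": 2}}
    secondary = {1: {"_id": 1}}
    print(read_with_fallback(1, secondary.get, primary.get))
    print(read_with_fallback(2, secondary.get, primary.get))
```

The trade-off is that a genuinely missing document costs two round trips, while the common case stays off the primary.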
Centralized MongoDB configuration store
▪ Our MongoDB deployment package is based on a custom-built RPM and contains extensive customization scripts
▪ One of them fetches the configuration for the node it is running on from a remote configuration repository at start-up time
▪ Benefits:
• Can change MongoDB configuration instantly on an arbitrarily large number of nodes
• Can change local system settings affecting MongoDB: read-ahead settings on block devices and the IO scheduler
• Can relocate replica set members across machines (subject to data migration)
• Consistent inventory tracking and visibility into config settings on any Mongo machine
[Diagram: Central MongoDB Config Repository supplying configuration to each primary (P) at startup time]
Upstart
▪ Upstart is a replacement for init.d; developed for Ubuntu, also used in RHEL 6
▪ Can automatically start our monitoring agent whenever mongod starts
▪ Handles multiple mongod instances well
▪ Example:
    sudo start mongod interface=0
▪ Future: Upstart can be controlled by Puppet
Run multiple MongoD instances on the same machine
▪ We are starting to run multiple mongod processes on one node
▪ Instead of using multiple ports, we create multiple virtual interfaces on a single host and register them in DNS as if they were real IP addresses
▪ MongoD supports bind_ip, which makes it possible to bind to a specific virtual interface
▪ Why virtual interfaces?
• So that DB hosts can be moved with just a DNS change
▪ Why do we want to run multiple MongoD on a single host?
• On large machines with lots of disk IO and storage capacity, a single mongod cannot utilize all IO resources
• Running multiple shards on the same machine gives finer data granularity and reduces the scope of each write lock
• This works well only when the multiple MongoD instances on the same machine have similar workloads
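As an illustration of the virtual-interface approach, the sketch below builds one mongod command line per instance, each bound to its own address on the same port. The addresses, data paths, and replica set names are made-up placeholders; only the mongod options `--bind_ip`, `--port`, `--dbpath`, and `--replSet` are real.

```python
def mongod_commands(instances, port=27017):
    """Build one mongod argument list per virtual interface.

    Every instance listens on the same port; only the bound address differs,
    which is what lets an instance move hosts with just a DNS change.
    """
    commands = []
    for instance in instances:
        commands.append([
            "mongod",
            "--bind_ip", instance["address"],
            "--port", str(port),
            "--dbpath", instance["dbpath"],
            "--replSet", instance["repl_set"],
        ])
    return commands


if __name__ == "__main__":
    # Hypothetical layout: two shard members on one physical host.
    layout = [
        {"address": "10.0.0.11", "dbpath": "/data/shard0", "repl_set": "shard0"},
        {"address": "10.0.0.12", "dbpath": "/data/shard1", "repl_set": "shard1"},
    ]
    for command in mongod_commands(layout):
        print(" ".join(command))
```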
Home-grown MongoDB monitoring system
▪ A home-grown agent runs on each MongoDB host and collects very specific metrics that are not available in MMS:
• Per-block-device disk write latency and disk IOPS
• Detailed per-collection MongoDB metrics
▪ Can overlay multiple graphs from replica set members on the same chart
▪ GLE latency is very important, since we are doing
• getLastError({w: 2})
Media Metadata Service: Data Model
▪ 2 main collections: Item and Image
• Item references multiple Images
▪ Item represents an eBay item:
• _id in Item is the external ID of the item in the eBay site DB
• These IDs are already sharded in a balanced way across N logical DB hosts using ID ranges
• We use MongoDB pre-split points for the initial mapping of our N site DB shards to M MongoDB shards
• This ensures good balance between the shards
▪ Image represents a picture attached to an Item
• _id in Image is based on a modified MongoDB ObjectId
• This ensures good distribution across any number of shards
▪ Our choice of document IDs in both collections ensures good balance across Mongo shards
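The pre-split idea for Item can be sketched as picking M-1 boundaries out of the N existing site DB ID ranges. This is an assumption-laden illustration: the range start values are invented, and the even grouping rule is one plausible choice, not necessarily the one used in production. Each returned value would then be handed to a split command (for example the shell helper sh.splitAt) before data is loaded.

```python
def presplit_points(range_starts, mongo_shards):
    """Pick split points that map N contiguous ID ranges onto M shards.

    range_starts: sorted lower bounds of the N site DB ID ranges.
    mongo_shards: number of MongoDB shards (M), with 1 <= M <= N.
    Returns M - 1 split points, each one an existing range boundary.
    """
    n = len(range_starts)
    if not 1 <= mongo_shards <= n:
        raise ValueError("need 1 <= mongo_shards <= number of ranges")
    # Shard k starts at the boundary of range floor(k * N / M).
    return [range_starts[(k * n) // mongo_shards] for k in range(1, mongo_shards)]


if __name__ == "__main__":
    # Hypothetical: 6 site DB ranges mapped onto 3 MongoDB shards.
    print(presplit_points([0, 100, 200, 300, 400, 500], 3))
```

Because the split points coincide with existing range boundaries, each site DB range lands entirely on one MongoDB shard.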
Problem #1: What should be the ID for the documents?
▪ ObjectId is not a good shard key for a sharded collection: the timestamp occupies the first 4 bytes, so keys grow monotonically and new inserts concentrate on one shard
▪ Problem: how should the app generate the ID when this is required?
▪ Requirements:
• Even distribution across shards, both long term and short term
• Localized placement of the indexed _id values in the B-Tree: minimize the chance of a page fault on an index page, and increase the chance that dirty pages in the page cache are adjacent, reducing the amount of random IO when flushing pages to disk
• Compactness: a small ID always helps to preserve space
▪ One possible solution: a 6-byte ID in the following order
• 1 byte – rotating sequence ID, incremented by each writer on every document
• 1 byte – writer ID, assuming the number of writers is < 256
• 4 bytes – timestamp in seconds
▪ Works, with the limitation that each writer cannot insert more than 256 documents per second
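The 6-byte layout above can be sketched as a small generator. This is a minimal illustration under stated assumptions: the class name and the injectable clock are invented for the example, the bytes are packed big-endian, and the generator is not thread-safe.

```python
import struct
import time


class ShardFriendlyIdGenerator:
    """Generate 6-byte IDs: sequence (1 byte), writer ID (1 byte), seconds (4 bytes)."""

    def __init__(self, writer_id, clock=time.time):
        if not 0 <= writer_id <= 255:
            raise ValueError("writer_id must fit in one byte")
        self._writer_id = writer_id
        self._sequence = 0
        self._clock = clock  # injectable so the example can be tested

    def next_id(self):
        sequence = self._sequence
        # Rotate through 0..255; more than 256 IDs in the same second would collide.
        self._sequence = (self._sequence + 1) % 256
        seconds = int(self._clock()) & 0xFFFFFFFF
        # ">BBI": big-endian, two unsigned bytes followed by an unsigned 32-bit int.
        return struct.pack(">BBI", sequence, self._writer_id, seconds)


if __name__ == "__main__":
    generator = ShardFriendlyIdGenerator(writer_id=7)
    for _ in range(3):
        print(generator.next_id().hex())
```

Putting the rotating byte first spreads consecutive inserts across the whole key space, while the trailing timestamp keeps each (sequence, writer) range growing in order.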
MongoDB ObjectId():
Timestamp (4 bytes) | MachineID (4 bytes) | SequenceNo (4 bytes)
Shard-Friendly ID:
SequenceNo (1 byte) | WriterID (1 byte) | Timestamp (4 bytes)
Shard-Friendly ID details
[Diagram: 6-byte ID value plotted against time (N-th minute, N+1-th minute) for Seq=0, Seq=16, … Seq=255, with leading bytes 00…, 0f…, 55…, aa…, ff…; 20 contiguous ranges for each sequence value]
Let’s say we have 20 writers and 3 shards
Number of contiguous intervals in each shard: 256/3 * 20 ≈ 1,700
Worst-case scenario: each contiguous range requires a separate IO. At 200 IOPS: ~8.5 sec to flush it
In reality it’s much better because of 4 KB pages
Rate of writes: 256 docs/sec per writer
Number of dirty locations over 1 minute: 256 * 60 * 20 = 307,200 across all shards, or about 102,400 per shard
So, if _id were an md5 or some other randomly generated value with ~perfect distribution, this would require roughly 60 times more IOPS
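The estimates on this slide can be checked by evaluating its formulas directly. The sketch below uses only the slide's stated assumptions (256 sequence values, 20 writers, 3 shards, 200 IOPS, at most 256 documents per second per writer, a one-minute window); the function and key names are illustrative.

```python
def flush_estimates(sequence_values=256, writers=20, shards=3,
                    iops=200, docs_per_sec=256, window_seconds=60):
    """Evaluate the slide's worst-case flush arithmetic."""
    # Each (sequence value, writer) pair forms one contiguous key range.
    ranges_per_shard = sequence_values * writers / shards
    # Worst case: one IO per contiguous range.
    flush_seconds = ranges_per_shard / iops
    # With random IDs every written document is its own dirty location.
    random_total = docs_per_sec * window_seconds * writers
    random_per_shard = random_total / shards
    return {
        "ranges_per_shard": ranges_per_shard,
        "worst_case_flush_seconds": flush_seconds,
        "random_locations_total": random_total,
        "random_locations_per_shard": random_per_shard,
        "random_to_ranged_ratio": random_per_shard / ranges_per_shard,
    }


if __name__ == "__main__":
    for name, value in flush_estimates().items():
        print(f"{name}: {value:,.1f}")
```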
Problem #2: md5 lookup problem
▪ The md5 is a digest of the image content; it is used for de-duplication
▪ Requirement: find image documents with a given md5 value
▪ Option 1: secondary index on the image documents; does not work because:
• Large DB: random reads cause disk IO
• The Image collection is sharded by image ID, so the query is forced to hit all shards
▪ Option 2: stand-alone replica set (cache)
• Works since the data is compact and fits in RAM; no disk IO
• How do we store md5 -> image IDs in Mongo?
• Option 2.1: as an array
  ▪ Does not work well: as refs are added, documents grow and relocate
• Option 2.2: single binary value packed into the ID
  ▪ Works; lookup is based on prefix search and a covering index
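Option 2.1 document:
{
  _id: Binary(md5),
  ref: [ref1, ref2, ref3, …]
}
Option 2.2 document:
{
  _id: Binary(md5|ref)
}
Query (md5|x denotes the md5 bytes followed by x):
db.coll.find(
  { _id: { $gt: Binary(md5|0x0000), $lt: Binary(md5|0xffff) } },
  { _id: 1 }
)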
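The packed-key lookup of Option 2.2 can be illustrated without a database by treating a sorted list of keys as the index. This is a sketch under stated assumptions: references are taken to be fixed-length 6-byte image IDs, and the bounds are padded to the same length as the stored keys (to my knowledge MongoDB orders binary values by length before content, so equal-length bounds keep the range scan meaningful).

```python
import bisect
import hashlib

MD5_LEN = 16  # an MD5 digest is always 16 bytes
REF_LEN = 6   # assumption: image references are fixed-length 6-byte IDs


def pack_key(md5_digest, image_ref):
    """Concatenate digest and reference into one binary key (md5|ref)."""
    if len(md5_digest) != MD5_LEN or len(image_ref) != REF_LEN:
        raise ValueError("unexpected digest or reference length")
    return md5_digest + image_ref


def find_refs(sorted_keys, md5_digest):
    """Prefix search: return every reference stored under md5_digest."""
    lower = md5_digest + b"\x00" * REF_LEN
    upper = md5_digest + b"\xff" * REF_LEN
    start = bisect.bisect_left(sorted_keys, lower)
    end = bisect.bisect_right(sorted_keys, upper)
    # The answer comes from the keys alone, like a covered index query.
    return [key[MD5_LEN:] for key in sorted_keys[start:end]]


if __name__ == "__main__":
    digest = hashlib.md5(b"same picture bytes").digest()
    other = hashlib.md5(b"another picture").digest()
    index = sorted([
        pack_key(digest, bytes.fromhex("000751800000")),
        pack_key(other, bytes.fromhex("010751800000")),
        pack_key(digest, bytes.fromhex("020751800000")),
    ])
    print([ref.hex() for ref in find_refs(index, digest)])
```

Adding a reference is a plain insert of a new small document, so nothing grows in place as it does with the array layout of Option 2.1.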
Problem #3: Item’s main picture size lookup
▪ The Image document has the image dimensions: width and height
▪ The Item document references N pictures; one of them is the main picture
▪ Problem: look up the image dimensions of the item’s main picture for 50 item documents at once, with a latency SLA of < 20 msec
▪ It’s a variation of Problem #2, except it’s worse because ItemID and image dimensions are in different documents and 50 lookups at once are required
▪ Again we need a dedicated replica set
▪ Option 1: prefix search with $or and $and
▪ Option 2: just query by _id
▪ Option 3: query by _id but on another compound index: {_id:1, wh:1}
▪ Winner is option #3! Hint: covering index
Option 1 document:
{ _id: Binary(item|WxH) }
Query:
db.coll.find({
  $or: [
    { _id: { $gt: Binary(id1|0x0000), $lt: Binary(id1|0xffff) } },
    { _id: { $gt: Binary(id2|0x0000), $lt: Binary(id2|0xffff) } },
    …
  ]
})
Option 2 document:
{ _id: item, wh: WxH }
Query:
db.coll.find({ _id: { $in: [item1, item2, …] } })
Option 3 document:
{ _id: item, wh: WxH }
Query (the projection is limited to indexed fields so the index can cover the query):
db.coll.find({ _id: { $in: [item1, item2, …] } }, { _id: 1, wh: 1 })
  .hint({ _id: 1, wh: 1 })
Problem #4: Periodic export to Hadoop
▪ Problem: daily copy of the new or updated documents to Hadoop
▪ Option 1: the service does 2 writes: to MongoDB and to Hadoop
• Does not work since Hadoop is not an online system
▪ Option 2: secondary index on lastUpdated (date); then query on lastUpdated > T
• Does not work well since updating the indexed lastUpdated field is costly; also, consuming a large number of docs from a live cluster is disruptive to latency SLAs
▪ Option 3: OpLog replication
• Winner:
  ▪ Decouples export from site activity
  ▪ Makes the lastUpdated index unnecessary
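[Diagram: the problem shown as primaries (P P P) with an unknown export path (??); the solution shown as primaries (P P P) feeding an OpLog Listener]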
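The core of an oplog-based export is deciding which documents changed. The sketch below works on plain dictionaries shaped like replica set oplog entries (`op`, `ns`, `o`, `o2`) and needs no server; the function name and the sample namespace are illustrative, and tailing the real oplog and writing to Hadoop are left out.

```python
def changed_ids(oplog_entries, namespace):
    """Collect the _ids of documents inserted or updated in one namespace.

    Inserts ("i") carry the full document in "o"; updates ("u") carry the
    target document's _id in "o2". Other operations are ignored here.
    """
    seen = set()
    ordered = []
    for entry in oplog_entries:
        if entry.get("ns") != namespace:
            continue
        op = entry.get("op")
        if op == "i":
            doc_id = entry["o"]["_id"]
        elif op == "u":
            doc_id = entry["o2"]["_id"]
        else:
            continue
        if doc_id not in seen:  # a document changed twice is exported once
            seen.add(doc_id)
            ordered.append(doc_id)
    return ordered


if __name__ == "__main__":
    sample = [
        {"op": "i", "ns": "media.Item", "o": {"_id": 101, "title": "lamp"}},
        {"op": "u", "ns": "media.Item", "o2": {"_id": 101}, "o": {"$set": {"title": "desk lamp"}}},
        {"op": "i", "ns": "media.Image", "o": {"_id": 7}},
        {"op": "d", "ns": "media.Item", "o": {"_id": 55}},
    ]
    print(changed_ids(sample, "media.Item"))
```

Since the listener reads the replication stream rather than the collections, the export adds no query load on the documents themselves and needs no lastUpdated index.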
Problem #5: What’s the fastest way to perform a full scan?
▪ Problem: you have a huge database/collection, with terabytes of data and billions of documents
▪ You need to perform some form of batch processing on all the documents and you want the fastest pipe out of Mongo
▪ Option 1: do it on a live node as it’s serving traffic
• Does not work well when the node is busy
• Also, data consistency may be an issue
▪ OK, we need to take the node off-line
▪ Option 2: execute a natural-order scan:
• Natural-order cursor
• Works, but slow; lots of synchronization between the two sides
▪ Option 3: N cursors using a range query on _id or any other indexed field
• Slow in the general case, when the order of indexed values in the B-Tree and the order on disk do not match
▪ Option 4: N natural-order cursors
One cursor:
db.collection.find().sort({ $natural: 1 })
N cursors (cursor i = 0 … N-1, each covering chunkSize = ceil(count / N) documents):
db.collection.find().sort({ $natural: 1 })
  .skip(i * chunkSize)
  .limit(chunkSize)
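Splitting a natural-order scan across N cursors comes down to giving each cursor its own skip and limit. The sketch below computes those windows; the function name is illustrative, and the document count is assumed to be known up front and stable, which fits the case of a node taken off-line.

```python
def cursor_windows(total_docs, cursor_count):
    """Return one (skip, limit) pair per cursor, covering every document once."""
    if total_docs < 0 or cursor_count < 1:
        raise ValueError("need total_docs >= 0 and cursor_count >= 1")
    chunk = -(-total_docs // cursor_count)  # ceiling division
    windows = []
    for i in range(cursor_count):
        skip = i * chunk
        if skip >= total_docs:
            break  # fewer documents than cursors: the remaining cursors are idle
        windows.append((skip, min(chunk, total_docs - skip)))
    return windows


if __name__ == "__main__":
    # Hypothetical: 10 documents scanned by 3 cursors.
    for skip, limit in cursor_windows(10, 3):
        print(f"skip={skip} limit={limit}")
```

Each pair maps onto the skip and limit of one natural-order cursor, so the cursors can run in parallel without overlapping.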
Summary
▪ We are running MongoDB in a demanding environment where it’s exposed to business-sensitive online applications
▪ It seems to be reliable, and this is what matters
▪ It has lots of features and gives the user lots of options to choose from
▪ It’s the user’s depth of understanding of the product, and the desire to have visibility into every aspect of its performance, that will determine whether a particular use case will be a success or not
Questions?
▪ Thank you!
▪ Btw, if any of this sounds interesting, we have lots of similar challenges to work on. So, you know the drill: yfinkelstein at ebay dot com

MongoDB .local San Francisco 2020: Aggregation Pipeline Power++
 
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
MongoDB .local San Francisco 2020: A Complete Methodology of Data Modeling fo...
 
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep DiveMongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
MongoDB .local San Francisco 2020: MongoDB Atlas Data Lake Technical Deep Dive
 
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & GolangMongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
MongoDB .local San Francisco 2020: Developing Alexa Skills with MongoDB & Golang
 
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
MongoDB .local Paris 2020: Realm : l'ingrédient secret pour de meilleures app...
 
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
MongoDB .local Paris 2020: Upply @MongoDB : Upply : Quand le Machine Learning...
 

Storing eBay's Media Metadata on MongoDB, by Yuri Finkelstein, Architect, eBay

  • 1. Yuri Finkelstein, Lead Platform Services Architect, yfinkelstein@ebay.com
       John Feibusch, Lead DBA Engineer, jfeibusc@ebay.com
       May 2013
  • 2. About eBay Platform Services
      - Platform Services is an org within the larger eBay Platform org, responsible for developing and operating common services used by web applications running on the eBay Platform:
          - Media Storage platform services: image blob and metadata
          - Unified Monitoring platform: logs and metrics
          - User Behavior Tracking
          - Ad Content management and analytics
          - Messaging and other middleware services
  • 3. Platform Services and Media Metadata Service Requirements
      - Platform Services is a DevOps organization:
          - We develop, we test, we deploy, we operate, we monitor
          - Whatever we are responsible for, we own and understand at the depth of the entire stack
          - Therefore, we require transparency of the components we build on
          - Transparency at the level of source code visibility is ideal
  • 4. Key Requirements
      - Key requirements of the Media Metadata Service:
          - 99.999% availability
          - Strictly defined invocation latency at the 95th percentile
          - Simultaneous operation in multiple data centers with short replication latency
          - Reliable writes: synchronous writes to at least 2 nodes
          - Read-write workload with a read/write ratio of roughly 10:1
          - Agility: fluid metadata content and constantly changing business requirements
          - Terabyte scale: billions of small entities to store and query
          - Extreme scalability: the number of pictures on eBay is constantly growing
  • 5. Enter MongoDB
      - We have been operating MongoDB in this project for over a year now
      - Sharded cluster in 2 data centers
      - Service nodes are built in Java and use Morphia and the Mongo driver
      - mongos runs on the service nodes
      - In the 1st year we were maturing the cluster for writes only; this year we are taking reads
      - Reads come from user-facing web applications with strong SLA requirements
      - For reads, the client first sets SlaveOK=true and, if the required document is not found, flips to SlaveOK=false to read from the primary
      - [Diagram: shards and their replicas spread across DC1 and DC2; each Metadata Service Node contains the Service Layer, Morphia, the Mongo Driver and MongoS. Legend: S = service instance; P = primary mongod; H = hidden member]
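The read path on slide 5 (try a secondary first, fall back to the primary on a miss) can be sketched roughly as follows. This is only an illustration of the control flow: `readWithFallback`, `fromSecondary` and `fromPrimary` are hypothetical names, and the two in-memory maps stand in for driver reads issued with SlaveOK=true and SlaveOK=false.

```javascript
// Sketch of "secondary first, then primary" reads.
// readFromSecondary / readFromPrimary are hypothetical stand-ins for driver
// calls made with SlaveOK=true and SlaveOK=false respectively.
function readWithFallback(id, readFromSecondary, readFromPrimary) {
  const fromSecondaryDoc = readFromSecondary(id);
  if (fromSecondaryDoc != null) {
    return { doc: fromSecondaryDoc, source: "secondary" };
  }
  // Miss on the secondary: the document may simply not have replicated yet,
  // so retry against the primary before reporting "not found".
  const fromPrimaryDoc = readFromPrimary(id);
  return { doc: fromPrimaryDoc != null ? fromPrimaryDoc : null, source: "primary" };
}

// In-memory stand-ins: the secondary lags behind the primary by one document.
const primary = new Map([["img1", { w: 800 }], ["img2", { w: 640 }]]);
const secondary = new Map([["img1", { w: 800 }]]);
const fromPrimary = (id) => primary.get(id);
const fromSecondary = (id) => secondary.get(id);
```

With these stand-ins, `readWithFallback("img2", fromSecondary, fromPrimary)` is served by the primary because the secondary has not yet seen the document. The cost of the pattern is a second round trip for every document that genuinely does not exist.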
  • 6. Centralized MongoDB configuration store
      - Our MongoDB deployment package is based on a custom-built RPM and contains heavy customization scripts
      - One of them fetches the configuration for the node it is running on from a remote configuration repository at start-up time
      - Benefits:
          - Can change MongoDB configuration instantly on an arbitrarily large number of nodes
          - Can change local system settings affecting MongoDB: read-ahead settings on block devices and the IO scheduler
          - Can relocate replica set members across machines (subject to data migration)
          - Consistent inventory tracking and visibility into config settings on any Mongo machine
      - [Diagram: mongod nodes (P) fetch their settings from the Central MongoDB Config Repository at startup time]
  • 7. Upstart
      - Upstart is a replacement for init.d; developed for Ubuntu, also used in RHEL 6
      - Can automatically start our monitoring agent whenever mongod starts; handles multiple mongod instances well
      - Example: sudo start mongod interface=0
      - Future: Upstart can be controlled by Puppet
  • 8. Run multiple mongod instances on the same machine
      - We are starting to run multiple mongod processes on one node
      - Instead of using multiple ports, we create multiple virtual interfaces on a single host and register them in DNS as if they were real IP addresses
      - mongod supports bind_ip, which makes it possible to bind to a specific virtual interface
      - Why virtual interfaces?
          - So that DB hosts can be moved with just a DNS change
      - Why do we want to run multiple mongod processes on a single host?
          - On large machines with lots of disk IO and storage capacity, a single mongod cannot utilize all IO resources
          - Running multiple shards on the same machine reduces data granularity and reduces the scope of each write lock
          - This works well only when the mongod processes on the same machine have similar workloads
  • 9. Home-grown MongoDB monitoring system
      - A home-grown agent runs on each MongoDB host and collects very specific metrics that are not available in MMS:
          - Per-block-device disk write latency and disk IOPS
          - Detailed per-collection MongoDB metrics
      - Can overlay multiple graphs from replica set members on the same chart
      - GLE (getLastError) latency is very important since we are doing getLastError({w:2})
  • 10. Media Metadata Service: Data Model
      - 2 main collections: Item and Image
          - An Item references multiple Images
      - Item represents an eBay item:
          - _id in Item is the external ID of the item in the eBay site DB
          - These IDs are already sharded and balanced across N logical DB hosts using ID ranges
          - We use MongoDB pre-split points for the initial mapping of our N site DB shards to M MongoDB shards
          - This ensures good balance between the shards
      - Image represents a picture attached to an Item:
          - _id in Image is based on a modified MongoDB ObjectId
          - This ensures good distribution across any number of shards
      - Our choice of document IDs in both collections ensures good balance across Mongo shards
  • 11. Problem #1: What should be the ID for the documents?
      - ObjectId is not a good shard key for a sharded collection, as the timestamp occupies the first 4 bytes
      - Problem: how should the app generate the ID when this is required?
      - Requirements:
          - Even distribution across shards, both long term and short term
          - Localized placement of the indexed _id values in the B-Tree: minimize the chance of a page fault on an index page, and increase the chance that dirty pages in the page cache are co-located, to reduce the amount of random IO when flushing pages to disk
          - Compactness in size is always good to preserve space
      - One possible solution: a 6-byte ID in the following order:
          - 1 byte: rotating sequence ID, incremented by each writer on every document
          - 1 byte: writer ID, assuming the number of writers is < 256
          - 4 bytes: timestamp in seconds
      - Works, with the limitation that each writer cannot insert more than 256 documents per second
      - [Diagram: byte layouts. MongoDB ObjectId(): Timestamp (4 bytes), MachineID (4 bytes), SequenceNo (4 bytes). Shard-friendly ID: SequenceNo (1 byte), WriterID (1 byte), Timestamp (4 bytes)]
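The 6-byte layout from slide 11 (sequence byte, writer byte, 4-byte timestamp in seconds) could be generated along these lines. This is a minimal sketch, not the authors' code: `makeIdGenerator`, `nextId` and `toHex` are illustrative names, and big-endian encoding of the timestamp is an assumption.

```javascript
// Sketch of the shard-friendly ID: [sequence (1 byte)] [writer ID (1 byte)] [timestamp, seconds (4 bytes)].
// Function and parameter names are illustrative; the byte order of the timestamp is assumed big-endian.
function makeIdGenerator(writerId) {
  if (!Number.isInteger(writerId) || writerId < 0 || writerId > 255) {
    throw new RangeError("writerId must fit in one byte");
  }
  let sequence = 0; // rotating per-writer counter, wraps after 255
  return function nextId(nowMs = Date.now()) {
    const id = new Uint8Array(6);
    const view = new DataView(id.buffer);
    view.setUint8(0, sequence);                          // leading byte spreads writes across shards
    view.setUint8(1, writerId);                          // keeps writers from colliding
    view.setUint32(2, Math.floor(nowMs / 1000), false);  // timestamp in seconds, big-endian
    sequence = (sequence + 1) & 0xff;
    return id;
  };
}

// Helper for printing an ID as a hex string.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}
```

Because the sequence byte wraps after 256 values, two documents written by the same writer within the same second collide once that writer exceeds 256 inserts in that second, which is the limitation the slide states.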
  • 12. Shard-friendly ID details
      - [Diagram: 6-byte ID value plotted against time for the N-th and (N+1)-th minute; labels Seq=0, Seq=16 and Seq=255, ID prefixes 00…, 0f…, 55…, aa… and ff…; 20 contiguous ranges for each sequence value]
      - Let's say we have 20 writers and 3 shards
      - Number of contiguous intervals in each shard: 256/3 * 20 ≈ 1,700
      - Worst-case scenario: each contiguous range requires a separate IO; at 200 IOPS it takes ~8.5 sec to flush
      - In reality it is much better because of 4 KB pages
      - Rate of writes: 256 docs/sec per writer
      - Number of dirty locations over 1 minute: 256 * 60 * 20 = 307,200
      - So, if _id were an md5 or some other random value with near-perfect distribution, this would require roughly 180 times more IOPS (307,200 / 1,700)
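Evaluating the expressions on slide 12 directly, for 20 writers, 3 shards and 200 IOPS, is a quick way to check the figures. The sketch below does only that arithmetic; the variable names are illustrative and the model (one IO per contiguous range, one dirty location per random ID) is the slide's own worst case.

```javascript
// Worst-case flush arithmetic from slide 12; variable names are illustrative.
const writers = 20;
const shards = 3;
const iops = 200;
const docsPerSecPerWriter = 256; // upper bound imposed by the 1-byte sequence

// Each of the 256 sequence prefixes holds one contiguous range per writer,
// and the prefixes are split evenly across the shards.
const rangesPerShard = (256 / shards) * writers;

// One IO per contiguous range in the worst case.
const worstCaseFlushSec = rangesPerShard / iops;

// With random IDs, every document written in a minute dirties its own location.
const dirtyLocationsPerMinute = docsPerSecPerWriter * 60 * writers;

const ratio = dirtyLocationsPerMinute / rangesPerShard;

console.log({ rangesPerShard, worstCaseFlushSec, dirtyLocationsPerMinute, ratio });
```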
  • 13. Problem #2: md5 lookup problem
      - md5 is a digest of the image content; it is used for de-dupe
      - Requirement: find image documents with a given md5 value
      - Option 1: secondary index on the image documents; does not work because:
          - Large DB; random reads cause disk IO
          - The Image collection is sharded by image ID, so we are forced to query all shards
      - Option 2: stand-alone replica set (cache)
          - Works since the data is compact and fits in RAM; no disk IO
          - How do we store md5 -> image IDs in Mongo?
          - Option 2.1: as an array: { _id: Binary(md5), ref: [ref1, ref2, ref3, …] }
              - Does not work well: when refs are added, documents grow and relocate
          - Option 2.2: single binary value packed into the ID: { _id: Binary(md5|ref) }
              - Works; lookup is based on a prefix search and a covering index
              - Query: db.coll.find( { _id: { $gt: Binary(md5|0x0000), $lt: Binary(md5|0xffff) } }, { _id: 1 } )
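The packed-key idea in option 2.2 can be sketched without a database: concatenate the md5 and the image reference into one fixed-length key, then treat a lookup as a range scan between `md5|00…00` and `md5|ff…ff`. Everything below is an assumption made for illustration: the 6-byte reference length, the helper names, the inclusive bounds, and the in-memory array that stands in for the `_id` index.

```javascript
// Sketch of the md5|ref packed key and its prefix range.
const MD5_LEN = 16;
const REF_LEN = 6; // assumption: refs are 6-byte image IDs

// Concatenate md5 and ref into one fixed-length key.
function packKey(md5, ref) {
  if (md5.length !== MD5_LEN || ref.length !== REF_LEN) {
    throw new RangeError("unexpected md5 or ref length");
  }
  const key = new Uint8Array(MD5_LEN + REF_LEN);
  key.set(md5, 0);
  key.set(ref, MD5_LEN);
  return key;
}

// Lowest and highest possible keys that share the given md5 prefix.
function prefixBounds(md5) {
  const lower = new Uint8Array(MD5_LEN + REF_LEN);            // md5 | 00…00
  const upper = new Uint8Array(MD5_LEN + REF_LEN).fill(0xff); // md5 | ff…ff
  lower.set(md5, 0);
  upper.set(md5, 0);
  return { lower, upper };
}

// Byte-wise comparison; all keys here have the same length.
function compareBytes(a, b) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return a.length - b.length;
}

// Stand-in for the index range scan: returns the refs stored under one md5.
function refsForMd5(keys, md5) {
  const { lower, upper } = prefixBounds(md5);
  return keys
    .filter((k) => compareBytes(k, lower) >= 0 && compareBytes(k, upper) <= 0)
    .map((k) => k.slice(MD5_LEN));
}
```

Since the whole answer lives in the key, a lookup only needs the `_id` values themselves, which is what makes the covering-index approach on the slide possible. Adding a reference is an insert of a new small document rather than an in-place growth of an existing one.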
  • 14. Problem #3: Item's main picture size lookup
      - The Image document has the image dimensions: width and height
      - The Item document references N pictures; one of them is the main picture
      - Problem: look up the image dimensions of the item's main picture for 50 item documents at once, with a latency SLA < 20 msec
      - It's a variation of Problem #2, except it's worse because the item ID and image dimensions are in different documents and 50 lookups at once are required
      - Again we need a dedicated replica set
      - Option 1: prefix search with $or and $and
          - Document: { _id: Binary(item|WxH) }
          - Query: db.coll.find({ $or: [ { _id: { $gt: Binary(id1|0x0000), $lt: Binary(id1|0xffff) } }, { _id: { $gt: Binary(id2|0x0000), $lt: Binary(id2|0xffff) } }, … ] })
      - Option 2: just query by _id
          - Document: { _id: item, wh: WxH }
          - Query: db.coll.find({ _id: { $in: [item1, item2, …] } })
      - Option 3: query by _id but on another compound index: { _id: 1, wh: 1 }
          - Document: { _id: item, wh: WxH }
          - Query: db.coll.find({ _id: { $in: [item1, item2, …] } }).hint({ _id: 1, wh: 1 })
      - The winner is option #3! Hint: covering index
  • 15. Problem #4: Periodic export to Hadoop
      - Problem: daily copy of the new or updated documents to Hadoop
      - Option 1: the service does 2 writes: to Mongo and to Hadoop
          - Does not work since Hadoop is not an online system
      - Option 2: secondary index on lastUpdated (date); then query on lastUpdated > T
          - Does not work well since updating an indexed lastUpdated field is costly; also, consuming a large number of docs from a live cluster is disruptive to latency SLAs
      - Option 3: OpLog replication
          - Winner: decouples the export from site activity and makes the lastUpdated index unnecessary
      - [Diagram: primaries (P) of the shards, an OpLog Listener, and an open question ("??") about the export path]
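What an oplog listener has to do for option 3 can be sketched as a filter over oplog entries: keep inserts and updates for one namespace after a checkpoint, and collect the affected document IDs. The entry fields used here (`ts`, `op`, `ns`, `o`, `o2`) follow the replica set oplog format, but the sample data, the `media.Image` namespace, the plain-number timestamps and the `changedIds` helper are all assumptions for illustration; a real listener would tail the oplog collection and handle BSON timestamps.

```javascript
// Sketch: derive the set of new or updated document IDs from oplog-style entries.
// Timestamps are simplified to plain numbers; names and sample data are hypothetical.
function changedIds(entries, namespace, sinceTs) {
  const ids = new Set();
  for (const entry of entries) {
    if (entry.ns !== namespace || entry.ts <= sinceTs) continue;
    if (entry.op === "i") {
      ids.add(entry.o._id);   // insert: the full document is in `o`
    } else if (entry.op === "u") {
      ids.add(entry.o2._id);  // update: the target document is identified in `o2`
    }
    // deletes ("d") and other entry types are ignored in this sketch
  }
  return [...ids];
}

const sampleOplog = [
  { ts: 100, op: "i", ns: "media.Image", o: { _id: "a", w: 800, h: 600 } },
  { ts: 101, op: "u", ns: "media.Image", o: { $set: { w: 1024 } }, o2: { _id: "a" } },
  { ts: 102, op: "i", ns: "media.Item", o: { _id: "item1" } },
  { ts: 103, op: "i", ns: "media.Image", o: { _id: "b", w: 640, h: 480 } },
  { ts: 104, op: "d", ns: "media.Image", o: { _id: "c" } },
];
```

Collecting IDs into a set means a document that was updated several times during the day is exported once, and the export job never has to query the live cluster by `lastUpdated`.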
  • 16. Problem #5: What's the fastest way to perform a full scan?
      - Problem: you have a huge database/collection, with terabytes of data and billions of documents
      - You need to perform some form of batch processing on all the documents and you want the fastest pipe out of Mongo
      - Option 1: do it on a live node as it's serving traffic
          - Does not work well when the node is busy
          - Also, data consistency may be an issue
      - OK, so we need to take the node off-line
      - Option 2: execute a natural-order scan with a single natural-order cursor
          - Works, but slow; lots of synchronization between the two sides
      - Option 3: N cursors using a range query on _id or any other indexed field
          - Slow in the general case, when the order of indexed values in the B-Tree and the order on disk do not match
      - Option 4: N natural-order cursors
      - One cursor: db.collection.find({}).sort({ $natural: 1 })
      - N cursors, where cursor i reads one chunk of K documents: db.collection.find({}).sort({ $natural: 1 }).skip(i * K).limit(K)
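For option 4, each of the N cursors needs its own skip and limit so that together they cover the collection exactly once. A small sketch of that partitioning, assuming the total document count is known up front and the node is off-line so the natural order does not shift; `cursorRanges` is a hypothetical helper, not part of any driver.

```javascript
// Sketch: split a collection of totalDocs documents into per-cursor skip/limit ranges.
function cursorRanges(totalDocs, numCursors) {
  if (totalDocs < 0 || numCursors < 1) throw new RangeError("invalid arguments");
  const chunk = Math.ceil(totalDocs / numCursors); // documents per cursor
  const ranges = [];
  for (let i = 0; i < numCursors; i++) {
    const skip = i * chunk;
    if (skip >= totalDocs) break; // fewer documents than cursors
    ranges.push({ skip, limit: Math.min(chunk, totalDocs - skip) });
  }
  return ranges;
}
```

Each returned pair would feed one natural-order cursor, for example `.skip(range.skip).limit(range.limit)`. The last range is shortened so that the limits sum to the document count.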
  • 17. Summary
      - We are running MongoDB in a demanding environment where it is exposed to business-sensitive online applications
      - It seems to be reliable, and this is what matters
      - It has lots of features and gives the user lots of options to choose from
      - It is the user's depth of understanding of the product, and the desire to have visibility into every aspect of its performance, that will determine whether a particular use case will be a success or not
  • 18. Questions?
      - Thank you!
      - Btw, if any of this sounds interesting, we have lots of similar challenges to work on. So, you know the drill: yfinkelstein at ebay dot com

Editor's notes

  1. Show app servers and mongos on them
  2. Fix md5->new document ID
  3. 3 shards, 20 writers