MongoSF 4/30/2010
From MySQL to MongoDB: Migrating a Live Application
Tony Tam
What is Wordnik
  • Project to track language
      - Like GPS for English
  • The dictionary is a roadblock to the language
      - Roughly 200 new words are created daily
      - Language is not static
  • Capture information about all words
      - Meaning is often undefined in the traditional sense
      - Machines can determine meaning through analysis
      - Needs LOTS of data
Why Should You Care
  • Every developer can use a robust language API!
  • Wordnik migrated to MongoDB
      - > 5 billion documents
      - > 1.2 TB
      - Zero application downtime
  • Learn from our experience
Wordnik
  • Not just a website! (But we have one)
  • Launched Wordnik entirely on MySQL
  • Hit road bumps with insert speed
      - ~4B rows on MyISAM tables
      - Tables locked for tens of seconds during inserts
  • But we need more data!
      - Created elaborate update schemes to work around it
      - Lost lots of sleep babysitting servers while researching a long-term solution
Wordnik + MongoDB
  • What are our storage needs?
  • Database vs. application logic
      - No PK/FK constraints
      - No stored procedures
      - Consistency?
  • Lots of R&D
      - Tried almost all NoSQL solutions
Migrating Storage Engines
  • Many parts to this effort
      - Setup & administration
      - Software design
      - Optimization
  • Many types of data at Wordnik
      1. Corpus
      2. Structured hierarchical data
      3. User data
  • Migrated #1 & #2
Server Infrastructure
  • Wordnik is heavily read-only
      - Master / slave deployment
      - Looking at replica pairs
  • MongoDB loves system resources
      - Wordnik runs dedicated boxes to avoid other apps being sent to disk (aka time-out)
      - Memory + disk = happy Mongo
  • Many times the disk space of MySQL
      - Easy pill to swallow until

Server Infrastructure
  • Physical hardware
      - 2 x 4-core CPU, 32 GB RAM, FC SAN
  • Had bad luck on VMs (you might not)
  • Disk speed => performance
Software Design
  • Two distinct use cases for MongoDB
  • Identical structure, different storage engine
      - Same underlying objects, same storage fidelity (largely key/value)
  • Hierarchical data structure
      - Same underlying objects, document-oriented storage
Software Design
  • Create BasicDBObjects from POJOs and use collection methods
      BasicDBObject dbo = new BasicDBObject("sentence", s.getSentence()).append("rating", s.getRating()).append(...);
  • ID generation to manage unique _id values
      - Analogous to MySQL AutoIncrement behavior
      - Compatible with MySQL ids (more later)
      dbo.append("_id", getId());
      collection.save(dbo);
  • Implemented all CRUD methods in DAO
      - Swappable between MongoDB and MySQL at runtime
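The slide mentions ID generation that behaves like MySQL auto-increment and stays compatible with existing MySQL ids, but it does not show how getId() works. A minimal sketch of one way this could be done inside a single process follows; the class name SequentialIdGenerator and the idea of seeding from the highest existing MySQL id are assumptions, not Wordnik's implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the ID generation the slide mentions;
// the talk does not show Wordnik's actual implementation.
class SequentialIdGenerator {
    private final AtomicLong last;

    // Seed with the highest id already issued (for example, MAX(id) read
    // from the MySQL table) so new ids never collide with migrated rows.
    SequentialIdGenerator(long highestExistingId) {
        this.last = new AtomicLong(highestExistingId);
    }

    // Thread-safe: each caller gets the next strictly increasing value.
    long nextId() {
        return last.incrementAndGet();
    }
}
```

A generator like this only guarantees uniqueness within one JVM; several writer processes would need a shared counter or disjoint id ranges.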
Software Design
  • Key-value storage use case
  • Easy as implementing new DAOs
      SentenceHandler h = new MongoDBSentenceHandler();
  • Save methods construct a BasicDBObject and call save() on the collection
  • Implement the same interface
      - Same methods against DAO between MySQL and MongoDB versions
  • Data Abstraction 101
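The "same interface, different storage engine" idea can be sketched as follows. The interface name SentenceHandler comes from the slides, but its methods, the two in-memory implementations, and the SentenceHandlers factory are all assumptions made so the example runs without a database; a real version would wrap JDBC and the MongoDB driver.

```java
import java.util.HashMap;
import java.util.Map;

// The interface name mirrors the slides; its methods are assumptions.
interface SentenceHandler {
    void save(long id, String sentence);
    String find(long id);
}

// In-memory stand-in for a MySQL-backed DAO.
class InMemoryMySqlSentenceHandler implements SentenceHandler {
    private final Map<Long, String> rows = new HashMap<>();
    public void save(long id, String sentence) { rows.put(id, sentence); }
    public String find(long id) { return rows.get(id); }
}

// In-memory stand-in for a MongoDB-backed DAO.
class InMemoryMongoSentenceHandler implements SentenceHandler {
    private final Map<Long, String> documents = new HashMap<>();
    public void save(long id, String sentence) { documents.put(id, sentence); }
    public String find(long id) { return documents.get(id); }
}

class SentenceHandlers {
    // Callers depend only on the interface, so the backing store can be
    // chosen from a runtime flag.
    static SentenceHandler create(boolean useMongoDb) {
        return useMongoDb ? new InMemoryMongoSentenceHandler()
                          : new InMemoryMySqlSentenceHandler();
    }
}
```

Because calling code only sees SentenceHandler, flipping the flag on one server at a time requires no changes outside the factory.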
Software Design
  • What about bulk inserts?
  • Fire-and-forget (FAF) queued approach
      - Add objects to queue, return to caller
      - Every X seconds, process queue
      - All objects from the same collection are appended to a single List<DBObject>
      - Call collection.insert(...) before the list reaches 2M characters
  • Reduces network overhead
  • Very fast inserts
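The queued approach can be sketched roughly as below. This is an illustration under assumptions, not the code from the talk: documents are plain JSON strings rather than DBObject instances, the class name QueuedWriter is hypothetical, and the character budget (2M on the slide) is a constructor parameter. Each inner list returned by drainToBatches() would be passed to one bulk insert call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of a fire-and-forget write queue.
class QueuedWriter {
    // One pending write: target collection name plus serialized document.
    private static final class Pending {
        final String collection;
        final String json;
        Pending(String collection, String json) {
            this.collection = collection;
            this.json = json;
        }
    }

    private final ConcurrentLinkedQueue<Pending> queue = new ConcurrentLinkedQueue<>();
    private final int maxBatchChars;

    QueuedWriter(int maxBatchChars) {
        this.maxBatchChars = maxBatchChars;
    }

    // Enqueue and return immediately; the caller never waits on the database.
    void asyncWrite(String collection, String json) {
        queue.add(new Pending(collection, json));
    }

    // Drain the queue into per-collection batches, starting a new batch
    // whenever adding a document would exceed the character budget.
    // Meant to be called on a timer, for example every few seconds.
    Map<String, List<List<String>>> drainToBatches() {
        Map<String, List<List<String>>> batches = new LinkedHashMap<>();
        Map<String, Integer> openBatchChars = new LinkedHashMap<>();
        Pending p;
        while ((p = queue.poll()) != null) {
            List<List<String>> forCollection =
                batches.computeIfAbsent(p.collection, k -> new ArrayList<>());
            int used = openBatchChars.getOrDefault(p.collection, 0);
            if (forCollection.isEmpty() || used + p.json.length() > maxBatchChars) {
                forCollection.add(new ArrayList<>());
                used = 0;
            }
            forCollection.get(forCollection.size() - 1).add(p.json);
            openBatchChars.put(p.collection, used + p.json.length());
        }
        return batches;
    }
}
```

The trade-off of fire-and-forget is that the caller never learns whether a write failed, so this pattern suits bulk loads better than user-facing writes.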
Software Design
  • Hierarchical data done more elegantly
  • Wordnik dictionary model
      - Java POJOs already had JAXB annotations
      - Part of public REST API
  • Used MySQL
      - 12+ tables
      - 13 DAOs
      - 2500 lines of code
      - 50 requests/second uncached
      - Memcache needed to maintain reasonable speed
Software Design TMGO
Software Design
  • MongoDB's document storage let us:
      - Turn the objects into JSON via Jackson Mapper (fasterxml.com)
      - Call save
      - Support all fetch types, enhanced filters
  • 1000 requests/second
  • No explicit caching
  • No less scary code
Software Design
  • Saving a complex object
      String rawJSON = getMapper().writeValueAsString(veryComplexObject);
      collection.save(new BasicDBObject(getId(), JSON.parse(rawJSON)));
  • Fetching a complex object
      BasicDBObject dbo = (BasicDBObject) cursor.next();
      ComplexObject obj = getMapper().readValue(dbo.toString(), ComplexObject.class);
  • No joins, 20x faster
Migrating Data
  • Migrating => existing data logic
  • Use logic to select DAOs appropriately
  • Read from old, write with new
  • Great system test for MongoDB
      SentenceHandler mysqlSh = new MySQLSentenceHandler();
      SentenceHandler mongoSh = new MongoDbSentenceHandler();
      while (hasMoreData) {
          mongoSh.asyncWrite(mysqlSh.next());
          ...
      }
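The "read from old, write with new" loop can be written generically. In this sketch the Iterator stands in for the MySQL-backed reader and the Consumer for the MongoDB-backed asyncWrite; both, along with the Migrator class name, are assumptions for illustration only.

```java
import java.util.Iterator;
import java.util.function.Consumer;

class Migrator {
    // Copies every record from the old store's iterator to the new store's
    // writer and returns how many were copied. Both parameters are
    // hypothetical stand-ins for the handlers shown on the slide.
    static <T> long copyAll(Iterator<T> oldStore, Consumer<T> newStore) {
        long copied = 0;
        while (oldStore.hasNext()) {
            newStore.accept(oldStore.next());
            copied++;
        }
        return copied;
    }
}
```

Returning the count gives a cheap sanity check: it can be compared against a row count taken from the source table before the switch.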
Migrating Data
  • Wordnik moved 5 billion rows from MySQL
  • Sustained 100,000 inserts/second
  • Migration tool was CPU bound
      - ID generation logic, among other things
  • Wordnik reads MongoDB fast
      - Read + create Java objects @ 250k/second (!)
Going Live to Production
  • Choose your use case carefully if migrating incrementally
  • Scary no matter what
  • Test your perf monitoring system first!
  • Use your DAOs from migration
  • Turn on MongoDB on one server, monitor, tune (rollback, repeat)
  • Full switchover when comfortable
Going Live to Production
  • Really?
      SentenceHandler h = null;
      if (useMongoDb) {
          h = new MongoDbSentenceHandler();
      } else {
          h = new MySQLDbSentenceHandler();
      }
      return h.find(...);
Optimizing Performance
  • Home-grown connection pooling
      - Master only: ConnectionManager.getReadWriteConnection()
      - Slave only: ConnectionManager.getReadOnlyConnection()
      - Round-robin all servers, bias on slaves: ConnectionManager.getConnection()
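The three access modes could be modeled along these lines. The talk does not show ConnectionManager's internals, so everything here is an assumption: servers are plain host-name strings, the class name ServerSelector is hypothetical, and the "bias on slaves" is implemented by giving each slave several slots in the rotation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; assumes at least one slave is configured.
class ServerSelector {
    private final String master;
    private final List<String> slaves;
    private final List<String> rotation = new ArrayList<>();
    private final AtomicInteger readOnlyCursor = new AtomicInteger();
    private final AtomicInteger anyCursor = new AtomicInteger();

    // slaveWeight = how many slots each slave gets per single master slot.
    ServerSelector(String master, List<String> slaves, int slaveWeight) {
        this.master = master;
        this.slaves = List.copyOf(slaves);
        rotation.add(master);
        for (String slave : slaves) {
            for (int i = 0; i < slaveWeight; i++) {
                rotation.add(slave);
            }
        }
    }

    // Writes must go to the master.
    String readWrite() { return master; }

    // Reads that tolerate replication lag rotate across slaves only.
    String readOnly() { return pick(slaves, readOnlyCursor); }

    // Any server, with slaves over-represented in the rotation.
    String any() { return pick(rotation, anyCursor); }

    private static String pick(List<String> servers, AtomicInteger cursor) {
        int i = Math.floorMod(cursor.getAndIncrement(), servers.size());
        return servers.get(i);
    }
}
```

With one master, two slaves, and a weight of 2, any() sends one request in five to the master and the rest to the slaves.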
Optimizing Performance
  • Caching
      - Had complex logic to handle cache invalidation
      - Out-of-process caches are not free
      - MongoDB loves your RAM
      - Let it do your LRU cache (it will anyway)
  • Hardware
      - Do not skimp on your disk or RAM
  • Indexes
      - Schema-less design
      - Even if no values in any document, needs to read document schema to check
Optimizing Performance
  • Disk space
      - Schemaless => schema per document (row)
      - Choose your mappings wisely
      - ({veryLongAttributeName:true}) => more disk space than ({vlan:true})
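The cost of a long attribute name can be estimated from the BSON layout, where every element stores a type byte, the field name as a zero-terminated string, and then the value (one byte for a boolean). The sketch below computes the raw BSON size only; on-disk padding and indexes are not modeled, and the FieldNameCost class is a hypothetical helper. Applying it to 5 billion documents assumes, purely for illustration, that every document carries the field.

```java
import java.nio.charset.StandardCharsets;

class FieldNameCost {
    // Size in bytes of one BSON boolean element: 1 type byte, the field
    // name as UTF-8 plus a terminating zero byte, and 1 byte for the value.
    static int booleanElementBytes(String fieldName) {
        return 1 + fieldName.getBytes(StandardCharsets.UTF_8).length + 1 + 1;
    }

    // Extra bytes spent across a collection when every document carries
    // the longer name instead of the shorter one.
    static long extraBytes(String longName, String shortName, long documents) {
        int perDocument = booleanElementBytes(longName) - booleanElementBytes(shortName);
        return (long) perDocument * documents;
    }
}
```

For the slide's example, veryLongAttributeName costs 24 bytes per document against 7 for vlan, a difference of 17 bytes; over 5 billion documents that would be about 85 GB for a single boolean field.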
Optimizing Performance
  • A typical day at the office for MongoDB
  • API call rate: 47.7 calls/sec
Other Tips
  • Data types
      - Use caution when changing
      DBObject obj = cur.next();
      long id = (Long) obj.get("IWasAnIntOnce");
  • Attribute names
      - Don't change w/o migrating existing data!
      - WTFDMDG????
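The cast on this slide fails at runtime for any document written while the field was still an int, because an Integer cannot be cast to Long. One defensive way to read such a field is sketched below; a plain Map stands in for the driver's DBObject, and the NumericFields helper is hypothetical.

```java
import java.util.Map;

class NumericFields {
    // Reads a numeric field as long whether it was stored as Integer or
    // Long. A direct (Long) cast throws ClassCastException for documents
    // written while the field was still an int.
    static long readLong(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        if (!(value instanceof Number)) {
            throw new IllegalStateException(field + " is missing or not numeric");
        }
        return ((Number) value).longValue();
    }
}
```

Going through Number lets old and new documents coexist, which avoids having to rewrite existing data just because a type was widened.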
What's Next?
  • GridFS
      - Store audio files on disk
      - Requires clustered file system for shared access
  • Capped collections (rolling out this week)
  • UGC from MySQL => MongoDB
  • Beg/bribe 10gen for some features
Questions?

