Just-In-Time Scalability: Agile Methods to Support Massive Growth
What is IMVU?

                 
Behind the scenes...

IMVU is LAMP, plus...
• Perlbal
• Memcached
• Solr
• MogileFS
• plus...
   • Audiere, Boost, Cal3D, CFL, NSIS, Pixomatic, Python, pywin32, SCons, wxPython
   • BuildBot, eAccelerator, Linux (Debian), Nagios, Perl, Roundup, rrd, Subversion
   • ADODB, b2evolution, Coppermine, feed2js, FreeTag, Incutio XML-RPC, jrcache, JSON-PHP, Magpie, osCommerce, phpBB, Phorum, SimpleTest, Selenium
Before and After Architecture

Before: We started with a small site, a mess of open source, and a small team that didn't know much about scaling.

After: We ended with a large site, a medium-sized team, and an architecture that has scaled.

We never stopped. We used a roadmap and a compass, made weekly changes in direction, regularly shipped code on Wednesday to handle the next weekend's capacity constraints, and shipped new features the whole time.
Before and After Architecture (1/4): November [architecture diagram]
Before and After Architecture (2/4): December [architecture diagram]
Before and After Architecture (3/4): February [architecture diagram]
Before and After Architecture (4/4): May [architecture diagram]
Advance planning vs. fast response

“Rocket ship”
• Figure out in advance what is going to go wrong
• Build a plan that prevents those things from happening
• Execute your plan
• Get feedback when done

“Driving”
• Continuously figure out what is going to go wrong soon
• Quickly fix it, without breaking something else
• Get feedback along the way
Questions to ask

“Rocket ship”
• Are you sure you know what is going to happen?
• Are you sure you can execute?
• Can you afford it?
• Do you need feedback?

“Driving”
• How do you know you will be able to fix the problem in time?
• How can you be sure you won't cause collateral damage?
• How can you be sure you won't code yourself into a corner?
Continuous Ship
• Deploy new software quickly
   •   At IMVU, time from check-in to production = 20 minutes

• Tell a good change from a bad change (quickly)

• Revert a bad change quickly

• Work in small batches
   •   At IMVU, a large batch = 3 days' worth of work

• Break large projects down into small batches

• Don't have the same problem twice – fix the root cause of each
  class of problems

 IMVU pushes code to production 20-30 times every day
Cluster Immune System
What it looks like to ship one piece of code to production:
 • Run tests locally (SimpleTest, Selenium)
     o Everyone has a complete sandbox
 • Continuous Integration Server (BuildBot)
     o All tests must pass or “shut down the line”
     o Automatic feedback if the team is going too fast
 • Incremental deploy
     o Monitor cluster and business metrics in real time
     o Reject changes that move metrics out of bounds
 • Alerting & Predictive monitoring (Nagios)
     o Monitor all metrics that stakeholders care about
     o If any metric goes out of bounds, wake somebody up
     o Use historical trends to predict acceptable bounds

 When customers see a failure:
     o Fix the problem for customers
     o Improve your defenses at each level
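The incremental-deploy and predictive-monitoring steps can be sketched in Python. This is a minimal illustration, not IMVU's implementation: it assumes the acceptable range for each metric is the historical mean plus or minus `k` standard deviations, and `measure` is a hypothetical hook that deploys the change to a given fraction of the cluster and returns the metrics observed afterwards.

```python
from __future__ import annotations

import statistics
from typing import Callable, Dict, List, Tuple

Metrics = Dict[str, float]


def predict_bounds(history: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Acceptable range for one metric: historical mean +/- k standard deviations."""
    mean = statistics.fmean(history)
    spread = statistics.pstdev(history)
    return mean - k * spread, mean + k * spread


def out_of_bounds(history: Dict[str, List[float]], current: Metrics,
                  k: float = 3.0) -> List[str]:
    """Names of metrics whose current reading falls outside the predicted range."""
    bad = []
    for name, samples in history.items():
        low, high = predict_bounds(samples, k)
        value = current.get(name)
        if value is not None and not (low <= value <= high):
            bad.append(name)
    return bad


def incremental_deploy(fractions: List[float],
                       history: Dict[str, List[float]],
                       measure: Callable[[float], Metrics],
                       k: float = 3.0) -> Tuple[str, float, List[str]]:
    """Roll out to a growing share of the cluster; reject at the first bad reading.

    `measure(fraction)` is a hypothetical hook: it deploys the change to that
    fraction of servers and returns the metrics observed afterwards.
    """
    for fraction in fractions:
        bad = out_of_bounds(history, measure(fraction), k)
        if bad:
            return "rejected", fraction, bad   # caller reverts the change
    return "accepted", fractions[-1], []
```

For example, with an error-rate history of `[2, 3, 2, 1, 2]` per minute, a change whose error rate climbs as it reaches more servers is rejected at the first stage where the reading leaves the predicted range, before it reaches the whole cluster. A fixed `k` is the simplest choice; metrics with strong daily or weekend cycles would need bounds that depend on the time of day or week.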
Case Study: Sharding

Problem: Spread write queries across multiple databases

Solution:
• Intercept and redirect queries based on SQL comments
• Move one table or subsystem at a time
   • In our experience, one engineer can horizontally partition one table or small subsystem in one week
• New engineers figure this out in about 5 minutes

// Before: the query goes to the single database
db_query("INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

// After: the comment lets the query layer route the query to this customer's shard
db_query("/*shard customer://$customer_id */
          INSERT INTO inventory (customers_id, products_id)
          VALUES ($customer_id, $product_id)");

• Learning: cross-shard joins and transactions aren't required
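The comment-based routing can be sketched as a small Python class that reads the `/*shard customer://... */` hint and picks a database. The modulo mapping from customer id to shard, the shard names, and the `legacy` fallback database are assumptions for illustration; the slides do not say how IMVU assigned customers to shards.

```python
import re
from typing import List

# Matches a shard hint such as "/*shard customer://42 */" anywhere in the SQL text.
SHARD_HINT = re.compile(r"/\*\s*shard\s+customer://(\d+)\s*\*/")


class ShardRouter:
    """Hypothetical query router driven by the shard comment embedded in each query."""

    def __init__(self, shards: List[str], legacy: str):
        self.shards = shards   # names of the partitioned databases
        self.legacy = legacy   # original database for tables not yet partitioned

    def pick(self, sql: str) -> str:
        match = SHARD_HINT.search(sql)
        if match is None:
            # Untagged queries keep going to the original database, which is
            # what allows tables to be moved one at a time.
            return self.legacy
        customer_id = int(match.group(1))
        return self.shards[customer_id % len(self.shards)]
```

With four shards, `pick` sends a query tagged `customer://42` to `"db2"` (42 mod 4 = 2) and an untagged query to `"main"`. Modulo placement makes adding shards expensive because most customers would move; a lookup table from customer id to shard avoids that at the cost of one extra lookup per query.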
Case Study: Caching
Problem: Cache frequently read data in memcached

Solution:
• Intercept and cache queries based on SQL comments
db_query_cache(BUDDY_CACHE_TIME,
               "/*shard customer://$customer_id */
                /*cache-class customer://$customer_id/buddies */
                SELECT friend_id, buddy_order FROM customers_friends
                WHERE customers_id = $customer_id");

// A write to the same data flushes its cache class
db_query("/*shard customer://$customer_id */
          DELETE FROM customers_friends
          WHERE customers_id = $customer_id
          AND friend_id = $friend_id");
db_flush_cacheclass("customer://$customer_id/buddies");

• Learning: Flushing the cache is critical to users and performance
   – When a customer spends $24.95, they want the benefits immediately

• Learning: Test the cache behavior for critical systems
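A minimal Python sketch of comment-driven caching with flushable cache classes, loosely mirroring `db_query_cache` and `db_flush_cacheclass` (their real implementations are not shown in the slides). A dict stands in for memcached. Because memcached cannot delete keys by prefix, the sketch swaps in a common technique: each cache class has a generation number that is part of every cache key, and flushing the class increments it.

```python
from __future__ import annotations

import hashlib
import re
import time
from typing import Any, Callable, Dict, Tuple

# Matches hints such as "/*cache-class customer://42/buddies */".
CACHE_CLASS_HINT = re.compile(r"/\*\s*cache-class\s+(\S+)\s*\*/")
_MISS = object()


class QueryCache:
    """In-memory stand-in for memcached with flushable cache classes (a sketch).

    Each cache class carries a generation number baked into every key;
    flushing a class bumps its generation, so entries stored under the old
    number are never read again (memcached would evict them eventually).
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._entries: Dict[str, Tuple[float, Any]] = {}   # key -> (expires_at, value)
        self._generations: Dict[str, int] = {}             # cache class -> generation
        self._clock = clock

    def _key(self, cache_class: str, sql: str) -> str:
        generation = self._generations.get(cache_class, 0)
        digest = hashlib.sha1(sql.encode()).hexdigest()
        return f"{cache_class}#{generation}#{digest}"

    def query(self, ttl: float, sql: str, run_query: Callable[[str], Any]) -> Any:
        """Serve from cache when the SQL names a cache class, else hit the database."""
        match = CACHE_CLASS_HINT.search(sql)
        if match is None:
            return run_query(sql)
        key = self._key(match.group(1), sql)
        expires_at, value = self._entries.get(key, (0.0, _MISS))
        if value is not _MISS and expires_at > self._clock():
            return value
        value = run_query(sql)
        self._entries[key] = (self._clock() + ttl, value)
        return value

    def flush_class(self, cache_class: str) -> None:
        """Invalidate every cached query in the class by moving to a new generation."""
        self._generations[cache_class] = self._generations.get(cache_class, 0) + 1
```

After `flush_class("customer://42/buddies")`, the next buddy-list query for that customer misses the cache and reads fresh data, which is what a customer who just changed their friends expects to see. In a real memcached deployment the generation counters would themselves live in the cache, for example updated with memcached's `incr` command.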
Case Study: Steering Data Design

Problem: Improve database schemas and data design to meet scalability requirements without downtime

Solution:
• Measure to find the real problems (harder than it sounds)
• Migrate to a new design that takes advantage of sharding and/or caching
Case Study: Steering Data Design
Problem: You can't bulk-move large, frequently accessed data
Solution:
• Copy on read
   – Use when you are read bound
   – Reads check the cache, then the new location; if the data is missing there, it is read from the old location and copied to the new one
   – Writes go to the new location if the data has been migrated, otherwise to the old one

• Copy on write
   – Use when you are write bound
   – Reads check the cache, then the new location, then the old location
   – Writes go to the new location, first copying the data there if it is missing

• Copy all
   – Use when the file system fills up
   – Reads and writes go to the new location, falling back to the old location if the data is missing
   – A cron job copies the data a few records at a time
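The three migration strategies can be sketched against two in-memory locations. This Python class is hypothetical: plain dicts stand in for the old location, the new location, and the cache, and records are dicts so a write can be a partial update (which is why copy on write must copy the record forward before applying the change). Copy all is shown only as the batch job a cron entry would call; its reads can follow the same new-then-old path as `read_copy_on_write`.

```python
from __future__ import annotations

from typing import Any, Dict, Optional

Record = Dict[str, Any]


class MigratingStore:
    """Hypothetical store that is moving records from `old` to `new` without downtime."""

    def __init__(self, old: Dict[str, Record]):
        self.old = old                          # old location (original table or disk)
        self.new: Dict[str, Record] = {}        # new location
        self.cache: Dict[str, Record] = {}      # stand-in for memcached

    # Copy on read: use when read bound.
    def read_copy_on_read(self, key: str) -> Optional[Record]:
        if key in self.cache:
            return self.cache[key]
        record = self.new.get(key)
        if record is None and key in self.old:
            record = dict(self.old[key])
            self.new[key] = record              # migrate lazily on first read
        if record is not None:
            self.cache[key] = record
        return record

    def write_copy_on_read(self, key: str, changes: Record) -> None:
        # Migrated records are updated in the new location, all others in the old one.
        target = self.new if key in self.new else self.old
        target.setdefault(key, {}).update(changes)
        self.cache.pop(key, None)               # invalidate so readers see the write

    # Copy on write: use when write bound.
    def read_copy_on_write(self, key: str) -> Optional[Record]:
        if key in self.cache:
            return self.cache[key]
        record = self.new.get(key)
        if record is None:
            record = self.old.get(key)          # fall back without copying
        if record is not None:
            self.cache[key] = record
        return record

    def write_copy_on_write(self, key: str, changes: Record) -> None:
        if key not in self.new:
            # Copy the full record forward first so the partial update is not lost.
            self.new[key] = dict(self.old.get(key, {}))
        self.new[key].update(changes)
        self.cache.pop(key, None)

    # Copy all: batch job, e.g. run periodically from cron.
    def copy_batch(self, batch_size: int) -> int:
        """Copy up to `batch_size` records not yet in the new location; return the count."""
        moved = 0
        for key, record in self.old.items():
            if moved >= batch_size:
                break
            if key not in self.new:
                self.new[key] = dict(record)
                moved += 1
        return moved
```

The batch copier rescans all keys on each run for simplicity; a production job would keep a cursor so each run resumes where the previous one stopped.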
“Thank You for Listening!”
