Harmony in Tune
How we Refactored Cube
   to Terabyte Scale

   Philip (flip) Kromer
    Huston Hoburg
     infochimps.com
       Feb 15 2013
Big Data for All
why dashboards?
Lightweight Dashboards

•   Understand what’s happening

•   Understand data in context

•   NOT exploratory analytics

•   real-time insight...
    but not just about real-time

mainline: j.mp/sqcube
hi-scale branch: j.mp/icscube
The “Church of Graphs”
Predictive Kvetching
Lightweight Dashboards
Approach to Tuning


• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
cube is awesome
What’s so great?
• Streaming, real-time
• Ad-hoc data: write whatever you want
• Ad-hoc queries: make up new queries whenever
• Efficient (“pyramidal”) calculations
Event Stream
•{   time: "2013-02-15T01:02:03Z",
     type: "webreq", data: {
       path: "/order", method: "POST",
       duration: 50.7, status: 400,
       ua:"...MSIE 6.0..." } }

•{   time: "2013-02-15T01:02:03Z",
     type: "tweet", id: 8675309, data: {
       text: "MongoDB talk yay",
       retweet_count: 121,
       user: { screen_name: "infochimps",
         followers_count: 7851,
         lang: "en", ...} } }
Events vs Metrics
Event:
 •{    time: "2013-02-15T01:02:03Z",
       type: "tweet", id: 8675309, data: {
         text: "MongoDB talk yay",
         retweet_count: 121,
         user: { screen_name: "infochimps",
           followers_count: 7851,
           lang: "en", ...} } }
Metrics:
• “# of tweets in 10s bucket at 1:02:10 on 2013-02-15”
• “# of non-english-language tweets in 1hr bucket at ...”
Events vs Metrics
Event:
 •{   time: "2013-02-15T01:02:03Z",
      type: "webreq", data: {
        path: "/order", method: "POST",
        duration: 50.7, status: 400,
        ua:"...MSIE 6.0..." } }


Metrics:
• “# of requests in 10s bucket at 3:05:10 on 2013-02-15”
• “Average duration of requests with 4xx status in the 5
  minute bucket at 3:05:00 on 2013-02-15”
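
A minimal sketch of how the two example metrics above could be derived from raw events. The helper names (`bucketStart`, `countPerBucket`, `avg4xxDuration`) and the sample events are assumptions made for illustration; this is not Cube's evaluator.

```javascript
// Hypothetical helpers: illustrate "events in, one number per time bucket out".
const TEN_SEC = 10 * 1000;       // bucket sizes in milliseconds
const FIVE_MIN = 5 * 60 * 1000;

// Floor a timestamp to the start of its bucket and return it as an ISO string.
function bucketStart(time, step) {
  return new Date(Math.floor(Date.parse(time) / step) * step).toISOString();
}

// Metric: number of events per bucket.
function countPerBucket(events, step) {
  const counts = {};
  for (const ev of events) {
    const key = bucketStart(ev.time, step);
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Metric: average duration of requests with a 4xx status, per bucket.
function avg4xxDuration(events, step) {
  const acc = {};
  for (const ev of events) {
    const status = ev.data.status;
    if (status < 400 || status >= 500) continue;
    const key = bucketStart(ev.time, step);
    const a = acc[key] || (acc[key] = { sum: 0, n: 0 });
    a.sum += ev.data.duration;
    a.n += 1;
  }
  const out = {};
  for (const key of Object.keys(acc)) out[key] = acc[key].sum / acc[key].n;
  return out;
}

// Made-up sample events in the shape shown on the slide.
const events = [
  { time: "2013-02-15T03:05:11Z", type: "webreq", data: { duration: 50.0, status: 400 } },
  { time: "2013-02-15T03:05:14Z", type: "webreq", data: { duration: 10.0, status: 200 } },
  { time: "2013-02-15T03:06:02Z", type: "webreq", data: { duration: 30.0, status: 404 } },
];
console.log(countPerBucket(events, TEN_SEC));
console.log(avg4xxDuration(events, FIVE_MIN));
```

Each result is keyed by bucket start time, which is the "timestamped number, one per time bucket" shape of a metric.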
Events vs Metrics
• Events:
 • baskets of facts
 • narcissistic
 • LOTS AND LOTS
   { time: "2013-02-15T01:02:03Z",
     type: "webreq",
     data: {
       path: "/order",
       method: "POST",
       duration: 50.7,
       status: 400,
       ua:"...MSIE 6.0..." } }
• Metrics:
 • a timestamped number
 • look like the graph
 • one per time bucket
   { time: "2013-02-15T01:02:03Z",
     value: 90 }
billions and billions
3000 events/second
tuning methodology
Monkey See Monkey Do

            Google for
            the #s the
            cool kids use
Spinal Tap

        Turn
        everything
        to 11!!!!
Hillbilly Mechanic

            Rewrite for
            memcached
            HBase on
            Cassandra!!!
Moneybags

       SSD plz
       Moar CPU
       Moar RAM
       Moar Replica
Tuning: How to do it


• Measure: “Why can’t it be faster?”
• Harmonize: “Use it right”
• Tune: “Align it to production resources”
see through
 the magic
• Why can’t it be faster than it is now?
• dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5

• htop
• mongostat
Grok: client-side
• Made a sprayer to inject data
 • invalidate a time range at max speed
 • writes variously-shaped data: noise, ramp, sine, etc
• Or just reach into the DB and poke
 • delete range of metrics, leave events
 • delete range of events, leave metrics
Fault injection


• raise when packet comes in with certain flag
 •{   time: "2013...", data: {...},
      _raise:"db_write" }

• (only in development mode, obvs.)
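
A minimal sketch of the fault-injection idea above, assuming a write path that checks the `_raise` flag just before the write. The function names, the `env` argument, and the array standing in for the database are hypothetical; only the `_raise: "db_write"` flag comes from the slide.

```javascript
// Hypothetical sketch: raise on demand when an event carries a matching flag.
function maybeRaise(event, stage, env) {
  if (env !== "development") return;   // never inject faults outside development
  if (event && event._raise === stage) {
    throw new Error("injected fault at stage: " + stage);
  }
}

// Stand-in write path: `store` is a plain array playing the role of the database.
function writeEvent(event, store, env) {
  maybeRaise(event, "db_write", env);
  store.push(event);
  return store.length;
}

const store = [];
writeEvent({ time: "2013-02-15T01:02:03Z", data: {} }, store, "development");
try {
  writeEvent({ time: "2013-02-15T01:02:04Z", data: {}, _raise: "db_write" }, store, "development");
} catch (err) {
  console.log(err.message);
}
```

The flagged event never reaches the store, so the surrounding error handling can be exercised without breaking the real database.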
app-side tracing
  metalog.event('connect',
    { method: 'ws',
      ip: connection.remoteAddress,
      path: request.url }, 'minor');




• “Metalog” announces lifecycle progress:
 • writes to log...
 • ... or as cube metrics!
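
A minimal sketch of what a "metalog"-style helper might look like, assuming it fans each lifecycle event out to a log sink and an event sink. Only the call shape `metalog.event(name, data, level)` comes from the slide; `makeMetalog`, its options, and the sample values are hypothetical.

```javascript
// Hypothetical sketch of a lifecycle announcer with pluggable sinks.
function makeMetalog(options) {
  const sinks = [];
  if (options.log) {
    sinks.push(function (ev) { options.log(JSON.stringify(ev)); });   // write to log
  }
  if (options.putEvent) {
    sinks.push(options.putEvent);   // hand the same record on as an event
  }
  return {
    event: function (name, data, level) {
      const ev = {
        time: new Date().toISOString(),
        type: name,
        data: Object.assign({ level: level || "info" }, data),
      };
      sinks.forEach(function (sink) { sink(ev); });
      return ev;
    },
  };
}

// Usage with in-memory sinks so the sketch runs on its own.
const lines = [];
const emitted = [];
const metalog = makeMetalog({
  log: function (line) { lines.push(line); },
  putEvent: function (ev) { emitted.push(ev); },
});
metalog.event("connect", { method: "ws", ip: "127.0.0.1", path: "/example" }, "minor");
console.log(lines[0]);
```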
fits on machine
3000 events/second
• Rate:
 • 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk

• Expensive. Difficult.
 • 250 GB accumulated per day (@1000 bytes/ev)
 • 95 TB accumulated per year (@1000 bytes/ev)
Metrics
• Rate:
 • 3M tensec/year (π · 10^7 sec/year)


 • < 100 bytes/metric ...
• Manageable!
 • a 30 metric dashboard is ~ 10 GB/year @10sec
 • a 30 metric dashboard is ~ 170 MB/year @ 5min
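
The back-of-envelope figures for events and metrics can be reproduced as follows. The byte sizes are the slides' stated assumptions (1000 bytes per event, under 100 bytes per metric); the function name is hypothetical. At a full 100 bytes per metric the 5-minute case comes to roughly 315 MB/year, so the ~170 MB figure corresponds to about 54 bytes per metric.

```javascript
const SEC_PER_DAY = 86400;
const SEC_PER_YEAR = 365 * SEC_PER_DAY;            // 31,536,000, about pi * 1e7

// Events: 3000 events/second at an assumed 1000 bytes each.
const eventsPerDay = 3000 * SEC_PER_DAY;
const eventBytesPerDay = eventsPerDay * 1000;
const eventBytesPerYear = eventBytesPerDay * 365;

// Metrics: one stored value per metric per time bucket.
function metricBytesPerYear(metricCount, bucketSeconds, bytesPerMetric) {
  return metricCount * (SEC_PER_YEAR / bucketSeconds) * bytesPerMetric;
}

console.log((eventsPerDay / 1e6).toFixed(0) + " M events/day");
console.log((eventBytesPerDay / 1e9).toFixed(0) + " GB of events/day");
console.log((eventBytesPerYear / 1e12).toFixed(1) + " TB of events/year");
console.log((metricBytesPerYear(30, 10, 100) / 1e9).toFixed(1) + " GB/year, 30 metrics @10sec");
console.log((metricBytesPerYear(30, 300, 100) / 1e6).toFixed(0) + " MB/year, 30 metrics @5min");
```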
20% gains are boring
At scale, your first barriers are either:


• Easy
• Impossible
Metrics: 10 GB/year
Events: 10 TB/month
Scalability sí
Performance no
Still CPU and Memory Use

 • Problem
  • Mongo seems to be working
  • but high resident memory and fault rate
  • Memory-mapped Files
    • 1 TB of data served by 4 GB of RAM is no good
Capped Collections
• Fixed size circular queue
• records are in order of insertion
     A   B   C   D   A   E   F
• oldest records are discarded when full
  ...G   H   C   D   A   E   F   G ...
Capped Collections
• Extremely efficient on write
• Extremely efficient for insertion-order reads
• Very efficient if queries are ‘local’
 • events in same timebucket
       typically arrived at nearby times
       and so are nearby on disk
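
A toy model of the capped-collection behaviour described above: fixed capacity, insertion order preserved, oldest record overwritten when full. This is an in-memory illustration with a hypothetical `CappedQueue` class, not MongoDB code.

```javascript
// Fixed-size circular queue: writes always go to the next slot, wrapping around.
class CappedQueue {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0;    // slot the next insert will use
    this.count = 0;   // number of live records
  }

  insert(record) {
    this.slots[this.next] = record;   // overwrites the oldest record once full
    this.next = (this.next + 1) % this.capacity;
    this.count = Math.min(this.count + 1, this.capacity);
  }

  // Records from oldest to newest, i.e. insertion order.
  inOrder() {
    const start = (this.next - this.count + this.capacity) % this.capacity;
    const out = [];
    for (let i = 0; i < this.count; i++) {
      out.push(this.slots[(start + i) % this.capacity]);
    }
    return out;
  }
}

const q = new CappedQueue(6);
"ABCDEFGH".split("").forEach(function (r) { q.insert(r); });
console.log(q.slots.join(" "));       // physical layout: G H C D E F
console.log(q.inOrder().join(" "));   // insertion order: C D E F G H
```

Records written at nearby times sit in neighbouring slots, which is why time-local queries stay cheap.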
don’t like the answer?

change the question.
mainline

uncapped events

capped metrics:

metrics are a view on data
hi-scale branch

capped events

uncapped metrics:

events are ephemeral
Harmony



• Make your pattern of access
  match your system’s strengths and rhythm
Validate Mental Model
Easy fixes

• Duplicate requests = duplicate calculations
 • Cube patch for request queues exists
 • Easy fix!
• Non-pyramidal aggregates are inefficient
 • Remove until things are under control
 • ( solve paralyzing problems first )
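
A minimal sketch of why queueing duplicate requests helps: identical requests that arrive while a calculation is running share its result instead of starting another. This is not the Cube patch mentioned above, only an illustration; `coalesce`, `getMetric`, and the fake result are hypothetical.

```javascript
// Wrap a compute function so concurrent calls with the same key share one promise.
function coalesce(compute) {
  const inFlight = new Map();   // request key -> pending promise
  return function (key) {
    if (inFlight.has(key)) return inFlight.get(key);   // join the running calculation
    const p = new Promise(function (resolve) { resolve(compute(key)); });
    inFlight.set(key, p);
    const forget = function () { inFlight.delete(key); };
    p.then(forget, forget);     // allow a fresh calculation once this one settles
    return p;
  };
}

let calculations = 0;
const getMetric = coalesce(function (expression) {
  calculations += 1;            // count how often the expensive work really runs
  return new Promise(function (resolve) {
    setTimeout(function () { resolve({ expression: expression, value: 90 }); }, 10);
  });
});

// Three identical requests arrive at once.
const demo = Promise.all([
  getMetric("sum(webreq)"),
  getMetric("sum(webreq)"),
  getMetric("sum(webreq)"),
]).then(function (results) {
  console.log(results.length + " responses, " + calculations + " calculation(s)");
  return results;
});
```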
cube 101
Cube Systems
Collector
• Receives events
• writes to MongoDB
• marks metrics for re-calculation (“invalidates”)
Evaluator
• receives, parses requests for metrics
• calculates metrics “pyramidally”
• then stores them, cached
Pyramidal Aggregation

5min:  90
1min:  10 | 20 | 15 | 25 | 10 | 10
10s:   1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1 | 6 0 0 1 0 3
       ev ev ev ev ev ev ...
Uses Cached Results

5min:
1min:  10 | 20 | 15 | 25 | 10
10s:   1 5 2 0 2 0 | 6 4 7 1 0 2 | 2 3 2 4 2 2 | 5 5 4 6 4 1 | 2 7 0 0 0 1
                                                               ev ev ev ev ev ev ...
Pyramidal Aggregation
• calculates metrics...
 • from metrics and constants ... from metrics ...
    • from events
• (then stores them, cached)
  5 min ← 1 min ← 10 sec ← ev ev ev ev ev....
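
A minimal sketch of pyramidal aggregation with caching, assuming a simple count metric and three tiers (10 sec, 1 min, 5 min). The tier list, `makeEvaluator`, and the sample timestamps are hypothetical; Cube's evaluator is more general.

```javascript
const TIERS = [10, 60, 300];   // bucket sizes in seconds: 10s, 1min, 5min

// events: array of epoch-second timestamps (a stand-in for the events collection).
function makeEvaluator(events) {
  const cache = new Map();     // "tierSize:start" -> stored metric value
  let eventScans = 0;          // how many times we had to read raw events

  function count(tierIndex, start) {
    const key = TIERS[tierIndex] + ":" + start;
    if (cache.has(key)) return cache.get(key);   // use the cached result
    let value = 0;
    if (tierIndex === 0) {
      // Lowest tier: count raw events in [start, start + 10s).
      eventScans += 1;
      const end = start + TIERS[0];
      for (const t of events) if (t >= start && t < end) value += 1;
    } else {
      // Higher tiers: sum the buckets of the tier below.
      const step = TIERS[tierIndex - 1];
      for (let s = start; s < start + TIERS[tierIndex]; s += step) {
        value += count(tierIndex - 1, s);
      }
    }
    cache.set(key, value);     // then store it, cached
    return value;
  }

  return { count: count, scans: function () { return eventScans; } };
}

const evaluator = makeEvaluator([1, 5, 12, 61, 65, 130, 299]);
console.log(evaluator.count(2, 0));    // 5min bucket built from 1min from 10s
console.log(evaluator.count(1, 60));   // served from cache, no new event scans
console.log(evaluator.scans());
```

Only sums, counts, and similar aggregates can be rolled up this way, because the parent value is computable from the child values alone.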
fast writes
how fast can we write?



       FAST
streaming writes: way efficient
locked out
Writes and Invalidations
Inserts Stop Every 5s
 •   working

 •   working

 •   ANGRY

 •   ANGRY

 •   working

 •   working
Thanks, mongostat!


•   working
•   working
•   ANGRY
                     ...
•   ANGRY
•   working
•   working
                           (simulated)
Inserts Stop Every 5s
 Events Collection (capped: ...G H C D A E F G...)
  • hi-speed writes
  • localized reads
 Metrics Collection (records scattered across the collection)
  • randomish reads
  • hi-speed deletes and updates
Inserts Stop Every 5s
• What’s really going on?
 • Database write locks
 • Events and metrics have conflicting locks
 • Solution: split the databases
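
A minimal sketch of the split, assuming the write lock is held per database, so that events and metrics stop blocking each other once they live in separate databases. The database names and `databaseFor` are hypothetical; the `_events` / `_metrics` collection suffixes follow Cube's naming.

```javascript
// Assumed database names, one per access pattern.
const DATABASES = {
  events: "cube_events",     // high-speed appends, localized reads
  metrics: "cube_metrics",   // random reads, frequent updates and deletes
};

// Route a collection to its database by suffix.
function databaseFor(collectionName) {
  if (/_events$/.test(collectionName)) return DATABASES.events;
  if (/_metrics$/.test(collectionName)) return DATABASES.metrics;
  throw new Error("unknown collection kind: " + collectionName);
}

console.log(databaseFor("webreq_events"));
console.log(databaseFor("webreq_metrics"));
```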
fast reads
Pre-cache Metrics
• Keep metrics fresh (Warmer)
• Only calculate recent updates (Horizons)
fancy metrics
Non-pyramidal Aggregates

 • Can’t calculate from warmed metrics
 • Store values with counts in metrics
  • Counts can be vivified for aggregations
  • Smaller footprint than full events
  • Works best for dense, finite values
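
A minimal sketch of "values with counts": each bucket stores a value-to-count histogram, histograms merge upward like sums, and a non-pyramidal aggregate such as the median is read off the merged counts. All function names and the sample status codes are hypothetical.

```javascript
// Build a value -> count histogram for one time bucket.
function histogram(values) {
  const h = {};
  for (const v of values) h[v] = (h[v] || 0) + 1;
  return h;
}

// Merging histograms is pyramidal: counts simply add.
function merge(histograms) {
  const out = {};
  for (const h of histograms) {
    for (const v of Object.keys(h)) out[v] = (out[v] || 0) + h[v];
  }
  return out;
}

// Walk the values in order until half of the total count is covered.
function median(h) {
  const values = Object.keys(h).map(Number).sort(function (a, b) { return a - b; });
  const total = values.reduce(function (n, v) { return n + h[v]; }, 0);
  let seen = 0;
  for (const v of values) {
    seen += h[v];
    if (seen * 2 >= total) return v;   // lower median when the total is even
  }
  return undefined;                    // empty histogram
}

function distinct(h) {
  return Object.keys(h).length;
}

// Three 10-second buckets of HTTP status codes (made-up data).
const buckets = [[200, 200, 404], [200, 500], [404, 404, 404, 200]].map(histogram);
const merged = merge(buckets);
console.log(merged, median(merged), distinct(merged));
```

The histogram stays small when the values are dense and finite (status codes, small integers), which matches the last bullet above; for continuous values such as durations it can grow as large as the events themselves.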
finally, scaling
Multicore


• MongoDB
 • Writes limited to single core
 • Requires sharding for multicore
Multicore

• Cube (node.js)
 • Concurrent, but not multi-threaded
• Easy solution
 • Multiple collectors on different ports
 • Produces redundant invalidations
 • Requires external load balancing
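
A minimal sketch of spreading writes over several collector processes with round-robin selection. The ports, host, and function names are assumptions; in practice this role would more likely be played by an external load balancer, as the slide notes.

```javascript
// Return a function that cycles through the targets in order.
function roundRobin(targets) {
  let i = 0;
  return function next() {
    const target = targets[i];
    i = (i + 1) % targets.length;
    return target;
  };
}

// Assumed layout: one collector process per core, each on its own port.
const collectors = [1080, 1081, 1082, 1083].map(function (port) {
  return "http://localhost:" + port + "/1.0/event/put";
});
const nextCollector = roundRobin(collectors);

for (let n = 0; n < 5; n++) console.log(nextCollector());
```

Because every collector marks metrics for recalculation independently, the same time range may be invalidated more than once, which is the redundancy the slide mentions.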
Hardware
• High Memory
 • Capped events size scales with memory
• CPU
 • Mongo / cube not optimized for multicore
 • Faster cores
• EC2 Best value: m2.2xlarge
 • < $700/mo, 34.2GB RAM, 13 bogo-hertz
Cloud helps


• Tune machines to application

• Dedicating databases for each application makes life
  a lot easier
jobs@infochimps.com




github.com/infochimps-labs
good ideas that
  didn’t help
Queues


• Different queueing methods
• Should optimize metric calculations
 • No significant improvement
Locks: update VS remove

   • Uncapped metrics allow ‘remove’ as
     invalidation option
   • Remove doesn’t help with database locks
   • It was a stupid idea anyway: that’s OK
    • “Hey, poke it and see what happens!”
Mongo Aggregations

• Mongo has aggregations!
• Node ends up working better
 • Mongo aggregations aren’t faster
 • Less flexible
 • Would require query language rewrite
Why not Graphite?

• Data model
 • Metrics-centric vs Events-centric
    (metrics code not intertwingled with app code)
• Environment familiarity
 • Cube: d3, node.js, mongo
 • Graphite: Django, Whisper, C

Playing in Tune: How We Refactored Cube to Terabyte Scale

  • 1. Harmony in Tune How we Refactored Cube to Terabyte Scale Philip (flip) Kromer Huston Hoburg infochimps.com Feb 15 2013
  • 5. Lightweight Dashboards • Understand what’s happening • Understand data in context • NOT exploratory analytics • real-time insight... but not just about real-time mainline: j.mp/sqcube hi-scale branch: j.mp/icscube
  • 6. The “Church of Graphs”
  • 9. Approach to Tuning • Measure: “Why can’t it be faster?” • Harmonize: “Use it right” • Tune: “Align it to production resources”
  • 11. What’s so great? • Streaming, real-time • Ad-hoc data: write whatever you want • Ad-hoc queries: make up new queries whenever • Efficient (“pyramidal”) calculations
  • 12. Event Stream •{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } } •{ time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } }
  • 13. Events vs Metrics Event: •{ time: "2013-02-15T01:02:03Z", type: "tweet", id: 8675309, data: { text: "MongoDB talk yay", retweet_count: 121, user: { screen_name: "infochimps", followers_count: 7851, lang: "en", ...} } } Metrics: • “# of tweets in 10s bucket at 1:02:10 on 2013-02-15” • “# of non-english-language tweets in 1hr bucket at ...”
  • 14. Events vs Metrics Event: •{ time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } } Metrics: • “# of requests in 10s bucket at 3:05:10 on 2013-02-15” • “Average duration of requests with 4xx status in the 5 minute bucket at 3:05:00 on 2013-02-15”
  • 15. Events vs Metrics • Events: • baskets of facts • narcissistic • LOTS AND LOTS { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } }
  • 16. Events vs Metrics • Events: • baskets of facts • narcissistic • LOTS AND LOTS { time: "2013-02-15T01:02:03Z", type: "webreq", data: { path: "/order", method: "POST", duration: 50.7, status: 400, ua:"...MSIE 6.0..." } } • Metrics: • a timestamped number • look like the graph • one per time bucket { time: "2013-02-15T01:02:03Z", value: 90 }
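The event/metric distinction on the slides above can be sketched in a few lines of node.js. This is only an illustration, not Cube's code: `bucketStart` and `countMetric` are hypothetical helpers, and labelling each bucket by its start time is an assumption.

```javascript
// Hypothetical helper: floor an ISO timestamp to the start of its bucket.
function bucketStart(isoTime, bucketMs) {
  const t = Date.parse(isoTime);
  return new Date(Math.floor(t / bucketMs) * bucketMs).toISOString();
}

// Hypothetical helper: count matching events per bucket and return
// metrics as { time, value } records, one per non-empty time bucket.
function countMetric(events, bucketMs, predicate = () => true) {
  const counts = new Map();
  for (const ev of events) {
    if (!predicate(ev)) continue;
    const key = bucketStart(ev.time, bucketMs);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0))
    .map(([time, value]) => ({ time, value }));
}

// Made-up sample events in the shape shown on the slides.
const events = [
  { time: "2013-02-15T01:02:03Z", type: "webreq", data: { status: 400, duration: 50.7 } },
  { time: "2013-02-15T01:02:07Z", type: "webreq", data: { status: 200, duration: 12.1 } },
  { time: "2013-02-15T01:02:14Z", type: "webreq", data: { status: 404, duration: 8.3 } },
];

// "# of requests in each 10s bucket"
const tenSec = countMetric(events, 10 * 1000);
// "# of requests with 4xx status in each 5 minute bucket"
const fourXX = countMetric(events, 5 * 60 * 1000,
  (ev) => ev.data.status >= 400 && ev.data.status < 500);

console.log(tenSec, fourXX);
```

Each event keeps its whole basket of facts; each metric is just a timestamped number, which is why the metrics side stays small.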
  • 20. Monkey See Monkey Do Google for the #s the cool kids use
  • 21. Spinal Tap Turn everything to 11!!!!
  • 22. Hillbilly Mechanic Rewrite for memcached HBase on Cassandra!!!
  • 23. Moneybags SSD plz Moar CPU Moar RAM Moar Replica
  • 24. Tuning How to do it • Measure: “Why can’t it be faster?” • Harmonize: “Use it right” • Tune: “Align it to production resources”
  • 26. • Why can’t it be faster than it is now?
  • 27. • dstat (http://j.mp/dstatftw): dstat -drnycmf -t 5 • htop • mongostat
  • 28. Grok: client-side • Made a sprayer to inject data • invalidate a time range at max speed • writes variously-shaped data: noise, ramp, sine, etc • Or just reach into the DB and poke • delete range of metrics, leave events • delete range of events, leave metrics
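A sprayer of the kind described above might look roughly like this. It is a sketch under assumptions: the `spray` function, the `shapes` table, and the `"test"` event type are all hypothetical, and the real sprayer would send the events to a collector rather than return them.

```javascript
// Hypothetical shape generators: each maps step i of n to a value.
const shapes = {
  ramp: (i, n) => (n > 1 ? i / (n - 1) : 0),       // 0 up to 1
  sine: (i, n) => Math.sin((2 * Math.PI * i) / n), // one full cycle
  noise: (i, n, rand) => rand(),                   // uniform noise
};

// Build n synthetic events, stepMs apart, starting at startMs.
// `rand` is injectable so that noise can be made repeatable in tests.
function spray(shape, startMs, stepMs, n, rand = Math.random) {
  const out = [];
  for (let i = 0; i < n; i++) {
    out.push({
      time: new Date(startMs + i * stepMs).toISOString(),
      type: "test", // hypothetical event type
      data: { value: shapes[shape](i, n, rand) },
    });
  }
  return out;
}

const start = Date.parse("2013-02-15T01:02:00Z");
const ramp = spray("ramp", start, 1000, 5);
console.log(ramp.map((ev) => ev.data.value));
```

Known shapes make it easy to eyeball whether the dashboard draws what was sent.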
  • 29. Fault injection • raise when packet comes in with certain flag •{ time: "2013...", data: {...}, _raise:"db_write" } • (only in development mode, obvs.)
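The `_raise` flag above could be honoured with a small guard like the following. This is a minimal sketch, not the authors' implementation: `maybeRaise`, `handlePacket`, `InjectedFault`, and the `"parse"` stage name are hypothetical; only `_raise: "db_write"` comes from the slide.

```javascript
// Hypothetical error type so injected faults are easy to tell apart.
class InjectedFault extends Error {}

// Throw if the packet asks for a fault at this stage.
// Only active in development mode, as the slide insists.
function maybeRaise(packet, stage, env = process.env.NODE_ENV) {
  if (env !== "development") return;
  if (packet._raise === stage) {
    throw new InjectedFault(`injected fault at ${stage}`);
  }
}

// Hypothetical packet handler with two instrumented stages.
function handlePacket(packet, env) {
  maybeRaise(packet, "parse", env);
  // ... parsing would happen here ...
  maybeRaise(packet, "db_write", env);
  // ... the database write would happen here ...
  return "stored";
}

const packet = { time: "2013-02-15T01:02:03Z", data: {}, _raise: "db_write" };
try {
  handlePacket(packet, "development");
} catch (err) {
  console.log(err.message);
}
```

Because the check is keyed on the stage name, one test packet can exercise any failure path without touching the database.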
  • 30. app-side tracing metalog.event('connect', { method: 'ws', ip: connection.remoteAddress, path: request.url }, 'minor'); • “Metalog” announces lifecycle progress: • writes to log... • ... or as cube metrics!
  • 33. 3000 events/second • Rate: • 3000 ev/sec ≈ 250 M ev/day ≈ 2 BILLION/wk • Expensive. Difficult. • 250 GB accumulated per day (@1000 bytes/ev) • 95 TB accumulated per year (@1000 bytes/ev)
  • 34. Metrics • Rate: • 3M tensec/year (π · 10^7 sec/year) • < 100 bytes/metric ... • Manageable! • a 30 metric dashboard is ~ 10 GB/year @10sec • a 30 metric dashboard is ~ 170 MB/year @ 5min
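The back-of-envelope numbers on the two slides above can be reproduced directly. The sketch assumes a 365-day year and takes the slides' 1000 bytes/event and 100 bytes/metric at face value (the latter is stated only as an upper bound).

```javascript
const SEC_PER_DAY = 86400;
const SEC_PER_YEAR = 365 * SEC_PER_DAY; // 31,536,000, close to pi * 10^7

// Events: 3000 per second at 1000 bytes each.
const evPerSec = 3000;
const bytesPerEvent = 1000;
const evPerDay = evPerSec * SEC_PER_DAY;                // about 250 M
const evPerWeek = evPerDay * 7;                         // about 2 billion
const eventGBPerDay = (evPerDay * bytesPerEvent) / 1e9; // about 250 GB
const eventTBPerYear = (eventGBPerDay * 365) / 1e3;     // about 95 TB

// Metrics: one value per 10-second bucket, at most 100 bytes each.
const bytesPerMetric = 100;
const metricsOnDashboard = 30;
const tensecPerYear = SEC_PER_YEAR / 10;                // about 3 M
const dashGBPerYear =
  (metricsOnDashboard * tensecPerYear * bytesPerMetric) / 1e9; // about 10 GB

console.log({ evPerDay, evPerWeek, eventGBPerDay, eventTBPerYear, tensecPerYear, dashGBPerYear });
```

The gap is roughly four orders of magnitude, which is the argument for keeping metrics and treating events as ephemeral.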
  • 35. 20% gains are boring At scale, your first barriers are either: • Easy • Impossible Metrics: 10 GB/year Events: 10 TB/month
  • 37. Still CPU and Memory Use • Problem • Mongo seems to be working • but high resident memory and fault rate • Memory-mapped Files • 1 TB of data served by 4 GB of RAM is no good
  • 38. Capped Collections • Fixed size circular queue • records are in order of insertion • oldest records are discarded when full [diagram: A B C D A E F, then ...G H C D A E F G ...]
  • 39. Capped Collections • Extremely efficient on write • Extremely efficient for insertion-order reads • Very efficient if queries are ‘local’ • events in same timebucket typically arrived at nearby times and so are nearby on disk [diagram: A B C D A E F]
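The circular-queue behaviour described above can be modelled with a small ring buffer. This is a toy model for intuition only, not how MongoDB stores capped collections on disk; the `RingBuffer` class is hypothetical.

```javascript
// Toy model of a capped collection: a fixed-size circular queue.
class RingBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.slots = new Array(capacity);
    this.next = 0; // slot that the next insert will overwrite
    this.size = 0;
  }

  // Writes never search or move data: they overwrite the oldest slot.
  insert(record) {
    this.slots[this.next] = record;
    this.next = (this.next + 1) % this.capacity;
    if (this.size < this.capacity) this.size += 1;
  }

  // Insertion-order read, oldest surviving record first.
  toArray() {
    const start = this.size < this.capacity ? 0 : this.next;
    const out = [];
    for (let i = 0; i < this.size; i++) {
      out.push(this.slots[(start + i) % this.capacity]);
    }
    return out;
  }
}

const capped = new RingBuffer(4);
for (const id of ["A", "B", "C", "D", "E", "F"]) capped.insert(id);
const surviving = capped.toArray();
console.log(surviving); // A and B have been discarded
```

In MongoDB itself the equivalent is a collection created with the `capped: true` and `size` options of `createCollection`.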
  • 40. don’t like the answer? change the question.
  • 42. hi-scale branch capped events uncapped metrics: events are ephemeral
  • 43. Harmony • Make your pattern of access match your system’s strengths and rhythm
  • 45. Easy fixes • Duplicate requests = duplicate calculations • Cube patch for request queues exists • Easy fix! • Non-pyramidal are inefficient • Remove until things are under control • ( solve paralyzing problems first )
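The "duplicate requests = duplicate calculations" fix amounts to coalescing identical in-flight requests. The following is a sketch of that general idea, not the Cube patch mentioned on the slide; `RequestQueue` and its callback signature are hypothetical.

```javascript
// Coalesce identical in-flight requests so each is computed only once.
class RequestQueue {
  // compute(key, done) must call done(result) exactly once.
  constructor(compute) {
    this.compute = compute;
    this.pending = new Map(); // key -> callbacks waiting on that key
  }

  request(key, callback) {
    if (this.pending.has(key)) {
      // Already being calculated: just wait for the same answer.
      this.pending.get(key).push(callback);
      return;
    }
    this.pending.set(key, [callback]);
    this.compute(key, (result) => {
      const waiters = this.pending.get(key);
      this.pending.delete(key);
      for (const cb of waiters) cb(result);
    });
  }
}

// Demo with a fake "calculation" that we finish by hand.
let calls = 0;
const finishers = [];
const queue = new RequestQueue((key, done) => {
  calls += 1;
  finishers.push(() => done(key.length));
});

const results = [];
queue.request("sum(webreq)", (r) => results.push(r));
queue.request("sum(webreq)", (r) => results.push(r)); // duplicate
finishers.forEach((finish) => finish());
console.log(calls, results);
```

Two dashboards asking the same question at the same moment then cost one calculation rather than two.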
  • 48. Collector • Receives events • writes to MongoDB • marks metrics for re-calculation (“invalidates”)
  • 49. Evaluator • receives, parses requests for metrics • calculates metrics “pyramidally” • then stores them, cached
  • 50. Pyramidal Aggregation [diagram] 5min: 90 | 1min: 10 20 15 25 10 10 | 10s: 1 5 2 0 2 0, 6 4 7 1 0 2, 2 3 2 4 2 2, 5 5 4 6 4 1, 2 7 0 0 0 1, 6 0 0 1 0 3 | events: ev ev ev ev ev ev ...
  • 51. Pyramidal Aggregation [diagram] 5min: (blank) | 1min: (blank) | 10s: 1 5 2 0 2 0, 6 4 7 1 0 2, 2 3 2 4 2 2 | events: ev ev ev ev ev ev ...
  • 52. Uses Cached Results [diagram] 5min: (blank) | 1min: 10 20 15 25 10 | 10s: 1 5 2 0 2 0, 6 4 7 1 0 2, 2 3 2 4 2 2, 5 5 4 6 4 1, 2 7 0 0 0 1 | events: ev ev ev ev ev ev ...
  • 53. Pyramidal Aggregation • calculates metrics... • from metrics and constants ... from metrics ... • from events • (then stores them, cached) 5 min 1 min 10 sec ev ev ev ev ev....
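The pyramid above (events feed 10-second buckets, which feed 1-minute buckets, which feed 5-minute buckets, each tier cached) can be sketched as a recursive, memoized sum. This is an in-memory illustration under assumptions, not Cube's evaluator: the `PyramidCounter` class, its cache-key format, and the `eventScans` counter are hypothetical.

```javascript
const TIERS = [10e3, 60e3, 300e3]; // 10 s, 1 min, 5 min, in milliseconds

class PyramidCounter {
  constructor(events) {
    this.events = events;   // [{ time: <ms since epoch> }, ...]
    this.cache = new Map(); // "tierMs:start" -> cached count
    this.eventScans = 0;    // how often raw events had to be read
  }

  // Count events in the bucket of the given tier starting at `start`.
  count(tierIndex, start) {
    const tierMs = TIERS[tierIndex];
    const key = `${tierMs}:${start}`;
    if (this.cache.has(key)) return this.cache.get(key);
    let value = 0;
    if (tierIndex === 0) {
      // Bottom tier: the only place that touches raw events.
      this.eventScans += 1;
      for (const ev of this.events) {
        if (ev.time >= start && ev.time < start + tierMs) value += 1;
      }
    } else {
      // Higher tiers: sum the (possibly cached) tier below.
      const childMs = TIERS[tierIndex - 1];
      for (let t = start; t < start + tierMs; t += childMs) {
        value += this.count(tierIndex - 1, t);
      }
    }
    this.cache.set(key, value);
    return value;
  }

  // What a collector does on a new event: drop the one cached bucket
  // per tier that contains it, so only that path is recalculated.
  invalidate(time) {
    for (const tierMs of TIERS) {
      this.cache.delete(`${tierMs}:${Math.floor(time / tierMs) * tierMs}`);
    }
  }
}

const pyramid = new PyramidCounter(
  [1000, 2000, 15000, 70000, 290000].map((time) => ({ time })));
const first = pyramid.count(2, 0);  // fills every tier's cache
const scansAfterFirst = pyramid.eventScans;
pyramid.events.push({ time: 61000 });
pyramid.invalidate(61000);
const second = pyramid.count(2, 0); // rescans a single 10 s bucket
console.log(first, scansAfterFirst, second, pyramid.eventScans);
```

Only counts and sums compose this way, which is what makes them "pyramidal"; a median of medians, for example, is not the overall median.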
  • 55. how fast can we write?
  • 56. how fast can we write? FAST streaming writes: way efficient
  • 59. Inserts Stop Every 5s • working • working • ANGRY • ANGRY • working • working
  • 60. Thanks, mongostat! • working • working • ANGRY ... • ANGRY • working • working (simulated)
  • 61. Inserts Stop Every 5s [diagram: Events Collection (...G H C D A E F G ...): hi-speed writes, localized reads; Metrics Collection: randomish hi-speed reads, deletes, updates]
  • 62. Inserts Stop Every 5s [diagram: Events Collection (...G H C D A E F G ...): hi-speed writes, localized reads; Metrics Collection: randomish hi-speed reads, deletes, updates]
  • 63. Inserts Stop Every 5s • What’s really going on? • Database write locks • Events and metrics have conflicting locks • Solution: split the databases [diagram: Events Collection (...G H C D A E F G ...): hi-speed writes, localized reads; Metrics Collection: randomish hi-speed reads, deletes]
  • 65. Pre-cache Metrics • Keep metrics fresh (Warmer) • Only calculate recent updates (Horizons)
  • 67. Non-pyramidal Aggregates • Can’t calculate from warmed metrics • Store values with counts in metrics • Counts can be vivified for aggregations • Smaller footprint than full events • Works best for dense, finite values
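One way to read "store values with counts" is as a per-bucket histogram that can be merged upward and then expanded again when a non-pyramidal aggregate such as a median is needed. The sketch below follows that reading; `histogram`, `merge`, and `median` are hypothetical helpers, and the lower-median convention is an assumption.

```javascript
// Store each bucket as value -> count instead of as full events.
function histogram(values) {
  const h = new Map();
  for (const v of values) h.set(v, (h.get(v) || 0) + 1);
  return h;
}

// Histograms merge by adding counts, so they roll up like a pyramid.
function merge(histograms) {
  const out = new Map();
  for (const h of histograms) {
    for (const [v, n] of h) out.set(v, (out.get(v) || 0) + n);
  }
  return out;
}

// Lower median: smallest value whose cumulative count reaches half.
function median(h) {
  const entries = [...h.entries()].sort(([a], [b]) => a - b);
  const total = entries.reduce((sum, [, n]) => sum + n, 0);
  if (total === 0) return undefined;
  const target = Math.ceil(total / 2);
  let seen = 0;
  for (const [v, n] of entries) {
    seen += n;
    if (seen >= target) return v;
  }
}

// Made-up HTTP status codes from two adjacent time buckets.
const bucketA = histogram([200, 200, 404]);
const bucketB = histogram([200, 500, 500, 500]);
const overall = median(merge([bucketA, bucketB]));
console.log(median(bucketA), median(bucketB), overall);
```

The per-bucket medians (200 and 500) cannot be combined into the true overall median (404), but the merged counts can. With dense, finite values such as status codes the histogram stays far smaller than the events behind it.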
  • 69. Multicore • MongoDB • Writes limited to single core • Requires sharding for multicore
  • 70. Multicore • Cube (node.js) • Concurrent, but not multi-threaded • Easy solution • Multiple collectors on different ports • Produces redundant invalidations • Requires external load balancing
  • 72. Hardware • High Memory • Capped events size scales with memory • CPU • Mongo / cube not optimized for multicore • Faster cores • EC2 Best value: m2.2xlarge • < $700/mo, 34.2GB RAM, 13 bogo-hertz
  • 73. Cloud helps • Tune machines to application • Dedicating databases for each application makes life a lot easier
  • 74. Cloud helps • Tune machines to application
  • 76. good ideas that didn’t help
  • 77. Queues • Different queueing methods • Should optimize metric calculations • No significant improvement
  • 78. Locks: update VS remove • Uncapped metrics allow ‘remove’ as invalidation option • Remove doesn’t help with database locks • It was a stupid idea anyway: that’s OK • “Hey, poke it and see what happens!”
  • 79. Mongo Aggregations • Mongo has aggregations! • Node ends up working better • Mongo aggregations aren’t faster • Less flexible • Would require query language rewrite
  • 80. Why not Graphite? • Data model • Metrics-centric vs Events-centric (metrics code not intertwingled with app code) • Environment familiarity • Cube: d3, node.js, mongo • Graphite: Django, Whisper, C