Massively Sharded MySQL

Evan Elias
Velocity Europe 2011
Tumblrʼs Size and Growth

                        1 Year Ago          Today
   Impressions          3 Billion/month     15 Billion/month
   Total Posts          1.5 Billion         12.5 Billion
   Total Blogs          9 Million           33 Million
   Developers           3                   17
   Sys Admins           1                   5
   Total Staff (FT)     13                  55
Our databases and dataset
        •   Machines dedicated to MySQL: over 175
             •   Thatʼs roughly how many production machines we had in total a year ago

        •   Relational data on master databases: over 11 terabytes
        •   Unique rows: over 25 billion




MySQL Replication 101
        •   Asynchronous
        •   Single-threaded SQL execution on slave
        •   Masters can have multiple slaves
        •   A slave can only have one master
        •   Can be hierarchical, but complicates failure-handling
        •   Keep two standby slaves per pool: one to promote when a master fails, and
            the other to bring up additional slaves quickly
        •   Scales reads, not writes

Why Partition?
       Reason 1: Write scalability
        •   No other way to scale writes beyond the limits of one machine
        •   During peak insert times, youʼll likely start hitting lag on slaves before your
            master shows a concurrency problem




Why Partition?
       Reason 2: Data size
        •   Working set wonʼt fit in RAM
        •   SSD performance drops as disk fills up
        •   Risk of completely full disk
        •   Operational difficulties: slow backups, longer to spin up new slaves
        •   Fault isolation: all of your data in one place = single point of failure affecting
            all users


Types of Partitioning
       •   Divide a table
                •   Horizontal Partitioning
                •   Vertical Partitioning


       •   Divide a dataset / schema
                •   Functional Partitioning




Horizontal Partitioning
       Divide a table by relocating sets of rows
        •   MySQL has some internal support for this, letting you transparently divide a
            table into several files, but with limitations
        •   Sharding is the implementation of horizontal partitioning outside of MySQL
            (at the application level or service level). Each partition is a separate table.
            They may be located in different database schemas and/or different instances
            of MySQL.




Vertical Partitioning
       Divide a table by relocating sets of columns
        •   Not supported internally by MySQL, though you can do it manually by
            creating separate tables.
        •   Not recommended in most cases – if your data is already normalized, then
            vertical partitioning introduces unnecessary joins
        •   If your partitions are on different MySQL instances, then youʼre doing these
            “joins” in application code instead of in SQL




Functional Partitioning
       Divide a dataset by moving one or more tables
        •   First eliminate all JOINs across tables in different partitions
        •   Move tables to new partitions (separate MySQL instances) using selective
            dumping, followed by replication filters
        •   Often just a temporary solution. If the table eventually grows too large to fit on
            a single machine, youʼll need to shard it anyway.




When to Shard
        •   Sharding is very complex, so itʼs best not to shard until itʼs obvious that you
            will actually need to!
        •   Predict when you will hit write scalability issues — determine this on spare
            hardware
        •   Predict when you will hit data size issues — calculate based on your growth
            rate
        •   Functional partitioning can buy time




Sharding Decisions
        •   Sharding key — a core column present (or derivable) in most tables.
        •   Sharding scheme — how you will group and home data (ranges vs hash vs
            lookup table)
        •   How many shards to start with, or equivalently, how much data per shard
        •   Shard colocation — do shards coexist within a DB schema, a MySQL
            instance, or a physical machine?




Sharding Schemes
       Determining which shard a row lives on (all three approaches are sketched in code below)
        •   Ranges: Easy to implement and trivial to add new shards, but requires
            frequent and uneven rebalancing due to user behavior differences.
        •   Hash or modulus: Apply a function to the sharding key to determine which
            shard. Simple to implement, and distributes data evenly. Incredibly difficult to
            add new shards or rebalance existing ones.
        •   Lookup table: Highest flexibility, but impacts performance and adds a single
            point of failure. Lookup table may eventually become too large.
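
A minimal Python sketch of all three schemes, with illustrative shard names, boundaries, and counts (none of these specifics come from the talk):

```python
import bisect

# Ranges: sorted upper bounds (exclusive) mapped to shard names.
RANGE_MAP = [(500, "shard_a"), (1000, "shard_b"), (float("inf"), "shard_c")]

def shard_by_range(key: int) -> str:
    bounds = [upper for upper, _ in RANGE_MAP]
    return RANGE_MAP[bisect.bisect_right(bounds, key)][1]

# Hash/modulus: spreads data evenly, but changing NUM_SHARDS remaps
# almost every row, which is why rebalancing is so hard.
NUM_SHARDS = 16

def shard_by_hash(key: int) -> str:
    return f"shard_{key % NUM_SHARDS}"

# Lookup table: a stand-in dict for a central metadata store
# (maximum flexibility, but an extra hop and a single point of failure).
LOOKUP_TABLE: dict[int, str] = {}

def shard_by_lookup(key: int) -> str:
    return LOOKUP_TABLE[key]
```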



Application Requirements
        •   Sharding key must be available for all frequent look-up operations. For
            example, you canʼt efficiently look up posts by their own ID anymore; you also
            need the blog ID to know which shard to hit.
        •   Support for read-only and offline shards. App code needs to gracefully handle
            planned maintenance and unexpected failures.
        •   Support for reading and writing to different MySQL instances for the same
            shard range — not for scaling reads, but for the rebalancing process




Service Requirements
        •   ID generation for PK of sharded tables (one possible approach is sketched below)
        •   Nice-to-have: Centralized service for handling common needs
               •      Querying multiple shards simultaneously
               •      Persistent connections
               •      Centralized failure handling
               •      Parsing SQL to determine which shard(s) to send a query to
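
The talk doesnʼt say how the IDs are generated; one well-known approach from that era is a Flickr-style MySQL ticket server, sketched here (the driver, table, and database names are assumptions):

```python
import pymysql  # assumed MySQL client library

# One-time setup on a dedicated ticket database (not a shard):
#   CREATE TABLE tickets (
#     id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
#     stub CHAR(1) NOT NULL,
#     UNIQUE KEY (stub)
#   ) ENGINE=InnoDB;

def next_id(conn) -> int:
    with conn.cursor() as cur:
        # REPLACE keeps reusing the single row while bumping AUTO_INCREMENT,
        # so every call returns a fresh, globally unique ID.
        cur.execute("REPLACE INTO tickets (stub) VALUES ('a')")
        new_id = cur.lastrowid
    conn.commit()
    return new_id
```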




Operational Requirements
        •   Automation for adding and rebalancing shards, and sufficient monitoring to
            know when each is necessary
        •   Nice-to-have: Support for multiple MySQL instances per machine — makes
            cloning and replication setup simpler, and overhead isnʼt too bad




How to initially shard a table
       Option 1: Transitional migration with legacy DB
        •   Choose a cutoff ID of the tableʼs PK (not the sharding key) which is slightly
            higher than its current max ID. Once that cutoff has been reached, all new
            rows get written to shards instead of legacy.
        •   Whenever a legacy row is updated by the app, move it to a shard
        •   Migration script slowly saves old rows (at the app level) in the background,
            moving them to shards, and gradually lowers cutoff ID
        •   Reads may need to check shards and legacy, but based on ID you can make
            an informed choice of which to check first (sketched below)
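
A hypothetical read path under this scheme (the handles and helper names are illustrative, not from the talk):

```python
CUTOFF_ID = 1_500_000_000  # gradually lowered by the migration script

def get_post(post_id, blog_id, legacy_db, shard_for):
    """legacy_db and shard_for(blog_id) are assumed DB handles."""
    if post_id >= CUTOFF_ID:
        # Rows above the cutoff were only ever written to shards.
        return shard_for(blog_id).fetch_post(post_id)
    # An older row may already have been migrated, so check the likelier
    # home first and fall back to the other.
    return legacy_db.fetch_post(post_id) or shard_for(blog_id).fetch_post(post_id)
```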

How to initially shard a table
       Option 2: All at once
        1. Dark mode: app redundantly sends all writes (inserts, updates, deletes) to
           the legacy database as well as the appropriate shard (sketched after this
           list). All reads still go to the legacy database.

        2. Migration: script reads data from legacy DB (sweeping by the sharding
           key) and writes it to the appropriate shard.

        3. Finalize: move reads to shards, and then stop writing data to legacy.
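
A sketch of the dark-mode write from step 1 (the helper names are hypothetical):

```python
def save_post(post, legacy_db, shard_for, log_for_backfill):
    # Reads still come from legacy, so the legacy write is authoritative.
    legacy_db.save(post)
    try:
        # Redundant shard write; a failure here must not break the request,
        # since the migration script will copy the row over anyway.
        shard_for(post.blog_id).save(post)
    except Exception:
        log_for_backfill(post.id)
```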




Shard Automation
       Tumblrʼs custom automation software can:
        •   Crawl replication topology for all shards
        •   Manipulate server settings or concurrently execute arbitrary UNIX
            commands / administrative MySQL queries, on some or all shards
        •   Copy large files to multiple remote destinations efficiently
        •   Spin up multiple new slaves simultaneously from a single source
        •   Import or export arbitrary portions of the dataset
        •   Split a shard into N new shards
Splitting shards: goals
       •   Rebalance an overly-large shard by dividing it into N new shards, of even or
           uneven size
       •   Speed
            •   No locks
            •   No application logic
            •   Divide an 800 GB shard (hundreds of millions of rows) in two in only 5 hours

       •   Full read and write availability: shard-splitting process has no impact on live
           application performance, functionality, or data consistency




Splitting shards: assumptions
       •   All tables using InnoDB

       •   All tables have an index that begins with your sharding key, and sharding scheme is range-based.
           This plays nice with range queries in MySQL.

       •   No schema changes in process of split

       •   Disk is < 2/3 full, or thereʼs a second disk with sufficient space

       •   Keeping two standby slaves per shard pool (or more if multiple data centers)

       •   Uniform MySQL config between masters and slaves: log-slave-updates, unique
           server-id, generic log-bin and relay-log names, replication user/grants everywhere

       •   No real slave lag, or already solved in app code

       •   Redundant rows temporarily on the wrong shard donʼt matter to app

Splitting shards: process
       Large “parent” shard divided into N “child” shards
        1. Create N new slaves in parent shard pool — these will soon become
           masters of their own shard pools

        2. Reduce the data set on those slaves so that each contains a different subset
           of the data

        3. Move app reads from the parent to the appropriate children

        4. Move app writes from the parent to the appropriate children

        5. Stop replicating writes from the parent; take the parent pool offline

        6. Remove rows that replicated to the wrong child shard
Splitting shards: process (recap)
        Step 1 of 6: create N new slaves in the parent shard pool
[Diagram: the parent master (blogs 1-1000) takes all app reads/writes and replicates to two standby slaves.]
[Diagram: two clones are spun up from the existing standby slaves, giving the parent pool four standby slaves.]
Aside: copying files efficiently
       To a single destination
        •   Our shard-split process relies on creating new slaves quickly, which involves
            copying around very large data sets
        •   For compression we use pigz (parallel gzip), but there are other alternatives
            like qpress
        •   Destination box: nc -l [port] | pigz -d | tar xvf -

        •   Source box: tar vc . | pigz | nc [destination hostname] [port]



Aside: copying files efficiently
       To multiple destinations
        •   Add tee and a FIFO to the mix, and you can create a chained copy to multiple
            destinations simultaneously
        •   Each box makes efficient use of CPU, memory, disk, uplink, downlink
        •   Performance penalty is only around 3% to 10% — much better than copying
            serially or copying in parallel from a single source
        •   http://engineering.tumblr.com/post/7658008285/efficiently-copying-files-to-
            multiple-destinations


Splitting shards: process (recap)
        Step 2 of 6: reduce the data set on those slaves so that each contains a
        different subset of the data
Importing/exporting in chunks
        •   Divide ID range into many chunks, and then execute export queries in
            parallel. Likewise for import.
        •   Export via SELECT * FROM [table] WHERE [range clause] INTO
            OUTFILE [file]
        •   Use mysqldump to export DROP TABLE / CREATE TABLE statements, and
            then run those to get a clean slate
        •   Import via LOAD DATA INFILE
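
A hedged sketch of driving those statements in parallel chunks (the table and column names, chunk width, and connection factory are illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 100_000  # width of each sharding-key chunk

def export_chunk(conn, lo, hi):
    with conn.cursor() as cur:
        # Client-side parameter substitution quotes the OUTFILE path.
        cur.execute(
            "SELECT * FROM posts WHERE blog_id >= %s AND blog_id < %s "
            "INTO OUTFILE %s",
            (lo, hi, f"/tmp/posts.{lo}.csv"),
        )

def import_chunk(conn, lo, hi):  # hi unused; kept for a uniform signature
    with conn.cursor() as cur:
        cur.execute("SET SESSION sql_log_bin = 0")  # see gotchas below
        cur.execute("LOAD DATA INFILE %s INTO TABLE posts",
                    (f"/tmp/posts.{lo}.csv",))

def run_chunked(task, connect, lo, hi, workers=8):
    # connect() is an assumed factory returning one connection per query.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for start in range(lo, hi, CHUNK):
            ex.submit(task, connect(), start, min(start + CHUNK, hi))
```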




Importing/exporting gotchas
        •   Disable binary logging before import, to speed it up
        •   Disable any query-killer scripts, or filter out SELECT ... INTO OUTFILE
            and LOAD DATA INFILE
        •   Benchmark to figure out how many concurrent import/export queries can be
            run on your hardware
        •   Be careful with disk I/O scheduler choices if using multiple disks with different
            speeds




[Diagram: the parent master still takes all app reads/writes; all four standby slaves hold the full range, blogs 1-1000.]
[Diagram: two of the slaves export different halves of the dataset, drop and recreate their tables, and then import the data, becoming child shard masters for blogs 1-500 and blogs 501-1000.]
[Diagram: every pool needs two standby slaves, so they are spun up for each child shard pool (blogs 1-500 and blogs 501-1000).]
Splitting shards: process (recap)
        Step 3 of 6: move app reads from the parent to the appropriate children
Moving reads and writes separately
        •   Configuration updates donʼt reach all of your web/app servers simultaneously,
            so moving reads and writes at the same time would create consistency issues:
             •   Web A gets new config, writes post for blog 200 to 1st child shard
             •   Web B is still on old config, renders blog 200, reads from parent shard, doesnʼt find the new
                 post

        •   Instead only move reads first; let writes keep replicating from parent shard
        •   After first config update for reads, do a second one moving writes
        •   Then wait for parent master binlog to stop moving before proceeding
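
What the two config pushes might look like in Python (host names and ranges are hypothetical):

```python
# Push 1: reads move to the children; writes stay on the parent and
# keep replicating down to the children.
SHARDS_AFTER_PUSH_1 = {
    (1, 500):    {"read": "child1-master", "write": "parent-master"},
    (501, 1000): {"read": "child2-master", "write": "parent-master"},
}

# Push 2: writes move too; then wait for the parent's binlog to go
# quiet before taking the parent pool offline.
SHARDS_AFTER_PUSH_2 = {
    (1, 500):    {"read": "child1-master", "write": "child1-master"},
    (501, 1000): {"read": "child2-master", "write": "child2-master"},
}

def host_for(shard_map, blog_id, mode):  # mode is "read" or "write"
    for (lo, hi), hosts in shard_map.items():
        if lo <= blog_id <= hi:
            return hosts[mode]
    raise KeyError(blog_id)

# e.g. host_for(SHARDS_AFTER_PUSH_1, 200, "read") -> "child1-master"
```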


[Diagram: before the read cutover, the parent master still takes all app reads/writes; the child shard masters and their standby slaves replicate from it.]
[Diagram: all app writes still go to the parent (and replicate down to the children); reads now go to the appropriate child master for these ranges.]
Splitting shards: process (recap)
        Step 4 of 6: move app writes from the parent to the appropriate children
[Diagram (as before): writes still go to the parent and replicate down; reads go to the children.]
[Diagram: the parent master is no longer in use by the app; reads and writes now go to the appropriate children for these ranges.]
Splitting shards: process (recap)
        Step 5 of 6: stop replicating writes from the parent; take the parent pool offline
[Diagram (as before): the parent is unused; reads and writes go to the children.]
[Diagram: global read-only is set on the parent and its app user is dropped. The children are now complete shards of their own, but with some surplus data that needs to be removed.]
[Diagram: the deprecated parent master and its two standby slaves (blogs 1-1000) are marked to be recycled soon; the child shard pools (blogs 1-500 and blogs 501-1000) carry all reads and writes.]
Splitting shards: process (recap)
        Step 6 of 6: remove rows that replicated to the wrong child shard
Splitting shards: cleanup
        •   Until writes are moved to the new shard masters, writes to the parent shard
            replicate to all its child shards
        •   Purge this data after shard split process is finished
        •   Avoid huge single DELETE statements — they cause slave lag, among other
            issues (huge long transactions are generally bad)
        •   Data is sparse, so itʼs efficient to repeatedly do a SELECT (find next sharding
            key value to clean) followed by a DELETE, as sketched below.
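
A sketch of that SELECT-then-DELETE loop (the table and column names are illustrative):

```python
def purge_wrong_rows(conn, lo, hi):
    """Delete rows whose sharding key falls outside this child's [lo, hi]."""
    with conn.cursor() as cur:
        while True:
            cur.execute(
                "SELECT blog_id FROM posts "
                "WHERE blog_id < %s OR blog_id > %s LIMIT 1",
                (lo, hi),
            )
            row = cur.fetchone()
            if row is None:
                break  # nothing left to clean
            # Many small, single-key DELETEs instead of one huge statement,
            # which would cause slave lag and a long transaction.
            cur.execute("DELETE FROM posts WHERE blog_id = %s", (row[0],))
            conn.commit()
```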




General sharding gotchas
        •   Sizing shards properly: as small as you can operationally handle
        •   Choosing ranges: try to keep them even, and better if they end in 000ʼs than
            999ʼs or, worse yet, random ugly numbers. This matters once shard naming is
            based solely on the ranges.
        •   Cutover: regularly create new shards to handle future data. Take the max
            value of your sharding key and add a cushion to it.
            •   Before: highest blog ID 31.8 million, last shard handles range [30m, ∞)
            •   After: that shard now handles [30m, 32m) and new last shard handles [32m, ∞)
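
An illustrative helper for that cutover step (the boundary rounding and shard naming are assumptions):

```python
def add_cutover_shard(range_map, max_key, step=1_000_000):
    """range_map: sorted (lower_bound, shard_name) pairs; the last entry
    is the open-ended range. Caps it at a clean boundary above max_key
    and appends a new last shard, e.g. max key 31.8M -> new boundary 32M."""
    new_lower = (max_key // step + 1) * step  # cushion, ending in 000's
    range_map.append((new_lower, f"shard_{new_lower}"))
    return range_map
```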




Questions?




                          Email: evan -at- tumblr.com

