This document discusses strategies for scaling MySQL databases to handle large datasets and high volumes of reads and writes. It recommends partitioning or sharding data, choosing appropriate storage engines and filesystems, adding memory, compressing data, optimizing queries, monitoring performance, and using tools like Innotop and Maatkit to help scale MySQL for high-traffic applications and large amounts of data. It also points to resources for learning more about scaling MySQL.
Relational databases are good at storing lots of small bits of information related to lots of other small bits of information. MySQL is particularly good at fast reads, can be made to do fast writes, and is a lovely store for large datasets.
The API should know the difference between reads and writes; a proxy is an option if changing the API isn't possible. Scaling writes is harder: write your own interfaces? (Check on this.) Once a B-tree index grows larger than memory, writes get _slow_. Memcached is good for short-lived data and caching.
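As a rough way to see whether your indexes still fit in memory, you can compare per-schema index size against your buffer pool / key buffer sizes. A sketch using information_schema; the schema name `mydb` is a placeholder:

```sql
-- Rough check of total index (and data) size per schema, to compare
-- against innodb_buffer_pool_size / key_buffer_size.
SELECT table_schema,
       SUM(index_length) / 1024 / 1024 AS index_mb,
       SUM(data_length)  / 1024 / 1024 AS data_mb
FROM information_schema.TABLES
WHERE table_schema = 'mydb'
GROUP BY table_schema;
```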
Consistently InnoDB now. MyISAM still has some good uses: append-only, non-volatile data with many reads.
Large objects, I'd argue, shouldn't be in a database. They can be handled more gracefully in a key-value store such as the filesystem. If you need to reference them from the database, store a pointer to the filesystem rather than the object itself. There are lots of other options in the NoSQL space.
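A minimal sketch of the pointer approach; the table and column names are hypothetical:

```sql
-- Instead of a BLOB column, store a pointer (path or key) to where the
-- object actually lives on the filesystem or in an object store.
CREATE TABLE attachments (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    file_path VARCHAR(512) NOT NULL,   -- e.g. a path under /var/data/attachments/
    content_type VARCHAR(64) NOT NULL,
    size_bytes BIGINT UNSIGNED NOT NULL
) ENGINE=InnoDB;
```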
Volume managers, volume managers, volume managers. Developers and operations need to be able to try things, but it's hard to justify dedicated test databases with all that storage. With volume-manager snapshots you can rapidly stand up a slave and try it.
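A sketch of pointing a freshly cloned copy at the master and starting replication; host, credentials, and binlog coordinates are placeholders you'd take from the snapshot:

```sql
-- On the new slave, set the master coordinates recorded when the
-- snapshot was taken, then start replicating.
CHANGE MASTER TO
    MASTER_HOST = 'master.example.com',
    MASTER_USER = 'repl',
    MASTER_PASSWORD = 'secret',
    MASTER_LOG_FILE = 'mysql-bin.000123',
    MASTER_LOG_POS  = 4;
START SLAVE;
SHOW SLAVE STATUS\G
```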
Operationally this is a great big cudgel, but it's often cheaper than lots of staff or consulting time.
InnoDB is evolving; MyISAM is not being developed. There are specific cases where MyISAM is faster: single-row retrieval and full table scans within a single table. Mark Callaghan had a good post about this back in March. The recent InnoDB plugin adds data packing (compression).
There's a CPU trade-off when compressing data, but it's worth it for the I/O savings. MyISAM has compression; InnoDB has it in the latest plugin. And retrieve only the number of rows you need.
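A sketch of both points, assuming the InnoDB plugin with the Barracuda file format and file-per-table enabled; the table and columns are hypothetical:

```sql
-- Compressed InnoDB table (requires innodb_file_format=Barracuda and
-- innodb_file_per_table=1).
CREATE TABLE events_compressed (
    id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    created_at DATETIME NOT NULL,
    payload TEXT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

-- Retrieve only the rows you need rather than pulling everything back.
SELECT id, created_at
FROM events_compressed
ORDER BY created_at DESC
LIMIT 20;
```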
InnoDB clusters the data around the primary key, so retrieval by primary key is less expensive. You can decrease fragmentation with `ALTER TABLE tablename ENGINE=InnoDB`, which rebuilds the table ordered by primary key. Be aware that ALTER TABLE can be expensive because it locks the table.
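For example, using the `tablename` from above (Data_free in SHOW TABLE STATUS gives a rough fragmentation signal):

```sql
-- Data_free hints at how fragmented the table is.
SHOW TABLE STATUS LIKE 'tablename';

-- Rebuild the table, reordering rows by primary key. This takes a
-- lock, so run it in a maintenance window (or use an online
-- schema-change tool).
ALTER TABLE tablename ENGINE=InnoDB;
```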
If you're worried about swapping, use huge page support for MySQL (Linux doesn't swap out huge pages).
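You can check whether mysqld actually got huge pages; it's enabled with the `large-pages` option in my.cnf, and the OS must have huge pages reserved (e.g. via vm.nr_hugepages):

```sql
-- ON means mysqld was started with large (huge) page support.
SHOW GLOBAL VARIABLES LIKE 'large_pages';
```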
Journaled filesystems and write-back caching: make sure you're optimizing for write speed as well as recoverability. Benchmark.
The size of the table slows down index insertion by log N, assuming B-tree indexes. Setting innodb_flush_log_at_trx_commit to 0 or 2 still flushes the log every second, but there is a risk of losing up to a second of data. Look at the I/O scheduler too: CFQ (completely fair queueing) versus the deadline scheduler. And check InnoDB read-ahead behavior.
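A sketch of relaxing the flush setting; innodb_flush_log_at_trx_commit is dynamic, so it can be changed at runtime:

```sql
-- Trade durability for write throughput: with a value of 2 the log is
-- written at commit but only flushed to disk about once per second,
-- so a crash can lose up to ~1 second of transactions.
SET GLOBAL innodb_flush_log_at_trx_commit = 2;
SHOW GLOBAL VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```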