2. About me
Balázs Pőcze
● I came from the operations world (ops/devops/sre)
● I've worked as a DBA for 4 years
● Currently I work for Gizmodo
● @banyek
● https://github.com/banyek
● http://blog.balazspocze.me
3. “Move to the cloud they said, it will be fun, they said...”
4. Why migrate to RDS
- It is AWS native
- A lot of complexity is handled by Amazon
- It is Someone Else’s Problem (SEP ™)
- You have someone to blame
- It just works!
5. Why not migrate to RDS
- It's not always the same approach as you'd choose yourself
- You can't access certain parts of your system
- You have your own well-honed toolset which you want to keep using
- You are not happy using a black-box system, just hoping it will somehow work
7. RDS/Aurora
- MySQL-compatible HA database cluster
- No replication between the nodes: they share a common storage volume, and the redo log is shared
- This means there is no 'real' replication lag between the writer node and the readers; in our case it is around 12-16 ms
- The cluster has a writer and a reader endpoint; the writer does not serve reads through the reader endpoint
- A cluster can have up to 16 nodes
- The nodes don't have to be the same size
9. Using the web interface
- DON’T!
- Just don’t.
- It is not reproducible
- There is no clean code you can just read and hand to
someone else
- You can export it to a CloudFormation template
10. Awscli / boto3
- Amazon native
- Flexible
- Relatively easy
- Not the most reusable
11. Puppet/ Chef / Ansible
We tried to use Puppet, but it wasn't a success - for us
- A lot of resources got recreated instead of modified
- The newest API was not supported
- It worked, but with some limitations
12. Terraform
- No vendor lock-in
- It needs an Atlas server to work properly
- We had bad experiences with it
- It changes constantly
- Not always the latest API in use
- It always destroyed and recreated the cluster instead of modifying it
13. CloudFormation
- Native Amazon solution
- Can be written in YAML or JSON
- Works with changesets
- The Amazon infrastructure fully supports it
- With CodePipeline and CodeBuild you can create a really neat pipeline!
14. Sizing the nodes
- Always check Amazon's node parameters: with MySQL the CPU is often not the bottleneck - the storage speed (Aurora uses EBS) or the network can be!
16. mysqldump / mysqlpump
- --single-transaction
- --master-data=2
- --hex-blob
- Don't forget the interactive_timeout parameter!
- When uploading to S3 you have to take care of splitting the dump up
- mysqlpump can dump in parallel; it has existed since 5.7
- Data loading is still single-threaded
17. Mydumper / Myloader
- Parallel dumper, way faster than mysqldump (in my case data loading took around 5 hours instead of 26!)
- No --single-transaction, so you have to take care of consistency yourself
- My recommendation: dump on a replica, because then you'll have the binlog file name and position as well (STOP SLAVE; mydumper ...; START SLAVE)
18. S3 - csv
Since version 1.8 you can load data directly from S3 with the 'LOAD DATA FROM S3' command. (This command is similar to LOAD DATA INFILE in usage and data formats. To create a compatible dump, use SELECT ... INTO OUTFILE ...)
More info:
https://goo.gl/BA2fGc
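A sketch of the round trip (table name, file path, and bucket are made up; note that the cluster also needs permission to read the bucket, typically via an attached IAM role):

```sql
-- On the source MySQL server: dump a table in a compatible format
SELECT * FROM orders
  INTO OUTFILE '/tmp/orders.csv'
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';

-- On Aurora, after uploading the file to S3 (bucket/prefix are hypothetical):
LOAD DATA FROM S3 's3://my-bucket/dumps/orders.csv'
  INTO TABLE orders
  FIELDS TERMINATED BY ',' ENCLOSED BY '"'
  LINES TERMINATED BY '\n';
```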
19. S3 - xtrabackup
According to the documentation, it is possible to restore data from MySQL 5.5 and 5.6 backups created with xtrabackup
● However, it never worked for me
More info:
https://goo.gl/KUBZUY
20. AWS Database Migration Service
It is designed to migrate from “any” database to RDS
● Supports schema migration
● After the schemas are moved, it populates the tables through replication
https://aws.amazon.com/dms/
23. MariaDB?
- No, it is not.
- From the perspective of GTIDs it looks like MariaDB, but it is not
- For example, we had to break GTID replication, because MariaDB doesn't use the same GTID implementation as native MySQL; MariaDB could theoretically fall back to it, but Aurora can't.
24. Time Zones
- Aurora databases run in the UTC time zone
- As does practically everything else in AWS
- You can change it
- But if you do, and you replicate to an external DB, make sure the MySQL time zone tables are installed there - this can break replication
https://goo.gl/LoVkUt
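A quick sanity check on the external replica (the zone names are just examples): CONVERT_TZ() with named zones returns NULL when the MySQL time zone tables are not loaded, which is exactly the situation that breaks replication:

```sql
-- Current global/session time zone ('SYSTEM' means the OS zone is used)
SELECT @@global.time_zone, @@session.time_zone;

-- Returns NULL if the named time zone tables are missing on this server
SELECT CONVERT_TZ('2018-01-01 00:00:00', 'UTC', 'America/New_York');
```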
26. Not designed to replicate
- You can replicate in or out, but I don't recommend it
- My recommendation is to use replication only for migration
- It's not safe
- I have had replication break unfixably because of a node restart (just because a config change happened!)
27. RDS managed tables in the MySQL schema
- They do not appear in a mysqldump
- When you create a new host, create these tables manually (SHOW CREATE TABLE ...) or your replication will break
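A sketch of the workflow; the exact set of rds_* tables varies by engine version, so treat mysql.rds_heartbeat2 below as an example name:

```sql
-- On the Aurora master: list the RDS-managed tables in the mysql schema
SHOW TABLES IN mysql LIKE 'rds\_%';

-- For each of them, grab the definition...
SHOW CREATE TABLE mysql.rds_heartbeat2;
-- ...and run the resulting CREATE TABLE statement on the external replica,
-- so replicated writes to these tables don't break replication
```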
28. Aurora (RDS) as master
- Check the hostname of the cluster before setting up replication; it might be too long for the data dictionary (in MySQL 5.6 the hostname is stored in a VARCHAR(60) field):
kinja-staging-rdsstack-aaaaaaaaaaaa-auroracluster-bbbbbbbbbbbb.cluster-cbrmrdukwwf3.us-east-1.rds.amazonaws.com
- Just create a CNAME record, or use an IP address instead
- I never tested what happens during/after a failover
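For illustration, a CHANGE MASTER TO using a short CNAME (the hostname, user, and binlog coordinates below are made up; take the real coordinates from your dump):

```sql
-- On the external replica: point at a short CNAME you control
-- instead of the long Aurora cluster endpoint
CHANGE MASTER TO
  MASTER_HOST = 'aurora-master.db.example.com',   -- CNAME for the cluster endpoint
  MASTER_USER = 'repl',                           -- example replication user
  MASTER_PASSWORD = '...',
  MASTER_LOG_FILE = 'mysql-bin-changelog.000042', -- from the dump
  MASTER_LOG_POS  = 120;
START SLAVE;
```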
30. Internet facing cluster
- Try not to expose it on the internet
- I don't have to tell you that there are always vulnerabilities, right?
- If you must, please always use SSL at least
- You can force SSL connections with MySQL
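Forcing SSL per user might look like this ('app'@'%' is an example account):

```sql
-- MySQL 5.6-style: require SSL for an app user
GRANT USAGE ON *.* TO 'app'@'%' REQUIRE SSL;

-- MySQL 5.7+ equivalent
ALTER USER 'app'@'%' REQUIRE SSL;
```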
31. Management node
- It is good to have a management node from which you can access your database
- Preconfigured work environment, plenty of disk space for dumps, preconfigured connection parameters, etc.
- It shouldn't be an expensive node (unless you want to utilize a lot of CPU or disk IO)
- It should spin up quickly and be destroyed when not needed (hey, this is the cloud, right?)
32. IAM Users
- You can authenticate IAM users and grant them privileges inside RDS, but WHY WOULD YOU DO THIS?!
- I think it is way better to separate db users from IAM users
- Yes, this applies to AD as well
33. Create your own users
- Use the cluster's master user only for administrative purposes
- Restrict your app users*
- Separate Read/Write and administrative users
* This is not Aurora/RDS only!
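A minimal sketch of the separation (user and schema names are made up):

```sql
-- One read/write app user, one read-only user,
-- both restricted to a single schema instead of *.*
CREATE USER 'app_rw'@'%' IDENTIFIED BY '...';
GRANT SELECT, INSERT, UPDATE, DELETE ON myapp.* TO 'app_rw'@'%';

CREATE USER 'app_ro'@'%' IDENTIFIED BY '...';
GRANT SELECT ON myapp.* TO 'app_ro'@'%';
```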
35. Native autoscaling?
- Sadly there is no built-in autoscaling, though it would be nice to have
- You can build your own
- You can add a node via the API
- You can delete a node via the API
- You can trigger on events generated by CPU usage, number of queries, etc.
36. Cluster addresses
- One DNS address for read/write
- One DNS address for reads
- Each host has its own DNS name
- If you remove a node, its address stays in DNS for 60 seconds (the default TTL for Aurora cluster addresses)
37. Bring your own Autoscaler (BYOA)
- Endless possibilities!
- HAProxy/ProxySQL/DNS frontend
- My solution is a simple DNS-based autoscaler that manages the cluster address in Route53: when I add a node, it creates the node and puts its address into DNS once it is available; when I remove a node, it cleans the address out of DNS first and, after the TTL, starts deleting the node
41. Read-only variables
- All variables that need SUPER to change act as read-only variables
- You can change them in the parameter group
- Then restart the node for the change to take effect
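For example (innodb_flush_log_at_trx_commit is just an illustration), changing any SUPER-only global from SQL fails on RDS/Aurora:

```sql
-- Fails with an access-denied error, because even the master user
-- does not have SUPER on RDS/Aurora:
SET GLOBAL innodb_flush_log_at_trx_commit = 2;

-- The same setting has to be changed in the DB parameter group instead,
-- followed by a node reboot.
```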
43. DB parameter group / Cluster parameter group
- Nodes can have different DB parameter groups; they don't all have to be the same (like read_only)
- Cluster parameter groups hold options that have to be the same across the entire cluster (like time zone or character encoding)
44. Read-only parameters and CloudFormation
- When you change a read-only parameter, the cluster node gets into a 'pending change' state and you can reboot it manually to apply the change
- When the cluster/node has its parameter group associated in CloudFormation and you change a read-only value, your node will be rebooted automatically
- To avoid this, don't associate parameter groups with instances/clusters via CloudFormation
- You can have blue/green parameter group pairs
46. Automated backups
- They're created by snapshotting the instances
- S3 is the backend, but you can't access the backups as objects, only from RDS
- You can't keep them "forever"
- You can't set a retention policy or storage class
- Quick -> Slow -> Long term (Glacier)
47. Long term backups
- Create logical dumps
(mysqldump/mysqlpump/mydumper/SELECT INTO
OUTFILE, SELECT INTO S3)
- You can provision a host for this, do the backup, copy it to S3, and remove the backup host (awscli/boto3)
- You can set a storage class for the long-term backups
- WARNING: Glacier is freaking expensive if you want to
access your backup. No joke.
48. ALWAYS TEST YOUR BACKUPS
IF YOUR BACKUP IS NOT TESTED WITH RESTORE, THAT
BACKUP IS A NON-EXISTENT BACKUP. PERIOD.
50. Cloudwatch
- You can access host metrics from CloudWatch
- Like CPU, memory, and disk IO
- Not all MySQL metrics are accessible
- You can create nice dashboards
- As well as nice alerting based on events
51. Third party monitoring tools
- VividCortex
- PMM
- New Relic
These tools connect to the MySQL port and collect metrics via performance_schema. Check their documentation for the details, and be security-aware. If you can monitor via a host or a container, I believe the latter is better. (KISS)
52. Monitoring
- CloudWatch is good, but not all metrics can be reached through it
- VividCortex should connect via the MySQL protocol using performance_schema
53. Logging
- Hell.
- You can access the logs from the web interface
- You can access them with awscli
- (maybe there’s a logger backend somewhere?)