Gehören Sie zu den Ersten, denen das gefällt!
The Flying Circus is an Operations-as-a-Service platform that supports project development teams to run their custom-develop software for clients. Earlier in 2014 we experienced a major data loss and had to perform massive disaster recovery. Unfortunately our Bacula setup was not up to the task and it took us longer and more effort to restore the data than we and our customers expected.
In this case study I’d like to present our public and very honest root cause analysis on how we managed to lose a lot of VMs’ data, how the restore happened, what we learned and how we’re trying to get better. After investigating our options for the future we decided to move away from Bacula’s file and VTL-oriented model and are currently implementing a solution based on CoW-filesystems (ZFS/btrfs), block-layer snapshots and diffing, and a small utility to glue things together.