backy
VM backup beyond Bacula/Bareos
Christian Theune

@theuni
ct@flyingcircus.io
Mea Culpa
I should have given this talk last year. But I boarded the wrong train and only noticed, when I inserted the conference logo into my slides, that it had the wrong city on it.
Turns out I was heading to Nuremberg because Netways is based there, and I did not realise the conference is in Cologne. Looks like I made it.
However, I’m happy to have the chance to present what we have now, because a lot happened in the last year and thus I don’t have to give this talk
twice. :)
And I almost
missed it — again
And I almost would have to say the same again next year. My soon-to-be son decided to scare my wife and me on Sunday evening, and I almost did not make it here again. Thanks to the organisers for not having me tarred and feathered just yet.
Backup!!11!!
We’ve been doing backups for ages. Of our own servers, workstations, applications, databases … and it keeps being painful.
We started in 1999 with a Tandberg drive and Amanda, moved on, got a small tape library at some point, and started using Bacula quite early. Things were reasonable, but I still hate tape libraries. And I don’t like answering pain with “you should have spent more money” if the basic thing just fails by itself.
• flyingcircus.io
• DevOps as a Service
• custom, mission-critical web
applications
Today I work for the Flying Circus, which operates custom web applications on a public cluster (with the option to run private clusters, too). We currently manage a couple of dozen physical servers and multiple hundred virtual machines to do that. Sizes vary a lot, workloads vary a lot. We’ve been using Bacula for a long time with a largish RAID6-based drive array. We use Ceph as the primary storage (we used to use iSCSI) and we have been taking backups directly from the storage servers running the FDs, with the option to restore into an FD on the VM. We also have pre/post scripting to take snapshots and set databases into backup mode.
This talk is mostly an excursion into the “why the hell did we build our own backup system” and showing what we decided to build. All of this is a
perpetual work in progress, but we feel that the “build your own” for our case ends up with higher quality and less maintenance effort than we had with
bacula. Whether that’s true in the long run - we’ll see. Let’s start with how we came to ask the question “How can we get away from Bacula?”
Part I - Oh the Pain
We always have nice times and rough times with our tools, but Bacula has been on our naughty list for a while and I’d like to start with the incident that
caused us to take action.
The story unfolds …
On a relatively slow day, we suddenly got Nagios warnings about filesystem errors. One, two, and then suddenly a lot of them. It’s 14:50.
We quickly found that a complex bug caused a rogue server to delete all our VM images. I think we lost about 50% of running VMs at that point. Luckily
the bug tried to delete the images so quickly that Ceph got confused and not all deletion requests went through.
Time to restore!
At T+1.5 hours we had fixed the bug (by disabling any code that deletes stuff) and the restores had started. We prioritised central services and SLA customers and were well on our way.
At T+11 hours we were pretty much done. Except that our most valuable customer’s database VM would not restore. We were already on a shift cycle so that everyone could get some sleep and come back fresh, but Bacula kept us from either continuing the restore or figuring out how to get around the inconsistency.
Unfortunately, the inconsistency sat in the middle of a 100+ GB volume, so each attempt took a while to start the restore and fast-forward to the 70 GiB mark where it would suddenly stop because of the inconsistency. Also, as we were trying different director options to make it ignore the inconsistency, we had to halt other backups …
Finally, around 20 hours later, we had all customer services back online.
Our most valuable customer had the most downtime. Great.
Root Cause Analysis
After we caught our breath, we compiled a long list of things that got in our way while trying to restore. Here’s the list of things that failed us.
http://flyingcircus.io/postmortems/13266.pdf
If you’d like to go into the details yourself, here’s the URL. It’s an 8-page document. It starts with the basics and works through the event. Let me know if you read it in the future and have any questions.
Restore script bottleneck:
global lock
The restore script had evolved in a way that did not allow large-scale restores as we required them. We had to tune it in multiple places to avoid locks and to talk to the director in a different way. This took us about 3 hours.
Undetected inconsistency in
important customer database
After we ramped up parallelism, a large customer database VM showed signs of inconsistency and aborted the restores. This took one person many hours to diagnose and fix while everyone else continued restoring other backups. It also stopped other backups every now and then.
Bacula: complexity and the
VTL
In the aftermath we saw that removing the global lock from the restore script should have been easy. However, the overall complexity of instrumenting Bacula, and the mismatch of the VTL when doing disk-based backups, turned this into a long and cumbersome task.
I don’t think we can ever come to the point where all our scripts for rare events will be perfect, so I’d rather have our scripts in generally good shape on a basic level and allow quick modifications as an event unfolds.
Not “everything” backed up.
Many of our VMs carry generated installations: the VMs are provisioned by our platform layer using Puppet and customer scripting and are very similar. For services, our customers and we use a separate service deployment tool (batou) to put customisations on top.
To save disk space and reduce backup load, we did not back up all VMs if a customer had, e.g., dozens of identical application servers.
This, however, meant we had to re-deploy customer-specific installations during this large-scale event, which was more fiddly than we would have liked (provision the VM from the beginning, re-deploy the service, find glitches under this special scenario).
24 hours are not a sufficient
RPO in quite a few cases
Daily backups are OK for many cases. But in quite a few others, they are not. We’d love to give customers the option of making hourly backups (or hell,
even on a per-minute basis).
With our Bacula setup there was no feasible way to do this, as it would have to at least scan all the metadata of a filesystem to determine what has changed. Also, we have some pathological cases where append-only databases would cause insane write amplification.
Paper cuts
• Hard link farms
• Boot loaders
• The director as a “most valuable bottleneck”
Sigh. Yes, we’re running Cyrus. But still. Not restoring hard link farms really makes me hate archive formats that have to replicate all the filesystem logic.
Restoring VMs in our case means preparing an empty disk image, putting the standard base image on it, then running a restore into it and reconfiguring the boot loader. This means the restore script has to know at least something about our boot loading process, which breaks backups that come from a period with a different boot loader configuration. Sigh.
The director. A single process. For hundreds of jobs. And you can’t reload on the fly. And it takes us 2 MiB of auto-generated configuration. And I never can predict how schedules will really work out.
Recap
• Restore fiddly to script
• Undetected inconsistency that was hard to deal with
• Blind spots
• Daily Interval
• Overall complexity, performance and the VTL
• Paper cuts
We had a couple of other incidents in the meantime that needed restores, and I am *never* happy to have them. I’m immediately on edge whenever we have to restore because I know something is going to blow up.
Part II - Make a wish
Simplicity
• Restore with basic Unix tools
• No VTL
• Not mixing data of different VMs
Reliability
• Verification / Scrubbing / (Repair)
• High frequency
• Integration with storage snapshots
• Not inventing new formats
Operability
• Avoid bottlenecks / head-of-line blocking
• Efficient deltas for large files (ZODB)
• Parallelisation (multiple jobs and multiple servers)
• Simple scripting and environment-specific integration
• Coordination: pre/post actions on storage, hypervisor,
VM …
Operability II
• Simple Nagios integration to ensure we notice RPO/
SLA failures
• RTO-compliance during mass-restore
• Self-service for customers to restore files or VMs
Part III - Let’s do this!
–Probably someone, maybe me
“One size fits all … not”
It’s all about size
- Bacula / Bareos are general solutions to the big problem of backing up and restoring data. They have advanced capabilities, they have a large installed base and they support many features.
- However, the sheer complexity of it all is what gives us the big pain.
- We want to drive down complexity as much as we can, build on existing tools, and then add some of our own stuff.
- So: we’re not trying to solve backup for each and everyone. However, having compute and storage prevalently
It’s all about size: backy
~3050 LOC in Python
about 50% of the code is tests!
about 94% branch coverage
It’s all about size: Bacula
~150k LOC in C. This is 50x the size. Considering that the intellectual complexity rises at least geometrically, this would make it 2,500 times more complicated. Well. OK. I’m being ironic.
Nevertheless, this is a lot of code. The amount of test code that I found was about 3,000 lines (simply grepping for “test” in the filename). That means that the code/test ratio is about 2%. On a code base that is 50 times more complicated I see only a 45th of the test coverage ratio. This doesn’t feel good.
Please: if anyone is outraged right now because I did not find the tests, tell me. I’d love to hear how Bacula or Bareos are doing quality assurance of their code nowadays.
It’s all about size: Bareos
Just as Bacula is big, Bareos seems to have some traction and seems to be healthy from a contributions perspective. At least they managed to crank out another 100k lines of code, totalling 250k.
My insanely dumb test script tells me there are 5.5k lines of test-related code. That’s actually an increase of 0.2 percentage points in test coverage. But again, let me know if you’re doing unit or functional tests in a way that was not obvious to me.
The giants we stand on
OK. We decided to do as little as possible ourselves, which means we’re standing on multiple giants’ shoulders.
We only back up Qemu VM images. And luckily Qemu and Linux have an API for ensuring consistent state (fsfreeze) of a volume while it’s mounted read/write.
All our disk images are stored in Ceph, which has cheap snapshots, is networked, and can hand out deltas pretty cheaply, too.
We chose btrfs as the storage target for the images. It can store sparse files, and we leverage the CoW semantics (cp --reflink) when integrating deltas from Ceph.
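To make that interplay concrete, here is a minimal sketch of the snapshot/diff/reflink pipeline, built only from the tools named above (rbd, cp --reflink). It is an illustration, not backy’s actual code; pool, image, and path names are made up, and the final diff integration is simplified to capturing the stream.

```python
import subprocess

def backup_delta(pool, image, last_snap, new_snap, dest):
    """Take a Ceph snapshot and fold the delta into a CoW copy on btrfs."""
    rbd_image = f"{pool}/{image}"
    # Cheap server-side snapshot; the guest was quiesced via fsfreeze
    # (Qemu guest agent) just before this.
    subprocess.run(["rbd", "snap", "create", f"{rbd_image}@{new_snap}"],
                   check=True)
    # Start the new revision as a reflink clone of the previous one: on
    # btrfs this shares all unchanged blocks, so every revision looks like
    # a full image but only costs the delta.
    subprocess.run(["cp", "--reflink=always", f"{dest}/{last_snap}.img",
                    f"{dest}/{new_snap}.img"], check=True)
    # Ask Ceph for just the blocks that changed between the two snapshots.
    # backy patches the changed extents into the clone; this sketch only
    # captures the diff stream to a file to stay short.
    with open(f"{dest}/{new_snap}.rbddiff", "wb") as out:
        subprocess.run(["rbd", "export-diff", "--from-snap", last_snap,
                        f"{rbd_image}@{new_snap}", "-"],
                       check=True, stdout=out)
```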
Limits
• Not a general purpose backup system.
• No tapes. No weird hardware.
• Restore without tools.
• Dead-simple configuration.
So. As I said: not one size fits all. Let’s build something that fits us perfectly.
We also want to use standard hardware. At the moment we target RAID6 + hot spare with about 50 TB of space.
We want to be able to restore into Ceph (or, in the worst case, into an iSCSI host, or *whatever*). dd is our tool of choice.
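Because revisions are plain (sparse) image files, a restore really is just streaming bytes back with stock tools. A minimal illustration, with made-up paths and image names:

```python
import subprocess

# Stream the newest revision (the "last" symlink) back into Ceph.
# For a plain block device the equivalent is dd if=<revision> of=/dev/<target>.
subprocess.run(
    ["rbd", "import", "/srv/backy/litprod00/last", "rbd/litprod00.root"],
    check=True)
```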
Let’s take a tour
So. Let’s take a tour of what backy looks like. We’ll start with a component overview.
simple!?!
- Obviously this is much simpler. Right?
- Ok, well. This is still quite complicated and involved. The point is: all of the tools have a very specific infrastructure-oriented job and they *contribute* to
our specific solution. We use Qemu anyway. We instrument Qemu with our “fc agent” anyway. We use Ceph anyway. We use Consul for coordination
anyway.
- Those tools do not come into existence in our environment just for backy. Our environment is quite Unix-oriented in this way. We use composable tools
that have specific jobs. Those provide us features or services or functionality that will be useful for many higher-level tasks.
- So again, yes this is not simple. However, plugging those things together, and testing this plugging, proved helpful to us. We invest in tools that we can
reuse and we build our own on top.
- We stopped shopping for one-stop-solution tools. We don’t want a cooking-oven-microwave-TV-lawnmower. We want some sandpaper, and a hammer, and nails, and electricity.
- Again, most of this code we don’t have to maintain. And we configure those components anyway. So that’s why we get along with little code that we
can test well.
Hello CLI
- Backy ships with a single command that provides sub-commands. You are intentionally not meant to interact with these on a daily basis.
- However, I’ll show you how to interact with the CLI to trigger an immediate “random” backup and how to start the scheduler. The check command
provides a general Nagios-compatible check that you can use to alert for SLA/Scheduling issues on the whole installation.
Running a single backup
- You can simply ask backy to run a backup. This uses the job-specific configuration of the current directory “litprod00” and does all the work of getting a snapshot, exporting the diff, integrating it into the CoW copy, and running a partial random verification against the original source.
- Note that a differential backup of a 10 GB volume took 8 seconds of real time. Most of that time is spent by Ceph constructing the delta, and the rest in doing a random check of the volume against the original.
Inspecting a backup
- Backy has a small command to inspect the status of this VM’s backup archive. Each backup is called a revision, and you can see that it has tags. We note how much data we backed up, how long it took, and when. The ID is a short UUID.
- The summary is an estimation based on the amount of data we backed up for each revision. Note that this only indicates the average backup size per revision.
Inspecting a backup
- An important decision was to avoid mixing the data of multiple machines, in a very obvious fashion.
- We thus create a simple directory hierarchy: a backy base directory, a directory per VM, and 2 files for each backup.
- Backy also keeps a per-machine log in that directory of *all* activities that were done using any backy command for this volume. The .rev files are YAML files that store metadata about each revision. The most current revision can be found using the timestamp or is always available through the “last” symlink.
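A hypothetical example of what such a revision file might hold, pieced together from what is described above (timestamp, tags, short UUID, backup size, duration). The field names are guesses, not backy’s exact schema:

```yaml
# litprod00/2015-09-29T03:10:02.rev (illustrative, not backy's real format)
uuid: JOhuX7FiRBCwk7Ne4npOGg     # short UUID identifying the revision
timestamp: 2015-09-29 03:10:02
stats:
  bytes_written: 319979520       # size of the delta for this revision
  duration: 8.1                  # seconds
tags:
- daily
- weekly
```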
Hello, daemon!
- Initially we intended to run without any special daemon. However, properly doing load management and scheduling with a shifting number of configurations turned out to require a few clever tricks that we put into a little daemon.
- The scheduler is based on Python 3.4’s asyncio, which allows a relatively simple implementation of running parallel jobs with low overhead. Every VM gets an infinite-loop coroutine that schedules this VM’s next backup, waits for that deadline, and submits a task to a worker pool that enforces worker limits and in turn simply calls the backy shell command to run a single backup. It then cleans up any expired backups and starts from the beginning.
- Also, the scheduler is stateless and thus can be stopped and started at any time without losing queues. It doesn’t have to store data for that, either.
- As the scheduler is completely irrelevant to restoring, we can restore while backing up, or we can simply stop the scheduler.
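A minimal sketch of that per-VM coroutine, assuming a `job` object that knows its directory, its due tags, and how long to wait; the method names and the CLI invocation are illustrative, not backy’s actual implementation:

```python
import asyncio

async def run_forever(job, pool: asyncio.Semaphore):
    """One infinite-loop coroutine per VM, as described above."""
    while True:
        # Derive the next deadline and due tags from the revisions on disk.
        seconds, tags = job.seconds_until_due()
        await asyncio.sleep(seconds)
        async with pool:  # the semaphore acts as the worker limit
            # The daemon only shells out to the CLI, so a scheduler restart
            # never corrupts a running backup (invocation illustrative).
            proc = await asyncio.create_subprocess_exec(
                "backy", "backup", ",".join(tags), cwd=job.directory)
            await proc.wait()
        job.expire()  # drop revisions whose tags have all run out
```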
Daemon configuration
- The daemon has three types of configuration. Some global options, like limiting the number of parallel jobs and the base directory where to put
backups.
- Then it defines multiple named schedules (I’ll explain those in detail in a moment)
- And then it describes jobs by stating their name, their type of source (file is a simple job that we use for low-level testing, others are pure Ceph and
Flying-Circus-specific consul-managed jobs), and which schedule they belong to.
- That’s it. This file can be computed from our inventory in a very simple fashion, we restart the scheduler and are happy.
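Put together, a daemon configuration along those lines could look like this. The talk doesn’t show the actual file, so keys, values, and the YAML rendering are illustrative:

```yaml
global:
  base-dir: /srv/backy
  worker-limit: 3          # parallel jobs
schedules:
  default:
    daily:   {interval: 1d,  keep: 9}
    weekly:  {interval: 7d,  keep: 5}
    monthly: {interval: 30d, keep: 4}
jobs:
  litprod00:
    source: {type: ceph-rbd, pool: rbd, image: litprod00.root}
    schedule: default
  test01:
    source: {type: file, filename: /srv/test01.img}  # for low-level testing
    schedule: default
```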
Scheduling
This was probably one of the harder things to implement. At the moment we’re happy to have a very simple pattern.
The general terms are “schedule”, “tag”, “interval”, and “keep”.
What we didn’t do:
* allow references to absolute times. Those don’t make sense on a broad platform, as we have to spread backups evenly throughout the day. And honestly, if it matters whether you make backups at 3 am versus 3 pm, then you are actually asking me for hourly backups instead of dailies.
* allow referencing any special thing like weekdays, holidays, … what … ever.
* The way the schedule works is that from a predictable pattern like “every 24 hours” and from which backups already exist, we can derive whether we are due for another one yet and, if yes, which tags go on it.
* We then run a backup at some point, stick tags to it and are done.
Scheduling
Also, note that we do not have to differentiate the schedule for delta/full/differential. This is what Ceph and btrfs give us for free. We just specify a rhythm
of RPOs and then we’re done.
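As a sketch of that due-check (with assumed data structures, not backy’s exact ones): given the schedule and the revisions that already exist, compute which tags are due now.

```python
def due_tags(schedule, revisions, now):
    """schedule:  {tag: {"interval": timedelta, "keep": int}}
    revisions: [{"timestamp": datetime, "tags": set}, ...]"""
    due = set()
    for tag, policy in schedule.items():
        stamps = [r["timestamp"] for r in revisions if tag in r["tags"]]
        # A tag is due if it never ran, or its last run is at least one
        # interval ago. No absolute times, no weekday rules.
        if not stamps or now - max(stamps) >= policy["interval"]:
            due.add(tag)
    return due
```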
Purging
Nothing to see.
Really.
Well. OK. The actual point is: this is the flip side of scheduling.
Purging
Every tag in the schedule has a “keep” value. It means two things:
1. do not remove this tag from revisions as long as we have less than N revisions with this tag
2. do not remove this tag from revisions as long as the last revision is younger than interval*N
When a revision with a given tag runs out of both criteria (we have enough revisions with this tag and they are old enough), the tag gets removed.
Once a revision has no tags left, it is removed. btrfs takes care of any block-level references to the data that need to be deleted at that point.
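With the same assumed data structures as in the previous sketch, the two keep rules fit in a few lines; again an illustration, not backy’s actual code (`keep` is assumed to be at least 1):

```python
def purge(schedule, revisions, now):
    for tag, policy in schedule.items():
        tagged = sorted((r for r in revisions if tag in r["tags"]),
                        key=lambda r: r["timestamp"])
        horizon = policy["interval"] * policy["keep"]
        # Rule 1: the newest `keep` revisions carrying the tag always keep it.
        for rev in tagged[:-policy["keep"]]:
            # Rule 2: even older ones keep the tag while still younger
            # than interval * keep.
            if now - rev["timestamp"] > horizon:
                rev["tags"].remove(tag)
    # A revision with no tags left is deleted; btrfs reclaims the blocks.
    return [r for r in revisions if r["tags"]]
```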
Scrubbing
• partial, random verification during backup against
source
• btrfs scrubbing
• Raid-6
We do partial verification of a freshly made backup against the original source in Ceph. In addition to that we rely on btrfs scrubbing to warn us of any
issues. On top of that we hope to reduce the chance of unrecoverable bitrot with RAID 6. I think we’re relatively safe at the moment for the amount of
data we store.
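The partial verification can be as simple as comparing a handful of random chunks of the fresh revision against the still-mapped source snapshot. A sketch with illustrative chunk size and sample count:

```python
import os
import random

CHUNK = 4 * 1024 * 1024  # 4 MiB per sample (illustrative)

def spot_check(backup_path, source_path, samples=16):
    """Compare random chunks of the backup against the original source."""
    size = os.stat(backup_path).st_size
    with open(backup_path, "rb") as backup, open(source_path, "rb") as source:
        for _ in range(samples):
            offset = random.randrange(0, max(1, size - CHUNK))
            backup.seek(offset)
            source.seek(offset)
            if backup.read(CHUNK) != source.read(CHUNK):
                raise RuntimeError(f"verification failed at offset {offset}")
```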
Deleting a VM
• rewrite config, reload master
• rm -rf
Monitoring
• old state is uninteresting
• do I have to act?
This is something I’m kinda proud of. We tried multiple things in the past to monitor Bacula, but it either ended up being too complicated and brittle, or it didn’t trigger at the right times, or it triggered too often, or …
So, I wanted to come up with a single test that tells me whether I have to act or not. What I noticed is that old state is uninteresting as we can’t fix it
anyway. If I missed a backup a week ago, then that’s happened. I can’t fix that. I can’t travel in time. (For that reason I’ve built in a way to catch up with
recent backups so backy has some limited self-repair here.)
When I come to the office in the morning, I want to know: are we good, or not. And that’s what I built.
Monitoring
Backy has a simple telnet console that can give an overview of what’s going on. The SLA column is interesting. The SLA being OK means that the last
backup is not older than 150% of the time of the smallest interval in our schedule. Done.
Backy also has a convenience subcommand that aggregates this for all jobs. To support this backy writes the status output you see here into a status file
every 30 seconds and the Nagios check reads that (so it doesn’t have to wait for a crashed daemon). The check validates that the file is fresh and no jobs
have exceeded their SLA.
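The SLA rule itself fits in a couple of lines. A sketch of the check side, assuming the status file yields each job’s newest revision timestamp:

```python
def sla_ok(last_backup, schedule, now):
    """OK while the newest revision is younger than 1.5x the smallest interval."""
    smallest = min(policy["interval"] for policy in schedule.values())
    return now - last_backup <= 1.5 * smallest
```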
Ok. The tour has probably been fast and rough. Let’s wrap it up here and call it a day, shall we?
What did we leave out?
• Physical host backup
• Guesstimating achievable backup storage ratio
* I don’t want to care about backing up physical hosts any more. Most stuff is managed automatically anyway. OS installation is mostly automatic, too. Important data can be backed up by rsyncing files into a VM that is backed up with backy.
* I don’t really care much about the backup storage ratio. Having to keep 100% of all data for every day or hour for 3 months isn’t feasible. Storing between 2 and 4 times the original volume is fine. Heck, even 10 times would probably be fine. Space is cheap.
Future
• trim-ready - waiting for our whole stack (Guest,
Hypervisor, Ceph, …) to pass this through
• Hot reload of scheduler
• Ensuring we can move VM backup directories between
different backup hosts
* Restarting the scheduler is fine for now. In the future we’ll likely implement a hot-reload feature to avoid accidentally tripping up already running jobs.
Having your backup and
eating it!
I think the biggest thing I wanted to get off my chest: Bacula has been good at backups for us for a long time. It’s always been a bit annoying when it came to restores. And obviously we all know by now: nobody wants backup, everybody wants restore.
Whenever we fail to restore our customers’ data in time and consistently, we fail badly. This is what our backup needs to measure up to. We have outgrown Bacula both in the amount of data (restore calculations take ages) and from an operational perspective.
We need to move faster. We need to integrate more. We want to solve policy-oriented issues on a completely different level. We’re used to writing code
to solve our issues. We’re developers. We know coding is hard. That’s why we like small reliable tools that we can compose. Bacula isn’t very composable.
The only advice that I can give is based on personal experience: I love knowing how the pieces work, and I contribute to the world by building my own. However, the number of pieces we have to deal with is growing. And that means I want those pieces to be small, multi-purpose, doing their job very, very well, and then I integrate them. From my perspective: big frameworks are dead. That’s why I love nginx over Apache. Or Pyramid over Django.
Small is beautiful. But I might be wrong and might say the opposite tomorrow. Caveat emptor.
@theuni
ct@flyingcircus.io
Thanks for having me and thanks for hearing me out. Do we have time for questions?
Image Sources
• https://www.flickr.com/photos/mpa-berlin/14337541104/
• https://www.flickr.com/photos/seattlemunicipalarchives/4777122561
• https://www.flickr.com/photos/jkroll/15314415946/
• https://www.flickr.com/photos/dvids/6956044669/
• https://www.flickr.com/photos/flowtastic/7354146628/
Image Sources
• https://www.flickr.com/photos/galeria_stefbu/4781641072/in/pool-fotoszene/
• https://www.flickr.com/photos/dlography/6982668385/
• https://www.flickr.com/photos/127437845@N04/15142216255
• https://www.flickr.com/photos/clement127/15440591160
Image Sources
• https://www.flickr.com/photos/clement127/15999160179
• https://www.flickr.com/photos/63433965@N04/5814096531/
• private pictures

More Related Content

What's hot

2017.06.19 Paul Woodward - ExploreVM VMware 101
2017.06.19   Paul Woodward - ExploreVM VMware 1012017.06.19   Paul Woodward - ExploreVM VMware 101
2017.06.19 Paul Woodward - ExploreVM VMware 101Paul Woodward Jr
 
2015 ZendCon - Do you queue
2015 ZendCon - Do you queue2015 ZendCon - Do you queue
2015 ZendCon - Do you queueMike Willbanks
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
Lessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterLessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterEugene Kirpichov
 
Integration with EMC VNX and VNXe hybrid storage arrays
Integration with EMC VNX and VNXe hybrid storage arraysIntegration with EMC VNX and VNXe hybrid storage arrays
Integration with EMC VNX and VNXe hybrid storage arraysVeeam Software
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notesPerrin Harkins
 
Optimizing WordPress Performance on Shared Web Hosting
Optimizing WordPress Performance on Shared Web HostingOptimizing WordPress Performance on Shared Web Hosting
Optimizing WordPress Performance on Shared Web HostingJon Brown
 
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICES
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICESSpring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICES
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICESMichael Plöd
 
VMworld 2014: Virtualize Active Directory, the Right Way!
VMworld 2014: Virtualize Active Directory, the Right Way!VMworld 2014: Virtualize Active Directory, the Right Way!
VMworld 2014: Virtualize Active Directory, the Right Way!VMworld
 
D installation manual
D installation manualD installation manual
D installation manualFaheem Akbar
 
JavaOne 2014: Taming the Cloud Database with jclouds
JavaOne 2014: Taming the Cloud Database with jcloudsJavaOne 2014: Taming the Cloud Database with jclouds
JavaOne 2014: Taming the Cloud Database with jcloudszshoylev
 
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014zshoylev
 
Toplog candy elves - HOCM Talk
Toplog candy elves - HOCM TalkToplog candy elves - HOCM Talk
Toplog candy elves - HOCM TalkPatrick LaRoche
 
Choosing a Web Architecture for Perl
Choosing a Web Architecture for PerlChoosing a Web Architecture for Perl
Choosing a Web Architecture for PerlPerrin Harkins
 
D-DAY 2015 Electric sheep SERVEBOX
D-DAY 2015 Electric sheep SERVEBOXD-DAY 2015 Electric sheep SERVEBOX
D-DAY 2015 Electric sheep SERVEBOXDEVOPS D-DAY
 
Improving Your Domino Designer Experience
Improving Your Domino Designer ExperienceImproving Your Domino Designer Experience
Improving Your Domino Designer ExperienceJulian Robichaux
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 NotesRoss Lawley
 

What's hot (20)

2017.06.19 Paul Woodward - ExploreVM VMware 101
2017.06.19   Paul Woodward - ExploreVM VMware 1012017.06.19   Paul Woodward - ExploreVM VMware 101
2017.06.19 Paul Woodward - ExploreVM VMware 101
 
2015 ZendCon - Do you queue
2015 ZendCon - Do you queue2015 ZendCon - Do you queue
2015 ZendCon - Do you queue
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
Lessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core clusterLessons learnt on a 2000-core cluster
Lessons learnt on a 2000-core cluster
 
Integration with EMC VNX and VNXe hybrid storage arrays
Integration with EMC VNX and VNXe hybrid storage arraysIntegration with EMC VNX and VNXe hybrid storage arrays
Integration with EMC VNX and VNXe hybrid storage arrays
 
Care and feeding notes
Care and feeding notesCare and feeding notes
Care and feeding notes
 
Optimizing WordPress Performance on Shared Web Hosting
Optimizing WordPress Performance on Shared Web HostingOptimizing WordPress Performance on Shared Web Hosting
Optimizing WordPress Performance on Shared Web Hosting
 
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICES
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICESSpring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICES
Spring One 2 GX 2014 - CACHING WITH SPRING: ADVANCED TOPICS AND BEST PRACTICES
 
NetApp against ransomware
NetApp against ransomwareNetApp against ransomware
NetApp against ransomware
 
VMworld 2014: Virtualize Active Directory, the Right Way!
VMworld 2014: Virtualize Active Directory, the Right Way!VMworld 2014: Virtualize Active Directory, the Right Way!
VMworld 2014: Virtualize Active Directory, the Right Way!
 
D installation manual
D installation manualD installation manual
D installation manual
 
JavaOne 2014: Taming the Cloud Database with jclouds
JavaOne 2014: Taming the Cloud Database with jcloudsJavaOne 2014: Taming the Cloud Database with jclouds
JavaOne 2014: Taming the Cloud Database with jclouds
 
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014
Taming the Cloud Database with Apache jclouds, ApacheCon Europe 2014
 
Toplog candy elves - HOCM Talk
Toplog candy elves - HOCM TalkToplog candy elves - HOCM Talk
Toplog candy elves - HOCM Talk
 
Choosing a Web Architecture for Perl
Choosing a Web Architecture for PerlChoosing a Web Architecture for Perl
Choosing a Web Architecture for Perl
 
D-DAY 2015 Electric sheep SERVEBOX
D-DAY 2015 Electric sheep SERVEBOXD-DAY 2015 Electric sheep SERVEBOX
D-DAY 2015 Electric sheep SERVEBOX
 
XS 2008 Boston Project Snowflock
XS 2008 Boston Project SnowflockXS 2008 Boston Project Snowflock
XS 2008 Boston Project Snowflock
 
Scalable talk notes
Scalable talk notesScalable talk notes
Scalable talk notes
 
Improving Your Domino Designer Experience
Improving Your Domino Designer ExperienceImproving Your Domino Designer Experience
Improving Your Domino Designer Experience
 
Rails Conf Europe 2007 Notes
Rails Conf  Europe 2007  NotesRails Conf  Europe 2007  Notes
Rails Conf Europe 2007 Notes
 

Viewers also liked

Senior-Project-Presentation-Template (1)
Senior-Project-Presentation-Template (1)Senior-Project-Presentation-Template (1)
Senior-Project-Presentation-Template (1)Aaron Boshers
 
Bacula Overview
Bacula OverviewBacula Overview
Bacula Overviewsambismo
 
Automating backup provisioning with Bacula and Puppet
Automating backup provisioning with Bacula and PuppetAutomating backup provisioning with Bacula and Puppet
Automating backup provisioning with Bacula and Puppetmiouhpi
 
Introduction to Bacula
Introduction to BaculaIntroduction to Bacula
Introduction to BaculaHemant Shah
 
Guia completo para definição de estatística de modelos e algoritmos de machin...
Guia completo para definição de estatística de modelos e algoritmos de machin...Guia completo para definição de estatística de modelos e algoritmos de machin...
Guia completo para definição de estatística de modelos e algoritmos de machin...Geanderson Lenz
 
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみようKen Sawada
 

Viewers also liked (6)

Senior-Project-Presentation-Template (1)
Senior-Project-Presentation-Template (1)Senior-Project-Presentation-Template (1)
Senior-Project-Presentation-Template (1)
 
Bacula Overview
Bacula OverviewBacula Overview
Bacula Overview
 
Automating backup provisioning with Bacula and Puppet
Automating backup provisioning with Bacula and PuppetAutomating backup provisioning with Bacula and Puppet
Automating backup provisioning with Bacula and Puppet
 
Introduction to Bacula
Introduction to BaculaIntroduction to Bacula
Introduction to Bacula
 
Guia completo para definição de estatística de modelos e algoritmos de machin...
Guia completo para definição de estatística de modelos e algoritmos de machin...Guia completo para definição de estatística de modelos e algoritmos de machin...
Guia completo para definição de estatística de modelos e algoritmos de machin...
 
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう
170311【bacula】cent os7で統合バックアップbacula7.4を使ってみよう
 

Similar to Backy - VM backup beyond bacula

On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterSrihari Sriraman
 
on the most suitable storage architecture for virtualization
on the most suitable storage architecture for virtualizationon the most suitable storage architecture for virtualization
on the most suitable storage architecture for virtualizationJordi Moles Blanco
 
Inoreader OpenNebula + StorPool migration
Inoreader OpenNebula + StorPool migrationInoreader OpenNebula + StorPool migration
Inoreader OpenNebula + StorPool migrationOpenNebula Project
 
OSBConf 2015 | Vm backup beyond bacula by christian theune
OSBConf 2015 | Vm backup beyond bacula by christian theuneOSBConf 2015 | Vm backup beyond bacula by christian theune
OSBConf 2015 | Vm backup beyond bacula by christian theuneNETWAYS
 
Real world experience with provisioning services
Real world experience with provisioning servicesReal world experience with provisioning services
Real world experience with provisioning servicesCitrix
 
Redo and Rollback
Redo and RollbackRedo and Rollback
Redo and RollbackTubaahin10
 
Scalable Web Arch
Scalable Web ArchScalable Web Arch
Scalable Web Archroyans
 
Scalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesScalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesCal Henderson
 
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...OpenNebula Project
 
Become a MySQL DBA - slides: Deciding on a relevant backup solution
Become a MySQL DBA - slides: Deciding on a relevant backup solutionBecome a MySQL DBA - slides: Deciding on a relevant backup solution
Become a MySQL DBA - slides: Deciding on a relevant backup solutionSeveralnines
 
Capistrano, Puppet, and Chef
Capistrano, Puppet, and ChefCapistrano, Puppet, and Chef
Capistrano, Puppet, and ChefDavid Benjamin
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualizationFranck Pachot
 
The virtues of backup disaster recovery
The virtues of backup disaster recoveryThe virtues of backup disaster recovery
The virtues of backup disaster recoveryZack Fabro
 
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...Concentrated Technology
 
JAX London 2015 - Architecting a Highly Scalable Enterprise
JAX London 2015 - Architecting a Highly Scalable EnterpriseJAX London 2015 - Architecting a Highly Scalable Enterprise
JAX London 2015 - Architecting a Highly Scalable EnterpriseC24 Technologies
 
Become a MySQL DBA: performing live database upgrades - webinar slides
Become a MySQL DBA: performing live database upgrades - webinar slidesBecome a MySQL DBA: performing live database upgrades - webinar slides
Become a MySQL DBA: performing live database upgrades - webinar slidesSeveralnines
 

Similar to Backy - VM backup beyond bacula (20)

On The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL ClusterOn The Building Of A PostgreSQL Cluster
On The Building Of A PostgreSQL Cluster
 
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
Maximizing Business Continuity and Minimizing Recovery Time Objectives in Win...
 
Implementing dr w. hyper v clustering
Implementing dr w. hyper v clusteringImplementing dr w. hyper v clustering
Implementing dr w. hyper v clustering
 
on the most suitable storage architecture for virtualization
on the most suitable storage architecture for virtualizationon the most suitable storage architecture for virtualization
on the most suitable storage architecture for virtualization
 
Hyper v r2 deep dive
Hyper v r2 deep diveHyper v r2 deep dive
Hyper v r2 deep dive
 
Inoreader OpenNebula + StorPool migration
Inoreader OpenNebula + StorPool migrationInoreader OpenNebula + StorPool migration
Inoreader OpenNebula + StorPool migration
 
OSBConf 2015 | Vm backup beyond bacula by christian theune
OSBConf 2015 | Vm backup beyond bacula by christian theuneOSBConf 2015 | Vm backup beyond bacula by christian theune
OSBConf 2015 | Vm backup beyond bacula by christian theune
 
Good virtual machines
Good virtual machinesGood virtual machines
Good virtual machines
 
Real world experience with provisioning services
Real world experience with provisioning servicesReal world experience with provisioning services
Real world experience with provisioning services
 
Redo and Rollback
Redo and RollbackRedo and Rollback
Redo and Rollback
 
Scalable Web Arch
Scalable Web ArchScalable Web Arch
Scalable Web Arch
 
Scalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & ApproachesScalable Web Architectures - Common Patterns & Approaches
Scalable Web Architectures - Common Patterns & Approaches
 
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...
OpenNebulaConf2018 - How Inoreader Migrated from Bare-Metal Containers to Ope...
 
Become a MySQL DBA - slides: Deciding on a relevant backup solution
Become a MySQL DBA - slides: Deciding on a relevant backup solutionBecome a MySQL DBA - slides: Deciding on a relevant backup solution
Become a MySQL DBA - slides: Deciding on a relevant backup solution
 
Capistrano, Puppet, and Chef
Capistrano, Puppet, and ChefCapistrano, Puppet, and Chef
Capistrano, Puppet, and Chef
 
Testing Delphix: easy data virtualization
Testing Delphix: easy data virtualizationTesting Delphix: easy data virtualization
Testing Delphix: easy data virtualization
 
The virtues of backup disaster recovery
The virtues of backup disaster recoveryThe virtues of backup disaster recovery
The virtues of backup disaster recovery
 
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...
WinConnections Spring, 2011 - 30 Bite-Sized Tips for Best vSphere and Hyper-V...
 
JAX London 2015 - Architecting a Highly Scalable Enterprise
JAX London 2015 - Architecting a Highly Scalable EnterpriseJAX London 2015 - Architecting a Highly Scalable Enterprise
JAX London 2015 - Architecting a Highly Scalable Enterprise
 
Become a MySQL DBA: performing live database upgrades - webinar slides
Become a MySQL DBA: performing live database upgrades - webinar slidesBecome a MySQL DBA: performing live database upgrades - webinar slides
Become a MySQL DBA: performing live database upgrades - webinar slides
 

More from Christian Theune

batou lightning talk @ PyConDE 2017
batou   lightning talk @ PyConDE 2017batou   lightning talk @ PyConDE 2017
batou lightning talk @ PyConDE 2017Christian Theune
 
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)Christian Theune
 
batou - multi(component|host|environment|.*) deployment
batou - multi(component|host|environment|.*) deploymentbatou - multi(component|host|environment|.*) deployment
batou - multi(component|host|environment|.*) deploymentChristian Theune
 
Batou - multi-(host|component|environment|version|platform) deployment
Batou - multi-(host|component|environment|version|platform) deploymentBatou - multi-(host|component|environment|version|platform) deployment
Batou - multi-(host|component|environment|version|platform) deploymentChristian Theune
 
Modern, scalable deployment for plone
Modern, scalable deployment for ploneModern, scalable deployment for plone
Modern, scalable deployment for ploneChristian Theune
 

More from Christian Theune (7)

batou lightning talk @ PyConDE 2017
batou   lightning talk @ PyConDE 2017batou   lightning talk @ PyConDE 2017
batou lightning talk @ PyConDE 2017
 
Nach RAID und Fail-Over
Nach RAID und Fail-OverNach RAID und Fail-Over
Nach RAID und Fail-Over
 
NixOS @ Hackspace Jena
NixOS @ Hackspace JenaNixOS @ Hackspace Jena
NixOS @ Hackspace Jena
 
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
Flying Circus Ceph Case Study (CEPH Usergroup Berlin)
 
batou - multi(component|host|environment|.*) deployment
batou - multi(component|host|environment|.*) deploymentbatou - multi(component|host|environment|.*) deployment
batou - multi(component|host|environment|.*) deployment
 
Batou - multi-(host|component|environment|version|platform) deployment
Batou - multi-(host|component|environment|version|platform) deploymentBatou - multi-(host|component|environment|version|platform) deployment
Batou - multi-(host|component|environment|version|platform) deployment
 
Modern, scalable deployment for plone
Modern, scalable deployment for ploneModern, scalable deployment for plone
Modern, scalable deployment for plone
 

Recently uploaded

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 

Recently uploaded (20)

Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 

Backy - VM backup beyond bacula

  • 1. backy VM backup beyond Bacula/Bareos Christian Theune
 @theuni ct@flyingcircus.io
  • 2. Mea Culpa I should have given this talk last year. But I boarded the wrong train and noticed when I inserted the conference logo why they wrote the wrong city on it. Turns out I was heading to Nuremberg because Netways is based their and did not realise the conference is in Cologne. Looks like I made it. However, I’m happy to have the chance to present what we have now, because a lot happened in the last year and thus I don’t have to give this talk twice. :)
  • 3. And I almost missed it — again And I almost would have to say the same again next year. My son-to-be decided to scare my wife and me on Sunday evening, and I almost did not make it here again. Thanks to the organisers for not having me tarred and feathered just yet.
  • 4. Backup!!11!! We’ve been doing backups since ages. Of our own servers, workstations, applications, databases … and it keeps being painful. We started in 1999 with using a Tandberg drive with Amanda and we moved on and got a small tape library at some point and started using Bacula quite early. Things were reasonable, but I still hate tape libraries. And I don’t like answering pain with “you should have spend more money” if the basic thing just fails by itself.
  • 5. • flyingcircus.io • DevOps as a Service • custom, mission-critical web applications Today I work for the Flying Circus which operates custom web applications on a public cluster (with the option to run private clusters, too). We current manage a couple of dozen physical servers and multiple hundred virtual machines to do that. Sizes vary a lot, workload varies a lot. We’ve been using bacula for a long time with a largish RAID6-based drive array. We use Ceph as the primary storage (we used to use iSCSI) and we have been taking backups directly from the storage servers running the FDs with the option to restore into a FD on the VM. We also have pre/post scripting to take snapshots and set databases into backup mode. This talk is mostly an excursion into the “why the hell did we build our own backup system” and showing what we decided to build. All of this is a perpetual work in progress, but we feel that the “build your own” for our case ends up with higher quality and less maintenance effort than we had with bacula. Whether that’s true in the long run - we’ll see. Let’s start with how we came to ask the question “How can we get away from Bacula?”
  • 6. Part I - Oh the Pain We always have nice times and rough times with our tools, but Bacula has been on our naughty list for a while and I’d like to start with the incident that caused us to take action.
  • 7. The story unfolds … On a relatively slow day, we suddenly got Nagios warnings about filesystem errors. One, two, and then suddenly a lot of them. It’s 14:50.
  • 8. We quickly found that a complex bug caused a rogue server to delete all our VM images. I think we lost about 50% of running VMs at that point. Luckily the bug tried to delete the images so quickly that Ceph got confused and not all deletion requests went through. Time to restore!
  • 9. At T+1.5 hours later we had fixed the bug (by disabling any code that deletes stuff) and restore has started. We prioritised central services and SLA customers and were well on our way.
  • 10. At T+11 hours we were pretty much done. Except that our most valuable customer’s database VM would not want to restore. We were already on a shift cycle so that everyone could get some sleep and com back fresh, but Bacula was hitting a limit to either continue restoring or figuring out how to get around the inconsistency. Unfortunately, the inconsistency was in the middle of some 100+GB volume that took a while to start the restore and forward to the 70GiB mark where it would suddenly stop because of the inconsistency. Also, as we were trying different director options to let it ignore the inconsistency we had to halt other backups …
  • 11. Finally, around 20 hours later, we had all customer services back online. Our most valuable customer had the most downtime. Great.
• 12. Root Cause Analysis After we caught our breath, we sat down with a long list of things that had gotten in our way while trying to restore. Here’s the list of things that failed us.
• 13. http://flyingcircus.io/postmortems/13266.pdf If you’d like to go into the details yourself, here’s the URL. It’s an 8-page document: it starts with the basics and works through the event. Let me know if you read it in the future and have any questions.
• 14. Restore script bottleneck: global lock The restore script had evolved in a way that did not allow large-scale restores of the kind we required. We had to tune it in multiple places to avoid locks and talk to the director in a different way. This took us about 3 hours.
• 15. Undetected inconsistency in important customer database After we ramped up parallelism, a large customer database VM showed signs of inconsistency and aborted the restores. This kept one person busy for many hours diagnosing and fixing it while the remaining backups were being restored, and it stopped other restores every now and then.
• 16. Bacula: complexity and the VTL In the aftermath we saw that removing the global lock from the restore script should have been easy. However, the overall complexity of instrumenting Bacula and the mismatch of the VTL model when doing disk-based backups turned this into a long and cumbersome task. I don’t think we can ever reach the point where all our scripts for rare events are perfect, so I’d rather have our scripts in generally good shape on a basic level and allow quick modifications as an event unfolds.
• 17. Not “everything” backed up. Many of our VMs carry generated installations: the VMs are provisioned by our platform layer using Puppet and customer scripting and are very similar. For services, our customers and we use a separate service deployment tool (batou) to put customisations on top. To save disk space and reduce backup load we did not back up all VMs if a customer had, e.g., dozens of identical application servers. This, however, meant we had to re-deploy customer-specific installations during this large-scale event, which was more fiddly than we would have liked (provision the VM from scratch, re-deploy the service, find glitches under this special scenario).
• 18. 24 hours is not a sufficient RPO in quite a few cases Daily backups are OK for many cases. But in quite a few others, they are not. We’d love to give customers the option of hourly backups (or hell, even per-minute ones). With our Bacula setup there was no feasible way to do this, as each run would at least have to scan all the metadata of a filesystem to determine what had changed. Also, we have some pathological cases where append-only databases would cause insane write amplification.
• 19. Paper cuts • Hard link farms • Boot loaders • The director as a “most valuable bottleneck” Sigh. Yes, we’re running Cyrus. But still. Not restoring hard link farms really makes me hate archive formats that have to replicate all the filesystem logic. Restoring VMs in our case means preparing an empty disk image, putting the standard base image on it, then running a restore into it and reconfiguring the boot loader. This means the restore script has to know at least something about our boot loading process, and it breaks backups that come from a different era of boot loader configuration. Sigh. The director. A single process. For hundreds of jobs. You can’t reload it on the fly. It takes us 2 MiB of auto-generated configuration. And I can never predict how schedules will really work out.
• 20. Recap • Restore fiddly to script • Undetected inconsistency that was hard to deal with • Blind spots • Daily interval • Overall complexity, performance and the VTL • Paper cuts We had a couple of other incidents in the meantime that needed restores, and I am *never* happy to run them. I’m always immediately mad at having to restore, because I know something is going to blow up.
  • 21. Part II - Make a wish
  • 22. Simplicity • Restore with basic Unix tools • No VTL • Not mixing data of different VMs
  • 23. Reliability • Verification / Scrubbing / (Repair) • High frequency • Integration with storage snapshots • Not inventing new formats
  • 24. Operability • Avoid bottlenecks / head-of-line blocking • Efficient deltas for large files (ZODB) • Parallelisation (multiple jobs and multiple servers) • Simple scripting and environment-specific integration • Coordination: pre/post actions on storage, hypervisor, VM …
• 25. Operability II • Simple Nagios integration to ensure we notice RPO/SLA failures • RTO compliance during mass restores • Self-service for customers to restore files or VMs
  • 26. Part III - Let’s do this!
• 27. –Probably someone, maybe me “One size fits all … not” It’s all about size - Bacula / Bareos are general solutions to the big problem of backing up and restoring data. They have advanced capabilities, they have a large installed base and they support many features. - However, the sheer complexity of it all is what gives us the big pain. - We want to drive down complexity as much as we can, build on existing tools, and then add some of our own stuff. - So: we’re not trying to solve backup for each and everyone. However, having compute and storage prevalently …
• 28. It’s all about size: backy ~3,050 LOC in Python; about 50% of the code is tests; about 94% branch coverage.
• 29. It’s all about size: Bacula ~150k LOC in C. This is 50x the size. Considering that intellectual complexity rises at least geometrically, this would make it 2,500 times more complicated. Well. Ok. I’m being ironic. Nevertheless, this is a lot of code. The amount of test code that I found was about 3,000 lines (simply grepping for “test” in the filename). That means that the test-to-code ratio is about 2%. On a code base that is 50 times more complicated I see only about a 25th of backy’s test ratio (2% versus roughly 50%). This doesn’t feel good. Please, if anyone is outraged right now because I did not find the tests: tell me. I’d love to hear how Bacula or Bareos are doing quality assurance of their code nowadays.
• 30. It’s all about size: Bareos Just as Bacula is big, Bareos seems to have some traction and seems healthy from a contributions perspective. At least they managed to crank out another 100k lines of code, totalling 250k. My insanely dumb test script tells me there are 5.5k lines of test-related code. That’s actually an increase of 0.2 percentage points in test coverage. But again, let me know if you’re doing unit or functional tests in a way that was not obvious to me.
• 31. The giants we stand on Ok. We decided to do as little as possible ourselves, which means we’re standing on multiple giants’ shoulders. We only back up Qemu VM images. And luckily Qemu and Linux have an API for ensuring consistent state (fsfreeze) of a volume while it’s mounted read/write. All our disk images are stored in Ceph, which has cheap snapshots, is networked, and can hand out deltas pretty cheaply, too. We chose btrfs as the storage target for the images. It can store sparse files, and we leverage the CoW semantics (cp --reflink) when integrating deltas from Ceph. The sketch below shows how these pieces combine for a single delta backup.
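To make this concrete, here is a minimal sketch of one delta-backup round along the lines just described. It is not backy’s actual code: the function and argument names are invented, the guest is assumed to already be frozen by the agent, and the snapshot is assumed to be mapped read-only (e.g. via rbd map --read-only) so it shows up under /dev/rbd/.

```python
import json
import subprocess

def backup_round(pool, image, prev_snap, cur_snap, prev_copy, cur_copy):
    """Illustrative sketch: Ceph snapshot -> btrfs CoW copy -> apply delta."""
    # 1. Take a new snapshot (guest fsfreeze is assumed to be handled by the agent).
    subprocess.run(["rbd", "snap", "create", f"{pool}/{image}@{cur_snap}"], check=True)

    # 2. CoW-copy the previous backup; unchanged blocks share extents on btrfs.
    subprocess.run(["cp", "--reflink=always", prev_copy, cur_copy], check=True)

    # 3. Ask Ceph which extents changed between the two snapshots.
    out = subprocess.run(
        ["rbd", "diff", "--format", "json", "--from-snap", prev_snap,
         f"{pool}/{image}@{cur_snap}"],
        check=True, capture_output=True).stdout
    extents = json.loads(out)

    # 4. Copy only the changed extents into the new copy. Assumes the snapshot
    #    was mapped read-only beforehand so the udev symlink below exists.
    with open(f"/dev/rbd/{pool}/{image}@{cur_snap}", "rb") as src, \
         open(cur_copy, "r+b") as dst:
        for e in extents:
            if e.get("exists", "true") != "true":
                continue                    # discarded range, nothing to copy
            src.seek(e["offset"])
            dst.seek(e["offset"])
            dst.write(src.read(e["length"]))

    # 5. Drop the old snapshot; the new one becomes the base for the next delta.
    subprocess.run(["rbd", "snap", "rm", f"{pool}/{image}@{prev_snap}"], check=True)
```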
• 32. Limits • Not a general purpose backup system. • No tapes. No weird hardware. • Restore without tools. • Dead-simple configuration. So, as I said: not one size fits all. Let’s build something that fits us perfectly. We also want to use standard hardware; at the moment we target a RAID6+hotspare array with about 50TB of space. We want to be able to restore into Ceph (or in the worst case into an iSCSI host if need be, or *whatever*). dd is our tool of choice.
• 33. Let’s take a tour So, let’s take a tour of what backy looks like. We’ll start with a component overview.
• 35. simple!?! - Obviously this is much simpler. Right? - Ok, well. This is still quite complicated and involved. The point is: all of these tools have a very specific, infrastructure-oriented job and they *contribute* to our specific solution. We use Qemu anyway. We instrument Qemu with our “fc agent” anyway. We use Ceph anyway. We use Consul for coordination anyway. - Those tools do not come into existence in our environment just for backy. Our environment is quite Unix-oriented in this way: we use composable tools that have specific jobs. Those provide features, services or functionality that will be useful for many higher-level tasks. - So again, yes, this is not simple. However, plugging those things together, and testing that plugging, proved helpful to us. We invest in tools that we can reuse and we build our own on top. - We stopped shopping for one-stop-solution tools. We don’t want a cooking-oven-microwave-TV-lawnmower. We want some sandpaper, and a hammer, and nails, and electricity. - Again, most of this code we don’t have to maintain. And we configure those components anyway. That’s why we get along with little code that we can test well.
• 36. Hello CLI - Backy ships with a single command that provides sub-commands. By design, you should never have to interact with those on a daily basis. - However, I’ll show you how to use the CLI to trigger an immediate “random” backup and how to start the scheduler. The check command provides a general Nagios-compatible check that you can use to alert on SLA/scheduling issues across the whole installation.
• 37. Running a single backup - You can simply ask backy to run a backup. This uses the job-specific configuration of the current directory “litprod00” and does all the work of taking a snapshot, exporting the diff, integrating it into the CoW copy and running a partial random verification against the original source. - Note that a differential backup of a 10GB volume took 8 seconds of real time. Most of that time is spent by Ceph constructing the delta, and the rest in the random check of the volume against the original.
• 38. Inspecting a backup - Backy has a small command to inspect the status of a VM’s backup archive. Each backup is called a revision, and you can see that it has tags. We note how much data we backed up, how long it took, and when. The ID is a short UUID. - The summary is an estimate based on the amount of data we backed up for each revision; note that this only indicates the average backup size per revision.
• 39. Inspecting a backup - An important decision was to avoid storing data of multiple machines together, in a very obvious fashion. - We thus create a simple directory hierarchy: a backy base directory, a directory per VM, and 2 files for each backup. - Backy also keeps a per-machine log in that directory of *all* activities that were performed using any backy command for this volume. The .rev files are YAML files that store metadata about each revision. The most current revision can be found using the timestamp or is always available through the “last” symlink.
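On disk that looks roughly like this; the base path and the revision ID are made up for illustration:

```
/srv/backy/                       # base directory (example path)
└── litprod00/                    # one directory per VM
    ├── backy.log                 # log of all backy activity for this volume
    ├── pbvmn5scshgz.rev          # YAML metadata: uuid, timestamp, stats, tags
    ├── pbvmn5scshgz              # the raw disk image itself (sparse, CoW)
    └── last -> pbvmn5scshgz      # always points at the most recent revision
```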
• 40. Hello, daemon! - Initially we intended to run without any special daemon. However, properly doing load management and scheduling for a shifting set of configurations turned out to require a few clever tricks that we put into a little daemon. - The scheduler is based on Python 3.4’s asyncio and gives us a relatively simple implementation of running parallel jobs with low overhead. Every VM gets an infinite-loop coroutine that schedules this VM’s next backup, waits for that deadline, and submits a task to a work pool that enforces worker limits, which in turn simply calls the backy shell command to run a single backup. It then cleans up any expired backups and starts from the beginning. - Also, the scheduler is stateless and can thus be stopped and started at any time without losing queued work; it doesn’t have to store data for that, either. - As the scheduler is completely irrelevant to restoring, we can restore while backing up, or we can simply stop the scheduler.
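The core of the daemon can be pictured like this. This is a sketch only, written in today’s async/await syntax rather than the yield-from style of Python 3.4’s asyncio, and all names in it are invented:

```python
import asyncio

async def job_loop(job, workers):
    """One infinite-loop coroutine per VM."""
    while True:
        # The next deadline is derived purely from the revisions that exist
        # on disk -- this is what keeps the scheduler stateless.
        await asyncio.sleep(job.seconds_until_due())
        async with workers:                     # global parallelism limit
            # Shell out to the CLI so that manual and scheduled runs
            # behave identically.
            proc = await asyncio.create_subprocess_exec(
                "backy", "backup", *sorted(job.due_tags()), cwd=job.directory)
            await proc.wait()
        job.expire()                            # drop revisions that ran out of tags

async def main(jobs, limit=2):
    workers = asyncio.Semaphore(limit)
    await asyncio.gather(*(job_loop(j, workers) for j in jobs))

# asyncio.run(main(load_jobs_from_config()))   # hypothetical entry point
```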
• 41. Daemon configuration - The daemon has three types of configuration. First, some global options, like the limit on parallel jobs and the base directory where backups go. - Then it defines multiple named schedules (I’ll explain those in detail in a moment). - And then it describes jobs by stating their name, their type of source (“file” is a simple source we use for low-level testing; the others are pure-Ceph and Flying-Circus-specific Consul-managed jobs), and which schedule they belong to. - That’s it. This file can be computed from our inventory in a very simple fashion; we restart the scheduler and are happy.
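Something along these lines; all key names here are guesses based on the description above, not a verbatim backy configuration:

```yaml
# Illustrative only -- structure follows the talk, key names are assumptions.
global:
    base-dir: /srv/backy
    worker-limit: 2

schedules:
    default:
        daily:  {interval: 1d, keep: 9}
        weekly: {interval: 7d, keep: 5}

jobs:
    litprod00:
        source: {type: flyingcircus, image: litprod00.root}
        schedule: default
    test01:
        source: {type: file, filename: /var/tmp/test01.img}
        schedule: default
```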
• 42. Scheduling This was probably one of the harder things to implement. At the moment we’re happy with a very simple pattern. The general terms are “schedule”, “tag”, “interval”, and “keep”. What we didn’t do: * allow references to absolute times. Those don’t make sense on a broad platform, as we have to spread backups evenly throughout the day. And honestly, if it matters whether you make backups at 3am versus 3pm, then you are actually asking me for hourly backups instead of dailies. * allow referencing any special thing like weekdays, holidays, … what … ever. * The way the schedule works is that a predictable pattern like “every 24 hours” can be derived from “which backups exist” and “are we due for another one yet, and if yes, which tags go on it?” * We then run a backup at some point, stick the tags on it, and are done.
• 43. Scheduling Also, note that we do not have to differentiate the schedule for delta/full/differential backups. This is what Ceph and btrfs give us for free. We just specify a rhythm of RPOs and then we’re done.
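In code, the due-check then becomes a pure function of the revisions that already exist. A sketch, assuming each revision carries a timestamp and a set of tags (names invented):

```python
from datetime import datetime, timedelta

def due_tags(schedule, revisions, now=None):
    """Derive the tags for the next backup purely from what already exists."""
    now = now or datetime.utcnow()
    due = set()
    for tag, params in schedule.items():
        # Newest existing revision carrying this tag (or "never").
        newest = max((r.timestamp for r in revisions if tag in r.tags),
                     default=datetime.min)
        if now - newest >= params["interval"]:
            due.add(tag)
    return due

# Example: with schedule {"daily": {"interval": timedelta(days=1), "keep": 9}}
# and no revision younger than a day, due_tags() returns {"daily"}.
```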
• 44. Purging Nothing to see. Really. Well, OK. The actual thing is: this is the flip side of scheduling.
• 45. Purging Every tag in the schedule has a “keep” value. It means two things: 1. do not remove this tag from revisions as long as we have fewer than N revisions with this tag; 2. do not remove this tag from revisions as long as the last revision is younger than interval*N. When a revision with a given tag falls outside those criteria (we have enough revisions with this tag and they are old enough), the tag gets removed. Once a revision has no tags left, the revision itself is removed. btrfs takes care of freeing any block-level references to the data at that point.
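A sketch of that expiry logic, paraphrasing the two criteria above rather than quoting backy’s implementation (assumes keep >= 1):

```python
def expire(schedule, revisions, now):
    """Strip tags according to the keep rules, then drop untagged revisions."""
    for tag, params in schedule.items():
        keep, interval = params["keep"], params["interval"]
        tagged = sorted((r for r in revisions if tag in r.tags),
                        key=lambda r: r.timestamp)
        # Candidates are everything but the newest `keep` revisions; the tag
        # is only stripped once the revision has also aged past keep*interval.
        for r in tagged[:-keep]:
            if now - r.timestamp > keep * interval:
                r.tags.discard(tag)
    for r in [r for r in revisions if not r.tags]:
        r.remove()   # deletes the .rev file and the image; btrfs frees
                     # whatever blocks are no longer shared
```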
• 46. Scrubbing • partial, random verification during backup against the source • btrfs scrubbing • RAID-6 We do a partial verification of a freshly made backup against the original source in Ceph. In addition, we rely on btrfs scrubbing to warn us of any issues. On top of that, we hope to reduce the chance of unrecoverable bitrot with RAID-6. I think we’re relatively safe at the moment for the amount of data we store.
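The partial verification boils down to comparing a handful of random blocks between the fresh copy and the source. A sketch with invented names, assuming the source snapshot is readable as a plain block device:

```python
import os
import random

def spot_check(backup_path, source_path, samples=16, blocksize=4 * 1024 * 1024):
    """Compare a few random blocks of the fresh backup against the source."""
    size = os.path.getsize(backup_path)
    with open(backup_path, "rb") as backup, open(source_path, "rb") as source:
        for _ in range(samples):
            offset = random.randrange(max(size - blocksize, 1))
            backup.seek(offset)
            source.seek(offset)
            if backup.read(blocksize) != source.read(blocksize):
                raise RuntimeError("backup differs from source at offset %d" % offset)
```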
  • 47. Deleting a VM • rewrite config, reload master • rm -rf
  • 48. Monitoring • old state is uninteresting • do I have to act? This is something I’m kinda proud of. We tried multiple things in the past to monitor bacula, but it either ended up being too complicated and brittle, or didn’t trigger at the right times, or too often, or … So, I wanted to come up with a single test that tells me whether I have to act or not. What I noticed is that old state is uninteresting as we can’t fix it anyway. If I missed a backup a week ago, then that’s happened. I can’t fix that. I can’t travel in time. (For that reason I’ve built in a way to catch up with recent backups so backy has some limited self-repair here.) When I come to the office in the morning, I want to know: are we good, or not. And that’s what I built.
• 49. Monitoring Backy has a simple telnet console that can give an overview of what’s going on. The SLA column is the interesting one: the SLA being OK means that the last backup is not older than 150% of the smallest interval in our schedule. Done. Backy also has a convenience subcommand that aggregates this for all jobs. To support this, backy writes the status output you see here into a status file every 30 seconds, and the Nagios check reads that (so it doesn’t hang on a crashed daemon). The check validates that the file is fresh and that no jobs have exceeded their SLA.
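The resulting Nagios check is correspondingly tiny. A sketch; the status file path and field names are invented here, not backy’s actual format:

```python
import os
import sys
import time
import yaml   # PyYAML

STATUS_FILE = "/srv/backy/status"   # written by the daemon every 30 seconds

def check(max_age=90):
    # A stale status file means the daemon itself is dead or wedged.
    if time.time() - os.path.getmtime(STATUS_FILE) > max_age:
        print("CRITICAL: status file is stale")
        return 2
    with open(STATUS_FILE) as f:
        jobs = yaml.safe_load(f)
    failed = [job["name"] for job in jobs if not job["sla_ok"]]
    if failed:
        print("CRITICAL: SLA exceeded for " + ", ".join(failed))
        return 2
    print("OK: all jobs within SLA")
    return 0

if __name__ == "__main__":
    sys.exit(check())
```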
  • 50. Ok. The tour has probably been fast and rough. Let’s wrap it up here and call it a day, shall we?
• 51. What did we leave out? • Physical host backup • Guesstimating the achievable backup storage ratio * I don’t want to care about backing up physical hosts any more. Most stuff is managed automatically anyway, and OS installation is mostly automatic, too. Important data can be backed up by rsyncing files into a VM that is itself backed up with backy. * I don’t care much about the backup storage ratio, either. Having to keep a full copy of all data for every day or hour for 3 months isn’t feasible, but storing between 2 and 4 times the original volume is fine. Heck, even 10 would probably be fine. Space is cheap.
• 52. Future • trim-ready - waiting for our whole stack (guest, hypervisor, Ceph, …) to pass this through • Hot reload of the scheduler • Ensuring we can move VM backup directories between different backup hosts * Restarting the scheduler is fine for now. In the future we’ll likely implement a hot-reload feature to avoid accidentally tripping up already-running jobs.
• 53. Having your backup and eating it! I think the biggest thing I wanted to get off my chest: Bacula has been good at backup for us for a long time. It’s always been a bit annoying when it came to restore. And we all know by now: nobody wants backup, everybody wants restore. Whenever we fail to restore our customers’ data in time and consistently, we fail badly. This is what our backup needs to measure up to. We have outgrown Bacula both in the amount of data (restore calculations take ages) and from an operational perspective. We need to move faster. We need to integrate more. We want to solve policy-oriented issues on a completely different level. We’re used to writing code to solve our issues. We’re developers. We know coding is hard. That’s why we like small, reliable tools that we can compose. Bacula isn’t very composable. The only advice I can give is based on personal experience: I love knowing how the pieces work and contributing to the world by building my own. However, the number of pieces we have to deal with is growing. And that means I want those pieces to be small, multi-purpose, to do their job very, very well, and then to integrate them. From my perspective: big frameworks are dead. That’s why I love nginx over Apache, or Pyramid over Django. Small is beautiful. But I might be wrong and might say the opposite tomorrow. Caveat emptor.
  • 54. @theuni ct@flyingcircus.io Thanks for having me and thanks for hearing me out. Do we have time for questions?
• 55. Image Sources • https://www.flickr.com/photos/mpa-berlin/14337541104/ • https://www.flickr.com/photos/seattlemunicipalarchives/4777122561 • https://www.flickr.com/photos/jkroll/15314415946/ • https://www.flickr.com/photos/dvids/6956044669/ • https://www.flickr.com/photos/flowtastic/7354146628/
• 56. Image Sources • https://www.flickr.com/photos/galeria_stefbu/4781641072/in/pool-fotoszene/ • https://www.flickr.com/photos/dlography/6982668385/ • https://www.flickr.com/photos/127437845@N04/15142216255 • https://www.flickr.com/photos/clement127/15440591160
• 57. Image Sources • https://www.flickr.com/photos/clement127/15999160179 • https://www.flickr.com/photos/63433965@N04/5814096531/ • private pictures