And now to the fun subject of virtual machine images!
My computer blue-screened the other day, the second time in a month, so I got fairly paranoid. I needed to make sure my backup strategy was solid. The most important thing I want to back up is my Ubuntu VirtualBox image. Here at work we use CrashPlan, which does a decent enough job of automatically backing up the directories I point it at. However, dumb old me was just pointing CrashPlan at the directory with my virtual machine images and hoping for the best. This is, of course, problematic: CrashPlan is most likely running its backup while I'm actively working in my virtual machine. How can I have any guarantee that the image is in a consistent state?
Backing up with a versioning system
I tried all kinds of harebrained schemes to back up my VirtualBox image in a reasonably automatic way. I really wanted some kind of diff-based approach to backing up the large file. One idea was to have a batch file or script commit the .vdi to a version control system before launching the virtual machine. I'd then point CrashPlan at the associated repository, which would be backed up in the background. I don't recommend this. Git turns out to have a practical limit on file size, which was a pretty big “hey idiot… what are you doing” warning. I found and tried Boar, a versioning system that attempts to work well with binary files. Unfortunately, even its diffing fell short in this situation: after 2 commits of a 20 GB .vdi file, the repository had grown to 40 GB+. This solution wasn't very space efficient. Moreover, committing such a large file is tediously slow. Every time I launched my virtual machine I'd have to wait through this boring 5-10 minute process.
Backing up using VirtualBox snapshots
The canonical solution turns out to involve a VirtualBox feature known as snapshots, which until now I had known nothing about.
From a user's perspective, a snapshot is a restore point. If I create a snapshot, I can go back to that point in time. The best part is that I can take a snapshot of a system even while it's running. In the simplest use case you have a linear progression of snapshots over time, and you can restore your virtual machine to any snapshot in that history. You can also do crazy things like go back to a snapshot and branch from it, taking your virtual machine in multiple experimental directions from a single restore point.
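For reference, these operations map onto the snapshot subcommand of VBoxManage, VirtualBox's command-line tool. The sketch below only builds the command lines as argument lists, so they can be inspected without VirtualBox installed; the helper names and the VM name in the usage line are hypothetical. The --live option asks VirtualBox not to pause a running VM while the snapshot is taken. Option syntax has shifted a little between VirtualBox releases, so check VBoxManage snapshot's help output on your version.

```python
import subprocess

def snapshot_cmd(vm, *args):
    """Build a 'VBoxManage snapshot <vm> ...' command line as an argv list."""
    return ["VBoxManage", "snapshot", vm, *args]

def take_cmd(vm, name, live=True):
    # --live takes the snapshot without pausing a running VM
    args = ["take", name]
    if live:
        args.append("--live")
    return snapshot_cmd(vm, *args)

def restore_cmd(vm, name):
    # VirtualBox expects the VM to be stopped when restoring a snapshot
    return snapshot_cmd(vm, "restore", name)

def list_cmd(vm):
    # Prints the snapshot tree, including any branches
    return snapshot_cmd(vm, "list")

def run(cmd):
    """Execute a command; raises CalledProcessError if VBoxManage fails."""
    subprocess.run(cmd, check=True)
```

Usage, with a hypothetical VM name: run(take_cmd("ubuntu-dev", "before-upgrade")).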
How snapshots actually work is what makes them so powerful for backups: a snapshot turns out to be the diffing system I was looking for. When you take a snapshot of a virtual machine in the default “normal” mode, the associated parent (either the virtual machine image or another snapshot) is frozen and no longer written to. Instead, all writes go into the file associated with the new snapshot. This file is, in essence, a kind of commit log against the underlying virtual disk of everything that has happened since the snapshot was taken. It's a diff of what has changed since then. Restoring to the snapshot point is as simple as throwing away the snapshot file (the commit log) and unfreezing the snapshot's ancestor.
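To make the frozen-parent and commit-log idea concrete, here is a toy in-memory model. It is not VirtualBox's on-disk format, just an illustration under simplifying assumptions: each layer is a dict from block number to data, only the newest layer is writable, and reads fall through to older layers. The class and method names are hypothetical.

```python
class DiffChain:
    """Toy model of a disk chain: a base image plus differencing layers.

    Each layer maps block number -> data. Only the newest layer is written;
    every older layer is frozen, like a snapshot's parent.
    """

    def __init__(self, base_blocks):
        self.layers = [dict(base_blocks)]  # layers[0] plays the role of the .vdi

    def write(self, block, data):
        # All writes land in the newest diff (the "commit log")
        self.layers[-1][block] = data

    def read(self, block):
        # The newest layer wins; otherwise fall back through older, frozen layers
        for layer in reversed(self.layers):
            if block in layer:
                return layer[block]
        return None  # block was never written

    def take_snapshot(self):
        # Freeze the current top layer and start a fresh, empty diff
        self.layers.append({})

    def restore_latest(self):
        # Discard everything written since the last snapshot
        if len(self.layers) > 1:
            self.layers[-1] = {}
```

In this model, restoring clears the newest diff but keeps the snapshot, so the VM can keep writing to a fresh, empty diff on top of the frozen parent.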
Deleting a snapshot does not throw away the changes in that commit log. Instead, it folds the snapshot into its ancestor (the .vdi or another diff file). It's effectively committing the diff.
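Using the same kind of toy representation (a list of dicts, oldest first; again hypothetical, not VirtualBox's format), deleting a snapshot amounts to merging a diff layer into its parent. The merge changes how the data is stored but not what any block reads as, which is why deleting a snapshot loses the restore point but keeps the changes.

```python
def read(layers, block):
    """Read a block from a chain of layers, newest layer first."""
    for layer in reversed(layers):
        if block in layer:
            return layer[block]
    return None

def fold_into_parent(layers, i):
    """Return a new chain in which layer i has been merged into layer i - 1."""
    if i < 1:
        raise ValueError("the base image has no parent to fold into")
    merged = dict(layers[i - 1])
    merged.update(layers[i])  # newer writes in layer i override older ones
    return layers[:i - 1] + [merged] + layers[i + 1:]
```

After folding, every block reads exactly as before; there is simply one fewer layer (one fewer restore point) in the chain.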
This is what makes snapshots workable as a backup strategy.
You can back up a running VirtualBox virtual machine by maintaining a cascade of snapshots. One snapshot, known as “current”, is the most recent; restoring to it restores to the last backup. The diff file associated with it (holding everything that has happened AFTER the snapshot) reflects all the changes not yet backed up, and it is where VirtualBox is actively keeping the guest OS's writes. This diff is a kind of “commit log” of all the changes going to the virtual disk. The “previous” snapshot is the restore point before current. In this script, previous is the old current, so its commit log/diff reflects the changes between the previous and current snapshots. Finally, this script also has a “deleteMe”, the old previous. On every run, deleteMe is folded back into the main .vdi file by telling VBoxManage to delete the snapshot (deleting a snapshot doesn't remove the associated data; it just folds the data in and forgets the restore point).
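The rotation can be sketched in Python as below. This is not the original script but a minimal reconstruction of the steps described, assuming VBoxManage is on the PATH, that snapshot names stay unique, and that VBoxManage snapshot's edit subcommand with --name renames a snapshot (check the help output for your version). The set of existing snapshot names is passed in; you could fill it from VBoxManage snapshot's list output. The order matters: deleteMe is deleted before anything else is renamed to that name.

```python
import subprocess

def vbox_snapshot(vm, *args):
    """Build a 'VBoxManage snapshot <vm> ...' argv list."""
    return ["VBoxManage", "snapshot", vm, *args]

def rotation_commands(vm, existing):
    """Commands for one backup run; 'existing' is the set of snapshot names present."""
    cmds = []
    if "deleteMe" in existing:
        # Fold the oldest diff back into the main .vdi and forget that restore point
        cmds.append(vbox_snapshot(vm, "delete", "deleteMe"))
    if "previous" in existing:
        cmds.append(vbox_snapshot(vm, "edit", "previous", "--name", "deleteMe"))
    if "current" in existing:
        cmds.append(vbox_snapshot(vm, "edit", "current", "--name", "previous"))
    # New restore point; --live avoids pausing the running VM
    cmds.append(vbox_snapshot(vm, "take", "current", "--live"))
    return cmds

def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Building the commands separately from running them makes it easy to print a dry run before letting the script touch a live VM.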
This strategy lets us keep 2 restore points (in case a backup happens to capture an unstable image). It's a great strategy, but…
Snapshots – Tread Carefully
Sadly, for me personally, live VirtualBox snapshots haven't been a terribly robust backup strategy. I've seen several snapshots fail, or, worse, had VirtualBox crash while a snapshot was being taken. Luckily I haven't lost much data, as I've been diligent about pushing my code to GitHub. When a snapshot has failed, I've had to edit my virtual machine's .vbox XML file, a file that clearly states “DO NOT EDIT” at the top. It's easy to be lulled into thinking that a snapshot should either succeed or fail atomically, like committing to a version control system. That hasn't been my experience.
Here's a gallery of some of the horrors I've seen. First there's “A differencing image of snapshot could not be found”, where a snapshot's image file somehow gets lost.
I've also encountered this error: “Hard Disk XXX cannot be directly attached to the virtual machine because it has 1 differencing child hard disks”. I've had errors taking snapshots too, including the snapshot process hanging with a live VM. Sadly, I can't say I trust live snapshots right now.
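When a differencing image goes missing, a read-only check of which disk files the .vbox file references can help before resorting to hand edits. This sketch assumes the layout used by recent VirtualBox versions: a root element in the http://www.virtualbox.org/ XML namespace, with HardDisk elements (children nested inside their parent disk) listed in a MediaRegistry. Older versions kept media registrations elsewhere, so treat this as a diagnostic aid under that assumption, not a repair tool. The function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

NS = "{http://www.virtualbox.org/}"  # assumed namespace of .vbox files

def disk_locations(vbox_xml):
    """List the 'location' of every HardDisk element, including nested diffs."""
    root = ET.fromstring(vbox_xml)
    return [hd.get("location") for hd in root.iter(NS + "HardDisk") if hd.get("location")]

def missing_disks(vbox_xml, vm_dir, exists=os.path.exists):
    """Return referenced disk locations that don't exist on disk.

    Relative locations are resolved against vm_dir (the folder holding the .vbox).
    """
    missing = []
    for loc in disk_locations(vbox_xml):
        path = loc if os.path.isabs(loc) else os.path.join(vm_dir, loc)
        if not exists(path):
            missing.append(loc)
    return missing
```

Read the settings file as bytes (for example open(path, "rb").read()) and pass it in; the script never writes to it.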
I've reverted to a simpler, non-live backup strategy that only takes snapshots immediately before starting up my virtual machine and won't take a snapshot while VirtualBox.exe is running (it asks you to close VirtualBox before continuing). It combines the script above, reworked into Python on Windows, with this Python ActiveState recipe. I've replaced the VirtualBox icon pinned to the taskbar with a batch file that runs my script, and gave that shortcut the default VirtualBox icon.
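The ActiveState recipe isn't reproduced here. As a stand-in, here is one way to do the “is VirtualBox.exe running?” check on Windows, using the built-in tasklist command with an image-name filter and CSV output. The parsing is split out so it can be tested without Windows; the function names are hypothetical.

```python
import csv
import subprocess

def is_running(image_name, tasklist_csv):
    """Return True if tasklist CSV output contains a process named image_name.

    Expects output from: tasklist /FI "IMAGENAME eq <name>" /NH /FO CSV,
    where each matching process is one row whose first field is the image name.
    When nothing matches, tasklist prints an INFO line instead, which won't match.
    """
    for row in csv.reader(tasklist_csv.splitlines()):
        if row and row[0].lower() == image_name.lower():
            return True
    return False

def virtualbox_running(image_name="VirtualBox.exe"):
    """Windows only: ask tasklist whether image_name is currently running."""
    out = subprocess.run(
        ["tasklist", "/FI", f"IMAGENAME eq {image_name}", "/NH", "/FO", "CSV"],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_running(image_name, out)
```

A launcher script could then bail out early with something like: if virtualbox_running(): sys.exit("Please close VirtualBox before continuing."). Newer VirtualBox releases may run VM windows as a separate VirtualBoxVM.exe process, so checking that name as well may be worthwhile.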
I also only ever keep one snapshot around: “current”. Before launching VirtualBox, current is folded into the main .vdi image via a snapshot delete. I then take a new “current” snapshot, which VirtualBox writes to. That leaves one differencing image active at a time, and the main .vdi gets updated right before the VM launches. Therefore, when CrashPlan backs up that folder, it should be backing up a stable .vdi that isn't written to while the VM runs. For me, this is the best solution so far for maintaining a trustworthy backup strategy without weird errors that crash my live VM.
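A minimal sketch of that pre-launch sequence follows. It is not the author's actual script. It assumes VBoxManage is on the PATH and that VBoxManage snapshot's list --machinereadable prints lines such as SnapshotName="current" (an assumption about the output format worth verifying on your version), and it uses VBoxManage startvm to boot the VM where the original launches the VirtualBox GUI. The VM name is a hypothetical placeholder, and this should only run after confirming VirtualBox is closed.

```python
import re
import subprocess

VM = "ubuntu-dev"  # hypothetical VM name; substitute your own

# Assumed machine-readable format: SnapshotName="current", with child
# snapshots reported under suffixed keys such as SnapshotName-1="child".
_NAME_LINE = re.compile(r'^SnapshotName(?:-[\d-]+)?="(.*)"$')

def snapshot_names(machinereadable):
    """Extract snapshot names from 'snapshot <vm> list --machinereadable' output."""
    names = []
    for line in machinereadable.splitlines():
        m = _NAME_LINE.match(line.strip())
        if m:
            names.append(m.group(1))
    return names

def prelaunch_commands(vm, existing):
    cmds = []
    if "current" in existing:
        # Fold the single diff back into the main .vdi while the VM is off
        cmds.append(["VBoxManage", "snapshot", vm, "delete", "current"])
    # Fresh, empty diff that receives this session's writes
    cmds.append(["VBoxManage", "snapshot", vm, "take", "current"])
    cmds.append(["VBoxManage", "startvm", vm])
    return cmds

def main(vm=VM):
    # No check=True here: the list command may exit non-zero when there are no snapshots
    listing = subprocess.run(
        ["VBoxManage", "snapshot", vm, "list", "--machinereadable"],
        capture_output=True, text=True,
    ).stdout
    for cmd in prelaunch_commands(vm, set(snapshot_names(listing))):
        subprocess.run(cmd, check=True)
```

The batch file pinned to the taskbar would then just run this script and call main(); because the .vdi is frozen as soon as "current" exists, CrashPlan sees a file that doesn't change for the rest of the session.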
Anyway, I'd definitely be curious to hear about your experiences backing up VMs!