
# Backing up Virtual Machines in VirtualBox

And now to the fun subject of virtual machine images!

My computer blue-screened the other day, for the second time in a month, so I got fairly paranoid and needed to make sure my backup strategy was solid. The most important thing I want to back up is my Ubuntu VirtualBox image. Here at work we use CrashPlan, which seems to do a decent enough job of automatically backing up the directories I tell it to. However, dumb me was just pointing CrashPlan at the directory with my virtual machine images and hoping for the best. This is, of course, problematic, as CrashPlan is most likely running its backup while I'm actively working in my virtual machine. How can I have any guarantee that the image is in any kind of consistent state?

## Backing up with a versioning system

I tried all kinds of harebrained schemes to back up my VirtualBox image in something resembling an automatic fashion. I really wanted some kind of diff-based approach to backing up the large file. One thought I had was that, before launching the virtual machine from a batch file or script, I'd commit the .vdi file to a version control system. Then I'd point CrashPlan at the associated repository, which would be backed up in the background. I don't recommend this. Git turns out to have a sensible maximum file size limit, which was a pretty big “hey idiot… what are you doing” warning. I found and tried Boar, a versioning system that attempts to work well with binary files. Unfortunately, even its diffing ability was lacking in this situation. After 2 commits of a 20 GB .vdi file, the repository had grown to 40 GB+, so this solution wasn't very space efficient. Moreover, committing such a large file is tediously slow. Every time I launched my virtual machine I'd have to wait for this boring 5-10 minute process.

## Backing up using VirtualBox snapshots

The canonical solution turns out to involve a VirtualBox feature known as snapshots, which I had known nothing about until now.

From a user's perspective, a snapshot is a restore point. If I create a snapshot, I can go back to that point in time. The best part is that I can take a snapshot of a system even while it's running. In the simplest use case you have a linear progression of snapshots over time, and you can restore your virtual machine to any snapshot in the history. You can also do crazy things like go back to a snapshot and create a branch from it, taking your virtual machine in multiple experimental directions from a single restore point.
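As a rough sketch, taking, listing, and restoring restore points from the command line could look like the following. The `snapshot … take`, `list`, and `restore` subcommands belong to VBoxManage; the function names and the overridable `VBOXMANAGE` variable are my own illustrative choices, not part of VirtualBox.

```bash
#!/bin/bash
# Hypothetical helper functions wrapping VBoxManage's snapshot subcommands.
# Override VBOXMANAGE (e.g. VBOXMANAGE=echo) to print the commands instead
# of running them.
VBOXMANAGE="${VBOXMANAGE:-VBoxManage}"

# Create a restore point named $2 for the VM named $1.
# This can be done while the VM is running.
take_restore_point() {
    $VBOXMANAGE snapshot "$1" take "$2"
}

# Show the snapshots (restore points) recorded for the VM named $1.
list_restore_points() {
    $VBOXMANAGE snapshot "$1" list
}

# Roll the VM named $1 back to the snapshot named $2.
# Assumption: the VM has been shut down first.
rollback_to() {
    $VBOXMANAGE snapshot "$1" restore "$2"
}
```

For example, `take_restore_point "MyVM" before-upgrade` followed later by `rollback_to "MyVM" before-upgrade` would return a (placeholder) VM called `MyVM` to the state it had when the snapshot was taken.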

The way snapshots actually work makes them extremely powerful for backups. A snapshot turns out to be the diffing system I was looking for. When you take a snapshot of a virtual machine in the default “normal” mode, the associated parent (either the virtual machine image or another snapshot) is frozen and no longer written to. Instead, all writes go into the file associated with the new snapshot. This file is in essence a kind of commit log against the underlying virtual file system, recording everything that has happened since the snapshot was taken. It's a diff of the stuff that has changed since the snapshot took place. Restoring to the snapshot point is as simple as throwing away the snapshot file (the commit log) and unfreezing the snapshot's ancestor.

Deleting a snapshot does not remove the changes in that commit log. Instead, it folds that snapshot into its ancestor (back into the .vdi or another diff file). It's actually committing the diff.
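To make the freeze / diff / fold idea concrete, here is a toy model in shell. It is only a conceptual sketch and has nothing to do with VirtualBox's real on-disk format: a `base` directory plays the frozen parent image, a `diff` directory plays the snapshot's commit log, and each file stands in for one disk block. All function names are made up for illustration.

```bash
#!/bin/bash
# Toy model of a snapshot: a frozen "base" plus a "diff" that takes all writes.

disk_init() {            # $1 = scratch directory holding the toy disk
    mkdir -p "$1/base" "$1/diff"
}

disk_write() {           # $1 = disk dir, $2 = block name, $3 = contents
    # After a snapshot, writes never touch the frozen base.
    printf '%s' "$3" > "$1/diff/$2"
}

disk_read() {            # $1 = disk dir, $2 = block name
    # Newest data wins: look in the diff first, then fall back to the base.
    if [ -e "$1/diff/$2" ]; then
        cat "$1/diff/$2"
    else
        cat "$1/base/$2"
    fi
}

snapshot_restore() {     # $1 = disk dir
    # Restoring throws the commit log away; the base is untouched.
    rm -f "$1"/diff/*
}

snapshot_delete() {      # $1 = disk dir
    # Deleting the snapshot folds the commit log into the base,
    # so the changes are kept and only the restore point is lost.
    for f in "$1"/diff/*; do
        if [ -e "$f" ]; then
            mv -f "$f" "$1/base/"
        fi
    done
    return 0
}
```

In this model, writing a block and then calling `snapshot_restore` gives back the old contents, while writing a block and then calling `snapshot_delete` leaves the new contents in `base`.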

Which leads me to understanding why this can work as a backup strategy.

```bash
#!/bin/bash
VBOXMANAGE="/usr/bin/VBoxManage -q"
if [ $# != 1 ]
then
    echo "Usage: $0 VBoxName"
    exit
fi

echo "Renaming old snapshot..."
$VBOXMANAGE snapshot "$1" edit previous --name deleteme
echo "Renaming current snapshot..."
$VBOXMANAGE snapshot "$1" edit current --name previous
echo "Taking new snapshot..."
$VBOXMANAGE snapshot "$1" take current
echo "Deleting old snapshot..."
$VBOXMANAGE snapshot "$1" delete deleteme
```

You can back up a running VirtualBox virtual machine by maintaining a cascade of snapshots. One snapshot, known as “current”, is the most recent snapshot. Restoring to it restores to the last backup. The diff file associated with it (holding all the stuff that has happened AFTER the snapshot) reflects all the changes that have not been backed up yet, and is where VirtualBox is actively keeping the guest OS's writes. This diff is a kind of “commit log” of all the changes that are going to the virtual disk. The “previous” snapshot is the restore point before current. In this script, the commit log belonging to “previous” is the one that used to belong to “current”, so it reflects the changes between the previous and current snapshots. Finally, this script also has a “deleteme” snapshot, which is the old previous. On every run, deleteme is folded back into the main .vdi file by telling VBoxManage to delete the snapshot (deleting a snapshot doesn't remove the associated data; it just folds the data in and forgets the restore point).
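One thing worth noting: the rotation script renames snapshots called `previous` and `current`, so both have to exist before its first run. A one-time bootstrap, sketched here under that assumption, could simply take the two snapshots back to back. The function name and the overridable `VBOXMANAGE` variable are hypothetical; the `take` subcommand and `-q` flag are the same ones the script uses.

```bash
#!/bin/bash
# One-time setup sketch: create the two snapshots the rotation script expects.
# Override VBOXMANAGE (e.g. VBOXMANAGE=echo) to preview the commands.
VBOXMANAGE="${VBOXMANAGE:-/usr/bin/VBoxManage -q}"

# $1 = name of the virtual machine
bootstrap_snapshots() {
    # "previous" is taken first so that it ends up as the parent of "current".
    $VBOXMANAGE snapshot "$1" take previous &&
        $VBOXMANAGE snapshot "$1" take current
}
```

After running `bootstrap_snapshots "MyVM"` once (with `MyVM` standing in for your VM's name), each later run of the rotation script has a `previous` to rename and a `current` to demote.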

This strategy lets us keep 2 restore points (in case an ill-timed backup captures an unstable image). It's a great strategy, but….