# Solr Backup – a simple walk through of Backup and Restore of a Solr Collection

I frequently back up and restore Solr collections as part of my experimentation workflow for improving search relevancy, so I wanted to share the steps I follow.

Along the way I learned about a, shall we say, interesting difference between single-shard and multi-shard collections! Typically, when you back up and restore Solr collections, you do so via a shared filesystem mounted on all of your Solr nodes, and that is how the Solr Reference Guide is written.

However, if your index consists of a single shard, then you don’t need that shared filesystem mounted on each Solr node. In my case, I had only a single shard, so I didn’t need it.

If you want to follow along, these steps work with the repository https://github.com/epugh/playing-with-solr-streaming-expressions. Follow the first section of the README to set up the sample Solr cluster. The Collections API page of the Solr Reference Guide explains these commands in detail.

### Solr Backup

The command to back up is pretty simple:

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=myBackup2020-06-12&collection=books&location=/tmp/fake_shared_fs' -H 'Content-type:application/json'

Then look at ./tmp/fake_shared_fs and you’ll see the shards and the ZooKeeper setup exported.
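For a larger collection, the BACKUP request can take a while to return. The Collections API accepts an `async` parameter and exposes a REQUESTSTATUS action for checking on a submitted request, so one option is to submit the backup and poll for its state. Here is a minimal sketch of that idea, assuming the same host, port, and paths as the curl command above; the helper names and the request id are made up for illustration.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: same host and port as the curl commands in this post.
SOLR = "http://localhost:8983/solr"


def collections_url(action, params):
    """Build a Collections API URL for an action plus its query parameters."""
    query = urllib.parse.urlencode({"action": action, **params, "wt": "json"})
    return f"{SOLR}/admin/collections?{query}"


def get_json(url):
    """GET a URL and parse the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def backup_async(collection, name, location, request_id, poll_seconds=2.0):
    """Submit BACKUP with an async id, then poll REQUESTSTATUS until it settles."""
    get_json(collections_url("BACKUP", {
        "name": name,
        "collection": collection,
        "location": location,
        "async": request_id,  # hypothetical id; pick any string not already in use
    }))
    status_url = collections_url("REQUESTSTATUS", {"requestid": request_id})
    while True:
        state = get_json(status_url)["status"]["state"]
        # Stop on a terminal state; otherwise the request is still queued or running.
        if state in ("completed", "failed", "notfound"):
            return state
        time.sleep(poll_seconds)
```

Calling `backup_async("books", "myBackup2020-06-12", "/tmp/fake_shared_fs", "backup-1")` against the sample cluster would submit the same backup as the curl command and return the final state string.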

### Restoring

Restoring is similarly simple. Here we are restoring our backup into a new restored_books collection:

curl 'http://localhost:8983/solr/admin/collections?action=RESTORE&name=myBackup2020-06-12&location=/tmp/fake_shared_fs&collection=restored_books' -H 'Content-type:application/json'
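A quick sanity check after a restore is to compare document counts between the original and the restored collection. The sketch below queries `/select` with `q=*:*` and `rows=0` and reads `numFound` from the JSON response. It assumes the `books` and `restored_books` collection names used above and the same local Solr URL; the function names are my own.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host and port as the curl commands in this post.
SOLR = "http://localhost:8983/solr"


def num_found(select_response):
    """Extract the total hit count from a parsed /select JSON response."""
    return select_response["response"]["numFound"]


def count_docs(collection):
    """Count all documents in a collection without fetching any rows."""
    query = urllib.parse.urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    with urllib.request.urlopen(f"{SOLR}/{collection}/select?{query}") as resp:
        return num_found(json.load(resp))


def same_doc_count(source, restored):
    """Return True when both collections report the same number of documents."""
    return count_docs(source) == count_docs(restored)
```

With the sample cluster running, `same_doc_count("books", "restored_books")` should be true if the restore brought back every document. Matching counts are only a coarse check; they don’t prove the documents themselves are identical.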

### What if I don’t have a Shared Filesystem?

As an experiment, I wondered whether I could merge a multi-shard collection into a single-shard collection and then back it up. I first thought about using the MIGRATE command, but then remembered that the REINDEXCOLLECTION command was introduced in 8.x:

curl 'http://localhost:8983/solr/admin/collections?action=REINDEXCOLLECTION&name=books&target=single_shard_books&numShards=1' -H 'Content-type:application/json'

It worked!
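One way to confirm that the target collection really has a single shard is to ask the Collections API for CLUSTERSTATUS and look at the `shards` map for that collection. This is a small sketch, assuming the `single_shard_books` target name from the command above and the same local Solr URL; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: same host and port as the curl commands in this post.
SOLR = "http://localhost:8983/solr"


def shard_names(cluster_status, collection):
    """Return the sorted shard names for a collection from a CLUSTERSTATUS response."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    return sorted(shards)


def fetch_cluster_status(collection):
    """Fetch CLUSTERSTATUS for one collection as parsed JSON."""
    query = urllib.parse.urlencode(
        {"action": "CLUSTERSTATUS", "collection": collection, "wt": "json"}
    )
    with urllib.request.urlopen(f"{SOLR}/admin/collections?{query}") as resp:
        return json.load(resp)
```

Running `shard_names(fetch_cluster_status("single_shard_books"), "single_shard_books")` against the sample cluster should return a list with one entry if the reindex produced a single shard.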

Okay, now let’s try and see if we can back it up to a different directory than our fake shared filesystem:

curl 'http://localhost:8983/solr/admin/collections?action=BACKUP&name=single_shard_books_backup&collection=single_shard_books&location=/tmp' -H 'Content-type:application/json'

No joy. It seems like it should work, and it has at least worked when the cluster had just a single Solr node. I’ll have to investigate this more. I also noticed that the BACKUP and RESTORE commands aren’t exposed to the end user in the Solr Admin Collections UI. I’ll update this blog post if I get a chance to add them to the Solr Admin UI.
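When a Collections API call like the BACKUP above fails, the response body is the first place to look. The sketch below pulls any error text out of a parsed response. It assumes failures show up under `error`, `exception`, or `failure` keys; that is my assumption about the response shape, not something verified against every Solr version, so treat it as a starting point for investigation. The function names are hypothetical.

```python
import json
import urllib.error
import urllib.request


def failure_messages(body):
    """Collect any error text found in a parsed Collections API response."""
    messages = []
    # Assumed keys: "error" and "exception" carry a "msg" field when present.
    for key in ("error", "exception"):
        section = body.get(key)
        if isinstance(section, dict) and "msg" in section:
            messages.append(section["msg"])
    # Assumed key: "failure" maps a node name to that node's error text.
    failure = body.get("failure")
    if isinstance(failure, dict):
        messages.extend(f"{node}: {msg}" for node, msg in failure.items())
    return messages


def call_collections_api(url):
    """GET a Collections API URL and return (parsed body, list of error messages)."""
    try:
        with urllib.request.urlopen(url) as resp:
            raw = resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Solr reports many failures with a non-2xx status plus a body.
        raw = err.read().decode("utf-8")
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the raw text if the body is not JSON.
        body = {"error": {"msg": raw}}
    return body, failure_messages(body)
```

An empty list from `failure_messages` means none of the assumed keys were present, which is not the same as proof that the backup succeeded; checking the backup directory is still worthwhile.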