How to delete millions of Amazon S3 objects in one swell foop

June 25, 2013 Eric Pugh

Did you accidentally upload millions of documents into S3 under the wrong key in your bucket? Maybe you ran 40 servers overnight reprocessing data, only to discover that it's in the wrong place? Don't want to run 40 servers overnight deleting said data? Or watch Cyberduck churn for a week deleting each object one by one?

Then use S3's lifecycle option to delete those documents. On a recent project we stored all our objects under a year prefix in our bucket:

mybukket/1999/
mybukket/2000/
mybukket/2001/

And we did this for data ranging from 1985 to 2011. We decided that, to support future datasets, we wanted to aggregate them under a dataset name. So we then copied the data to:

mybukket/datasetA/1999/
mybukket/datasetA/2000/
mybukket/datasetA/2001/

And then we contemplated the delete process. Fortunately it turned out to be very easy: we set up a lifecycle rule to delete all the keys that started with 19, and then another for keys that started with 20. Together these covered everything from 1985 to 2011, and left everything under the /datasetA/ subdirectory untouched.
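If you would rather script the rules than click through the console, here is a minimal sketch in Python. It is not what we ran on the project. It assumes the boto3 library and already-configured AWS credentials, and the helper names (build_expiration_rules, matches_any_rule, apply_rules) and the one-day expiration are illustrative choices of mine. The first two helpers are plain Python, so you can check which keys a rule would catch before touching the bucket.

```python
def build_expiration_rules(prefixes, days=1):
    """Build a lifecycle configuration with one expiration rule per key prefix."""
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Assumption: expire objects once they are `days` days old.
                "Expiration": {"Days": days},
            }
            for prefix in prefixes
        ]
    }


def matches_any_rule(key, config):
    """Local sanity check: would this object key fall under any rule's prefix?"""
    return any(key.startswith(rule["Filter"]["Prefix"]) for rule in config["Rules"])


def apply_rules(bucket, config):
    """Send the configuration to S3 (hypothetical helper, needs boto3 and credentials)."""
    import boto3  # imported here so the helpers above work without boto3 installed

    s3 = boto3.client("s3")
    # Note: this call replaces any lifecycle configuration already on the bucket.
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=config
    )


config = build_expiration_rules(["19", "20"])

# Uncomment only after checking the prefixes against your real keys:
# apply_rules("mybukket", config)
```

Running matches_any_rule over a sample of your keys first is worth the minute it takes, because a prefix such as 19 will match any key that starts with those two characters, not just the year folders.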

It does take 24 hours to happen, but it is much faster, and feels much less error-prone, than running your own job to do this!
