Did you accidentally upload millions of documents into S3 under the wrong key in your bucket? Maybe you ran 40 servers overnight reprocessing data, only to discover that it’s in the wrong place? Don’t want to run 40 servers overnight deleting said data? Or watch Cyberduck churn for a week deleting each object one by one?
Then use S3’s lifecycle option to delete those documents. On a recent project we stored all our objects under a year tag in our bucket:
mybukket/1999/
mybukket/2000/
mybukket/2001/
And we did this for data ranging from 1985 to 2011. We decided that to support future datasets we wanted to aggregate them under dataset name. So we then copied the data to:
mybukket/datasetA/1999/
mybukket/datasetA/2000/
mybukket/datasetA/2001/
And then we had to contemplate the delete process. Fortunately it turned out to be very easy: we set up a lifecycle rule to delete all the keys that started with 19, and then another for 20, which covered everything from 1985 to 2011 and left everything under the datasetA/ prefix untouched.
It does take 24 hours to happen, but it is much faster, and feels much less error-prone, than running your own job to do this!
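For anyone who would rather script the rule than click through the console, here is a rough sketch of what an equivalent setup might look like in Python with boto3. This is not what we ran on the project; the helper names build_expiry_rules and apply_rules, the rule IDs, and the one-day expiry are all assumptions made for illustration.

```python
def build_expiry_rules(prefixes, days=1):
    """Build an S3 lifecycle configuration that expires every object whose
    key starts with one of the given prefixes.

    The rule IDs and the default of one day are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                "ID": f"expire-prefix-{prefix}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
            for prefix in prefixes
        ]
    }


def apply_rules(bucket, config):
    """Send the configuration to S3 (needs boto3 and AWS credentials)."""
    # Imported here so the builder above works without boto3 installed.
    import boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=config,
    )


# One rule for keys starting with "19" and one for keys starting with "20".
# Keys under "datasetA/" start with "d", so neither prefix matches them.
config = build_expiry_rules(["19", "20"])

# Uncomment to apply for real (the bucket name is the placeholder used above):
# apply_rules("mybukket", config)
```

Be aware that put_bucket_lifecycle_configuration replaces whatever lifecycle configuration the bucket already has, so merge in any existing rules first. And once the old keys are gone, remove or disable these rules, otherwise anything later uploaded under a key starting with 19 or 20 will be expired as well.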