With EC2 and S3, Amazon has made available some seriously powerful and flexible tools for server and file hosting. EC2 allows you to roll whatever flavor of OS you want and get it up and running (think of it as virtual dedicated hosting), which, while incredibly cool, has one major downside: if your server instance fails you lose your data, because it will revert to the image it was launched from when you get it back online. In other words, a hiccup on EC2 could turn into a blistering nightmare.
This is where S3 comes in handy for storing anything that might change after you create an instance and launch it, such as databases, files uploaded or created, and even logs. One of the first projects I hopped into was dumping the MySQL databases, compressing them and tossing them up on S3, a tedious process that can be automated with cron and s3sync (a Ruby-based tool that approximates rsync, kinda sorta).
Daily cron job to create new bucket
cd /root/s3sync/
DAYNOW=$(date +%j)
ruby s3cmd.rb createbucket WTF_db_$DAYNOW
Daily cron job to delete old bucket
cd /root/s3sync/
DAYTHEN=$(date +%j --date='2 days ago')
ruby s3cmd.rb deletebucket WTF_db_$DAYTHEN
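To see what those bucket names come out as, here is a small sketch of the date arithmetic. It assumes GNU date (the --date option is not portable to every system), and bucket_for is a made-up helper for illustration, not part of s3sync.

```shell
#!/bin/bash
# bucket_for is a hypothetical helper: it prints the bucket name for a date.
# $1 is any date string GNU date understands; it defaults to "now".
bucket_for() {
    echo "WTF_db_$(date --date="${1:-now}" +%j)"
}

bucket_for "2008-01-01"             # WTF_db_001
bucket_for "2008-01-01 2 days ago"  # WTF_db_364 (30 December 2007)
```

%j is the day of the year, so the names repeat every year. With only two days of retention that should not cause collisions, and the '2 days ago' arithmetic carries across the year boundary on its own.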
Hourly job to back up the database
# get into the directory
cd /root/s3sync/
# set the environment
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX/XXXX
export SSL_CERT_DIR=/root/s3sync/certs
# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# dump database
mysqldump WTF > /tmp/WTF-$DAYNOW-$TIMENOW.sql
# tar SQL dump (make sure the staging directory exists first)
mkdir -p /tmp/backups
tar -czvf /tmp/backups/WTF-$DAYNOW-$TIMENOW.tar.gz /tmp/WTF-$DAYNOW-$TIMENOW.sql
# copy tar to S3
ruby s3sync.rb -r --ssl /tmp/backups/ MahBukit:WTF_db_$DAYNOW
# clean up after yourself
rm /tmp/WTF-*.sql
rm /tmp/backups/*.gz
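For completeness, the three scripts can be wired into cron along these lines. The script paths, the file name and the times are all assumptions for illustration; the hourly job runs a few minutes past the hour so the midnight run lands after that day's bucket has been created.

```shell
#!/bin/bash
# Write the schedule to a file first so it can be reviewed before installing.
# The /root/bin/*.sh paths are hypothetical names for the three scripts above.
cat > wtf-backup.cron <<'EOF'
# m  h  dom mon dow  command
0    0  *   *   *    /root/bin/create_bucket.sh
5    *  *   *   *    /root/bin/backup_db.sh
30   0  *   *   *    /root/bin/delete_bucket.sh
EOF

# Careful: "crontab FILE" replaces the user's entire crontab with FILE.
# crontab wtf-backup.cron
```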
Now, one of the frustrations I have with this setup is that while I am dropping the buckets from 48 hours prior, that isn’t actually deleting the files off S3, a bit of a pain in the ass that I need to find some sort of resolution for in the near future. If anyone has an answer to this problem I would love to hear it! For now, I’m cheating by using S3Fox to purge those pesky backups.
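One untested idea for doing the purge from the script instead: s3sync addresses things as bucket:prefix, so the upload line in the hourly job appears to write into MahBukit under the key prefix WTF_db_$DAYNOW, which would mean the files never live in the bucket that deletebucket removes. The sketch below assumes your copy of s3cmd.rb has a deleteall command (check its usage output before trusting this); purge_day and DRY_RUN are names made up for illustration.

```shell
#!/bin/bash
# purge_day is a hypothetical helper: it removes every key under bucket:prefix.
# Assumption: s3cmd.rb supports "deleteall <bucket>[:prefix]".
# Set DRY_RUN=1 to print the command instead of running it.
purge_day() {
    local bucket="$1" prefix="$2"
    local run=""
    if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
    $run ruby s3cmd.rb deleteall "$bucket:$prefix"
}

# Possible use in the daily delete job, with the AWS variables exported first:
#   cd /root/s3sync/
#   DAYTHEN=$(date +%j --date='2 days ago')
#   purge_day MahBukit "WTF_db_$DAYTHEN"
```

Running it once with DRY_RUN=1 shows exactly which prefix would be wiped before anything is deleted for real.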




So, what you’re saying is, even when you’re issuing the deletebucket command, it’s not actually deleting the bucket? Sounds like deletebucket isn’t actually working properly.
This is a fantastic little howto, though. I’m in the process of tuning/repackaging a webserver under EC2, so this will help me greatly.
Thanks!
Well, I think it is because of how S3 treats buckets as labels rather than containers. When I set this up I was in the whole rsync mindset of folders and whatnot. After ruminating on it and playing with the tools a little more, I came to the conclusion that, at least from the standpoint of the s3sync tools, you cannot manipulate files beyond put and get, and tagging and untagging them with a bucket name.
Now how S3Fox handles file deletion and why it is able to deep-six a bucket and all associated files…No idea!
Hmm… As a Firefox extension, it’s likely all JavaScript. I’ll have to look into this.
Since I used this entry as a basis for the setup I’m working on, I’ll be sure to share with you the set of scripts I’m coming up with. I’m sure someone will find it useful. Particularly if I can get the bucket dumping process together.
I’d be very interested in seeing what you come up with, especially that bucket dump. Seeing as the tools are Ruby scripts, I really ought to punt them over to one of our developers to see if they can find out what gives and how they differ from the JavaScript.