Posts Tagged ‘S3’

Caching Static Assets Made Simple with Nginx, Varnish, S3

Thursday, September 18th, 2008

We serve some of our assets directly out of S3, and while that is convenient, it is not the speediest way to deliver content. The crew over at Viximo worked out how to bolt Varnish onto the side of Apache so that they can cache their S3 content. I was so smitten with the idea that I wanted to adapt it for our configuration, so I asked Chris Chiodo to reveal the secret sauce. Below are the configuration files I munged from what he generously shared.

Nginx

This is pretty straightforward: I’ve made Varnish an upstream server and am intercepting any request for content in photos, avatars, kit, or caboodle and passing it along to Varnish.

upstream varnish {
server varnish01:7000 max_fails=3  fail_timeout=30s;
}

location ~ ^/(photos|avatars|kit|caboodle)/ {
proxy_pass http://varnish;
}

Varnish

This was my stumbling block until I talked to Viximo. The problem was how I defined the backend: for whatever reason, either Varnish or AWS did not like requests sent to bucket-name.amazonaws.com. The fix is to point the backend at s3.amazonaws.com and prepend the bucket name to the URL instead.

backend media {
.host = "s3.amazonaws.com";
.port = "80";
}

sub vcl_recv {
set req.url = regsub(req.url, "^", "/bucket-name");
set req.backend = media;
set req.http.host = "localhost";
remove req.http.X-Forwarded-For;
remove req.http.X-Forwarded-for;
remove req.http.X-Forwarded-Host;
remove req.http.X-Forwarded-Server;
set    req.http.X-Forwarded-for     = "127.0.0.1";
set req.grace = 30s;
lookup;
}

sub vcl_fetch {
set obj.http.X-Varnish-Url = req.url;
// set a 1 day ttl for avatars
set obj.ttl = 1d;
set obj.grace = 30s;

if (!obj.cacheable) {
pass;
}

set obj.prefetch =  -30s;
deliver;
}

That’s it.  Simple and it works.
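A quick way to check that a response really came through Varnish is to look for the X-Varnish-Url header that vcl_fetch sets above. The snippet below is only a sketch: header_value is a hypothetical helper, and www.example.com stands in for your own host.

```shell
#!/bin/bash
# header_value NAME: read HTTP response headers on stdin and print the
# value of the first header called NAME (case-insensitive).
# Hypothetical helper, not part of the original configuration.
header_value() {
  awk -v name="$1" '
    {
      sep = index($0, ":")
      if (sep && tolower(substr($0, 1, sep - 1)) == tolower(name)) {
        value = substr($0, sep + 1)
        sub(/^[ \t]+/, "", value)   # trim leading whitespace
        sub(/\r$/, "", value)       # trim the trailing carriage return
        print value
        exit
      }
    }'
}

# Example (placeholder host):
#   curl -sI http://www.example.com/photos/1.jpg | header_value X-Varnish-Url
```

If the rewrite in vcl_recv is working, the value should be the request path with /bucket-name prepended.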

Sold on Jungle Disk

Saturday, March 22nd, 2008

Admittedly, it was an easy sell. I’ve been using Amazon’s S3 service on the job for the last 8 months to store db backups, so I’m familiar with the pros, cons, and costs. I had looked at Jungle Disk as a possible solution but disregarded it since it did not support Linux. Mike posted his thoughts about it and pointed out that they are now giving some love to The Penguin.

After a quick spin on the free trial client I went ahead and signed up for the Plus service which allows you to browse your files online. Yes, you could use the S3 Firefox plugin but given the way that Jungle Disk writes folders as files it makes for some ugly viewing and the same goes for the S3Sync tools. Anyways, I look at the $12/year as a donation to keep the company afloat and developing.

To give people an idea of the cost: I’m going to start by backing up my photos (33GB) and the last three years of eMusic (50GB). The first transfer will cost about $30, and after that storage will cost about $12.50 a month. This data grows at about 2GB a month, which will tack on less than a dollar extra a month. Not too bad of a proposition, though I do see the potential for climbing up to around $35/month. Still, the cost is worth the peace of mind it brings.
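The monthly figures can be roughed out with a little arithmetic. The sketch below assumes a storage rate of $0.15 per GB per month. That rate is inferred from the numbers in this post (about $12.50 for 83GB), not taken from a price list, and monthly_cost is a hypothetical helper.

```shell
#!/bin/bash
# Assumed storage rate in dollars per GB-month, inferred from the post's
# own numbers; not an official price.
RATE_PER_GB=0.15

# monthly_cost GB: print the monthly storage cost for GB gigabytes.
monthly_cost() {
  awk -v gb="$1" -v rate="$RATE_PER_GB" 'BEGIN { printf "%.2f\n", gb * rate }'
}

monthly_cost 83   # photos (33GB) + eMusic (50GB)
monthly_cost 2    # one month of growth
```

Under that assumption the two calls print 12.45 and 0.30, and a $35 monthly bill corresponds to roughly 233GB stored.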

EC2, MySQL, Backup Recovery, and You! (redux)

Thursday, December 27th, 2007

Here we go again…

On the heels of the replication monitor, I’ve gone back and fine-tuned the fetch script to let you look back two days in the archives. Now, it is a bit janky because I am setting the days for the first array rather than parsing the actual buckets in S3 but my sed/awk skills are less than none. However, I suppose that the next version could be set up to ask how many days you want to look back easily enough.

#!/bin/bash
# set the environment
export AWS_ACCESS_KEY_ID=xxxyyyzzz
export AWS_SECRET_ACCESS_KEY=xxxyyyzzz
export SSL_CERT_DIR=/opt/s3sync/certs

DAYLST[0]=$(date +%j --date='2 days ago')
DAYLST[1]=$(date +%j --date='1 days ago')
DAYLST[2]=$(date +%j)

DAYNUM=${#DAYLST[@]}

echo

echo "Here are the available days for backup recovery."

echo

# echo each element in array
# for loop
for (( i=0;i<$DAYNUM;i++)); do
echo $i -  ${DAYLST[${i}]}
done

echo

echo -e "What day did you want to parse? \c"
read selectday
listday=${DAYLST[$selectday]}

echo "Ok, I'm going to get the backups from $listday."
echo

echo -e "How many did you want? \c"
read count

echo

# Get the list of backups on the server using s3cmd
dbsets=$(ruby s3cmd.rb list your_db_backups:$listday | tail -n $count)
ARRAY=($dbsets)
# get number of elements in the array
ELEMENTS=${#ARRAY[@]}

# echo each element in array
# for loop
for (( i=0;i<$ELEMENTS;i++)); do
echo $i -  ${ARRAY[${i}]:4}
done

# Prompt user for which backup they want to recover
echo

echo -e "Which backup set would you like to recover? \c"

read numbackup
backup=${ARRAY[$numbackup]:4}

echo "I am fetching your backup $backup now..."
echo

ruby s3cmd.rb get your_db_backups/$listday:$backup /tmp/$backup
cd /tmp
tar -xf $backup
sqlset=${backup:0:14}
mv $sqlset /root

echo "Your backup can be found here /root/$sqlset"
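The "ask how many days to look back" idea mentioned above could be sketched like this. build_day_list is a hypothetical helper, and it relies on the same GNU date --date syntax the script already uses.

```shell
#!/bin/bash
# build_day_list N: print the day-of-year for each of the last N days
# plus today, oldest first, one per line.
# Hypothetical helper; not part of the original script.
build_day_list() {
  local lookback=$1
  local i
  for (( i = lookback; i >= 0; i-- )); do
    date +%j --date="$i days ago"
  done
}

# Looking back two days fills the array with the same three entries that
# the script sets by hand.
DAYLST=( $(build_day_list 2) )
DAYNUM=${#DAYLST[@]}
```

The rest of the script could then stay as it is, since it only reads DAYLST and DAYNUM.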

Still on the agenda is getting a slave to recover unassisted after a failure is detected but as my shell scripting abilities improve the possibility of it being realized grows.

Recovering Encrypted MySQL Backups from S3

Thursday, November 1st, 2007

So, like I promised, here’s the script I banged together to allow easy recovery of your MySQL backup sets on S3. At the moment, it only does the current day, so if it is just after midnight, well, you won’t see any backups! I plan on updating it to allow the user to choose today or yesterday and then build the list from that selection.

#!/bin/bash
# This script will list the most recent backups based on a number prompted by the user
# decrypt and expand them into a temp directory.
# set date variables

cd /opt/s3sync

DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo -e "How many backups would you like to list? \c"
read count
echo
# Get the list of backups on the server using s3cmd
dbsets=$(ruby s3cmd.rb list YOURDB_db_backups:$DAYNOW | tail -n $count)
ARRAY=($dbsets)
# get number of elements in the array
ELEMENTS=${#ARRAY[@]}

# echo each element in array
# for loop
for (( i=0;i<$ELEMENTS;i++)); do
echo $i - ${ARRAY[${i}]:4}
done

# Prompt user for which backup they want to recover
echo
echo -e "Which backup set would you like to recover? \c"
read numbackup
backup=${ARRAY[$numbackup]:4}
tarset=${backup:0:31}
sqlset=${tarset:0:19}

echo "I am fetching your backup $backup now..."

ruby s3cmd.rb get YOURDB_db_backups/$DAYNOW:$backup /mnt/tmp/recovery/$backup

echo
echo "I'm going to decrypt your backup..."

cd /mnt/tmp/recovery

gpg -d $tarset > $sqlset

echo
echo "Cleaning up after myself..."
rm *.gz*
echo
echo "Your backup can be found here /mnt/tmp/recovery/$sqlset"
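The fixed offsets in the script (${backup:0:31} and ${tarset:0:19}) only line up when the database name has a particular length. A less brittle sketch strips the suffixes instead. It assumes the backups are named YOURDB-<day>-<time>.sql.tar.gz.gpg, as in the backup script from the earlier post, and sql_name_for is a hypothetical helper.

```shell
#!/bin/bash
# sql_name_for FILE: strip the .gpg and .tar.gz suffixes from a backup
# file name. Hypothetical helper; the naming scheme is an assumption.
sql_name_for() {
  local name=$1
  name=${name%.gpg}      # drop the encryption suffix, if present
  name=${name%.tar.gz}   # drop the archive suffix, if present
  printf '%s\n' "$name"
}

sql_name_for "YOURDB-301-1530.sql.tar.gz.gpg"
```

For the example name this prints YOURDB-301-1530.sql, whatever the length of the database name.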

Next up is a script that easily allows you to chuck files or directories up onto S3 from your EC2 instance or from your local machine.

EC2, S3, Encrypted MySQL Backups, and You!

Tuesday, October 30th, 2007

With great trepidation I write this as my last attempt earlier in the day saw the utter meltdown of this blog…

The topic of what we are doing to secure user data is one that comes up often and it is completely understandable, so this past week I’ve decided to add an extra layer of security into our database backups by encrypting them. It is a fairly simple process that while still being a work in progress works pretty well.

To get things started I generated a key pair on the server and imported my personal key, so the backups are encrypted to both keys and I can open them either on the server or on my laptop. Further down the road I’ll be collecting the keys of the development team and importing them so that they can decrypt locally as well.

Now, I’m a bit wet behind the ears when it comes to shell scripting and while I already had a backup script written I wasn’t really happy with how it performed. I’ve made some tweaks to this one that allowed me to drop the nightly “Create Bucket” procedure as well as gathered the backups into a more logical folder/sub-folder layout.

Here’s the backup script…

#! /bin/bash

# Hourly cron job to upload to current bucket
# This is built off what we are currently running

# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

# dump database
mysqldump YOURDB > /mnt/tmp/backup/YOURDB-$DAYNOW-$TIMENOW.sql

# tar SQL dump
cd /mnt/tmp/backup

tar -chf - YOURDB-$DAYNOW-$TIMENOW.sql | gzip - | \
gpg -r [remote-key-holder] -r [local-key-holder] --encrypt \
> YOURDB-$DAYNOW-$TIMENOW.sql.tar.gz.gpg

rm /mnt/tmp/backup/*.sql

# copy tar to S3
cd /opt/s3sync
ruby s3sync.rb -vr --ssl /mnt/tmp/backup/ YOURDB_db_backups:$DAYNOW

#clean up
rm /mnt/tmp/backup/*.gz*
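One thing to watch in the script above is that the SQL dump is removed whether or not the tar/gzip/gpg pipeline succeeded. A possible guard, sketched here without the gpg step so that it runs anywhere, is to delete the source only when every stage of the pipeline exits cleanly. archive_then_remove is a hypothetical helper, not part of the original script.

```shell
#!/bin/bash
# archive_then_remove SRC DEST: tar and gzip SRC into DEST, and remove SRC
# only if the whole pipeline succeeded. Hypothetical helper; the gpg stage
# from the real script is left out of this sketch.
archive_then_remove() {
  local src=$1 dest=$2
  # pipefail makes the pipeline fail if any stage fails, not just the last.
  if ( set -o pipefail; tar -chf - "$src" | gzip - > "$dest" ); then
    rm -- "$src"
  else
    echo "archiving $src failed; keeping it" >&2
    rm -f -- "$dest"   # do not leave a partial archive behind
    return 1
  fi
}
```

In the real script the gpg command would sit at the end of the same pipeline, so a failed encryption would also leave the dump in place.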

And here is the fetch script, which will download the backup, decrypt it, and untar it. I am still working on a version that lists the last X backups (as chosen by the user), dumps them into an array, and then prompts the user to choose one. At the moment, the user needs to know the day of the year and the 24-hour time, sans colon, of the backup, so running the script is as simple as ./get_db_backup.sh 301 1530.

#! /bin/bash

# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo "Fetching your backup now..."

ruby s3cmd.rb get YOURDB_db_backups/$1:YOURDB-$1-$2.sql.tar.gz.gpg \
/mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg

echo "I'm going to decrypt your backup but will need a passcode..."

gpg -d /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg \
> /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz

echo "Extracting your backup into /mnt/tmp/recovery..."

cd /mnt/tmp/recovery
tar -xf YOURDB-$1-$2.sql.tar.gz

echo "Cleaning up after myself..."
rm *.tar.gz*

echo "Your file is here: /mnt/tmp/recovery/YOURDB-$1-$2.sql"
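If you know when a backup was taken but not its day-of-year number, GNU date can work out both arguments for you. This is a small sketch: backup_args is a hypothetical helper, and the timestamp is only an example.

```shell
#!/bin/bash
# backup_args TIMESTAMP: print the day of the year and the 24-hour time
# (no colon) for TIMESTAMP, in the order the fetch script expects.
# Hypothetical helper; relies on GNU date's --date option.
backup_args() {
  date --date="$1" +'%j %H%M'
}

backup_args '2007-10-28 15:30'
```

This prints 301 1530, the two values used in the example invocation of get_db_backup.sh.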

Lastly, the “Delete Bucket” script which now thankfully works as advertised.

#! /bin/bash

# Daily cron job to delete old bucket
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

DAYTHEN=$(date +%j --date='2 days ago')
cd /opt/s3sync
ruby s3cmd.rb -v deleteall YOURDB_db_backups:$DAYTHEN

Since all this is a work in progress I’d love to hear how other people are leveraging S3 for their database backups and if there is an easier way to accomplish what I’m attempting. :-D

Amazon S3, S3Sync, Backups, and You!

Friday, June 22nd, 2007

With EC2 and S3, Amazon has made available some seriously powerful and flexible tools for server and file hosting. EC2 allows you to roll whatever flavor of OS you want and get it up and running (think of it as virtual dedicated hosting), which, while incredibly cool, has one major downside: if your server instance fails you lose your data, as it will revert to the most current instance when you get it back online. In other words, a hiccup on EC2 could turn into a blistering nightmare.

This is where S3 comes in handy for storing anything that might change after you create an instance and launch it, such as databases, files uploaded or created, and even logs. One of the first projects I hopped into was dumping the MySQL databases, compressing them, and tossing them up on S3, a tedious process that can be automated with cron and s3sync (a Ruby-based tool that approximates rsync, kinda sorta).

Daily cron job to create new bucket

cd /root/s3sync/
DAYNOW=$(date +%j)
ruby s3cmd.rb createbucket WTF_db_$DAYNOW

Daily cron job to delete old bucket

cd /root/s3sync/
DAYTHEN=$(date +%j --date='2 days ago')
ruby s3cmd.rb deletebucket WTF_db_$DAYTHEN

Hourly job to back up the database
# get into the directory
cd /root/s3sync/


# set the environment
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX/XXXX
export SSL_CERT_DIR=/root/s3sync/certs


# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)

# dump database
mysqldump WTF > /tmp/WTF-$DAYNOW-$TIMENOW.sql

# tar SQL dump
tar -czvf /tmp/backups/WTF-$DAYNOW-$TIMENOW.tar.gz /tmp/WTF-$DAYNOW-$TIMENOW.sql

# copy tar to S3
ruby s3sync.rb -r --ssl /tmp/backups/ MahBukit:WTF_db_$DAYNOW

# clean up after yourself
rm /tmp/*.sql
rm /tmp/backups/*.gz

Now one of the frustrations I have with this set up is while I am dropping the buckets from 48 hours prior it isn’t actually deleting the files off S3, a bit of a pain in the ass that I need to find some sort of resolution for in the near future. If anyone has an answer to this problem I would love to hear it! For now, I’m cheating by using S3Fox to purge those pesky backups.