Tag Archive for 'AWS'

nginx + HAProxy + Thin + FastCGI + PHP5 = Load Balanced Rails with PHP Support

This was probably one of the more radical switches in architecture that we’ve made in the recent past.  For the past 7 months we have been successfully running Apache + mod_proxy + mongrel with some limited PHP applications bolted on but the whole set up felt a tad bloated and a little more than unstable as we tested various scaling scenarios.  With the rails community chatting about the hotness that is thin, nginx, and HAProxy we decided to see what it would take to migrate.

The catch with our infrastructure though is that we have broken apart our static assets from rails so the usual localhost simplicity isn’t there which, unfortunately, is how most of the tutorials are aimed at.  In our case, the application sits in a pool of servers and one of the things that we wanted to do was leverage HAProxy to balance each nginx instance over a group of primary and secondary application servers with the primary and secondary status staggered between each nginx instance. Igvita’s post was the inspiration for this and our goal is to create a more fault tolerant environment built on shared services rather than our current setup of largely discrete stacks.

The first thing I tackled was setting up nginx by breaking apart the rails application and any PHP applications into separate virtual hosts. First up is the rails config…

upstream thin {
server 127.0.0.1:8700;
}

server {
listen       80;
server_name  first.server.name;
rewrite ^/(.*) https://what.ever.you.want/$1 permanent;
}

server {
listen 443;
ssl on;
ssl_session_timeout  5m;
ssl_protocols  SSLv2 SSLv3 TLSv1;
ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
ssl_prefer_server_ciphers   on;

# path to your certificate
# if you have an intermediate cert then you need to add the contents to the end of the cert file
ssl_certificate /where/your/cert/is.pem;

# path to your ssl key
ssl_certificate_key /where/your/key/is.key;

# standard rails configuration goes here.
root /location/of/your/site/root;

#        rewrite_log on;

if (-f $document_root/system/maintenance.html) {
rewrite  ^(.*)$  /system/maintenance.html last;
break;
}

location ~ ^/$ {
if (-f /index.html){
rewrite (.*) /index.html last;
}
proxy_pass  http://thin;
}

location / {
if (!-f $request_filename.html) {
proxy_pass  http://thin;
}
rewrite (.*) $1.html last;
}

location ~ .html {
root /location/of/your/site/root;
}

location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|txt|js|mov)$ {
root  /location/of/your/site/root;
}

location / {
proxy_pass  http://thin;
proxy_redirect     off;
proxy_set_header   Host             $host;
proxy_set_header   X-Real-IP        $remote_addr;
proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
proxy_set_header X-FORWARDED_PROTO https;
}
}

And our PHP config…

server {
### PHP Support ###
listen       80;
server_name  second.server.name;
access_log  /location/of/your/site/root/logs/blog-access.log;
error_log  /location/of/your/site/root/logs/blog-error.log;

if (!-e $request_filename) {
rewrite ^([_0-9a-zA-Z-]+)?(/wp-.*) $2 last;
rewrite ^([_0-9a-zA-Z-]+)?(/.*\.php)$ $2 last;
rewrite ^ /index.php last;
}

location / {
root / /location/of/your/site/root;
index index.html index.php index.htm;
}

location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|txt|js|mov)$ {
root /location/of/your/site/root;
}

# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000

location ~ \.php$ {
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
fastcgi_param SCRIPT_FILENAME  /location/of/your/site/root/$fastcgi_script_name;
fastcgi_param REQUEST_URI $request_uri;
fastcgi_param DOCUMENT_URI $document_uri;
fastcgi_param DOCUMENT_ROOT $document_root;
fastcgi_param SERVER_PROTOCOL $server_protocol;
fastcgi_param GATEWAY_INTERFACE CGI/1.1;
fastcgi_param SERVER_SOFTWARE nginx;
fastcgi_param REMOTE_ADDR $remote_addr;
fastcgi_param REMOTE_PORT $remote_port;
fastcgi_param SERVER_ADDR $server_addr;
fastcgi_param SERVER_PORT $server_port;
fastcgi_param SERVER_NAME $server_name;
}
}

Next up is the HAProxy configuration…

global
	log 127.0.0.1	local0
	log 127.0.0.1	local1 notice
	nbproc		1
	pidfile		/var/run/haproxy.pid
	#debug
	#quiet
	user haproxy
	group haproxy

defaults
	log		global
	mode		http
	option		httplog
	option		dontlognull
	retries		15
	redispatch
	contimeout	60000
	clitimeout	150000
	srvtimeout	60000
	option          httpclose     # disable keepalive (HAProxy does not yet support the HTTP keep-alive mode)
	option          abortonclose  # enable early dropping of aborted requests from pending queue
	option          httpchk       # enable HTTP protocol to check on servers health

listen	thin *:8700
	option httpchk
        mode http
        option forwardfor except 127.0.0.1/8
	balance roundrobin
        server web01 hostname-of-server:8100 weight 1 minconn 1 maxconn 6 check inter 40000
        etc....

There are a couple of things to note here: to get HAProxy to fetch content from servers other than localhost you’ll need to chuck in a wildcard: listen thin *:8700, and to get logging running you’ll need to edit /etc/syslog.conf adding the following lines:

# Save HA-Proxy logs
	local0.*                                                /var/log/haproxy_0.log
	local1.*                                                /var/log/haproxy_1.log

As well as edit /etc/default/syslogd:

# For remote UDP logging use SYSLOGD="-r"
SYSLOGD="-r"

One last thing that drove me almost to the brink of madness is that HAProxy, at least in the build on Ubuntu 8.04, is finicky about how the configuration file is laid out. Each section default, global, and listen has to have the parameters defined with a tab preceding each and while HAProxy would start and accept request from nginx with anything else it would not fetch from the thin server pool.

So that is our front-end, what about the application pool? Turns out that Thin is just as easy to set up as a mongrel cluster and only took a minimum of effort on our part to get it dialed in with God and serving upstream. We edited the stock init script to reflect where we store the yamls and massaged God for the changes in clustering.

Here’s our init script:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          thin
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      S 0 1 6
# Short-Description: thin initscript
# Description:       thin
### END INIT INFO

# Original author: Forrest Robertson

# Do NOT "set -e"

DAEMON=/usr/bin/thin
SCRIPT_NAME=/etc/init.d/thin
CONFIG_PATH=/location/of/your/yamls

# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

case "$1" in
  start)
	$DAEMON start --all $CONFIG_PATH
	;;
  stop)
	$DAEMON stop --all $CONFIG_PATH
	;;
  restart)
	$DAEMON restart --all $CONFIG_PATH
	;;
  *)
	echo "Usage: $SCRIPT_NAME {start|stop|restart}" >&2
	exit 3
	;;
esac

:

And here’s a sample yaml:

---
user: user-which-runs
group: group-which-runs
chdir: /location/of/your/app
log: log/thin.log
port: 8100
environment: staging
pid: /location/of/your/pids.pid
servers: 3

God is very similar to what we had been running with a mongrel cluster:

RAILS_ROOT = "/location/of/your/app"

%w{8100 8101 8102}.each do |port|
 God.watch do |w|
    w.group = 'pack_01'
    w.name = "thin-#{port}"
    w.interval = 30.seconds # default
    w.start = "thin start -C /location/of/your.yaml -o #{port}"
    w.stop = "thin stop -C /location/of/your.yaml -o #{port}"
    w.restart = "thin stop -C/location/of/your.yaml -o #{port} && thin start -C /location/of/your.yaml -o #{port}"
    w.start_grace = 15.seconds
    w.restart_grace = 15.seconds
    w.pid_file = "/location/of/your/pids.#{port}.pid"

    w.behavior(:clean_pid_file)

    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.interval = 5.seconds
        c.running = false
      end
    end

    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 150.megabytes
        c.times = [3, 5] # 3 out of 5 intervals
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 50.percent
        c.times = 5
      end
    end

    # lifecycle
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minute
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
  end
end

There you have it, a completely rebuilt stack leveraging lean, fast, and stable services.

Gratefully cribbed from HowtoForgeJohn Yerhot, and  Igvita.

Evolving Services on EC2

One of the great things about EC2 is that it is essentially a giant sandbox where you can take risks experimenting with architecture and services in a rapid and cost effective manner, something that you cannot do really well at co-lo or even on other VPS services.  In the past year we have experimented with plenty of different configurations: some found their way into production, others filed for future reference, and still some to be avoid all costs.

May 2007

When I came on board as a contractor we had only 2 servers inside EC2 and the database hosted at Go Daddy. The company had just migrated the Apache/Application server along with a Harvest server into EC2 but had opted to leave the database hosted at Go Daddy due to fears of data loss.  The only trouble with this scheme was the latency between the application and the db which made things so glacially slow that the site nearly unusable.

August 2007

After starting full-time we brought the database into the cloud and started looking into how we might implement a MySQL cluster in EC2.  The challenge was to get a backup routine that was unobtrusive yet fast and easy to transfer into S3. I never got LVM snapshots working to my comfort level so we relied instead on MySQLdump, which, all and all worked fine while the db was small. Data loss was still a big concern for us so we began experimenting in earnest with MySQL clusters.

November 2007

When the MySQL cluster idea didn’t pan out, the theory is that the small instances just didn’t have what it takes to cluster. So we went with plain old replication which has proven to be stable and reliable. The slaves serve both as fail-over units but also perform periodic backups freeing the master from that task.  Feeling more comfortable with database integrity we turned our attention to getting our application to scale, a challenge with resource hungry Rails.

January 2008

Breaking apart Apache and rails was a snap with mod_proxy and it allowed us to dedicate hardware to each.  With things running even better we started thinking about how we can flip this into a more through horizontal scale.

May 2008

So one year later and we have brought some horizontal scale to the site adding stability and failover to the application. As the site grows, though, we are back to the how we can best scale the database but at least we have a sandbox to play in so we can figure it out.

EC2, MySQL Replication, Monitoring, and You!

So in a full turn of events I’ve gone back to replication as the the most cost effective solution for creating a high availability environment for MySQL in EC2. The problem of the development team issuing schema changes frequently and without notification hasn’t changed but I have gotten a little more sophisticated about how to deal with them kicking the slave in the teeth when they issue schema changes with impunity.

What I’ve done is build off the backup scripts I have written about prior–especially since they work so well and created the beginnings of a metascript to over see the slaves–it is aptly named slaver. This metascript checks the state of the slave and acts based on it is state: slave up, run backups, or slave down, issue notifications.

#!/bin/bash
### Slaver v0.0.1
### this script is intended to check on the status of the slave
### if the slave is down (IO or SQL) it will send an email out

### set the variables that we are checking for ###
slaver1=$(mysql -Bse ’show slave status \G;’ | grep Slave_IO_Running)
slaver2=$(mysql -Bse ’show slave status \G;’ | grep Slave_SQL_Running)
IO=${slaver1:29}
SQL=${slaver2:29}
COMBO=$IO-$SQL
count=$(mysql -Bse ’show slave status \G;’| grep -c Yes)

### this is sanity check testing stuff ###

#count=1
#echo $IO
#echo $SQL
#echo $COMBO
#echo $count
### this is sanity check testing stuff ###

### run the exception check ###
if [[ "$count" == "2" ]] ; then
/opt/s3sync/db_backup.sh
exit
else
### create status file and mail it ###
date >> status.txt
mysql -Bse ’show slave status \G;’ >> status.txt
mutt you@yourhome.com -s “Slave Status :: DOWN” < status.txt
rm status.txt
fi

The next pieces to build out will be freezing the automated deletion of the old backup sets and attempting recovery of the slave if it is down. To get started on the latter I made some changes to the backup routine on the master:

#! /bin/bash

# Hourly cron job to upload to current bucket
# This is built off what we are currently running

# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=xxxyyyzzz
export AWS_SECRET_ACCESS_KEY=xxxyyyzzz
export SSL_CERT_DIR=/opt/s3sync/certs

sleep 1m

# grab info about the binlog and position of the database

status1=$(mysql -e ’show master status \G’ | grep mysql)
status2=$(mysql -e ’show master status \G’ | grep Position)
sql=${status1:18}
posit=${status2:18}

mkdir /mnt/tmp/backup/you-$DAYNOW-$TIMENOW

echo A.$sql >> \
/mnt/tmp/backup/you-$DAYNOW-$TIMENOW/master-$DAYNOW-$TIMENOW.txt
echo B.$posit >> \
/mnt/tmp/backup/you-$DAYNOW-$TIMENOW/master-$DAYNOW-$TIMENOW.txt

# dump database
mysqldump geezeo > \
/mnt/tmp/backup/you-$DAYNOW-$TIMENOW/you-$DAYNOW-$TIMENOW.sql

# tar SQL dump
cd /mnt/tmp/backup

tar -chf - you-$DAYNOW-$TIMENOW | gzip - > you-$DAYNOW-$TIMENOW.tar.gz

rm -r /mnt/tmp/backup/you-$DAYNOW-$TIMENOW/

# copy tar to S3
cd /opt/s3sync
ruby s3sync.rb -vr –ssl /mnt/tmp/backup/ you_db_backups:$DAYNOW

#clean up
rm /mnt/tmp/backup/*.gz*

The key piece here is the capturing of binlog number and position with those two pieces captured it becomes much easier to automate a recovery of the slave from the master’s backup.

More to follow…

Recovering Encrypted MySQL Backups from S3

So like I promised here’s the script I banged together to allow easy recovery of your MySQL backup sets on S3. At the moment, it only does the current day so if it is just after midnight, well, you won’t see any backups! I plan on updateing it to allow the user to choose today or yesterday and then build the list from that selection.

#/bin/bash
# This script will list the most recent backups based on a number prompted by the user
# decrypt and expand them into a temp directory.
# set date variables

cd /opt/s3sync

DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo -e "How many backups would you like to list? \c"
read count
echo
# Get the list of backups on the server using s3cmd
dbsets=$(ruby s3cmd.rb list YOURDB_db_backups:$DAYNOW | tail -n $count)
ARRAY=($dbsets)
# get number of elements in the array
ELEMENTS=${#ARRAY[@]}

# echo each element in array
# for loop
for (( i=0;i<$ELEMENTS;i++)); do
echo $i - ${ARRAY[${i}]:4}
done

# Prompt user for which backup they want to recover
echo
echo -e "Which backup set would you like to recover? \c"
read numbackup
backup=${ARRAY[$numbackup]:4}
tarset=${backup:0:31}
sqlset=${tarset:0:19}

echo "I am fetching your backup $backup now..."

ruby s3cmd.rb get YOURDB_db_backups/$DAYNOW:$backup /mnt/tmp/recovery/$backup

echo
echo "I'm going to decrypt your backup..."

cd /mnt/tmp/recovery

gpg -d $tarset > $sqlset

echo
echo "Cleaning up after myself..."
rm *.gz*
echo
echo "Your backup can be found here /mnt/tmp/recovery/$sqlset"

Next up is a script that easily allows you to chuck files or directories up onto S3 from your EC2 instance or from your local machine.

EC2, S3, Encrypted MySQL Backups, and You!

With great trepidation I write this as my last attempt earlier in the day saw the utter meltdown of this blog…

The topic of what we are doing to secure user data is one that comes up often and it is completely understandable, so this past week I’ve decided to add an extra layer of security into our database backups by encrypting them. It is a fairly simple process that while still being a work in progress works pretty well.

To get things started I generated a key-pair both on the server and imported my personal key so that I can encrypt the backups so I can open them either on the server or on my laptop. Further down the road I’ll be collecting the keys of the development team and importing them so that they can decrypt locally as well.

Now, I’m a bit wet behind the ears when it comes to shell scripting and while I already had a backup script written I wasn’t really happy with how it performed. I’ve made some tweaks to this one that allowed me to drop the nightly “Create Bucket” procedure as well as gathered the backups into a more logical folder/sub-folder layout.

Here’s the backup script…

#! /bin/bash

# Hourly cron job to upload to current bucket
# This is built off what we are currently running

# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

# dump database
mysqldump YOURDB > /mnt/tmp/backup/YOURDB-$DAYNOW-$TIMENOW.sql

# tar SQL dump
cd /mnt/tmp/backup

tar -chf - YOURDB-$DAYNOW-$TIMENOW.sql | gzip - | \
gpg -r [remote-key-holder] -r [local-key-holder] –encrypt \
> YOURDB-$DAYNOW-$TIMENOW.sql.tar.gz.gpg

rm /mnt/tmp/backup/*.sql

# copy tar to S3
cd /opt/s3sync
ruby s3sync.rb -vr –ssl /mnt/tmp/backup/ YOURDB_db_backups:$DAYNOW

#clean up
rm /mnt/tmp/backup/*.gz*

And the fetch script which will download the backup, decrypt it, and untar it. Now, this script I am working on listing the last X number of backups as determined by the user, dumping them into an array, and then prompting the user to choose which one they want. At the moment, the user need to know the number day of the year and the military time sans colon of the backup. But for the moment running the script is as simple as ./get_db_backup.sh 301 1530.

#! /bin/bash

# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo “Fetching your backup now…”

ruby s3cmd.rb get YOURDB_db_backups/$1:YOURDB-$1-$2.sql.tar.gz.gpg \
/mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg

echo “I’m going to decrypt your backup but will need a passcode…”

gpg -d /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg \
> /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz

echo “Extracting your backup into /mnt/tmp/recovery…”

cd /mnt/tmp/recovery
tar -xf YOURDB-$1-$2.sql.tar.gz

echo “Cleaning up after myself…”
rm *.tar.gz*

echo “Your file is here: /mnt/tmp/recovery/YOURDB-$1-$2.sql”

Lastly, the “Delete Bucket” script which now thankfully works as advertised.

#! /bin/bash

# Daily cron job to delete old bucket
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

DAYTHEN=$(date +%j –date=’2 days ago’)
cd /opt/s3sync
ruby s3cmd.rb -v deleteall YOURDB_db_backups:$DAYTHEN

Since all this is a work in progress I’d love to hear how other people are leveraging S3 for their database backups and if there is an easier way to accomplish what I’m attempting. :-D

EC2: Pound + Apache, Mongrel Cluster, and MySQL Cluster

Alternately, I should be titling this my 36 hour nightmare. Last week, high off the presentation, I built out and deployed the following configuration.

EC2 Cluster

Everything was nice and tight and after loading QA data it ran like a champ but the problem was that QA data was pretty thin being only a fraction of the size of the production data. When we loaded production data into it, which by the way took nearly an hour to import,performance in the Cluster ground to a halt and we were faced with MySQL timing out the mongrels. Needless to say that after another 36 hours of work we abandoned this model and are looking at plain old replication for our data backed.

What could have given us all that grief? A couple of things spring to mind. The instances have 1.7GB of RAM and a single core process which for now works like a champ for a single MySQL server but for whatever reason it is not enough for a cluster under load. Also, running both SQL and Data Node services on the same box was likely less than inspired as the SQL service would spin up chewing into the remaining RAM and would often dominate the CPU. However, when we launch the cluster we were running some grossly inefficient queries with little or no indexing in the tables. A huge issue.

So we pulled back. At the moment we are still running the three legged system (one instance running Pound, Apache, Monit, and Mongrels, one Harvester, and one MySQL instance) but we made significant changes to the DB so that all the bloated joins that Ruby likes to make are hitting indexed tables as well as tweaking my.cnf to boost key buffer to 30% of RAM. Things seem better and we bought ourselves a little breathing room but we are still hitting the limit of the number of mongrels we can run on a single instance, 10 seems to be the upper threshold for stability, so we need to work out a method for building out a replicated set that will auto-recover after the countless data migrations that the dev team performs.  That will be fun!





Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States
Creative Commons Attribution-NonCommercial-ShareAlike 3.0 United States