Archive for the ‘Programming’ Category

nginx + HAProxy + Thin + FastCGI + PHP5 = Load Balanced Rails with PHP Support

Tuesday, July 15th, 2008

This was probably one of the more radical switches in architecture that we’ve made in the recent past.  For the past 7 months we have been successfully running Apache + mod_proxy + mongrel with some limited PHP applications bolted on, but the whole setup felt a tad bloated and more than a little unstable as we tested various scaling scenarios.  With the Rails community chatting about the hotness that is thin, nginx, and HAProxy, we decided to see what it would take to migrate.

The catch with our infrastructure, though, is that we have broken apart our static assets from Rails, so the usual localhost simplicity isn’t there, which, unfortunately, is what most of the tutorials assume.  In our case, the application sits in a pool of servers, and one of the things we wanted to do was leverage HAProxy to balance each nginx instance over a group of primary and secondary application servers, with the primary and secondary status staggered between each nginx instance. Igvita’s post was the inspiration for this, and our goal is to create a more fault-tolerant environment built on shared services rather than our current setup of largely discrete stacks.
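To make the staggering a bit more concrete, here is a minimal shell sketch that prints HAProxy server lines for one front end. The function name pool_for_frontend, the host names, and the even/odd split are assumptions made up for illustration, not what we actually run. The only HAProxy feature it leans on is the backup keyword, which keeps a server out of rotation until the non-backup servers are unavailable.

```shell
# Hypothetical helper: print HAProxy "server" lines for one front end.
# Front end 0 treats the 1st, 3rd, ... app servers as primary and the rest
# as backup; front end 1 flips that, so primaries are staggered.
pool_for_frontend() {
  local frontend=$1
  shift
  local i=0 host
  for host in "$@"; do
    if [ $(( (i + frontend) % 2 )) -eq 0 ]; then
      printf '\tserver web%02d %s:8100 check\n' "$((i + 1))" "$host"
    else
      printf '\tserver web%02d %s:8100 check backup\n' "$((i + 1))" "$host"
    fi
    i=$((i + 1))
  done
}
```

Running pool_for_frontend 0 app-a app-b on the first front end and pool_for_frontend 1 app-a app-b on the second gives each one a different primary while both still know about the whole pool.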

The first thing I tackled was setting up nginx by breaking apart the rails application and any PHP applications into separate virtual hosts. First up is the rails config…

upstream thin {
server 127.0.0.1:8700;
}

server {
listen       80;
server_name  first.server.name;
rewrite ^/(.*) https://what.ever.you.want/$1 permanent;
}

server {
listen 443;
ssl on;
ssl_session_timeout  5m;
ssl_protocols  SSLv2 SSLv3 TLSv1;
ssl_ciphers  ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP;
ssl_prefer_server_ciphers   on;

# path to your certificate
# if you have an intermediate cert then you need to add the contents to the end of the cert file
ssl_certificate /where/your/cert/is.pem;

# path to your ssl key
ssl_certificate_key /where/your/key/is.key;

# standard rails configuration goes here.
root /location/of/your/site/root;

#        rewrite_log on;

if (-f $document_root/system/maintenance.html) {
rewrite  ^(.*)$  /system/maintenance.html last;
break;
}

location ~ ^/$ {
# serve a static index page if one exists, otherwise hand off to thin
if (-f $document_root/index.html) {
rewrite (.*) /index.html last;
}
proxy_pass  http://thin;
}

location ~ \.html$ {
root /location/of/your/site/root;
}

location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|txt|js|mov)$ {
root  /location/of/your/site/root;
}

# a single catch-all: nginx refuses to load a config with two "location /" blocks
location / {
proxy_redirect     off;
proxy_set_header   Host             $host;
proxy_set_header   X-Real-IP        $remote_addr;
proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;
proxy_set_header   X-Forwarded-Proto https;

# serve a page-cached .html copy when one exists
if (-f $request_filename.html) {
rewrite (.*) $1.html break;
}
# anything else that is not a file on disk goes to thin
if (!-f $request_filename) {
proxy_pass  http://thin;
break;
}
}
}

And our PHP config…

server {
### PHP Support ###
listen       80;
server_name  second.server.name;
access_log  /location/of/your/site/root/logs/blog-access.log;
error_log  /location/of/your/site/root/logs/blog-error.log;

if (!-e $request_filename) {
rewrite ^([_0-9a-zA-Z-]+)?(/wp-.*) $2 last;
rewrite ^([_0-9a-zA-Z-]+)?(/.*\.php)$ $2 last;
rewrite ^ /index.php last;
}

location / {
root /location/of/your/site/root;
index index.html index.php index.htm;
}

location ~* ^.+\.(jpg|jpeg|gif|png|ico|css|zip|tgz|gz|rar|bz2|doc|xls|pdf|txt|js|mov)$ {
root /location/of/your/site/root;
}

# pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000

location ~ \.php$ {
fastcgi_pass 127.0.0.1:9000;
fastcgi_index index.php;
fastcgi_param QUERY_STRING $query_string;
fastcgi_param REQUEST_METHOD $request_method;
fastcgi_param CONTENT_TYPE $content_type;
fastcgi_param CONTENT_LENGTH $content_length;
fastcgi_param SCRIPT_FILENAME  /location/of/your/site/root/$fastcgi_script_name;
fastcgi_param REQUEST_URI $request_uri;
fastcgi_param DOCUMENT_URI $document_uri;
fastcgi_param DOCUMENT_ROOT $document_root;
fastcgi_param SERVER_PROTOCOL $server_protocol;
fastcgi_param GATEWAY_INTERFACE CGI/1.1;
fastcgi_param SERVER_SOFTWARE nginx;
fastcgi_param REMOTE_ADDR $remote_addr;
fastcgi_param REMOTE_PORT $remote_port;
fastcgi_param SERVER_ADDR $server_addr;
fastcgi_param SERVER_PORT $server_port;
fastcgi_param SERVER_NAME $server_name;
}
}

Next up is the HAProxy configuration…

global
	log 127.0.0.1	local0
	log 127.0.0.1	local1 notice
	nbproc		1
	pidfile		/var/run/haproxy.pid
	#debug
	#quiet
	user haproxy
	group haproxy

defaults
	log		global
	mode		http
	option		httplog
	option		dontlognull
	retries		15
	redispatch
	contimeout	60000
	clitimeout	150000
	srvtimeout	60000
	option          httpclose     # disable keepalive (HAProxy does not yet support the HTTP keep-alive mode)
	option          abortonclose  # enable early dropping of aborted requests from pending queue
	option          httpchk       # enable HTTP protocol to check on servers health

listen	thin *:8700
	option httpchk
	mode http
	option forwardfor except 127.0.0.1/8
	balance roundrobin
	server web01 hostname-of-server:8100 weight 1 minconn 1 maxconn 6 check inter 40000
	# etc....

There are a couple of things to note here. To get HAProxy to fetch content from servers other than localhost you’ll need to chuck in a wildcard on the listen line (listen thin *:8700), and to get logging running you’ll need to edit /etc/syslog.conf, adding the following lines:

# Save HA-Proxy logs
	local0.*                                                /var/log/haproxy_0.log
	local1.*                                                /var/log/haproxy_1.log

As well as edit /etc/default/syslogd:

# For remote UDP logging use SYSLOGD="-r"
SYSLOGD="-r"

One last thing that drove me almost to the brink of madness is that HAProxy, at least in the build on Ubuntu 8.04, is finicky about how the configuration file is laid out. Within each section (global, defaults, and listen) every parameter has to be preceded by a tab. With anything else HAProxy would still start and accept requests from nginx, but it would not fetch from the thin server pool.
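Since this one cost me so much time, here is a small sketch of a check that would have caught it. It simply lists any line in a file that starts with a space instead of a tab. The function name is made up, and where your haproxy.cfg lives depends on your install, so treat the path in the usage note as an assumption.

```shell
# Hypothetical check: print "LINENO:line" for every line that begins with a
# space. Exit status is 0 when offenders are found and 1 when the file is
# clean, which is plain grep behaviour.
space_indented_lines() {
  grep -n '^ ' "$1"
}
```

Something like space_indented_lines /etc/haproxy/haproxy.cfg before a restart shows which lines to re-indent.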

So that is our front-end, what about the application pool? Turns out that Thin is just as easy to set up as a mongrel cluster and only took a minimum of effort on our part to get it dialed in with God and serving upstream. We edited the stock init script to reflect where we store the yamls and massaged God for the changes in clustering.

Here’s our init script:

#!/bin/sh
### BEGIN INIT INFO
# Provides:          thin
# Required-Start:    $local_fs $remote_fs
# Required-Stop:     $local_fs $remote_fs
# Default-Start:     2 3 4 5
# Default-Stop:      S 0 1 6
# Short-Description: thin initscript
# Description:       thin
### END INIT INFO

# Original author: Forrest Robertson

# Do NOT "set -e"

DAEMON=/usr/bin/thin
SCRIPT_NAME=/etc/init.d/thin
CONFIG_PATH=/location/of/your/yamls

# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0

case "$1" in
  start)
	$DAEMON start --all $CONFIG_PATH
	;;
  stop)
	$DAEMON stop --all $CONFIG_PATH
	;;
  restart)
	$DAEMON restart --all $CONFIG_PATH
	;;
  *)
	echo "Usage: $SCRIPT_NAME {start|stop|restart}" >&2
	exit 3
	;;
esac

:

And here’s a sample yaml:

---
user: user-which-runs
group: group-which-runs
chdir: /location/of/your/app
log: log/thin.log
port: 8100
environment: staging
pid: /location/of/your/pids.pid
servers: 3
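A quick sketch of how port and servers relate: thin starts one process per server, counting up from the base port. The helper below (its name is just for illustration) prints the ports a given yaml implies, which is handy for keeping the HAProxy server lines and the God watches in step.

```shell
# Print the ports a thin cluster binds, given its base port and server count.
thin_ports() {
  local base=$1 servers=$2 n=0
  while [ "$n" -lt "$servers" ]; do
    echo $((base + n))
    n=$((n + 1))
  done
}
```

With the sample yaml, thin_ports 8100 3 lists the three ports the cluster will listen on.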

God is very similar to what we had been running with a mongrel cluster:

RAILS_ROOT = "/location/of/your/app"

%w{8100 8101 8102}.each do |port|
 God.watch do |w|
    w.group = 'pack_01'
    w.name = "thin-#{port}"
    w.interval = 30.seconds # default
    w.start = "thin start -C /location/of/your.yaml -o #{port}"
    w.stop = "thin stop -C /location/of/your.yaml -o #{port}"
    w.restart = "thin stop -C /location/of/your.yaml -o #{port} && thin start -C /location/of/your.yaml -o #{port}"
    w.start_grace = 15.seconds
    w.restart_grace = 15.seconds
    w.pid_file = "/location/of/your/pids.#{port}.pid"

    w.behavior(:clean_pid_file)

    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.interval = 5.seconds
        c.running = false
      end
    end

    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 150.megabytes
        c.times = [3, 5] # 3 out of 5 intervals
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 50.percent
        c.times = 5
      end
    end

    # lifecycle
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minute
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
  end
end

There you have it, a completely rebuilt stack leveraging lean, fast, and stable services.

Gratefully cribbed from HowtoForge, John Yerhot, and Igvita.

Literate Two

Friday, February 15th, 2008

My Library

Feed Your Mind

Recovering Encrypted MySQL Backups from S3

Thursday, November 1st, 2007

So, as promised, here’s the script I banged together to allow easy recovery of your MySQL backup sets on S3. At the moment it only looks at the current day, so if it is just after midnight, well, you won’t see any backups! I plan on updating it to let the user choose today or yesterday and then build the list from that selection.

#!/bin/bash
# This script lists the most recent backups (the user is prompted for how many),
# then downloads, decrypts, and expands the chosen one into a temp directory.

cd /opt/s3sync

# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo -e "How many backups would you like to list? \c"
read count
echo
# Get the list of backups on the server using s3cmd
dbsets=$(ruby s3cmd.rb list YOURDB_db_backups:$DAYNOW | tail -n $count)
ARRAY=($dbsets)
# get number of elements in the array
ELEMENTS=${#ARRAY[@]}

# echo each element in array
# for loop
for (( i=0;i<$ELEMENTS;i++)); do
echo $i - ${ARRAY[${i}]:4}
done

# Prompt user for which backup they want to recover
echo
echo -e "Which backup set would you like to recover? \c"
read numbackup
backup=${ARRAY[$numbackup]:4}
tarset=${backup:0:31}
sqlset=${tarset:0:19}

echo "I am fetching your backup $backup now..."

ruby s3cmd.rb get YOURDB_db_backups:$DAYNOW/$backup /mnt/tmp/recovery/$backup

echo
echo "I'm going to decrypt your backup..."

cd /mnt/tmp/recovery

# the decrypted stream is a gzipped tar, so unpack it to get the .sql file
gpg -d "$tarset" | tar -xzf -

echo
echo "Cleaning up after myself..."
rm *.gz*
echo
echo "Your backup can be found here /mnt/tmp/recovery/$sqlset"
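The fixed offsets in the script (:4, 0:31, 0:19) only hold while the database name and day prefix stay the same length. As a hedged alternative, here is a sketch that derives the same names by stripping the prefix and suffix instead. The function names are made up, and the "301/YOURDB-..." key layout is my assumption based on what the :4 offset skips over.

```shell
# Given a key such as "301/YOURDB-301-1530.sql.tar.gz.gpg",
# print the bare file name (everything after the last slash).
backup_file() {
  echo "${1##*/}"
}

# Print the name of the .sql file inside the archive by also
# dropping the ".tar.gz.gpg" suffix.
sql_file() {
  local file=${1##*/}
  echo "${file%.tar.gz.gpg}"
}
```

In the script above, backup=$(backup_file "${ARRAY[$numbackup]}") and sqlset=$(sql_file "$backup") would stand in for the three substring lines.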

Next up is a script that easily allows you to chuck files or directories up onto S3 from your EC2 instance or from your local machine.

EC2, S3, Encrypted MySQL Backups, and You!

Tuesday, October 30th, 2007

With great trepidation I write this as my last attempt earlier in the day saw the utter meltdown of this blog…

The topic of what we are doing to secure user data is one that comes up often, and it is completely understandable, so this past week I decided to add an extra layer of security to our database backups by encrypting them. It is a fairly simple process that, while still a work in progress, works pretty well.

To get things started I generated a key pair on the server and imported my personal key as well, so that the backups are encrypted to both and I can open them either on the server or on my laptop. Further down the road I’ll be collecting the keys of the development team and importing them so that they can decrypt locally as well.
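Once more keys are imported, the list of -r flags gets long. A rough sketch of a wrapper that builds it from a list of key IDs follows. The function name and the GPG override variable are hypothetical conveniences of mine; the only gpg options used are -r and --encrypt, the same ones the backup script uses.

```shell
# Hypothetical helper: encrypt stdin to every key ID passed as an argument.
# Setting GPG=echo prints the arguments instead of running gpg (a dry run).
encrypt_for() {
  local key
  for key in "$@"; do
    set -- "$@" -r "$key"   # append "-r KEY" to the argument list ...
    shift                   # ... and drop the bare KEY from the front
  done
  ${GPG:-gpg} "$@" --encrypt
}
```

Usage would look like encrypt_for server-key my-key < dump.tar.gz > dump.tar.gz.gpg, where both key names are placeholders.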

Now, I’m a bit wet behind the ears when it comes to shell scripting, and while I already had a backup script written I wasn’t really happy with how it performed. I’ve made some tweaks to this one that let me drop the nightly “Create Bucket” procedure and gather the backups into a more logical folder/sub-folder layout.

Here’s the backup script…

#! /bin/bash

# Hourly cron job to upload to current bucket
# This is built off what we are currently running

# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

# dump database
mysqldump YOURDB > /mnt/tmp/backup/YOURDB-$DAYNOW-$TIMENOW.sql

# tar SQL dump
cd /mnt/tmp/backup

tar -chf - YOURDB-$DAYNOW-$TIMENOW.sql | gzip - | \
gpg -r [remote-key-holder] -r [local-key-holder] --encrypt \
> YOURDB-$DAYNOW-$TIMENOW.sql.tar.gz.gpg

rm /mnt/tmp/backup/*.sql

# copy tar to S3
cd /opt/s3sync
ruby s3sync.rb -vr --ssl /mnt/tmp/backup/ YOURDB_db_backups:$DAYNOW

#clean up
rm /mnt/tmp/backup/*.gz*

And here is the fetch script, which will download the backup, decrypt it, and untar it. I am working on having this script list the last X backups (X chosen by the user), dump them into an array, and then prompt the user to choose which one they want. At the moment, the user needs to know the day of the year and the military time, sans colon, of the backup, so running the script is as simple as ./get_db_backup.sh 301 1530.

#! /bin/bash

# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

echo "Fetching your backup now..."

ruby s3cmd.rb get YOURDB_db_backups:$1/YOURDB-$1-$2.sql.tar.gz.gpg \
/mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg

echo "I'm going to decrypt your backup but will need a passcode..."

gpg -d /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz.gpg \
> /mnt/tmp/recovery/YOURDB-$1-$2.sql.tar.gz

echo "Extracting your backup into /mnt/tmp/recovery..."

cd /mnt/tmp/recovery
tar -xf YOURDB-$1-$2.sql.tar.gz

echo "Cleaning up after myself..."
rm *.tar.gz*

echo "Your file is here: /mnt/tmp/recovery/YOURDB-$1-$2.sql"
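Working out the day of the year by hand gets old quickly, so here is a small sketch that turns an ordinary date and time into the backup name the script expects. It assumes GNU date (the same one whose --date option the delete script relies on), and the function name is made up.

```shell
# Print the backup file name for a database and a "YYYY-MM-DD HH:MM" timestamp.
# %j is the day of the year, %H%M the 24-hour time without the colon.
backup_name() {
  local db=$1 when=$2
  echo "$db-$(date --date="$when" +%j-%H%M).sql.tar.gz.gpg"
}
```

For example, backup_name YOURDB '2007-10-28 15:30' names the backup that ./get_db_backup.sh 301 1530 would fetch.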

Lastly, the “Delete Bucket” script which now thankfully works as advertised.

#! /bin/bash

# Daily cron job to delete old bucket
# set the environment
export AWS_ACCESS_KEY_ID=XXXXXX
export AWS_SECRET_ACCESS_KEY=XXXXXX
export SSL_CERT_DIR=/opt/s3sync/certs

DAYTHEN=$(date +%j --date='2 days ago')
cd /opt/s3sync
ruby s3cmd.rb -v deleteall YOURDB_db_backups:$DAYTHEN
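One thing worth convincing yourself of is that the day-of-year arithmetic survives New Year, since the prefixes are just numbers from 001 to 366. A minimal sketch, again assuming GNU date and a made-up function name, that takes the reference date as an argument so it can be checked for any day:

```shell
# Print the day-of-year prefix that is two days older than the given date.
# date does the calendar math, so January 1st rolls back into late December.
day_to_delete() {
  date --date="$1 2 days ago" +%j
}
```

Calling day_to_delete 2008-01-01 lands on December 30th of the previous year rather than a negative or zero day.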

Since all this is a work in progress I’d love to hear how other people are leveraging S3 for their database backups and if there is an easier way to accomplish what I’m attempting. :-D

EC2, MySQL, Replication, and You!

Tuesday, August 7th, 2007

One week in and I’m just beginning to get my head around the EC2 way of doing things, which is different from what I am used to doing in meatspace with a pile of hardware, as well as being worlds away from my Microsoft background. So after tackling a couple of minor projects like Monit + Mongrel, and playing with Memcached, I decided to tackle a simple replication setup to get my feet wet in preparation for building a cluster.

After hammering my head against the wall for a day or so things clicked after realizing that this exercise is so much easier if I use the private DNS for plumbing rather than laying ssh tunnels all over the place, which by the way sort-of-kind-of-not-really worked.

Prologue – Create Your World

Look, it is a well-known fact that the 9 or so GB that Amazon gives you on the main partition isn’t going to cut it, so you have to make some changes to your MySQL environment. The best place for bins and logs is on /mnt. So, since I am epically lazy, I created a mysql directory in /mnt, passed ownership to mysql:mysql, backed up /var/lib/mysql, replaced it with a symbolic link to /mnt/mysql, and then copied the contents of the old /var/lib/mysql into /mnt/mysql with cp -a. Why the backup? Well, if your instance ever goes down it’ll take the contents of /mnt with it, and having a copy helps with a speedy recovery.
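For anyone who wants the shuffle above spelled out, here is a rough sketch as a shell function. The function name is made up, the order of steps is slightly rearranged so the copy happens before the old directory is replaced, and it leaves out stopping MySQL first and the chown to mysql:mysql (both need root), so treat it as an outline rather than a drop-in script.

```shell
# Hypothetical helper: move a data directory and leave a symlink behind.
# Assumes MySQL is stopped and that ownership is fixed up separately.
relocate_datadir() {
  local src=$1 dest=$2
  cp -a "$src" "$src.bak"     # backup copy stays on the main partition
  mkdir -p "$dest"
  cp -a "$src/." "$dest/"     # copy contents, preserving modes and timestamps
  rm -rf "$src"
  ln -s "$dest" "$src"        # old path now points at the new location
}
```

On a real instance the call would be relocate_datadir /var/lib/mysql /mnt/mysql.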

Step One – Be The Master

Add or edit the following lines in /etc/mysql/my.cnf:

server-id=1
log-bin=/var/log/mysql/mysql-bin.log
binlog-do-db=babygotdb
log=/var/lib/mysql/mysql.log

The log= line is important for seeing whether or not data is replicating. Set it only once; /var/lib/mysql/mysql.log is the file we tail at the end to watch replication.

Restart MySQL then launch the client and issue the following command to add a user with replication privileges:

GRANT REPLICATION SLAVE ON *.* TO 'replicant'@'%'
IDENTIFIED BY 'replicantsarepeopletoo';

FLUSH PRIVILEGES;

Now use the database so we can finish things off…

USE babygotdb;

We want to block writes while we note the binlog position, so we take a global read lock:

FLUSH TABLES WITH READ LOCK;

Now we want to check that what we set up looks right:

SHOW MASTER STATUS;

This will tell you about the database File, Position, Binlog_Do_DB, and Binlog_Ignore_DB. You will need to know both the file and the position. In this case mine has the following:

File – mysql-bin.000023
Position – 3617723
Binlog_Do_DB – babygotdb
Binlog_Ignore_DB -

You should see something similar to that though it will be in a different format.
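If you would rather not copy the file and position by hand, a small sketch like the one below can pull them out of the vertical (\G) output of the mysql client. The function name is made up, and it assumes the output has the usual "Field: value" layout.

```shell
# Read SHOW MASTER STATUS\G output on stdin and print "FILE POSITION".
master_coordinates() {
  awk '
    $1 == "File:"     { file = $2 }
    $1 == "Position:" { pos  = $2 }
    END               { print file, pos }
  '
}
```

It would be fed with something like mysql -e 'SHOW MASTER STATUS\G' | master_coordinates, and the two values go straight into CHANGE MASTER TO on the slave.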

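Leave this client session open for now: quitting releases the read lock, and the file and position above are only good for a dump taken while the lock is held. From a second shell, dump the database and get it ready to ship over to the slave server.

$ mysqldump -B babygotdb | gzip > babygotdb.sql.gz

Once the dump has finished, go back to the client, release the lock, and exit:

UNLOCK TABLES;
quit;

Then just scp it to the slave.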

Step Two – Get To Know Your Slave

Like on the master we need to make some changes to /etc/mysql/my.cnf

server-id=2
master-host=domU-12-34-56-78-90-1A.usma1.compute.amazonaws.com
master-user=replicant
master-password=replicantsarepeopletoo
master-connect-retry=60
replicate-do-db=babygotdb
log=/var/lib/mysql/mysql.log

Restart MySQL and then launch the client; we have only a little more work to do until we are done.

Gunzip your SQL dump then jump into the mysql client as we need to create our replicated database:

CREATE DATABASE babygotdb;

Then we need to suck in what we dumped on the master:

SOURCE babygotdb.sql;

Stop the slave from running:

STOP SLAVE;

And get our environment ready for replication (this is one long command with each component marked out with commas)

CHANGE MASTER TO
MASTER_HOST='domU-12-34-56-78-90-1A.usma1.compute.amazonaws.com',
MASTER_USER='replicant',
MASTER_PASSWORD='replicantsarepeopletoo',
MASTER_LOG_FILE='mysql-bin.000023', MASTER_LOG_POS=3617723;

Now start the slave back up…

START SLAVE;

Watch the replication happen in real time:

$ tail -f /var/lib/mysql/mysql.log

If all is working properly you should be seeing transactions flying by on both as changes are made in the master database.
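Tailing the log is the fun way to watch, but for a yes/no answer the slave also reports two thread states in SHOW SLAVE STATUS. The sketch below (function name made up) reads the \G output and succeeds only when both Slave_IO_Running and Slave_SQL_Running say Yes.

```shell
# Read SHOW SLAVE STATUS\G output on stdin.
# Exit 0 when both replication threads report "Yes", non-zero otherwise.
slave_is_replicating() {
  awk '
    $1 == "Slave_IO_Running:"  { io  = $2 }
    $1 == "Slave_SQL_Running:" { sql = $2 }
    END { exit !(io == "Yes" && sql == "Yes") }
  '
}
```

A cron job or monitoring check could run mysql -e 'SHOW SLAVE STATUS\G' | slave_is_replicating and complain when it fails.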

Gratefully cribbed from Phillip Pearson over at Second p0st and the HowTo Forge write up by falko.

My long term goal is to create images of slaves, masters, data nodes, and management nodes so that deployment will require less hammering on the forge and more tying up loose ends.

Amazon S3, S3Sync, Backups, and You!

Friday, June 22nd, 2007

With EC2 and S3, Amazon has made available some seriously powerful and flexible tools for server and file hosting. EC2 allows you to roll whatever flavor OS you want and get it up and running (think of it as virtual dedicated hosting), which, while being incredibly cool, has one major downside: if your server instance fails you lose your data, as it will revert to the most recent image when you get it back online. In other words, a hiccup on EC2 could turn into a blistering nightmare.

This is where S3 comes in handy for storing anything that might change after you create an instance and launch it, such as databases, files uploaded or created, and even logs. One of the first projects I hopped into was dumping the MySQL databases, compressing them, and tossing them up on S3, a tedious process that can be automated with cron and s3sync (a ruby based tool that approximates rsync, kinda sorta).

Daily cron job to create new bucket

cd /root/s3sync/
DAYNOW=$(date +%j)
ruby s3cmd.rb createbucket WTF_db_$DAYNOW

Daily cron job to delete old bucket

cd /root/s3sync/
DAYTHEN=$(date +%j --date='2 days ago')
ruby s3cmd.rb deletebucket WTF_db_$DAYTHEN

Hourly job to back up the database
# get into the directory
cd /root/s3sync/


# set the environment
export AWS_ACCESS_KEY_ID=XXXX
export AWS_SECRET_ACCESS_KEY=XXXX/XXXX
export SSL_CERT_DIR=/root/s3sync/certs


# set date variables
DAYNOW=$(date +%j)
TIMENOW=$(date +%H%M)

# dump database
mysqldump WTF > /tmp/WTF-$DAYNOW-$TIMENOW.sql

# tar SQL dump
tar -czvf /tmp/backups/WTF-$DAYNOW-$TIMENOW.tar.gz /tmp/WTF-$DAYNOW-$TIMENOW.sql

# copy tar to S3
ruby s3sync.rb -r --ssl /tmp/backups/ MahBukit:WTF_db_$DAYNOW

# clean up after yourself
rm /tmp/*.sql
rm /tmp/backups/*.gz

Now, one of the frustrations I have with this setup is that while I am dropping the buckets from 48 hours prior, it isn’t actually deleting the files off S3, a bit of a pain in the ass that I need to find some sort of resolution for in the near future. If anyone has an answer to this problem I would love to hear it! For now, I’m cheating by using S3Fox to purge those pesky backups.