Archive for the ‘Sysadmin’ Category

EC2 Elastic Load Balancing for Fun and Profit

Monday, November 23rd, 2009

This one is so easy I don’t know why I haven’t gotten around to it sooner. It took my dev to be all, “OMG! The servers! How will we do a rolling deploy without impacting uptime!1!!eleven!” Thankfully, Amazon is there with Elastic Load Balancing (ELB) so I didn’t really need to get off my ass and be excited. Amazon Web Services: Doing the hard work so you don’t have to.

Before we dive in, a little background on operations. We’ve been making use of the elastic IP feature since it came out along with round robin DNS to handle the distribution of traffic over the web servers. Sure, it is crude and primitive but it works and I am a big fan of implementing the simplest possible solution since more moving parts result in exponentially more headaches. And, yes, we tried the HAProxy and the Nginx route but the cost-plus-heartburn factor was too high for what we needed. Anyway, things have worked just fine with a minimum of effort on our part but our needs are changing so we need a solution that is both flexible and forgiving. Brittle is only good with peanuts.

First things first, create an ELB…

elb-create-lb eeniemeenie --headers --listener "lb-port=443,instance-port=8443,protocol=TCP" \
--listener "lb-port=80,instance-port=80,protocol=http" --availability-zones us-east-1c

Now you might be saying to yourself, “WTF, James. What is with the 443 => 8443 when 80 => 80?” Simple: “Currently, Elastic Load Balancing does not have SSL termination capability.” That basically means your application servers need to handle the SSL part, but honestly this is fine: the way ELB is engineered, the traffic between the balancer and your instances travels outside the firewall so you’ll want that traffic passing over SSL anyway.

Next, configure health checks…

elb-configure-healthcheck eeniemeenie --headers --target "TCP:8443" --interval 30 --timeout 20 \
--unhealthy-threshold 2 --healthy-threshold 2

elb-configure-healthcheck eeniemeenie --headers --target "HTTP:80/up.html" --interval 10 --timeout 5 \
--unhealthy-threshold 2 --healthy-threshold 2

These two health checks are intended to do different things. The one for TCP:8443 is to see if our application is up and serving (we force all traffic over SSL) while the latter is intended for rolling deploys. When we begin a deploy we’ll rename that file so that the balancer sees it go missing and pulls the instance out of the pool. When everything is done and the deploy was successful we plop up.html back in and the balancer throws the instance back in the pool.
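The rename-and-restore dance could be scripted along these lines. This is a rough sketch, not what we run: DOCROOT, DRAIN_SECONDS, the .deploying suffix and the function names are all made up for illustration, and the wait is just a guess that comfortably exceeds the 10 second interval times the unhealthy threshold of 2.

```shell
#!/bin/bash
# Sketch only: DOCROOT, DRAIN_SECONDS and the function names are assumptions.
DOCROOT="${DOCROOT:-/var/www}"          # where Apache serves up.html from
DRAIN_SECONDS="${DRAIN_SECONDS:-30}"    # longer than interval (10s) x unhealthy-threshold (2)

# Hide up.html so the HTTP:80/up.html check starts failing, then wait
# long enough for the balancer to notice and pull the instance.
leave_pool() {
    mv "$DOCROOT/up.html" "$DOCROOT/up.html.deploying" || return 1
    sleep "$DRAIN_SECONDS"
}

# Put up.html back so the check passes again.
join_pool() {
    mv "$DOCROOT/up.html.deploying" "$DOCROOT/up.html"
}

# Run whatever deploy command is passed in between the two steps.
# up.html is only restored if that command succeeded.
rolling_deploy() {
    leave_pool || return 1
    "$@" || return 1
    join_pool
}
```

Something like `rolling_deploy ./deploy.sh` (a hypothetical deploy script) would then leave a failed instance out of the pool until somebody looks at it.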

On the Apache side of things I decided to do a little mod_rewrite mumbo jumbo to deal with the new up.html file. Since I want all application related traffic forced to SSL the monitoring of that file by ELB presents a little wrinkle to my normal sledgehammer approach to things. Easily addressed though:

RewriteEngine on
RewriteCond %{REQUEST_URI} !(up.html)
RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R]

Now, because I am a grumpy bastard I really don’t want up.html to be available over SSL.

RewriteEngine on
RewriteCond %{REQUEST_URI} (up.html)
RewriteRule ^(.*)$ http://%{SERVER_NAME}$1 [L,R]

That is basically it. Mind you, I have not punched this down for production but our initial tests have been very promising, in particular the rolling deployment feature.

(Parts of this solution were gratefully cribbed from Serk and Lead Thinking as well as the ever popular RTFM)

Re-Sizing EBS Volumes for Fun and Profit on EC2

Thursday, September 17th, 2009

This is one you can file under easy and obvious but since I have a memory like a sieve I am going to write about it. There are times when I set up EBS volumes and think to myself, “I’ll never need any more than n GB, ever!”, only to find out some months down the road that my estimates fell woefully short of the growth trend. Turns out resizing is pretty easy and can be accomplished in a matter of minutes. For this example, I’m going to resize a single XFS-formatted EBS volume where a MySQL database hangs out doing its thing. I am going to attach the new volume to a different block device than the original so that we can quickly revert if we find the apocalypse happening ahead of schedule.

  1. Put up a maintenance page or just halt activity to the DB so that your life is slightly more elegant.
  2. Stop MySQL
  3. Make a snapshot of the original EBS volume
  4. Create a new volume from that snapshot specifying the size you want and make sure that it is in the same availability zone as the instance you want to attach it to
  5. Attach the new EBS volume to your instance on a different block device (if the original is on /dev/sdh then attach the new one to /dev/sdi)
  6. Edit /etc/fstab to reflect the new changes:

    #/dev/sdh /vol xfs noatime 0 0
    /dev/sdi /vol xfs noatime 0 0

  7. Mount the drive
  8. Resize it with “xfs_growfs -d /vol”
  9. Start MySQL and let it run its checks
  10. Take down the maintenance pages and sit back with a sense of smug self-satisfaction

Like I said, super easy. However, this is for a single EBS volume. I haven’t really played around with resizing or generally manipulating RAID sets so that is a post for another day.
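For the command-line inclined, steps 2 through 9 map roughly onto the commands below. This is a sketch that only prints the plan rather than running anything: the function name, the init script path and the umount step are my assumptions, and the snapshot and new volume IDs are placeholders you would read off the output of the previous command.

```shell
#!/bin/bash
# Sketch only: prints the commands for steps 2-9 instead of running them.
# The init script path and the umount step are assumptions; the
# <snapshot-id> and <new-volume-id> placeholders come from the output
# of the preceding EC2 command.
resize_plan() {
    local old_vol="$1" instance="$2" size_gb="$3" zone="$4" new_dev="$5"
    cat <<EOF
/etc/init.d/mysql stop
umount /vol
ec2-create-snapshot $old_vol
ec2-create-volume --snapshot <snapshot-id> -s $size_gb -z $zone
ec2-attach-volume <new-volume-id> -i $instance -d $new_dev
# edit /etc/fstab so /vol points at $new_dev
mount /vol
xfs_growfs -d /vol
/etc/init.d/mysql start
EOF
}
```

Calling `resize_plan vol-11111111 i-22222222 50 us-east-1c /dev/sdi` (made-up IDs) prints the sequence for a 50GB replacement volume.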

EC2, BIND9, DNS, Pancakes and You!

Wednesday, August 12th, 2009

I finally reached that place where maintaining versions of host files just didn’t seem to cut it and found myself mucking about with BIND9 to solve the problem of keeping track of all those internal IP addresses when I bring instances online and take them offline. This seemed to be the quickest and easiest way to handle fairly basic networking needs though it does introduce yet another point of failure, so I’ll need to tackle primary and secondary DNS servers in the near future and maybe develop a doomsday host file backup, but if the latter is invoked I’m sure that I have bigger problems on my hands.

Two basic tutorials helped me hack together the start of a solution, HowTo update DNS hostnames automatically for your Amazon EC2 instances and the ever useful Ubuntu wiki’s BIND9 Server Howto. Working back and forth between those two write-ups and a little manpage searching answered 95% of my questions so getting started isn’t as much of a headache as it might seem. Deploying it on a large scale? That is a whole other matter.

A caveat before we dive in, more to remind me when I have to revisit this some 18 months later. A couple of iterations in I thought it would be cute to set up the DNS zone for the full domain (*.awesome.com) but that seemingly added to my headaches as I tried to examine resources that did not have a DNS entry just yet. Since this is for internal use only we can pretty much set the FQDN to be whatever we want it to be, in this case pancake.man is our working domain.

Set things up…

sudo apt-get install bind9 dnsutils
sudo mkdir /etc/bind/zone/
sudo mkdir -p /some/where/other/than/etc/bind
sudo cp /etc/bind/db.local /etc/bind/zone/db.pancake.man
sudo cp /etc/bind/db.127 /etc/bind/zone/db.10
sudo chown -Rv bind:bind /etc/bind/zone/

EDIT: A couple of things occurred to me while I mulled this over. Keeping the zone files in /etc/bind, while it sounds like a neat idea, doesn’t take into account the ephemeral nature of instances so I moved ours onto an EBS volume. Additionally, I neglected to add that if your distro uses AppArmor you’ll have to grant rw permissions to the folder where you are keeping the db files. That can be done by editing /etc/apparmor.d/usr.sbin.named and adding the following:

/some/where/other/than/etc/bind/** rw,

Then you can just reload AppArmor and be on your happy way.

Generate a key pair….
This is really a CYA implementation but it should ensure that only the servers you want updating the LAN DNS can do it.

dnssec-keygen -a HMAC-MD5 -b 512 -n USER pancake.man

…and copy the secret from the public key as you’ll need it in the next section.
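If you would rather not copy and paste by eye, the secret can be pulled out with a one-liner. A small sketch, assuming the usual layout of the .private file that dnssec-keygen writes next to the .key file (for an HMAC key the Key: line holds the same shared secret); key_secret is just a name I made up.

```shell
#!/bin/bash
# Sketch: print the base64 secret from a dnssec-keygen .private file.
# Assumes the file has a "Key: <base64>" line, as HMAC key files do.
key_secret() {
    awk '/^Key:/ { print $2 }' "$1"
}
```

Run it as `key_secret` followed by the name of the .private file that was generated, and paste the result into the secret line of the key block.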

Set up a Zone…
Edit /etc/bind/named.conf.local and add the following:

key pancake.man. {
algorithm HMAC-MD5;
secret "Gzr11.....==";
};

The above block sets up the key that you created earlier and it will be used in the following block that defines the zone.

zone "pancake.man"
{
type master;
file "/etc/bind/zone/db.pancake.man";
allow-update { key pancake.man.; };
allow-query { any; };
};

Next is to work on the zone file so edit /etc/bind/zone/db.pancake.man and change the localhost/127.0.0.1 entries to fit your environment.

$TTL 604800
@ IN SOA ns.pancake.man. me.pancake.man. (
2 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800 ) ; Negative Cache TTL
;
@ IN NS ns.pancake.man.
@ IN A 10.0.0.1 ; this is your local IP address

When you restart BIND9 this file will change a bit. Here’s what I see once it is running…

$ORIGIN .
$TTL 604800 ; 1 week
pancake.man IN SOA ns.pancake.man. me.pancake.man. (
3 ; serial
604800 ; refresh (1 week)
86400 ; retry (1 day)
2419200 ; expire (4 weeks)
604800 ; minimum (1 week)
)
NS ns.pancake.man.
A 10.208.41.206
$ORIGIN pancake.man.
$TTL 60 ; 1 minute

One thing that tripped me up for a little bit was the formation of the SOA line, pancake.man IN SOA ns.pancake.man. me.pancake.man.. I had been reading ns.pancake.man. me.pancake.man. as a pair of FQDNs when in fact they are an FQDN and an email address where the ‘@’ has been replaced with a ‘.’; file this one under RIF: Reading Is Fundamental.
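To make the email-in-disguise thing concrete, here is a tiny helper that does the conversion. The function name is made up; the one extra wrinkle it handles is that a dot in the user part of the address has to be backslash-escaped in zone files so it is not read as a label separator.

```shell
#!/bin/bash
# Turn an email address into the mailbox form used in an SOA record:
# '@' becomes '.', a trailing '.' is added, and dots in the user part
# are backslash-escaped.
soa_rname() {
    local user="${1%%@*}" domain="${1#*@}"
    user=$(printf '%s' "$user" | sed 's/\./\\./g')
    printf '%s.%s.\n' "$user" "$domain"
}
```

So `soa_rname me@pancake.man` prints me.pancake.man., which is exactly what sits in the SOA line above.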

Next up is a reverse zone file which is pretty much the same process as what we just did, just edit /etc/bind/zone/db.10 and replace the localhost/127.0.0.1 values with your own.

;
; BIND reverse data file for local loopback interface
;
$TTL 604800
@ IN SOA ns.pancake.man. root.pancake.man. (
2 ; Serial
604800 ; Refresh
86400 ; Retry
2419200 ; Expire
604800 ) ; Negative Cache TTL
;
@ IN NS ns.pancake.man.
1.0.0 IN PTR ns.pancake.man.

Test your new DNS server
First things first, let’s test things out by inserting an entry using nsupdate (this piece is gratefully cribbed from Marius). Edit /etc/resolv.conf and put the IP address of your DNS server above the ec2 nameserver.

search compute-1.internal
nameserver 10.1.2.3
nameserver 172.1.2.3

Now let's punch an entry in using our shiny set of keys. I found making a script was easier for testing purposes, so create a file called dnsupdate.sh and add the following:

#!/bin/bash
cat << EOF | /usr/bin/nsupdate -k Kpancake.man.+157+46088.private -v
server 127.0.0.1
zone pancake.man
update delete $1 A
update add $1 60 A $2
show
send
EOF

The private key is the one we made way back in the beginning of this exercise; yours might be about waffles. After making the file executable, just run ./dnsupdate.sh test.pancake.man 10.1.2.3. Ideally, you should get something like the following back:

Outgoing update query:
;; ->>HEADER<<- opcode: UPDATE, status: NOERROR, id: 0
;; flags: ; ZONE: 0, PREREQ: 0, UPDATE: 0, ADDITIONAL: 0
;; ZONE SECTION:
;pancake.man. IN SOA


;; UPDATE SECTION:
test.pancake.man. 0 ANY A
test.pancake.man. 60 IN A 10.1.2.3

Implement your shiny new DNS server
With all that done and testing showing that everything works as expected, we want to share it with the whole LAN but there's a potential snag. When you wake up the next morning still heady from your success you notice that when DHCP decided to check on its lease it wrote over /etc/resolv.conf with the stock EC2 values, showing absolutely no love for your handiwork. Not what we wanted to happen but there are a couple of steps we can take to make sure that your shiny new DNS persists (this is mainly geared toward Ubuntu so your mileage may vary). The AWS forums have a great writeup about this though I'm only implementing some of what was discussed.

Since there is the chance of a DNS server changing, instance goes down or a new one is swapped in, I'm going the route of appending the DNS server to /etc/hosts at boot time and then adding the FQDN to /etc/dhcp3/dhclient.conf like so...

prepend domain-name-servers ns.pancake.man;

What that will do is punch the IP address specified in /etc/hosts into /etc/resolv.conf when DHCP does its little dance.
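The /etc/hosts half of that could look something like the following at boot time. Treat it as a sketch: HOSTS_FILE and set_ns_host are names I invented, how you find the DNS server's current IP is still up to you, and it only cleans up lines where the name is the last thing on the line.

```shell
#!/bin/bash
# Sketch: point ns.pancake.man at the current DNS server in a hosts file.
# HOSTS_FILE and the function name are assumptions; run it at boot,
# for example from rc.local.
HOSTS_FILE="${HOSTS_FILE:-/etc/hosts}"

set_ns_host() {
    local ip="$1" name="${2:-ns.pancake.man}" tmp
    tmp=$(mktemp) || return 1
    # Drop any stale line ending in the name, then append the fresh one.
    grep -v "[[:space:]]$name\$" "$HOSTS_FILE" > "$tmp" || true
    printf '%s\t%s\n' "$ip" "$name" >> "$tmp"
    # Overwrite in place so the file keeps its owner and permissions.
    cat "$tmp" > "$HOSTS_FILE" && rm -f "$tmp"
}
```

Running `set_ns_host 10.1.2.3` twice leaves a single entry, so it is safe to call on every boot.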

Loose ends...
While everything should be working decently on a small scale there are still plenty of things left to do, like setting up a secondary DNS server for failover, implementing EC2 internal zones for forwarding as described in the AWS post, and working out an elegant solution for updating clients about the DNS server. As for the latter, I'm still playing with scripts to fetch the local IP address of the DNS server, from using an empty security group as a tag to just keeping an updated list in S3 that is maintained by the DNS servers themselves. That, among the other things, will be a post for another day when I finally stumble on something that isn't so ham-fisted.

Continuous Integration Testing, CruiseControl.rb, Github, Apache, and You!

Friday, March 20th, 2009

“Ain’t no test like an elwoodicious test ’cause an elwoodicious test don’t pass!”

So one of my co-workers thought it would be an awesome idea if we were continually reminded of our shortcomings regarding tests so he passed along a link to the CruiseControl.rb project. Getting up and running is a fairly simple affair, particularly if you are just running it locally, but we wanted to drop this on our QA server and have the dashboard served up via Apache. Not too difficult, just a couple more steps.

Grabbing the code is the easy part, git clone git://github.com/thoughtworks/cruisecontrol.rb.git.

Add your project, ./cruise add Project-Name -s git -b branch-name-here -r git@github.com:username/project-name.git

Set up the configuration for your projects by editing ~/.cruise/site_config.rb.  The file is largely self-explanatory but it would be helpful to set Configuration.dashboard_url and Configuration.default_polling_interval which are nested in the middle of the file.

Set up the defaults for each project, like getting emails and polling times, in ~/.cruise/projects/project-name/cruise_config.rb.

Now the fun part. We use Apache with Passenger so the tack we took was to install mongrel and configure Apache’s balancer to look for it on a specified port as well as password protect the app with basic authentication:

Edit your site file and add a virtual host…

<VirtualHost *:80>
ServerName cruise.mydomain.com
ServerAlias cruise.mydomain.com

Include /etc/apache2/sites-available/app.cruise

</VirtualHost>

Then create the app.cruise file with the following content…

ServerSignature Off

<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>

DocumentRoot /where/you/put/cruisecontrol.rb/public

<Directory "/where/you/put/cruisecontrol.rb/public">
Options FollowSymLinks
AllowOverride None
Order allow,deny
Allow from all
AuthName "Secure Area"
AuthType Basic
AuthUserFile /where/you/put/.htpasswd
require valid-user
</Directory>

RewriteEngine On

# Redirect all non-static requests to cluster
RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
RewriteRule ^/(.*)$ balancer://mongrel_cluster%{REQUEST_URI} [P,QSA,L]

# Deflate
AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css
BrowserMatch ^Mozilla/4 gzip-only-text/html
BrowserMatch ^Mozilla/4.0[678] no-gzip
BrowserMatch \bMSIE !no-gzip !gzip-only-text/html

Lastly create a proxy.conf file in /etc/apache2/conf.d/…

<Proxy balancer://mongrel_cluster>
BalancerMember http://127.0.0.1:8000
AuthType Basic
AuthName "Supa Sekrit"
AuthUserFile /where/you/put/.htpasswd
Require valid-user
</Proxy>

Once all that is done you can cd into the CruiseControl.rb directory and issue ruby ./cruise start -p 8000 --daemon, and you should be ready to watch your tests fail like me.

Save your database (and your bacon) with Elastic Block Store and mysqlcheck

Monday, March 16th, 2009

Here’s the situation: early this afternoon I get a panicked IM from a client that they dropped a table on the production db but that they have a CSV copy that they want to load. Sounds easy, right? Should have been but the CSV file had some oddities where lines were terminated by \n but those also existed in some of the fields. Now, I am by no means a MySQL guru and while there might be a way to issue a LOAD DATA INFILE statement that accounts for it, the process of working that out would keep the site down longer than necessary.

When we set up the database on EC2 we made a conscious decision to move the database onto an Elastic Block Store (EBS) volume so that we could take regular snapshots of the volume as well as enjoy the durability that EBS offers. The approach that we took to recovering the table was to re-purpose one of the QA instances as a recovery point, create an EBS volume from the latest snapshot, point MySQL at it, dump the table with --complete-insert, source it in, call it a day.

Things were working swimmingly up until the point where I needed to dump the table. ERROR 1033 (HY000): Incorrect information in file was the last thing I wanted to read. The table is InnoDB and it is possible that it was corrupted when I started the server back up with some missing variables in the my.cnf file. The lesson here is remember to breathe, work quickly but methodically, and double check your work. So here I sat with a seemingly good copy of the database but a mangled table.

Just a handful of keystrokes saved my ass: mysqlcheck mydb mytable.

If we had been doing just whole database dumps with mysqldump this process would have been frustrated by trying to chop up a 12GB file into the section that we needed (yes, it makes more sense to dump each table separately but remember, I’m no guru). In the end, having a volume that we could mount and access within minutes was the biggest reason we were able to get the production site back online as fast as we did, and for future reference I’ll check my config files more closely before I turn on services.
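For the record, dumping each table separately does not take much. Here is a sketch of what that might look like, not something we actually had in place: dump_tables and the MYSQL and MYSQLDUMP variables are invented for illustration, and credentials are assumed to come from ~/.my.cnf or similar.

```shell
#!/bin/bash
# Sketch: dump every table of a database into its own file.
# MYSQL, MYSQLDUMP and the function name are assumptions; credentials
# are expected to come from ~/.my.cnf or similar.
MYSQL="${MYSQL:-mysql}"
MYSQLDUMP="${MYSQLDUMP:-mysqldump}"

dump_tables() {
    local db="$1" outdir="$2" table
    mkdir -p "$outdir" || return 1
    # -N drops the column header, -B gives plain one-name-per-line output.
    "$MYSQL" -N -B -e 'SHOW TABLES' "$db" | while read -r table; do
        "$MYSQLDUMP" --complete-insert "$db" "$table" > "$outdir/$table.sql" || exit 1
    done
}
```

With that, `dump_tables mydb /some/backup/dir` leaves one .sql file per table, so restoring a single dropped table is a matter of sourcing one small file.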

***Note: this really only applies if your database runs in EC2 ;-)

Using mod_rewrite to force SSL on directories

Tuesday, December 30th, 2008

Not that this is exceptionally difficult but I really needed to post this just so I would have a reference in the future.

<VirtualHost *:80>
ServerName your.server.here
ServerAlias your.server.here

RewriteEngine on
RewriteCond  %{REQUEST_URI} (private|secret|goaway)
RewriteRule ^(.*)$ https://%{SERVER_NAME}$1 [L,R]

# The rest of the configuration follows…
</VirtualHost>

<VirtualHost *:443>
ServerName your.server.here
ServerAlias your.server.here

RewriteEngine on
RewriteCond  %{REQUEST_URI} !(private|secret|goaway)
RewriteRule ^(.*)$ http://%{SERVER_NAME}$1 [L,R]

# The rest of the configuration follows…
</VirtualHost>

Essentially any requests for private, secret, or goaway will get punted to https while all others will be handled as non-SSL. In the 443 block, to make sure that you aren’t using SSL for assets that don’t require it, we check again, this time to see if the request is not for one of the listed directories, and punt it back to non-SSL. This can also be used in .htaccess.