One of the great things about EC2 is that it is essentially a giant sandbox where you can take risks experimenting with architecture and services in a rapid and cost-effective manner, something that you cannot really do well at a co-lo or even on other VPS services. In the past year we have experimented with plenty of different configurations: some found their way into production, others were filed for future reference, and still others are to be avoided at all costs.
When I came on board as a contractor we had only two servers inside EC2 and the database hosted at Go Daddy. The company had just migrated the Apache/application server along with a Harvest server into EC2 but had opted to leave the database hosted at Go Daddy due to fears of data loss. The only trouble with this scheme was the latency between the application and the db, which made things so glacially slow that the site was nearly unusable.
After starting full-time we brought the database into the cloud and started looking into how we might implement a MySQL cluster in EC2. The challenge was to get a backup routine that was unobtrusive yet fast and easy to transfer into S3. I never got LVM snapshots working to my comfort level, so we relied instead on mysqldump, which, all in all, worked fine while the db was small. Data loss was still a big concern for us, so we began experimenting in earnest with MySQL clusters.
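For anyone curious what a mysqldump-based routine along these lines might look like, here is a minimal sketch in Python. It is not the script we actually ran: the function names, database name, and user are made up for illustration, it assumes credentials live in an option file such as ~/.my.cnf, and the upload to S3 is left out entirely.

```python
import datetime
import gzip
import shutil
import subprocess


def dump_command(database, user, host="localhost"):
    """Build a mysqldump invocation for one database (hypothetical helper)."""
    return [
        "mysqldump",
        "--single-transaction",  # consistent InnoDB snapshot without locking tables
        f"--host={host}",
        f"--user={user}",
        database,
    ]


def backup_name(database, now):
    """Timestamped file name, suitable for reuse as a storage key."""
    return f"{database}-{now:%Y%m%d-%H%M%S}.sql.gz"


def run_backup(database, user, out_dir="."):
    """Stream mysqldump output into a gzip file and return its path."""
    path = f"{out_dir}/{backup_name(database, datetime.datetime.now())}"
    proc = subprocess.Popen(dump_command(database, user), stdout=subprocess.PIPE)
    with gzip.open(path, "wb") as out:
        shutil.copyfileobj(proc.stdout, out)
    if proc.wait() != 0:
        raise RuntimeError("mysqldump exited with an error")
    return path
```

Compressing while streaming keeps the file small for the transfer. A full dump still takes longer as the data grows, which is why this approach only suits a small db.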
The MySQL cluster idea didn’t pan out; our theory is that the small instances just didn’t have what it takes to cluster. So we went with plain old replication, which has proven to be stable and reliable. The slaves serve as fail-over units and also perform periodic backups, freeing the master from that task. Feeling more comfortable with database integrity, we turned our attention to getting our application to scale, a challenge with resource-hungry Rails.
Breaking apart Apache and Rails was a snap with mod_proxy, and it allowed us to dedicate hardware to each. With things running even better, we started thinking about how we could turn this into a more thorough horizontal scale.
So, one year later, we have brought some horizontal scale to the site, adding stability and failover to the application. As the site grows, though, we are back to the question of how best to scale the database, but at least we have a sandbox to play in so we can figure it out.

Thanks for building such killer architecture James….you da man!
Dugg: http://digg.com/tech_news/Evolving_Services_on_Amazon_Web_Services_AWS_EC2
Pete
I love the diagrams! I know for sure that the site works a lot faster thanks to all your work behind-the-scenes. What would we do without you?
The ability to cheaply experiment with different designs is one of the things I like most about using EC2.
Was performance of the small VMs the only reason you decided not to go with MySQL clustering? I’m thinking about implementing that as the DB for part of our service and am concerned about difficulties arising from some of EC2’s idiosyncrasies, like never knowing what IP an instance will get and so forth.
Well, it was really the intersection of price and performance. Any way you build it, a redundant cluster is pricey, much more so than replication, not to mention that backups and recovery are greatly complicated.
For grabbing IPs, that can easily be done with a shell script wrapper around the run command, or one that runs on start-up and checks the IP in, either to a homegrown service or by just writing it to a file on another instance. One of the things I’ve been meaning to get to is making elastic slaves that will come up, grab the latest backup and master info, source in, register, and start replicating. Maybe I’ll move that up in my To-Do.
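To make the start-up check-in idea concrete, here is a rough sketch, written in Python rather than shell. It asks the EC2 instance metadata service for the private IP and records it in a JSON file. The role names, the `register` helper, and the registry file are all hypothetical, and it assumes the file sits somewhere the other instances can read it (a shared mount, or copied over afterwards). Instances that require session tokens for metadata requests would need an extra step that is not shown here.

```python
import json
import urllib.request

# Instance metadata endpoint for the private IPv4 address.
METADATA_URL = "http://169.254.169.254/latest/meta-data/local-ipv4"


def fetch_private_ip(url=METADATA_URL, timeout=2):
    """Ask the instance metadata service for this instance's private IP."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode().strip()


def register(role, registry_path, fetch=fetch_private_ip):
    """Record role -> IP in a JSON registry file and return the IP."""
    ip = fetch()
    try:
        with open(registry_path) as fh:
            registry = json.load(fh)
    except FileNotFoundError:
        registry = {}  # first instance to check in creates the registry
    registry[role] = ip
    with open(registry_path, "w") as fh:
        json.dump(registry, fh, indent=2)
    return ip
```

Run at boot with something like `register("db-slave-1", "/shared/hosts.json")`, each instance overwrites its own entry, so a restarted instance with a new IP simply replaces the stale one.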