Posts Tagged ‘Monit’

Of Monit and Mongrels, Quick Thoughts

Wednesday, March 5th, 2008

Monit has served us pretty well for the last 7 months or so, keeping an eye on our mongrels and kicking them back in line if they act up. At the moment we run them in clusters of 3, with only 6 clusters in our current production setup, and monit restart all works fine, for the most part, when rolling them after a deploy. It is a completely different situation with 10 clusters, which we are experimenting with in a setup where Apache+mod_proxy sits on a separate server from the mongrels–it truly is wonderful to see Apache perform under load when it has all the resources it needs.

The problem seems to be how resource-hungry and ponderous Rails can be when firing up a mongrel, sucking up 25%+ user CPU and making the system gobble up another 10%+. That is enough that when monit tries to bring 30 mongrels up or down, some of the pack gets left out, failing to either shut down or start up. Now there is a monkey patch out there that addresses this very issue, but I am a little wary of patching mongrel_cluster in our production environment as it might cause me headaches later with upgrades. So what is my solution?

I was pressed for time, so it is a little kludgey and demonstrates some truly sloppy bash scripting but…it works.

#!/bin/bash
monit stop all -g pack_01
echo "Stopping 8100-02"
sleep 12s
monit stop all -g pack_02
echo "Stopping 8103-05"
sleep 12s
monit stop all -g pack_03
echo "Stopping 8106-08"
sleep 12s
monit stop all -g pack_04
echo "Stopping 8109-11"
sleep 12s
monit stop all -g pack_05
echo "Stopping 8112-14"
sleep 12s
monit stop all -g pack_06
echo "Stopping 8115-17"
sleep 12s
monit stop all -g pack_07
echo "Stopping 8118-20"
sleep 12s
monit stop all -g pack_08
echo "Stopping 8121-23"
sleep 12s
monit stop all -g pack_09
echo "Stopping 8124-26"
sleep 12s
monit stop all -g pack_10
echo "Stopping 8127-29"

So what I’ve done in the monitrc file is define each mongrel as belonging to a group that reflects its cluster. Then I issue a stop to each group with a 12 second delay between each so that Rails and monit can navigate around each other without either flipping out and forgetting to do their respective jobs. The start script is the same, with start in place of stop.
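A minimal sketch of what those grouped monitrc entries might look like, printed by a small shell function so that thirty near-identical stanzas do not have to be typed by hand. Only the pack_NN group names and the port numbering come from the script above; the pidfile directory, the mongrel_PORT check names, and the wrapper script path are assumptions made for illustration.

```bash
#!/bin/bash
# Sketch: print monitrc stanzas that tag each mongrel with its cluster's group.
# PID_DIR, WRAPPER, and the mongrel_PORT check names are assumptions.

PID_DIR="${PID_DIR:-/var/run/mongrel}"
WRAPPER="${WRAPPER:-/usr/local/bin/monit-mongrel}"

# print_pack PACK_NUMBER BASE_PORT
# Emits three "check process" stanzas for one cluster of three mongrels.
# The body runs in a subshell so its variables stay local.
print_pack() (
    pack=$1
    base=$2
    group=$(printf 'pack_%02d' "$pack")
    first=$((base + (pack - 1) * 3))
    port=$first
    while [ "$port" -le $((first + 2)) ]; do
        cat <<EOF
check process mongrel_$port with pidfile $PID_DIR/mongrel.$port.pid
  group $group
  start program = "$WRAPPER start $port"
  stop program = "$WRAPPER stop $port"

EOF
        port=$((port + 1))
    done
)
```

Calling print_pack 1 8100 through print_pack 10 8100 would print all thirty stanzas, ready to be pasted into or included from monitrc.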

Does it work? Well, I have tried it under load several times (siege with a concurrency of 30 and the mongrels bloated up to our obese resting state) and it seems to work like a charm. I added the echoes when our dev team expressed concern that cap would time out while the script took its time to run all the way through.
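For what it is worth, the ten copy-pasted stanzas could be collapsed into a loop. This is only a sketch: roll_packs and the MONIT, DELAY, PACKS, and BASE_PORT variables are names made up here, and the monit arguments are kept in the same order as the original script. It takes the action as an argument, so the same function covers both stopping and starting.

```bash
#!/bin/bash
# Sketch: the staggered stop/start as one loop.
# MONIT, DELAY, PACKS, BASE_PORT, and roll_packs are names made up here.

MONIT="${MONIT:-monit}"          # set to "echo monit" for a dry run
DELAY="${DELAY:-12}"             # seconds between packs, as in the original
PACKS="${PACKS:-10}"             # pack_01 .. pack_10
BASE_PORT="${BASE_PORT:-8100}"   # three consecutive ports per pack

# roll_packs ACTION   (ACTION is stop or start)
# The body runs in a subshell so its variables stay local.
roll_packs() (
    action=$1
    i=1
    while [ "$i" -le "$PACKS" ]; do
        group=$(printf 'pack_%02d' "$i")
        first=$((BASE_PORT + (i - 1) * 3))
        # Echo before acting so cap sees steady output during the long run.
        echo "$action $group: ports $first-$((first + 2))"
        # Same argument order as the original script.
        $MONIT "$action" all -g "$group"
        # Pause between packs, but not after the last one.
        if [ "$i" -lt "$PACKS" ]; then
            sleep "$DELAY"
        fi
        i=$((i + 1))
    done
)
```

With this in place, roll_packs stop followed by roll_packs start would replace both scripts, and MONIT="echo monit" gives a dry run that only prints the commands.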

File under: Ugly, Functional.

EC2: Pound + Apache, Mongrel Cluster, and MySQL Cluster

Thursday, October 11th, 2007

Alternatively, I should title this my 36 hour nightmare. Last week, high off the presentation, I built out and deployed the following configuration.

[Figure: EC2 Cluster]

Everything was nice and tight, and after loading QA data it ran like a champ, but the problem was that the QA data was pretty thin, only a fraction of the size of the production data. When we loaded production data into it, which by the way took nearly an hour to import, performance in the Cluster ground to a halt and we were faced with MySQL timing out the mongrels. Needless to say, after another 36 hours of work we abandoned this model and are looking at plain old replication for our data backend.

What could have given us all that grief? A couple of things spring to mind. The instances have 1.7GB of RAM and a single-core processor, which for now works like a champ for a single MySQL server but for whatever reason is not enough for a cluster under load. Also, running both SQL and Data Node services on the same box was likely less than inspired, as the SQL service would spin up, chew into the remaining RAM, and often dominate the CPU. On top of that, when we launched the cluster we were running some grossly inefficient queries with little or no indexing in the tables. A huge issue.

So we pulled back. At the moment we are still running the three-legged system (one instance running Pound, Apache, Monit, and Mongrels, one Harvester, and one MySQL instance), but we made significant changes to the DB so that all the bloated joins that Ruby likes to make are hitting indexed tables, and we tweaked my.cnf to boost the key buffer to 30% of RAM. Things seem better and we bought ourselves a little breathing room, but we are still hitting the limit of the number of mongrels we can run on a single instance (10 seems to be the upper threshold for stability), so we need to work out a method for building out a replicated set that will auto-recover after the countless data migrations that the dev team performs. That will be fun!
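To put a number on the key buffer change: 30% of a 1.7GB instance works out to roughly 510MB. The sketch below does that arithmetic and prints the matching my.cnf line. The helper names are made up, and it assumes the setting in question is key_buffer_size, which sizes the MyISAM index cache; the post does not say which storage engine the tables use.

```bash
#!/bin/bash
# Sketch: work out a key buffer of 30% of RAM and print the my.cnf line.
# key_buffer_line and total_ram_kb are made-up helper names.

# key_buffer_line TOTAL_KB [PERCENT]
# Integer arithmetic, rounded down to whole megabytes.
key_buffer_line() (
    total_kb=$1
    percent=${2:-30}
    mb=$((total_kb * percent / 100 / 1024))
    echo "key_buffer_size = ${mb}M"
)

# total_ram_kb reads MemTotal from /proc/meminfo (Linux only).
total_ram_kb() {
    awk '/^MemTotal:/ { print $2 }' /proc/meminfo
}
```

Running key_buffer_line "$(total_ram_kb)" on the database instance prints a line that would go under the [mysqld] section of my.cnf.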

Closing out the first week: Monit, Mongrel, and MySQL

Friday, August 3rd, 2007

One of the things I’ve realized with this new position is that I am my own worst taskmaster, driving myself to work longer hours in tightly focused stretches of time rarely punctuated by breaks. I suppose on some level I feel like I need to be even more productive because of the absence of “face-time”: there is no boss leaning over me making sure that I at least have the appearance of being busy. That said, I really am enjoying the work and the challenge that it presents, so I often feel that itch in the back of my brain to try to solve the puzzle before I go to bed.

What have I been working on? Well, half the week was spent training Monit to play nice with Mongrel, and I am decently confident that it works as advertised in the test environment. This afternoon we did a test deploy with Capistrano, nesting it between Monit stop and start statements, and everything appeared to work without a hitch. The challenge we faced with Monit in our environment was that we are unable to actually issue Mongrel starts and stops inside the config file. The solution was to take those statements and drop them into bash scripts, so at the moment I have a kludgey but operational method of fourteen scripts for seven mongrels (one start and one stop each). When I get a moment, I plan on cleaning them up and making a single one that executes with variables, i.e. $ monit-mongrel stop 8001, but at the moment I am epically lazy. If I have the time I would like to figure out what exactly it is about the environment that doesn’t like mongrels being started or stopped inside Monit.
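A rough sketch of what that single monit-mongrel script could look like. The application path, pidfile location, and variable names are all assumptions; the mongrel_rails flags used are its standard start and stop options. One commonly cited reason for commands failing under Monit is that it runs start and stop programs with a very minimal environment, so the sketch sets PATH explicitly. Whether that is the culprit here is a guess.

```bash
#!/bin/bash
# monit-mongrel (sketch): one script taking an action and a port.
# Usage: monit-mongrel {start|stop} PORT
# APP_DIR, PID_DIR, RAILS_ENV_NAME, and MONGREL are assumed names and paths.

APP_DIR="${APP_DIR:-/var/www/app/current}"
PID_DIR="${PID_DIR:-$APP_DIR/tmp/pids}"
RAILS_ENV_NAME="${RAILS_ENV_NAME:-production}"
MONGREL="${MONGREL:-mongrel_rails}"   # set to "echo mongrel_rails" for a dry run

# Put the usual binary directories first instead of trusting the caller's PATH.
PATH="/usr/local/bin:/usr/bin:/bin:$PATH"
export PATH

# The body runs in a subshell, so "exit" only sets the function's status.
mongrel_ctl() (
    action=$1
    port=$2
    # Reject a missing or non-numeric port.
    case "$port" in
        ''|*[!0-9]*) echo "usage: monit-mongrel {start|stop} PORT" >&2; exit 2 ;;
    esac
    pidfile="$PID_DIR/mongrel.$port.pid"
    case "$action" in
        start) $MONGREL start -d -e "$RAILS_ENV_NAME" -c "$APP_DIR" -p "$port" -P "$pidfile" ;;
        stop)  $MONGREL stop -c "$APP_DIR" -P "$pidfile" ;;
        *)     echo "usage: monit-mongrel {start|stop} PORT" >&2; exit 2 ;;
    esac
)

# Dispatch only when arguments were given, so the file can also be sourced.
[ "$#" -eq 0 ] || mongrel_ctl "$@"
```

Monit's start program and stop program lines would then point at this one script with the port as an argument, replacing the fourteen separate files.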

Half of yesterday and all of today I have been pounding my head against a nail-studded board trying to get secure replication rolling inside EC2 for our MySQL boxen. The masters and slaves (yes, the developers on the crew with a more PC sensibility have chided me, saying that the correct terms are primary and secondary. Fine, we can meet in the middle with boss and underling) fire up fine and do what they are supposed to, except actually perform replication in any shape, form, or fashion. To pipe them together I went the Stunnel route–could not for the life of me get SSL in MySQL to actually do anything–and I know that something is happening because the moment I issue a SLAVE START; command this shows up in the stunnel logs on the master: 2007.08.03 15:46:10 LOG5[13077:3083316112]: localhost.3306 connected from xx.xx.xx.xx:36769. I’m thinking that possibly it is how I set up the replication account permissions on the master: GRANT REPLICATION SLAVE ON *.* TO 'replicantsarepeopletoo'@'%.mydomain.com' IDENTIFIED BY 's3KrEtpa5Sw0rd';. Taking a shot in the dark, since I am tunneling the traffic the connection reaches MySQL from the local stunnel process, so the account likely should just be 'replicantsarepeopletoo'@'localhost'. When I’m feeling a little less punchy I’m going to take a look at that again, but after twelve hours I pretty much hate MySQL and EC2 at the moment.
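A sketch of that shot in the dark, written as two shell functions that just print the SQL for each side. Everything specific here is an assumption: the account name and password are placeholders, and the stunnel client on the slave is assumed to listen on 127.0.0.1:3307. On the slave, MASTER_HOST is 127.0.0.1 rather than localhost because the MySQL client library treats localhost as a request for the Unix socket instead of TCP. On the master, the localhost grant only matches if mysqld resolves 127.0.0.1 back to localhost; with skip-name-resolve the account host would need to be 127.0.0.1 instead.

```bash
#!/bin/bash
# Sketch: print the SQL for replicating through a local stunnel endpoint.
# REPL_USER, REPL_PASS, and TUNNEL_PORT are placeholders, not real settings.

REPL_USER="${REPL_USER:-repl}"
REPL_PASS="${REPL_PASS:-change-me}"
TUNNEL_PORT="${TUNNEL_PORT:-3307}"   # assumed stunnel listen port on the slave

# Run on the master: the connection arrives from the local stunnel process,
# so the account host is localhost rather than %.mydomain.com.
master_sql() {
    echo "GRANT REPLICATION SLAVE ON *.* TO '$REPL_USER'@'localhost' IDENTIFIED BY '$REPL_PASS';"
}

# Run on the slave: point replication at the local end of the tunnel.
slave_sql() {
    cat <<EOF
CHANGE MASTER TO
  MASTER_HOST='127.0.0.1',
  MASTER_PORT=$TUNNEL_PORT,
  MASTER_USER='$REPL_USER',
  MASTER_PASSWORD='$REPL_PASS';
START SLAVE;
EOF
}
```

Piping master_sql into the mysql client on the master, and slave_sql into it on the slave, would apply the change; SHOW SLAVE STATUS on the slave then reports whether the I/O thread managed to connect.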

What I haven’t been doing is taking pictures, writing, reading something other than man pages and long-winded newsgroup threads, and really listening to some of the new albums I just picked up (Nicole Willis & The Soul Investigators – Keep Reachin’ Up and Red Bumb Ball: Rare and Unreleased Rocksteady (1966-1968) are fucking amazing albums though). Hopefully I’ll find my stride soon and build a sort of groove where I’m not pushing myself so hard that I’m dreaming about how the company abandons its current market focus and I’m forced to look into re-architecting the mongrel cluster for their plans to launch a fried chicken franchise. Yeah, I do need more sleep.