The past week I’ve been pounding my head bloody going round and round with setting up a MySQL Cluster in EC2. First I tried it with Ubuntu, then Fedora 6, and then finally I learned to trust the fine folks at Canonical and believe that their distro was tight, and damn is it ever tight. The beauty of using Ubuntu is that everything you need is installed by default and there is no mucking about trying to get the right packages, dependencies, or source. Yes, this is probably not the optimal way of going about this, but I needed a workable solution fast, and while there is a whole pile of RPMs ready to roll, the nightmare of getting simple things like Perl dependencies satisfied in Fedora was enough to send me screaming out of the cloud.
Anyways, I have a wicked basic cluster running using the following process:
On the Management Node I’m using this config.ini which is sort of cribbed together (/var/lib/mysql-cluster/config.ini)
# Options affecting ndbd processes on all data nodes:
[NDBD DEFAULT]
NoOfReplicas=2    # Number of replicas
DataMemory=256M   # How much memory to allocate for data storage
IndexMemory=256M  # How much memory to allocate for index storage
                  # For DataMemory and IndexMemory, we have used the
                  # default values. Since the "world" database takes up
                  # only about 500KB, this should be more than enough for
                  # this example Cluster setup.

# TCP/IP options:
[TCP DEFAULT]
portnumber=2202   # This is the default; however, you can use any
                  # port that is free for all the hosts in the cluster
                  # Note: It is recommended beginning with MySQL 5.0 that
                  # you do not specify the portnumber at all and simply allow
                  # the default value to be used instead

# Management process options:
[NDB_MGMD]
hostname=mgmn                    # Hostname or IP address of MGM node
datadir=/var/lib/mysql-cluster   # Directory for MGM node log files

# Options for data node "A":
[NDBD]
# (one [NDBD] section per data node)
hostname=ndbda             # Hostname or IP address
datadir=/mnt/mysql/data    # Directory for this data node's data files

# Options for data node "B":
[NDBD]
hostname=ndbdb             # Hostname or IP address
datadir=/mnt/mysql/data    # Directory for this data node's data files

# SQL node options:
[MYSQLD]
hostname=sqln              # Hostname or IP address
                           # (additional mysqld connections can be
                           # specified for this node for various
                           # purposes such as running ndb_restore)
Now, in the end I moved this into /mnt (cp -ar /var/lib/mysql-cluster /mnt) so that I didn’t have the threat of running out of disk space on the primary partition.
On the SQL Node, in my.cnf (/etc/mysql/my.cnf), I have nothing more than this:
# Options for mysqld process:
[MYSQLD]
ndbcluster                      # run NDB storage engine
ndb-connectstring=mgmn          # location of management server
log=/var/lib/mysql/mysql.log
I am experimenting with adding settings back in, but I’m not too sure whether they belong in the config.ini on the management node or in here. My gut tells me the management node. Anyhow, with this in place I copied the contents of /var/lib/mysql into /mnt (cp -ar again), renamed the old directory, and created a symbolic link pointing to the new location, roughly as sketched below. Kludgey, yes, but I am still learning my way around MySQL and its various settings. Likely I will figure out which config file gets the data directory settings and I’ll make the appropriate changes. And yes, you read that right: I do have logging turned on, because I am the kind of guy who needs to know.
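For what it’s worth, the shuffle on the SQL node looked roughly like this; the /mnt/mysql destination is just the name I picked, so treat it as a placeholder:

/etc/init.d/mysql stop
cp -ar /var/lib/mysql /mnt/mysql         # /mnt/mysql is my pick, call it whatever you like
mv /var/lib/mysql /var/lib/mysql.orig    # keep the original around, just in case
ln -s /mnt/mysql /var/lib/mysql          # symlink so the stock config still finds its datadir
/etc/init.d/mysql start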
On the Data Nodes, my.cnf (/etc/mysql/my.cnf) is this plain vanilla setup:
# Options for ndbd process:
[MYSQL_CLUSTER]
ndb-connectstring=mgmn    # location of management server
Now, to tie all these boxes together I ended up using a hosts file, recommended by Paul Moen and my boss, and with an endorsement like that I just had to run with it! On all of the nodes, in /etc/hosts, I dropped the internal IP address of each box in the cloud (nslookup domU-12-34-56-78-9A-B1.z-2.compute-1.internal):
# Mysql Cluster data nodes
10.1.2.3      ndbda
10.4.5.6      ndbdb
# Mysql Cluster mgm node
10.7.8.9      mgmn
# MySQL Cluster sql node
10.10.11.12   sqln
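Nothing sophisticated, but a quick round of pings from each box makes sure the aliases actually resolve before you start chasing cluster errors:

ping -c 1 mgmn
ping -c 1 ndbda
ping -c 1 ndbdb
ping -c 1 sqln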
Starting everything up begins with the management node:
ndb_mgmd -f /mnt/mysql-cluster/config.ini
Then the data nodes:
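On each data node that is just the ndbd binary, with --initial tacked on for a node’s very first start so it builds out its local data files:

ndbd --initial    # first start of a node
ndbd              # subsequent restarts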
Note: you only need the --initial part the first time a node comes up; if you are restarting a cluster you can skip it.
Lastly, the SQL node:
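There is nothing cluster-specific to run here; the SQL node is just the packaged mysqld picking up the ndbcluster lines in my.cnf, so a (re)start via the Ubuntu init script is one easy way to do it:

/etc/init.d/mysql restart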
On the management node you can issue a SHOW to figure out if your bacon is frying:
root@mgmn:~# ndb_mgm -e show
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]     2 node(s)
id=2    @10.1.2.3  (Version: 5.0.38, Nodegroup: 0, Master)
id=3    @10.4.5.6  (Version: 5.0.38, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @10.7.8.9  (Version: 5.0.38)

[mysqld(API)]   1 node(s)
id=4    @10.10.11.12  (Version: 5.0.38)
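If the SHOW output isn’t proof enough, creating a throwaway NDB table from the SQL node shows it really is talking to the cluster; the clustertest database and ping table here are just names I made up for the check, and it assumes you can connect as a user with CREATE privileges:

mysql -e "CREATE DATABASE clustertest"
mysql -e "CREATE TABLE clustertest.ping (i INT) ENGINE=NDBCLUSTER"
mysql -e "INSERT INTO clustertest.ping VALUES (1)"
mysql -e "SELECT * FROM clustertest.ping"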
Now, what about backups? Well, I am in the process of experimenting with issuing ndb_mgm -e “START BACKUP” on the cluster manager, which dumps a backup to each of the data nodes. Ideally, I would like to issue periodic backups to each individual node in a staggered fashion and have those gzipped and sent up to S3. What I need to figure out is whether I can issue a backup command for individual nodes, like START BACKUP Node_2 or something thereabouts. If that is the case I could then grow the data nodes out to the maximum of four, take snapshots every 15 minutes, and have decent coverage if our whole section of the cloud decided to pop.
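Once the per-node question is sorted out, the ship-it-to-S3 half is simple enough that I can sketch it now. This assumes s3cmd is installed and configured on the data node, that backups land in the default BACKUP directory under the datadir from config.ini above, and the bucket name is made up:

#!/bin/sh
# Rough sketch: gzip the newest NDB backup on this data node and push it to S3.
BACKUP_DIR=/mnt/mysql/data/BACKUP                       # default location under the datadir
LATEST=$(ls -1dt "$BACKUP_DIR"/BACKUP-* | head -n 1)    # most recent BACKUP-<id> directory
ARCHIVE=/tmp/$(basename "$LATEST")-$(hostname).tar.gz
tar czf "$ARCHIVE" -C "$BACKUP_DIR" "$(basename "$LATEST")"
s3cmd put "$ARCHIVE" s3://my-cluster-backups/           # bucket name is hypothetical
rm -f "$ARCHIVE"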
If you have any questions, criticisms, or gripes feel free to slap me with them as I feel like I am still missing a huge chunk of the picture with all of this.