One of the worst things to go through as a Managed Hosting Provider is getting calls and support tickets from clients reporting that various services are down when everything visibly appears to be up and running.
The client might state that MySQL is down and provide the error message and the steps to recreate the problem. You have all the information you need to troubleshoot.
Yet, on the server you run the commands to see the status of MySQL, and everything shows as up.
You know you have to be missing something because you and the client can walk through their site and see that the site thinks MySQL is down. Where is this elusive thing you are not seeing?
You start going through the client’s code, and as time goes by other clients are calling and emailing, with pressure building from all sides.
You just want to scream!
Eventually, you check your network bindings and find that one or more of the service and dedicated IP addresses are not showing as bound to the network card.
The elusive problem was that Parallels H-Sphere somehow lost its network bindings, so you run /hsphere/shared/scripts/setup-ips.pl to re-establish the IP address bindings, followed by restarting MySQL and various other key services.
While you breathe a sigh of relief, and update all of your clients… you might be thinking about how the problem could have been prevented… what if…
Parallels H-Sphere is fairly reliable, and the problem of losing one or more IP bindings is rare. It is so rare that some providers never get hit with it, and those that do typically see it once a year or once every few years.
The infrequency of the problem adds to the elusive hunt when it happens, because it is typically not the first thing on your mind to check.
Personally, I like to automate things to remove such burdens.
One of the first things that came to mind was the R-fx Networks System Integrity Monitor (S.I.M. for short) by Ryan MacDonald. S.I.M. was one of Ryan’s first (if not the first) projects in hosting automation, management, and security.
We’ve been using S.I.M. for years to monitor for internal outages of mysql, Apache, and more… why not for internal monitoring of missing IP addresses from the network card interfaces?
Ryan MacDonald provides a number of S.I.M. modules in /usr/local/sim/modules/init, and most of these are very straightforward to read and understand (if you have a system administration or programming background).
Those in /usr/local/sim/modules/init typically deal with an individual service being down, such as MySQL, sendmail, ssh, and the like.
The issue with the elusive H-Sphere bug is that the network itself is up, and any directly bound IP addresses may also still be active and present. Checking a port, or even a URL, was not going to guarantee that S.I.M. or a person would know for sure about an internal IP address blackout.
So simple port checks or presence checks were not enough to solve the problem.
Well, how would we do it manually in the most concise manner possible?
At the time of this writing (and for a while now), we have mysql.dynamicnet.net at a unique IP address of 173.193.203.200.
This IP needs to be bound to the public network interface and active for mysql.dynamicnet.net to work.
This IP address is also in /hsphere/local/network/ips, which is read by /hsphere/shared/scripts/setup-ips.pl.
/hsphere/shared/scripts/setup-ips.pl is typically only run when the server boots.
(Editorial note: using “-w” with grep to avoid matching .22 against .226 and the like is thanks to Jeffery Kilonsky, who reviewed the code.)
The following one-line command shows the IP address if it is bound and active. It uses full paths because S.I.M. is called from cron, which typically passes a minimal PATH in the environment; never assume the system will know the location of any application or package:
/sbin/ifconfig | /bin/grep inet | /usr/bin/cut -d : -f 2 | /usr/bin/cut -d ' ' -f 1 | /bin/grep -w 173.193.203.200
Similarly, the following command will show the IP address if the IP address is a part of the /hsphere/local/network/ips file:
/bin/grep -w 173.193.203.200 /hsphere/local/network/ips | /usr/bin/cut -f 1
Ok, so we have two pieces to the puzzle. /hsphere/local/network/ips has to contain the IP address so that it can be restored if it becomes unbound, and /sbin/ifconfig can show which IPs are bound.
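To make the two pieces concrete, here is a minimal sketch that combines them into one check. It is not the S.I.M. module itself; the function names (ip_is_bound, ip_is_listed, classify_ip) and the three states are assumptions made up for illustration. It sets PATH once instead of using full paths, matches the dots literally, and accepts both the older "inet addr:1.2.3.4" and the newer "inet 1.2.3.4" ifconfig layouts.

```shell
#!/bin/sh
# Sketch only: function names and states are hypothetical,
# not taken from S.I.M. or dni_network.mod.

# cron provides a minimal environment, so set a known PATH up front.
PATH=/sbin:/bin:/usr/sbin:/usr/bin
export PATH

# Succeeds if IP ($1) is an IPv4 address in ifconfig-style text on stdin.
# Handles "inet addr:1.2.3.4" (older net-tools) and "inet 1.2.3.4" (newer).
ip_is_bound() {
    awk -v ip="$1" '
        $1 == "inet" {
            addr = $2
            sub(/^addr:/, "", addr)
            if (addr == ip) found = 1
        }
        END { exit(found ? 0 : 1) }
    '
}

# Succeeds if IP ($1) appears as a whole word in the IP list file ($2).
# -F treats the dots literally; -w avoids matching .22 against .226.
ip_is_listed() {
    grep -qwF -- "$1" "$2"
}

# Prints one of: ok, unbound, unlisted.
#   ok       = the IP is bound
#   unbound  = not bound, but present in the list file
#   unlisted = not bound and not in the list file
# $1 = IP, $2 = IP list file; ifconfig-style text is read from stdin.
classify_ip() {
    if ip_is_bound "$1"; then
        echo ok
    elif ip_is_listed "$1" "$2"; then
        echo unbound
    else
        echo unlisted
    fi
}

# Example:
#   ifconfig | classify_ip 173.193.203.200 /hsphere/local/network/ips
```

Reading the interface listing from stdin keeps the check easy to try against saved ifconfig output before pointing it at a live server.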
Now what?
S.I.M. has two module areas: one in /usr/local/sim/modules/init and one in /usr/local/sim/modules.
The former holds the modules that are relatively easy to set up and implement (few lines of code, and more built-in functions and checks); the latter holds the slightly more complex modules, which typically involve some real code to make them work.
Ryan even has a network.mod, which provided a framework for what an H-Sphere module to check the network might look like (at least in outline form).
Yet, for a while finding out how to put the pieces together was as elusive as the hunt for why a customer might see mysql down when I see it up.
Tonight the hunt is finished. The S.I.M. add-on module is (finally) developed and tested.
Since Ryan MacDonald is kind enough to share S.I.M. with the world, I thought it would be nice to share dni_network.mod with other H-Sphere providers as well as R-fx Networks fans.
H-Sphere providers, you can test this yourself. For a quick test, set IP_CHECK to a dummy IP address that is not in /hsphere/local/network/ips; nothing will restart, and you will just get the S.I.M. warning email. For a full test, set IP_CHECK to a valid IP address from /hsphere/local/network/ips that is not set up through /etc/sysconfig/network-scripts, then run “service network restart” from the server console. This restarts network services BUT does not bind a single H-Sphere IP. Please allow a few minutes for S.I.M. to kick in.
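For readers who want to see the two test outcomes side by side before touching a live server, the following is a rough dry-run sketch of the decision flow described above. It is not dni_network.mod and does not use the S.I.M. module format; handle_ip_state, run, the state names, and the "mysqld" service name are all assumptions for illustration. Only the setup-ips.pl path comes from this article.

```shell
#!/bin/sh
# Sketch only: handle_ip_state, run, and the "mysqld" service name are
# assumptions for illustration, not part of S.I.M. or dni_network.mod.

PATH=/sbin:/bin:/usr/sbin:/usr/bin
export PATH

# With DRY_RUN=1 (the default here), print commands instead of running them.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# $1 = state (ok, unbound, unlisted), $2 = IP address being checked.
handle_ip_state() {
    case "$1" in
        ok)
            # IP is bound: nothing to do.
            ;;
        unlisted)
            # Dummy-IP test case: warn only, restart nothing.
            echo "warning: $2 is not bound and not in /hsphere/local/network/ips" >&2
            ;;
        unbound)
            # Full test case: rebind the H-Sphere IPs, then restart services.
            echo "warning: $2 is not bound; rebinding" >&2
            run /hsphere/shared/scripts/setup-ips.pl
            run service mysqld restart
            ;;
        *)
            echo "unknown state: $1" >&2
            return 2
            ;;
    esac
}

# Example (dry run):
#   handle_ip_state unbound 173.193.203.200
```

Leaving DRY_RUN at its default prints the recovery commands rather than running them, which makes it safe to confirm the flow first; set DRY_RUN=0 only once the output looks right.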
Please contact us if you have any questions.