
Using PandoraFMS to monitor LOM-enabled hardware

May 6, 2009


Apple and most other vendors (Intel, Sun, HP, IBM, …) currently support Lights-Out Management (LOM), which is basically out-of-band management built into the server hardware and reachable through its network port. With IPMI (Intelligent Platform Management Interface) you can not only create a serial port over LAN but also start, restart or shut down the server. This is useful, for example, if you’re running an on-demand system and don’t want all computers running at the same time (marketing types call this a ‘cloud’ these days).

This is a really good development, since it lets you pull all kinds of interesting status information from the server without needing an agent on the server or any special plugins for each different system. What you do need is a conversion script from whatever your LOM hardware and tools output to the Pandora FMS XML format.

Make sure you have the latest monitoring software for your system before beginning. As I found out with the latest Nehalem-based Apple Xserve, the previous version of Server Monitor doesn’t support the new Xserve yet, so ipmitool returned nothing. Upgrading to Server Monitor 1.7 gave me access to the new information (it probably installed a newer version of ipmitool). I use ipmitool as an example since it is available on Mac and Linux (I don’t know about Windows) and it’s fairly simple and straightforward.

When you install the client (at least from SVN), the Pandora plugins folder contains an example ipmi2xml script written for the previous version of the Xserve. In this post I’ll talk about how to adapt it to the new one. Adapting it to any other type of output is trivial and left as an exercise to the reader (a minimum of PHP or other programming knowledge is required).

So first off, let’s see what type of information ipmitool returns. There are two outputs we are interested in: chassis status (what the external chassis shows) and sensor status (the internal sensors). For the chassis status, we type on the command line:

sh-3.2# ipmitool -H 172.30.188.99 -U pandora -P yourpassword chassis status
System Power         : on
Power Overload       : false
Power Interlock      : inactive
Main Power Fault     : false
Power Control Fault  : false
Power Restore Policy : previous
Last Power Event     :
Chassis Intrusion    : inactive
Front-Panel Lockout  : inactive
Drive Fault          : false
Cooling/Fan Fault    : true
Front Panel Light    : off
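The ipmi2xml loop further down walks over $status as field name => value pairs, but the post doesn't show how that array is filled. Here is a minimal sketch, assuming the raw chassis status text is already in a string (how you capture it, for example with PHP's shell_exec(), is up to you); parse_chassis_status is a hypothetical name, not part of ipmi2xml.

```php
<?php
// Hypothetical helper (not from ipmi2xml): turn the "Name : value" lines
// printed by `ipmitool ... chassis status` into an associative array such as
// ['System Power' => 'on', 'Power Overload' => 'false', ...].
function parse_chassis_status($output)
{
    $status = [];
    // One entry per line; tolerate \r\n line endings as well.
    foreach (preg_split('/\r?\n/', $output) as $line) {
        // Split on the first colon only, so a value containing ':' survives.
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // blank or unexpected line
        }
        // Trim the column padding; empty values (e.g. "Last Power Event") become ''.
        $status[trim($parts[0])] = trim($parts[1]);
    }
    return $status;
}
```

Splitting on the first colon only keeps the parser from cutting values apart, and trimming removes the alignment padding ipmitool adds between name and colon.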

If you’re familiar with the Pandora FMS XML format, you’ll know that there are three types of values: binary, numeric and string. Numeric and string values are straightforward: numbers can be graphed and strings can be reviewed as they are. Binary expects 0 if something is wrong and 1 if something is good. You can see from the output that some fields read false, inactive or off when everything is good and true, active or on when something is wrong, while others work the other way around. Each of these has to be converted, and some fields might have to be ignored altogether.

In ipmi2xml this is done with a switch statement (see the PHP documentation for the syntax):

foreach ($status as $name => $data) {
     switch($name) {
        ## False is good
        case "Power Overload":
        case "Main Power Fault":
        case "Power Control Fault":
        case "Drive Fault":
        case "Cooling/Fan Fault":
                $data = ($data == "false" ? 1 : 0);
                print_xml_sensor ($name, $data);
                break;
        ## Inactive is good
        case "Power Interlock":
                $data = ($data == "inactive" ? 1 : 0);
                print_xml_sensor ($name, $data);
                break;
        ## On is good
        case "System Power":
                $data = ($data == "on" ? 1 : 0);
                print_xml_sensor ($name, $data);
                break;
        ## Off is good
        case "Front Panel Light":
                $data = ($data=="off" ? 1 : 0);
                print_xml_sensor ($name, $data);
                break;
        ## Ignore the following values
        case "Last Power Event":
        case "Power Restore Policy":
        default:
                break;

        }
}

As you can see, each $data value is replaced using an inline if-else (the ternary operator: condition ? value-if-true : value-if-false), then passed to print_xml_sensor, and each case ends with break. This should be easy to adapt, since it works much like Excel’s IF() function.

A single equal sign (=) assigns a value to the variable on its left, so don’t use it for comparisons: the expression evaluates to the assigned value, and any non-empty string other than "0" counts as true. A double equal sign (==) compares two values, which is what you want; for the inverse, use !=. Three equal signs (=== or !==) compare the type as well; since everything ipmitool returns is a string, that makes no difference here. Remember to end each statement with a semicolon (;). If you leave out break;, execution falls through into the next case’s statements without checking its label. That can be useful if you know what you’re doing: it is exactly how the stacked case labels above share one block.
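If the list of fields grows, a lookup table can be easier to maintain than one case block per "good" value. The sketch below is a hypothetical alternative, not how ipmi2xml does it: GOOD_VALUES and chassis_to_binary are made-up names, and the function returns the 1/0 values instead of printing XML so the result is easy to inspect. It uses == as recommended above.

```php
<?php
// Hypothetical alternative to the switch (not part of ipmi2xml): for each
// chassis field, the value that means "healthy". Fields not listed here are
// ignored, which mirrors the switch's default branch.
const GOOD_VALUES = [
    'Power Overload'      => 'false',
    'Main Power Fault'    => 'false',
    'Power Control Fault' => 'false',
    'Drive Fault'         => 'false',
    'Cooling/Fan Fault'   => 'false',
    'Power Interlock'     => 'inactive',
    'System Power'        => 'on',
    'Front Panel Light'   => 'off',
];

// Returns field => 1 (good) or 0 (bad), the convention binary values expect.
function chassis_to_binary(array $status)
{
    $result = [];
    foreach ($status as $name => $data) {
        if (!array_key_exists($name, GOOD_VALUES)) {
            continue; // e.g. "Last Power Event", "Power Restore Policy"
        }
        $result[$name] = ($data == GOOD_VALUES[$name]) ? 1 : 0;
    }
    return $result;
}
```

Adding a field then means adding one line to the table rather than a new case label, at the cost of the table only supporting a single "good" value per field.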

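Now for the sensor status, which comes from ipmitool’s sensor command (with the same -H, -U and -P options as above). This is the output (an excerpt):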

PCI Slot 1 Pwr   | 9.000      | Watts      | ok    | na        | na        | na        | na        | na        | na
PSU2 Fan Out     | na         | RPM        | na    | na        | na        | 1024.000  | 18048.000 | na        | na
PSU2 Fan In      | na         | RPM        | na    | na        | na        | 1024.000  | 18048.000 | na        | na
PSU1 Fan Out     | 6784.000   | RPM        | ok    | na        | na        | 1024.000  | 18048.000 | na        | na
PSU1 Fan In      | 7040.000   | RPM        | ok    | na        | na        | 1024.000  | 18048.000 | na        | na
PSU2 5V STBY     | na         | Volts      | na    | na        | na        | 4.725     | 5.292     | na        | na
PSU2 12V         | na         | Volts      | na    | na        | na        | 11.340    | 12.600    | na        | na
PSU2             | na         | Watts      | na    | na        | na        | 0.000     | na        | na        | na
PSU1 5V STBY     | 4.914      | Volts      | ok    | na        | na        | 4.725     | 5.292     | na        | na
PSU1 12V         | 12.348     | Volts      | ok    | na        | na        | 11.340    | 12.600    | na        | na
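Each line has ten |-separated columns. As far as I know, ipmitool's sensor listing prints the sensor name, reading, unit and status followed by six thresholds (lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable), but treat that order as an assumption and check it against your own ipmitool version. A small hypothetical helper, parse_sensor_line, that gives the columns names:

```php
<?php
// Assumed column order of ipmitool's sensor listing; verify against your
// ipmitool version before relying on it.
const SENSOR_COLUMNS = [
    'name', 'reading', 'unit', 'status',
    'lower_nonrecoverable', 'lower_critical', 'lower_noncritical',
    'upper_noncritical', 'upper_critical', 'upper_nonrecoverable',
];

// Hypothetical helper: split one pipe-delimited line into a named, trimmed
// array. Returns null for lines that don't have exactly ten columns.
function parse_sensor_line($line)
{
    $fields = array_map('trim', explode('|', $line));
    if (count($fields) !== count(SENSOR_COLUMNS)) {
        return null;
    }
    return array_combine(SENSOR_COLUMNS, $fields);
}
```

With named fields, $sensor['reading'] and $sensor['unit'] read more clearly than $value_arr[1] and $value_arr[2], and the thresholds are available if you want to compare against them yourself.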

Since this output might look different on other systems (it even differs between versions of ipmitool), I’ll go over the modifications in full; they start at //Begin of Sensor in the PHP script. I start by exploding the output (turning a string into an array) on each newline (\n), since on my system each sensor has its own line.

Then the foreach goes over each line and splits it on the | character, which results in something like this:

[1] => Array
        (
            [0] => CPU A Core
            [1] =>  1.264
            [2] =>  Volts
            [3] =>  ok
            [4] =>  na
            [5] =>  na
            [6] =>  1.000
            [7] =>  1.368
            [8] =>  na
            [9] =>  na
        )

The format is quite uniform across all sensors, except for $tmp[0], which holds the first line of the output and is not a valid sensor line, so I unset it:

unset ($tmp[0]);
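The two explode steps described above aren't reproduced in the post itself. A minimal sketch of them might look like this, assuming the raw ipmitool text is in $output; split_sensor_output is a hypothetical name, and the rtrim() of a trailing newline is my addition. The first line ends up in $tmp[0], which is what the unset() removes.

```php
<?php
// Hypothetical sketch of the two explode steps that build $tmp; the actual
// ipmi2xml code may differ. $output is the raw text printed by ipmitool.
function split_sensor_output($output)
{
    $tmp = [];
    // Step 1: one element per line. rtrim() drops a trailing newline so it
    // doesn't produce an empty last entry.
    foreach (explode("\n", rtrim($output, "\r\n")) as $line) {
        // Step 2: one element per |-separated column. The values keep their
        // padding, hence the trim() calls in the loop that follows.
        $tmp[] = explode('|', $line);
    }
    return $tmp;
}

// Usage: $tmp = split_sensor_output($output); unset($tmp[0]);
```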

Then I go through each entry of $tmp in the next foreach and print out the values I need as XML:

foreach ($tmp as $value_arr) {
        print_xml_sensor (trim($value_arr[0]).' ('.trim($value_arr[2]).')', trim ($value_arr[1]), "generic_data");
}

On my system this prints the following (an excerpt):

<module><name>PSU2 5V STBY (Volts)</name><data>na</data><type>generic_data</type></module>
<module><name>PSU2 12V (Volts)</name><data>na</data><type>generic_data</type></module>
<module><name>PSU2 (Watts)</name><data>na</data><type>generic_data</type></module>
<module><name>PSU1 5V STBY (Volts)</name><data>4.914</data><type>generic_data</type></module>
<module><name>PSU1 12V (Volts)</name><data>12.348</data><type>generic_data</type></module>
<module><name>PSU1 (Watts)</name><data>200.000</data><type>generic_data</type></module>
<module><name>Session Audit (discrete)</name><data>0x0</data><type>generic_data</type></module>
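The print_xml_sensor function itself isn't shown in this post. Below is a hypothetical stand-in, xml_module, that produces elements in the same shape as the output above. Two things are assumptions: it returns the string instead of printing it (echo it to print), and, because the chassis switch calls print_xml_sensor with only two arguments, the type defaults to generic_proc, Pandora FMS's boolean module type.

```php
<?php
// Hypothetical stand-in for ipmi2xml's print_xml_sensor(), which this post
// does not show. Returns one <module> element in the shape printed above.
// Assumption: two-argument calls carry binary values, so the type defaults
// to Pandora's boolean module type, generic_proc.
function xml_module($name, $data, $type = 'generic_proc')
{
    // Escape &, <, > and quotes so odd sensor names cannot break the XML.
    $name = htmlspecialchars((string) $name, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $data = htmlspecialchars((string) $data, ENT_QUOTES | ENT_XML1, 'UTF-8');
    $type = htmlspecialchars((string) $type, ENT_QUOTES | ENT_XML1, 'UTF-8');

    return "<module><name>$name</name><data>$data</data>"
        . "<type>$type</type></module>\n";
}

// Usage: echo xml_module('PSU1 12V (Volts)', '12.348', 'generic_data');
```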

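Of course, you might have to clean up the data. For example, if the reading is na, there is probably no sensor there, so you might want to skip it (use continue;). Discrete sensors such as Session Audit, which report hex codes rather than measurements, can be skipped the same way. Put the following inside the loop, before the print_xml_sensor call: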

        if (trim($value_arr[1]) == "na") {
                continue;
        } elseif (trim($value_arr[2]) == "discrete") {
                continue;
        }
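Put together, the sensor loop with these cleanup checks might look like the sketch below. It is hypothetical: clean_sensor_readings is a made-up name, it returns "Name (Unit)" => reading pairs instead of printing XML so the result is easy to inspect, it combines the two checks with ||, and it adds a guard for lines with fewer than three columns, which the original loop doesn't have.

```php
<?php
// Hypothetical sketch of the sensor loop with the cleanup applied.
// Input: the $tmp array (first line already unset).
// Output: "Name (Unit)" => reading, for sensors that report a real value.
function clean_sensor_readings(array $tmp)
{
    $readings = [];
    foreach ($tmp as $value_arr) {
        if (count($value_arr) < 3) {
            continue; // not a sensor line (e.g. an empty line)
        }
        $reading = trim($value_arr[1]);
        $unit = trim($value_arr[2]);
        // Same two checks as above: no sensor present, or a discrete sensor.
        if ($reading == 'na' || $unit == 'discrete') {
            continue;
        }
        $readings[trim($value_arr[0]) . ' (' . $unit . ')'] = $reading;
    }
    return $readings;
}
```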

It’s that simple. You could experiment with it yourself or use my examples. It will be in SVN soon.



    2 comments
    1. Hi, do you have a version of this that does not attempt to connect to IPMI via SOL (i.e. uses local IPMI)? I have a bunch of XenServers which I would like to run this on, however SOL does not seem to want to function, so the connection obviously fails. I have attempted disabling the connect options, however it basically kills the script.

    2. Okay forget that previous message. HP's don't seem to have IPMI Lan capability, however I have managed to get it working with ipmi lan. Now to work out why the script doesn't actually work...
