Dynamic monitoring: a new functionality for Pandora FMS
Pandora FMS version 7.0 NG has been updated to include new functions designed for complex network environments. One of these additions is dynamic monitoring, which has been a buzzword in the monitoring sector for a while now. So, what is it?
Dynamic monitoring consists of predictive analysis of, and adaptation to, your system’s warning parameters. It is an automated feature and is based on pre-existing data, harvested from the system’s history. Warning and critical thresholds are automatically and dynamically redefined according to information collected during a previously established time period.
Automatically configured thresholds are a big help when it comes to usability and setup of your monitoring tool, saving you the necessity of carrying out a prior systems study in order to fix your thresholds. Pandora will now handle this task automatically.
This obviously relies on pre-existing information, since without it there is no way to know what the normal values of the systems are. When the AI-enabled version appears this will be one of the functions it includes, but we’re not quite there yet.
The dynamic monitoring system uses existing data to calculate trend deviations and, based on those, automatically reconfigures the different modules’ thresholds.
Intelligent work mode analyzes information from a set time period (e.g. one week), establishing average values, trends and deviations from the data. Using this information it establishes warning thresholds that could be either over or under the values (dependent thresholds). The values can be modified manually once established.
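The idea of deriving thresholds from an average and a deviation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm Pandora FMS actually runs: the function name `dynamic_thresholds`, the multipliers `warn_k` and `crit_k`, and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def dynamic_thresholds(samples, warn_k=1.0, crit_k=2.0):
    """Derive upper warning/critical thresholds from historical samples.

    warn_k and crit_k are hypothetical multipliers of the deviation,
    chosen for illustration; they are not taken from Pandora FMS.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    avg = mean(samples)      # average value over the period
    dev = pstdev(samples)    # population standard deviation
    return {
        "average": avg,
        "deviation": dev,
        "warning": avg + warn_k * dev,
        "critical": avg + crit_k * dev,
    }


# Made-up latency samples (seconds) standing in for one week of history.
history = [0.20, 0.22, 0.25, 0.21, 0.24, 0.23, 0.26]
t = dynamic_thresholds(history)
print(t)
```

The further a threshold sits from the average, measured in deviations, the less often normal fluctuation will trip it.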
Dynamic monitoring is configured from the Pandora FMS console, but requires predictionserver to be enabled in the pandora_server.conf file.
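A quick way to confirm the setting before touching the console is to scan the configuration text for the token. The sketch below assumes that the file uses one `token value` pair per line, that `#` starts a comment, and that `1` means enabled; the helper `is_enabled` and the sample excerpt are hypothetical and should be checked against your own file.

```python
def is_enabled(conf_text, token="predictionserver"):
    """Return True if `token` is set to 1 in the given configuration text.

    Assumes 'token value' lines and '#' comments (an assumption about
    the file format, not something stated in this post).
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == token:
            return len(parts) == 2 and parts[1].strip() == "1"
    return False


# Hypothetical excerpt, for illustration only.
sample_conf = """
# servers
networkserver 1
predictionserver 1
"""
print(is_enabled(sample_conf))
```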
Establish a range on each module’s individual config file within which dynamic monitoring can take place, and indicate the time interval from which the samples are collected:
In the previous example all data from the last seven days has been collected in order to calculate the thresholds.
Use Dynamic Threshold Min. and Dynamic Threshold Max. for greater flexibility in automatically generated thresholds.
In the screenshot, the minimum value has been incremented by 5% and the maximum by 10%, creating higher thresholds.
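The arithmetic behind these two fields can be sketched as follows. The sketch assumes each percentage is applied to the threshold value itself, which this post does not spell out; the function `adjust_interval` and the base interval of 100 to 200 are made up for illustration.

```python
def adjust_interval(minimum, maximum, min_pct=0.0, max_pct=0.0):
    """Shift the bounds of a threshold interval by percentages.

    Positive percentages raise a bound, negative ones lower it.
    Applying the percentage to the bound itself is an assumption.
    """
    new_min = minimum * (1 + min_pct / 100)
    new_max = maximum * (1 + max_pct / 100)
    return new_min, new_max


# Hypothetical interval, raised by 5% (minimum) and 10% (maximum).
low, high = adjust_interval(100.0, 200.0, min_pct=5, max_pct=10)
print(low, high)
```

Under that assumption the interval moves to roughly 105 and 220.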
These fields can be inverted, reducing the threshold intervals, as below:
There’s also another parameter, Dynamic Threshold Two Tailed, which creates thresholds not only above the average values (the default) but also below them. This kind of operation is similar to using the inverse interval threshold function.
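The difference between the two modes is easiest to see when classifying a single reading. In this minimal sketch the function `classify` and the distances `warn_dev` and `crit_dev` are hypothetical; it only illustrates the one-sided versus two-sided idea, not Pandora FMS internals.

```python
def classify(value, average, warn_dev, crit_dev, two_tailed=False):
    """Return 'normal', 'warning' or 'critical' for one reading.

    warn_dev and crit_dev are hypothetical distances from the average.
    With two_tailed=False only readings above the average can alert;
    with two_tailed=True readings below it can alert as well.
    """
    distance = value - average
    if two_tailed:
        distance = abs(distance)
    if distance >= crit_dev:
        return "critical"
    if distance >= warn_dev:
        return "warning"
    return "normal"


# A reading well below the average, evaluated in both modes.
print(classify(20, 25, warn_dev=2, crit_dev=4))
print(classify(20, 25, warn_dev=2, crit_dev=4, two_tailed=True))
```

In one-tailed mode the low reading goes unnoticed, whereas in two-tailed mode the same reading raises an alert.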
In the graphs below there are two examples that both correspond to dynamic thresholds for a module on which the interval has been established as 24 hours.
In the first example the Dynamic Threshold Two Tailed parameter is not selected:
In the second, Dynamic Threshold Two Tailed is now selected:
Both configurations can, of course, be applied in bulk through policies.
Some real-life examples will help to better see the configuration and the effect they have on your dynamic monitoring.
Starting from a web latency module, apply a basic configuration with a one-week interval:
Once applied, you’ll have the following thresholds:
So the module registers warning status when latency is above 0.33 secs. and critical status when it’s above 0.37 secs.
Keeping in mind that this is a relaxed threshold, you can reduce it by 20% so the alerts are triggered more easily. To achieve this, enter a negative value in the Dynamic Threshold Min. field to lower the threshold minimums. There is no maximum value here, since critical status applies from a given latency upward, so you don’t have to modify the Dynamic Threshold Max. field.
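As a rough check of the numbers, the 20% reduction can be worked through in Python. The sketch assumes the percentage is applied directly to each threshold minimum, and models both thresholds as open-ended with a maximum of `None` for simplicity; the helper `lower_minimums` is hypothetical, and the values your console shows may differ.

```python
def lower_minimums(thresholds, min_pct):
    """Apply a percentage to each threshold minimum.

    Maximums are left untouched; None marks an open-ended threshold.
    Applying the percentage to the minimum itself is an assumption.
    """
    factor = 1 + min_pct / 100
    return {
        name: (round(low * factor, 3), high)
        for name, (low, high) in thresholds.items()
    }


# Thresholds from the latency example, in seconds.
current = {"warning": (0.33, None), "critical": (0.37, None)}
tightened = lower_minimums(current, -20)
print(tightened)
```

Under these assumptions the warning minimum drops to 0.264 secs. and the critical minimum to 0.296 secs.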
Once the changes have been applied they show the following status:
And the graph should look something like this:
This example represents monitoring the temperature of a control room. The graph below shows the values for the last week:
In this case it is very important that the temperature remain stable, which we can monitor with the Dynamic Threshold Two Tailed parameter to define the upper and lower thresholds. The following configuration was used:
And the automatically generated thresholds:
The graph displays the following:
As can be seen, anything between 23.10 and 26 is considered normal, as this is the optimal temperature range for the location. Any deviation from the established norm will trigger an alert.
If you really need to dial them in, the Dynamic Threshold Min. and Dynamic Threshold Max. parameters are extremely flexible, tweakable to the percentages you need.