Computer system monitoring: advantages, procedures and use
Most companies’ workforces depend on their computer systems, so these systems must be able to respond in any situation, and sometimes at any time of day. Monitoring these systems has become a fundamental task in managing a company’s IT infrastructure, with the following main goals in mind:
- Making the most of the company’s hardware resources.
- Preventing incidents and detecting problems.
- Sending notifications about possible issues.
In general, these goals can be summarized into one single, very quantifiable objective: cutting costs, with fewer incidents, less time spent and a higher customer satisfaction rate.
System monitoring tools must focus on processes, memory, storage and network connections. In our article on network monitoring tools, you can see a comparative ranking of the best tools in 2016. All of them are also system monitoring tools.
Apart from relying on a good monitoring tool, we must establish a problem-solving protocol, which will be key to solving problems in the best way possible.
In this article we won’t only try to show you the need for monitoring, but also explain how to establish the action protocol and monitor that as well, using the same tools.
Advantages of monitoring. Why should I monitor my computer systems?
We’re going to list the main reasons why you should think about applying monitoring to your computer systems.
- You’ll be able to configure events and the alarms related to them. For example, alarms can warn about full hard drives, RAM usage over 80%, excessive access to disks with write permissions, too many threads open on the same system, etc.
- Access to executive-level information on the status of our installations, and the ability to check on our most critical technology assets.
- Access to our computer system’s status in real time.
- Improve the efficiency and performance in maintenance tasks performed on the system.
- Detecting the origin of incidents.
- Creating system inventories (maps, lists).
- Planning growth based on the real use of your systems. Through usage reports, you can detect trends and know when you’ll need more storage space, a new server, or a memory upgrade. By the same logic, we can detect systems that are underused.
- Cost reduction
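To make the growth-planning point more concrete, here is a minimal sketch of how usage reports could be turned into a forecast. It fits a straight line through a few disk usage samples and estimates how many days remain before the disk fills up. The function name `days_until_full`, the sample format and the numbers are our own assumptions for illustration; a real monitoring tool will have its own reporting features.

```python
def days_until_full(samples, capacity_gb):
    """Estimate the days left until a disk is full.

    samples: list of (day_number, used_gb) pairs taken from usage reports.
    Returns None when usage is flat or shrinking (no growth to plan for).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    xs = [day for day, _ in samples]
    ys = [used for _, used in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    spread = sum((x - mean_x) ** 2 for x in xs)
    if spread == 0:
        raise ValueError("samples must cover more than one day")
    # Least-squares slope: average growth in GB per day.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / spread
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    day_full = (capacity_gb - intercept) / slope
    # Count from the most recent sample, never below zero.
    return max(0.0, day_full - max(xs))


# Hypothetical report: usage grows 10 GB per day on a 200 GB disk.
report = [(0, 100), (1, 110), (2, 120), (3, 130)]
print(days_until_full(report, capacity_gb=200))
```

A straight line is a deliberately simple model. It is enough to spot a disk that will fill up next week, but seasonal or bursty usage would need something more careful.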
Steps to create a good system monitoring scheme
Monitoring systems is not as complicated as it may seem. The priorities are order and discipline. Below, we’ll show the steps to follow for system monitoring that is both efficient and complete.
- Perform a complete analysis of the systems and devices we want to monitor. This is the most important and complex part of the process. Often we don’t have this inventory at hand, which is why the ability to build it is one of the most important features a monitoring tool should have. In the article on network monitoring, we talk about the main characteristics to keep in mind when doing this. The inventory should be classified as follows:
- Component type (server, router, switch, firewall, etc.)
- Elements within said component (hard drives, RAM, applications, server, etc.)
- Component brand (Sparc, Intel, HP-UX, Windows, IIS, Apache, Oracle, etc.)
- Device IP
- Monitoring priority: how important is the monitoring on a scale of 1 to 10.
- Once we’ve inventoried our systems, we must bring together the people responsible for the main areas of an installation:
security, systems, networks, servers and applications.
- The main alarms have to be defined, for each type of component, element and brand defined in step one.
This will be very important later, when deploying configurations sorted by monitored element type. For example, some possible alarms could be: full hard drives, saturated processors, limited bandwidth, etc.
- Thresholds must be defined for every alarm, with the corresponding parameters and levels that trigger the alarm.
For hard drives and/or partitions, a minimum percentage of free disk space is defined; if free space falls below it, an alarm should go off.
For RAM you could establish an 80% threshold, so if the RAM in use on that server goes over that threshold, an alarm will go off.
We should continue to do this for all elements listed in previous steps.
- Establishing the communication and action protocol
It’s quite important to define the communication channels (SMS, email, WhatsApp, push notifications, etc.) and how alarms will be handled. First, second or third level support protocols can be established. Later in this article we’ll go into detail on action protocols for handling incidents.
- Compare different monitoring tools and choose the one that best fits your budget and the requirements established in previous steps. It’s of prime importance that the chosen tool is capable of monitoring the highest-priority items from our initial inventory.
- Write an installation plan for the new monitoring system. For this we should follow a set of rules:
- Maintain existing security measures
- Minimize the number of intermediary systems standing between the monitoring system and critically important systems
- Minimize the impact on the systems being monitored
- Install and configure the chosen software package
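The inventory and threshold steps above can be sketched in a few lines of code. This is only an illustration under our own assumptions: the `InventoryItem` fields mirror the classification from the first step, the 80% RAM threshold comes from the example above, and the 90% disk threshold (that is, 10% minimum free space), the IP addresses and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InventoryItem:
    component_type: str  # server, router, switch, firewall...
    element: str         # element within the component: "ram", "disk"...
    brand: str
    ip: str
    priority: int        # monitoring priority, from 1 to 10


# Maximum usage allowed per element type, in percent (assumed values).
MAX_USAGE_PERCENT = {"ram": 80.0, "disk": 90.0}


def check(item: InventoryItem, usage_percent: float) -> Optional[str]:
    """Return an alarm message if usage is over the threshold, else None."""
    limit = MAX_USAGE_PERCENT.get(item.element)
    if limit is None or usage_percent <= limit:
        return None
    return (f"priority {item.priority}: {item.element} on {item.ip} "
            f"at {usage_percent:.0f}% (threshold {limit:.0f}%)")


def evaluate(readings):
    """Check (item, usage_percent) readings, highest priority first."""
    alarms = []
    for item, usage in sorted(readings, key=lambda r: r[0].priority,
                              reverse=True):
        message = check(item, usage)
        if message is not None:
            alarms.append(message)
    return alarms


db_ram = InventoryItem("server", "ram", "Intel", "192.0.2.10", priority=9)
file_disk = InventoryItem("server", "disk", "Intel", "192.0.2.11", priority=5)
for alarm in evaluate([(file_disk, 95.0), (db_ram, 85.0)]):
    print(alarm)
```

Keeping thresholds in a table keyed by element type, rather than per device, matches the idea of deploying configurations sorted by monitored element type.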
Action protocol when facing incidents on our systems
One of the steps mentioned earlier, and a very important one, is defining the action protocol to follow when we monitor our systems.
- First, we should identify the types of support that need to be given, and which groups will provide that support.
Normally, it’s advisable to create groups according to levels. The first level will be those on the frontline, and alerts will be passed up through the levels if they cannot be solved.
- We should consider the support hours and whether we want support to be available 24/7.
Once we define the hours and the different support groups, we must establish a communications system that allows alerts to be transferred among groups. For this purpose there are many open source or commercially licensed tools for incident management.
- The next step would be to install and configure our incident management tools. These must be integrated with our system monitoring scheme.
- We must establish the quality of the service we’re offering: incident types, maximum time to solve each type of incident, etc.
- We must establish the KPIs we want to monitor to control our incident management system. It’ll be here where, if we’ve chosen our system monitoring tool correctly, we can use it to monitor said KPIs.
The main KPIs we want to take into account are the number of incidents generated, classified by element type and severity. This will be quite useful to measure our devices and to assign each incident to its corresponding owner. Resolution times, or the time that incidents stay at each level, will be very important data for detecting improvements and optimizations in our support.
- Finally, one of the most important steps, and one that is often forgotten or badly performed, is giving the incident management system the knowledge needed to solve these issues.
Incidents generated on our systems can sometimes be solved, and at other times (because of cost or complexity) we’ll just have to accept them and move on. That’s why it’s very important to establish a way to feed solutions back into the system, so that when the same problem arises again a solution is already available, saving us time and money.
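As a rough illustration of the KPIs mentioned above, the following sketch counts incidents by element type and severity and calculates the average resolution time per support level. The `Incident` record and its fields are our own assumption about what an incident management tool might export; real tools will have their own data model.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Incident:
    element_type: str   # e.g. "disk", "ram"
    severity: str       # e.g. "warning", "critical"
    opened: datetime
    resolved: datetime
    level: int          # support level that solved the incident


def incident_kpis(incidents):
    """Return (counts by (element type, severity), mean hours per level)."""
    counts = Counter((i.element_type, i.severity) for i in incidents)
    hours_by_level = {}
    for i in incidents:
        hours = (i.resolved - i.opened).total_seconds() / 3600
        hours_by_level.setdefault(i.level, []).append(hours)
    mean_hours = {level: sum(h) / len(h)
                  for level, h in hours_by_level.items()}
    return counts, mean_hours


# Hypothetical incident export.
history = [
    Incident("disk", "critical", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 10), 1),
    Incident("disk", "critical", datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 12), 1),
    Incident("ram", "warning", datetime(2024, 1, 3, 0), datetime(2024, 1, 3, 10), 2),
]
counts, mean_hours = incident_kpis(history)
print(counts)
print(mean_hours)
```

Figures like these could then be fed into the monitoring tool itself as just another set of metrics, with their own thresholds and alarms.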
As we have seen, monitoring our systems brings many advantages to our company and to the employees behind those systems. We’d also like to stress that the system monitoring tool is very important, but no more important than the process followed to monitor computer systems and the action protocol for when incidents occur.
Do you know of other advantages of system monitoring? Would you add anything to the monitoring process or to the incident handling and resolution process?