WAN monitoring and the Internet-based model

When we think about WAN monitoring we usually start from the basics: the behaviour of remote communication links will directly affect the performance of our applications.

Therefore, we understand that if traffic over the communications link experiences high levels of latency this will negatively impact the response time that our users observe when accessing the applications.

In April 2018 Network Computing published an article describing the seven most common causes of latency in a network.

We found it interesting that four of the seven reasons have to do with the design, management and behaviour of communications links:

Number of hops or distance between origin and destination.
Presence of bottlenecks.
QoS is not implemented, or its configuration is deficient.
Implementation of non-optimal routing schemes.

As monitoring people, we immediately said to ourselves: let’s look at those reasons and establish a monitoring procedure for each one.

However, we then thought it would be worthwhile to reflect on the monitoring of WAN networks not only in light of the problems identified by Network Computing, but also in terms of how the design of such networks has evolved.

Let’s say we want to know whether a communications link has a bottleneck. We can use Pandora FMS to:

Monitor the router in question via SNMP in order to have information on the overall performance of this device.
Identify the router interface on which we want to monitor the bottleneck condition.
Obtain information about the WAN technology used; that is, we would ask ourselves whether it is an MPLS link, a Metro Ethernet backbone, etc.
Make use of what we already know about the behaviour of the link and of the application. In other words, we would check the records of previous measurements, provided they exist, of course.
Define what a bottleneck means to us. We could say that we are about to have a bottleneck when a very high percentage of the bandwidth is in use and the application response time is very high.
Monitor the percentage of bandwidth usage. Here we could use SNMP, or NetFlow if the device supports it, or we could even check what information our service provider supplies.

This article details how we can monitor bandwidth with Pandora FMS.

Determine the response time of the application or applications that generate traffic on the link in question.
Define an alarm that warns us of a bottleneck situation by correlating three variables: the percentage of bandwidth used, the response time of the application and the time during which both conditions remain abnormal.
Evaluate the behaviour of this monitoring scheme over a period that covers at least one full business cycle and, if it works well, replicate it for other ports with the same type of link and the same application.
Establish a visibility structure that lets us monitor all the links across all of our organization’s sites, both regional and central.
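
As a rough illustration of the correlation described in these steps, here is a minimal Python sketch. It is not Pandora FMS configuration: the function names, the thresholds (80% utilization, 2 seconds of response time, three consecutive samples) and the sample values are all assumptions made for the example. It also assumes the octet readings come from two polls of a 64-bit SNMP counter such as ifHCInOctets.

```python
from dataclasses import dataclass

COUNTER64_MAX = 2**64  # wrap-around point of a 64-bit SNMP counter


def utilization_percent(octets_start, octets_end, interval_s, link_bps):
    """Percentage of link capacity used between two counter readings."""
    if interval_s <= 0 or link_bps <= 0:
        raise ValueError("interval_s and link_bps must be positive")
    # The modulo keeps the delta correct if the counter wrapped once.
    delta_octets = (octets_end - octets_start) % COUNTER64_MAX
    return delta_octets * 8 * 100.0 / (interval_s * link_bps)


@dataclass
class Sample:
    utilization_pct: float   # bandwidth used during the polling interval
    response_time_s: float   # application response time in the same interval


def bottleneck_alert(samples, max_util_pct=80.0, max_response_s=2.0,
                     min_consecutive=3):
    """True when both thresholds are exceeded for min_consecutive samples in a row.

    All three thresholds are hypothetical defaults; each organization
    has to define what a bottleneck means for its own links.
    """
    streak = 0
    for sample in samples:
        abnormal = (sample.utilization_pct >= max_util_pct
                    and sample.response_time_s >= max_response_s)
        streak = streak + 1 if abnormal else 0
        if streak >= min_consecutive:
            return True
    return False


# 37,500,000 octets in 300 s on a 10 Mbit/s link is 10% utilization.
print(utilization_percent(0, 37_500_000, 300, 10_000_000))  # prints 10.0
```

Requiring several consecutive abnormal samples is what brings in the third variable, time, so that a short burst of traffic does not raise the alarm on its own.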

This example reflects the procedure we usually follow to monitor WAN networks, taking advantage of the facilities that Pandora FMS offers us to:

Identify the physical elements involved, as well as the technologies implemented in the links in question, considering all the knowledge previously accumulated on the behaviour of the devices and links.
Generate the logic associated with the evaluation.
Define the set of variables that we want to measure and their behaviour thresholds, and define and implement the procedure for obtaining information on these variables.
Implement thresholds and alarms.
Test it and, if it works, extend it to all links with similar conditions, defining a global visualization scheme.

Now, this line of thought seems to be ideal for a WAN scheme based on private links.

That is, a scheme in which we own the router, have negotiated and contracted a communications service, have accumulated knowledge about the performance of both the link and the application, and are also responsible for administering the entire platform.

But is this approach to WAN monitoring really viable when we have a cloud-based or software-defined scheme?

What should we change in the way we approach WAN monitoring when the design of these networks includes other services beyond private links?

Internet-based WAN Scheme

The remote communications scheme based on private links is undoubtedly still in place and will probably take a long time to disappear, if it ever disappears completely.

The basic reason for maintaining this type of scheme is that these are operating platforms in which the behaviour of the links is highly stable and their performance is highly predictable.

However, we cannot deny that there is a tendency to migrate to cloud-related technologies and SD-WAN, and to resort to service models such as SaaS, IaaS, and so on.

In short, we are witnessing the incorporation of the WAN scheme that has been called Internet-based.

Reasons for this trend? Well, from general things like costs based on “The more you use it, the more you pay”, the increase in mobile workers and shorter implementation times, to more specific things like a better cost/benefit ratio for small offices, for example.

Here we do not intend to present reasons in favour of one or the other model; we start by accepting the trend and we want to reflect on how we should adapt our way of taking on monitoring.

How can we adapt the monitoring?

Depending on whether we are facing the WAN monitoring project of an Internet-based platform or a hybrid platform, we must consider the following elements:

Eliminating our attachment to hardware

We already said it in the article in which we analysed the challenges of SD-WAN, in this same blog.

Network administrators usually have a very strong connection to the hardware; we normally configure each switch, router and firewall command by command.
This way of working gives us deep knowledge of the platform, but it is very laborious and error-prone, and it can slow down changes.
In an Internet-centric WAN scheme we may never know the device on which a communications service is implemented, or we may not have access to it.
Therefore, the idea is to think less about devices, configurations, and commands and think more about rules, services, user experience, and vendor performance commitments.

Identifying all providers and services involved

On a WAN platform based on private links, the group of administrators is usually responsible for all the services that support users’ access to a particular application.

These include services such as user account access, profile management, DNS, NAT, gateway, IPS, etc. Having all the services under control makes it easier, in theory at least, to have a global vision of the dependencies that could affect the performance of a given application.

However, there is also a tendency to contract all these services in the cloud. Or it may happen that our service provider, a SaaS vendor for example, subcontracts other companies to offer a gateway service.

This means that when we need to monitor a WAN service, we must expand our scope beyond the behaviour of the link as such and include the entire universe of dependencies that are being supplied in the cloud.
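
One way to picture this wider scope is as a dependency map. The Python sketch below is purely illustrative: the service names, the providers and the relationships between them are invented for the example and do not describe any real platform. It shows how, starting from one application, we could list every service it relies on, directly or indirectly, and every provider behind those services.

```python
# Hypothetical dependency map: each service lists what it relies on.
DEPENDENCIES = {
    "crm-app": ["dns", "identity", "saas-gateway"],
    "saas-gateway": ["ips"],
    "identity": ["dns"],
    "dns": [],
    "ips": [],
}

# Hypothetical owner of each service.
PROVIDERS = {
    "crm-app": "SaaS vendor",
    "saas-gateway": "subcontracted gateway provider",
    "identity": "cloud identity provider",
    "dns": "in-house",
    "ips": "subcontracted gateway provider",
}


def all_dependencies(service, graph):
    """Return every service reachable from `service`, direct or indirect."""
    seen = set()
    stack = list(graph.get(service, []))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        stack.extend(graph.get(current, []))
    return seen


def providers_involved(service, graph, providers):
    """Return the providers behind `service` and everything it depends on."""
    names = all_dependencies(service, graph) | {service}
    return {providers[name] for name in names if name in providers}


print(sorted(all_dependencies("crm-app", DEPENDENCIES)))
print(sorted(providers_involved("crm-app", DEPENDENCIES, PROVIDERS)))
```

Each entry in the resulting lists is a candidate for its own check, and each provider is someone we may need monitoring data from.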

Data obtained from suppliers

On an Internet-centric WAN platform, services are usually outsourced to a larger number of providers.

So much so that many authors argue that, in the presence of Internet-based WANs or hybrid models, WAN administrators should concentrate their efforts on achieving what they call “cloud governance”.

This governance is therefore challenged by a greater number of service commitments and more heterogeneity in the conditions of the agreements.

Monitoring in particular can be complicated, since one provider may well offer data on the performance of its service in a format completely different from that used by another provider.
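
To make the format problem concrete, here is a small Python sketch that converts two provider reports into one common record. Both formats are invented for the example: we assume a hypothetical provider A that sends JSON with latency in milliseconds, and a hypothetical provider B that sends CSV with round-trip time in seconds and uptime as a fraction. The field names are assumptions too.

```python
import csv
import io
import json


def from_provider_a(payload):
    """Provider A (hypothetical): JSON list, latency in ms, availability 0-100."""
    return [
        {
            "site": row["site"],
            "latency_ms": float(row["latency_ms"]),
            "availability_pct": float(row["availability"]),
        }
        for row in json.loads(payload)
    ]


def from_provider_b(payload):
    """Provider B (hypothetical): CSV, round-trip time in s, uptime as 0-1."""
    reader = csv.DictReader(io.StringIO(payload))
    return [
        {
            "site": row["office"],
            "latency_ms": float(row["rtt_s"]) * 1000.0,     # seconds to ms
            "availability_pct": float(row["uptime"]) * 100.0,  # fraction to %
        }
        for row in reader
    ]


report_a = '[{"site": "madrid", "latency_ms": 38, "availability": 99.95}]'
report_b = "office,rtt_s,uptime\nbogota,0.125,0.999\n"

# Both reports end up with the same keys and the same units.
records = from_provider_a(report_a) + from_provider_b(report_b)
for record in records:
    print(record)
```

Once every provider's data is in the same units and under the same keys, thresholds and reports can be defined once instead of once per provider.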

In addition, we can also expect greater variability in the actual performance of each regional office, for example, as this performance will surely be underpinned by different providers with unequal services.

A good strategy could be to:

Try to make the most of each negotiation, making collaboration on monitoring and problem reporting a condition for closing an agreement.
Define monitoring procedures based on an approach focused on the experience of our users in the use of their applications.
Keep in mind that what works for users in one regional office does not necessarily apply directly to users in another, even though they may access the same application.
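
The last two points can be sketched as follows, again in Python and again under assumptions: the office names, the response times and the tolerance factor of 1.5 are hypothetical. The idea is to judge each office against its own baseline of user-perceived response time, rather than applying a single threshold to all of them.

```python
from statistics import median


def offices_degraded(samples_by_office, baseline_by_office, tolerance=1.5):
    """Offices whose median response time exceeds tolerance x their own baseline."""
    degraded = []
    for office, samples in samples_by_office.items():
        baseline = baseline_by_office.get(office)
        if not samples or baseline is None:
            continue  # nothing to compare for this office
        if median(samples) > tolerance * baseline:
            degraded.append(office)
    return sorted(degraded)


# Response times in seconds for the same application, measured per office.
samples = {
    "madrid": [0.4, 0.5, 0.6],
    "bogota": [1.2, 1.3, 1.1],
    "lima": [2.0, 2.2, 2.4],
}
# What is normal for each office, taken from its own history.
baselines = {"madrid": 0.5, "bogota": 1.2, "lima": 1.0}

print(offices_degraded(samples, baselines))  # prints ['lima']
```

In this example a median of 1.2 seconds is normal for one office, although it would count as a degradation for another whose baseline is 0.5 seconds.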

With all these changes, it is fair to ask whether WAN monitoring, as we do it today, will be relevant in the medium term or whether it will blur into application monitoring and cloud governance.

Perhaps we should change our approach to the issue, as we put it in this article.

However, our knowledge of communications protocols and associated services (DNS, IPS, etc.), our expertise in SNMP and NetFlow and, above all, the effort we have dedicated to implementing monitoring tools, understanding how they work, how they adapt to our needs, how they fit in with other systems and how to carry out analysis and optimization will surely remain valid and will always be of great value.

We invite you to share your experiences in this interesting world of WAN monitoring and review all the facilities offered by a tool like Pandora FMS, visiting this link.