
How does SDN change our vision of networks?

June 28, 2018 — by Alexander La Rosa


SDN: Challenges for Network Administrators and Monitoring


Last December, Acumen Research and Consulting, a global provider of market research, published a report titled “Software Defined Network (SDN) Market” in which they estimated a compound annual growth rate (CAGR) of 47% for SDN over the period 2016–2022.

In 2016, Cisco launched its DNA (Digital Network Architecture), which relies more on software than on hardware.

In 2017, Cisco acquired Viptela to complete its SD-WAN (Software Defined WAN) offering. Also in 2017, IDC (International Data Corporation) estimated that SD-WAN infrastructure and services revenues would grow at a CAGR of 69.6%, reaching $8 billion by 2021.

All these statistics show that the business around networks is changing, but beyond new offers from our ISP or cloud services provider, does SDN really imply a change in the way we understand, design, manage and monitor networks?

We have to start by clarifying that SDN is an architectural approach, not a specific product. In fact, SDN is the result of applying the virtualization paradigm to the world of networks.

In general, virtualization seeks to separate the logical part from the physical part of any process. In server virtualization, for example, we can create a fully functional server without dedicating any particular piece of physical equipment to it.

Let’s translate this paradigm to a basic function of a switch:

When a packet arrives at a switch, the rules built into its firmware tell the switch where to send the packet, so all packets that match the same conditions are treated in the same way.

In a more advanced switch, we can define rules in a configuration environment through a command line interface (CLI), but we have to configure each switch in our platform one by one.

When applying virtualization, we have all the rules for all the switches (logical part) separated from the switches themselves (physical part). SDN applies this principle to all networking equipment.
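
To make this idea concrete, the following minimal Python sketch models a match-action flow table, the “logical part” that SDN separates from the switches. The fields, ports and rules are invented for illustration and do not correspond to any particular switch firmware or controller API.

    # Minimal sketch of a match-action flow table: the "rules" that decide
    # where packets go, kept as data instead of being baked into firmware.
    FLOW_TABLE = [
        # (match criteria, action)
        ({"dst_ip": "10.0.0.5"},          {"forward": "port2"}),
        ({"vlan": 30, "protocol": "tcp"}, {"forward": "port7"}),
        ({},                              {"drop": True}),  # default rule
    ]

    def handle_packet(packet):
        """Apply the first rule whose criteria all match the packet."""
        for match, action in FLOW_TABLE:
            if all(packet.get(field) == value for field, value in match.items()):
                return action
        return {"drop": True}

    print(handle_packet({"dst_ip": "10.0.0.5", "vlan": 10}))                      # forwarded to port2
    print(handle_packet({"dst_ip": "10.1.1.9", "vlan": 30, "protocol": "tcp"}))   # forwarded to port7

In a traditional switch this table lives inside each device; in SDN the controller holds and distributes it centrally.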

Therefore, SDN proposes the separation of two levels:

  • Control Level: at this level, an application called the SDN Controller decides how packets have to flow through the network; it also performs configuration and management activities.
  • Data Level: this level actually moves the packets from one point to another. Here we find the network nodes (any physical or virtual networking equipment). In SDN we say traffic moves through the network nodes rather than towards or from them.

With these two levels defined, the idea is that network administrators can change any network rule when necessary by interacting with a centralized control console, without touching individual network nodes one by one.

This interaction defines a third level in the architecture:

  • Application Level: at this level we find programs that build an abstract view of the network for decision-making purposes. These applications have to deal with users’ needs, service requirements and management.

In the following image we can see a basic model of SDN architecture:

[Image: basic model of the SDN architecture]

Finally, there are two more elements to mention:

  • Northbound API: these APIs allow communication between the SDN Controller and the applications running over the network. By using a northbound API, an application can program the network and request services from it. They enable basic network functions such as routing, loop avoidance and security, as well as modifying or customizing network control, among others (see the sketch after this list).

    Northbound APIs are also used to integrate the SDN Controller with external automation stacks and cloud operating systems like OpenStack, vCloud Director and CloudStack.

  • Southbound API: these APIs enable communication between the SDN Controller and the network nodes. The SDN Controller uses this communication to identify the network topology, determine traffic flows, define the behavior of network nodes and implement the requests generated through a Northbound API.
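
As a hedged illustration of how an application might talk to a northbound API, the sketch below sends a forwarding policy to a controller over REST. The endpoint, payload shape and token are hypothetical; real controllers such as OpenDaylight or ONOS each expose their own, differently structured APIs.

    import json
    import urllib.request

    # Hypothetical northbound REST endpoint of an SDN controller.
    CONTROLLER_URL = "https://sdn-controller.example.local:8443/api/v1/policies"

    # Illustrative policy: send VoIP-like UDP traffic through a high-priority queue.
    policy = {
        "name": "prioritize-voip",
        "match": {"protocol": "udp", "dst_port_range": [10000, 20000]},
        "action": {"set_queue": "high-priority"},
    }

    request = urllib.request.Request(
        CONTROLLER_URL,
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer <token>"},  # placeholder credential
        method="POST",
    )

    with urllib.request.urlopen(request) as response:
        print(response.status, response.read().decode())

The point is not the exact call but the workflow: the application expresses intent (“prioritize this traffic”) and the controller translates it into rules that reach the network nodes through the southbound API.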

SDN was originally just about this separation of functions; however, the architecture has evolved to embrace the automation and virtualization of network services as well, giving network administrators the power to deliver network services wherever they are needed, regardless of the specific equipment involved.

This automation implies that SDN-based networks have to detect changes in traffic-flow patterns and select the best path based on parameters such as application type, quality of service and security rules.
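
As a rough sketch of what “selecting the best path based on parameters” can mean, the fragment below uses the networkx graph library to pick different routes for different traffic classes. The topology, node names and link metrics are invented for illustration.

    import networkx as nx  # pip install networkx

    # Invented topology: nodes are switches, edges carry per-link metrics.
    topology = nx.Graph()
    topology.add_edge("sw1", "sw2", latency_ms=2, loss_pct=0.0)
    topology.add_edge("sw2", "sw4", latency_ms=3, loss_pct=0.1)
    topology.add_edge("sw1", "sw3", latency_ms=1, loss_pct=0.5)
    topology.add_edge("sw3", "sw4", latency_ms=1, loss_pct=0.4)

    # A latency-sensitive application (e.g. VoIP) gets the lowest-latency path...
    voip_path = nx.shortest_path(topology, "sw1", "sw4", weight="latency_ms")

    # ...while a loss-sensitive bulk transfer is routed on a different metric.
    bulk_path = nx.shortest_path(topology, "sw1", "sw4", weight="loss_pct")

    print("VoIP path:", voip_path)  # ['sw1', 'sw3', 'sw4']
    print("Bulk path:", bulk_path)  # ['sw1', 'sw2', 'sw4']

An SDN controller performs a far more sophisticated version of this calculation, but the principle is the same: path selection becomes a function of measurable parameters rather than static device configuration.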

That concludes our brief introduction to SDN. If the reader wants to go deeper, we recommend visiting the websites of the Open Networking Foundation and SDxCentral.

So, let’s go back to the original question: does SDN really imply a change in the way of understanding, designing, managing and monitoring networks?

Traditionally, network administrators have a very strong connection to the hardware; we usually configure every switch, router and firewall using a command line interface.

This “usual way of doing things” gives us deep knowledge of the platform; however, we have always agreed that this way of working is laborious, error-prone and slows down changes. With SDN, we may have to think less about commands and configurations and more about rules and services.

On the other hand, virtualization has taken a long time to reach the world of networks, and even longer to make an impact in companies that are not Internet service providers or mega-corporations.

This change may therefore be easier for IT teams that have experience with server virtualization and containers, and that have already faced the challenges of the DevOps methodology (a topic we discussed previously on this blog).

In terms of monitoring, the fundamental challenge is how to monitor networks given the complexity and transience that SDN implies. For example, how do we do application performance monitoring if the network topology can change several times a day?

There are monitoring tools designed to sit at the Application Level as part of Network Management Systems. Those tools face the problem of complexity, since they must monitor both the controller and the devices at the Data Level.

The real challenge with such an agile structure is to detect when new devices enter the network and automatically adjust the monitoring scheme.
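
A minimal sketch of that idea, assuming a hypothetical topology endpoint on the controller and a placeholder registration function on the monitoring side (neither maps to any specific product's API):

    import json
    import time
    import urllib.request

    # Hypothetical controller endpoint listing the current data-plane nodes.
    TOPOLOGY_URL = "https://sdn-controller.example.local:8443/api/v1/nodes"

    def fetch_current_nodes():
        """Ask the controller which network nodes exist right now."""
        with urllib.request.urlopen(TOPOLOGY_URL) as response:
            return {node["id"] for node in json.load(response)}

    def register_in_monitoring(node_id):
        """Placeholder: create the agents/checks for a new node in the monitoring tool."""
        print(f"Adding {node_id} to the monitoring scheme")

    monitored_nodes = set()
    while True:
        current = fetch_current_nodes()
        for node_id in current - monitored_nodes:  # devices that just appeared
            register_in_monitoring(node_id)
        monitored_nodes = current
        time.sleep(60)  # re-check the topology every minute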

Furthermore, troubleshooting SDN-based networks requires a significant effort in interactivity and contextual analysis. In practice, it will not be enough to see the network as it is at a given moment; we will need to move forward and backward through the topology in order to identify the performance problems associated with particular routes and optimize the whole process.

Therefore, we can foresee a large amount of data extracted from the platform that must be stored and then filtered under a flexible visualization scheme.

Finally, we must say that many of the challenges mentioned here have already been taken on by some monitoring tools. Tools with flexible architectures and extensive experience in virtual-environment monitoring are well placed to succeed. We invite you to discover the full scope of Pandora FMS in virtualized environments by visiting our website.

Technical writer with more than ten years of experience managing monitoring projects. A true enthusiast of yoga and meditation.


False Positives in Monitoring

September 10, 2014 — by Steve

False positives (as well as false negatives) are a recurrent issue in our monitoring experience, and by now we are quite sure it is worth talking about them.

The best way to approach a problem is with an example: suppose we use Pandora FMS to monitor a network with 500 servers, on which we have defined a connectivity check (ping) against each IP. Most of the time all checks appear in green; however, every now and then, at random, some check appears in red. When we detect that, we run the ping manually and verify that it works perfectly.

The initial conclusion is that our monitoring system, in this case Pandora FMS, is failing, but what is really happening is that our monitoring system is not configured as it should be, and that is exactly where the problem lies.

To test it, we just have to run a ping against one of those IPs that sometimes fail and leave it running for hours. We will see that occasionally, in 1 out of 1,000 checks or even 1 out of 10,000, the ping fails, but we shouldn’t worry about that, because it is relatively common for networks to behave this way from time to time.
The following screenshot shows our entire monitoring system in green while, at the same time, a ping from the console fails. If Pandora FMS had been running a check at that precise moment, it would probably have turned red.

[Screenshot: all checks green in the console while a manual ping fails]
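
For reference, a small script like the one below (a sketch using the Linux ping command and an invented target IP) can be left running for hours to count how often these isolated failures actually occur:

    import subprocess
    import time

    TARGET = "192.168.1.10"  # one of the IPs that occasionally shows up in red
    sent = failed = 0

    while True:
        sent += 1
        # One ICMP echo request with a 1-second timeout (Linux ping syntax).
        result = subprocess.run(
            ["ping", "-c", "1", "-W", "1", TARGET],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            failed += 1
            print(f"{time.strftime('%H:%M:%S')}  ping failed ({failed}/{sent})")
        time.sleep(1)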

All monitoring systems have several parameters to control this behavior. We may want the maximum level of detail, as Pandora FMS provides by default, or we may want to attenuate it so we are not warned at the slightest failure. Below we list several control mechanisms available in Pandora FMS (and in other monitoring systems as well) to avoid this kind of behavior:

  • Nº of checks: sometimes the first ping fails but the second one works, which is why almost all systems support a number of retries. There have been cases of systems where the first ping always failed and the check only worked when pinging constantly with three retries. In such (infrequent) cases, the best option is to use an adapted check (a custom plugin) instead of the standard one.
  • Timeout: when checking remote systems we may need to increase the response timeout. On a LAN, one second is more than enough; over the Internet we would probably see a lot of false positives caused by a timeout that is too low. On the other hand, setting a high timeout, 10 seconds for example, would drag down our server’s capacity, because in the worst case it would have to wait 10 seconds for each check against a system that is not responding.
  • Packet loss sensitivity: it may be hard to believe, but different ping tools behave differently, and even the same ping tool behaves differently on different systems. Sometimes the monitoring tool allows this behavior to be tuned. We cannot compare the results of tools like ping, fping, hping or nmap, since they return different values. That is why we need to know whether our monitoring tool has settings related to its tolerance of packet loss or to the speed at which probes are sent (related to the Timeout and Nº of checks parameters). A bad configuration can make false positives appear. In an extreme case, because of this intolerance, our monitoring tool may report problems on a network whose packet loss is negligible for other tools. This was a real case with the Pandora ICMP Enterprise server, using the T3 parameter in the Nmap scan, where some systems appear not to respond at random because of a packet loss that most conventional monitoring systems would consider negligible.
  • Flipflop: the phenomenon in which an element that usually behaves in a stable way “bounces” more or less regularly. To prevent these bounces from affecting how we perceive the value, we set a bounce threshold. Since the value sometimes has “peaks”, we only assume there is a problem when the failure happens twice.
  • Flipflop threshold: to avoid having to wait until the next regular monitoring interval, we set the flipflop threshold so the element is re-checked faster. This way, if something fails, we will know almost instantly. It is usually combined with the previous parameter (Flipflop), so that if a check fails we get confirmation within a shorter time; in Pandora FMS this is called Intensive Monitoring.

In the previous example we set the flipflop threshold to 1 and the flipflop interval to 30 seconds, so that if anything fails we are aware of it and repeat the test after 30 seconds. If it fails again, we consider the element down and send an alert; if not, we consider it a false positive and avoid alerting the system.
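
The logic behind that confirmation step can be sketched in a few lines of Python; check_host and the alert message are placeholders, not Pandora FMS functions:

    import time

    FLIPFLOP_THRESHOLD = 1   # how many extra confirmations we require
    FLIPFLOP_INTERVAL = 30   # seconds to wait before each confirmation

    def confirmed_failure(check_host):
        """Return True only if the check keeps failing after the confirmation retries."""
        if check_host():
            return False                   # first check passed, nothing to do
        for _ in range(FLIPFLOP_THRESHOLD):
            time.sleep(FLIPFLOP_INTERVAL)  # intensive re-check after a short wait
            if check_host():
                return False               # it recovered: treat it as a false positive
        return True                        # still failing: consider the host down

    # Placeholder check so the sketch runs as-is.
    def ping_ok():
        return True

    if confirmed_failure(ping_ok):
        print("ALERT: host is down")
    else:
        print("No alert: the host is fine or the failure was transient")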

In conclusion, before claiming that our system has false positives, it’s important to review and properly set up all those elements in our monitoring software to avoid unnecessary alerts.
