Last updated: July 15, 2022

Executive Summary

The digital transformation of corporations worldwide has led to all-pervading connectivity and high-speed data Exchange with massive productivity growths. Network infrastructure plays such a fundamental role that ensuring its reliability and security is absolutely vital for every organisation today.

Yet networks are becoming increasingly abstract, understood in such metaphors as a ‘public cloud’ where we do not even think in terms of switches or routers anymore, let alone how to configure them by hand. Vendors of network Equipment are making networking more seamless, and consequently, the expertise of engineers is slowly shifting from knowing how to use a screwdriver to how to secure a network with tens of thousands of IoT devices.

Flow Data - by-Martin-Adams @ Unsplash

SDN, NFV and virtualization goes hand-in-hand with this trend – enabling corporations to Focus their resources on the core business instead of managing the network. This leads to a bandwidth explosion in corporate networks, which makes the traditional approaches of network monitoring appear to be a primitive tool. We at Flowmon are re-thinking the old concepts and endeavouring to tear down the walls that limit progress.

This technical reading is about our belief that merging flow and packet level visibility into one versatile solution is the Technology that will help you scale to future performance and capacity needs. The solution also retains detailed information about network traffic and presents the outputs in a straightforward and understandable way.

You are about to discover how flow data, often perceived as a billing and top statistics tool, can, when mastered, fully replace full packet capture and analysis and provide never-seen-before and future-ready scalability.

The history of network monitoring: Packet analysis

TB of data - by Massimo Botturi @ Unsplash

Packet analysis looks inside communications to analyse ist content. There is no aggregation, compression or trimming involved, and data is stored in its original size. Consequently, this method has extremely demanding performance and disc capacity requirements.

Just imagine the capturing of a network with 250 Mbps traffic on average. This equals a data load with over 31 MB per second, 1.8 GB per minute, 108 GB per hour and 2.6 TB a day. In the case of 10Gbps networks we reach hardly believable numbers – it would be more than 100 TB of stored data per day.

However, large volumes of data are not the only drawback. The principal limitation of packet analysis is encrypted traffic. Without the encryption key, we cannot understand the content of any transferred data, and often not even uncover the transfer protocol or application. Nevertheless, volumes of encrypted traffic are constantly growing.

Continual, full-scale traffic recording (full packet capture) demands the right technical equipment, especially high-speed storage arrays with adequate capacity.

Such an approach to network monitoring is very expensive, suitable only for critical infrastructure and networks with a specific purpose. It must be underlined that storing such data may not be the only problem. Once the data is stored, any troubleshooting involves extensive mining of information that requires considerable experience and skillset.

For the majority of network incidents outside of the scope of expensive continuous full packet capture, corporations rely on another approach.

On Demand Packet Capture

It is called on-demand packet capture. When employing this approach, we capture packets only when needed – typically when we deal with system compatibility issues – for instance, when discovering missing or damaged packets. On-demand packet capture is a simple method, affordable to every network administrator, but it does have its pros and cons.

The limitation of this approach is that the administrator has to decide in advance which traffic should be stored. Consequently, there is no option to reach the traffic Archive to get the right information for analysis when an incident occurs. It requires a network administrator to go to a physical location (e.g. server room) with their notebook, connect the notebook to a mirror port or TAP and to carry out the network traffic recording.

Problems may arise when the location is far away, and also in optical network interfaces and 10Gpbs infrastructures – limitations that could hardly be overcome with a notebook.

Although the need for capturing packets is not going away, the demand, however, is certainly shrinking. There is a clear scalability problem, especially because of the ever growing number of devices connected to a network or the number of applications and services delivered from cloud that require higher bandwidths.

Solutions for full-scale traffic recording and analysis are very resource consuming and, consequently, expensive. Also, there are technology limits in a high-speed networking environment and restricted possibilities of use when traffic is encrypted.

The future of network monitoring: Enriched flow data

When it comes to network traffic monitoring, troubleshooting or threat detection, network engineers would rarely think there were two options at their disposal. The first one is full packet capture and analysis providing complete network visibility. The other is flow data.

Flow data is an abstraction of the network traffic itself. Flow statistics are created as an aggregation of the network traffic; using the source IP address, destination IP address, source port, destination port and protocol number as attributes that identify the individual flow records.

The content of the communication is not stored, and the achievable aggregation rate is about 500:1. With the information listed above, we are able to analyse traffic structure, identify end-stations transferring large amounts of data or to troubleshoot network issues and wrong configurations.

In other words, we can handle 80% of network incidents, as Gartner has reported since 2013.

It is evident that flow data does not contain enough Information for some tasks. By contrast, as a result of packet analysis, the IT department is usually overloaded with barely manageable volumes of detailed data.

IPFIX as a solution

When we combine both perspectives and extend traditional flow data with information from the application layer, we can get the appropriate detail, providing an insight into data communication, flexible reporting and effective troubleshooting of operational issues and the automatic detection of security incidents.

This approach is called enriched flow data, leveraging flexibility of the IPFIX protocol. In our experience, we are now able to handle 95% of network incidents with the most scalable, cost-efficient and easy-to-use solution based on flow data.

The best known implementation of this technology is Cisco’s NBAR2 (Next Generation Network-Based Application Recognition). Flow data monitoring is combined with continual packet analysis that extends the traffic statistics with the name of an application or application protocol. Based on this information, modern flow collectors enable traffic reporting and analysis.

HTTP and HTTPS

One of the most widespread communication protocols is HTTP, or its encrypted version HTTPS. Today it is used to provide access to websites, but this is not its sole function. The protocol is also the basis of communication between the components of business systems, or applications working with sensitive data (e.g. electronic banking). By identifying this transfer protocol, we
can extend flow data statistics by fundamental HTTP request attributes – a hostname or URL information.

Thanks to SNI (Server Name Indication), we can get hostname Information even when the HTTPS protocol is used. Similarly, we can get other information from HTTP communication; for example, the operating system and its version, the identification of a browser and its version or a device type in the case of mobile phones. And this is only one example of many protocols for which we can use L7 information without the necessity of manual data mining.

Network Performance Monitoring (NPM)

Network Performance Monitoring

Nonetheless, flow data can be enriched by something perhaps even more powerful in the modern world – Network Performance Monitoring (NPM). NPM metrics can significantly help with network performance troubleshooting. Using the Server-Response-Time and Round-Trip-Time metrics, we can distinguish between delays in the network infrastructure (e.g. a malfunctioning access point), from delays in the server (e.g. insufficient HW resources).

This kind of information is crucial for fast network troubleshooting. The delay and jitter metrics should interest us especially when we use VoIP calls or Video conferences, as they can indicate bad audio and video quality. When we are talking about transferring large volumes of data, we are interested mainly in the number of TCP retransmissions, which can indicate problems on the physical layer (e.g. interference, faulty port) and a lower bit rate, or out-of-order packets, which can signpost failures in communication links.

On-Demand Full Packet Capture

And when this level of information is still insufficient, Flowmon allows triggering of on-demand full packet capture. This can be done manually or automatically on a detected event. When this happens, the capturing filter is determined autonomously by the system, narrowing the captured data volume to an absolute minimum, retaining only the relevant part of the traffic. This can obviously be done remotely in any part of the network and at speeds of 100Gbps; unachievable with Wireshark or WinPCAP.

The benefits of flow data monitoring and application layer analysis are clear. We get more detailed information about data communication, better capabilities for traffic analysis. At the same time we retain the excellent compression rate of Network traffic statistics vs. original traffic volume to scale to multiple hundred-Gigabit networks.

Also, the system aggregates the most important information, so it can be delivered at a glance, avoiding the necessity of manual data mining with packet analysis. This leads to a tremendous reduction of Mean-Time-To-Resolve while lowering the skillset required to use the solution. With Flowmon it is always possible to carry out full-scale traffic recording if necessary using the same platform.

Why choose flow over packet?
Business and technical benefits explained

In the earlier part of this technical reading we identified differences between continuous packet capture and flow data in the context of using these technologies to successfully monitor network traffic. Let us summarize the benefits of enriched flow data over packet capture:

Due to an average reduction ratio of raw traffic vs. required storage for respective flow records of 500:1 (as opposed to packet analysis 1:1), flow technology is much more scalable.

Budget requirements on packet analysis technologies rarely allow monitoring of the entire network traffic. So, it is deployed only to monitor top critical systems as opposed to flow, where covering the entire corporate network traffic, including data centres and cloud is a standard scheme.

Troubleshooting in general is not conducted in real time. In corporations, it often takes days until a reported incident is reviewed by a network administrator. With only a limited data retention period, retrospective analysis would be impossible. Flow monitoring can easily provide retention of weeks or months, so you can easily prioritise your work and focus on retrospective analysis whenever you have finished more important tasks.

Seamless deployment, integration with existing network equipment, compatibility with a wide range of flow sources, fast training of administrators are among many reasons it is so easy to introduce flow technology to your network and get an instant value.

The level of detail provided by packet analysis makes it suitable for deep-dive forensic analysis of persistent problems. Companies turn to flow to minimize the time needed for root cause analysis and gain more time for root cause remediation using human readable dashboards and drill down capabilities.

Flow Data and IPFIX Visbility with FlowMon

Too detailed granularity of packet analysis means higher costs, lower scalability and a much higher required skillset to use. However, only a small percentage of the captured data is relevant. Enriched flow data on the other hand keeps the most interesting and important information so that 95% of network incidents can be resolved with it. Additionally, Flowmon enables on-demand full packet capture for the rest of cases.

Packet analysis had been built with unlimited visibility in mind. It is well suited for time consuming forensics of persistent problems. To restore business-as-usual promptly, corporations are turning to Flowmon to be provided with analytical workflows and automation to help streamline resolution.

With more and more traffic being encrypted, packet analysis becomes ueless. While exporting Flow data from encrypted traffic, Flowmon focuses on non-encrypted IP headers that will help to resolve 80% of incidents. Additionally, it uses different techniques to extract information from the application layer otherwise hidden to an inexperienced eye.

Public cloud providers do not allow tapping into their network to enable full packet analysis. However, both Cloud providers and virtual hypervisors often export some form of flow data compatible with Flowmon, enabling seamless deployment of quality network monitoring.

The level of detail provided by packet analysis makes it suitable for deep-dive forensic analysis of persistent problems. Companies turn to Flowmon to minimize the time needed for root cause analysis and gain more time for root cause remediation using human readable dashboards, context aware presentations and drill-down capabilities.

Use cases

Troubleshooting using Packet Capture

I have my packet analyzer with continuous capture in place. So, let us hope that my rolling buffer still keeps the data I need. Thankfully, we can download the PCAP containing the traffic of IP address 192.168.222.87 and open the traffic in Wireshark.

In the figure above we can see the communication between user and services.invea.com. Domain services.invea.com is resolved correctly to IP address 89.185.224.79, but after the DNS response was received, a user tried to establish a TCP session with no response from the external IP address. We have to check our firewall settings if this communication is allowed.

The second issue is related to a non-existent domain queried by the user machine. We can see that update.invea.com does not exist, which probably implies a wrong configuration of the user and not a network related issue.

Troubleshooting using Enriched Flow Data

I have Flowmon Probes in place monitoring network and Flowmon Collector with weeks of history of un-sampled and non-aggregated traffic statistics. So, I can directly ask about the DNS question to domain services.invea.com.

I can see the IP address provided by the DNS server as a response and check for the traffic going to that IP address easily. It is possible because flows from the probe corresponding to DNS are enriched with the most important L7 Information from the DNS protocol.

I see that only SYN packets are being transmitted to the Network with no response from an external IP address, which implies the need to check firewall rules. Besides common L3/L4 information, I have visibility into TCP specific items such as default TCP segment (window) size, which can help me to troubleshoot the TCP session.

The non-existing domain is in my flow evidence as well. The domain update.invea.com does not exist, so the DNS Server replies „NXDomain“.

Testimonial: Root cause investigation workflow

Let us imagine a department of Level 3+ engineers in a 50 thousand people bank. These engineers focus on root cause analysis of network incidents that no other team before could resolve. For example, to figure out why a VPN connection between the customer and the bank, both on different continents, had outages. It is very common that these engineers spend many hours digging in Wireshark through petabytes of data generated by hundreds of different systems in a complicated and heterogeneous environment. And this is exactly a situation of one Flowmon customer who asked us to deliver an alternative and more effective solution for dealing with operational incidents.

It is important to say that our customer built a whole platform to help them with network operational tasks. It was based on a commercial tool for continuous packet capture, a customized open source software for flow monitoring, and an SNMP based tool showing data transfer heat maps in real time.

Soon it became clear that maintaining, supporting and upgrading the custom solution to fit everyday operations was too expensive and time-consuming. So, they sought NetFlow/IPFIX technology to replace the original solution. The bank’s IT department looked for a solution to completely replace the original one. Although budget was not an issue, the choice was not as easy as it first appeared.

Testing different vendors, sometimes the problem was data aggregation, sometimes no virtualization, but always it was the slowness in terms of the time needed to provide measured results.

Their dream solution should:

  • provide not only context-aware top-level dashboards, but one that allows manual drill down to any flow.
  • not aggregate stored data and to keep the raw flows for as long as the storage lasts.
  • not necessarily be dependent on its own sensors, since they cannot afford to be locked into a single technology due to their heterogeneous environment.
  • be virtualized, so management and migration is as flexible as possible.
  • combine flow monitoring with on-demand full packet capture.
  • provide, most importantly, outputs of the measured statistics of the flow data faster than the platform they built
    themselves ten years ago from an open source tool.

Then the team came across Flowmon, which fully fitted their needs. Since the proof-of-concept project, Flowmon has become fundamental in their tool set. It has become the root cause investigation workflow itself. The engineers now start with the dashboard, go into top-level statistics, deeper onto levels of NetFlow and then only focus on a small part of the traffic where they can run full packet capture.

Conclusion

The dynamics and diversity of today’s networks challenge the prevailing network monitoring approach. Facing an increase in network speeds, visibility gaps caused by migration to cloud, IoT and software defined networking, packet capture solutions struggle to bring expected results promptly and at a reasonable price.

Packet capture solutions were designed at the time when the dynamics of today’s network environments would have been hard to imagine. Nowadays, they work well in specific use cases, but they cannot cope with the flexibility, scalability and ease of use of flow data in most of the everyday use cases that network engineers face.

We have demonstrated a use case where flow data with extended visibility is equally powerful to full packet capture and packet analysis. On the other hand, it is fair to say that even with extended flow level visibility you still might face issues where analysis of PCAPs is unavoidable.

We in Flowmon Networks believe that merging flow and packet level visibility into one versatile solution is the technology that will help you scale to future performance and capacity needs. So, let‘s do continuous flow monitoring and packet capture when needed. At the end of the day, you will most probably need to analyze PCAPs less than you expect. Go Flowmon and see how it can help to your organisation.

We will be happy to consult you and look forward to hearing from you!