Introducing flow formats and their differences

Flow monitoring has become the prevalent method for monitoring traffic in high-speed networks. Several standards of flow format exist and it can be tricky to choose the right one for your needs. In this article we will go through the most common flow formats, providing a basic overview of their history and differences.

Flow monitoring history

The history of flow monitoring goes back to 1996, when the NetFlow protocol was patented by Cisco Systems. A flow represents a sequence of packets sharing the same 5-tuple: source IP address, destination IP address, source port, destination port and protocol. Based on this key, packets are aggregated into flow records that accumulate the amount of transferred data, the number of packets and other information from the network and transport layers. A typical flow monitoring setup consists of three main components:

Flow exporter – creates flow records by aggregating packet information and exports them to one or more flow collectors (e.g. Flowmon Probe).

Flow collector – collects and stores the flow data (e.g. Flowmon Collector).

Analysis application – allows the visualization and analysis of the received flow data (e.g. Flowmon Monitoring Center, the native application of the Flowmon Collector).
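
To make the exporter's aggregation step concrete, here is a minimal sketch of folding packets into flow records keyed by the 5-tuple. It is purely illustrative: the function and field names are assumptions, not Flowmon code.

```python
# Minimal sketch of 5-tuple flow aggregation (illustrative only; names are
# made up and this is not the Flowmon implementation).
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class FlowRecord:
    packets: int = 0
    bytes: int = 0

flows = defaultdict(FlowRecord)

def account(src_ip, dst_ip, src_port, dst_port, proto, length):
    """Aggregate one observed packet into its flow record."""
    key = (src_ip, dst_ip, src_port, dst_port, proto)
    rec = flows[key]
    rec.packets += 1
    rec.bytes += length

# Two packets of the same TCP connection end up in a single flow record.
account("10.0.0.1", "192.0.2.7", 51514, 443, 6, 1500)
account("10.0.0.1", "192.0.2.7", 51514, 443, 6, 640)
print(flows)  # one record with packets=2, bytes=2140
```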

Cisco originally developed the protocol for its own products. Other manufacturers have followed a similar approach and developed their own, more or less similar, proprietary flow formats.

Cisco standards

NetFlow v5

The first widely adopted version was NetFlow v5. It is still the most common version and is supported by a wide range of routers and switches. However, it no longer meets the needs of accurate flow monitoring, as it does not support IPv6 traffic, MAC addresses, VLANs or other extension fields.

NetFlow v9

NetFlow v9 brought several improvements. The most important is support for templates, which allows a flexible flow export definition and ensures that NetFlow can be adapted to support new protocols. Other improvements include support for IPv6, Virtual Local Area Networks (VLANs) and Multiprotocol Label Switching (MPLS). NetFlow v9 is supported on most recent Cisco routers and switches.
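
To illustrate how templates work, the sketch below decodes a single data record according to a template that lists (field type, length) pairs. The field type numbers follow the commonly documented NetFlow v9 assignments (e.g. 8 = IPV4_SRC_ADDR), but treat the layout as a simplified assumption rather than a complete parser.

```python
# Illustrative sketch of template-driven decoding (not a complete NetFlow v9
# parser; field type numbers are the commonly documented v9 assignments).
import ipaddress

# A template received from the exporter: (field_type, length_in_bytes)
TEMPLATE = [
    (8, 4),    # IPV4_SRC_ADDR
    (12, 4),   # IPV4_DST_ADDR
    (7, 2),    # L4_SRC_PORT
    (11, 2),   # L4_DST_PORT
    (4, 1),    # PROTOCOL
    (1, 4),    # IN_BYTES
]

def decode_record(data: bytes, template):
    """Slice one data record according to the template field lengths."""
    fields, offset = {}, 0
    for field_type, length in template:
        fields[field_type] = int.from_bytes(data[offset:offset + length], "big")
        offset += length
    return fields

# A fabricated 17-byte data record matching the template above.
record = bytes([10, 0, 0, 1]) + bytes([192, 0, 2, 7]) + \
         (51514).to_bytes(2, "big") + (443).to_bytes(2, "big") + \
         bytes([6]) + (2140).to_bytes(4, "big")
decoded = decode_record(record, TEMPLATE)
print(ipaddress.IPv4Address(decoded[8]), "->", ipaddress.IPv4Address(decoded[12]))
```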

Flexible NetFlow

Cisco continues to improve NetFlow technology. The next generation is called Flexible NetFlow, and it further extends NetFlow v9. What Flexible NetFlow can export is highly customizable, allowing customers to export almost anything that passes through the router.

Other vendors

jFlow, NetStream, cflowd

These formats are all similar to the original Cisco NetFlow standard: jFlow was developed by Juniper Networks, NetStream by Huawei and cflowd by Alcatel-Lucent.

Independent standard

IPFIX

The proposal for the IPFIX (Internet Protocol Flow Information eXport) protocol was published by the IETF in 2008. IPFIX is derived from NetFlow v9 and is intended to serve as a universal protocol for exporting flow information from network devices to a collector or Network Management System. IPFIX is more flexible than NetFlow and allows flow data to be extended with additional information about network traffic. As an example, our Flowmon IPFIX extensions enrich the IPFIX flow data with application layer protocol metadata, network performance statistics and other information.

In the Cisco world, IPFIX is usually referred to as NetFlow v10 and provides various extensions similar to those of Flowmon.
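
This extensibility relies on the IPFIX template mechanism: a field specifier whose enterprise bit is set is followed by a Private Enterprise Number, which is how vendor-specific elements such as the Flowmon extensions are carried. Below is a minimal sketch of encoding such a field specifier; the element ID and enterprise number are placeholders, not Flowmon's actual values.

```python
# Sketch of an IPFIX field specifier with an enterprise-specific element
# (RFC 7011 layout); the element ID and enterprise number are placeholders.
import struct

ENTERPRISE_BIT = 0x8000

def field_specifier(element_id, length, enterprise_number=None):
    """Encode one template field specifier as it appears on the wire."""
    if enterprise_number is None:
        # Standard (IANA) Information Element: 2-byte ID + 2-byte length.
        return struct.pack("!HH", element_id, length)
    # Enterprise-specific element: set the high bit and append the PEN.
    return struct.pack("!HHI", element_id | ENTERPRISE_BIT, length, enterprise_number)

# Standard element 8 (sourceIPv4Address, 4 bytes) ...
print(field_specifier(8, 4).hex())
# ... and a hypothetical vendor element 100 (8 bytes) under PEN 99999.
print(field_specifier(100, 8, enterprise_number=99999).hex())
```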

Related standards

NSEL

NSEL (NetFlow Security Event Logging) allows exporting flow data from Cisco's ASA family of security devices. It has a format similar to NetFlow, but it requires a different interpretation and has different use cases: the purpose of NSEL is to track firewall events and logs via NetFlow. Unfortunately, the terminology sometimes causes confusion and people consider NSEL compatible with NetFlow. In fact, NSEL does not carry enough information to provide traffic charts or to support detailed drill-downs and troubleshooting.

sFlow

Unlike NetFlow, sFlow is based on sampling. An sFlow agent obtains traffic statistics using sFlow sampling, encapsulates them into sFlow packets, which are then sent to the collector. sFlow provides two sampling modes – flow and counter sampling:

Flow sampling – the sFlow agent samples packets in one or both directions on an interface based on the sampling ratio and parses them to obtain information about the packet contents.

Counter sampling – the sFlow agent periodically obtains traffic statistics on an interface.

Flow sampling focuses on traffic details to monitor and parse traffic behaviors on the network while counter sampling focuses on general traffic statistics.

Due to packet sampling, however, it is not possible to obtain an accurate representation of the traffic, and some traffic will be missed. Sampling can therefore limit the use of flow data for tasks such as network anomaly detection. On the other hand, it can still be used for top statistics or DDoS attack detection. Cisco has introduced a very similar technology to sFlow called NetFlow Lite.
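
The effect of sampling can be illustrated with a small sketch: with a 1:N sampling ratio, observed counts are scaled back up by N, which works reasonably well for large flows and top talkers but makes short or low-volume flows disappear. The ratio and traffic volume below are made-up values for illustration.

```python
# Illustration of 1:N packet sampling and the estimation error it introduces.
import random

SAMPLING_RATIO = 1000          # sample roughly 1 out of every 1000 packets
random.seed(42)

true_packets = 250_000         # a hypothetical flow
sampled = sum(1 for _ in range(true_packets) if random.randrange(SAMPLING_RATIO) == 0)
estimate = sampled * SAMPLING_RATIO

print(f"true={true_packets}, sampled={sampled}, estimate={estimate}")
# Large flows are estimated reasonably well, but a flow of only a few dozen
# packets will often produce zero samples and vanish from the statistics.
```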

What formats does Flowmon support?

Our standalone Probe allows exporting flow data in NetFlow v5/v9 and IPFIX format. Additionally, the Probe can use the Flowmon IPFIX extension that allows enriching the flow data with additional information, such as network performance statistics (for example, Round-Trip Time, Server Response Time and Jitter) and information from the application protocols (HTTP, DNS, DHCP, SMB, E-mail, MSSQL and others).

The Flowmon Collector can process network traffic statistics from various sources and flow standards, including:

  • NetFlow v5/v9
  • IPFIX
  • NetStream
  • jFlow
  • cflowd
  • sFlow, NetFlow Lite

Conclusion: which flow format to use?

We have introduced the most common flow formats. Although the format you can use depends on your network infrastructure, from our experience in implementing high-performance network monitoring appliances, we highly recommend using NetFlow v9/IPFIX export formats, as they provide the most accurate and comprehensive information.

Firewall Performance Testing with Xena VulcanBay

In this concrete test case we used a Xena VulcanBay with 2x 40 Gbps QSFP+ interfaces to test the performance of several next-generation firewalls. Specifically, we were interested in the following test scenarios:

  • Pure throughput
  • High number of connections (session load)
  • Use of NAT
  • Realistic traffic
  • Longer testing periods during which we "pushed" new firewall rules to detect potential throughput drops

In this article we want to show how we used the Xena VulcanBay including its management, the VulcanManager, and a Cisco Nexus Switch to connect the firewall clusters. We list our test scenarios and give some hints about potential stumbling blocks.

For our tests we had a Xena VulcanBay Vul-28PE-40G with firmware version 3.6.0, licenses for both 40 G interfaces and the full 28 Packet Engines available. The VulcanManager ran on version 2.1.23.0. Since we used only a single VulcanBay (and not several at distributed locations), the single admin user could distribute the full 28 Packet Engines equally across the two ports.

For tests with up to 80 G throughput two QSFP+ modules (left) as well as the distribution of the packet engines on these ports (right) were sufficient.

Wiring

We used a single Cisco Nexus switch with sufficient QSFP+ modules and corresponding throughput to connect the VulcanBay to the respective firewall clusters. Since all firewall clusters as well as the VulcanBay were connected to this switch at the same time, and we always used the same IPv4/IPv6 address ranges for the tests, we could choose which firewall manufacturer to test simply with a "shutdown / no shutdown" of individual interfaces. The complete laboratory could thus be controlled remotely, which is very practical for the typical home-office situation. It was also easy to connect the VulcanBay "to itself" in order to obtain meaningful reference values for all tests. For this purpose, both 40 G interfaces of the VulcanBay were temporarily configured in the same VLAN.

With two lines each for client and server, all firewall clusters were connected to a central switch, as was the VulcanBay from Neox Networks.

Note that some switches have QSFP+ modules that are designed as 4x 10 G and *not* as 1x 40 G. To connect the VulcanBay with its 40 G interfaces, the latter (true 1x 40 G) is required.

Thanks to modern QSFP+ slots with 40 G interfaces, a duplex throughput of 80 Gbit/s can be achieved with just two connections.

IP Subnets

In our case we wanted to test different firewalls in Layer 3 mode. In order to integrate these routed devices under test (DUT), we created appropriate subnets, both for the outdated IPv4 protocol and for IPv6. The IP subnets simulated by the VulcanBay are attached directly to the firewall. In the case of a /16 IPv4 network, exactly this /16 network must also be configured on the firewall. Especially important is the default gateway, for example 10.0.0.1 for the client IPv4 network. If you additionally enable the "Use ARP" option (right side), you do not have to worry about the displayed MAC addresses; the VulcanBay resolves them itself.

The address range must be adjusted so that the tests performed are not equivalent to MAC flooding.

The same applies to IPv6. Here the network is not entered in the usual slash notation; instead, only the gateway and the address range are specified. Via "Use NDP" the VulcanBay automatically resolves Layer 3 IPv6 addresses to Layer 2 MAC addresses.

The "Use Gateway" option tells the VulcanBay that an intermediate router/firewall should be used for the tests.

MAC flooding! Depending on the test scenarios used, the VulcanBay may simulate millions of IPv4/IPv6 addresses in the client or server segment. For every intermediate switch or firewall this is pure MAC address flooding. Even common high-end devices can hold a maximum of about 128 k MAC addresses in their MAC address table. If you leave the Xena Networks defaults of 16 million (!) IPv4 addresses or 1.8 x 10^19 IPv6 addresses in place, any test results are meaningless. We therefore strongly recommend reducing the address ranges to realistic values from the beginning, as shown in the screenshot above (yellow marking: 65 k addresses).
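
The arithmetic behind this recommendation is straightforward; the numbers below simply restate the ranges mentioned above.

```python
# Why the default address ranges overflow a typical MAC address table.
mac_table_capacity = 128 * 1024      # ~128 k entries on a high-end switch

default_ipv4_hosts = 16_000_000      # the Xena default range mentioned above
reduced_ipv4_hosts = 65_000          # the reduced range from the screenshot

print(default_ipv4_hosts > mac_table_capacity)   # True: the table overflows
print(reduced_ipv4_hosts > mac_table_capacity)   # False: fits with room to spare
```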

For reference values, the VulcanBay was also "connected to itself" for all tests. While IPv4 allowed using the same subnets with different address ranges, IPv6 required address ranges within the *same* /64 prefix.

Testcases

1) Pure throughput: In the first test scenario we were purely concerned with the throughput of the firewalls. For this we chose the "Pattern" scenario, once for IPv4 and once for IPv6, which automatically sets the ratio to 50-50. In the settings we additionally selected "Bidirectional" to push data through in both directions, i.e. duplex, in both cases, so that we could reach the maximum throughput of 80 G with the 2x 40 G interfaces. In order to distribute the bandwidth over several sessions (which is the more realistic test case), we selected 1000 users, each establishing connections from 100 source ports to 10 servers. That makes 1 million sessions each for IPv4 and IPv6. With a ramp-up time of 5 seconds, i.e. a smooth increase of the connections instead of immediate full load, the actual test then ran for 120 seconds, followed by a ramp-down time of 5 seconds.
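
For reference, the session count and duration of this scenario can be written down in a few lines; the numbers simply restate the parameters above.

```python
# Session count and load profile for the throughput scenario described above.
users, src_ports, servers = 1000, 100, 10
sessions_per_family = users * src_ports * servers
print(sessions_per_family)                 # 1,000,000 sessions for IPv4
print(2 * sessions_per_family)             # 2,000,000 for IPv4 + IPv6 combined

ramp_up, steady, ramp_down = 5, 120, 5     # seconds
print(ramp_up + steady + ramp_down)        # 130 s total test duration
```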

Test scenario “Pattern” with a 50-50 distribution of IPv4 and IPv6. The “Load Profile” (right) shows the users to be simulated using the time axis.

During the test, the VulcanManager already displays some useful data, such as TCP Connections or Layer 1 throughput. By means of the graphics in the upper area, one gets a good impression at a glance. In the following screenshot you can see that the number of active connections is less than half of the planned one (bad), while Layer 5-7 Goodput has an unattractive kink at the beginning of the test. Both problems turned out to be errors in the IPv6 implementation of the tested device.

While theoretically 2 million sessions at 80 G throughput should have passed the firewall, less than half of them got through cleanly.

Note that the "Active Sessions" graph does not show the actual active sessions but the number of simulated users, both in the Live View during the test and in the later PDF report. The graph is correct for the 2000 users, but there were actually 2 million sessions during the test.

2) High number of connections (session load): In this test, 20 million parallel TCP sessions (again split across IPv4 and IPv6) were established and maintained. Not only the total number of sessions was relevant, but also the short ramp-up time of only 30 seconds, which corresponds to a setup rate of roughly 667,000 connections per second! The sessions were then left standing for 60 seconds without transferring any data, and over a further 30 seconds they were torn down again, as is typical for TCP, via FIN-ACK. The aim was that the firewalls under test would firstly let the connections through cleanly and secondly tear them down cleanly again (and thus free up their memory).

Before each test we deleted the MAC address table on the switch as well as the session, ARP and NDP caches on the firewalls. So every test was done from zero to zero.

3) NAT scenarios: The same test as under 1) was used, with the only difference that the IPv4 connections from the client network to the server network were provided with a source NAT on the firewalls. The goal was to find out if this would cause a performance degradation of the firewalls.

4) Realistic Traffic: With a predefined "Datacenter Mix" we were able to simulate the flow of two HTTPS, SMB2, LDAP and AFS (via UDP and TCP) connections for several thousand users with just a few clicks. This was not about a full load test of the firewalls, but about setup and teardown rates as well as application detection. Depending on whether the firewalls' app IDs were activated or deactivated, there were major differences here.

5) 10 minutes of continuous fire with commits: This somewhat more specific test consisted of scenarios 1 and 4, i.e. full load (1) with constant session setup and shutdown (4) at the same time. This ran constantly for 10 minutes, while we installed another 500 rules on each firewall. Here we wanted to find out if this process creates a measurable kink in throughput on the firewalls, which was partly the case.

Test results

At the end of each test, the VulcanManager displays the Statistics and Reporting page with all possible details. Via "Create Report" you can generate a PDF that contains, besides all the details, information about the selected test scenario as well as about the tested device. The challenge is to distinguish the relevant numbers from the less relevant ones and to place them in the right context in order to get meaningful results. For our comparisons of different next-generation firewalls we restricted ourselves to the "Layer 1 steady throughput (bps)" for the throughput test and the "Successful TCP Connections" for the connection test. Compared to the reference values at which the VulcanBay was connected to itself, this already yielded meaningful, comparable results that could easily be presented both in table form and graphically.

The Statistics and Reporting page provides a rough overview (middle) and the possibility to read test values from all OSI layers and the selected test scenarios (left, fold-out tabs).

Excerpt from a PDF report showing the full details.

The various predefined "Application Mix" scenarios from Xena Networks are not intended for direct comparison of firewall performance values, but for the targeted generation of network traffic. This way, application detection can be checked, or other scenarios running in parallel can be "stressed" a little more.

Further Features

Note that the VulcanManager has some other interesting features that we did not use in this case study, such as TLS Traffic (for testing TLS interception) and Packet Replay (for testing custom, more specific scenarios extracted from uploaded PCAPs). We also did not use many of the application- or protocol-oriented test scenarios such as Dropbox, eBay, LinkedIn or HTTPS, IMAP, NFS, because our tests were strongly focused on pure throughput and the number of sessions.

Conclusion

The VulcanBay from XENA Networks is the ideal test device for comparing various next-generation firewalls. Within a very short time we had configured and tested various test scenarios. Only the abundance of test results was initially overwhelming. The trick was to concentrate on the relevant information.

Up to 14x Wireshark Performance Increase – Napatech Link™ Capture Software for Napatech SmartNIC

Solution Description

Wireshark is a widely-used network protocol analyzer allowing users to see what is happening on their networks at a microscopic level. It is the de facto standard across many commercial and non-profit enterprises, government agencies, and educational institutions for troubleshooting and protocol analysis.

Wireshark has a rich feature set including deep inspection of hundreds of protocols, live capture and offline analysis. However, as capable as Wireshark is at inspecting and analyzing network protocols, it is only as effective as the capture setup it runs on.

The ability to capture and analyze traffic at lossless rates is of the utmost importance for Wireshark to be successful. To decode all traffic, it is a fundamental requirement that Wireshark “sees everything”. If any traffic is missed, full protocol analysis is not possible. And if the capture server is overburdened and too slow to handle the incoming packet rate, packets are discarded, and information lost forever.

But examining the contents of every network packet is extremely CPU-intensive, especially for a multi-gigabit traffic load. And this is the limiting factor in Wireshark performance: the packet processing on the CPU.

In addressing this challenge, Napatech has created a hardware acceleration solution, based on the Napatech Link™ Capture Software, that alleviates the load on the CPU and thereby greatly increases Wireshark capture performance.

Key Solution Features

  • Lossless capture and protocol decode for up to 13 Gbps on a single thread for traffic analysis, inspection and detection
  • Onboard packet buffering during micro-burst or PCI Express bus congestion scenarios
  • Advanced host memory buffer management enabling ultra-high CPU cache performance
  • Packet classification, match/action filtering and zero-copy forwarding
  • Intelligent and flexible load distribution to as many as 64 queues improving CPU cache performance by always delivering the same flows to the same cores
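
The last feature, keeping all packets of a flow on the same core, is typically achieved by hashing the flow key onto a queue index. The sketch below illustrates the general idea only; it is not Napatech's actual hashing algorithm or API.

```python
# Sketch of flow-aware load distribution: the same 5-tuple always maps to the
# same queue, so one CPU core sees all packets of a given flow (illustrative
# only; this is not the Napatech hashing algorithm or API).
import hashlib

NUM_QUEUES = 64

def queue_for_flow(src_ip, dst_ip, src_port, dst_port, proto):
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % NUM_QUEUES

# Both packets of this flow land in the same queue, preserving cache locality.
print(queue_for_flow("10.0.0.1", "192.0.2.7", 51514, 443, 6))
print(queue_for_flow("10.0.0.1", "192.0.2.7", 51514, 443, 6))
```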

The Napatech difference

The Napatech Link™ Capture Software dramatically increases capture and protocol analysis performance, allowing network engineers to utilize the full power of Wireshark to understand network traffic, find anomalies and diagnose network issues at incredible speeds. The solution offloads processing and analysis of network traffic from the application software while ensuring optimal use of the standard server's resources, leading to effective Wireshark acceleration.

Outstanding lossless performance

Optimized to capture all network traffic at full line rate, with almost no CPU load on the host server, the solution demonstrates enormous lossless performance advantages for Wireshark: up to 14x lossless capture and decode performance compared to a standard network interface card (NIC).

Turning acceleration into value

These performance advantages ultimately allow you to:

  • Maximize your server performance by improving CPU utilization
  • Minimize your TCO by reducing the number of servers, thus optimizing rack space, power, cooling and operational expenses
  • Diminish your time-to-resolution, thereby enabling greatly increased efficiency

Test configuration

The outstanding improvements achieved with this solution were demonstrated by comparing Wireshark performance running on a Dell PowerEdge R740 with a standard 40G NIC and with the Napatech NT200 SmartNIC with Link™ Capture Software. Test configuration: dual-socket Dell R740 with Intel® Xeon® Gold 6138 2.0 GHz, 128 GB RAM, running Ubuntu 14.04 LTS.

Lossless throughput tests

For the lossless throughput test, traffic was sent at fixed rates and packet sizes and throughput was measured as the rate at which Wireshark is able to receive and analyze the packets.

Additional testing for “back-to-back frames” was applied as described in the RFC 2544 benchmarking methodology to send a burst of frames with minimum inter-frame gaps to the Device Under Test (DUT) and count the number of frames received/forwarded by the DUT. The back-to-back value is defined as the number of frames in the longest burst that the DUT can handle without the loss of any frames. With same-size capture buffer configurations, the Napatech SmartNIC delivers 60 times higher back-to-back frame performance. When required for highly bursty traffic patterns, the Napatech solution can allocate significantly larger host buffers, providing hundreds of times higher back-to-back capture performance.
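
For readers unfamiliar with the methodology, the back-to-back value can be found with a simple search over the burst length. The sketch below shows only the control logic against a toy stand-in for the DUT; it is not a Napatech or Xena API.

```python
# Control logic of an RFC 2544 back-to-back test: find the longest burst of
# minimum-gap frames the DUT forwards without loss. The "DUT" here is a toy
# stand-in (a hypothetical 4096-frame buffer), not a real device or API.
DUT_BUFFER_FRAMES = 4096

def send_burst_and_count(burst_len: int) -> int:
    """Pretend to send `burst_len` back-to-back frames and count what returns."""
    return min(burst_len, DUT_BUFFER_FRAMES)   # frames beyond the buffer are lost

def back_to_back_value(max_burst: int) -> int:
    lo, hi, best = 1, max_burst, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        if send_burst_and_count(mid) == mid:   # whole burst forwarded, no loss
            best, lo = mid, mid + 1            # try a longer burst
        else:
            hi = mid - 1                       # burst too long, shorten it
    return best

print(back_to_back_value(1_000_000))           # -> 4096 for this toy DUT
```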

Napatech Link™ Capture Software

The stunning benchmarks for Wireshark were achieved by deploying Napatech’s Reconfigurable Computing Platform, based on FPGA-based Link™ Capture Software and Napatech SmartNIC hardware.

Napatech’s Reconfigurable Computing Platform flexibly offloads, accelerates and secures open, standard, high-volume and low-cost server platforms allowing them to meet the performance requirements for networking, communications and cybersecurity applications.

Wireshark

Wireshark, one of the industry's foremost network protocol analyzers, is an ideal example of the type of critical enterprise application that can achieve better performance through hardware acceleration with the Napatech Link™ Capture Software.

Wireshark can be compiled with native support for hardware acceleration based on the Intel hardware and Napatech software. Instructions specific to building Wireshark with support for Napatech are listed in the Installation Quick Guide available at the Napatech Documentation Portal.

Ethernet packets don't lie – well, at least in most cases.

They tell the truth, as long as they were not captured incorrectly. In those cases, packets can indeed tell bold-faced lies.

When digging through trace files, we can come across symptoms in the packets that would make many people frown in surprise. These are events that seem strange on the surface and can even sidetrack our troubleshooting for a while. Some of these issues have actually misled network analysts for hours, if not days, sending them chasing problems and events that simply do not exist on the network.

Most of these examples can easily be avoided by capturing the packets from a network Test Access Point (TAP) instead of on the machine where the traffic is generated. With a network TAP you can capture the network data transparently and unaltered and see what is really being transmitted over the wire.

Very large packets

In most cases, packets should not be larger than the Ethernet maximum of 1518 bytes, or whatever has been set as the link MTU. However, this only holds if we are not using 802.1Q tags and are not in a jumbo frame environment.

How is it possible to have packets that are larger than the Ethernet maximum? Simply put, we capture them before they are segmented by the NIC. Many TCP/IP stacks nowadays use TCP Segmentation Offload, which delegates the work of segmenting packets to the NIC. The WinPcap or libpcap driver captures the packets before this happens, so some of the packets can appear far too large to be legitimate. If the same activity were captured on the network, these large frames would be segmented into several smaller ones for transport.
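
A quick way to spot this symptom is to flag frames in a host-side capture that exceed the link maximum. The sketch below does this with scapy; the capture file name and the 1518-byte limit are assumptions taken from the text above.

```python
# Sketch (assumptions: scapy installed, a capture file "host_capture.pcap"
# taken on the sending host; 1518-byte Ethernet maximum as in the article).
from scapy.all import rdpcap

MAX_FRAME = 1518  # Ethernet maximum without 802.1Q tags or jumbo frames

packets = rdpcap("host_capture.pcap")
for i, pkt in enumerate(packets, start=1):
    if len(pkt) > MAX_FRAME:
        # Frames larger than the link maximum usually mean the capture was
        # taken before TCP Segmentation Offload split them on the NIC.
        print(f"frame {i}: {len(pkt)} bytes - likely captured before TSO")
```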

Zero delta times

Zero delta times mean that there is no measured time between packets. Normally, when packets arrive at the capture device they receive a timestamp and a measurable delta time; here, the timestamping on the capture device could not keep up with the packet rate. If these packets were instead captured with an external TAP-based capture device, we would most likely get clean timestamps.

Previous packets not captured

This warning appears because Wireshark has noticed a gap in the TCP stream. Based on the sequence numbers, it can tell that a packet is missing. Sometimes this is legitimate, caused by upstream packet loss. However, it can also be a symptom that the analyzer or SPAN dropped the packet because it could not keep up with the load.

After this warning, you should look for a series of duplicate ACK packets rather than a malformed packet. Duplicate ACKs indicate that a packet really was lost and has to be retransmitted. If you see no retransmission and no malformed packets, the analyzer or SPAN probably could not keep up with the data stream. The packet was actually on the network, but we simply did not see it.
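
To get a rough feel for whether real loss occurred, you can count pure duplicate ACKs per TCP flow in the trace. The sketch below uses scapy; the file name and the threshold of two duplicates are arbitrary assumptions.

```python
# Sketch (assumptions: scapy installed, "trace.pcap" is the capture under
# analysis); counts pure duplicate ACKs per TCP flow as a rough indicator.
from collections import defaultdict
from scapy.all import rdpcap, IP, TCP

last_ack = {}                       # flow -> last ACK number seen
dup_acks = defaultdict(int)         # flow -> duplicate ACK count

for pkt in rdpcap("trace.pcap"):
    if IP in pkt and TCP in pkt and pkt[TCP].flags & 0x10:    # ACK flag set
        flow = (pkt[IP].src, pkt[TCP].sport, pkt[IP].dst, pkt[TCP].dport)
        if last_ack.get(flow) == pkt[TCP].ack and len(pkt[TCP].payload) == 0:
            dup_acks[flow] += 1     # same ACK number again, no data: dup ACK
        last_ack[flow] = pkt[TCP].ack

for flow, count in dup_acks.items():
    if count >= 2:                  # several dup ACKs hint at real packet loss
        print(flow, count)
```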

TCP ACKed unseen segment

In this case, an acknowledgment is shown for a data packet that was never captured. The data packet may have taken a different path, or the capture device simply missed it.

I recently saw these events in trace files captured on switches, routers and firewalls. Since capturing traffic has a lower priority than forwarding it (thank goodness!), the device can simply miss some of the frames in the data stream. Since we saw the acknowledgment, we know that the packet made it to its destination.

For the most part, packets tell the truth. They can lead us to the root cause of our network and application problems. Because they present such clear and detailed data, it is very important that we capture them as close to the wire as possible. That means capturing them in transit, not on the server itself. This helps us avoid wasting time on false negatives.

If you want to learn more about network visibility considerations for professionals, download our free white paper, TAP vs SPAN.

https://www.garlandtechnology.com/tap_vs_span_lp