Network Blog

Ensuring performance resilience with deduplication

Posted on 5. December 2019 / 18. August 2022 by Tobias Schaller


Performance resilience is the ability to ensure the performance of your commercial or home-made appliance in any data center environment. In other words, to ensure that your performance monitoring, cybersecurity or forensics appliance is resilient to common data center issues, such as badly configured networks, inability to specify desired connection type, time sync, power, space, etc.

In this blog, we will look at deduplication and how support of deduplication in your SmartNIC ensures performance resilience when data center environments are not configured properly – router and switch SPAN ports specifically.

Assume the worst

When designing an appliance to analyze network data for monitoring performance, cybersecurity or forensics, it is natural to assume that the environments where your appliance will be deployed are configured correctly and adhere to best practices. It is also fair to assume that you can get the access and connectivity you need. Why would someone go to the trouble of paying for a commercial appliance or even fund the development of an appliance in-house, if they wouldn’t also ensure that the environment meets minimum requirements?

Unfortunately, it is not always like that, as many veterans of appliance installations will tell you. This is because the team responsible for deploying the appliance is not always the team responsible for running the data center, and appliances are not the data center team's first priority. So what happens in practice is that the team deploying the appliance is told to install it in a specific location with specific connectivity, and that is that. You might prefer to use a tap, but that might not be available, so you need to use a Switched Port Analyzer (SPAN) port from a switch or router for access to network data.

While this might seem acceptable, it can lead to some unexpected and unwanted behavior that is responsible for those grey hairs on the heads of veterans! An example of this unwanted behavior is duplicate network packets.

How do duplicate packets occur?

Ideally, when performing network monitoring and analysis, you would like to use a tap to get direct access to the real data in real time. However, as we stated above, you can’t always dictate that and sometimes have to settle for connectivity to a SPAN port.

The difference between a tap and a SPAN port is that a tap is a physical device that is installed in the middle of the communication link so that all traffic passes through the tap and is copied to the appliance. Conversely, a SPAN port on a switch or router receives copies of all data passing through the switch, which can then be made available to the appliance through the SPAN port.

When configured properly, a SPAN port works just fine. Modern routers and switches have become better at ensuring that the data provided by SPAN ports is reliable. However, SPAN ports can be configured in a manner that leads to duplicate packets. In some cases, where SPAN ports are misconfigured, up to 50% of the packets provided by the SPAN port can be duplicates.

So, how does this occur? What you need to understand with respect to SPAN ports is that when a packet enters the switch on an ingress port, a copy is created – and when it leaves a switch on an egress port, another copy is created. In this case, duplicates are unavoidable. But it is possible to configure the SPAN to only create copies on ingress or egress from the switch, thus avoiding duplicates.
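To make the mechanism concrete, here is a minimal sketch that models a SPAN session mirroring packets on ingress, on egress, or on both. The `Packet` class and the `span_mirror` and `duplicate_share` helpers are names made up for this illustration and do not correspond to any switch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    packet_id: int
    payload: bytes


def span_mirror(packets, mirror_ingress=True, mirror_egress=True):
    """Return the copies a SPAN port would emit for packets crossing one switch."""
    mirrored = []
    for pkt in packets:
        if mirror_ingress:
            mirrored.append(pkt)  # copy taken when the packet enters the switch
        if mirror_egress:
            mirrored.append(pkt)  # copy taken when the packet leaves the switch
    return mirrored


def duplicate_share(mirrored):
    """Fraction of mirrored packets that repeat a packet already seen."""
    if not mirrored:
        return 0.0
    return 1 - len(set(mirrored)) / len(mirrored)


traffic = [Packet(i, b"data") for i in range(1000)]
both = span_mirror(traffic)                            # ingress and egress mirrored
ingress_only = span_mirror(traffic, mirror_egress=False)

print(duplicate_share(both))          # 0.5
print(duplicate_share(ingress_only))  # 0.0
```

With both directions mirrored, every packet shows up twice, which is where the 50% figure comes from. Mirroring only one direction removes the duplicates in this simplified model.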

Nevertheless, it is not uncommon to arrive in a data center environment where SPAN ports are misconfigured and nobody has permission to change the configuration on the switch or router. In other words, there will be duplicates and you just have to live with it!

What is the impact of duplicates?

Duplicates can cause a lot of issues. The obvious issue is that double the amount of data requires double the amount of processing power, memory, power, etc. However, the main issue is false positives: errors that are not really errors or threats that are not really threats. One common way that duplicates affect analysis is by an increase in TCP out-of-order or retransmission warnings. Debugging these issues takes a lot of time, usually time that an overworked, understaffed network operations or security team does not have. In addition, any analysis performed on the basis of this information is probably not reliable, so this only exacerbates the issue.
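The false-positive effect can be illustrated with a deliberately naive retransmission check. This is only a sketch: real analyzers use far more elaborate TCP state tracking, and the function name, flow tuple and sequence numbers below are assumptions chosen for the example.

```python
def count_retransmission_warnings(segments):
    """Count segments whose (flow, sequence number) pair was already seen.

    This is a naive heuristic: any repeated pair is reported as a retransmission.
    """
    seen = set()
    warnings = 0
    for flow, seq in segments:
        if (flow, seq) in seen:
            warnings += 1
        seen.add((flow, seq))
    return warnings


# Hypothetical flow: (source IP, source port, destination IP, destination port)
flow = ("10.0.0.1", 40000, "10.0.1.1", 443)

# Ten in-order segments of 1460 bytes each, without any real retransmission
clean = [(flow, seq) for seq in range(0, 14600, 1460)]

# The same capture as delivered by a SPAN port that mirrors ingress and egress
duplicated = [segment for segment in clean for _ in range(2)]

print(count_retransmission_warnings(clean))       # 0
print(count_retransmission_warnings(duplicated))  # 10
```

The underlying connection is perfectly healthy in both cases. Only the duplicated capture produces warnings, one for every original segment.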

How to achieve resilience

With deduplication built-in via a SmartNIC in the appliance, it is possible to detect up to 99.99% of duplicate packets produced by SPAN ports. Similar functionality is available on packet brokers, but for a sizeable extra license fee. On Napatech SmartNICs, this is just one of several powerful features delivered at no extra charge.

The solution is ideal for situations where the appliance is connected directly to a SPAN port, dramatically reducing the amount of damage that duplicates can cause. But, it also means that the appliance is resilient to any SPAN misconfigurations or other network architectural issues that can give rise to duplicates – without relying on other costly solutions, such as packet brokers, to provide the necessary functionality to complete the solution.


Napatech Smart FPGA NICs: 50% Data Reduction with built-in Deduplication

Posted on 21. October 2019 / 11. November 2022 by Patrick Nixdorf


The challenge
More than 50% copies

Duplicate packets are a major burden for today’s network monitoring and security applications. In the worst cases, more than 50% of the received traffic is sheer replication. This not only adds excessive pressure in terms of bandwidth, processing power, storage capacity and overall efficiency, it also places severe strain on operations and security teams, who end up wasting valuable time chasing false positives. Napatech’s intelligent deduplication capabilities solve this by identifying and discarding any duplicate packets, thus enabling up to a 50% reduction in application data load.

Misconfigured SPAN ports

For passive monitoring and security applications, duplicate packets can make up more than 50% of the total traffic volume. This is partly due to TAP and aggregation solutions collecting packets from multiple points in the network – and partly due to misconfigured SPAN ports; a much too common issue in today’s datacenters.

Solution: intelligent deduplication

With deduplication built in via a SmartNIC in the appliance, it is possible to detect all duplicate packets. By analyzing and comparing incoming packets with previously received/stored data, deduplication algorithms discard any replicas, thus easing the burden on the system and greatly improving performance.

Hardware vs Software Deduplication Comparison

Significant cost benefits

By adding deduplication in hardware via a Napatech SmartNIC, significant cost benefits can be achieved at various levels:

At a PERFORMANCE level
For the vast majority of capture deployments, deduplication will dramatically save system resources. By efficiently discarding redundant copies, deduplication can reduce the processing load, PCIe transfer, system memory and disk space requirements by as much as 50%.
At an OPERATIONAL level
At an operational level, the main issue with duplicate packets is that they distort the overview. But with deduplication, operations and security teams avoid wasting valuable time investigating false positives.
At an APPLICATION level
Similar functionality is available on network packet brokers, but for a sizeable extra license fee. On Napatech SmartNICs, deduplication is just one of several powerful features delivered at no extra charge.

Key features

Deduplication in hardware up to 2x100G
Deduplication key calculated as a hash over configurable sections of the frame
Dynamic header information (e.g. TTL) can be masked out from the key calculation
Deduplication can be enabled/disabled per network port or network port group
Configurable action per port group: Discard or pass duplicates / Duplicate counters per port group
Configurable deduplication window: 10 microseconds – 2 seconds
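The features above can be sketched in software. The following is an illustration of the general technique only (hash key over part of the frame, masked dynamic fields, time window), not Napatech's hardware implementation. The `Deduplicator` class, its parameters and the `make_frame` helper are hypothetical, and the byte offsets assume untagged Ethernet carrying IPv4, where the TTL sits at frame offset 22 and the IP header checksum at offsets 24 and 25.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Software sketch of window-based deduplication (illustrative only)."""

    def __init__(self, window_s=0.001, key_slice=slice(14, None),
                 masked_offsets=(22, 24, 25)):
        self.window_s = window_s              # how long a key is remembered
        self.key_slice = key_slice            # frame section that feeds the hash
        self.masked_offsets = masked_offsets  # dynamic bytes zeroed before hashing
        self._seen = OrderedDict()            # key -> timestamp, oldest first

    def is_duplicate(self, frame: bytes, timestamp: float) -> bool:
        # Forget keys that fell out of the deduplication window.
        # Timestamps are assumed to be non-decreasing.
        while self._seen and next(iter(self._seen.values())) < timestamp - self.window_s:
            self._seen.popitem(last=False)
        data = bytearray(frame)
        for offset in self.masked_offsets:
            if offset < len(data):
                data[offset] = 0
        key = hashlib.blake2b(bytes(data[self.key_slice]), digest_size=8).digest()
        if key in self._seen:
            return True
        self._seen[key] = timestamp
        return False


def make_frame(ttl, checksum, payload=b"hello"):
    """Build a minimal Ethernet + IPv4 frame for the demonstration."""
    eth = bytes(12) + b"\x08\x00"          # zeroed MAC addresses, EtherType IPv4
    ip = bytearray(20)
    ip[0] = 0x45                            # version 4, header length 5 words
    ip[8] = ttl
    ip[10:12] = checksum.to_bytes(2, "big")
    return eth + bytes(ip) + payload


dedup = Deduplicator(window_s=0.001)
first = dedup.is_duplicate(make_frame(64, 0x1234), timestamp=0.0)
second = dedup.is_duplicate(make_frame(63, 0x1334), timestamp=0.0002)  # routed copy
third = dedup.is_duplicate(make_frame(63, 0x1334), timestamp=0.5)      # outside window
print(first, second, third)  # False True False
```

Masking the TTL and the header checksum lets a copy taken after a routing hop still match the original, and starting the key at offset 14 keeps rewritten MAC addresses out of the hash. The third frame is not reported because its earlier twin is older than the configured window.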

Want to reduce data duplication by as much as 50%? Contact us today!


Application Risk Management

Posted on 27. August 2019 / 8. November 2022 by Timur Özcan


Firewall Performance Testing with Xena VulcanBay

Posted on 14. May 2019 / 18. July 2022 by Patrick Nixdorf


In this concrete test case we used a Xena VulcanBay with 2x 40 Gbps QSFP+ interfaces to test some next-generation firewalls regarding their performance. Specifically, we were interested in the following test scenarios:

Pure throughput
High number of connections (session load)
Use of NAT
Realistic traffic
Longer testing periods during which we “pushed” new firewall rules to detect potential throughput drops

In this article we want to show how we used the Xena VulcanBay including its management, the VulcanManager, and a Cisco Nexus Switch to connect the firewall clusters. We list our test scenarios and give some hints about potential stumbling blocks.

For our tests we had a Xena VulcanBay Vul-28PE-40G with firmware version 3.6.0, licenses for both 40 G interfaces and the full 28 Packet Engines available. The VulcanManager ran on version 2.1.23.0. Since we only used one single VulcanBay (and not several at distributed locations), the only admin user was able to distribute the full 28 Packet Engines equally on these two ports.

For tests with up to 80 G throughput two QSFP+ modules (left) as well as the distribution of the packet engines on these ports (right) were sufficient.

Wiring

We used a single Cisco Nexus switch with sufficient QSFP+ modules and corresponding throughput to connect the VulcanBay to the respective firewall clusters. Since all firewall clusters as well as the VulcanBay were connected to this switch at the same time, and we always used the same IPv4/IPv6 address ranges for the tests, we could select which firewall manufacturer to test purely with a “shutdown / no shutdown” of individual interfaces. The complete laboratory could thus be controlled remotely, which is very practical for the typical home office employee. It was also easy to connect the VulcanBay “to itself” in order to get meaningful reference values for all tests. For this purpose, both 40 G interfaces of the VulcanBay were temporarily configured in the same VLAN.

With two lines each for client and server, all firewall clusters were connected to a central switch. Also the VulcanBay from Neox Networks.

There are switches with QSFP+ modules that are designed as 4x 10 G and *not* as 1x 40 G. For the connection of the VulcanBay with its 40 G interfaces, the latter is required.

Thanks to modern QSFP+ slots with 40 G interfaces, a duplex throughput of 80 Gbit/s can be achieved with just two connections.

IP Subnets

In our case we wanted to test different firewalls in Layer 3 mode. To integrate this routing “Device Under Test” (DUT), we created appropriate subnets, both for the outdated IPv4 protocol and for IPv6. The IP subnets simulated by VulcanBay are then directly attached to the firewall. In the case of a /16 IPv4 network, exactly this /16 network must also be configured on the firewall. The default gateway is especially important, for example 10.0.0.1 for the client IPv4 network. If you additionally use the option “Use ARP” (right side), you do not have to worry about the displayed MAC addresses. The VulcanBay resolves these itself.
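The consistency between the simulated subnet and the firewall interface can be checked up front. The sketch below uses Python's standard `ipaddress` module. The `check_segment` helper, the network 10.0.0.0/16, the first host 10.0.1.0 and the host count are assumptions for illustration; only the /16 prefix length and the gateway 10.0.0.1 come from the text above.

```python
import ipaddress


def check_segment(network, gateway, first_host, host_count):
    """Return a list of problems with a simulated IPv4 segment (empty if none)."""
    net = ipaddress.ip_network(network)
    gw = ipaddress.ip_address(gateway)
    first = ipaddress.ip_address(first_host)
    last = first + (host_count - 1)   # last simulated address in the range
    problems = []
    if gw not in net:
        problems.append("gateway is outside the configured network")
    if first not in net or last not in net:
        problems.append("simulated address range leaves the configured network")
    if first <= gw <= last:
        problems.append("gateway address collides with the simulated range")
    return problems


# Assumed client segment: /16 network, gateway 10.0.0.1, 65,000 simulated hosts
ok = check_segment("10.0.0.0/16", "10.0.0.1", "10.0.1.0", 65_000)
bad = check_segment("10.0.0.0/16", "10.1.0.1", "10.0.1.0", 65_000)
print(ok)   # []
print(bad)  # ['gateway is outside the configured network']
```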

The address range must be adjusted so that the tests performed are not equivalent to MAC flooding.

The same applies to IPv6. Here the network is not entered in the usual slash notation; instead, only the gateway and the address range are specified. Via “Use NDP”, the VulcanBay automatically resolves the Layer 2 MAC address belonging to the Layer 3 IPv6 address.

The “Use Gateway” tells VulcanBay that an intermediate router/firewall should be used for the tests.

MAC Flooding! Depending on the test scenarios used, VulcanBay may simulate millions of IPv4/IPv6 addresses in the client or server segment. For every intermediate switch or firewall, this is a pure flood of MAC addresses. Common high-end switches can hold a maximum of 128 k MAC addresses in their MAC address table. If you keep the default ranges set by Xena Networks, 16 million (!) IPv4 addresses or 1.8 x 10^19 IPv6 addresses, any test results are meaningless. We therefore strongly recommend reducing the address ranges to realistic values from the beginning, as you can see in the screenshot above (marked in yellow: 65 k addresses).
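A quick back-of-the-envelope check makes the problem obvious. In this sketch, the interpretation of the default ranges as 2^24 IPv4 and 2^64 IPv6 addresses is an assumption that matches the rounded figures quoted above; the table capacity is the 128 k mentioned in the text, and `overflows_mac_table` is a hypothetical helper.

```python
MAC_TABLE_CAPACITY = 128_000   # "128 k" entries, as quoted in the text

default_ipv4_range = 2 ** 24   # 16,777,216, roughly 16 million (assumed default)
default_ipv6_range = 2 ** 64   # about 1.8 x 10^19 (assumed default)
reduced_range = 65_000         # realistic value used in our tests


def overflows_mac_table(simulated_addresses, capacity=MAC_TABLE_CAPACITY):
    """True if one MAC address per simulated host would overflow the table."""
    return simulated_addresses > capacity


for label, size in [("default IPv4", default_ipv4_range),
                    ("default IPv6", default_ipv6_range),
                    ("reduced", reduced_range)]:
    print(f"{label}: {size:.3g} addresses, overflow: {overflows_mac_table(size)}")
```

Only the reduced range stays below the table capacity; both defaults exceed it by orders of magnitude.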

For reference values, the VulcanBay was also “connected to itself” for all tests. With IPv4, the same network could be used on both sides with different address ranges, whereas with IPv6 both sides had to lie within the *same* /64 prefix.

Testcases

1) Pure throughput: In the first test scenario, we were purely concerned with the throughput of the firewalls. For this we chose the “Pattern” scenario, once for IPv4 and once for IPv6, which automatically sets the ratio to 50-50. In the settings we additionally selected “Bidirectional” to push data in both directions, i.e. duplex, in both cases. This allowed us to reach the maximum throughput of 80 G with the 2x 40 G interfaces. To distribute the bandwidth over several sessions (which is the more realistic case), we selected 1000 users, each establishing connections from 100 source ports to 10 servers. That makes 1 million sessions each for IPv4 and IPv6. After a ramp-up time of 5 seconds, i.e. a smooth increase of connections instead of immediate full load, the actual test ran for 120 seconds, followed by a ramp-down time of 5 seconds.
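The session count and the overall run time follow directly from these settings. The short calculation below only restates the numbers from the paragraph above; the variable names are our own.

```python
users = 1000
source_ports_per_user = 100
servers_per_user = 10

# Each user opens one session per (source port, server) combination
sessions_per_ip_version = users * source_ports_per_user * servers_per_user
total_sessions = 2 * sessions_per_ip_version   # IPv4 plus IPv6

ramp_up_s, steady_s, ramp_down_s = 5, 120, 5
total_runtime_s = ramp_up_s + steady_s + ramp_down_s

print(sessions_per_ip_version)  # 1000000
print(total_sessions)           # 2000000
print(total_runtime_s)          # 130
```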

Test scenario “Pattern” with a 50-50 distribution of IPv4 and IPv6. The “Load Profile” (right) shows the users to be simulated using the time axis.

During the test, the VulcanManager already displays some useful data, such as TCP Connections or Layer 1 throughput. By means of the graphics in the upper area, one gets a good impression at a glance. In the following screenshot you can see that the number of active connections is less than half of the planned one (bad), while Layer 5-7 Goodput has an unattractive kink at the beginning of the test. Both problems turned out to be errors in the IPv6 implementation of the tested device.

While theoretically 2 million sessions at 80 G throughput should have passed the firewall, less than half of them got through cleanly.

The graphic “Active Sessions” does not show the actual active sessions, but the number of simulated users in the Live View during the test as well as in the later PDF report. While the graph is correct for the 2000 users, there were actually 2 million sessions during the test.

2) High number of connections (session load): Again for both IPv4 and IPv6, 20 million parallel TCP sessions were established and maintained during this test. Not only the total number of sessions was relevant, but also the short ramp-up time of only 30 seconds, which corresponds to a setup rate of roughly 667,000 connections per second! The sessions were held open for 60 seconds without transferring any data. Over a further 30 seconds they were terminated again, as is typical for TCP, via FIN-ACK. The aim was to see whether the firewalls under test would first let the connections pass cleanly and then also tear them down cleanly (thus freeing their memory).
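The setup rate quoted above is simply the session count divided by the ramp-up time, as this small calculation shows (variable names are our own):

```python
total_sessions = 20_000_000
ramp_up_s, hold_s, ramp_down_s = 30, 60, 30

setup_rate_per_s = total_sessions / ramp_up_s
teardown_rate_per_s = total_sessions / ramp_down_s
total_runtime_s = ramp_up_s + hold_s + ramp_down_s

print(round(setup_rate_per_s))     # 666667
print(round(teardown_rate_per_s))  # 666667
print(total_runtime_s)             # 120
```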

Before each test we deleted the MAC address table on the switch as well as the session, ARP and NDP caches on the firewalls. So every test was done from zero to zero.

3) NAT scenarios: The same test as under 1) was used, with the only difference that the IPv4 connections from the client network to the server network were provided with a source NAT on the firewalls. The goal was to find out if this would cause a performance degradation of the firewalls.

4) Realistic Traffic: With a predefined “Datacenter Mix” we were able to simulate the flow of two HTTPS, SMB2, LDAP and AFS (via UDP and TCP) connections for several thousand users with just a few clicks. This was not about a full load test of the firewalls, but about the set-up and dismantling speeds as well as the application detections. Depending on whether the app IDs of the firewalls were activated or deactivated, there were major differences here.

5) 10 minutes of continuous fire with commits: This somewhat more specific test consisted of scenarios 1 and 4, i.e. full load (1) with constant session setup and shutdown (4) at the same time. This ran constantly for 10 minutes, while we installed another 500 rules on each firewall. Here we wanted to find out if this process creates a measurable kink in throughput on the firewalls, which was partly the case.

Test results

At the end of each test, VulcanManager displays the Statistics and Reporting page with all possible details. By “Create Report” you can create a PDF, which contains besides all details also information about the selected test scenario as well as information about the tested device. The challenge is to distinguish the relevant numbers from the less relevant ones and place them in the right context to get meaningful results. During our comparisons of different Next-Generation Firewalls we restricted ourselves to the “Layer 1 steady throughput (bps)” for the throughput test, or the “Successful TCP Connections” for the connection test. Compared to the reference values at which the VulcanBay was connected to itself, this already yielded meaningful comparable results that could be easily displayed both in table form and graphically.
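One way to put a measured value into context is to express it as a percentage of the reference value obtained with the VulcanBay connected to itself. The sketch below shows the idea. The device names and all throughput figures are hypothetical placeholders, not results from our tests, and `percent_of_reference` is a helper invented for this example.

```python
def percent_of_reference(measured, reference):
    """Express a measured value as a percentage of the loopback reference."""
    if reference <= 0:
        raise ValueError("reference value must be positive")
    return 100.0 * measured / reference


# Hypothetical placeholder numbers, NOT measurements from this test series
reference_bps = 80e9
measured_bps = {"firewall_a": 60e9, "firewall_b": 40e9}

table = {name: round(percent_of_reference(value, reference_bps), 1)
         for name, value in measured_bps.items()}

for name, share in table.items():
    print(f"{name}: {share} % of reference")
```

The same helper works for “Successful TCP Connections” in the connection test, with the reference being the connection count reached in the loopback run.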

The Statistics and Reporting page provides a rough overview (middle) and the possibility to read test values from all OSI layers and the selected test scenarios (left, fold-out tabs).

Detail of a PDF report with all details.

The various existing “Application Mix” scenarios of Xena Networks do not serve the direct comparison of firewall performance values, but the targeted generation of network traffic. This way, application detections can be checked or other scenarios executed in parallel can be “stressed” a little more.

Further Features

Note that VulcanManager has some other interesting features that we did not use in this case study, such as TLS Traffic (for testing TLS interception) and Packet Replay (for testing custom and more specific scenarios extracted from uploaded PCAPs). Also we have not used many application or protocol oriented test scenarios like Dropbox, eBay, LinkedIn or HTTPS, IMAP, NFS. This is due to our testing purposes, which were strongly focused on pure throughput and number of sessions.

Conclusion

The VulcanBay from XENA Networks is the ideal test device for comparing various next-generation firewalls. Within a very short time we had configured and tested various test scenarios. Only the abundance of test results was initially overwhelming. The trick was to concentrate on the relevant information.


Up to 14x Wireshark Performance Increase – Napatech Link™ Capture Software for Napatech SmartNIC

Posted on 13. May 2019 / 15. July 2022 by Patrick Nixdorf


Solution Description

Wireshark is a widely-used network protocol analyzer allowing users to see what is happening on their networks at a microscopic level. It is the de facto standard across many commercial and non-profit enterprises, government agencies, and educational institutions for troubleshooting and protocol analysis.

Wireshark has a rich feature set including deep inspection of hundreds of protocols, live capture and offline analysis. However, as capable as Wireshark is at inspecting and analyzing network protocols, it will only be as effective as its implementation.

The ability to capture and analyze traffic at lossless rates is of the utmost importance for Wireshark to be successful. To decode all traffic, it is a fundamental requirement that Wireshark “sees everything”. If any traffic is missed, full protocol analysis is not possible. And if the capture server is overburdened and too slow to handle the incoming packet rate, packets are discarded, and information lost forever.

But examining the contents of every network packet is extremely CPU-intensive, especially for a multi-gigabit traffic load. And this is the limiting factor in Wireshark performance: the packet processing on the CPU.

In addressing this challenge, Napatech has created a hardware acceleration solution, based on the Napatech Link™ Capture Software, that alleviates the load on the CPU and thereby greatly increases Wireshark capture performance.

Key Solution Features

Lossless capture and protocol decode for up to 13 Gbps on a single thread for traffic analysis, inspection and detection
Onboard packet buffering during micro-burst or PCI Express bus congestion scenarios
Advanced host memory buffer management enabling ultra-high CPU cache performance
Packet classification, match/action filtering and zero-copy forwarding
Intelligent and flexible load distribution to as many as 64 queues improving CPU cache performance by always delivering the same flows to the same cores
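The last point, delivering the same flows to the same cores, is commonly achieved by hashing the flow identifiers. The following sketch shows one generic way to do this in software. It is not Napatech's implementation; the `queue_for_flow` function, the use of CRC32 and the symmetric ordering of the endpoints are assumptions chosen for illustration.

```python
import zlib


def queue_for_flow(src_ip, src_port, dst_ip, dst_port, proto, num_queues=64):
    """Map a flow to a queue so that both directions land on the same queue."""
    # Sort the two endpoints so that A->B and B->A produce the same key
    low, high = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{low}|{high}|{proto}".encode()
    return zlib.crc32(key) % num_queues


forward = queue_for_flow("10.0.0.5", 40000, "10.0.1.9", 443, "tcp")
reverse = queue_for_flow("10.0.1.9", 443, "10.0.0.5", 40000, "tcp")
print(forward == reverse)  # True
```

Because every packet of a connection is handled by the same core, the per-flow state stays in that core's cache, which is the cache benefit the feature list refers to.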

The Napatech difference

The Napatech Link™ Capture Software dramatically increases capture and protocol analysis, allowing network engineers to utilize the full power of Wireshark to understand network traffic, find anomalies, and diagnose network issues at incredible speeds. The solution offloads processing and analysis of networking traffic from the application software, while ensuring optimal use of the standard server’s resources leading to effective Wireshark acceleration.

Outstanding lossless performance

Optimized to capture all network traffic at full line rate, with almost no CPU load on the host server, the solution demonstrates enormous lossless performance advantages for Wireshark: up to 14x lossless capture and decode performance compared to a standard network interface card (NIC).

Turning acceleration into value

These performance advantages ultimately allow you to:

Maximize your server performance by improving CPU utilization
Minimize your TCO by reducing number of servers, thus optimizing rack space, power, cooling and operational expenses
Diminish your time-to-resolution, thereby enabling greatly increased efficiency

Test configuration

The outstanding improvements achieved with this solution were demonstrated by comparing Wireshark performance running on a Dell PowerEdge R740 with a standard 40G NIC card and the Napatech NT200 SmartNIC with Link™ Capture Software. Test configuration: dual-socket Dell R740 with Intel® Xeon® Gold 6138 2.0 GHz, 128GB RAM running Ubuntu 14.04 LTS.

Lossless throughput tests

For the lossless throughput test, traffic was sent at fixed rates and packet sizes and throughput was measured as the rate at which Wireshark is able to receive and analyze the packets.

Additional testing for “back-to-back frames” was applied as described in the RFC 2544 benchmarking methodology to send a burst of frames with minimum inter-frame gaps to the Device Under Test (DUT) and count the number of frames received/forwarded by the DUT. The back-to-back value is defined as the number of frames in the longest burst that the DUT can handle without the loss of any frames. With same-size capture buffer configurations, the Napatech SmartNIC delivers 60 times higher back-to-back frame performance. When required for highly bursty traffic patterns, the Napatech solution can allocate significantly larger host buffers, providing hundreds of times higher back-to-back capture performance.
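The search for the back-to-back value can be sketched as follows. This is a simplified illustration that uses a binary search and assumes that loss behaviour is monotonic in burst length. The `longest_lossless_burst` function and the toy DUT model, which simply drops everything beyond a fixed buffer size, are hypothetical and not part of the test setup described here.

```python
def longest_lossless_burst(send_burst, max_frames):
    """Find the longest burst (in frames) that is received without any loss.

    send_burst(n) must send n back-to-back frames and return how many arrived.
    """
    low, high = 0, max_frames
    while low < high:
        mid = (low + high + 1) // 2
        if send_burst(mid) == mid:   # no loss: try a longer burst
            low = mid
        else:                        # loss: shorten the burst
            high = mid - 1
    return low


def make_buffered_dut(buffer_frames):
    """Toy DUT model: frames beyond the buffer size are dropped."""
    def send_burst(n):
        return min(n, buffer_frames)
    return send_burst


small = longest_lossless_burst(make_buffered_dut(500), max_frames=10_000)
large = longest_lossless_burst(make_buffered_dut(5_000), max_frames=10_000)
print(small, large)  # 500 5000
```

In this toy model, the back-to-back value scales directly with the available buffer, which is why larger host buffers translate into longer lossless bursts.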

Napatech Link™ Capture Software

The stunning benchmarks for Wireshark were achieved by deploying Napatech’s Reconfigurable Computing Platform, based on FPGA-based Link™ Capture Software and Napatech SmartNIC hardware.

Napatech’s Reconfigurable Computing Platform flexibly offloads, accelerates and secures open, standard, high-volume and low-cost server platforms allowing them to meet the performance requirements for networking, communications and cybersecurity applications.

Wireshark

Wireshark, one of the industry’s foremost network protocol analyzers, is an ideal example of the type of critical enterprise applications that can achieve better performance through hardware acceleration with the Napatech Link™ Capture Software.

Wireshark can be compiled with native support for hardware acceleration based on the Intel hardware and Napatech software. Instructions specific to building Wireshark with support for Napatech are listed in the Installation Quick Guide available at the Napatech Documentation Portal.
