Using Multipath TCP to enhance bandwidth and withstand outages

Written by Marbenz Antonio | 16/06/2022 5:29:47 AM

Multipath TCP (MPTCP) lets you bundle several paths between systems to increase bandwidth and resilience against failures. Using two virtual guests, we show how to build a simple setup to test MPTCP. Then we’ll look at the details of a real-world scenario, measuring bandwidth and latency when individual links fail.

What is Multipath TCP (MPTCP)?

Red Hat Enterprise Linux 9 (RHEL 9) and later include the Multipath TCP daemon (mptcpd) for configuring Multipath TCP. But what is MPTCP, and how can it help us?

Multipath TCP allows a transport connection to operate across multiple paths simultaneously. It became an experimental standard in 2013 (RFC 6824), which was superseded in 2020 by the Multipath TCP v1 specification, RFC 8684.

Consider a basic network connection. A client launches a browser and connects to a website hosted on a server. To do this, the client connects to the server through a TCP connection and communicates over that channel.

There are many potential causes of network failures:

A firewall located between the client and the server can terminate the connection.
The connection may slow down anywhere along the packet’s journey.
Both the client and the server communicate over a network interface, and if that interface fails, the entire TCP connection fails.

MPTCP can help us mitigate the effects of some of these events. Instead of a TCP connection, we use MPTCP to establish an MPTCP socket between client and server. This allows additional MPTCP “subflows” to be created.

These subflows can be seen as separate channels through which our system can communicate with the destination. They can use different resources, different routes to the destination system, and even different media such as Ethernet, 5G, and so on.

The application can communicate across the network as long as one of the subflows remains operational. MPTCP can help us consolidate the throughput of several subflows into a single MPTCP connection, in addition to boosting resilience.

“Isn’t that what bonding/teaming were designed for?” you may ask. Bonding/teaming works at the interface/link layer, and most modes are limited to a single medium. Also, even when bonding runs in aggregation mode, most modes can only increase bandwidth if you run several TCP sessions. With a single TCP session, you are limited to the throughput of a single interface.

What is the practical use of MPTCP?

MPTCP is beneficial when you wish to combine connections to boost resilience against network difficulties or to combine network throughput.

In this part, we’ll use MPTCP in a real-world scenario with two laptops running RHEL 9. One connection is wired: 1 Gbit Ethernet with low latency. The second connection is wireless, which has higher latency but makes the laptop portable, so it can be used anywhere in the house.

MPTCP enables us to create subflows over both of these connections, allowing applications to become independent of the underlying media. Of course, the subflows’ latency and throughput differ, but both provide network access.

A simple demo of MPTCP in action

To get a feel for MPTCP, we’ll start with a simple setup of a hypervisor (running Fedora 35) and two KVM guests running RHEL 9. Each guest has three network interfaces:

1st KVM guest: 192.168.4.13/24, 10.0.1.13/24, 10.0.2.13/24
2nd KVM guest: 192.168.4.14/24, 10.0.1.14/24, 10.0.2.14/24
Hypervisor: 192.168.4.1/24

Next, we enable MPTCP using sysctl on both guests and install the required packages:

# echo "net.mptcp.enabled=1" > /etc/sysctl.d/90-enable-MPTCP.conf
# sysctl -p /etc/sysctl.d/90-enable-MPTCP.conf
# yum -y install pcp-zeroconf mptcpd iperf3 nc
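Before continuing, it can be useful to confirm that the setting took effect. The following is a minimal Python sketch, assuming the kernel exposes the sysctl at /proc/sys/net/mptcp/enabled (the usual procfs mapping of net.mptcp.enabled). The function name mptcp_enabled is only illustrative and not part of any MPTCP tooling.

```python
from pathlib import Path

# Assumed procfs location of the net.mptcp.enabled sysctl.
MPTCP_SYSCTL = Path("/proc/sys/net/mptcp/enabled")


def mptcp_enabled(path=MPTCP_SYSCTL):
    """Return True/False for the sysctl value, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # File missing or unreadable: kernel without MPTCP, or not a Linux system.
        return None


if __name__ == "__main__":
    print("MPTCP enabled:", mptcp_enabled())
```

A result of None on a guest suggests that the running kernel was built without MPTCP support, in which case the sysctl command above would also fail.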

We can now begin monitoring the traffic on the guests’ network interfaces. Open a terminal session on each of the two guests, then run the following command in both terminals and keep it running:

# pmrep network.interface.in.bytes -t2

We are now ready to execute iperf3 in server mode on the first guest, which will begin listening on a port and waiting for incoming connections. On the second guest, we run iperf3 in client mode, which connects to the server and measures bandwidth.

While iperf3 opens IPPROTO_IP sockets by default (as can be seen with strace), we want it to use the MPTCP protocol and open IPPROTO_MPTCP sockets. We can either change the source code and recompile the application, or we can use the mptcpize tool to change the requested socket type:

1st guest:> mptcpize run iperf3 -s
2nd guest:> mptcpize run iperf3 -c 10.0.1.13 -t3

The mptcpize tool lets existing, unmodified TCP applications use the MPTCP protocol by changing the type of the sockets they create at run time, using libcall hijacking.
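For applications whose source can be changed, the alternative to mptcpize is to request an MPTCP socket directly. The sketch below shows the idea in Python rather than in iperf3’s own code. It assumes Python’s socket.IPPROTO_MPTCP constant (available since Python 3.10, with 262 as the Linux protocol number used as a fallback), and the helper name open_stream_socket is hypothetical.

```python
import socket

# socket.IPPROTO_MPTCP exists since Python 3.10; 262 is the Linux protocol number.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)


def open_stream_socket(family=socket.AF_INET):
    """Return (sock, is_mptcp), preferring MPTCP and falling back to plain TCP."""
    try:
        return socket.socket(family, socket.SOCK_STREAM, IPPROTO_MPTCP), True
    except OSError:
        # Kernel without MPTCP, MPTCP disabled via sysctl, or a non-Linux system.
        return socket.socket(family, socket.SOCK_STREAM, socket.IPPROTO_TCP), False


if __name__ == "__main__":
    sock, is_mptcp = open_stream_socket()
    print("MPTCP socket" if is_mptcp else "plain TCP fallback")
    sock.close()
```

The fallback keeps the program usable on systems without MPTCP. Apart from the protocol argument, the socket is used exactly like a TCP socket, which is why a tool such as mptcpize can make the switch without the application noticing.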

We can now use the ss tool on the first guest to confirm that the iperf3 server is listening on an MPTCP socket. The ‘tcp-ulp-mptcp’ tag indicates that the socket uses the MPTCP ULP (Upper Layer Protocol), which means it is an MPTCP subflow, in this case an MPTCP listener.

1st guest:> ss -santi|grep -A1 5201
LISTEN 0   4096   *:5201 *:*
  cubic cwnd:10 tcp-ulp-mptcp flags:m token:0000(id:0)/0000(id:0) [..]
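When many sockets are open, the same check can be scripted. The following Python sketch scans text in the two-line layout shown above (a LISTEN line followed by an indented detail line) and reports the ports whose detail line carries the tcp-ulp-mptcp tag. The function name and the assumption about column order are mine, based only on the sample output in this post.

```python
def mptcp_listen_ports(ss_output):
    """Return local ports of LISTEN sockets whose detail line has tcp-ulp-mptcp."""
    ports = []
    pending_port = None  # port of the most recent LISTEN line
    for line in ss_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "LISTEN":
            # Local address is the 4th column, e.g. "*:5201"; port follows the last colon.
            pending_port = int(fields[3].rsplit(":", 1)[1])
        elif pending_port is not None:
            if "tcp-ulp-mptcp" in fields:
                ports.append(pending_port)
            pending_port = None
    return ports


SAMPLE = """\
LISTEN 0   4096   *:5201 *:*
  cubic cwnd:10 tcp-ulp-mptcp flags:m token:0000(id:0)/0000(id:0) [..]
"""

print(mptcp_listen_ports(SAMPLE))  # prints [5201]
```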

Our terminals with the running pmrep commands show that only one interface on each guest is in use.

We can now configure MPTCP. By default, each MPTCP connection uses a single subflow. Because each subflow is similar to a plain TCP connection, MPTCP’s default behavior is comparable to plain TCP. To benefit from features such as throughput bundling or resilience against network outages, you must configure MPTCP to use multiple subflows on the systems involved.

The following command informs the MPTCP protocol that each MPTCP connection can have up to two subflows in addition to the first, for a total of three:

2nd guest:> ip mptcp limits set subflow 2

When and why should MPTCP create additional subflows? There are several mechanisms. In this example, we configure the server to ask the client to create additional subflows to other server addresses, which the server announces.

The following command instructs the client’s operating system to accept up to two of these subflow creation requests per connection:

2nd guest:> ip mptcp limits set add_addr_accepted 2

Now we configure the first guest, which will run the server, to also handle up to two additional subflows, using limits set subflow 2. With the two endpoint commands that follow it, we configure additional IP addresses to be announced to our peer so that the two additional subflows can be created:

1st guest:> ip mptcp limits set subflow 2
1st guest:> ip mptcp endpoint add 10.0.2.13 dev enp3s0 signal
1st guest:> ip mptcp endpoint add 192.168.4.13 dev enp1s0 signal
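How these settings combine can be sketched as simple arithmetic. The Python function below is a simplified model for this announce-and-join setup only: it assumes the client opens one extra subflow per accepted address announcement and never exceeds its own subflow limit. It ignores other mechanisms, such as client-side subflow endpoints, and the function and parameter names are illustrative.

```python
def max_subflows(subflow_limit, add_addr_accepted, signalled_addresses):
    """Upper bound on subflows per connection in the announce-and-join setup.

    Simplified model: the client opens one extra subflow per accepted address
    announcement, and never more extra subflows than its own subflow limit.
    """
    extra = min(subflow_limit, add_addr_accepted, signalled_addresses)
    return 1 + extra  # the initial subflow plus the additional ones


# Values used in this post: limit 2, two accepted announcements, two signal endpoints.
print(max_subflows(2, 2, 2))  # prints 3
```

Under this model, leaving add_addr_accepted at 0 on the client would keep every connection at a single subflow, no matter how many addresses the server announces.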

Now, if we run mptcpize run iperf3 [..] on the second guest again, we should see this:

n.i.i.bytes n.i.i.bytes n.i.i.bytes n.i.i.bytes
lo          enp1s0       enp2s0       enp3s0
byte/s      byte/s       byte/s       byte/s
N/A         N/A          N/A          N/A
0.000       91.858     1194.155       25.960
[..]
0.000       92.010       26.003       26.003
0.000    91364.627    97349.484    93799.789
0.000  1521881.761  1400594.700  1319123.660
0.000  1440797.789  1305233.465  1310615.121
0.000  1220597.939  1201782.379  1149378.747
0.000  1221377.909  1252225.282  1229209.781
0.000  1232520.776  1244593.380  1280007.121
0.000      671.831      727.317      337.415
0.000       59.001      377.005       26.000

Congrats! In this case, traffic from one application-layer channel is split across several subflows. Because our two VMs are running on the same hypervisor, our throughput is almost certainly limited by the CPU. We are unlikely to see better performance with several subflows, but we can see multiple channels being used.

If you want to go one step further, you can run:
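To put numbers on the split, one row of the pmrep output can be summed and broken down per interface. This is a small Python sketch using one sample row copied from the output above. The function name interface_shares is hypothetical, and the conversion assumes the values are bytes per second, as the column header states.

```python
def interface_shares(rates):
    """Return (total_bytes_per_s, {interface: fraction_of_total})."""
    total = sum(rates.values())
    if total == 0:
        # No traffic at all: report a zero share for every interface.
        return 0.0, {name: 0.0 for name in rates}
    shares = {name: rate / total for name, rate in rates.items()}
    return total, shares


# One sample row from the pmrep output above, in bytes per second.
row = {"enp1s0": 1521881.761, "enp2s0": 1400594.700, "enp3s0": 1319123.660}

total, shares = interface_shares(row)
print(f"total: {total * 8 / 1e6:.1f} Mbit/s")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

For this row, each of the three interfaces carries roughly a third of the traffic, which is what we would expect when three subflows of similar quality share one connection.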

1st guest:> mptcpize run ncat -k -l 0.0.0.0 4321 >/dev/null
2nd guest:> mptcpize run ncat 10.0.1.13 4321 </dev/zero

From the application’s perspective, this creates a single channel and sends data in one direction only, showing that even a single channel is distributed across subflows. We can then disable network interfaces on the two guests and watch how the traffic changes.

Using MPTCP in the real world

Now that we’ve got a basic setup up and running, let’s look at a more realistic configuration. Customers frequently ask Red Hat for a way to let applications switch media transparently, possibly also combining the bandwidth of those media. MPTCP can be a workable solution in such cases. The following setup uses two ThinkPads running RHEL 9 and an MPTCP configuration with subflows over Ethernet and WLAN.

Each medium has distinct characteristics:

Ethernet: high bandwidth and low latency, but the system is cable-bound
Wireless: lower bandwidth and higher latency, but the system is more mobile

So, does MPTCP help us survive Ethernet disconnects and keep network communication alive? It does. The screenshot below shows a Grafana visualization of data captured with Performance Co-Pilot (PCP), with WLAN traffic in blue and Ethernet traffic in yellow.

At first, an application, in this case the download of a Fedora installation image, uses all of the available bandwidth on both media. When we unplug the Ethernet cable and move the ThinkPad to the garden, the network connection stays up. We eventually return to the house, reconnect to Ethernet, and regain full bandwidth.

These Ethernet ports would typically operate at 1 Gbit/second. To make this more equivalent to wireless bandwidth, I have limited the interfaces here to 100 Mbit/second.

What did the application experience? The console output shows that iperf3 had degraded bandwidth for some time, but always had network connectivity:

# mptcpize run iperf3 -c 192.168.0.5 -t3000 -i 5
Connecting to host 192.168.0.5, port 5201
[  5] local 192.168.0.4 port 50796 connected to 192.168.0.5 port 5201
[ ID] Interval        Transfer Bitrate     Retr  Cwnd
[  5]   0.00-5.00   sec  78.3 MBytes   131 Mbits/sec  271   21.2 KBytes
[..]
[  5] 255.00-260.00 sec  80.0 MBytes   134 Mbits/sec  219   29.7 KBytes
[  5] 260.00-265.00 sec  80.2 MBytes   134 Mbits/sec  258   29.7 KBytes
[  5] 265.00-270.00 sec  79.5 MBytes   133 Mbits/sec  258   32.5 KBytes
[  5] 270.00-275.00 sec  56.8 MBytes  95.2 Mbits/sec  231   32.5 KBytes
[  5] 275.00-280.00 sec  22.7 MBytes  38.1 Mbits/sec  244   28.3 KBytes
[..]
[  5] 360.00-365.00 sec  20.6 MBytes  34.6 Mbits/sec  212   19.8 KBytes
[  5] 365.00-370.00 sec  32.9 MBytes  55.2 Mbits/sec  219   22.6 KBytes
[  5] 370.00-375.00 sec  79.4 MBytes   133 Mbits/sec  245   24.0 KBytes
[  5] 375.00-380.00 sec  78.9 MBytes   132 Mbits/sec  254   24.0 KBytes
[  5] 380.00-385.00 sec  79.1 MBytes   133 Mbits/sec  244   24.0 KBytes
[  5] 385.00-390.00 sec  79.2 MBytes   133 Mbits/sec  219   22.6 KBytes
[..]
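The degraded period can also be picked out of such a log programmatically. The Python sketch below parses interval lines in the format shown above and lists those whose bitrate falls below a threshold. The 100 Mbit/s threshold and the function name are assumptions chosen for illustration, and the regular expression only handles bitrates reported in Mbits/sec.

```python
import re

# Matches interval lines such as:
# [  5] 270.00-275.00 sec  56.8 MBytes  95.2 Mbits/sec  231   32.5 KBytes
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+\S+\s+\S+\s+(\d+(?:\.\d+)?)\s+Mbits/sec"
)


def degraded_intervals(iperf_output, threshold_mbits=100.0):
    """Return (start, end, mbits) for every interval below the threshold."""
    hits = []
    for line in iperf_output.splitlines():
        match = INTERVAL.search(line)
        if match:
            start, end, mbits = (float(g) for g in match.groups())
            if mbits < threshold_mbits:
                hits.append((start, end, mbits))
    return hits


# A few lines copied from the console output above.
SAMPLE = """\
[  5] 265.00-270.00 sec  79.5 MBytes   133 Mbits/sec  258   32.5 KBytes
[  5] 270.00-275.00 sec  56.8 MBytes  95.2 Mbits/sec  231   32.5 KBytes
[  5] 275.00-280.00 sec  22.7 MBytes  38.1 Mbits/sec  244   28.3 KBytes
[  5] 370.00-375.00 sec  79.4 MBytes   133 Mbits/sec  245   24.0 KBytes
"""

for start, end, mbits in degraded_intervals(SAMPLE):
    print(f"{start:.0f}-{end:.0f} s: {mbits} Mbits/sec")
```

Run against the full log, the first and last reported intervals bracket the time during which the connection ran over the wireless subflow alone.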

While we can see that the remaining links work well, we need to keep in mind that both parties must speak MPTCP. So:

Either all of the programs are MPTCP’ized, either natively or through mptcpize.

Alternatively, one system serves as an access gateway, running a transparent TCP-to-MPTCP proxy. A second, internet-connected server then acts as its counterpart, running a corresponding MPTCP-to-TCP reverse proxy. Others are also setting up proxy software to tunnel ordinary TCP over MPTCP.

Summary

Several attempts have been made in the past to solve the problem of transparently bundling network connections. The first approaches used plain TCP channels as the foundation, but they had drawbacks, so MPTCP was created.

If MPTCP becomes the conventional method for bundling bandwidth and surviving uplink failures, more apps will likely implement MPTCP sockets natively. Other Linux-based devices, such as Android phones, are also expected to follow suit.

MPTCP support was introduced as a Technology Preview in RHEL 8, although without the userland component mptcpd, which was later included in RHEL 9. Everything in this blog post was done without mptcpd. With mptcpd running, an API is available for tasks such as adding and removing subflows, which we did by hand in this article.

