Using Multipath TCP to enhance bandwidth and withstand outages
MultiPath TCP (MPTCP) lets you bundle multiple network paths between systems to increase bandwidth and to survive link failures. Using two virtual guests, we demonstrate how to build a simple setup for testing MPTCP. Then we look at a real-world scenario, measuring bandwidth and latency while single links fail.
What is MultiPath TCP (MPTCP)?
Red Hat Enterprise Linux 9 (RHEL 9) and later support the MultiPath TCP daemon (mptcpd) for multipath TCP configuration. But what is MPTCP and how can it assist us?
Multipath TCP allows a transport connection to operate across multiple paths at the same time. It was published as an experimental standard (RFC 6824) in 2013 and superseded in 2020 by the Multipath TCP v1 specification, RFC 8684.
Consider a basic network connection. A client launches a browser and connects to a website hosted on a server. To do this, the client connects to the server through a TCP connection and communicates over that channel.
There are many potential causes of network failures:
- A firewall located between the client and the server can terminate the connection.
- The connection may slow down anywhere along the packet’s journey.
- Both the client and the server communicate over a network interface, and if that interface fails, the entire TCP connection fails.
MPTCP can assist us in mitigating the consequences of some of these situations. Instead of a TCP connection, we will use MPTCP to establish an MPTCP socket between client and server. This enables the development of new MPTCP “subflows.”
These subflows can be viewed as different channels via which our system can connect with the destination. These can employ various resources, routes to the destination system, and even media such as ethernet, 5G, and so on.
The application can communicate across the network as long as one of the subflows remains operational. MPTCP can help us consolidate the throughput of several subflows into a single MPTCP connection, in addition to boosting resilience.
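To make the difference from plain TCP concrete, the sketch below opens a listening socket with the MPTCP protocol instead of TCP. This is an illustration, not part of the article's setup: Python 3.10+ exposes `socket.IPPROTO_MPTCP` on Linux, and 262 is the Linux protocol number used as a fallback; the function falls back to plain TCP if the kernel refuses MPTCP.

```python
import socket

# IPPROTO_MPTCP is 262 on Linux; Python exposes the constant from 3.10 on
# platforms that define it, so fall back to the raw number otherwise.
IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)

def make_listener(host="0.0.0.0", port=5201):
    """Open an MPTCP listening socket, falling back to plain TCP when the
    kernel has MPTCP disabled (net.mptcp.enabled=0) or not built in."""
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    except OSError:
        # Kernel without (enabled) MPTCP support: degrade to plain TCP.
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    sock.listen()
    return sock
```

An unmodified TCP client can connect to such a listener; the server side simply gains the ability to accept additional subflows when the kernel supports them.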
“Isn’t that what bonding/teaming were designed for?” you may ask. Bonding/teaming operates at the interface/link layer, and most modes are limited to a single medium. Also, even when using bonding in an aggregation mode, most modes can only boost bandwidth if you run many TCP sessions; with a single TCP session, you are limited to the throughput of a single interface.
What is the practical use of MPTCP?
MPTCP is beneficial when you want to bundle connections to boost resilience against network problems or to aggregate network throughput.
In this section, we'll use MPTCP in a real-world scenario with two RHEL 9 laptops. One laptop is wired, using 1 Gbit Ethernet with low latency. The second connection is wireless, which has higher latency but makes the laptop portable, allowing it to be used anywhere in the house.
MPTCP enables us to create subflows over both of these connections, allowing applications to become independent of the underlying media. Of course, the subflows’ latency and throughput differ, but both provide network access.
A simple demo of MPTCP in action
To get a feel for MPTCP, we'll start with a modest setup: a hypervisor (running Fedora 35) and two KVM guests with RHEL 9 on top. Each guest has three network interfaces:
- 1st KVM guest: 192.168.4.13/24, 10.0.1.13/24, 10.0.2.13/24
- 2nd KVM guest: 192.168.4.14/24, 10.0.1.14/24, 10.0.2.14/24
- Hypervisor: 192.168.4.1/24
Following that, we will activate MPTCP using sysctl on both guests and install the following packages:
# echo "net.mptcp.enabled=1" > /etc/sysctl.d/90-enable-MPTCP.conf
# sysctl -p /etc/sysctl.d/90-enable-MPTCP.conf
# yum -y install pcp-zeroconf mptcpd iperf3 nc
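If you want to verify from a script that the sysctl took effect, the kernel exposes the setting under /proc. The helper below is a small sketch; the path exists only on kernels built with MPTCP support, so a missing file is treated as "not enabled".

```python
from pathlib import Path

# Kernel-exposed view of the net.mptcp.enabled sysctl.
MPTCP_SYSCTL = Path("/proc/sys/net/mptcp/enabled")

def parse_enabled(text):
    """Interpret the sysctl file content: "1" means MPTCP is enabled."""
    return text.strip() == "1"

def mptcp_enabled():
    """Return True if the running kernel has MPTCP enabled, False if it
    is disabled or the kernel lacks MPTCP support entirely."""
    try:
        return parse_enabled(MPTCP_SYSCTL.read_text())
    except FileNotFoundError:
        return False
```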
We can now begin monitoring the traffic on the guests' network interfaces. Open a terminal session on each of the two guests, then run the following command in both terminals and keep it running:
# pmrep network.interface.in.bytes -t2
We are now ready to execute iperf3 in server mode on the first guest, which will begin listening on a port and waiting for incoming connections. On the second guest, we run iperf3 in client mode, which connects to the server and measures bandwidth.
While iperf3 opens IPPROTO_IP sockets by default (as seen with strace), we want it to use the MPTCP protocol and open IPPROTO_MPTCP sockets. We can either alter the source code and recompile the application, or we can use the mptcpize tool to change the socket type:
1st guest:> mptcpize run iperf3 -s
2nd guest:> mptcpize run iperf3 -c 10.0.1.13 -t3
The mptcpize program allows unmodified current TCP applications to use the MPTCP protocol by dynamically modifying the type of generated sockets using libcall hijacking.
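The same idea can be mimicked in Python without LD_PRELOAD: decide at socket-creation time whether a requested plain TCP socket should be upgraded to MPTCP. The upgrade rule below is a sketch in the spirit of mptcpize's interposed socket() call, not its actual implementation.

```python
import socket

IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)  # Linux protocol number

def pick_protocol(family, sock_type, proto):
    """Mirror the mptcpize idea: upgrade ordinary stream/TCP socket
    requests to MPTCP, and leave everything else (UDP, raw, ...) alone."""
    if (family in (socket.AF_INET, socket.AF_INET6)
            and sock_type == socket.SOCK_STREAM
            and proto in (0, socket.IPPROTO_TCP)):
        return IPPROTO_MPTCP
    return proto

class MPTCPSocket(socket.socket):
    """A socket subclass that applies the upgrade rule on construction.
    Creating one requires a kernel with MPTCP enabled."""
    def __init__(self, family=socket.AF_INET, type=socket.SOCK_STREAM,
                 proto=0, fileno=None):
        super().__init__(family, type,
                         pick_protocol(family, type, proto), fileno)
```

An application using `MPTCPSocket` in place of `socket.socket` would transparently get MPTCP subflow support, much like running it under `mptcpize run`.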
We can now use the ss tool on the first guest to confirm that the iperf3 server is listening on an MPTCP socket. The tcp-ulp-mptcp tag indicates that the socket is using the MPTCP ULP (Upper Layer Protocol), meaning the socket is an MPTCP subflow, in this instance an MPTCP listener.
1st guest:> ss -santi | grep -A1 5201
LISTEN 0      4096      *:5201      *:*
	 cubic cwnd:10 tcp-ulp-mptcp flags:m token:0000(id:0)/0000(id:0) [..]
Our terminals with the running pmrep commands show that only a single interface is carrying the traffic on each guest.
MPTCP may now be configured. Each MPTCP connection, by default, employs a single subflow. Because each subflow is analogous to a simple TCP connection, the MPTCP default behavior is analogous to plain TCP. To take advantage of features such as throughput bundling or network outage resilience, you must enable MPTCP to use multiple subflows on the relevant systems.
The following command informs the MPTCP protocol that each MPTCP connection can have up to two subflows in addition to the first, for a total of three:
2nd guest:> ip mptcp limits set subflow 2
When and why should MPTCP introduce new subflows? There are several mechanisms. In this example, we will instruct the server to request that the client produce further subflows to other server addresses supplied by the server.
The following command instructs the client’s operating system to accept up to two of these subflow formation requests per connection:
2nd guest:> ip mptcp limits set add_addr_accepted 2
Now we will configure the first guest, which runs the server, to also handle up to two additional subflows, again with ip mptcp limits set subflow 2. With the following two endpoint commands, we announce additional IP addresses to our peer, which the client then uses to create the two additional subflows:
1st guest:> ip mptcp limits set subflow 2
1st guest:> ip mptcp endpoint add 10.0.2.13 dev enp3s0 signal
1st guest:> ip mptcp endpoint add 192.168.4.13 dev enp1s0 signal
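To keep the configuration of both guests in one place, the command sequences above can be generated from a small helper. This is a sketch: the addresses and interface names are the ones from this demo setup, and the generated strings follow the iproute2 syntax shown above.

```python
def mptcp_server_cmds(extra_subflows, endpoints):
    """Build the iproute2 commands for the server side: raise the subflow
    limit, then announce ("signal") each additional address to the peer."""
    cmds = [f"ip mptcp limits set subflow {extra_subflows}"]
    for addr, dev in endpoints:
        cmds.append(f"ip mptcp endpoint add {addr} dev {dev} signal")
    return cmds

def mptcp_client_cmds(extra_subflows, add_addr_accepted):
    """Build the client-side commands: allow extra subflows and accept
    that many subflow-creation requests (announced addresses)."""
    return [
        f"ip mptcp limits set subflow {extra_subflows}",
        f"ip mptcp limits set add_addr_accepted {add_addr_accepted}",
    ]

# The demo setup from this article:
server = mptcp_server_cmds(2, [("10.0.2.13", "enp3s0"),
                               ("192.168.4.13", "enp1s0")])
client = mptcp_client_cmds(2, 2)
```

Each returned string could then be executed on the respective guest, for example via subprocess with root privileges.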
Now, if we run mptcpize run iperf3 [..] on the second guest again, we should see this:
n.i.i.bytes  n.i.i.bytes  n.i.i.bytes  n.i.i.bytes
         lo       enp1s0       enp2s0       enp3s0
     byte/s       byte/s       byte/s       byte/s
        N/A          N/A          N/A          N/A
      0.000       91.858     1194.155       25.960
[..]
      0.000       92.010       26.003       26.003
      0.000    91364.627    97349.484    93799.789
      0.000  1521881.761  1400594.700  1319123.660
      0.000  1440797.789  1305233.465  1310615.121
      0.000  1220597.939  1201782.379  1149378.747
      0.000  1221377.909  1252225.282  1229209.781
      0.000  1232520.776  1244593.380  1280007.121
      0.000      671.831      727.317      337.415
      0.000       59.001      377.005       26.000
Congrats! Traffic from one application-layer channel is split among several subflows. Because our two VMs run on the same hypervisor, the throughput is almost certainly limited by the CPU. We are unlikely to see improved performance with several subflows here, but we can see multiple channels being used.
If you want to go one step further, you can run:
1st guest:> mptcpize run ncat -k -l 0.0.0.0 4321 >/dev/null
2nd guest:> mptcpize run ncat 10.0.1.13 4321 </dev/zero
From the application’s perspective, this creates a single channel and transmits data in one direction only, showing that even a single unidirectional stream is distributed across the subflows. We can then disable network interfaces on the two guests and watch how the traffic shifts.
Using MPTCP in the real world
Now that we’ve got a basic system up and running, let’s look at a more realistic configuration. Customers frequently ask Red Hat how to enable applications to switch media transparently, possibly combining the bandwidth of those media. MPTCP can be a workable solution in such cases. The following setup uses two ThinkPads running RHEL 9 and an MPTCP configuration with subflows over Ethernet and WLAN.
Each media has distinct capabilities:
- Ethernet: high bandwidth, low latency, but the system is cable-bound
- Wireless: lower bandwidth and higher latency, but the system is mobile
So, would MPTCP assist us in surviving ethernet disconnects and maintaining network communication? It does. The screenshot below shows a Grafana depiction of data captured with Performance Co-Pilot (PCP), with WLAN traffic in blue and Ethernet traffic in yellow.
At first, a program, in this case, the download of a Fedora installation image, consumes all of the available bandwidth on both media. When we detach the ethernet cable and relocate the Thinkpad to the garden, the network connectivity remains active. We eventually return home, reconnect to ethernet, and regain full bandwidth.
These Ethernet ports would typically operate at 1 Gbit/second. To make this more equivalent to wireless bandwidth, I have limited the interfaces here to 100 Mbit/second.
What did the application experience? The console output shows that iperf3 had degraded bandwidth for some time, but never lost network connectivity:
# mptcpize run iperf3 -c 192.168.0.5 -t3000 -i 5
Connecting to host 192.168.0.5, port 5201
[  5] local 192.168.0.4 port 50796 connected to 192.168.0.5 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-5.00    sec  78.3 MBytes   131 Mbits/sec  271   21.2 KBytes
[..]
[  5] 255.00-260.00  sec  80.0 MBytes   134 Mbits/sec  219   29.7 KBytes
[  5] 260.00-265.00  sec  80.2 MBytes   134 Mbits/sec  258   29.7 KBytes
[  5] 265.00-270.00  sec  79.5 MBytes   133 Mbits/sec  258   32.5 KBytes
[  5] 270.00-275.00  sec  56.8 MBytes  95.2 Mbits/sec  231   32.5 KBytes
[  5] 275.00-280.00  sec  22.7 MBytes  38.1 Mbits/sec  244   28.3 KBytes
[..]
[  5] 360.00-365.00  sec  20.6 MBytes  34.6 Mbits/sec  212   19.8 KBytes
[  5] 365.00-370.00  sec  32.9 MBytes  55.2 Mbits/sec  219   22.6 KBytes
[  5] 370.00-375.00  sec  79.4 MBytes   133 Mbits/sec  245   24.0 KBytes
[  5] 375.00-380.00  sec  78.9 MBytes   132 Mbits/sec  254   24.0 KBytes
[  5] 380.00-385.00  sec  79.1 MBytes   133 Mbits/sec  244   24.0 KBytes
[  5] 385.00-390.00  sec  79.2 MBytes   133 Mbits/sec  219   22.6 KBytes
[..]
While we can see that the remaining links work well, we need to keep in mind that both parties must speak MPTCP. So:
- Either all of the programs are MPTCP-enabled, natively or through mptcpize.
- Alternatively, one system acts as an access gateway, serving as a transparent TCP-to-MPTCP proxy. A second, internet-connected server then acts as the counterpart, running a comparable MPTCP-to-TCP reverse proxy. Others configure proxy software to tunnel ordinary TCP over MPTCP.
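At the heart of such a TCP-to-MPTCP gateway sits a byte pump: accept a plain TCP connection, open an MPTCP connection to the remote proxy, and copy bytes in both directions. The sketch below shows that core; the upstream address is a placeholder, error handling is minimal, and the MPTCP side only works on a kernel with MPTCP enabled.

```python
import socket
import threading

IPPROTO_MPTCP = getattr(socket, "IPPROTO_MPTCP", 262)  # Linux value

def pump(src, dst):
    """Copy bytes from src to dst until src signals EOF, then half-close
    dst so the peer sees the EOF as well."""
    while True:
        data = src.recv(65536)
        if not data:
            break
        dst.sendall(data)
    dst.shutdown(socket.SHUT_WR)

def handle(client, upstream_addr):
    """Bridge one plain-TCP client connection to an MPTCP upstream
    connection (upstream_addr is a placeholder for the reverse proxy)."""
    upstream = socket.socket(socket.AF_INET, socket.SOCK_STREAM, IPPROTO_MPTCP)
    upstream.connect(upstream_addr)
    # One direction per thread: upstream->client and client->upstream.
    t = threading.Thread(target=pump, args=(upstream, client), daemon=True)
    t.start()
    pump(client, upstream)
    t.join()
```

A full gateway would wrap `handle` in an accept loop; the pump itself is protocol-agnostic, which is exactly why the proxy can translate between plain TCP and MPTCP without touching the payload.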
Summary
Several attempts have been made in the past to overcome the problem of transparent bundling of network connections. The initial attempts employed basic, standard TCP channels as the foundation, but they had drawbacks, hence MPTCP was created.
If MPTCP becomes the conventional method for bundling bandwidth and surviving uplink failures, more apps will likely implement MPTCP sockets natively. Other Linux-based devices, such as Android phones, are also expected to follow suit.
MPTCP support was introduced as a Technology Preview in RHEL 8, although without the userland component mptcpd, which was later included in RHEL 9. Everything in this blog post was done without using mptcpd; with mptcpd running, an API for adding and removing subflows is available, steps which we performed by hand in this article.
Here at CourseMonster, we know how hard it may be to find the right time and funds for training. We provide effective training programs that enable you to select the training option that best meets the demands of your company.
For more information, please get in touch with one of our course advisers today or contact us at training@coursemonster.com