Network Buffer Bloat

Jim Gettys, one of the original developers of the X Window System in the 80s, has a bone to pick about the performance of his networks. He suspects that oversized buffers in the network are interfering with how TCP detects congestion and estimates round-trip time (RTT).

…you don’t see the full glory of TCP RTT confusion caused by buffering if you have a bad connection as it reset TCP’s timers and RTT estimation; packet loss is always considered possible congestion. This is a situation where the “cleaner” the network is, the more trouble you’ll get from bufferbloat. The cleaner the network, the worse it will behave. And I’d done so much work to make my cable as clean as possible…

At this point, I realized what I had stumbled into was serious and possibly widespread; but how widespread?

Very widespread. I hate to spoil the story, but here’s the conclusion:

By inserting such egregiously large buffers into the network, we have destroyed TCP’s congestion avoidance algorithms. TCP is used as a “touchstone” of congestion avoiding protocols: in general, there is very strong pushback against any protocol which is less conservative than TCP. This is really serious, as future blog entries will amplify.
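To get a feel for why a large buffer hurts, here is a rough back-of-the-envelope sketch in Python. It assumes a single FIFO buffer in front of the bottleneck link, so the worst-case wait for a newly arrived packet is simply the buffer size divided by the link rate. The function name and the 256 KiB buffer are my own illustrative assumptions, not figures from Gettys' measurements.

```python
def queueing_delay_seconds(buffer_bytes: int, link_bits_per_second: float) -> float:
    """Worst-case time a packet waits behind a completely full FIFO buffer."""
    if link_bits_per_second <= 0:
        raise ValueError("link rate must be positive")
    # Every byte already queued must be sent before the new packet leaves.
    return buffer_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    buffer_bytes = 256 * 1024  # hypothetical buffer size, for illustration only
    links = [
        ("1 Mbit/s uplink", 1e6),
        ("10 Mbit/s link", 10e6),
        ("100 Mbit/s link", 100e6),
    ]
    for name, rate in links:
        delay_ms = queueing_delay_seconds(buffer_bytes, rate) * 1000
        print(f"{name}: up to {delay_ms:.0f} ms of added delay")
```

Under these assumptions, the same buffer that adds a couple of hundredths of a second on a fast link adds about two seconds on a 1 Mbit/s uplink. TCP only backs off once packets are dropped, so it keeps filling that buffer before it ever sees a congestion signal.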

What this means is that buying a bigger, faster network connection is not, by itself, going to give you a performance boost. The Internet used to feel faster because, well, it was faster. The shift in traffic towards massive file sizes and streams of data appears to be incompatible with the network’s ability to regulate flow. Did you know that Netflix alone is said to be “20 percent of all Internet traffic during the typical American evening”?

What this also means is that traffic shaping could be improved by using techniques already available, but product vendors and service providers first have to admit there is a problem with their progress model. Let’s hope that the providers do not try to use this as an excuse to take even more control of the network (e.g. anti-neutrality).

The home router situation is probably much grimmer, from what I’ve experienced. We have a very large amount of deployed home network kit (hundreds of millions of boxes) much of which is no longer maintained, even for security updates (which is why the home router problem is so painful, and dangerous in my opinion). It seems that within 6 months to a year, the engineers working on that firmware have moved on to new products (and/or new companies), and that kit with serious problems (like that which has inhibited deployment of ECN) never, ever gets fixed.

You can easily audit/measure your buffers and join the debate using tools like the ICSI Netalyzr from Berkeley.
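If you want a crude estimate of your own, the sketch below turns two ping measurements into a buffer size. It assumes you have measured round-trip time once on an idle link and once while saturating it (say, during a large upload), and that all of the extra delay comes from one full buffer at the bottleneck. The function name and sample numbers are hypothetical, and this is not a description of how Netalyzr itself works.

```python
def estimated_buffer_bytes(idle_rtt_ms: float,
                           loaded_rtt_ms: float,
                           link_bits_per_second: float) -> float:
    """Rough buffer size implied by the extra latency seen under load."""
    # Treat any increase over the idle RTT as time spent waiting in the queue.
    extra_seconds = max(loaded_rtt_ms - idle_rtt_ms, 0.0) / 1000
    # Bytes that the link can drain in that time.
    return extra_seconds * link_bits_per_second / 8


if __name__ == "__main__":
    # Hypothetical measurements: 20 ms idle, 1220 ms while uploading at 1 Mbit/s.
    size = estimated_buffer_bytes(20.0, 1220.0, 1e6)
    print(f"Estimated buffer: roughly {size / 1024:.0f} KiB")
```

Treat the result as an order-of-magnitude guess; real paths have several queues, and the bottleneck rate is rarely known exactly.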
