Bufferbloat and the Ski Shop

Bufferbloat is undesirable latency that arises when routers and other network equipment buffer (accept for delivery) more data than they can deliver in a timely way. It makes your kids say, “The Internet is slow today, Daddy.” Bufferbloat causes much of the poor performance and human pain experienced on today’s Internet.

Update: I found a way better analogy for explaining Bufferbloat. I write about it in my post Best Bufferbloat Analogy – Ever

First a story…

Imagine a ski shop with one employee. That employee handles everything: small purchases, renting skis, installing new bindings, making repairs, etc. He also handles customers in first-come, first-served order, and accepts all the jobs, even if there’s already a big backlog. Imagine, too, that he never stops working with a customer until their purchase is complete. He never goes out of order, never pauses a job in the middle, not even to sell a Chapstick.

That’s dumb, you say. No store would do that. Their customers – if they had any left – would get really terrible service, and would never know when they’re going to be served. And you would be right.

The Ski-Shop Router

Unfortunately, a lot of network routers (both home and commercial) work just like that fictitious ski shop. And they give terrible service. These routers have a single first-in, first-out queue for packets. All traffic is queued, regardless of whether it’s a big or small packet, whether there has been a lot of traffic from a particular source, or whether things are getting backed up.

Since the router has no global knowledge of what’s happening, it cannot inform a big sender to slow down, or throttle a particular stream of traffic by discarding some data (in the ski shop analogy, sending away a customer who has another long repair job). The dumb router simply accepts the data, buffers it up, and expects to send it sooner or later. To make matters worse, in networking (but not in ski shops), if delays get long enough, computers can resend the data, thinking that the original packet must have been lost. These retransmissions further increase bloat and delay, because there are now two copies of the same data buffered up, waiting to be sent…

This is the genesis of the name “bufferbloat” – the router’s memory gets bloated with buffers of packets. Because the router doggedly insists on sending all of that data in order, it blocks newer sessions from even starting, and the entire network gets slow.
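A back-of-the-envelope sketch shows how quickly a deep buffer turns into delay. The buffer size and link speed below are made-up numbers chosen for illustration, not measurements from any particular router, and the function name is hypothetical.

```python
def fifo_wait_ms(queued_bytes: int, link_bits_per_sec: float) -> float:
    """Time (ms) a newly arrived packet waits behind everything already in a FIFO."""
    return queued_bytes * 8 / link_bits_per_sec * 1000.0


# Assumed example: 256 KiB already buffered on a 1 Mbit/s uplink.
wait = fifo_wait_ms(256 * 1024, 1_000_000)
print(f"new packet waits about {wait / 1000:.1f} s")
```

With these assumed numbers, a packet that arrives behind the backlog waits roughly two seconds before it is even sent, no matter how small or urgent it is.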

What’s the solution?

The members of the CeroWrt team have been working for the last three years to solve the problem of bufferbloat. We’ve largely succeeded: the CeroWrt firmware works really well. CeroWrt users no longer see problems with “the internet being slow” even when uploading and downloading files, watching videos, etc. We have pushed those changes into the Linux kernel and into OpenWrt mainline, whose Barrier Breaker release now contains the Smart Queue Management (SQM) fixes.

SQM introduces a new queueing discipline called fq_codel (see http://tools.ietf.org/html/draft-nichols-tsvwg-codel-02 and http://tools.ietf.org/html/draft-hoeiland-joergensen-aqm-fq-codel-00 for details) that can detect which flows (streams of data between two endpoints) are using more than their share of the bottleneck link (usually, the connection to the ISP).

fq_codel divides the traffic into multiple queues, one per flow, and sends packets from each queue in round-robin order. (The algorithm is somewhat more involved, so read the full description for details.) It also measures the time that each packet has been queued (its sojourn time). If packets for a flow have been in the queue for “too long”, then fq_codel either marks them for ECN (Explicit Congestion Notification) or discards a certain percentage of them, preventing the flow from using more than its fair share of bandwidth on the bottleneck.
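The idea can be sketched in a few lines of Python. This is a toy, not the real fq_codel: the class name, the 5 ms threshold, and the "drop at most one stale packet per turn" rule are all assumptions made for illustration, and the actual algorithm described in the drafts is considerably more careful about when and how often it drops or marks.

```python
from collections import OrderedDict, deque

TARGET_SOJOURN = 0.005  # seconds; illustrative threshold assumed for this sketch


class ToyFlowQueue:
    """Per-flow queues served round-robin, with a crude sojourn-time check."""

    def __init__(self, target=TARGET_SOJOURN):
        self.target = target
        self.flows = OrderedDict()  # flow id -> deque of (enqueue_time, packet)
        self.dropped = []           # packets discarded as back pressure

    def enqueue(self, flow_id, packet, now):
        self.flows.setdefault(flow_id, deque()).append((now, packet))

    def dequeue(self, now):
        """Return the next packet to send, or None if nothing is queued."""
        if not self.flows:
            return None
        # Serve whichever flow is at the front of the round-robin order.
        flow_id, queue = next(iter(self.flows.items()))
        enqueued_at, packet = queue.popleft()
        # If the head packet sat too long and the flow still has a backlog,
        # drop it and send the next packet from the same flow instead.
        if queue and now - enqueued_at > self.target:
            self.dropped.append(packet)
            enqueued_at, packet = queue.popleft()
        # Rotate the flow to the back, or forget it once it is empty.
        if queue:
            self.flows.move_to_end(flow_id)
        else:
            del self.flows[flow_id]
        return packet


q = ToyFlowQueue()
for i in range(5):
    q.enqueue("bulk", f"b{i}", now=0.0)   # a big sender builds a backlog
q.enqueue("voip", "v0", now=0.0)          # a small flow arrives behind it

print(q.dequeue(now=0.001))  # first bulk packet
print(q.dequeue(now=0.002))  # the voip packet, without waiting for the backlog
print(q.dequeue(now=0.010))  # bulk again; its stale head packet was dropped
print(q.dropped)
```

In this sketch the low-volume flow is served on the second turn instead of sixth, and only the flow with a standing backlog loses a packet.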

Wait a minute – discarding packets? Doesn’t that make things worse?

It does slow the rate for the affected flow, but that is exactly what should happen. If a sender has sent so many packets that they’re building up in the router’s memory, then the router must offer back pressure for that flow by dropping/marking some of its packets.

In the meantime, all the other flows (in their own queues) have their packets sent promptly, since they’re not building up and their sojourn time stays low. This automatically keeps everything responsive: short packets and packets from low-volume flows get sent first. The big senders, whose packets are dropped/marked, will re-send the data, but at a slower rate, bringing the entire system back into balance.

What about Quality of Service (QoS)? Doesn’t that help?

Yes, it helps a bit. If you configure your router for QoS, the router will use that information to prioritize certain packets. Good QoS settings can help certain traffic types by sending them first, ahead of the bulk traffic that’s buffered up. But there are several problems with QoS:

  • It doesn’t solve the problem of overbuffering. The QoS rules allow certain packets to go to the head of the queue. But the buffers from large flows are still there, and will have to be sent at some point. And those buffers will stand in the way of any newly arrived traffic that hasn’t been prioritized.
  • As a corollary, there’s no throttling of the big senders: they don’t get early feedback that they are using more than their fair share of the capacity. If the queue/delay gets large enough, the sender could even retransmit the data, making the overall situation worse.
  • It’s annoying to configure QoS. You have to understand how to configure the router and manually make the changes. This is something that only a network geek could come to enjoy. It’s also a maintenance hassle: if you hear about a new application, it may not work well until you adjust the QoS rules to take it into account.
  • Finally, QoS doesn’t help for the download direction. It can improve traffic being sent from your local (home) network toward the Internet. But if the equipment at the far end at your DSL or cable provider is bloated (and very often it is), then QoS in your router won’t make things any better.
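The first problem in the list above can be illustrated with a small sketch of a single buffer with priority classes. The class name, packet labels, and priority numbers are hypothetical; real QoS implementations differ in many details.

```python
import heapq
from itertools import count


class PriorityFifo:
    """One shared buffer: lower priority number goes first, FIFO within a class."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # tie-breaker that preserves arrival order

    def enqueue(self, packet, priority):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


q = PriorityFifo()
for i in range(100):
    q.enqueue(f"bulk-{i}", priority=1)   # a large upload already buffered
q.enqueue("voip", priority=0)            # traffic the QoS rules know about
q.enqueue("web-request", priority=1)     # new traffic with no QoS rule

order = [q.dequeue() for _ in range(len(q))]
print("voip sent at position", order.index("voip"))
print("web-request sent at position", order.index("web-request"))
```

The prioritized packet jumps to the front, but the new, unprioritized request still waits behind the whole backlog, and nothing in this scheme tells the bulk sender to slow down.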

The fq_codel and other algorithms in CeroWrt handle all this automatically. The only configuration parameters are what kind of link you have (DSL, Cable, etc.) and the speeds of those links. You don’t have to adjust QoS settings or make other adjustments.

Is my network affected by Bufferbloat?

Quite possibly – here’s one symptom: If the network works well when no one else is using it (early morning, or late at night after everyone else is asleep), but gets slow when others are on the net, then you are probably suffering from Bufferbloat. Another symptom is voice calls, video chat, or gaming that degrade when others are using the network.

Here’s a more scientific test. The DSLReports Speed Test http://dslreports.com/speedtest runs a latency measurement during the download and upload speed tests. (Most speed test sites only measure a few pings when the line is idle.) If the latency gets high on this test, then your router is probably bloated. You can also check out the Quick Test for Bufferbloat on the CeroWrt site for additional information.
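If you would rather compare the numbers yourself, the same idea can be sketched as a small calculation: collect ping times while the line is idle and again during a big upload or download, then compare the medians. The sample values below are invented for illustration, and the function name is hypothetical; how you gather the samples (for example, by running ping by hand) is up to you.

```python
from statistics import median


def added_latency_ms(idle_rtts_ms, loaded_rtts_ms):
    """Median round-trip time under load minus median round-trip time when idle."""
    return median(loaded_rtts_ms) - median(idle_rtts_ms)


# Invented sample data: ping times in milliseconds.
idle = [18.0, 20.0, 19.0, 21.0, 20.0]
loaded = [240.0, 310.0, 280.0, 450.0, 295.0]

print(f"latency added under load: {added_latency_ms(idle, loaded):.0f} ms")
```

A large gap between the two medians is the same signal the speed test reports: latency that climbs when the link is busy.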

What can I do about this?

Three years of network research have paid off: the networks work great at our houses. Our algorithms have been adopted and implemented in the Linux kernel, other operating systems, and an increasing set of commercial network equipment. Our changes have been pushed into the OpenWrt project. We are making the code available at no charge, and are encouraging all vendors to embrace it.

Regrettably, nearly every piece of equipment with a network connection – home router, ISP headend and DSLAM, phone, personal computer, laptop, tablet, even big commercial routers and switches – needs to have some form of SQM installed. We now have the technology, and it’s simple, but it needs to be deployed.

TL;DR – What you can do about Bufferbloat at your home

  • Consider installing the stable OpenWrt Barrier Breaker firmware on your router. The luci-app-sqm and sqm-scripts packages include the enhancements that we’ve tested and then pushed into the OpenWrt mainline source code. Use the Supported Devices page to find your router, then read the SQM/fq_codel HOWTO. If you don’t want to do all that…
  • Call your router vendor’s support line. With the information from the DSLReports Speed Test in hand, you can mention that the ping times get really high when up/downloading files, and that it really hurts your network performance. Ask if they’re working on the problem, and when they’re going to release a firmware update that solves it. Leave a comment here with their response – I’d love to hear.

An earlier draft of this note appeared on the Bloat and Codel mailing lists. See https://lists.bufferbloat.net/pipermail/codel/2014-February/000802.html (Latest update: 3 May 2015)
