Toward a Consumer Responsiveness Metric

At a recent videoconference, I advocated strongly for a consumer-facing measurement of latency/responsiveness. I had not planned to speak, so I gave off-the-cuff comments. This is an organized explanation of my position. I offer these thoughts for consideration at the IAB Workshop “Measuring Network Quality for End-Users, 2021” – Rich Brown

I hunger for a day when vendors (router manufacturers and service providers) compete on the basis of “responsiveness” in the same way that they compete on speed – “Up to X megabits per second, and Y responsiveness!”

I have been working on the “Bufferbloat Project” [1] since 2011, trying to find layman’s terms for what was happening, and what to do about it. [2] [3] The delay goes by the name “lag”, “latency under load”, or “bufferbloat”. At first, the effects seemed mysterious and non-intuitive. Even to knowledgeable individuals, the magnitude of the delay caused by queueing was astonishing. No matter what name you use, it makes people say, “the internet is slow today”.

My router at home has solved this problem. I enjoy the fruits of the intense research from the mid-2010s that led to well-understood solutions such as fq_codel, cake, PIE, and airtime fairness. Even using 7 Mbps DSL, my network was quite usable, and very responsive.

My frustration in 2021 is that this remains a problem for nearly everyone else. The market has not provided solutions. Every day, people purchase brand name equipment that happily queues hundreds of msec of traffic.

I postulate that vendors have not considered responsiveness to be an important characteristic of their offerings. Consequently, they have not prioritized the engineering resources to incorporate the well-tested solutions listed above.

My hope, from this note, and from our on-going efforts, is that we can come up with a test tool that consumers can use to raise awareness of the problem of bad responsiveness.

Characteristics of a Responsiveness Tool

I seek a “responsiveness tool” with these characteristics:

  1. Easy to use. People need an easy way to measure responsiveness so they can give feedback to their vendors.
  2. A single number, so it’s easy to report and compare.
  3. Bigger must be better. High latency means bad responsiveness. People have no intuitive feel for a millisecond: “Is 100 msec bad? Isn’t that really short…?”
  4. An approximate measure is OK. Consumers won’t mind separate runs varying 20% or 30%, especially since poor responsiveness could be an order of magnitude different from good.
  5. Resistant to cheating. Vendors sometimes optimize pings to make latency look lower. But real people’s traffic doesn’t use pings. The responsiveness test must use protocols that match actual traffic patterns.
  6. Vendor and technology independent. People should use and get similar results from their phone, their desktop, on the web, or using an app.
  7. “Good enough”. A widely implemented and promoted metric that substantially matches people’s real experience is vastly superior to a host of competing metrics that muddy the waters in consumers’ minds.

A Proposed Metric – RPM

Apple has produced an Internet Draft “Responsiveness under Working Conditions” [4] and implementation. It defines a procedure for continually making short HTTPS transactions on a path to a server that has been fully loaded in both directions. The number of transactions in a fixed time is expressed as the number of “round-trips per minute”, which is given the name “RPM”, a wink to the “revolutions per minute” that we use for cars.
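To make the "bigger is better" idea concrete, here is a minimal sketch of the unit conversion behind RPM. It assumes the only input is a single round-trip time measured under load; the procedure in the draft combines many probes, so this shows the arithmetic only and is not the draft's algorithm. The function name is hypothetical.

```python
def rpm_from_latency(latency_ms: float) -> float:
    """Convert a round-trip time under load (in msec) to round-trips per minute."""
    if latency_ms <= 0:
        raise ValueError("latency must be positive")
    # One minute is 60,000 msec, so this is how many round trips fit in a minute.
    return 60_000 / latency_ms


if __name__ == "__main__":
    for latency_ms in (20, 100, 500):
        rpm = rpm_from_latency(latency_ms)
        print(f"{latency_ms:>4} msec under load -> {rpm:,.0f} RPM")
```

With this conversion, 20 msec under load becomes 3,000 RPM and 500 msec becomes 120 RPM, so the responsive network gets the bigger number.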

The RPM measurement satisfies all my concerns.

Non-requirements

It is not a requirement for the responsiveness test to provide:

  • Strict reproducibility. The wider internet has widely varying conditions, with bottlenecks moving around by time of day or adjacent traffic. It is not reasonable/feasible to expect that any measure used by consumers will be exactly reproducible.

  • Detailed statistics or distributions of measurements. This is not a diagnostic tool. A nuanced data set with medians and percentiles may excite techies, but for others, it’s hard to understand the implications.

  • Performance of any particular protocol. The responsiveness tool must measure a broad variety of typical traffic.

  • Data to be used as input for vendors to design solutions. The responsiveness measure needs to be used the same way we say to our mechanic, “The car makes a funny noise when I …”. I expect the specialist to work to reproduce the symptom, using the provided equipment, and come up with an appropriate solution.

Summary

The research of the last decade has developed a wide variety of solutions. There are plenty of corner-cases where these solutions aren’t perfect. I encourage vendors and researchers to study the field and advance our knowledge further. I would be delighted if they found practices even better than the current state of the art.

But “the rest of the internet” (including my neighbors and family members, for whom I’m the support person) would all benefit from a world where off-the-shelf equipment already incorporated well-known, best practice solutions.

References

[1] Bufferbloat Project https://bufferbloat.net

[2] Bufferbloat and the Ski Shop https://randomneuronsfiring.com/bufferbloat-and-the-ski-shop/

[3] Best Bufferbloat Analogy – Ever https://randomneuronsfiring.com/best-bufferbloat-analogy-ever/

[4] Responsiveness under Working Conditions – Internet-Draft at: https://datatracker.ietf.org/doc/draft-cpaasch-ippm-responsiveness/ Full disclosure: I am one of the editors of the “Responsiveness Under Working Conditions I-D”

Best Bufferbloat Analogy – Ever

My friends frequently ask, “Why is my network so slow?” And often, the answer is “latency” or the screwy term, “Bufferbloat” – the “undesirable latency caused when a router buffers too much data.” But what the heck does that mean?

A while back, I attempted a layman’s explanation of Bufferbloat. I compared it to a ski shop. It was pretty unsuccessful: it just didn’t have any intuitive appeal.

That’s why I was delighted that Waveform.com published what I believe is the Best Bufferbloat Analogy – Ever. (I am pleased to have contributed to the final version of their description.) That page also has a well-designed web-based Bufferbloat Tester (on a par with the DSLReports Speed Test).

They asked, “Can you explain bufferbloat like I’m five?” and noted that flows of liquids are sort of like flows of packets. In the analogy, a friend dumps a bucket of water into a sink with a narrow drain, which slows other flows (like a teaspoon of oil) from emptying out. Read the whole description…

This made me think about having a SmartSink™ to give a visual image for understanding how a well-designed router can decrease latency.

What’s a SmartSink™?

Instead of accepting a full bucket of water all at once, a SmartSink controls the bucket of water with a valve. It allows just enough water into the sink to keep the drain full. If the water gets too low, the SmartSink opens the valve; if it gets “too full”, it closes it a bit.

A SmartSink also works when lots of friends have their own buckets, pouring in colored water – pink, blue, etc. The valves on the SmartSink control each color. If the SmartSink notices too much pink water, it closes that valve a bit to bring back balance, so that each color gets its “fair share” of the drain’s capacity. And because there’s never too much water (of any color) in the sink, a small new flow always drains quickly.

Reality check: This is just an analogy. I realize that a SmartSink is a ridiculous idea. But it helps me visualize how small flows can drain quickly while big flows share the drain capacity fairly.

What does this have to do with routers?

The Smart Queue Management (SQM) algorithm in a router works like the SmartSink. When a device starts sending a lot of data (maybe a phone starts uploading photos to the cloud), SQM controls the amount of data queued for each flow (each separate upload, videoconference, voice call, gaming session, YouTube, BitTorrent, etc.) to prevent any one flow from using more than its share. Instead of operating valves to control the flow of water, SQM controls the size of each flow’s queue by:

  1. Placing packets from each flow into a separate queue.
  2. Removing a small batch of packets from each queue, round-robin style, and sending that batch “out the drain” through the (slow) bottleneck link to the ISP. When each batch has been fully sent, it retrieves another batch from the next queue, and so on.
  3. Offering back pressure to flows that are sending “more than their share” of data.

This process provides these desirable effects:

  • Most importantly, SQM provides low latency. Small flows (with just one or a few small packets) get sent right away in their next “round robin” batch.
  • Equal sharing of the bottleneck: If there are multiple senders, each can send an equal amount of data with each round-robin opportunity.
  • No waste of the bottleneck: If there’s only one sender (one queue with data), that one gets the full capacity of the link.
  • Offering backpressure to bulk senders minimizes lost packets and re-transmissions, making the network globally more efficient.
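The three queueing steps above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not fq_codel or cake: the class name, the batch size, and the queue limit are all hypothetical, batches are counted in packets, and "back pressure" is modeled as a simple drop once a flow's queue is full.

```python
from collections import OrderedDict, deque


class TinySQM:
    """Toy per-flow queueing with round-robin service (illustration only)."""

    def __init__(self, batch_size=2, queue_limit=10):
        self.batch_size = batch_size    # packets sent per flow per round
        self.queue_limit = queue_limit  # assumed cap on each flow's queue
        self.queues = OrderedDict()     # flow id -> deque of waiting packets
        self.dropped = 0

    def enqueue(self, flow, packet):
        """Step 1: place the packet in its own flow's queue."""
        queue = self.queues.setdefault(flow, deque())
        if len(queue) >= self.queue_limit:
            # Step 3: back pressure. A drop signals the sender to slow down.
            self.dropped += 1
            return False
        queue.append(packet)
        return True

    def drain(self):
        """Step 2: yield (flow, packet) in the order sent through the bottleneck."""
        while self.queues:
            for flow in list(self.queues):
                queue = self.queues[flow]
                for _ in range(min(self.batch_size, len(queue))):
                    yield flow, queue.popleft()
                if not queue:
                    del self.queues[flow]


sqm = TinySQM()
for n in range(8):
    sqm.enqueue("photo-upload", f"bulk-{n}")  # a big flow arrives first
sqm.enqueue("voice-call", "voice-0")          # then one small packet
order = [packet for _, packet in sqm.drain()]
print(order.index("voice-0"))  # position of the voice packet in the send order
```

In this run the voice packet leaves third, right after the first batch of two upload packets, instead of waiting behind all eight as it would in a single shared queue.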

Does SQM work?

YES!

Can I get a router with SQM today?

YES!

Got questions? Send them to me and I’ll include them in Part 2 (coming soon) of this blog. Thanks.

 

WireGuard Vanity Keys

(This is another post that only a techie could love…) A WireGuard VPN provides a fast, secure tunnel between endpoints. It uses public/private key pairs to encrypt the data.

If you have several clients, you have to enter their public keys into your server. Keeping track of those keys gets to be a hassle, since ordinarily, the keys are essentially random numbers.

I found a great project to help with this problem: WireGuard Vanity Address. It continually generates WireGuard private/public key pairs, printing keys that contain a desired string in the first 10 characters. For example, I generated this public key for my MacBook Pro (MBP): MBP/DzPRZ05vNZ0XS3P9tlokZPrLy/1lb1Zsm3du4QA= Note the MBP/ at the start – it makes it easy to know that this is my Mac’s key.

To do it, I ran the wireguard-vanity-address program. Here is sample output:


$ ./wireguard-vanity-address MBP/
searching for 'mbp/' in pubkey[0..10], one of every 299593 keys should match
one trial takes 28.7 us, CPU cores available: 2
est yield: 4.3 seconds per key, 232.30e-3 keys/s
hit Ctrl-C to stop
private qMKPNrCMId59XTn5vgDICUh/QzIfhqZdrZ+XQBIJj2w=  public zmbP/YEpC8Zl6MacYhcY1lq126tL2UudFjmrwbl2/18=
private HHtPY8IwGBxQ5OTtJY6GcuFpImXtDp9d187zvI0axFo=  public qhIiSMbp/extT5irPy4EJfLRPR9jTzQZHlM15Fo/P2E=
private BEnEu1lVdcRI997nj2uPNGsyCZNPhBTCNfgJuYPPJHA=  public hZzmBP/8EthWPOFp5wroEGPeJTHGxZ5KENnMiZvniGY=
private 8HRj+YZfSBnYZn38MPE09W2g03JvRJoGbjlDkHQ0Wnk=  public mBP/q2dOd+m457PyKTIvI7MDTuXLCneG6MM0ir9rwRc=
...
private dFE8xsDDWNNNY1OjOIlxQiNVbp7Z6tZhXsaOo/5gPH0=  public MBP/DzPRZ05vNZ0XS3P9tlokZPrLy/1lb1Zsm3du4QA=
^C
# This last line contains a public key starting with "MBP/"
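The "one of every 299593 keys" and "4.3 seconds per key" figures in the sample output can be roughly checked with a little arithmetic. The sketch below is an estimate under assumptions that are not taken from the program's source: the match is case-insensitive, each base64 character is one of 64 equally likely symbols, the pattern may start anywhere that keeps it inside the first 10 characters, and overlapping matches are ignored. The function name is hypothetical.

```python
def keys_per_match(pattern: str, window: int = 10) -> float:
    """Estimate how many random public keys are tried per case-insensitive match."""
    probability = 1.0
    for ch in pattern:
        # A letter matches in upper or lower case (2 of 64 symbols);
        # digits, '+' and '/' match exactly one of the 64 symbols.
        probability *= (2 if ch.isalpha() else 1) / 64
    # Number of start positions that keep the pattern inside the window.
    positions = window - len(pattern) + 1
    return 1 / (probability * positions)


trials = keys_per_match("MBP/")
# Figures taken from the sample output: 28.7 microseconds per trial, 2 CPU cores.
seconds_per_key = trials * 28.7e-6 / 2

print(f"about one in {trials:,.0f} keys should match")
print(f"about {seconds_per_key:.1f} seconds per key")
```

Under these assumptions the estimate agrees with the program's own numbers. Each extra letter in the pattern multiplies the expected search time by about 32, so short prefixes are the practical choice.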

For more details, read the github page, and also the issue where the author addresses security concerns about decreasing the size of the key space.

WireGuard GUI on macOS

A WireGuard VPN provides a fast, secure tunnel between endpoints. A macOS GUI client is available from the App Store.

It works great. But its documentation is minimal. Even though the required keywords (which you must type manually) are the same as other clients, the GUI doesn’t give a hint about whether it’s right until you type it exactly correctly. Consequently, it can be a pain to configure it properly.

This screen shot shows a correctly configured (although fictitious) VPN tunnel. To get to this configuration window, use the WireGuard Manage Tunnels menu, click the + button and choose Add Empty Tunnel… then fill in the resulting window as shown below:

Screen shot of macOS WireGuard GUI

Although there are plenty of guides to explain WireGuard, this summarizes my best understanding of the meaning of these fields. There may be additional ways to configure the VPN, but following this advice will result in a working secure configuration.

[Interface] Section

  • PrivateKey: Private key for this computer. WireGuard uses this key to encrypt data sent to its peer, and decrypt received data. WireGuard displays the corresponding PublicKey (which you’ll enter into the peer) at the top of the window.
  • Address: Address for the VPN tunnel interface on this computer. Use a /32 address chosen from an address range that is not in either this network or the peer’s network. (This example uses 10.0.10.2/32 for this end. The peer (not shown) is 10.0.10.1/32. They were chosen because the 10.0.10.0/24 subnet is not in use on either side of the tunnel.)
  • DNS: (Optional) Address(es) of DNS servers to be used by this computer. It’s OK to leave this out – by default, WireGuard will use the underlying OS DNS servers.
  • ListenPort: (Optional) WireGuard listens on this port for traffic from its peer. It’s OK to leave this out – by default, WireGuard will select an unused port.

[Peer] Section

  • PublicKey: The public key of the remote peer. WireGuard uses this key to decrypt the packets sent from the peer, and encrypt packets sent to the peer.
  • PresharedKey: (Optional) An additional secret key shared by both peers. If specified, it adds a layer of symmetric-key encryption on top of the public/private key pairs for the peers; it does not replace them.
  • AllowedIPs: A comma-separated list of IP (v4 or v6) addresses with CIDR masks which are allowed as destination addresses when sending via this peer and as source addresses when receiving via this peer.
  • Endpoint: (Optional) The address (or DNS name) and port of the remote peer. If specified, this peer will attempt to connect to the endpoint periodically.
  • PersistentKeepalive: (Optional) The number of seconds this peer waits before sending another keep-alive message. These messages “keep the session alive” through NAT.
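As a rough illustration of how these fields fit together, the sketch below writes out a tunnel configuration as text and checks one piece of the advice above: the tunnel Address should not fall inside any of the AllowedIPs subnets. The addresses and subnets are the ones used in this post. The keys are placeholders, and the Endpoint host name, port, and keepalive value are made-up examples. The parsing uses Python's standard configparser and ipaddress modules and is not part of WireGuard.

```python
import configparser
import ipaddress

# Placeholder keys; hypothetical endpoint, port, and keepalive interval.
CONFIG_TEXT = """
[Interface]
PrivateKey = <private key of this computer>
Address = 10.0.10.2/32

[Peer]
PublicKey = <public key of the remote peer>
AllowedIPs = 192.168.4.0/24, 172.30.42.0/24
Endpoint = vpn.example.com:51820
PersistentKeepalive = 25
"""


def parse_tunnel(text):
    """Return (tunnel interface address, list of AllowedIPs networks)."""
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.optionxform = str  # keep the keywords' capitalization
    parser.read_string(text)
    address = ipaddress.ip_interface(parser["Interface"]["Address"])
    allowed = [
        ipaddress.ip_network(item.strip())
        for item in parser["Peer"]["AllowedIPs"].split(",")
    ]
    return address, allowed


address, allowed = parse_tunnel(CONFIG_TEXT)
# Any AllowedIPs subnet that contains the tunnel address suggests a poor choice.
overlaps = [net for net in allowed if address.ip in net]
print("tunnel address:", address)
print("overlapping subnets:", overlaps or "none")
```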

I would appreciate comments on these descriptions so I can make them more helpful/useful.

Additional Thoughts

The following thoughts are refinements to the advice shown above.

    • The example above only allows traffic to/from the 192.168.4.0/24 and 172.30.42.0/24 subnets to travel through the tunnel. To send all traffic through the tunnel (say, to avoid prying eyes of your ISP, etc), you can set the AllowedIPs to 0.0.0.0/0. To send all IPv6 traffic through the tunnel, add ::/0.
    • It is neither necessary nor recommended to include the peer’s Address in the AllowedIPs list.
    • Although both Endpoint and PersistentKeepalive are listed as optional, you normally set both when using the macOS WireGuard client. Activating the tunnel (from the WireGuard menu), causes WireGuard to begin sending Keepalive packets to the Endpoint, which starts up the tunnel.
    • Dealing with NAT. If your ISP requires your remote peer to be behind NAT, you must configure your ISP’s router/modem to pass the WireGuard packets through. The setup varies from ISP to ISP, but in general, you’ll need to set up some kind of “virtual server”, “DMZ”, or “port forwarding” in the ISP router/modem to pass the WireGuard packets (on the port specified in the Endpoint) to the peer device.

Coffee Shop Bloat Test

We all have heard the perennial complaint, “the network is sooo slow.” A primary reason is the inelegantly-named bufferbloat – caused by a bad router that queues up too much data (“the router gets bloated because it buffers too many packets”).

The good news is that a fix has been known for quite a while now, and it’s often a matter of properly configuring the router.

Dave Täht likes to go into coffee shops and help the owners provide better network service for the customers. (Sometimes, he gets a free meal!) He developed a small script for measuring the bufferbloat to use for before and after tests.

I’ve tweaked the script to make it easier to run and display the results. (You still need to install Flent and fping to make it work on your laptop.) But now you can go to your favorite coffee shop to measure the state of the network. See the script at https://github.com/richb-hanover/coffee-shop-bloat-test

US Robotics Acoustic Coupler

Ahhh… the memories… Back in the day (around 1978), I had one of these beauties. All you had to do was place the telephone handset into those cups (really! [1]), dial up your favorite server, and Presto! You were on-line at 300 bits per second. And for only $139 – it was heaven!

While rummaging through my files, I came upon its (dot-matrix) printed manual, so I scanned it for posterity. Enjoy!

Photo credit: http://www.swtpc.com/mholley/USR/USR_Modem.htm

[1]: Wait… What? You had to insert the handset into those cups? Why? AT&T insisted on this to “prevent damage to the telephone system” from third-party (unlicensed, untested, unreliable) equipment. Only after the Carterfone decision in 1968 would AT&T allow you to make any sort of electrical connection to the phone network. Before that, you could not connect your own telephone (you had to rent one from AT&T), or a fax machine, or a modem, etc.

USR-310 Acoustic Coupler Manual

RandomNeuronsFiring.com – now live!

I have reworked my blog so that the primary domain name is “Random Neurons Firing” (instead of the pedestrian richb-hanover.com). Same content, but a better name.

I’m also adding a new topic to those I’ve previously covered (“Software, Networking, Life”). Over the last two years, I have gone to many planning and zoning conferences to learn more about how to provide attractive housing within communities. I’ll post my notes from those conferences and workshops here. I need to note that these will be my own opinions, and not those of any public boards to which I might belong.


Any opinions expressed here are solely my own, and not those of any public bodies, such as the Lyme Planning Board or the Lyme Community Development Committee, where I am/have been a member. I would be very interested to hear your thoughts – you can reach me at richb.lyme@gmail.com.

Netflow Collectors for Home Networks

Update – November 2017: Added descriptions for the other tools I had investigated.
Update – October 2018: Although it’s not based on Netflow, Al Caughy’s YAMon provides a good view of the traffic flowing through an OpenWrt or DD-WRT router. I use it myself.

Now that LEDE Project has an official release, I hungered for a way to see what kinds of traffic are going through my network. I wanted to answer the question, “who’s hogging the bandwidth?” To do that, I needed a Netflow Collector.

A Netflow Collector is a program that collects flow records from routers to show the kinds and volumes of traffic that passed through the router. The collector adds those flow records into its internal database, and lets you search/display the data. (You also need to configure your router to send (“export”) flow records to the collector. My experiments all employ the softflowd Netflow Exporter. It is a standard package you can install into your LEDE router.)
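To give a feel for what a collector does with flow records, here is a minimal sketch in Python. It assumes the exporter sends Netflow version 5 datagrams (a 24-byte header followed by 48-byte flow records) over UDP; the port number 2055 and all function names are illustrative choices, not settings taken from softflowd or from any of the collectors below. It keeps only a running total of bytes per source address, which is enough to hint at “who’s hogging the bandwidth?”

```python
import socket
import struct
from collections import Counter

# Netflow v5 layout: 24-byte header, then `count` 48-byte flow records.
HEADER = struct.Struct("!HHIIIIBBH")
RECORD = struct.Struct("!4s4s4sHHIIIIHHBBBBHHBBH")


def parse_v5(packet: bytes):
    """Decode one Netflow v5 datagram into a list of flow dictionaries."""
    version, count, *_ = HEADER.unpack_from(packet, 0)
    if version != 5:
        raise ValueError(f"expected Netflow v5, got version {version}")
    flows = []
    for i in range(count):
        f = RECORD.unpack_from(packet, HEADER.size + i * RECORD.size)
        flows.append({
            "src": socket.inet_ntoa(f[0]),
            "dst": socket.inet_ntoa(f[1]),
            "packets": f[5],
            "bytes": f[6],
            "src_port": f[9],
            "dst_port": f[10],
            "protocol": f[13],
        })
    return flows


def bytes_by_source(flows) -> Counter:
    """Total the byte counts of the flows by source address."""
    totals = Counter()
    for flow in flows:
        totals[flow["src"]] += flow["bytes"]
    return totals


def collect(port: int = 2055, max_packets: int = 100) -> Counter:
    """Listen for exported datagrams and accumulate bytes per source address."""
    totals = Counter()
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        for _ in range(max_packets):
            packet, _sender = sock.recvfrom(65535)
            totals.update(bytes_by_source(parse_v5(packet)))
    return totals
```

Calling collect().most_common(5) after the router has been exporting for a while would list the five busiest source addresses. A real collector also stores the records in a database and handles other Netflow versions, which is where the packages below come in.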

In an earlier life, I used a slick commercial Netflow monitoring program. But it wasn’t free, so it isn’t something that I can recommend to people for their home networks.

There are many open-source Netflow collectors with varying degrees of ease of installation, ease of use, and features. Most have install scripts that show the steps required to install them on an Ubuntu or CentOS machine, but they are fussy, and require that you have a freestanding computer (or VM) to run them.

Consequently, I created Docker containers that have all the essential packages/modules pre-configured. This means that you can simply install the Docker container, then launch it on a computer that’s continually operating, and let it monitor the data.

This is the first of a series of postings about Netflow Collectors. They include:

  • Webview Netflow Reporter Netflow collector and web-based display program. Makes it easy to see fine-grained information about traffic. More…
  • NFSEN/NFDUMP Netflow collector and web-based display program. Provides attractive graphs, and automatically detects Netflow exporters (so you can skip one configuration step.) More…
  • FlowViewer Another Netflow Collector with web-based GUI. I created a Docker Container for FlowViewer.
  • FlowBAT A Javascript Netflow collector and display program. This requires an old version of Meteor (0.9.1), and seems not to be currently maintained. The Github repo for FlowBAT has been updated to install using the required (old) version of Meteor.
  • DDWarden This claims to work with DD-WRT’s rflow protocol (very similar to Netflow v5). No further investigation because I was interested in something to work with LEDE/OpenWrt.
  • Generating Netflow Datagrams A few ways to generate Netflow data: softflowd to run on LEDE/OpenWrt routers and nflow-generator to send mock data in the absence of real traffic.

Net Neutrality – Contacting the Congress (update)

The Battle for the Net site https://www.battleforthenet.com/ no longer seems to have the telephone form(!)

But… Boing Boing does. Go to https://boingboing.net/. You’ll see a popup window with a place to enter your phone number. Click OK, and they pop up a script on-screen.

They call you, you answer, then you supply your zip code.

Then they place calls to each of your legislators (in the House and Senate), then if you have time, they call the offices of Mitch McConnell, Chuck Schumer, and other leaders, so you can deliver the message.

I say my name, home town, and then ask that the FCC preserve the current Title II Net Neutrality rules. The staffer who answers is gonna be busy – you might chat them up though to see if they’re getting slammed. (Mitch McConnell’s office wasn’t even answering(!))